How I set up ClusterODM - Documentation

and-viceversa · March 19, 2021, 7:17pm

ODM is great. I’ve had a lot of success with WebODM and with adding extra NodeODM processing nodes on my network, so the logical next step was ClusterODM. Setting up the cluster proved frustrating, though, especially if you are learning ODM, Docker, and networking all at the same time. I’m writing this to consolidate my own experience, the forum posts I’ve read, and the information in OpenDroneMap: The Missing Guide (highly recommended) into one place, because no single source was enough to get me going. The main source of confusion in ClusterODM setup is networking: specifically, how Docker interacts with its host machine and with the rest of the network. Corrections and improvements are very welcome. I’m just trying to figure this out.

NOTE: I have n Windows 10 machines connected by Ethernet cable through a simple network switch, with static IPs. They are not connected to the internet. All shell commands are run in Anaconda PowerShell, but Git Bash works as well.

1. Start ClusterODM on your primary machine

docker run --rm -ti -p 3000:3000 -p 10000:10000 -p 8080:8080 opendronemap/clusterodm

You should be able to see the cluster status at localhost:10000 and a NodeODM instance at localhost:3000.
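As a quick sanity check after step 1, the sketch below summarizes what each published port is used for in this post and tests whether anything is listening on it. It assumes a bash shell (Git Bash works) and relies on bash's /dev/tcp redirection. The function names are made up for illustration and are not part of ClusterODM.

```shell
# Roles of the three ports published in step 1, as used in this post.
clusterodm_port_role() {
  case "$1" in
    3000)  echo "NodeODM interface (the address WebODM is pointed at in step 6)" ;;
    8080)  echo "command-line interface (reached with telnet in step 4)" ;;
    10000) echo "cluster status page (open in a browser)" ;;
    *)     echo "not published by the step 1 command"; return 1 ;;
  esac
}

# Succeed if something accepts TCP connections on host:port.
# Uses bash's /dev/tcp redirection, so no extra tools are needed.
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example (run after step 1):
#   for p in 3000 8080 10000; do
#     port_open localhost "$p" && echo "$p open: $(clusterodm_port_role "$p")"
#   done
```

If one of the three ports is not reported as open, the matching -p option was probably left off the docker run command.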

2. Start a NodeODM instance on primary machine

docker run -p 3001:3000 opendronemap/nodeodm -q 19 --max_concurrency 19 --max_images 1000000

When you start ClusterODM, it comes with its own NodeODM instance, the one you saw at localhost:3000 above. This instance does not do your work; as far as I can tell, it is part of ClusterODM’s scheduling and load balancing. To do any actual processing on your primary machine, you must start the separate NodeODM instance in this step. It needs a different host port, 3001 in this case, because the default 3000 is already occupied by ClusterODM.

3. Start NodeODM instances on all your other machines

docker run -p 3000:3000 opendronemap/nodeodm -q 19 --max_concurrency 19 --max_images 300

These are the workers used for parallel processing in addition to your primary machine. Note how the port is the standard 3000. The other flags are not critical and are documented elsewhere.
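Since steps 2 and 3 differ only in the published host port and the image limit, the two commands can be generated from one small helper. This is a sketch assuming a bash shell; nodeodm_run_cmd is a made-up name. It writes the flags with underscores, as a later reply in this thread points out is the correct spelling.

```shell
# Print the docker command that starts a NodeODM worker.
#   $1 = host port to publish (3001 on the primary machine, 3000 elsewhere)
#   $2 = value for --max_images
#   $3 = value for -q and --max_concurrency (defaults to 19, as in this post)
nodeodm_run_cmd() {
  local host_port="$1" max_images="$2" workers="${3:-19}"
  printf '%s %s\n' \
    "docker run -p ${host_port}:3000 opendronemap/nodeodm" \
    "-q ${workers} --max_concurrency ${workers} --max_images ${max_images}"
}

# Example:
#   nodeodm_run_cmd 3001 1000000   # primary machine (step 2)
#   nodeodm_run_cmd 3000 300       # every other machine (step 3)
```

The helper only prints the command, so you can read it before pasting it into the shell on each machine.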

4. Return to your primary machine and open Microsoft Telnet prompt

The official ODM docs use telnet, which can be enabled on Windows by running pkgmgr /iu:"TelnetClient" in your shell. You can also turn it on in Control Panel > Turn Windows features on or off. Other users have had success using PuTTY instead.

Start telnet by running telnet -e \ localhost 8080 to connect to the ClusterODM CLI. The -e flag sets an escape key, which you need in order to switch between the Microsoft Telnet> prompt and the #> prompt that displays ClusterODM’s responses. You can set the escape key to whatever you want, but I found the backslash convenient. If you are typing at the #> prompt and the shell tells you “INVALID INVALID INVALID”, simply press the escape key to return to the Microsoft Telnet> prompt.

Now, from the Microsoft Telnet> prompt run send HELP and press enter twice to see help options. send COMMAND [options] is the basic format.

5. Add all your NodeODM instances to ClusterODM

This is where things get tricky due to Docker networking. First, add the NodeODM instance from your primary machine. You must use this node’s Docker IP address. Do not use the machine’s actual IP address.

Run docker ps. You should see a container running the opendronemap/nodeodm image. This is the node instance that you started on your primary machine in step 2. Note the CONTAINER ID for that container. Run docker inspect -f "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}" first_four_digits_of_container_id to get the container’s IP.

Mine looked like docker inspect -f "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}" 4e54
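The lookup can also be done in one go, without copying the container ID by hand. The sketch below assumes a bash shell and that only one NodeODM container is running on the machine; container_ip_for_image is a made-up helper name. It uses docker ps with the ancestor filter to find the container, then the same docker inspect template as above.

```shell
# Print the IP address of the first running container started from an image.
#   $1 = image name (defaults to opendronemap/nodeodm)
container_ip_for_image() {
  local image="${1:-opendronemap/nodeodm}" id
  # --filter ancestor=IMAGE keeps only containers created from that image;
  # -q prints just the container IDs.
  id="$(docker ps -q --filter "ancestor=${image}" | head -n 1)"
  if [ -z "$id" ]; then
    echo "no running container for ${image}" >&2
    return 1
  fi
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$id"
}

# Example:
#   container_ip_for_image
```

If several NodeODM containers run on the same machine, this picks whichever one docker ps lists first, so check the result against docker ps before using it.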

Use the IP it gives you as input into telnet like send NODE ADD 172.17.0.3 3000

Why port 3000 instead of 3001? This is a mystery, but it worked.
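A likely explanation, assuming Docker's default bridge network: -p 3001:3000 publishes the container's port 3000 as port 3001 on the host only. ClusterODM runs in its own container and reaches 172.17.0.3 directly, so it talks to the port NodeODM listens on inside the container, which is 3000. The sketch below encodes that rule; node_add_port is a made-up helper, written for bash.

```shell
# Given a "-p HOST:CONTAINER" mapping, print the port to use with NODE ADD.
#   $1 = mapping as passed to docker run, e.g. 3001:3000
#   $2 = "container" when adding the node by its Docker IP,
#        "host" when adding it by the machine's own IP
node_add_port() {
  local mapping="$1" via="$2"
  local host_port="${mapping%%:*}" container_port="${mapping##*:}"
  case "$via" in
    container) echo "$container_port" ;;  # direct traffic bypasses the mapping
    host)      echo "$host_port" ;;       # traffic enters via the published port
    *)         echo "usage: node_add_port HOST:CONTAINER container|host" >&2
               return 2 ;;
  esac
}

# Example:
#   node_add_port 3001:3000 container   # prints 3000
#   node_add_port 3001:3000 host        # prints 3001
```

For the worker machines both ports are 3000, which is why the question never comes up there.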

Finally, add the worker machines to your cluster using the true machine’s IP like
send NODE ADD some_ip1 3000
send NODE ADD some_ip2 3000

Then run send NODE LIST in telnet to see all your nodes.
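With more than a couple of workers, typing the commands one by one gets tedious. The sketch below prints the lines to paste at the Microsoft Telnet> prompt. It assumes a bash shell; node_add_lines is a made-up helper and the IP addresses in the example are placeholders for your own static IPs.

```shell
# Print one "send NODE ADD <ip> <port>" line per worker, followed by NODE LIST.
#   $1 = port the workers publish (3000 in step 3)
#   remaining arguments = worker machine IPs
node_add_lines() {
  local port="$1" ip
  shift
  for ip in "$@"; do
    printf 'send NODE ADD %s %s\n' "$ip" "$port"
  done
  echo "send NODE LIST"
}

# Example with placeholder addresses:
#   node_add_lines 3000 192.168.1.11 192.168.1.12
```

The helper does not talk to ClusterODM itself; it only generates text, so nothing changes until you paste the lines into telnet.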

6. Start up WebODM and use your cluster.

By default WebODM starts with its own NodeODM instance. We don’t want that because we already built a cluster. cd into your WebODM directory and run ./webodm.sh start --default-nodes 0 to avoid creating the unnecessary node. Open localhost:8000 and WebODM will prompt you to add a Processing Node, because it can’t see the cluster by default. For the IP address, use the true IP of your primary machine, where you ran ClusterODM in step 1; the port is 3000. This seems totally counter to everything that I read: The Missing Guide and various forum posts seem to indicate that you should use the IP of the Docker container, but that just didn’t work for me.

You should now see your ClusterODM IP:Port under Processing Nodes in the WebODM interface.
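If the node does not show up as online, it can help to test the address outside WebODM first. NodeODM serves an /info endpoint over HTTP, and the sketch below assumes ClusterODM answers the same request on its port 3000; I have not verified that on this setup. node_info_url is a made-up helper for bash, and the IP in the example is a placeholder for your primary machine's address.

```shell
# Build the URL of the /info endpoint for a processing node.
#   $1 = host name or IP, $2 = port (defaults to 3000)
node_info_url() {
  printf 'http://%s:%s/info\n' "$1" "${2:-3000}"
}

# Example with a placeholder address for the primary machine:
#   curl -s "$(node_info_url 192.168.1.10)"
```

If curl gets a response from the primary machine's true IP but not from the container IP, that matches what I saw when adding the Processing Node in WebODM.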

Edit: Forgot to mention, use the same ClusterODM IP:Port for the sm-cluster flag on the tasks that you want to point to the cluster. I haven’t tested whether WebODM can use the cluster without setting this option explicitly.

Edit 2: Updated ClusterODM launch command to include cluster status. Thanks smathermather-cm!

smathermather · March 19, 2021, 7:22pm

Great summary. I’ll give it a closer look later, but one quick thought:

Change the docker command to:
docker run --rm -ti -p 3000:3000 -p 10000:10000 -p 8080:8080 opendronemap/clusterodm

This then ensures the 10000 port is exposed on the host machine.

pierotofy · March 19, 2021, 7:55pm

Awesome summary! Thanks for sharing.

Any chance you could add this to the docs as a new page? The repository is OpenDroneMap/docs on GitHub.

and-viceversa · March 19, 2021, 10:26pm

Happy to do it. However, I’d prefer to wait for some more community comments and battle test my own setup.

Specifically concerned about:

Why does send NODE ADD primary_machine_docker_ip 3000 require port 3000 even though NodeODM was run on 3001?
Why does WebODM see the cluster at the primary machine true IP instead of the ClusterODM container IP?

I’m referring to the “gotcha” on page 212 of OpenDroneMap: The Missing Guide. I had to run docker ps for the NodeODM instance running on the same computer as ClusterODM, but not for ClusterODM itself.

Edit: You literally wrote the book on the topic so I’m not sure who else I’d ask. Just going for clarity.

smathermather · March 19, 2021, 10:50pm

He wrote the code for the software too…

pierotofy · March 19, 2021, 11:47pm

Mm, this is strange. If NodeODM is running on port 3000, then you should use port 3000 (not 3001). It’s easy to lose track of which port is mapped to where with docker, but if the service on NodeODM is exposed and reachable from port 3000, then you need to use port 3000.

Similarly to above, if the service is exposed on a certain host/IP (reachable by you from a web browser on that certain host/IP), then WebODM will need to use that. The internal addresses that docker allocates are guaranteed to work only when you run services under a docker-compose command (perhaps here lies the confusion).

If you do not use docker-compose, then you always need to use the machine IP and port.

Docker can be so much fun!

and-viceversa · March 26, 2021, 9:18pm

Note that the flags are delimited by an underscore (_), not a dash (-).

--max_concurrency and --max_images are correct usage.

and-viceversa · March 29, 2021, 5:05pm

Underscore versus dash typos solved here. Thanks pierotofy!
