ODM Cluster Oddities

Hi All,

I’m fairly new to the ODM world and have been struggling for a few days to set up a working cluster, as my dataset breaks any single instance I’ve tried as well as the Lightning network.

I have 969 images at 20 MP each. I’m trying to build a 3D model and orthophoto on high settings, and I’m hosting on Hetzner Cloud.

I set up one instance running WebODM and another running ClusterODM plus a dummy node, which I locked. That all works fine. I then set up 3 further nodes, each with 16 vCPUs and 32 GB of memory, running NodeODM on Ubuntu 20.x, pointed them at the cluster, and they all talked to each other fine.
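
In case it helps (or in case I’ve missed a step), this is roughly how I registered the nodes with ClusterODM through its telnet admin interface, assuming the default ports (3000 for NodeODM, 8080 for the ClusterODM CLI); the hostnames below are placeholders, not my real ones:

  telnet localhost 8080
  > NODE ADD dummy-node 3000
  > NODE LOCK 1
  > NODE ADD node1 3000
  > NODE ADD node2 3000
  > NODE ADD node3 3000
  > NODE LIST

WebODM is then pointed at the ClusterODM instance itself (port 3000) as a single processing node.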

I told the task to split into 300-image submodels with a 150-image overlap. Watching the activity on the nodes, it successfully split into 3 submodels and sent one to each node. After feature extraction, however, it completely ignored nodes 1 and 2 and just used node 3, which then crashed at the 3D mesh stage. Any reason why it wouldn’t keep spreading the workload across all 3 nodes?

I’ve now set up a single-node instance with 48 vCPUs, 192 GB of RAM and 192 GB of swap, and this time the submodels keep being uploaded one after another: as soon as one finishes, the next one starts uploading. I didn’t see this behaviour when I was running 3 nodes in the cluster. Any ideas?
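
For completeness, the swap on that box is just a plain swap file set up with the usual Ubuntu commands (the path /swapfile is simply the default choice, nothing special):

  sudo fallocate -l 192G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile
  echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # keep it across reboots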

Essentially, I’m struggling to work out what the workflow in a cluster should look like, as each time I try something I get different behaviour! Also… if this current test with 192 GB of RAM fails for 969 images, I’m going to cry, as this is a ‘small’ dataset compared to others I have coming up!

Any advice would be much appreciated.


Hm, that does sound odd. I’m just getting started with ClusterODM and split/merge myself, so I can’t be of much help, but I’ll try.

The submodel upload-and-repeat behavior is surprising; I don’t recall seeing that when I use local split. 192 GB of RAM should be plenty for several hundred photos.

Maybe try:

  • Lower your quality settings so you can speed up your test cycles (see the sketch just after this list), then move back to higher-quality settings once you get the split/merge/cluster stuff dialed in
  • Process one 300-image submodel on its own. Just pick 300 proximal images and pretend that’s your whole dataset. Remove all the fiddling with split/merge and just try to run it. If successful, you can return to the split/merge complexity.
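
For the first point, something along these lines is where I’d start; --feature-quality and --pc-quality are standard ODM options, but treat the exact values as a rough guess rather than a recipe:

  --feature-quality medium --pc-quality medium

Once the split/merge and cluster behavior looks sane, bump them back up for the real run.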

I think you said a 150-image overlap, but FWIW the --split-overlap flag specifies a radius in meters, not a number of photos. Probably doesn’t matter for what you’re doing, but it’s worth knowing.
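
So a task submitted with, say:

  --split 300 --split-overlap 150

asks for roughly 300-image submodels with a 150-meter overlap radius between neighboring submodels, not 150 shared photos.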

