ODM Cluster Oddities

Hi All,

I’m fairly new to the ODM world and have been struggling for a few days to set up a working cluster, because my dataset keeps breaking every single instance I try, as well as the lightning network.

I have 969 images at 20 MP each. I’m trying to build a 3D model and an orthophoto on high settings. I am using Hetzner Cloud for the hosting.

I set up one instance to run WebODM and another with ClusterODM and a dummy node, which I locked. That all works fine. I then set up 3 further nodes, each with 16 vCPUs and 32 GB of memory, running NodeODM (on Ubuntu 20.x). I pointed them at the cluster and they all talked to each other fine.

I told the task to split into 300-image submodels with a 150-image overlap. When I watched the activity on the nodes, it successfully split into 3 submodels and sent one to each node. After the feature extraction, however, it completely ignored nodes 1 and 2 and just used node 3, which then crashed at the 3D mesh stage. Any reason why it wouldn’t keep splitting the workload across all 3 nodes?

I’ve now set up a single-node instance with 48 vCPUs, 192 GB of RAM and 192 GB of swap, and this time the submodels keep being uploaded: after each one finishes, the next one begins uploading again. This is behaviour I didn’t see when I was running 3 nodes in the cluster. Any ideas?

Essentially, I’m struggling to work out what the workflow in a cluster should look like, as each time I try something I get different behaviour! Also… if this current test using 192 GB of RAM fails for 969 images, I’m going to cry, as this is a ‘small’ dataset compared to others I have coming up!

Any advice would be much appreciated.


Hm, that does sound odd. I’m just getting started with ClusterODM and split/merge, so I can’t really be of much help, but I’ll try.

The submodel-reload-repeat behavior is surprising. I don’t recall seeing that when I use local split. 192 GB of RAM should be plenty for several hundred photos.

Maybe try:

  • Lower your quality settings so you can speed up your test cycles, then move to higher resolution settings once you get the split/merge/cluster stuff dialed in
  • Process one 300-image submodel on its own. Just pick 300 proximal images and pretend that’s your whole dataset. Remove all the fiddling with split/merge and just try to run it. If successful, you can return to the split/merge complexity.
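
For picking the 300 proximal images, something like the following might help. It is only a rough sketch: it assumes you have already pulled the GPS position of each photo out of the EXIF data with a tool of your choice, and the `(filename, lat, lon)` tuple layout and the function names are made up for illustration. It simply ranks photos by great-circle distance from a seed photo and keeps the closest ones.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_images(images, seed_name, count):
    """Return the names of the `count` images closest to the seed image.

    `images` is a list of (filename, lat, lon) tuples (hypothetical layout).
    The seed itself is included, since its distance to itself is zero.
    """
    seed = next(img for img in images if img[0] == seed_name)
    ranked = sorted(
        images,
        key=lambda img: haversine_m(seed[1], seed[2], img[1], img[2]),
    )
    return [img[0] for img in ranked[:count]]

if __name__ == "__main__":
    # Tiny made-up example; with real data you would pass count=300.
    photos = [
        ("a.jpg", 50.0000, 8.0),
        ("b.jpg", 50.0001, 8.0),
        ("c.jpg", 50.0100, 8.0),
        ("d.jpg", 50.0002, 8.0),
    ]
    print(nearest_images(photos, "a.jpg", 3))
```

Picking a seed somewhere in the middle of the survey area should give a compact block of photos rather than a strip along the edge.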

I think you said 150 image overlap, but fwiw the --split-overlap flag specifies a radius in meters, not a number of photos. Probably doesn’t matter for what you’re doing, but just something to know.
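
To make the units explicit, here is a small sketch of how the split settings could be written down as task options. The option names `split`, `split-overlap` and `sm-cluster` are ODM's own; the helper functions and the cluster URL are hypothetical and just for illustration. The second helper produces the name/value list shape that NodeODM takes for task options.

```python
def build_split_options(images_per_submodel, overlap_radius_m, cluster_url=None):
    """Build a dict of ODM split/merge options.

    images_per_submodel: average number of images per submodel (--split)
    overlap_radius_m:    overlap radius between submodels, in meters
                         (--split-overlap); NOT a number of photos
    cluster_url:         optional ClusterODM URL (--sm-cluster)
    """
    if images_per_submodel <= 0:
        raise ValueError("images_per_submodel must be positive")
    if overlap_radius_m < 0:
        raise ValueError("overlap_radius_m must not be negative")
    options = {
        "split": int(images_per_submodel),
        "split-overlap": float(overlap_radius_m),
    }
    if cluster_url:
        options["sm-cluster"] = cluster_url
    return options

def to_name_value_list(options):
    """Convert the dict into a list of {"name": ..., "value": ...} entries."""
    return [{"name": name, "value": value} for name, value in options.items()]

if __name__ == "__main__":
    # The URL below is a placeholder, not a real host.
    opts = build_split_options(300, 150, "http://clusterodm.example:3000")
    print(to_name_value_list(opts))
```

With these values, 150 means photos within 150 m of a submodel's boundary are shared with its neighbours, however many photos that turns out to be.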
