I’m fairly new to the ODM world and have been struggling for a few days to set up a working cluster, as the dataset I have breaks any single instance as well as the Lightning Network.
I have 969 images at 20 MP each. I’m trying to build a 3D model and orthophoto on high settings, hosted on Hetzner Cloud.
I set up one instance running WebODM and another running ClusterODM with a dummy node, which I locked. That all works fine. I then set up 3 further nodes, each with 16 vCPUs and 32 GB of memory, running NodeODM (on Ubuntu 20.x), pointed them at the cluster, and they all talked to each other fine.
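For reference, my node setup followed the standard Docker approach (IPs and ports below are placeholders, not my actual hosts):

```shell
# On each of the 3 processing nodes: run NodeODM on its default port 3000
docker run -d -p 3000:3000 opendronemap/nodeodm

# On the cluster machine: run ClusterODM
# (port 3000 = proxy for task submission, 8080 = admin telnet interface)
docker run -d -p 3000:3000 -p 8080:8080 opendronemap/clusterodm

# Register each NodeODM instance via the admin telnet interface:
telnet localhost 8080
# > NODE ADD <node-ip> 3000
# > NODE LIST
```

`NODE LIST` confirmed all 3 nodes were online before I submitted the task.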
I told the task to split into 300-image submodels with a 150-image overlap. Watching the activity on the nodes, it successfully split into 3 submodels and sent one to each node. After feature extraction, however, it completely ignored nodes 1 and 2 and just used node 3, which then crashed at the 3D mesh stage. Any reason why it wouldn’t keep splitting the workload across all 3 nodes?
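In case the exact options matter, this is roughly how I submitted the task, expressed as an equivalent ODM command line (paths are placeholders; I actually set these options through the WebODM task dialog):

```shell
# Split into ~300-image submodels with 150-image overlap,
# distributing the submodels through ClusterODM (--sm-cluster)
docker run -ti --rm -v /my/datasets:/datasets \
  opendronemap/odm \
  --project-path /datasets myproject \
  --split 300 \
  --split-overlap 150 \
  --sm-cluster http://<clusterodm-ip>:3000
```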
I’ve now set up a single-node instance with 48 vCPUs, 192 GB of RAM, and 192 GB of swap, and this time the submodels are uploaded one at a time: after each one finishes, the next begins uploading. I didn’t see this behaviour when I was running 3 nodes in the cluster. Any ideas?
Essentially, I’m struggling to work out what the workflow in a cluster should look like, as each time I try something I get different behaviour! Also… if this current test with 192 GB of RAM fails on 969 images… I’m going to cry, as this is a ‘small’ dataset compared to others I have coming up!
Any advice would be much appreciated.