Issues with ClusterODM passing task to localhost node

Howdy,

I have two computers I’m trying to get set up for distributed processing with ClusterODM. Currently, both show as running nodes in ClusterODM at localhost:10000. However, the node on the computer that is running ClusterODM itself will not work when a job is pushed to it. The ClusterODM CLI shows the following:

```
warn: Attempted to forward task 857d8c45-d86e-4f62-8d20-db02a1637ec2 to processing node 192.168.1.18:3000 but failed with: set-uuid did not match, 6da88088-cd58-4892-b14e-abe5520a529f !== 857d8c45-d86e-4f62-8d20-db02a1637ec2, attempting again (retry: 1)
warn: Switched 857d8c45-d86e-4f62-8d20-db02a1637ec2 to 192.168.1.18:3000
warn: Attempted to forward task 857d8c45-d86e-4f62-8d20-db02a1637ec2 to processing node 192.168.1.18:3000 but failed with: set-uuid did not match, 521b98b1-1b71-4535-9799-c8cacc646989 !== 857d8c45-d86e-4f62-8d20-db02a1637ec2, attempting again (retry: 2)
warn: Switched 857d8c45-d86e-4f62-8d20-db02a1637ec2 to 192.168.1.18:3000
warn: Attempted to forward task 857d8c45-d86e-4f62-8d20-db02a1637ec2 to processing node 192.168.1.18:3000 but failed with: set-uuid did not match, e3b4a320-d805-42bf-a087-f1db538a8612 !== 857d8c45-d86e-4f62-8d20-db02a1637ec2, attempting again (retry: 3)
warn: Switched 857d8c45-d86e-4f62-8d20-db02a1637ec2 to 192.168.1.18:3000
warn: Attempted to forward task 857d8c45-d86e-4f62-8d20-db02a1637ec2 to processing node 192.168.1.18:3000 but failed with: set-uuid did not match, c3970283-36cd-4c41-a558-c27b95160afa !== 857d8c45-d86e-4f62-8d20-db02a1637ec2, attempting again (retry: 4)
warn: Switched 857d8c45-d86e-4f62-8d20-db02a1637ec2 to 192.168.1.18:3000
warn: Attempted to forward task 857d8c45-d86e-4f62-8d20-db02a1637ec2 to processing node 192.168.1.18:3000 but failed with: set-uuid did not match, 2159e487-c0ec-4b8e-a92a-c9a3b763283b !== 857d8c45-d86e-4f62-8d20-db02a1637ec2, attempting again (retry: 5)
warn: Switched 857d8c45-d86e-4f62-8d20-db02a1637ec2 to 192.168.1.18:3000
warn: Cannot forward task 857d8c45-d86e-4f62-8d20-db02a1637ec2 to processing node 192.168.1.18:3000: Failed to forward task to processing node after 5 attempts. Try again later.
```

After restarting WebODM with `./webodm.sh down` then `./webodm.sh start`, that job’s status window shows:

```
Invalid route for taskId 0c4b1ef4-f1c0-4db4-a023-0f5f1ddfc39c:info, no task table entry.
```

If I lock out the troublesome local node in the ClusterODM telnet interface, the job is sent to the remote node fully and correctly and begins processing. Any ideas on where to look?
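For reference, the workaround in the admin CLI looked roughly like this (a sketch from memory: `NODE LIST` and `NODE LOCK` are ClusterODM telnet commands, but the exact output formatting and the remote node’s address are approximated here):

```
$ telnet localhost 8080
> NODE LIST
1) 192.168.1.18:3000 [online] [0/2]
2) 192.168.1.XX:3000 [online] [0/2]
> NODE LOCK 1
OK
```

With node 1 locked, ClusterODM stops trying to forward tasks to it and routes everything to the remaining node.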

Alrighty, so I’ve got each NodeODM instance running correctly on its own on each computer, as well as through WebODM on each. Now I’m having issues with ClusterODM trying to bind to port 3000, which is already bound by the NodeODM instance…

Turns out I was a doofus 🙂

When working through it all, I ran into both ClusterODM and NodeODM trying to bind to the same port, and THOUGHT I had passed an argument in to change them to 3000 & 3001. Turns out I wasn’t using the parameters correctly, so what looked like a separate problem was really just the port conflict.

So here’s my basic rundown of this experience:

- With WebODM installed on both Windows 10 computers, I then installed NodeODM on both. NodeODM defaults to port 3000.
- **At this point, I highly recommend that anyone just getting started run some data through each Node’s web interface to confirm each Node is actually working correctly.** My local node on the ClusterODM workstation showed as online in the web interface and in WebODM, but would not run correctly because of something goofy going on with the ports. I wasted time later thinking the problem was with ClusterODM before realizing what was going on. Had I tested each Node immediately after install, I would have caught the mistake.
- Then I installed ClusterODM on my main controlling workstation. I had trouble running it with the Docker CLI commands, so I used the Node.js command line instead: `node index.js --port 3001`, which sets ClusterODM’s managing port one higher than the Node port.
- Windows Telnet spat out “invalid” for every keystroke, so I swapped to PuTTY and it worked like a charm (localhost / port 8080 / Telnet).
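For anyone who hits the same port clash, the final arrangement on the ClusterODM workstation looked roughly like this (a sketch; the Docker invocation for NodeODM is an assumption on my part, while the `node index.js --port 3001` command and the 8080/10000 admin ports match what I used above):

```shell
# NodeODM keeps its default port 3000
# (Docker example; adjust the image tag and paths to your setup)
docker run -d -p 3000:3000 opendronemap/nodeodm

# ClusterODM runs from source with its proxy moved to 3001
# so it doesn't collide with the local NodeODM instance
cd ClusterODM
node index.js --port 3001

# Admin telnet CLI:  localhost:8080   (use PuTTY on Windows)
# Admin web UI:      http://localhost:10000
```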

At this point, it was pretty much working. Now I’m just learning to drive. I have a 667-image dataset running with split-merge (`--split 450`, `--split-overlap 150`) at the moment, and realized that the much weaker machine appears to have received all 667 images (and two listed submodels to work on), while the bigger machine only received 539 and no submodels. I suspect it’s because the beefy workstation was listed as node #2 in ClusterODM and the laptop is node #1. Not sure if the node order affects queuing or not; gotta look into that next.
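For concreteness, here is a sketch of how those split-merge settings travel as task options when a task is submitted to a node over the NodeODM REST API (`/task/new` takes an `options` field as a JSON array of name/value pairs; the exact submodel assignment afterwards is up to ClusterODM):

```python
import json

# The settings from the run described above:
# ~450 images per submodel, 150 m of overlap between submodels.
options = [
    {"name": "split", "value": 450},
    {"name": "split-overlap", "value": 150},
]

# This JSON string is what goes into the "options" form field of /task/new.
payload = json.dumps(options)
print(payload)
```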
