My setup is in AWS. I've stood up a permanent machine running ClusterODM on port 3000 with one locked node on port 3001, and I've set up autoscaling. I've transferred my image set to the ClusterODM machine and am running the ODM command line tool to kick off the job against localhost:3000. If I run the following command:
odm 2020-11-29 --debug --dsm --pc-las
ClusterODM launches a new instance and the job runs without any issues (until it crashes due to memory problems about 24 hours later, but that's a separate issue from the one I'm trying to solve here).
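For reference, a quick pyodm check like the following (a minimal sketch, assuming localhost and no token required on the locked node; adjust if yours needs one) can confirm that both the proxy on 3000 and the locked node on 3001 answer /info before a job is submitted:

from pyodm import Node

# Sketch: confirm the ClusterODM proxy (3000) and the locked node (3001)
# both respond to /info. Assumes localhost and an empty token.
for port in (3000, 3001):
    info = Node("localhost", port).info()
    print(port, info.engine, info.engine_version, info.version)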
However, if I run the following command to kick off the job:
odm 2020-11-29 --debug --dsm --pc-las --split 400 --split-overlap 100
The files get uploaded and a new node is launched. A few minutes after the node comes up, I get the following error:
Launching... please wait! This can take a few minutes.
  File "/code/run.py", line 21, in <module>
    args = config.config()
  File "/code/opendm/config.py", line 795, in config
    Node.from_url(args.sm_cluster).info()
  File "/usr/local/lib/python3.8/dist-packages/pyodm/api.py", line 177, in info
    return NodeInfo(self.get('/info'))
  File "/usr/local/lib/python3.8/dist-packages/pyodm/api.py", line 134, in get
    raise NodeResponseError(result['error'])
pyodm.exceptions.NodeResponseError: Invalid authentication token: token does not match.
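From the traceback, the exception comes from pyodm when the token embedded in the --sm-cluster URL doesn't match what the node expects. As a minimal sketch (localhost and the token value here are stand-ins taken from the dumps below), the same check ODM performs at config time can be reproduced directly:

from pyodm import Node
from pyodm.exceptions import NodeResponseError

# Sketch: Node.from_url() parses the token from the URL's query string,
# then info() hits /info; a mismatched token raises NodeResponseError.
# Hostname and token are placeholders.
try:
    print(Node.from_url("http://localhost:3000/?token=wvLAYesjyoUh49CeERdUqp").info().version)
except NodeResponseError as e:
    print(e)  # "Invalid authentication token: token does not match."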
In both cases, node info from the API on 8080 shows the token set to a random string like "wvLAYesjyoUh49CeERdUqp".
Some additional info, Node 1 is the locked node, Node 2 is the node that was created when the split/merge job failed, and Node 3 is the one running without split/merge:
#> node list
- localhost:3001 [online] [0/1] <engine: odm 2.3.3> <API: 2.1.1> [L]
- :3000 [online] [0/1] <engine: odm 2.3.3> <API: 2.1.1> [A]
- :3000 [online] [1/1] <engine: odm 2.3.3> <API: 2.1.1> [A]
#> node info 2
{
  "hostname": "",
  "port": 3000,
  "token": "wvLAYesjyoUh49CeERdUqp",
  "info": {
    "version": "2.1.1",
    "taskQueueCount": 0,
    "totalMemory": 66721939456,
    "availableMemory": 65703120896,
    "cpuCores": 8,
    "maxImages": null,
    "maxParallelTasks": 1,
    "engineVersion": "2.3.3",
    "engine": "odm"
  },
  "lastRefreshed": 1608319460324,
  "dockerMachine": {
    "name": "clusterodm-1797-sRChQGw1LSazFZcw9M5PYY",
    "created": 1608316189234,
    "maxRuntime": -1,
    "maxUploadTime": -1
  }
}
#> node info 3
{
  "hostname": "",
  "port": 3000,
  "token": "49jPmKkGjJA3upEvyL3Te",
  "info": {
    "version": "2.1.1",
    "taskQueueCount": 1,
    "totalMemory": 66721939456,
    "availableMemory": 65509462016,
    "cpuCores": 8,
    "maxImages": null,
    "maxParallelTasks": 1,
    "engineVersion": "2.3.3",
    "engine": "odm"
  },
  "lastRefreshed": 1608319460325,
  "dockerMachine": {
    "name": "clusterodm-1797-2QDy4MDNrbr5xcdGVxEHSP",
    "created": 1608317356575,
    "maxRuntime": -1,
    "maxUploadTime": -1
  }
}
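For completeness, each spawned node's token can also be checked directly against its /info endpoint (a sketch using requests; the hostnames are blanked in the dumps above, so the ones here are placeholders):

import requests

# Sketch: query each autoscaled node's /info with the token ClusterODM
# reports for it. Replace the placeholder hostnames with the real ones;
# a wrong token returns an "error" field instead of the node info.
nodes = [
    ("node2.example.com", "wvLAYesjyoUh49CeERdUqp"),
    ("node3.example.com", "49jPmKkGjJA3upEvyL3Te"),
]
for host, token in nodes:
    r = requests.get(f"http://{host}:3000/info", params={"token": token})
    print(host, r.json())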