This still appears to be a problem, and there are no GCPs in use on the dataset I have. The platform was last updated in January, and since then there have been two or three instances of hanging at "compressing all.zip"; re-running does not solve the problem. I updated to the most recent git version yesterday, but the problem persists even after completely removing the tasks and re-uploading all photos. My rough rule of thumb for these nodes is 1 hour of processing per 100 photos at the High Resolution setting, so I would have expected this job to finish in around 6 hours; we are at almost 24 hours. I also noticed that the timer in the UI keeps resetting to the value it showed when the zip step started.
console: https://data.nwk1.com/s/FwgYso4Bz56oYE4
It looks like the process gets stuck, not killed:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29159 odm 20 0 7909552 4.249g 32236 R 101.0 5.9 259:51.87 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
3829 odm 20 0 8101964 4.432g 32012 R 100.7 6.2 569:13.96 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
2294 odm 20 0 1304448 125220 44292 S 26.5 0.2 32:12.59 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19103 odm 20 0 1359668 117352 20028 S 10.3 0.2 0:00.61 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19242 odm 20 0 1343688 101064 19900 S 3.3 0.1 0:00.10 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19240 odm 20 0 1343684 101028 19900 S 2.3 0.1 0:00.07 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19243 odm 20 0 1346220 103364 19900 S 1.3 0.1 0:00.04 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
1522 redis 20 0 130244 91564 2680 S 1.0 0.1 4:23.05 /usr/bin/redis-server 127.0.0.1:6379
4344 odm 20 0 3013076 217868 33676 S 1.0 0.3 1:18.04 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/gunicorn webodm.wsgi --bind unix:/webodm/gunicorn.sock --workers 24 --threads 24 --timeout 600000 --max-re+
13344 odm 20 0 3012820 226352 17376 S 1.0 0.3 0:39.20 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/gunicorn webodm.wsgi --bind unix:/webodm/gunicorn.sock --workers 24 --threads 24 --timeout 600000 --max-re+
8 root 20 0 0 0 0 S 0.7 0.0 2:50.42 [rcu_sched]
2299 odm 20 0 1734600 139760 43996 S 0.7 0.2 2:40.03 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker beat --pidfile=
Yesterday I updated everything to the latest versions: WebODM, NodeODM and ClusterODM. The jobs are distributed via ClusterODM, though there's no split-merging going on. The process seems to hang on the WebODM node. All nodes, including the WebODM node, have in excess of 200 GB of RAM, and there's no indication that the process is OOM-killed or similar.
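For what it's worth, these are the kinds of generic Linux checks I'd use to rule out an OOM kill and to confirm the worker is spinning on CPU rather than waiting on I/O (a rough sketch, not WebODM-specific; PID 29159 is the stuck worker from the top output below):

dmesg -T | grep -i -E "out of memory|oom-killer|killed process"   # any OOM-killer activity in the kernel log?
free -h                                                           # memory headroom on the node
top -H -p 29159                                                   # per-thread CPU usage of the stuck worker
grep -E "^(State|VmRSS)" /proc/29159/status                       # R (running) vs D (blocked on I/O), plus resident memory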
Strace of PID 29159 (it seems to be stuck in a loop):
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, ^Cstrace: Process 29159 detached
<detached ...>
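For reference on reading the trace: "II*\0" at the start of the file is the little-endian TIFF magic, and the lseek offsets (1514143744 to 1518338048 is exactly 4194304 bytes) suggest fd 38 is a large TIFF of at least roughly 1.5 GB being read from the start in 4 MB chunks over and over. A generic way to confirm which file that is and where the Python code is looping (sketch only; the PID and fd number come from the output above, py-spy is not part of WebODM and would need to be installed into the venv, and attaching may require root):

ls -l /proc/29159/fd/38            # which file is open on fd 38?
cat /proc/29159/fdinfo/38          # current offset (pos:) and open flags for that fd
/webodm/python3-venv/bin/pip install py-spy
/webodm/python3-venv/bin/py-spy dump --pid 29159   # Python-level stack of the stuck celery worker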