This still appears to be a problem, and there are no GCPs in use on the dataset I have. The platform was last updated in January, and since then there have been two or three instances of hanging at "compressing all.zip"; re-running does not solve the problem. I updated to the most recent git version yesterday, but the problem persists even after completely removing the tasks and re-uploading all photos. My rough rule of thumb for these nodes is 1 hour of processing per 100 photos at the High Resolution setting, so I would have expected this job to finish in around 6 hours; we are at almost 24 hours. I also noticed that the timer in the UI keeps resetting to the value it showed when the zip step started.
console: https://data.nwk1.com/s/FwgYso4Bz56oYE4
It looks like the process gets stuck, not killed:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29159 odm 20 0 7909552 4.249g 32236 R 101.0 5.9 259:51.87 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
3829 odm 20 0 8101964 4.432g 32012 R 100.7 6.2 569:13.96 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
2294 odm 20 0 1304448 125220 44292 S 26.5 0.2 32:12.59 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19103 odm 20 0 1359668 117352 20028 S 10.3 0.2 0:00.61 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19242 odm 20 0 1343688 101064 19900 S 3.3 0.1 0:00.10 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19240 odm 20 0 1343684 101028 19900 S 2.3 0.1 0:00.07 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
19243 odm 20 0 1346220 103364 19900 S 1.3 0.1 0:00.04 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker worker --autoscale 8,2 --max-tasks-per-child 1000 --loglevel=warn
1522 redis 20 0 130244 91564 2680 S 1.0 0.1 4:23.05 /usr/bin/redis-server 127.0.0.1:6379
4344 odm 20 0 3013076 217868 33676 S 1.0 0.3 1:18.04 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/gunicorn webodm.wsgi --bind unix:/webodm/gunicorn.sock --workers 24 --threads 24 --timeout 600000 --max-re+
13344 odm 20 0 3012820 226352 17376 S 1.0 0.3 0:39.20 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/gunicorn webodm.wsgi --bind unix:/webodm/gunicorn.sock --workers 24 --threads 24 --timeout 600000 --max-re+
8 root 20 0 0 0 0 S 0.7 0.0 2:50.42 [rcu_sched]
2299 odm 20 0 1734600 139760 43996 S 0.7 0.2 2:40.03 /webodm/python3-venv/bin/python3 /webodm/python3-venv/bin/celery -A worker beat --pidfile=
Yesterday I updated everything to the latest versions: WebODM, NodeODM and ClusterODM. The jobs are distributed via ClusterODM, though there's no split-merging going on. The process seems to hang on the WebODM node. All nodes, including the WebODM node, have in excess of 200 GB of RAM, and there's no indication that the process is OOM-killed or similar.
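For what it's worth, these are the kinds of generic Linux checks I'd use to rule out an OOM kill and to confirm the worker is spinning on CPU rather than waiting on I/O (a rough sketch, not WebODM-specific; PID 29159 is the stuck worker from the top output below):

dmesg -T | grep -i -E "out of memory|oom-killer|killed process"   # any OOM-killer activity in the kernel log?
free -h                                                           # memory headroom on the node
top -H -p 29159                                                   # per-thread CPU usage of the stuck worker
grep -E "^(State|VmRSS)" /proc/29159/status                       # R (running) vs D (blocked on I/O), plus resident memory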
Strace of PID 29159 (it seems to be stuck in a loop):
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, "3\377YRG\377e`S\377<:,\377DB5\377XTF\377TPA\377>7,\377\35\30"..., 4194304) = 4194304
lseek(38, 1518338048, SEEK_SET) = 1518338048
lseek(38, 0, SEEK_SET) = 0
read(38, "II*\0\224\257\0270\21\0\0\1\3\0\1\0\0\0\333|\0\0\1\1\3\0\1\0\0\0\25s"..., 4194304) = 4194304
lseek(38, 4194304, SEEK_SET) = 4194304
lseek(38, 1514143744, SEEK_SET) = 1514143744
read(38, ^Cstrace: Process 29159 detached
<detached ...>
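For reference on reading the trace: "II*\0" at the start of the file is the little-endian TIFF magic, and the lseek offsets (1514143744 to 1518338048 is exactly 4194304 bytes) suggest fd 38 is a large TIFF of at least roughly 1.5 GB being read from the start in 4 MB chunks over and over. A generic way to confirm which file that is and where the Python code is looping (sketch only; the PID and fd number come from the output above, py-spy is not part of WebODM and would need to be installed into the venv, and attaching may require root):

ls -l /proc/29159/fd/38            # which file is open on fd 38?
cat /proc/29159/fdinfo/38          # current offset (pos:) and open flags for that fd
/webodm/python3-venv/bin/pip install py-spy
/webodm/python3-venv/bin/py-spy dump --pid 29159   # Python-level stack of the stuck celery worker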