Another task failure when almost complete

Sick of the continued failures of my larger dataset, I ran a smaller dataset (1607 M2P images) from May this year, which completed successfully at the time but failed this time. One difference is that I used ORB this time and SIFT previously, and looking at the GSD, I probably resized by 0.5X last time versus full size this time.

Details -
Options: auto-boundary: true, dem-resolution: 2.0, dsm: true, dtm: true, feature-quality: ultra, feature-type: orb, gps-accuracy: 5, matcher-distance: 65, matcher-neighbors: 20, mesh-size: 300000, min-num-features: 15000, orthophoto-resolution: 2.0, pc-quality: high, pc-rectify: true, use-3dmesh: true

Processing failed (4294967295)

End of the console log -

2021-12-11 13:57:09,763 DEBUG: Undistorting image DJI_0637.JPG
[INFO] running E:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\opensfm\bin\opensfm export_visualsfm --points "E:\WebODM\resources\app\apps\NodeODM\data\5710844b-b48f-4151-b79b-b5ca6ba75744\opensfm"
[INFO] Finished opensfm stage
[INFO] Running openmvs stage
[INFO] running E:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\opensfm\bin\opensfm export_openmvs "E:\WebODM\resources\app\apps\NodeODM\data\5710844b-b48f-4151-b79b-b5ca6ba75744\opensfm"
[INFO] Running dense reconstruction. This might take a while.
[INFO] Estimating depthmaps
[INFO] running E:\WebODM\resources\app\apps\ODM\SuperBuild\install\bin\OpenMVS\DensifyPointCloud "E:\WebODM\resources\app\apps\NodeODM\data\5710844b-b48f-4151-b79b-b5ca6ba75744\opensfm\undistorted\openmvs\scene.mvs" --resolution-level 2 --min-resolution 1368 --max-resolution 5472 --max-threads 16 --number-views-fuse 2 -w "E:\WebODM\resources\app\apps\NodeODM\data\5710844b-b48f-4151-b79b-b5ca6ba75744\opensfm\undistorted\openmvs\depthmaps" -v 0 --geometric-iters 0
===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 3221226505
Traceback (most recent call last):
  File "E:\WebODM\resources\app\apps\ODM\stages\odm_app.py", line 94, in execute
    self.first_stage.run()
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  [Previous line repeated 1 more time]
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 327, in run
    self.process(self.args, outputs)
  File "E:\WebODM\resources\app\apps\ODM\stages\openmvs.py", line 85, in process
    system.run('%s "%s" %s' % (context.omvs_densify_path,
  File "E:\WebODM\resources\app\apps\ODM\opendm\system.py", line 106, in run
    raise SubprocessException("Child returned {}".format(retcode), retcode)
opendm.system.SubprocessException: Child returned 3221226505

===== Done, human-readable information to follow… =====

[ERROR] The program exited with a strange error code. Please report it at https://community.opendronemap.org


There was no mass production of 1GB files with these images; the \16384 folder only has 18.5MB in it, rather than the hundreds of GB from the previous job with 6X the number of files.

1 Like

You’re using v1.9.11 Build 47, right?

Does it run with the same settings you used in May?

1 Like

Yes, from another clean install, as I deleted WebODM again and re-installed after the last failure, which left masses of those 1GB files.

Speaking of which, what kind of files are they that you can delete 300GB of them and not free up any disk space!?

I’ll try that after the current attempt.

What failed yesterday -

[INFO] ['use_exif_size: no', 'flann_algorithm: KDTREE', 'feature_process_size: 5472', 'feature_min_frames: 15000', 'processes: 16', 'matching_gps_neighbors: 20', 'matching_gps_distance: 65', 'optimize_camera_parameters: yes', 'undistorted_image_format: tif', 'bundle_outlier_filtering_type: AUTO', 'sift_peak_threshold: 0.066', 'align_orientation_prior: vertical', 'triangulation_type: ROBUST', 'retriangulation_ratio: 2', 'matcher_type: FLANN', 'feature_type: ORB', 'use_altitude_tag: yes', 'align_method: auto', 'local_bundle_radius: 0']

What worked in May -

[INFO] ['use_exif_size: no', 'flann_algorithm: KDTREE', 'feature_process_size: 1024', 'feature_min_frames: 8000', 'processes: 16', 'matching_gps_neighbors: 8', 'matching_gps_distance: 0', 'optimize_camera_parameters: yes', 'undistorted_image_format: tif', 'bundle_outlier_filtering_type: AUTO', 'align_orientation_prior: vertical', 'triangulation_type: ROBUST', 'retriangulation_ratio: 2', 'feature_type: SIFT', 'use_altitude_tag: yes', 'align_method: auto', 'local_bundle_radius: 0']
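
For what it's worth, here's a quick throwaway Python sketch (nothing to do with ODM itself, and I've trimmed the lists for brevity) to diff the two dumps and print only the settings that changed:

# Throwaway sketch: diff two OpenSfM config dumps copied from the ODM console log.
# The lists below are trimmed; paste the full dumps in for a complete comparison.
failed = ['feature_process_size: 5472', 'feature_min_frames: 15000',
          'matching_gps_neighbors: 20', 'matching_gps_distance: 65',
          'matcher_type: FLANN', 'feature_type: ORB']
worked = ['feature_process_size: 1024', 'feature_min_frames: 8000',
          'matching_gps_neighbors: 8', 'matching_gps_distance: 0',
          'feature_type: SIFT']

def to_dict(items):
    # each entry looks like "key: value"
    return dict(line.split(': ', 1) for line in items)

f, w = to_dict(failed), to_dict(worked)
for key in sorted(set(f) | set(w)):
    if f.get(key) != w.get(key):
        print(f"{key}: failed run = {f.get(key)!r}, May run = {w.get(key)!r}")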

1 Like

Docker is a fickle beast, doubly so on Windows. I have seen this behaviour in a limited way on Linux, where syncing of Docker-mounted drives doesn't propagate appropriately. My experience is that this isn't common on Linux, but it is observable and usually resolves relatively quickly.

I can't speak to Windows. But, as much as OpenDroneMap works on Windows due to popular demand, you are in the wilderness a bit running such a large dataset through on Windows. You're at a pro-level dataset size-wise, so you should probably be using something closer to a native tool. Docker on Linux is pretty close, and demonstrably close enough in my testing (though more storage never hurts either).

2 Likes

It appears that this set of 1607 files has failed again. The counter is still counting, now up to 69h19m, but there have been no console updates for ~4 hours, which is a bit long for this stage -

2021-12-14 10:31:23,246 INFO: Adding DJI_0292.JPG to the reconstruction
2021-12-14 10:31:24,798 INFO: Shots and/or GCPs are well-conditioned. Using naive 3D-3D alignment.

I tried to cancel, but it hung, so I closed the program, then re-opened it to find -

65:31:27 (about when the console stopped updating)
Processing node went offline. This could be due to insufficient memory or a network error.

but there is no Dumping Info error message.

Looks like a peak in memory usage, then it shut down; red is CPU time.

It looks like it ran out of RAM, but the pagefile had a 22GB starting size, with many tens of GB more still available up to the maximum size.

Settings for this attempt -

Options: auto-boundary: true, dem-resolution: 2.0, dsm: true, dtm: true, feature-quality: ultra, gps-accuracy: 5, matcher-distance: 65, matcher-neighbors: 20, mesh-size: 300000, min-num-features: 11000, orthophoto-resolution: 2.0, pc-quality: high, pc-rectify: true, use-3dmesh: true

Now I’ll try again with the original settings from May, and see if that works.

Stephen, I’ll upload this batch of 1607 files if you, or anyone else, would like to have a go at it, but it will take a while over my fairly slow internet connection.

2 Likes

Happy to take a look.

2 Likes

378 down, another 11 hours to go! Fast internet isn’t a thing you get in most rural areas here.

1 Like

SIFT and 0.5X resize worked fine in 6h 36m, with max memory use of 40%. I'll put the report in with the images.

1 Like

Given that Microsoft noted that any delay in pagefile expansion can cause a program to crash when its allocation exceeds the current RAM + pagefile, I'm wondering if pushing the starting size significantly higher (set it to the max you have now, so it is just max/max and static) might not be more reliable?

1 Like

I guess you need to be watching Task Manager at the right moment to catch that.

I wonder how much time "any delay" equates to? I assume an SSD is fast enough, but what about an idle HDD with a delay while the disk spins up?
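
Rather than staring at Task Manager, something like this could log it for me and I could read back the last rows after a crash. This is a rough sketch, assuming the third-party psutil package (pip install psutil; it is not part of WebODM), and the log path is just an example:

# Rough monitoring sketch: append RAM and pagefile usage to a CSV once a minute,
# so the last rows show roughly where things stood when the node went down.
# Assumes the third-party psutil package; on Windows, swap_memory() should
# approximately track the pagefile rather than a true swap partition.
import time
import psutil

LOG = r"E:\memory_log.csv"  # example path, put it wherever is convenient

with open(LOG, "a") as f:
    f.write("time,ram_percent,pagefile_percent\n")
    while True:
        ram = psutil.virtual_memory().percent
        page = psutil.swap_memory().percent
        f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')},{ram},{page}\n")
        f.flush()
        time.sleep(60)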

1 Like

1607 M2P images, plus the report from the successful task when resized by 0.5X.

https://drive.google.com/drive/folders/15fFQAyEDQNUr-iVx1PuFaZtBZQNsiaXG?usp=sharing

I’m currently attempting full size with a split-merge into 2 sections.

2 Likes

Downloading now…

2 Likes

They make no mention of specifics, and it's closed-source (not that I would be smart enough to understand something like this even if I had the source :rofl:), but they seem to imply broadly any measurable delay. I've certainly had issues with an expanding pagefile on my Crucial MX500 that were resolved by setting a static allocation size, so I imagine an HDD with spin-up would be similar.

Would NVMe fare the same? I have no idea; I don't have access to any hardware that new.

1 Like


05:18. No split.
Will post more later

2 Likes

Results @
https://drive.google.com/drive/folders/1RpX6BMSxeGOpD4gdVjjcFaWDCVk-ctqV?usp=sharing
You can bump up the quality quite a bit from this; I used default parameters plus the options found in the attached file.
Let me know if you need additional info.

2 Likes

I don't think it is a memory problem. I split-merged it into 2, but it failed near the end once again; committed bytes in use peaked at 60%.

37:26:27 Cannot process dataset

Options: auto-boundary: true, cameras: {"hasselblad l1d-20c 2736 1824 brown 0.7777 rgb":{"projection_type":"brown","width":2736,"height":1824,"focal_x":0.8072925429425077,"focal_y":0.8072925429425077,"c_x":0.0008283204679778942,"c_y":-0.011517113061019215,"k1":-0.006811989257581941,"k2":0.044230166719590125,"p1":-0.004769192505233559,"p2":-0.00009327185296352527,"k3":-0.049276427731337454}}, dem-resolution: 2.0, dsm: true, dtm: true, gps-accuracy: 5, matcher-neighbors: 16, mesh-size: 250000, min-num-features: 10000, optimize-disk-space: true, orthophoto-resolution: 2.0, pc-quality: high, pc-rectify: true, split: 820, use-3dmesh: true

I used cameras.json from the task of the same area that I completed before these, although that was at 0.5X resizing. Not sure if that was an issue with this failure.

End of the console log:

Input file size is 22631, 28607
0...10...20...30...40...50...60...70...80...90...100 - done.
[INFO] Starting smoothing...
[INFO] Smoothing iteration 1
[INFO] Completed smoothing to create E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.tif in 0:01:47.794152
[INFO] Completed dsm.tif in 0:26:57.182919
[INFO] Cropping E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.tif
[INFO] running gdalwarp -cutline "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_georeferencing\odm_georeferenced_model.bounds.gpkg" -crop_to_cutline -co TILED=YES -co COMPRESS=DEFLATE -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co BIGTIFF=IF_SAFER -co NUM_THREADS=16 "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.original.tif" "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.tif" --config GDAL_CACHEMAX 45.7%
Creating output file that is 22547P x 28515L.
Processing E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.original.tif [1/1] : 0
Using internal nodata values (e.g. -9999) for image E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.original.tif.
Copying nodata values from source E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.original.tif to destination E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.tif.
...10...20...30...40...50...60...70...80...90...100 - done.
[INFO] Cropping E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.tif
[INFO] running gdalwarp -cutline "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_georeferencing\odm_georeferenced_model.bounds.gpkg" -crop_to_cutline -co TILED=YES -co COMPRESS=DEFLATE -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 -co BIGTIFF=IF_SAFER -co NUM_THREADS=16 "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.original.tif" "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.tif" --config GDAL_CACHEMAX 45.65%
Creating output file that is 22547P x 28515L.
Processing E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.original.tif [1/1] : 0
Using internal nodata values (e.g. -9999) for image E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.original.tif.
Copying nodata values from source E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.original.tif to destination E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.tif.
...10...20...30...40...50...60...70...80...90...100 - done.
[INFO] Computing euclidean distance: E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.euclideand.tif
[INFO] running gdal_proximity.py "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.tif" "E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.euclideand.tif" -values -9999.0
===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 1
Traceback (most recent call last):
  File "E:\WebODM\resources\app\apps\ODM\stages\odm_app.py", line 94, in execute
    self.first_stage.run()
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  [Previous line repeated 6 more times]
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 327, in run
    self.process(self.args, outputs)
  File "E:\WebODM\resources\app\apps\ODM\stages\odm_dem.py", line 125, in process
    commands.compute_euclidean_map(unfilled_dem_path,
  File "E:\WebODM\resources\app\apps\ODM\opendm\dem\commands.py", line 293, in compute_euclidean_map
    run('gdal_proximity.py "%s" "%s" -values %s' % (geotiff_path, output_path, nodata))
  File "E:\WebODM\resources\app\apps\ODM\opendm\system.py", line 106, in run
    raise SubprocessException("Child returned {}".format(retcode), retcode)
opendm.system.SubprocessException: Child returned 1
===== Done, human-readable information to follow... =====
[ERROR] Uh oh! Processing stopped because of strange values in the reconstruction. This is often a sign that the input data has some issues or the software cannot deal with it. Have you followed best practices for data acquisition? See Flying Tips — OpenDroneMap 3.1.7 documentation
100 - done.
===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 1
Traceback (most recent call last):
  File "E:\WebODM\resources\app\apps\ODM\stages\odm_app.py", line 94, in execute
    self.first_stage.run()
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 346, in run
    self.next_stage.run(outputs)
  File "E:\WebODM\resources\app\apps\ODM\opendm\types.py", line 327, in run
    self.process(self.args, outputs)
  File "E:\WebODM\resources\app\apps\ODM\stages\splitmerge.py", line 164, in process
    system.run(" ".join(map(double_quote, map(str, argv))), env_vars=os.environ.copy())
  File "E:\WebODM\resources\app\apps\ODM\opendm\system.py", line 106, in run
    raise SubprocessException("Child returned {}".format(retcode), retcode)
opendm.system.SubprocessException: Child returned 1
===== Done, human-readable information to follow... =====
[ERROR] Uh oh! Processing stopped because of strange values in the reconstruction. This is often a sign that the input data has some issues or the software cannot deal with it. Have you followed best practices for data acquisition? See Flying Tips — OpenDroneMap 3.1.7 documentation
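
If I want to see what GDAL is actually choking on (ODM only reports "Child returned 1"), I could presumably re-run just the proximity step by hand on the existing dsm.unfilled.tif. A rough sketch - my assumption is that gdal_proximity.py is a thin wrapper around gdal.ComputeProximity, and the output path here is just a throwaway example:

# Sketch: reproduce the failing euclidean-distance step directly with the GDAL Python
# bindings so the real error surfaces instead of a bare "Child returned 1".
from osgeo import gdal

gdal.UseExceptions()  # raise Python exceptions instead of silent failure codes

src_path = r"E:\WebODM\resources\app\apps\NodeODM\data\e30841f7-d4f5-4562-a976-6114f73495ea\submodels\submodel_0000\odm_dem\dsm.unfilled.tif"
dst_path = r"E:\proximity_test.tif"  # throwaway output, example path only

src = gdal.Open(src_path)
band = src.GetRasterBand(1)

drv = gdal.GetDriverByName("GTiff")
dst = drv.Create(dst_path, src.RasterXSize, src.RasterYSize, 1, gdal.GDT_Float32)
dst.SetGeoTransform(src.GetGeoTransform())
dst.SetProjection(src.GetProjection())

# Same -values -9999.0 that the ODM command passes to gdal_proximity.py
gdal.ComputeProximity(band, dst.GetRasterBand(1), ["VALUES=-9999.0"])
dst.FlushCache()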

1 Like

Are you still using the dynamic pagefile at this stage? I wonder if bumping that up higher for starting size might not help…

1 Like

Partly - as can be seen on the screen grab, I have 32GB initial: C:\ has 16/16, and E:\ (the WebODM SSD) has 16/32, so I'm starting with 96GB RAM + 32GB virtual = 128GB. But is that really likely to be the problem, given the PerfMon plot of actual usage never went anywhere near maximum? It was only 800 images per split.

And now I have increased virtual memory on E:\ to 64/64, so I'll be starting with 176GB. If it doesn't work with that, I think I'll give up and take up stamp collecting instead! :wink:

1 Like

Another fun implementation detail with multiple pagefiles: the first one to respond is the one that gets used. So having multiple pagefiles means you might not get deterministic usage of one or the other.

Also, don't forget that the allocation size is not necessarily something you'll see: if there isn't enough room between RAM + pagefile for the requested allocation, the program will simply crash. You don't need to have all of your current RAM + pagefile consumed for this to be an issue; the total RAM + pagefile just needs to be smaller than what would have been allocated.

For testing, setting things up so you have a single pagefile with a large starting size might be best. For instance, mine was set to start at 262144MB, with the other pagefiles disabled. This seemed stable.
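
If you want to see the actual numbers Windows uses for this, here's a small sketch reading the commit limit and the remaining headroom via the Win32 GlobalMemoryStatusEx call with ctypes; the interpretation in the comments is my reading of that API, nothing WebODM-specific:

# Sketch: query the Windows commit limit (RAM + pagefile) and the headroom still
# available to commit. A single allocation larger than the remaining headroom is
# exactly the case that kills a process even though RAM itself never looked full.
import ctypes

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),  # commit limit: RAM + pagefile(s)
        ("ullAvailPageFile", ctypes.c_ulonglong),  # commit limit minus current commit charge
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

stat = MEMORYSTATUSEX()
stat.dwLength = ctypes.sizeof(stat)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))

gib = 1024 ** 3
print(f"Commit limit:      {stat.ullTotalPageFile / gib:6.1f} GiB")
print(f"Still committable: {stat.ullAvailPageFile / gib:6.1f} GiB")
print(f"Physical RAM load: {stat.dwMemoryLoad}%")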

2 Likes