Unexpected error: "Aborted (core dumped)" with large dataset

Hi everyone!

I’m running a process with 7,535 images taken at 120 m altitude with a Mavic 2 Pro (~3 cm pixel size), on an Amazon machine with 64 vCPUs (3.1 GHz), 498 GB of RAM, another 500 GB of SSD swap (just in case…), and 1.5 TB of SSD storage available. I’m running Ubuntu Server 18.04, using the following command:

sudo docker run -it --rm \
  -v $(pwd)/images:/code/images \
  -v $(pwd)/opensfm:/code/opensfm \
  -v $(pwd)/odm_meshing:/code/odm_meshing \
  -v $(pwd)/odm_texturing:/code/odm_texturing \
  -v $(pwd)/odm_georeferencing:/code/odm_georeferencing \
  -v $(pwd)/odm_orthophoto:/code/odm_orthophoto \
  opendronemap/odm:0.9.1 \
  --min-num-features 18000 --matcher-neighbors 24 --camera-lens brown \
  --use-opensfm-dense --ignore-gsd --texturing-tone-mapping gamma \
  --orthophoto-resolution 5 --verbose

I use ODM v0.9.1 because it seems to work faster than newer versions, with the same quality (for our data, at least). After ~3.5 days, at the reconstruction step (or immediately after it: python /code/SuperBuild/src/opensfm/bin/opensfm reconstruct), I got this error:

[screenshot: "Aborted (core dumped)" error output]

It seems that I never ran out of memory or disk space (and never used more than 12 MB of swap). I have run several smaller processes, each with a subset of ~2,000 images, on smaller machines without trouble (and in some cases using a lot of swap).
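In case it helps with the diagnosis: as far as I understand, "Aborted (core dumped)" means the process received SIGABRT (an internal abort or failed assertion), while a kernel OOM kill normally prints just "Killed". A minimal sketch of how one could double-check from the host (the --ulimit value is only an example):

# Look for OOM-killer activity around the time of the crash
dmesg -T | grep -iE 'killed process|out of memory'

# Optionally allow core dumps inside the container, so the abort leaves
# a core file that can be inspected with gdb
sudo docker run --ulimit core=-1 … opendronemap/odm:0.9.1 …
# (… stands for the same volumes and options as in the original command)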

What could the reason be? I re-launched the process, and ODM resumed at the reconstruction step; we’ll see what happens.

Thanks in advance!
Álvaro


Mm, I wonder if this works if you pass --use-hybrid-bundle-adjustment?
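For reference, that would just be the same docker run invocation with the flag appended to the ODM options, e.g.:

sudo docker run -it --rm … opendronemap/odm:0.9.1 … --use-hybrid-bundle-adjustment
# (… = same volumes and options as the original command)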

Thanks for your suggestion; I didn’t answer before because I was collecting more data.

Unfortunately, even with your suggestion the process didn’t finish. In total, I ran it 3 times, the last one using your suggestion, and in each of the three trials it crashed at a different image. We took a look at those 3 images and they didn’t show anything strange (nothing we could notice, anyway), so I’m still wondering what the cause could be.

The good news is that some log information was collected for this process (and for 2 others). Here are some plots of this process and those two (both of which finished successfully), to see if they give any hints about what’s going on.

The first two plots show processes that finished without problems. The last one shows the process with the problem and the 3 tries (the arrows show when the process stopped and was re-launched). The red line shows the number of CPUs used (left axis), and the blue lines show memory (RAM) usage (right axis): the dotted light blue line shows VSIZE memory and the solid blue line shows RSS memory.

[plot "test3": CPU count and RAM (VSIZE/RSS) usage for the three runs]

And here’s another plot showing (more or less) which processing stage was running at each moment; I’m still trying to improve it.

[plot "actividad" (activity): processing stage over time]
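In case anyone wants to collect similar data, here’s a minimal sketch of the kind of sampling loop that can produce these numbers (not our exact script; the process pattern and interval are just examples):

# Every 60 s, log the summed RSS/VSZ (in KiB) of all python processes
# (the ODM/OpenSfM workers in the container are visible from the host),
# plus the 1-minute load average as a rough proxy for CPUs in use
while true; do
  ts=$(date +%s)
  ps -C python -o rss=,vsz= | awk -v t="$ts" '{r+=$1; v+=$2} END {print t, r, v}' >> mem.log
  awk -v t="$ts" '{print t, $1}' /proc/loadavg >> cpu.log
  sleep 60
done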

We’re currently trying the same dataset (dataset1) on the same machine (AWS EC2 r5dn.16xlarge), but with ODM 0.9.8. We’re also trying another dataset (dataset2, same place but a different sector), also with 0.9.8, on a different kind of machine (less RAM and fewer CPUs, but faster; AWS EC2 z1d.12xlarge), to see what happens.

If anyone has any insight, it would be very much appreciated.

Thanks!


Maybe try split/merge?
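For reference, that would mean adding the split options to the same command, something like (the values are just illustrative):

sudo docker run -it --rm … opendronemap/odm:0.9.1 … --split 800 --split-overlap 150
# (… = same volumes and options as before; --split is the average number of
# images per submodel, --split-overlap the overlap radius between submodels in meters)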


Thanks for the suggestion. We have tried split/merge (and we’re still testing it on our machines), but sometimes we had alignment problems in the resulting point cloud, which is very important for us. That’s something we want to avoid, and it’s why we’re not using it so far.


Understood. I’m still working with smallish datasets, and haven’t experimented with it myself yet.