Processing dataset crashes at Aligning submodels stage

Hello ODM community

I have a small ODM cluster of 2 machines, with 8GB and 4GB of RAM respectively, that I am trying to use to stitch a relatively small dataset of 1300 images… The cluster does well, distributing jobs to both machines quite efficiently. Everything goes alright and then stops at the Aligning submodels… stage.
Last few lines from the console log where it stops…
[INFO] LRE: Downloaded and extracted assets for submodel_0011
[INFO] LRE: submodel_0011 finished successfully
[INFO] LRE: Cleaning up remote task (a76018fa-2378-4cb4-ac01-b6a1494366b6)… OK
[INFO] LRE: Downloading assets for submodel_0009
[INFO] LRE: Download of submodel_0009 at [0%]
[INFO] LRE: Download of submodel_0009 at [24%]
[INFO] LRE: Download of submodel_0009 at [56%]
[INFO] LRE: Download of submodel_0009 at [85%]
[INFO] LRE: Download of submodel_0009 at [100%]
[INFO] LRE: Downloaded and extracted assets for submodel_0009
[INFO] LRE: submodel_0009 finished successfully
[INFO] LRE: Cleaning up remote task (881d780c-b9eb-4a43-8018-5946b173aef9)… OK
[INFO] LRE: No remote tasks left to cleanup
[INFO] Aligning submodels…
/code/run.sh: line 5: 2841 Segmentation fault (core dumped) python3 $RUNPATH/run.py "$@"

My question is: does this happen due to a hardware problem, low memory, or incorrect configuration of the cluster and the task? Available RAM seems plentiful at the moment it crashes.
I have a split of 100 images with 75 overlap… not sure if any other options would be relevant?

Welcome!

I’m not sure I’d consider 1300 images small, and certainly not on machines with 8GB/4GB RAM.

What processing parameters are you using?

What are these platforms (CPU, OS, installation method, etc)?

Thanks for your reply.
From those 1300 images I was able to process an orthophoto on just the 8GB of a spare laptop running a bare Debian server and WebODM, but that's not enough for a point cloud and 3D model of more or less decent quality, that's for sure…
Just a brief backstory of why I am using 8GB and 4GB machines for such a job.
I ended up in an interesting, challenging situation on a remote island right on the edge of the world map, where I need to map around 250 hectares of mountain jungle and produce orthophotos and DTMs from it…
1300 images is about half of what is expected in total. I came here for a different job and wasn't prepared for work of this scale, but COVID hit the country I am in a few months ago (it had been COVID-free all this time) and I have been stuck here for quite some time, as all passenger connections with the mainland ceased.
So I ended up with pretty much everything I need for the job except sufficient computing power. I will be able to ship proper hardware for stitching maps, but for now I am left with 2 machines with a Core i5 and 8GB plus one laptop with an i3 and 4GB… Those 3 machines I managed to set up as an ODM cluster, but a memory upgrade is not possible no matter how much I want it. There is just nowhere to buy it here: no air freight, and sea freight isn't frequent or reliable. There is no such thing here as the online shopping we all got used to… Cloud processing is also not an option, as with the internet connection I have here I cannot upload even 1GB in a day… So I am here with what I've got. If I still keep failing to get something useful with the computers I have here, I will wait for the memory and machine upgrades, but meanwhile I keep trying…
So…
All machines I have run bare Debian 10, with SSD disks.
I use Docker to start NodeODM on the nodes, and ClusterODM and WebODM on the primary host…
ClusterODM and WebODM share the same machine, but now I am thinking of separating them, and maybe trying to run ClusterODM as a native Node.js installation on a different machine to see if it makes a difference.
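For reference, a rough sketch of how this can be started with Docker. These are not my exact commands; it assumes the stock opendronemap/nodeodm and opendronemap/clusterodm images with their default ports (in my setup the ClusterODM endpoint is actually reached on port 4000, per the sm-cluster URL below):

# on each processing node: NodeODM on its default port 3000
docker run -d -p 3000:3000 opendronemap/nodeodm

# on the primary host: ClusterODM (3000 = proxy, 8080 = admin CLI via telnet, 10000 = admin web)
docker run -d -p 3000:3000 -p 8080:8080 -p 10000:10000 opendronemap/clusterodm

The nodes then get registered through the ClusterODM admin interface (NODE ADD <node-ip> 3000 over telnet, if I remember the defaults right).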

The processing parameters are the following:

build-overviews: true,
dem-euclidean-map: true,
dsm: true,
dtm: true,
min-num-features: 12000,
pc-classify: true,
pc-las: true,
pc-rectify: true,
rerun-from: dataset,
sm-cluster: https://primaryhostip:4000,
smrf-scalar: 2.3,
smrf-slope: 0.05,
smrf-threshold: 0.15,
smrf-window: 10,
split: 300,
split-overlap: 100,
verbose: true

I tried different split and split-overlap values, both smaller and larger… it always ends up with the Aligning submodels error… but again, the primary node has plenty of free memory at the time the processing crashes… which makes me think insufficient memory probably isn't the cause…

UPDATE: Just tried ClusterODM on a different machine with similar specs (Core i5, 8GB), running it natively without Docker…
Bare Debian 10, no GUI, nothing else… just a basic bare command line system with whatever is needed to run ClusterODM and NodeODM…
Same story… it crashes at Aligning submodels:
[INFO] Aligning submodels…
/code/run.sh: line 5: 223 Segmentation fault (core dumped) python3 $RUNPATH/run.py "$@"
Any ideas?..

Do you need the extra tiepoints from --min-num-features? That can be quite resource intensive…

It may look like it has plenty of memory, but it won't actually be able to allocate more than it has, so you won't necessarily see a spike before it OOMs and quits. I've seen the same behavior here numerous times with 32GB RAM, especially in the point cloud and model parts of the pipeline.

Have you tested an area of your data independently without split/merge? Say the first 300 images in an area?

Yes, I think I need those extra min-num-features… I started with the default number, but it didn't go through, saying it couldn't find enough features to proceed further… some parts of the area are just thick, homogeneous tree canopy. I read somewhere in the docs that increasing that number might help. I increased it to 10000 and it didn't help, then to 12000 and it did… the next stop I hit was Aligning submodels…

That actually makes sense if it tries to allocate enough memory first and then quits when it sees there isn't actually enough. I suspected that it crashes before I see the spike; there is probably not going to be a spike at all, since it already knows it can't process further before even starting.

300 images without split-merge went through fine. I actually managed to process 800-something with similar settings on a single 8GB machine, but that was only one machine, with 92% of its RAM available before I started processing.

I guess a memory upgrade of the primary node, where ClusterODM executes the alignment, should sort it? What amount do you think I might need to process, let's say, 3000 images and get decent quality orthophotos and models?

Thanks for the details! Helps a lot.

Yeah, it sounds like you do need the extra features, so you need to keep that. That will mean we need to scale down other things to keep your memory usage a bit more flat.

Okay, so the data can process fine, so that’s that concern taken care of!

Can you do smaller split/merge batches? Maybe 150/50 or so?
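Something like this in the task options, keeping everything else the same (150/50 is just a starting point to experiment with):

split: 150,
split-overlap: 50,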

I’m not sure on memory usage… It’s highly variable and depends strongly upon processing parameters.

I can make 32GB OOM with about 170 images, so :rofl:

My question is always:
What is the max that platform can accommodate, and is it affordable for you?

Thanks for trying to help, that also helps a lot, so I am not wondering about this alone here :slight_smile:
While I have time waiting for a mightier machine to arrive, I will keep playing with the processing parameters. I've tried a 100/75 split/merge; it didn't go through either. How far down can I go with it?
I have an IBM System x3300 M4 server (2 x Xeon CPU) + 32GB RAM that I am shipping here, and it can potentially be upgraded to 384GB… I am not sure I can afford 384GB of RAM straight away… but I can probably get enough for this particular project if 64GB will do…
I don't need highly detailed models from these images at this point. Just decent quality orthophotos, a DEM and a DTM from which I can produce contours.

A couple more questions.

  1. Can increasing the swap file on the machine that aligns the submodels help, or does it just need pure physical RAM for it?
  2. Am I getting it right that the nodes can be of any specs sufficient to process the given split/merge parameters, but the machine that stitches the submodels still needs to be something mighty with as much RAM as possible?
  3. Graphics card with a GPU: is it much help with rendering high quality 3D models, in terms of processing time and utilising its own onboard RAM for that? Anything from the GeForce series, for example.

Sounds like your new server should be a monster!

  1. It should help (see the sketch after this list for adding swap on Debian). Linux + swap seems to handle things better than Windows + pagefile (for WSL2/Docker; not sure about Windows native yet)
  2. Yeah, pretty much. The compositing of the pointclouds and imagery is going to be quite hefty, though you can use the --pc-tile option to enable tiling of the pointcloud (with some potential issues on the overlap areas)
  3. No, not yet. There is a GPU-Enabled Docker build which will lean on OpenCL for the feature matching/extraction, but I’m not sure it will make a huge difference at this point.
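For point 1, a minimal sketch of adding swap on Debian (assuming a 16GB swapfile at /swapfile; size it to your free SSD space):

sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

Keep in mind swap on SSD is far slower than RAM, so the alignment step will slow down considerably, but it may get you past the OOM crash.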