Processing stalled during MVS stage

Hi guys, so it’s happened again - processing has stalled during the MVS stage. It’s been stuck on “Fused depth-maps 2744” for almost a day:
Fused depth-maps 2742 (84.16%, 11h5m10s, ETA 2h5m)…
Fused depth-maps 2743 (84.19%, 11h5m23s, ETA 2h4m)…
Fused depth-maps 2744 (84.22%, 11h5m26s, ETA 2h4m)…

I can see the "DensifyPointCloud" process is currently consuming 453GB of RAM; it's doing disk reads but shows N/A for disk writes, and its priority level is "Very Low". 508GB of swap space is in use.

The job settings are:
auto-boundary: true, dsm: true, end-with: openmvs, feature-quality: ultra, matcher-neighbors: 40, mesh-octree-depth: 12, mesh-size: 300000, min-num-features: 64000, orthophoto-resolution: 1, pc-geometric: true, pc-quality: ultra, rerun-from: openmvs, resize-to: -1, verbose: true
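
For reference, the same options submitted through PyODM would look roughly like this (a sketch only: the localhost:3000 node and the image paths are placeholders, and rerun-from is omitted since it only applies when re-running an existing task):

```python
# Rough sketch: the same job options as a PyODM task submission.
# Assumes a NodeODM instance at localhost:3000; image paths are placeholders.
from glob import glob
from pyodm import Node

node = Node("localhost", 3000)

options = {
    "auto-boundary": True,
    "dsm": True,
    "end-with": "openmvs",
    "feature-quality": "ultra",
    "matcher-neighbors": 40,
    "mesh-octree-depth": 12,
    "mesh-size": 300000,
    "min-num-features": 64000,
    "orthophoto-resolution": 1,
    "pc-geometric": True,
    "pc-quality": "ultra",
    "resize-to": -1,
    "verbose": True,
}

# Placeholder glob; point it at the dataset's image folder.
task = node.create_task(glob("images/*.JPG"), options)
print(task.info().status)
```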

The server specs:
Dual Xeon 20-core / 40 thread CPUs
512GB RAM
1.5TB swap
1.8TB hard drive space dedicated for Docker & WebODM
12GB Nvidia Tesla K80 GPU
Ubuntu 20.04

The log file doesn’t indicate anything has gone wrong.
Any suggestions on how to look under the bonnet to see what’s going on?
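
One rough way to get a snapshot of what the process is actually doing would be something like the sketch below (assumes psutil is installed; it may need to run as root to read another user's process):

```python
# Rough sketch: per-process RAM, swap, disk I/O and CPU for DensifyPointCloud,
# plus overall system RAM/swap usage. May require root for memory_full_info().
import psutil

def inspect(name="DensifyPointCloud"):
    for proc in psutil.process_iter(["pid", "name"]):
        if proc.info["name"] == name:
            mem = proc.memory_full_info()        # includes 'swap' on Linux
            io = proc.io_counters()              # cumulative read/write bytes
            cpu = proc.cpu_percent(interval=1)   # % of one core over 1 second
            print(f"pid={proc.pid} rss={mem.rss / 1e9:.1f}GB "
                  f"swap={mem.swap / 1e9:.1f}GB cpu={cpu:.1f}% "
                  f"read={io.read_bytes / 1e9:.1f}GB "
                  f"write={io.write_bytes / 1e9:.1f}GB")

    vm, sw = psutil.virtual_memory(), psutil.swap_memory()
    print(f"system: ram {vm.used / 1e9:.0f}/{vm.total / 1e9:.0f}GB, "
          f"swap {sw.used / 1e9:.0f}/{sw.total / 1e9:.0f}GB")

if __name__ == "__main__":
    inspect()
```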

1 Like

Well this is bizarre:
After the MVS stage was stuck at "Fused depth-maps 2744 (84.22%, 11h5m26s, ETA 2h4m)" for about 30 hours, using 0% CPU and 0% GPU, this evening it seems to have started processing again!?!
A few minutes ago the latest NodeODM log message was “Fused depth-maps 3037 (93.22%, 2d22m37s, ETA 3h31m)…” and just now it is “Fused depth-maps 3186 (97.79%, 2d50m6s, ETA 1h6m)…”

I notice some bizarre behaviour in the logs: the ETA was decreasing fairly steadily from depth map 2744 to 2868, where it reached 1h33m, but at depth map 2869 the ETA suddenly jumped to 6h27m and then resumed steadily decreasing with each subsequent depth map.

Less than 1% of the available CPU resources is being used and the GPU is at 0%, so it appears the NodeODM software is not doing anything the vast majority of the time.

Can anyone explain what’s happening here?

1 Like

Yes. I would say, based on the figures you posted (453GB of RAM plus 508GB of swap in use), that you managed to run out of all physical RAM and bled heavily into swap. If this happens at the texturing stage, the length of time things need to be in swap is short, so the performance impact is small. If it happens elsewhere in the toolchain, in this case in OpenMVS, it would not surprise me if this was really slow with no indication as to why.

For some processes, keeping data in physical RAM is critical. The OS usually makes good decisions about what needs fast reads/writes from RAM, and uses swap only when it makes sense or is necessary. However, if you run out of physical RAM and the OS's only alternatives are to crash or to read/write from swap, then a long-running process that moves a lot of data around (like the OpenMVS stage) can appear to come almost to a halt. It will eventually complete, but moving data across the bus to and from a hard disk (even an SSD) is orders of magnitude slower than RAM.
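
If you want to confirm that's what is happening, a rough Linux-only check (the PID below is a placeholder for the DensifyPointCloud process) is to watch how much of the process has been pushed out to swap and whether major page faults keep climbing while the CPU sits near zero:

```python
# Rough sketch: read VmSwap from /proc/<pid>/status and the majflt counter
# from /proc/<pid>/stat. Rising major faults with ~0% CPU suggests the process
# is waiting on pages to come back from swap rather than computing.
import time

def vmswap_kb(pid):
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmSwap:"):
                return int(line.split()[1])   # value reported in kB
    return 0

def major_faults(pid):
    # Field 12 (majflt); assumes the process name contains no spaces.
    with open(f"/proc/{pid}/stat") as f:
        return int(f.read().split()[11])

pid = 12345  # placeholder: the DensifyPointCloud PID
before = major_faults(pid)
time.sleep(10)
after = major_faults(pid)
print(f"VmSwap: {vmswap_kb(pid) / 1e6:.1f} GB, major faults in 10s: {after - before}")
```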

In short, for datasets of this size on your hardware, you either need more memory, less intense settings for pc-quality, or a whole lot of patience.

I'm glad it finished. We see this come up from time to time on the forum, and usually folks aren't patient enough to let it finish. Doing so lets us know that this is very likely an under-allocation of memory that swap saved from a crash, but couldn't service fast enough to be performant.

3 Likes

It’s still sitting on 97.79% this morning so it sounds like I’ll have to wait another day or two for it to finish the MVS stage.
(I’m running one stage at a time to measure how long each takes with this dataset and quality settings)

3 Likes

Am I right in thinking you're swapping out to HDD? If so, we can assume something like 80MB/s average read/write for a very good HDD, compared to bog-standard DDR4 coming in at something like 19200MB/s, so a whopping 0.41% of the speed of doing the work in RAM.

(this is napkin math, but yeah)

3 Likes

Yes swap is on magnetic HDD.

The settings I used came from another post here, where it was said they would produce an ortho and 3D model of equivalent quality to Pix4D.
The dataset I'm working with comprises 3265 photos at 8MP resolution. It took another guy 60 hours with Pix4D to process them and generate a 1cm GSD ortho and 3D model on a Windows PC with an i7-6700 CPU, 32GB RAM and a 2GB GTX 960 GPU.
I thought that with a server with dual Xeon CPUs, 512GB RAM and a 12GB Tesla K80 GPU I'd be able to blow him out of the water, but it's currently looking like my total processing time will be an order of magnitude longer than Pix4D's.

2 Likes

The products are different and just work differently, unfortunately. Also, there are non-trivial differences in the quality presets of Pix4D compared to OpenDroneMap, so getting directly comparable settings can be a challenge.

Your settings are equivalent to the Pix4DMapper 60hr job?

1 Like

Those are the settings Steve M advised were equivalent in another discussion thread I read here. I need to complete the processing with WebODM and those settings in order to be able to compare the results side-by-side.

2 Likes

The OpenMVS stage eventually completed; it took almost 92 hours.

2 Likes

Yes, lower settings or more RAM are your options for quicker processing.

3 Likes

Would putting the swap file on a SSD have any substantial impact?

2 Likes

Assuming a SATA3 / 6Gbps SSD (a really, really good one sustains about 540MB/s), you're looking at about 2.8% of the speed of doing it in RAM. So significantly faster than a really good rotational drive, but still very slow compared to working in RAM.
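
Making the napkin math explicit (same rough figures as above, not benchmarks):

```python
# Rough sequential-throughput ratios versus DDR4, using the figures quoted above.
ddr4 = 19200   # MB/s, bog-standard DDR4
hdd  = 80      # MB/s, very good rotational drive
ssd  = 540     # MB/s, very good SATA3 SSD

for name, bw in [("HDD", hdd), ("SATA SSD", ssd)]:
    print(f"{name}: {bw / ddr4:.2%} of RAM throughput")
# prints roughly 0.4% for the HDD and 2.8% for the SATA SSD
```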

2 Likes

I'm maxed out at 512GB RAM on this server, so what about this idea: would it be faster overall if I figured out what split quantity would keep memory use under 512GB?

2 Likes

Yes, that should help in almost every phase, or you could dial back some parameters as Stephen stated.
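
As a very rough way to pick a starting point for ODM's --split option (this assumes memory use scales roughly with image count at fixed settings, which is only approximately true):

```python
# Rough heuristic, not an ODM rule: estimate images per submodel so that peak
# memory stays under physical RAM, based on the usage observed in this thread.
total_images  = 3265
peak_usage_gb = 453 + 508      # observed RAM + swap at these settings
ram_budget_gb = 512 * 0.85     # leave headroom for the OS and Docker

gb_per_image   = peak_usage_gb / total_images
images_per_run = int(ram_budget_gb / gb_per_image)
submodels      = -(-total_images // images_per_run)   # ceiling division

print(f"~{gb_per_image:.2f} GB/image -> try --split {images_per_run} "
      f"({submodels} submodels)")
# --split-overlap (in meters) would also need tuning to the site.
```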

2 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.