Testing distributed cluster processing on Intel NUCs

Testing cluster processing of a 3,927 image dataset on different numbers of Intel NUCs (installed via POSM). Parameters adjusted: min-num-features: 9000, split: 500, split-overlap: 120

The log output has a timestamp when it starts extracting EXIF (after uploading and resizing) and another after the merge stage (before the postprocessing that generates the base and overview tiles). This seems to explain why the WebODM dashboard lists a longer duration than the one calculated from the timestamps. However, for the 4x POSM dataset the difference is over 10 hours. Additionally, the WebODM dashboard duration for the 4x run is longer than for the 2x run.
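For anyone wanting to reproduce the log-based duration, here is a minimal sketch of the calculation. It assumes both timestamps use the same layout as the "Initializing OpenDroneMap app" log line (for example `Mon Sep 23 01:30:33  2019`) and an English locale; the function names and the end timestamp in the example are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed layout, matching log lines such as "Mon Sep 23 01:30:33  2019".
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_log_time(text: str) -> datetime:
    """Parse one log timestamp, tolerating the doubled space before the year."""
    normalized = " ".join(text.split())
    return datetime.strptime(normalized, LOG_TIME_FORMAT)


def log_duration(start: str, end: str) -> timedelta:
    """Elapsed time between two log timestamps."""
    return parse_log_time(end) - parse_log_time(start)


# The end timestamp below is hypothetical, not taken from any of the test logs.
elapsed = log_duration("Mon Sep 23 01:30:33  2019", "Mon Sep 23 19:00:33  2019")
print(elapsed)
```

Subtracting this value from the duration shown in the dashboard gives the time spent outside the two timestamps (upload, resize, postprocessing).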

Logs here:
1 x NUC8i7BEH1
2 x NUC8i7BEH1
4 x NUC6i7KYK
6 x NUC6i7KYK

  • What might be responsible for the outlier 10 hour difference and out of order processing time?
  • I will probably rerun the 1x and 2x tests on the NUC6i7KYK to remove the variation due to NUC model, although it’s my understanding that the two should be comparable when it comes to running ODM. (See the specs for the NUC8i7BEH1 and the NUC6i7KYK.) All have 32GB RAM installed.
  • Separately, would it be possible to make the products ready before postprocessing (i.e. the TIFF) available for download as soon as they are ready?

Cool benchmark! Really nice to see split-merge being timed.

  • The 4x run seems like an outlier, and I would trust the log export time more than the WebODM dashboard time. The log export counts the time for processing (without post processing), but I can’t imagine why post processing would take an astonishing 10 hours, unless this is a bug of some sort or the reconstruction came out larger than expected. Was there anything strange in the output (for example, an abnormally large DSM or orthophoto file size)? Or did the NUC hosting WebODM go to sleep at any time during processing?
  • CPU impacts runtime, so there might be differences between the two models in terms of CPU speed.
  • In theory, yes, but this would complicate the processing logic quite a bit (and wouldn’t be too straightforward to implement either). Happy to merge a PR if anyone wants to implement it.

  • Looks like the NUC8i7BEH1 definitely performs better.
  • Reran the 4 box test and oddly enough it still had the 10 hour difference. The download assets dropdown for the task is missing the “Point Cloud (LAZ)” option. The other test runs have all options (GeoTIFF, MBTiles, Tiles, LAZ, All Assets). Checking the [INFO] logged at the start of 4x and the 6x test, the parameters all match. The 6x test includes all the hardware that ran the 4x test so if there was a problem with one of the boxes, I would think the problem would appear in both the 4x and 6x tests.

If I understand your analysis, I suspect what is happening is that one of the major bottlenecks in the merging step is the final orthophoto, which uses gdalwarp with -cblend 20 and cutlines to reduce the effect of differences between adjacent images. gdalwarp when used with cblend is extremely inefficient and tends to use only a minimal amount of CPU as well.

In short, with more splits, we get more use of gdalwarp in the merging of the final product. At first, this cost is outweighed by the additional efficiency of the extra nodes, but above a certain threshold, that additional efficiency is mostly lost to waiting on the final merge step.

The split parameters and the image set are the same for all the tests. So shouldn’t the number of splits be the same regardless of the number of boxes?
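A quick back-of-the-envelope check of that, as a sketch only. It assumes the split value acts as a target average number of images per submodel (the real clustering may produce a different count), and that each box works on one submodel at a time with submodels of roughly equal cost. Both helper functions are made up for illustration.

```python
import math


def estimate_submodels(num_images: int, split: int) -> int:
    """Rough submodel count, assuming `split` is the average images per submodel."""
    return math.ceil(num_images / split)


def processing_rounds(num_submodels: int, num_nodes: int) -> int:
    """Rounds needed if every node handles one submodel at a time (an assumption)."""
    return math.ceil(num_submodels / num_nodes)


# Values from the test setup: 3,927 images, split: 500.
submodels = estimate_submodels(3927, 500)
print("estimated submodels:", submodels)

for nodes in (1, 2, 4, 6):
    print(nodes, "box(es):", processing_rounds(submodels, nodes), "round(s)")
```

Under these assumptions the submodel count depends only on the image set and the split value, not on the number of boxes. Only the number of rounds changes with the box count.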

Oh, that is weird. Ok, I don’t have a theory for that…

Do you ever dive under the hood of docker to see what’s going on? e.g.:

> docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                    NAMES
2518817537ce        opendronemap/odm       "bash"                   36 hours ago        Up 36 hours                                  zen_wright
1cdc7fadf688        opendronemap/nodeodm   "/usr/bin/nodejs /va…"   36 hours ago        Up 36 hours>3000/tcp   flamboyant_dhawan

> docker logs 2518817537ce | more
[INFO]    DTM is turned on, automatically turning on point cloud classification
[INFO]    Initializing OpenDroneMap app - Mon Sep 23 01:30:33  2019
[INFO]    ==============
[INFO]    build_overviews: False
[INFO]    camera_lens: auto ...

Or telnet in to the primary node and investigate the tasks:

> telnet localhost 8080
Connected to localhost.
Escape character is '^]'.
Welcome ::ffff: ClusterODM:1.3.4
HELP for help
QUIT to quit


You can then use TASK LIST and TASK OUTPUT to dive deeper into where each task is in the processing.
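If you would rather script this than type into telnet, here is a rough sketch that sends a single command over a plain TCP socket. It assumes the interface is line-based, listens on localhost:8080 as in the session above, and closes the connection after QUIT. The function name `clusterodm_command` is made up for illustration and is not part of ClusterODM.

```python
import socket


def clusterodm_command(command: str, host: str = "localhost",
                       port: int = 8080, timeout: float = 2.0) -> str:
    """Send one command to the telnet-style interface and return everything received.

    Assumes a line-based protocol that closes the connection after QUIT.
    The returned text includes the welcome banner.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Send the command, then QUIT so the server ends the session.
        conn.sendall((command + "\r\nQUIT\r\n").encode("ascii"))
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # server went quiet without closing; stop waiting
            if not data:
                break  # server closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (needs a running primary node):
#   print(clusterodm_command("TASK LIST"))
```

Saving the output of repeated TASK LIST calls to a file is one way to see when each task changes state over a long run.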

Random, but I added a small docker section to the docs, since I’m always forgetting how to do anything with docker. Add to it as you see fit: https://docs.opendronemap.org/tutorials.html#using-docker

If there are any logs to share from these tests or others, I’m happy to discern what I can from them. This is a curious problem, and I wonder if it’s general.