Tips for processing large datasets on Kubernetes

Hi,

I realise there are lots of topics on large datasets already, but I haven’t found anything recent covering a similar problem.

I have a NodeODM instance running on Kubernetes in a cloud environment. The machine is pretty powerful (62 CPUs and ~800 GB of memory), but has no swap. (As far as I’m aware, Kubernetes has no concept of swap, but please correct me if this is wrong and you’ve managed to configure swap for ODM on Kubernetes.)

I am using pyodm to submit jobs to NodeODM. I’m dealing with data from aerial drones, both RGB and Multispectral. It’s working nicely so far, and I’ve been very impressed by the combination of pyodm + NodeODM. Thanks to everyone who has worked on this :slight_smile:
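For context, the submission logic is roughly the following sketch (the hostname, port, file names and the `build_options` helper are illustrative placeholders, not my exact code):

```python
from typing import Dict

def build_options(resolution_cm: float, split: int = 0) -> Dict[str, object]:
    """Assemble a NodeODM options dict; split=0 disables submodel splitting."""
    opts: Dict[str, object] = {
        "orthophoto-resolution": resolution_cm,  # cm/pixel
        "auto-boundary": True,
    }
    if split > 0:
        opts["split"] = split
    return opts

def submit(host: str, port: int, images: list) -> None:
    """Submit images to NodeODM and wait. Needs a live NodeODM instance,
    so this is not executed as part of the sketch."""
    from pyodm import Node  # imported lazily so the helper above stays standalone
    node = Node(host, port)
    task = node.create_task(images, build_options(0.1, split=2000))
    task.wait_for_completion()
    task.download_assets("./results")

# Against a live instance, it would be called like:
# submit("nodeodm.example.internal", 3000, ["IMG_0001.JPG", "IMG_0002.JPG"])
```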

However, I’m currently struggling with appropriate settings for large missions. For example, I currently have a dataset with 6200 RGB images, each of which is 24 megapixels and typically about 10 MB in size. My goal is to generate the best quality orthophoto possible; the DSM and 3D model are also interesting, but less of a priority right now.

My first attempt used the following options:

    default_options = {
        "dsm": True,
        "dtm": True,
        "cog": True,
        "orthophoto-compression": "LZW",
        "orthophoto-resolution": 0.1,  # cm/pixel. If set very small, output will be auto-limited by data to max sensible value (I think?)
        "dem-resolution": 0.1,  # cm/pixel. If set very small, output will be auto-limited by data to max sensible value (I think?)
        "max-concurrency": 60,
        "auto-boundary": True,
        "use-3dmesh": True,
        "fast-orthophoto": False,
        "feature-quality": "high",  # ultra | high | medium | low | lowest
        "pc-quality": "high",  # ultra | high | medium | low | lowest
    }

Over a period of about 24 hours, this progressed fairly steadily to about 73% complete, but then it stalled for 5 days while running a pdal command, with no further progress. It didn’t actually fail or produce an error; it just seemed to grind to a halt. Since I’m hoping to process these missions faster than that anyway, I cancelled the task and tried some different options.

My second attempt used the same options, but with "split": 2000 added. This seems to be running OK (I can see that the submodels have been created correctly) and over a period of about 2 days it reached 84% complete. However, it has now been stuck at that point for another two days, with the last message saying [INFO] Adjusted bounds. Looking more closely at the NodeODM working directory, it looks as though the orthophoto actually finished processing about 2 days ago (there is a 20 GB GeoTIFF named odm_orthophoto.original.tif). Since then, NodeODM seems to have spent another 2 days slowly writing data to dsm.tif: according to the OS, this file is constantly being modified, but yesterday it was about 5.5 GB and 24 hours later it’s only 7.7 GB, so it’s being written quite slowly.

Does anyone have any suggestions for optimisation, please?

I am reluctant to use "fast-orthophoto": True, because I have read in OpenDroneMap: The Missing Guide that this will produce a lower quality orthophoto. I am basically looking for a combination of options that will maximise processing speed without sacrificing orthophoto quality. For example, I am wondering if the process currently writing to dsm.tif is slow due to the size of the point cloud, in which case maybe the dem-decimation option might help?
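If dem-decimation does turn out to be the right lever, I imagine the tweak would look something like the sketch below (the values are guesses for experimentation, not tested recommendations; per the ODM docs, dem-decimation defaults to 1, meaning no decimation):

```python
# Same base options as before, but thin the point cloud before the DEM is
# gridded. dem-decimation=N keeps roughly 1 of every N points, which should
# speed up dsm.tif generation without touching the orthophoto inputs.
faster_dem_options = {
    "dsm": True,
    "cog": True,
    "orthophoto-resolution": 0.1,  # cm/pixel
    "dem-resolution": 5.0,         # cm/pixel; a coarser DEM writes far fewer pixels
    "dem-decimation": 10,          # keep ~1 in 10 points when building the DEM
    "max-concurrency": 60,
    "auto-boundary": True,
}
```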

From the docs, it sounds as though setting "dsm": False and "dtm": False should have no effect on orthophoto quality, but "use-3dmesh": False will make the orthophoto worse. Does that sound correct?

I will continue to explore various options, but since each run takes 2 to 5 days I’d like to narrow down a sensible number of parameter combinations first. Any tips for settings used when processing large datasets (especially in a Kubernetes environment) would be greatly appreciated.

Thank you! :slight_smile:

Hi Jes,
I am interested in finding out how you are getting on with the large dataset. I have been working with large datasets and have had some successes. The largest dataset I have managed is approx. 5k images; anything over this runs into an issue with the Ceres Solver library that ODM uses for bundle adjustment.
Did you manage to process the 6200 images?
I expect 800GB RAM may not be enough for this.

Do you have a chart of RAM and disk usage? Does latency increase? What happens to the rest of the system? Maybe it makes sense to use bundles? “For small datasets (< 1000 images) there’s not much difference. As the number of images increases, the cost of running more local BA operations starts to outweigh the cost of running more global BA. For very large datasets, turning on this option can reduce the total run-time. It can also increase the accuracy of the reconstruction since a global bundle adjustment is performed more frequently.”

This is from OpenDroneMap: The Missing Guide (https://odmbook.com), page 141.
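In option form it is just one flag; a sketch (the flag name is from the ODM docs, the other values are only examples, not recommendations):

```python
# Hybrid bundle adjustment: run local BA incrementally and a global BA
# only periodically, which the guide suggests helps on very large datasets.
large_dataset_options = {
    "use-hybrid-bundle-adjustment": True,
    "feature-quality": "high",
    "pc-quality": "high",
    "max-concurrency": 60,
}
```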

Hi @Declan_Keogh, @Maurice.Sobiera,

My dataset did finish successfully in the end - it took another two days or so after I posted here (so about 6.5 days in total). This was running with "split": 2000 in my NodeODM options. The quality is OK and, as far as I can tell, all the expected outputs have been created correctly, although it never produced the usual report.pdf, which perhaps implies that something went wrong somewhere (?). The log looks OK, though.

I then tried it again using "dsm": False and "dtm": False, but this crashed repeatedly, which seems strange. I also tried with "fast-orthophoto": True and this completed quickly (less than 24 hours), but the quality is so poor that it’s not really usable.

So, overall, I’m impressed that NodeODM managed to mosaic more than 6k high resolution images :tada: On the other hand, my options for optimisation may be limited, because turning off “optional” components seems to cause the entire workflow to fail.

I’d really like to figure out a way to process these datasets in 1 to 2 days, rather than 6 to 7. The next thing on my list is to try NodeODM with GPU acceleration to see whether that makes a noticeable difference, although that will probably have to wait until next year.

@Maurice.Sobiera are “bundles” the same as split? I think I must have a different version of the Missing Drone Guide to you, because page 87 for me is something different and I don’t get any matches searching the text for “bundles”.

Thanks for the replies! :slight_smile:

You are right, it was page 141. split is used when you want to create smaller submodels, e.g. if you are short on RAM. I guess you could give bundles an additional try; I have not tried split and bundles at the same time. Bundles are for adjusting positions, and the flag controls the speed of that step in the process: use-hybrid-bundle-adjustment — OpenDroneMap 3.1.9 documentation
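If you do try both at once, maybe something like this (untested on my side; split-overlap is the overlap radius in metres between neighbouring submodels, ODM default 150):

```python
# Hypothetical combination of submodel splitting and hybrid bundle adjustment.
combined_options = {
    "split": 2000,                         # target images per submodel
    "split-overlap": 150,                  # metres of overlap between submodels
    "use-hybrid-bundle-adjustment": True,  # periodic global BA instead of constant global BA
}
```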