Impact of options on resource requirements

This is a long shot, but I thought I’d ask before I potentially start reinventing the wheel.

My goal is to predict the capacity required to process a particular set of images. The current approach used by ClusterODM is based on the number of images. This is awesome because it’s super fast, but it doesn’t always work. There are two other factors that I think should also be taken into account: the mean image size and the options selected. The purpose of this thread is to explore the latter on a stage-by-stage level.

As a trivial example, the load-dataset stage in ./stages/dataset.py is influenced by camera-lens: if it is not set to auto, each photo must have its projection overridden. This requires more time, possibly more CPU and RAM, and maybe even more disk if the originals are saved. How much depends on the number and size of the images, as mentioned above, but a rule of thumb could be calculated.
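To make that concrete, here is a minimal sketch of what grouping options under the stages they influence might look like. The multipliers (and the assignment of feature-quality to the opensfm stage) are invented placeholders, not measured values:

```python
# Hypothetical sketch only: the multipliers below are invented, not measured.
# The idea is to group options under the stages they influence and attach a
# crude rule-of-thumb cost factor to each value.
STAGE_OPTION_MULTIPLIERS = {
    "dataset": {
        # camera-lens != "auto" forces a projection override on every photo
        "camera-lens": lambda v: 1.0 if v == "auto" else 1.15,
    },
    "opensfm": {
        "feature-quality": lambda v: {"low": 0.5, "medium": 1.0, "high": 2.0}.get(v, 1.0),
    },
}

def stage_multiplier(stage, options):
    """Combine the factors of every selected option that touches a stage."""
    factor = 1.0
    for name, rule in STAGE_OPTION_MULTIPLIERS.get(stage, {}).items():
        if name in options:
            factor *= rule(options[name])
    return factor
```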

So! In theory, if options were grouped under the stages they influence, and if there were some way to measure the time/CPU/RAM/disk/network/etc. for each individual stage, then the example datasets could be used to calculate these rules of thumb. With an appropriate fudge factor it should be possible (again, in theory) to identify the stage with the largest requirement for each resource and thus describe the minimum system requirements to process that dataset, possibly including a reasonable time estimate. As a bonus, with the right test harnesses, performance regressions could be identified before releases, if desired.
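For illustration, here is a rough sketch of how per-stage rules of thumb might roll up into minimum system requirements. Every estimator, field name, and the fudge factor value is a placeholder for whatever the example datasets would actually yield:

```python
from dataclasses import dataclass

@dataclass
class StageEstimate:
    seconds: float
    peak_ram_gb: float
    peak_disk_gb: float

FUDGE_FACTOR = 1.25  # arbitrary safety margin, to be tuned against real runs

def estimate_task(stage_estimators, image_count, mean_image_mb, options):
    """stage_estimators maps a stage name to a callable returning a StageEstimate.

    Time adds up across stages; RAM and disk are driven by whichever single
    stage needs the most, padded by the fudge factor.
    """
    per_stage = {
        name: estimator(image_count, mean_image_mb, options)
        for name, estimator in stage_estimators.items()
    }
    return {
        "estimated_seconds": FUDGE_FACTOR * sum(e.seconds for e in per_stage.values()),
        "min_ram_gb": FUDGE_FACTOR * max(e.peak_ram_gb for e in per_stage.values()),
        "min_disk_gb": FUDGE_FACTOR * max(e.peak_disk_gb for e in per_stage.values()),
    }
```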

Has any work been done in this direction? Am I nuts for thinking this is worth pursuing?

4 Likes

Absolutely not nuts. This is a golden egg. I can’t think of anyone who wouldn’t love to have some semi-reasonable estimate of Task processing requirements vis-à-vis CPU, GPU, RAM, HDD, and time.

As you’ve noted, quantifying these is not incredibly straightforward, since some options can really skew any of the stages, on top of the other factors.

I have a very limited and basic qualitative rating for each parameter being put into the docs (example):
https://docs.opendronemap.org/arguments/auto-boundary/

And I still need to implement that for everything, and then an actual quantitative one on top? Phew, that’d be amazing.

2 Likes

Oh wow, what you’re doing with the parameters is almost exactly what I was thinking. Very nice!

The time estimates will be relative, so there are still huge benefits to folks posting benchmarks for particular systems, but in the general case (barring GPUs or local optimizations) I would expect at least the RAM and disk requirements to be consistent across architectures.

The biggest blocker right now, as far as I can tell, is the ability to measure resource utilization for each stage. The stone-knives-and-bearskins approach that comes to mind is to set up a job with known parameters and run one stage at a time while docker stats collects data, but there’s probably a smarter way to do that…
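For what it’s worth, even that approach can be automated. A rough sketch (container name, interval, and duration are placeholders; only the standard --no-stream and --format docker flags are relied on) that polls docker stats and keeps the peaks:

```python
import subprocess
import time

def sample_container_peaks(container, interval=2.0, duration=600.0):
    """Poll `docker stats --no-stream` for one container and keep the peaks.

    Returns (peak_cpu_percent, mem_usage_at_peak).
    """
    peak_cpu, peak_mem = 0.0, ""
    deadline = time.time() + duration
    while time.time() < deadline:
        out = subprocess.run(
            ["docker", "stats", "--no-stream", "--format",
             "{{.CPUPerc}};{{.MemUsage}}", container],
            capture_output=True, text=True,
        ).stdout.strip()
        if out and "%" in out:
            cpu_text, mem_text = out.split(";", 1)
            cpu = float(cpu_text.strip().rstrip("%"))
            if cpu >= peak_cpu:
                peak_cpu, peak_mem = cpu, mem_text
        time.sleep(interval)
    return peak_cpu, peak_mem
```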

2 Likes

What would you say to starting with a back-of-the-napkin qualitative assessment for each option before we move on to trying to fully profile each one quantitatively? It’d definitely be nice to workshop this with someone.

We have a wealth of data in the ODM benchmarking repository, but the question, as you noted, becomes how to profile each part’s resource usage.

I mocked up something really basic/silly for OATS (the OpenDroneMap Automated Test Suite) that does a simple timer and checks RAM/SWAP before/after each task, but I don’t know how to profile “peak” resource usage throughout.
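One way to capture a peak rather than before/after snapshots, assuming the task runs as a child process and psutil is available (a sketch, not what OATS does today), is to sample from a background thread:

```python
import threading
import time
import psutil  # third-party: pip install psutil

def watch_peak_rss(pid, result, interval=1.0):
    """Sample RSS of a process and its children until it exits, keeping the peak."""
    peak = 0
    try:
        proc = psutil.Process(pid)
        while proc.is_running():
            rss = proc.memory_info().rss
            for child in proc.children(recursive=True):
                try:
                    rss += child.memory_info().rss
                except psutil.NoSuchProcess:
                    pass
            peak = max(peak, rss)
            result["peak_rss_bytes"] = peak
            time.sleep(interval)
    except psutil.NoSuchProcess:
        pass

# Usage sketch: start the watcher before the stage, read the peak afterwards.
# result = {}
# threading.Thread(target=watch_peak_rss, args=(stage_pid, result), daemon=True).start()
# ... run the stage, wait for it to finish ...
# print(result.get("peak_rss_bytes"))
```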

1 Like

Parse the logs… there are timestamps associated with each stage.

A bonus to this approach: you can take submissions from others along with their system specs and start to build a larger picture.
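Something along these lines could pull stage durations out of a console log. The regex assumes timestamped “Running/Finished <stage> stage” lines, which may not match the real ODM log format, so treat it as a template to adapt rather than a working parser:

```python
import re
from datetime import datetime

# Assumed line shape, e.g. "2023-01-01 10:00:00 [INFO] Running dataset stage".
# Adjust the pattern to whatever the real console.log lines look like.
STAGE_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"(?P<event>Running|Finished)\s+(?P<stage>\S+)\s+stage"
)

def stage_durations(log_path):
    """Return {stage: seconds} computed from start/finish timestamps in a log."""
    starts, durations = {}, {}
    with open(log_path) as fh:
        for line in fh:
            m = STAGE_LINE.search(line)
            if not m:
                continue
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            stage = m.group("stage")
            if m.group("event") == "Running":
                starts[stage] = ts
            elif stage in starts:
                durations[stage] = (ts - starts.pop(stage)).total_seconds()
    return durations
```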

2 Likes

Time is easy, RAM is a little harder, CPU even harder than that :frowning:

2 Likes

Perhaps not so easy when a task fails after >100 hours! :disappointed:

1 Like

Wow, so glad this is being discussed/worked on. I was just thinking of starting a spreadsheet to do this (although not in nearly as much detail, I’m sure). For starters, I want to know which tasks are CPU-heavy, or memory-heavy, and what parameters feed into each task.

1 Like

For finding the ceiling, yes, but I’d say CPU architecture is the hard one. RAM and number of CPUs are easy: this is what virtualization is good at. If someone writes a parser, I’ll provide different-sized VMs on a server with lots of RAM and a fair share of cores (768GB with 48 cores/96 threads total), plus time for processing.

4 Likes
