This is a long shot, but I thought I’d ask before I potentially start reinventing the wheel.
My goal is to predict the capacity required to process a particular set of images. The current approach used by ClusterODM is based on the number of images. This is awesome because it’s super fast, but it isn’t always accurate. There are two other factors which I think should be taken into consideration as well: the mean image size and the options selected. The purpose of this thread is to explore the latter on a stage-by-stage basis.
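To make the inputs concrete, here is a minimal sketch (Python, since the ODM stages are written in Python) of the three signals such a predictor would need. The function name and feature layout are my own illustration and don’t exist in ClusterODM today.

```python
# Hypothetical helper: gather the three signals discussed above
# (image count, mean image size, selected options).
from pathlib import Path

def dataset_features(image_dir, options):
    """Return the per-dataset inputs a capacity predictor would need."""
    sizes = [p.stat().st_size for p in Path(image_dir).iterdir() if p.is_file()]
    return {
        "image_count": len(sizes),
        "mean_image_bytes": sum(sizes) / len(sizes) if sizes else 0,
        "options": dict(options),  # e.g. {"camera-lens": "brown", ...}
    }
```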
As a trivial example, the load-dataset stage in ./stages/dataset.py is influenced by camera-lens: if it is not set to auto, each photo must have its projection overridden. This will take more time, possibly more CPU and RAM, and maybe even more disk if the originals are saved. How much depends on the number and size of the images, as mentioned above, but a rule of thumb could be calculated.
So! In theory, if options were grouped under the stages they influence, and if there were some way to calculate the time/CPU/RAM/disk/network/etc. for each individual stage, then the example datasets could be used to calibrate these rules of thumb. With an appropriate fudge factor, it should be possible (again, in theory) to identify the stage with the largest requirement for each resource and thus describe the minimum system requirements to process that dataset, possibly along with a reasonable time estimate (sketched below). As a bonus, with the right test harness, performance regressions could be identified before releases, if desired.
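Here is one way the aggregation could look, building on the sketches above. The stage names beyond "dataset", the fudge factor, and the structure are all assumptions for discussion, not anything ClusterODM or ODM provides today.

```python
# Each stage contributes an estimate per resource; peak requirements come
# from the hungriest stage, while wall time sums because stages run
# sequentially. Only the dataset estimator from the earlier sketch exists;
# the rest are placeholders.
FUDGE = 1.25

STAGE_ESTIMATORS = {
    "dataset": estimate_dataset_stage,
    # "opensfm": estimate_opensfm_stage, "openmvs": estimate_openmvs_stage, ...
}

def minimum_requirements(features):
    reqs = {}
    for estimator in STAGE_ESTIMATORS.values():
        for resource, value in estimator(features).items():
            padded = value * FUDGE
            if resource == "seconds":
                reqs[resource] = reqs.get(resource, 0.0) + padded
            else:
                reqs[resource] = max(reqs.get(resource, 0.0), padded)
    return reqs
```

Taking the maximum per resource describes the smallest machine that survives the worst stage, and summing the per-stage times gives the overall estimate; the fudge factor and every per-stage coefficient would need fitting against the example datasets.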
Has any work been done in this direction? Am I nuts for thinking this is worth pursuing?