Hardware Recommendations – CPU cores, graphics card, CUDA, NVIDIA, memory, RAM, storage

After some time using WebODM and researching many subjects in the forum, I noticed that many questions revolve around which hardware to use and how to get the best results in a reasonable time.

So I kept statistics for the projects I rendered. The rendering machine was also upgraded along the way, which gave me some understanding of how each component affects the rendering process.

Since there are often questions in the forum about which hardware is needed or best, I decided to put my findings online. This is a document I keep for my own reference to quantify whether an upgrade is worth the effort / money or not, and to stay down to earth when the next upgrade fever takes hold of me :slight_smile:
So let’s see if more cores, more RAM or a better graphics card is the way to go.

Monitoring WebODM

For monitoring WebODM, a little bash script is used that collects data every 5-20 seconds about CPU and GPU usage and how much VRAM or swap is in use.
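For reference, a minimal sketch of such a monitoring loop could look like the following. This is not the exact script used for these tests (that one is linked at the end of the post); the output file name, column layout and 10-second interval are just placeholders:

```bash
#!/usr/bin/env bash
# Minimal monitoring sketch: append one CSV line every 10 seconds with
# timestamp, CPU load, RAM/swap usage and (if an NVIDIA card is present)
# GPU utilization and VRAM usage. File name and interval are placeholders.
OUT=usage.csv
echo "time,load1,ram_used_mb,swap_used_mb,gpu_util_pct,vram_used_mb" > "$OUT"
while true; do
    ts=$(date +%s)
    load1=$(cut -d' ' -f1 /proc/loadavg)
    ram=$(free -m | awk '/^Mem:/  {print $3}')
    swap=$(free -m | awk '/^Swap:/ {print $3}')
    if command -v nvidia-smi >/dev/null 2>&1; then
        gpu=$(nvidia-smi --query-gpu=utilization.gpu,memory.used \
                         --format=csv,noheader,nounits | head -n1 | tr -d ' ')
    else
        gpu=","   # no GPU: leave the two GPU columns empty
    fi
    echo "$ts,$load1,$ram,$swap,$gpu" >> "$OUT"
    sleep 10
done
```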

The machine used for rendering is a headless Ubuntu 20.04.5 LTS installation. There is no screen attached and no other processes run in the background.

All observations are repeatable and match the times given in the console.txt file of the process. These stages repeat with any given project. For testing purposes the standard profiles were used, with some exceptions that needed a bit of tweaking to run successfully.

After a process is complete, the monitoring file and the console.txt are put in a folder on a webserver and displayed with PHP and JavaScript.

The light green bar in the beginning is the opensfm stage, always starting with the feature extraction. In this image the feature extraction is the first red curve.

The cloud densification happens during the openmvs stage marked in orange. Again always starting with a rise in GPU usage (the second, longer red curve).

Towards the end come the odm stages. That is not an official name, but it is where a lot of PDAL and GDAL scripts run.

The light blue bars in the background show the RAM utilization with swap in purple and the GPU memory (VRAM) in orange. But swap and VRAM are hardly used in the sample.

This is an image of the same process, but without GPU. The first red circle marks the feature extraction and the second one the point cloud densification. The major advantage of a GPU during point cloud densification becomes very obvious (2,5 times faster than CPU).

Unless otherwise mentioned, the images are 4000x3000 pixels (12MP) and were taken with a DJI Phantom 3 Pro. The 20MP dataset was flown with a DJI Phantom 4 Pro, as far as I know.

A useful way of structuring this document is by hardware component.
The following is a list of the hardware components and how each affects WebODM.

Hardware

CPU / cores

The general recommendation is that more cores are faster, but more cores also increase memory usage.

The comparison here includes the amount of time a 4 or 8 core CPU needed and also the maximum memory (physical and virtual combined) that was used during the process.

A core in this comparison means a physical core, not a thread. Each core here runs 2 threads, which means the concurrency / thread count is 8 for 4 cores and 16 for 8 cores.

| Project | feature / pc quality | 4 cores / memory | 8 cores / memory | % diff | Note |
|---|---|---|---|---|---|
| 242 images | ultra / ultra | 66 min / 38.9 GB | 62 min / 41.9 GB | +6% | with GPU |
| 473 images | high / medium | 135 min / 24.3 GB | 109 min / 27.7 GB | +19% | CPU only |
| 473 images | high / medium | 95 min / 22.3 GB | 81 min / 25.9 GB | +14% | with GPU |
| 1413 images | high / high | 531 min / 97.9 GB | 467 min / 101 GB | +12% | with GPU |

By doubling the core count from 4 to 8 cores you can save between 6 and 19% of processing time, depending on the project size (number of images) and the settings (feature and point cloud quality).

Interestingly, the ultra settings showed the least difference between the core counts.

The amount of memory used during a process increases by 3 to 16% when doubling the cores. There seems to be no direct proportionality between core count and memory usage: memory use increases with more cores, but not proportionally.

I cannot say what effect 12, 16 or more cores would have, but from this little experiment it looks like doubling the cores gives only 6-19% more speed.
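If you want to check what your own machine exposes in terms of physical cores and threads, something like this does the trick on Linux (nproc prints the thread count, which is the "concurrency" number mentioned above):

```bash
# Physical cores vs. threads on Linux
lscpu | grep -E '^(Socket|Core|Thread|CPU)\(s\)'
# Number of available threads
nproc
```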

Memory

First an overview of how much memory a given process uses. Here again memory means the combined use of physical and virtual memory.

| Project | feature / pc quality | max memory used |
|---|---|---|
| 242 images | ultra / ultra | 38 GByte |
| 473 images | high / medium | 24 GByte |
| 473 images | ultra / ultra | 125 GByte |
| 1002 images | high / medium | 42 GByte |
| 1413 images | high / high | 98 GByte |
| 2819 images | ultra / medium | 152 GByte |
2819 images ultra / medium 152 Gbyte

It’s apparent that the more images, the more memory will be used.
For the 473 images, a sample with feature and point cloud quality set to ultra was added, which shows how substantially higher the demand becomes with these settings.

A second question of course is: how much physical memory is needed?

Let’s have a look at the same processes run with different amounts of physical memory. The virtual memory (also called pagefile or swap) remained the same.

| Project | 80 GByte | 64 GByte | 16 GByte | % diff | Total mem used |
|---|---|---|---|---|---|
| 473 images (ultra/ultra, GPU) | 316 minutes | - | 419 minutes | +24% | 113 GByte |
| 1002 images (high/medium) | - | 150 min | 162 min | +8% | 42-47 GByte |

Having more physical memory can decrease processing time significantly, but that depends very much on the overall memory used by that process. The common rule of sizing virtual memory to 2x the size of physical memory seems reasonable.
Having 5 times the memory shortened the process by 24%; 4 times the memory shortened it by 8%.

Another very interesting aspect is the type of storage on which to place the swap- / pagefile.

| Physical memory | Swap storage device | Swap in use | Time needed | % |
|---|---|---|---|---|
| 80 GByte DDR4 | NVMe M.2 SSD | 34 GByte | 316 minutes | 100% |
| 16 GByte DDR4 | NVMe M.2 SSD | 97 GByte | 491 minutes | 156% |
| 16 GByte DDR4 | SATA M.2 SSD | 98 GByte | 738 minutes | 234% |
| 16 GByte DDR4 | 3,5” SATA HDD | - | canceled | >1000% |

(473 pics “land-after-rain” series)

Looking at these numbers shows that modern NVMe drives offer superior performance over any other (common) storage device, all while keeping the computer responsive even during intense use, since NVMe drives are optimized for many simultaneous accesses.
SATA SSDs are still usable, but HDDs should evidently not be used for the swapfile. The system became unresponsive during processing, and 21 hours into the process, still doing the matching, that test was canceled.

Given the price point of modern NVMe PCIe 4.0 disks, it is highly recommended to use such a device for holding your swap / pagefile. Even if your PC offers only PCIe 3.0 or earlier, the later generations of NVMe drives offer superior controllers, easily handling multiple accesses and moving more data.
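If you want to follow that recommendation on a Linux machine, a swapfile on an NVMe-backed filesystem can be set up along these lines. Size and path are only examples; I size it at roughly 2x the physical RAM, as mentioned above:

```bash
# Example: create and enable a 128 GB swapfile (path and size are placeholders)
sudo fallocate -l 128G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it permanent across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Verify
swapon --show
free -h
```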

Installing more memory has a strong effect on processing time, as does placing the swap / pagefile on a modern NVMe drive.

Graphics / CUDA supported rendering

First of all, when speaking about GPU support for WebODM, at this moment (Nov 2022) this only concerns graphics cards that support CUDA, meaning NVIDIA GeForce and Quadro cards.
To my knowledge there are no other manufacturers supporting CUDA.

Another point to consider is that only feature extraction (opensfm stage) and point cloud densification (openmvs stage) use the GPU.
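To quickly check whether your system sees a CUDA-capable NVIDIA card and whether Docker can reach it, something like the following works. The CUDA image tag is only an example, and the second command assumes the NVIDIA container toolkit is installed:

```bash
# Does the host see the NVIDIA driver and GPU?
nvidia-smi
# Can a Docker container reach the GPU? (image tag is just an example)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```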

Feature Extraction

For feature extraction using GPU support you can expect the following improvements.

| Project | feature quality | CPU only | with GPU | x faster |
|---|---|---|---|---|
| 242 images | ultra | 6:10 min | 2:28 min | 2,5x |
| | | 5% of total | 4% of total | -24% features |
| 473 images | high | 2:46 min | 0:43 min | 3,6x |
| | | 2,5% of total | 1% of total | +1,7% features |
| 1002 images | high | 5:28 min | 2:56 min | 1,9x |
| | | 3,7% of total | 2,5% of total | +1,2% features |
| 1413 images | high | 16:54 min | 16:24 min | 1x |
| | | 3,1% of total | 3,5% of total | -3,2% features |
| 2819 images | ultra | 59:36 min | 19:52 min | 2,9x |
| | | 6,8% of total | 2,5% of total | -0,5% features |

“% of total” in the table above means “percent of total process time”, to put the overall efficiency in perspective.

It becomes obvious that using the GPU for feature extraction is definitely faster in most cases, sometimes even by a factor of nearly 3. But there are two caveats to this:

  1. Feature extraction on the GPU consumes a lot of memory. The image size determines how much memory is used, so the feature-quality setting decides whether your GPU is capable of doing the feature extraction or not.
  2. Feature extraction contributes only 1-7% to the overall process time, so using your GPU for feature extraction will, in the best case, save you 7% of processing time. That is not a whole lot.

The quality of the feature extraction does not seem to differ much between CPU and GPU. Visual inspection of the resulting orthomosaics does not show a quantifiable difference, and the number of extracted features is within the margin of error. Running the exact same process twice can yield a higher variance in extracted features than the difference between CPU and GPU feature extraction.
Outliers (like the -24% mentioned above) prove the rule :slight_smile:

About memory usage of GPU during feature extraction (fc):

| Image size | VRAM @ fc high | VRAM @ fc ultra |
|---|---|---|
| 4000x3000, 12MP | 1,5 GByte VRAM | 4,7 GByte VRAM |
| 5472x3648, 20MP | 2 GByte VRAM | >6 GByte VRAM |

A 12 Megapixel (MP) image on setting ultra consumes around 4,7 Gbyte of Video Memory (VRAM). The same image on feature-quality high will only consume 1,5 Gbyte of VRAM.

A 20MP image on ultra was too big for the graphics card used during testing, but on the high setting it consumed 2 GByte of VRAM.

That means you will need more VRAM the bigger your images and processing demands become.
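To see whether your own image size and feature-quality setting fit into the VRAM of your card, you can simply watch the GPU memory while the feature extraction is running, for example:

```bash
# Print used and total GPU memory every 5 seconds; stop with Ctrl+C
# once the feature extraction stage has finished.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5
```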

The bottom line of using the GPU for feature extraction is a saving of up to 3% of total process time.

Point cloud densification

Compared to feature extraction the point cloud densification offers huge gains when using a GPU:

| Project | PC quality | CPU only | with GPU | x faster |
|---|---|---|---|---|
| 223 images (20MP) | ultra | 129:30 min | 14:51 min | 9x |
| | | 33% of total | 7% of total | -46% total |
| 473 images | ultra | 127:21 min | 10:51 min | 12x |
| | | 26% of total | 3,5% of total | -36% total |
| 1002 images | medium | 15:03 min | 1:32 min | 10x |
| | | 10% of total | 1,5% of total | -52% total |
| 1413 images | high | 80:15 min | 7:30 min | 11x |
| | | 15% of total | 2% of total | -14% total |
| 2819 images | medium | 19:59 min | 3:26 min | 6x |
| | | 2,3% of total | 1% of total | -4% total |

This part of the WebODM pipeline benefits greatly from GPU usage, being around 10 times faster. Especially when increasing point cloud quality, the overall process time is reduced by as much as 50%.

There is almost no VRAM used during this part. Any relatively recent card with >1Gbyte of VRAM can render this part.

The possible time savings for feature extraction are rather slim, but with point cloud densification the overall process is massively affected, in some settings even halving the time a process takes.

The gained speed with a GPU even sometimes exceeds the 10x speed up promised in this announcement:

It also becomes obvious that there cannot be much further acceleration from an even more powerful graphics card. The point cloud densification is reduced from 33% of total process time to a mere 7% of the time the whole process takes. That is a massive improvement, and a more powerful card could only shorten the whole process by at most that remaining 7%.

GPU memory usage

The only process using the VRAM (Video memory) is the feature-extraction as mentioned above. Overall gains are small here, so VRAM does not majorly influence the length of a process.

When using a GPU for point cloud densification, which yields very noticeable time savings compared to feature extraction, the VRAM is not used at all. So it would make no difference whether you have 24 GByte or just 4 GByte of VRAM available.

This indicates that if you want to shorten your processing times, a CUDA card with 4 GByte of VRAM will shorten your process nearly as much as a card with 24 GByte.
Especially when adding costs into the equation: you would pay 200% or more for the additional VRAM only to save at most 7% of processing time.

Put another way: 150 USD for something like a GeForce GTX 1660 with 6 GB VRAM will shorten your process by up to 50%, while 300 USD and more, starting with a GeForce RTX 3060 12 GB, will shorten it by up to 54%. That is 4% more effect for 200% of the money.

The cheaper card will give you most bang for your buck :smiley:

Storage (project files, Docker container etc.)

There seems to be no significant impact on processing speed whether an SSD or an HDD is used as the storage medium.

Expect that projects up to 1000 images will occupy around 100 GByte during processing, while bigger projects (2000-3000 images) can easily reach 250-300 GByte.
If your storage gets tight, use the “optimize storage” function in WebODM.
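To keep an eye on this, a quick look at the free disk space and at what Docker occupies is usually enough (assuming the default Docker data directory):

```bash
# Free space on the drive holding the Docker data and project files
df -h /var/lib/docker
# Space taken by Docker images, containers and volumes
docker system df
```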

The only exception is the system swap- / pagefile that should ideally be on a modern NVMe drive as mentioned under memory.

Recommendations for starters

If you are starting out with WebODM, a computer with 32 GByte of RAM, a decently recent CPU (>4 cores) and a 1TB NVMe drive (or bigger) should enable you to render all home and semi-professional projects. Even the occasional bigger job (2000+ images) can be tackled, albeit it will take a bit longer (>1 day).

A GPU is not necessary, but an existing NVIDIA card can be put to good use.

Recommendations for ambitious users / pros

When performing many computations with 500-3000 images, 64 GByte of memory and an 8-core CPU are the starting point, already providing reasonable processing times for projects up to 3000 images.
Make sure to use a latest-generation NVMe drive for the swap / pagefile.

Since many projects will be computed, the project folders / Docker files can be stored on a >3TB HDD, offering the possibility of keeping a large number of projects at hand.

A GPU will be helpful, especially to speed up projects that need ultra quality for their point clouds. As of writing this article (Nov 2022), and for exclusive use with WebODM, a GeForce GTX 1660 SUPER with 6 GByte of VRAM offers the most bang for the buck at around 150 USD.
It gives the biggest increase in speed at a reasonable cost, and since the GeForce GTX 16xx series is quite recent, it will probably be supported for quite some time to come.

For a serious step up in GPU computing capacity, a model with 12 GByte should be chosen; less memory is not future proof.
Such a graphics card will increase speed by a single-digit percentage, but cost double or more than the previously mentioned 1660. Solely for WebODM, a powerful graphics card will not provide noticeable time savings, and quality differences are not to be expected.
Yet a powerful GPU (best if also used for other things) can be put to very good use in WebODM.

About the test

For this document more than 40 processes were run and monitored. A handful of processes (~5) were run more than once to see the deviations between individual runs.

A certain “magic” remains in the process, resulting in slightly different orthomosaics, extracted features, processing times etc. every time a project is run.
Yet the difference between identical processes run several times lies at around +/- 5%.

The following components and software were used during testing:

CPU: Ryzen 7 5700G (up to 8 cores / 16 threads)
RAM: Up to 4x8Gbyte, 2x32Gbyte DDR4 memory
Storage: WD Black 1TB SN770 (M.2 NVMe PCIe 4.0), WD Blue 1TB SSD (M.2 SATA), Hitachi 3TB HDD (3,5” SATA, 7200RPM)
Motherboard: Gigabyte B450
OS: Ubuntu 20.04.5 LTS with Kernel 5.15, Docker 20.10.18
WebODM: Version 1.9.15 with ODM 2.9.1

When a lower core count is mentioned, it means some cores are deactivated in the UEFI.

Different amounts of RAM are achieved by mixing different modules (16GB=2x8GB, 32GB=4x8GB, 64GB=2x32GB, 80GB=2x32GB+2x8GB).

If somebody cares, I am sharing the tests and the script I made here:

http://crearconamor.com/webodm-graph/

This is a friend’s webserver that I can use for now. You can select different datasets, zoom, scroll and get important info about them. It is best to tick the checkbox next to “show load”; the other setting did not show the CPU usage during point cloud densification, since that was a process with high niceness (a Linux term) :wink:

Please be aware that the whole script is quick and dirty, and not very resource efficient in the way it loops through the console file and usage CSVs.

The original console file and usage CSVs can also be downloaded on the datasets webpage.

To mention it: the sampling interval of the usage CSV is mostly 3-5 samples per minute. I tried higher resolutions, but they almost stall even powerful computers. Reducing to 3-5 samples per minute does not change the min/max numbers but allows more fluid use of the PHP script.

If anything is off, please let me know. I am happy to edit / add it.
If you have an interesting scenario / comparison that I missed, let me know.
Maybe I can run it and add it to the datasets. For now the system is set up and runs smoothly.

!!! Disclaimer !!!

This document serves the sole purpose of informing ONE individual (me) about the hardware requirements of WebODM, with the intention of guiding future upgrades so that they effectively affect the process. Since I like keeping nice documents and want to share something with the community, I decided to publish this document.

I am no developer, just an enthusiastic WebODM user.
The tests done are far from exhaustive. It all started with the wish to see whether WebODM was still running or had gotten stuck, and expanded into a little database enabling me to compare runs.

You will decide what works for you or not!
I am computing 10-12 projects per year with 250-3000 images.

If you find something that is grossly off, please let me know.
I am happy to correct the document :slight_smile:

17 Likes

That’s really great! I will read it in depth a little later. My machine has 64GB physical / 64GB swap on NVMe, and I constantly run out of RAM for ultra-quality features even with >500 images. Having said that, it’s a Ryzen CPU with 12 real cores (24 virtualized). I’m now thinking turning down parallelisation will help a lot with little cost to the overall time to run the jobs :thinking:.

Thanks for writing this up, it will really help (I think) me to use my machinery more effectively.

5 Likes

Excellent document @shiva. Thank you

4 Likes

@shiva Excellent explanation, thank you!

I can fill in a couple of gaps in your data based on my testing:
20MPx images processed with feature quality High consumes 2.1GB of Video RAM
20MPx images processed with feature quality Ultra consumes 7.3GB of Video RAM

5 Likes

@Johnny5 Thank you for the info, I will integrate it when I update the document with a few more datapoints I collected.

Funny, somewhere in the forum somebody mentioned that his 12 GB of VRAM was not enough for 20 MP.
But 7.3 GByte actually sounds reasonable.

I am anyway curious when / if at some point the shared memory (combined VRAM + RAM) will be available for CUDA processing. That would ease the memory requirements on the graphics card a lot.

3 Likes

I think that was Andreas = APOS80.

1 Like

Yes, that could be. I think he mentioned that he has a NVIDIA Geforce 3060 with 12Gbyte.
Curious why? Maybe @apos80 likes to chime in?
I think he also tried higher resolutions from a DSLR.

I was for quite a while trying to figure out if I should also get myself a card with more VRAM, but even if 12 GByte were enough for feature-extraction on ultra quality, the heavy lifting of the GPU happens during the point cloud densification and, as I now saw, also during the point cloud geometric stage.
I do not have exact numbers, mainly because there are no timestamps between the individual point cloud densification phases. But looking at the graphs, the GPU does some extra work when I activate --pc-geometric. It is of course very handy that the GPU can do this part of the process, since ODM 3.0 has --pc-geometric activated by default.
But it also underlines my (kind of) conclusion from this topic: a GPU is worthwhile, but even one from the NVIDIA GeForce 10xx or 16xx series will provide the major bang for the buck.
Yet I always liked ATI and now Radeon cards, also because they offer much better cost/performance ratios. It’s a real pity they can’t handle CUDA and that OpenCL (with ROCm etc.) is not nearly as established / widespread.

Anyway, I have a few more numbers that I hope to soon add to the topic above.
So I am happy that this topic stays alive and I will be able to edit my post :smirk:

4 Likes

I did not find much time to finish my benchmarking “studies” but here the latest additions. Since I can not edit my first post, I am putting it as an addendum:

Overall gains using GPU

| Project | Feature / PC quality | CPU only (min) | with GPU (min) | % faster |
|---|---|---|---|---|
| 242 images | ultra / ultra | 124 | 62 | 100% |
| 473 images | high / medium | 109 | 81 | 35% |
| 473 images | ultra / ultra | 496 | 316 | 57% |
| 1002 images | high / medium | 150 | 117 | 28% |
| 1413 images | high / high | 543 | 467 | 16% |
| 2819 images | ultra / medium | 870 | 828 | 5% |

The bigger the dataset, the smaller the time savings that can be expected from GPU support. Still, the overall time savings make using a GPU well worth it.

Like in the first post here, the images are all 12 megapixel and it is the same hardware and software as described above.

pc-geometric / Geometric Consistent Estimated Depth Maps

The pc-geometric option helps create denser and (as the name says) geometrically more consistent point clouds / depth maps. This setting increases overall processing time, but is usually absolutely worth it. That is also why it is activated by default from ODM 3.x onwards.
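For reference, on ODM 2.9.x the option can be passed explicitly when running ODM directly via Docker; in WebODM it is simply a task option. The dataset path and project name below are placeholders:

```bash
# Example ODM run with geometric consistency enabled
# (dataset path and project name are placeholders)
docker run -ti --rm -v /my/datasets:/datasets opendronemap/odm \
    --project-path /datasets my_project --pc-geometric
```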

Luckily the GPU can take care of this process, leading to the following advantages over CPU for this part of the process:

| Project | CPU only | with GPU | x faster |
|---|---|---|---|
| 223 images (20MP) | 29:46 min | 4:24 min | 6,5x |
| 242 images | 10:26 min | 2:01 min | 5,2x |
| 1413 images | 79:28 min | 49:46 min | 1,6x |

It becomes apparent that the GPU overall processes the data faster, yet how much faster seems to depend on various factors like amount of images and point cloud quality.

GPU Memory and feature-extraction

Since the subject comes up regularly in the forum, I would like to reiterate here:
Even if WebODM says that the images do not fit into the VRAM, this only concerns the feature-extraction process. Feature-extraction on the GPU is faster, but it seems to find fewer features and the overall time savings are marginal.
If your GPU does not have enough VRAM, do not mind that too much. The actual benefit of using a GPU is during point cloud creation, and that process does not have any VRAM limitations.

Bottom line:
If your GPU does not have enough VRAM to do feature-extraction at your image size, that is no reason to upgrade your GPU or fall back to CPU processing entirely.
The GPU will still be used during point cloud creation, and that is where it offers real strength over a CPU.
This means that (in my testing) the amount of VRAM is not a significant factor in the speed or quality of a process. I want to state this because graphics cards with more than 6 GByte of memory are pretty expensive.
Even cards of older generations like the GeForce 10xx series with 4 GByte of VRAM or less should still provide a benefit when used for processing.

Comparison ODM 2.9.x to 3.0.x

There seems to be no difference in processing time when using version 3.0.x over version 2.9.x.
This holds true when using the same settings, which for version 2.9.x means making sure to activate the pc-geometric option.

While there is no advantage in processing speed, the actual quality of the result is visibly better (even with pc-geometric enabled for both datasets).

4 Likes

For “with GPU”, was the GPU used for both feature extraction and point cloud densification in all tests, including the ultra/ultra ones? With my 4GB of VRAM, when I use ultra feature extraction (M2P images) it reverts to CPU, which as you explain above is not really a disadvantage. The real gains are with point cloud densification.

However, there is an occasional problem that prevents me from using the GPU for point cloud densification. It appears when the race-condition error occurs with GPU feature extraction; the only way around it is to use “no_gpu”, but this also stops GPU use later in the task, where the big time savings are to be had.

One solution might be to have “no_GPU” apply only to feature extraction.

1 Like

Yes, the GPU was used for both feature-extraction and pc-densification also on ultra/ultra settings, EXCEPT for the 20MP images; with those the available VRAM was not enough.
For the benchmarking I only used one dataset with 20MP, and whenever I quote it, it is marked with “20MP”.

I have read here in the forum about that, but curiously that never occurred for me.

True. That would perhaps also help solve some of the confusion on the forum, where people mention that their GPU is not used, when in fact it is only not used for feature-extraction.
As a “dirty” workaround:
When looking closer into VRAM use, I used some high-megapixel DSLR (actually mirrorless) camera images for testing, then downsampled them step by step by 10% to see when they would fit into the 6 GByte of VRAM that I have available.
Already at an image resolution of 4100x3075 the images no longer fit into VRAM. Since you have 20MP available, just use a feature-extraction setting that does not fit the VRAM :stuck_out_tongue_closed_eyes:
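For anyone wanting to try the same, such a step-wise downsampling can be done with ImageMagick, for example. The resize percentage and file pattern are placeholders, and mogrify overwrites files in place, so work on copies:

```bash
# Downsample copies of the images by 10% per step with ImageMagick
mkdir -p resized && cp *.JPG resized/ && cd resized
mogrify -resize 90% *.JPG
```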

Which brings up an interesting point:
How much more time does feature-extraction need on ultra compared to high?

So I just went through the benchmarks that I made, and it looks like not much, especially looking at the overall process time (as I so often point out). Depending on project size, the difference between feature-extraction high and ultra falls in a range of 5-15 min, with overall project times ranging from 150-1500 min.
That is not a lot of difference overall.
When the CPU does it, though, the memory usage roughly doubles (from 20 GB to 40 GB), but feature-extraction never consumes nearly as much as depth-map fusion (240 GB and more).
That makes me state: I will use feature-extraction on ultra as standard from now on.
The increase in quality and robustness of the reconstructions with feature-extraction on ultra is certainly worth the handful of extra minutes of processing time.

3 Likes

12 GB isn’t enough for matching, that’s what I said.
For point cloud densification it’s no problem; there’s room for larger images.

2 Likes

That does work for small datasets, but when I want to process 5000 images, you can appreciate that I might want to use high rather than ultra feature extraction :wink:

3 Likes

Many thanks Shiva, for this wealth of useful information

Laurent

4 Likes

Hello,

I made several comparisons with 167 photos, with different amounts of memory and different swap.

I downloaded all the resources in “txt” files.

I don’t see the processing data, i.e. the amount of allocated memory, CPUs, and swap in detail.

Where can I find this information, please? This is so I can share my Mac info.

thank you

Laurent

2 Likes

Hello Laurent,
the hardware information is not in the console.txt
Since I am running WebODM on a linux machine I am using a bash script that collects every few seconds the stats of the machine (memory usage, CPU load, GPU load etc.)
If you want to know how your hardware is used, you will need to use an external tool to collect that data. I see @Gordon use some monitoring software on his Windows machine, which even puts out nice graphs. But I am more familiar with Linux and the terminal, yet I am sure you will find some software for Mac that can output that data for you.

2 Likes

Hello Shiva,

Thank you for your answer, I will find it, no worries…

Have a good day

Laurent

3 Likes

Thanks for providing your stats; this is very interesting!

3 Likes

Excellent work testing, thanks for all the information and insights, this will help me upgrade/tune my computers for faster results.

2 Likes

One thing I think would be very beneficial to the users, as well as the developers, is to have some sort of standard benchmarking system that could go in a database.

We already have part of it, and that is sample image/data sets: Datasets - OpenDroneMap

What I think would be good to ADD to that is to have a program/script for each OS that will run various datasets from above and benchmark the processing, along with collecting the data on hardware that was used and various settings used. Kind of like a CPU-Z output which tells OS/version, cpu model/speed, memory size/speed, drive size/type used, graphics card, etc.

This is a bit beyond what I’d be able to do, but just throwing the idea out there. It would give us a wealth of information of what will/won’t work for basically all scenarios/platforms/hardware and approx how long it would take us to do various sized projects and at different settings (high quality vs ultra, different PC max points, resizing images, etc). It would help people determine what hardware they should buy instead of potentially waste money, and this kind of computing power can add up. I also believe there are a lot of people, myself included, who would run these benchmarks to collect the data needed.

3 Likes

We have OATS for automated testing:

1 Like