After some time using WebODM and researching many subjects in the forum, there are many questions about which hardware to use and how to get the best results in a reasonable time.
So I kept statistics for the projects I rendered. The rendering machine also was upgraded in between. That gave me some understanding which component is affecting the rendering process in which way.
Since there are often questions in the forum about which hardware is needed or best, I decided to put my findings online. This is a document I keep for my own reference to quantify if an upgrade is worth the effort / money or rather not. Also to stay down to earth when the next upgrade-fever takes hold of me
So let’s see if more cores, more RAM, better graphics is the way to go.
Monitoring WebODM
For monitoring WebODM a little bash script is used that collects every 5-20 seconds data about CPU, GPU usage and how much VRAM or swap is used.
The machine used for rendering is a headless Ubuntu 20.04.5 LTS installation. There is no screen attached and no other processes were running in the background.
All made observations are repeatable and match the times given in the console.txt file of the process. These stages repeat with any given project. For testing purposes the standard profiles were used, with some exceptions that needed a bit of tweaking to run successfully.
After a process is complete the monitoring file and the console txt are put in a folder of a webserver and displayed with PHP and Javascript.
The light green bar in the beginning is the opensfm stage, always starting with the feature extraction. In this image the feature extraction is the first red curve.
The cloud densification happens during the openmvs stage marked in orange. Again always starting with a rise in GPU usage (the second, longer red curve).
Towards the end happen the odm stages. That is no official title, but that is where a lot of PDAL and GDAL scripts are running.
The light blue bars in the background show the RAM utilization with swap in purple and the GPU memory (VRAM) in orange. But swap and VRAM are hardly used in the sample.
This is an image of the same process, but without GPU. The red circles mark the feature extraction and the second one the point cloud densification. The major advantage of a GPU during the point cloud densification becomes very obvious (2,5 times faster than CPU).
If not mentioned the images are 4000x3000 pixel (12MP) and were taken with a DJI Phantom 3 Pro . The dataset with the 20MP was flown with DJI Phantom 4 Pro as far as I know.
A useful way of structuring this document is probably by hardware component.
Following a list of all hardware component and how they affect WebODM.
Hardware
CPU / cores
Generally the recommendation is, that more cores are faster but more cores would also increase memory usage.
The comparison here includes the amount of time a 4 or 8 core CPU needed and also the maximum memory (physical and virtual combined) that was used during the process.
A core in this comparison means a physical core, not threads. Each core here computes 2 threads, which means concurrency / threads is 8 for 4 cores and 16 for 8 cores.
feature/pc quality | 4 cores / memory | 8 cores / memory | % diff | Note | |
---|---|---|---|---|---|
242 images | ultra / ultra | 66min / 38.9 Gb | 62min / 41.9 Gb | +6% | with GPU |
473 images | high / medium | 135min / 24.3 Gb | 109min / 27.7 Gb | +19% | CPU only |
473 images | high / medium | 95min / 22.3 Gb | 81min / 25.9 Gb | +14% | with GPU |
1413 images | high / high | 531min / 97.9 GB | 467min / 101 Gb | +12% | with GPU |
By doubling the core count from 4 to 8 cores you can save between 6 and 19% of time. Depending on the project size (amount images) and the settings (feature and pointcloud quality).
Interestingly the ultra settings showed the least difference between the core count.
The amount of memory used during a process increases between 3 and 16% when doubling the cores. There seems to be no direct co-relation between core count and memory usage. Meaning that memory use increases with more cores, but not proportionally.
I can not say what 12, 16 or more cores have for an effect, but from this little experiment it looks like doubeling the cores gives only 6-19% more speed.
Memory
First an overview of how much memory a given process uses. Here again memory means the combined use of physical and virtual memory.
feature/pc quality | max memory used | |
---|---|---|
242 images | ultra / ultra | 38 Gbyte |
473 images | high / medium | 24 Gbyte |
473 images | ultra / ultra | 125 Gbyte |
1002 images | high / medium | 42 Gbyte |
1413 images | high / high | 98 Gbyte |
2819 images | ultra / medium | 152 Gbyte |
It’s apparent that the more images, the more memory will be used.
For the 473 images a sample with feature- and pointcloud quality ultra was added, which shows the substantially higher demand of changing these settings.
A second question off course is: how much physical memory is needed?
Let’s have a look at the same processes run with different amounts of physical memory. The virtual memory (also called pagefile or swap) remained the same.
80Gbyte | 64Gbyte | 16Gbyte | % diff | Total mem used | |
---|---|---|---|---|---|
473 images (ultra/ultra, gpu) | 316 minutes | 419 minutes | +24% | 113 Gbyte | |
1002 images (high/medium) | - | 150 min | 162 min | +8% | 42-47 Gbyte |
Having more physical memory can decrease processing time significantly, but that depends very much on the overall used memory for that process. The common rule of sizing virtual memory to be 2x the size of physical memory seems reasonable.
Having 5 times the memory shortened the process by 24%, 4 times more memory shortened it by 8%.
Another very interesting aspect is the type of storage on which to place the swap- / pagefile.
Physic. memory | Swap storage device | Swap in use | Time needed | % |
---|---|---|---|---|
80 Gbyte DDR4 | NVMe M.2 SSD | 34 Gbyte | 316 minutes | 100% |
16 Gbyte DDR4 | NVMe M.2 SSD | 97 Gbyte | 491 minutes | 156% |
16 Gbyte DDR4 | SATA M.2 SSD | 98 Gbyte | 738 minutes | 234% |
16 Gbyte DDR4 | 3,5” SATA HDD | canceled | >1000% |
(473 pics “land-after-rain” series)
Looking at these numbers shows that modern NVMe drives offer superior performance to any other (common) storage devices. All while keeping the computer responsive even during intense use, since NVMe are optimized for multiple accesses at the same time.
SATA SSDs are still usable but evidently HDDs should not be used for swapfile usage. The system becomes non responsive during processing and after 21 hours into the process and still doing the matching, this test was canceled.
Given the price point of modern NVMe PCIe 4.0 disks, it is highly recommended to use such a device for holding your swap / pagefile. Even if your PC offers only PCIe 3.0 or earlier, the later generations of NVMe drives offer superior controllers, easily handling multiple accesses and moving more data.
Installing more memory has strong effects on the processing time, as much as placing the swap- / pagefile on a modern NVMe drive.
Graphics / CUDA supported rendering
First of all, when speaking about GPU support for WebODM, at this moment (Nov 2022) this only concerns graphics cards that support CUDA. Means NVIDIA Geforce and Quadro cards.
To my knowledge there are now other producers supporting CUDA.
Another point to consider is, that only feature extraction (opensfm stage) and point cloud densification (openmvs stage) are using the GPU.
Feature Extraction
For feature extraction using GPU support you can expect the following improvements.
feature quality | CPU only | with GPU | x faster | |
---|---|---|---|---|
242 images | ultra | 6:10min | 2:28min | 2,5x |
5% of total | 4% of total | -24% features | ||
473 images | high | 2:46min | 0:43min | 3,6x |
2,5% of total | 1% of total | +1,7% features | ||
1002 images | high | 5:28min | 2:56min | 1,9x |
3,7% | 2,5% | +1,2% features | ||
1413 images | high | 16:54min | 16:24min | 1x |
3,1% of total | 3,5% | -3,2% features | ||
2819 images | ultra | 59:36min | 19:52min | 2,9x |
6,8% | 2,5% | -0,5% features |
of t otal in the table above means “percent of Total Process Time”. To put the overall efficiency in perspective.
It becomes obvious that using the GPU for feature extraction is definitely faster, in most cases and in some cases even by a factor of nearly 3 times. But there are two caveats to this:
- feature extraction on the GPU consumes a lot of memory. The size of the image is determining the used memory here and therefore the feature-quality setting will decide if your GPU is capable of doing the feature extraction or not.
- feature extraction only contributes 1-7% to the overall process time. So using your GPU for feature extraction in the best case will save you 7% of processing time. That is not a whole lot.
The quality of the feature extraction does not seem to make a real difference between CPU or GPU. Visual inspection of the resulting Orthomosaics does not show a quantifiable difference and the number of extracted features is within margin of error. Running the exact same process twice can yield higher variance between extracted features, than is the difference between CPU and GPU feature extraction.
Outliers (like the -24% mentioned above) confirm the rule
About memory usage of GPU during feature extraction (fc):
Image Size | VRAM @ fc high | VRAM @ fc ultra |
---|---|---|
4000x3000, 12MP | 1,5 Gbyte VRAM | 4,7 Gbyte VRAM |
5472x3648, 20MP | 2 Gbyte VRAM | >6 Gbyte VRAM |
A 12 Megapixel (MP) image on setting ultra consumes around 4,7 Gbyte of Video Memory (VRAM). The same image on feature-quality high will only consume 1,5 Gbyte of VRAM.
A 20MP image on ultra was too big for the graphics card used during testing, but on high setting consumed 2 Gbyte of VRAM
That means you will need more VRAM the bigger your images and processing demands become.
Bottom line of using the GPU for feature extraction is saving up to 3% of total process time.
Point cloud densification
Compared to feature extraction the point cloud densification offers huge gains when using a GPU:
PC quality | CPU only | with GPU | x faster | |
---|---|---|---|---|
223 images (20MP) | ultra | 129:30min | 14:51min | 9x |
33% of total | 7% of total | -46% total | ||
473 images | ultra | 127:21min | 10:51min | 12x |
26%of total | 3,5% of total | -36% total | ||
1002 images | medium | 15:03min | 1:32min | 10x |
10% of total | 1,5% of total | -52% total | ||
1413 images | high | 80:15min | 7:30min | 11x |
15% of total | 2% of total | -14% total | ||
2819 images | medium | 19:59min | 3:26min | 6x |
2,3% of total | 1% of total | -4% total |
This part of the WebODM pipeline benefits greatly from GPU usage by being 10 times faster. Especially when increasing point cloud quality the overall process time reduces by as much as 50%.
There is almost no VRAM used during this part. Any relatively recent card with >1Gbyte of VRAM can render this part.
The possible time savings for feature extraction are rather slim, but with point cloud densification the overall process is massively affected. In some settings even halfing the time a process takes.
The gained speed with a GPU even sometimes exceeds the 10x speed up promised in this announcement:
It also becomes obvious that there can not be much further acceleration from a more powerful graphics card. The point cloud densification is reduced from 33% of total process time, to a mere 7% of the time the whole process takes. That’s a massive improvement and a more powerful card could only shorten the whole process by a mere 7%.
GPU memory usage
The only process using the VRAM (Video memory) is the feature-extraction as mentioned above. Overall gains are small here, so VRAM does not majorly influence the length of a process.
When using a GPU for point cloud densification, which yields very feelable time savings compared to feature extraction, the VRAM is not used at all. So it would not make any difference if you have 24 Gbyte or just 4 Gbyte of VRAM available.
Indicating that if you want to shorten your processing times, a CUDA card with 4 Gbyte VRAM will shorten your process nearly as much, as having a card with 24 Gbyte.
Especially adding costs into the equation you would pay 200% or more for more VRAM only to save at the most 7% processing time.
Put another way: 150 USD for something like a Geforce GTX 1660 6Gb VRAM will shorten your process by up to 50% and 300 USD and more starting with a Geforce RTX 3060 12Gb will shorten it by up to 54%. That is 4% effect for 200% money.
The cheaper card will give you most bang for your buck
Storage (project files, Docker container etc.)
There seems to be no significant impact on processing speed between SSD or HDD as storage media.
Expect for projects up to 1000 images that 100 Gbyte will be occupied during processing, with bigger projects (2000-3000 images) can easily reach up to 250-300 Gbyte.
If your storage gets tight use “optimize storage” function in WebODM.
The only exception is the system swap- / pagefile that should ideally be on a modern NVMe drive as mentioned under memory.
Recommendations for starters
If you are starting out withWebODM a computer with 32Gbyte of RAM, a decently recent CPU (>4 cores) and a 1Tb NVMe drive (or bigger) should enable you to render all home, semi professional projects. Even the occassional bigger job (2000+ images) can be tackled, albeit that it will take a bit longer (>1 day).
A GPU is not necessary, but an existing NVIDIA card can be put to good use.
Recommendations for ambitioned users / pro’s
When performing many computations with 500-3000 images, 64 Gigabyte memory and an 8 core CPU is the start. Providing already reasonable processing times for projects up to 3000 images.
Make sure to use a latest generation NVMe for the swap- / pagefile.
Since many projects will be computed, the project folders / docker files can be stored on a >3TB HDD. Offering the possibility of keeping a large number of projects at hand.
A GPU will be helpful, especially to speed up projects that need ultra quality for their point clouds. As of writing this article (Nov 2022) and for exclusive use with WebODM a Geforce GTX 1660 SUPER with 6 Gbyte of VRAM offers most bang for the buck at around 150 USD.
It will give the biggest increases in speed at a reasonable cost and since the Geforce GTX 16xx series is quite recent, it will probably be supported for quite some time more.
For a sincere step up in GPU computing capacity a modell with 12Gbyte should be chosen. Less memory is not future proof.
Such a graphics card will increase speed by a single digit percent point, but cost double or more than the previously mentioned 1660. Solely for WebODM a powerful graphics card will not provide feelable time savings. Quality differences are not to be expected.
Yet a powerful GPU (best if used for other things also) can be put to very good use in WebODM.
About the test
For this document more than 40 processes were run and monitored. A handful of processes (~5) were run more then once to see the deviations between individual runs.
A certain “magic” is still in the process, resulting in differently looking orthomosaics, found features, processing times etc every time a project is run.
Yet the differences for identical processes run several times lies at around +/- 5%.
The following components and software was used during testing:
CPU: | Ryzen 7 5700G (up to 8 cores / 16 threads) |
---|---|
RAM: | Up to 4x8Gbyte, 2x32Gbyte DDR4 memory |
Storage: | WD Black 1TB SN770 (M.2 NVMe PCIe 4.0), WD Blue 1TB SSD (M.2 SATA), Hitachi 3TB HDD (3,5” SATA, 7200RPM) |
Motherboard: | Gigabyte B450 |
OS: | Ubuntu 20.4.5 LTS with Kernel 5.15, Docker 20.10.18 |
WebODM: | Version 1.9.15 with ODM 2.9.1 |
When a lower core count is mentioned, it means some cores are deactivated in the UEFI.
Different amounts of RAM are achieved by mixing different modules (16GB=2x8GB, 32GB=4x8GB, 64GB=2x32GB, 80GB=2x32GB+2x8GB).
If somebody cares, I am sharing the tests and the script I made here:
http://crearconamor.com/webodm-graph/
This is a webserver of a friend and I can use it for now. You can select different datasets, zoom, scroll and get important infos about it. Best is to set the hook before “show load”. The other setting did not show the CPU usage during point cloud densification since it was a process with high nice -ness (linux’er term)
Please be aware, that the hole script is quick and dirty, not very ressource efficient the way it loops through the console file and usage CSVs.
The original console file and usage CSVs can also be downloaded on the datasets webpage.
To mention it: the interval of the usage CSV is mostly between 3-5 times per minute. I tried higher resolutions and they almost stall even powerful computers. Reducing to 3-5 samples per minute does not change the min-max numbers but allows for more fluid use of the PHP script.
If anything is off, please let me know. I am happy to edit / add it.
If you have an interesting scenario / comparison that I missed, let me know.
Maybe I can run it and add it to the datasets. For now the system is set up and runs smoothly.
!!! Disclaimer !!!
This document serves the sole purpose to inform ONE individual (me) about the hardware requirements of WebODM. With the intention to guide future upgrades to effectively affect the process. Since I like keeping nice documents and want to share something with the communtiy, I decided to plublish this document.
I am no developer, just an enthusiastic WebODM user.
The tests done are far from exhaustive. It all started with the wish to see if WebODM is still running or got stuck and expanded into a little database, enabling me to compare between them.
You will decide what works for you or not!
I am computing 10-12 projects per year with 250-3000 images.
If you find something that is grossly off, please let me know.
I am happy to correct the document