NVIDIA CUDA graphics card recommendation

Good evening!
For some months now I have been making fantastic maps with WebODM and am super happy with this software.
I used other software before, but being able to process as many images as I like on my own hardware is fantastic.

After reading repeatedly about it in the forum, my question is:
what graphics card would you recommend for CUDA computing?
At the moment I am looking at an NVIDIA GTX 1650 with 4 GB VRAM or a GTX 1660 with 6 GB VRAM.

The graphics card would be installed in this machine:
CPU: Ryzen 7 5700G
RAM: 32 GB DDR4
Storage: 256 GB NVMe SSD plus a 2 TB HDD
OS: Ubuntu 20.04 LTS (kernel 5.15)
WebODM is running in a Docker container.

The drone is a DJI Phantom 3 Pro, so the image resolution is 4000×3000 pixels.
That may change in the future.

A short bit of background: the computations take place off-grid. I live in Uruguay, South America, a good bit away from civilization.
The aerial maps are used for permaculture earthworks and general land management.
That means energy consumption is a concern, and computing for several days in a row is rarely, if ever, possible.
Some of the bigger maps I have made, with 2500 images, take 70-90 hours or more, depending on the settings and on whether I put the swap file on the NVMe or the HDD, since 32 GB of RAM isn't that much.

Will a GeForce GTX 1650 with 4 GB VRAM shorten the overall processing time?
Would 6 GB of VRAM give me any advantage?

Comparing performance per watt and overall power consumption, I would clearly prefer the GTX 1650. The computer is not connected to a screen, so the graphics card would have no other use than CUDA computing. The other application I plan to use is Meshroom, but as far as my research goes, it should be fine with either 4 or 6 GB of VRAM.

In this post:

it is mentioned that the image size determines the VRAM needed.
In that post, one person (MarkoM) reports occasional failures with 4000 px images and 8 GB of VRAM, while somebody else (Gordon) says he successfully processes 4032 px images with 4 GB of VRAM.

In other posts, people with 6 and 8 GB of VRAM report failures.
It seems that CUDA usage is generally not trouble-free, though I hope there is a silent majority for whom it just works. In particular, "out of memory" messages seem to be frequent.
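
In case it helps anyone reading along: to check whether VRAM really is the limit during a run, something like this little sketch can poll GPU memory while a task is processing (assuming the pynvml package, i.e. nvidia-ml-py, and an NVIDIA driver are installed):

```python
# Small sketch: poll GPU memory use while an ODM task runs, to see whether
# VRAM is the bottleneck. Assumes the pynvml package (nvidia-ml-py) is installed.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first (only) GPU

try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"VRAM used: {mem.used / 2**20:.0f} / {mem.total / 2**20:.0f} MiB")
        time.sleep(10)
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()
```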

If it helps, I would also be open to a GeForce RTX 2060 with 8 GB of VRAM, but that is a whole lot more electricity for 2 GB more of memory, and the CUDA performance difference between a 1660 Super and a 2060 is not decisive.

That is why I thought I would put up a new post to gather some thoughts from other users, ideally also from people who successfully use CUDA to shorten their processing time.
Looking forward to hearing from the community!

WebODM rocks!


Welcome to the Community, Shiva :slight_smile:

You will probably need 6 GB for full-size P3P images and the highest quality outputs.

With an NVIDIA GeForce GTX 1650 SUPER, if I shrink my M2P images by 0.67× to 3666 pixels wide I can process with ultra-quality feature extraction, but with full-size images I can only use high quality for GPU feature extraction.
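
If anyone wants to do that kind of shrinking themselves before uploading, here is a rough sketch of a batch resize (assuming the Pillow library; the folder names and the 0.67 factor are only placeholders, and the EXIF block is copied so the GPS tags survive):

```python
# Rough sketch: shrink a folder of JPEGs by a fixed factor before processing,
# keeping the EXIF block so the GPS/camera tags survive.
# Assumes Pillow is installed; folder names and the 0.67 factor are only examples.
from pathlib import Path
from PIL import Image

SRC = Path("images_full")      # hypothetical input folder
DST = Path("images_resized")   # hypothetical output folder
FACTOR = 0.67

DST.mkdir(exist_ok=True)
for path in sorted(SRC.glob("*.JPG")):
    with Image.open(path) as img:
        new_size = (round(img.width * FACTOR), round(img.height * FACTOR))
        resized = img.resize(new_size, Image.LANCZOS)
        exif = img.info.get("exif")  # raw EXIF bytes, if present
        if exif:
            resized.save(DST / path.name, quality=95, exif=exif)
        else:
            resized.save(DST / path.name, quality=95)
```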

You will need more than 32 GB of RAM for large datasets of thousands of images, otherwise there will be a lot of virtual memory writing and reading on your SSD, which slows progress a lot. I'm currently processing 1435 images at ultra/ultra and expect it to take around 100 hours to finish, with 96 GB of RAM and 262 GB of virtual memory, although the paging file doesn't get much use with this many images.

I'm also off-grid, with PV panels, a 20 kWh battery, and a backup generator for cloudy weather.

This graph shows CPU power (i7-10700K, 3.8 GHz) over the past 12 hours. The two large sustained high-power sections are during image matching, and the smaller one an hour ago is undistorting images.
Of course the fans also consume power, and so does the GPU (the lower trace on the graph), although only a few watts at the moment. It will increase significantly in the meshing stages.

For a 2500-image set, using GPU feature extraction won't reduce processing time by a vast amount. If a task currently takes 4 days with the CPU, it would still take over 3 days with GPU feature extraction.
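
To put rough numbers on that: GPU acceleration only touches the feature-extraction stage, so the overall saving is capped by that stage's share of the total. The share and speed-up in this sketch are only assumptions for the sake of the arithmetic, not measurements:

```python
# Back-of-the-envelope estimate: the GPU only accelerates the feature-extraction
# stage, so the overall saving is capped by that stage's share of total time.
# The 22% share and 5x stage speed-up are illustrative assumptions, not measurements.
total_hours = 4 * 24        # a 4-day CPU-only task
feature_share = 0.22        # assumed fraction of time spent in feature extraction
stage_speedup = 5.0         # assumed GPU speed-up of that one stage

gpu_hours = total_hours * (1 - feature_share) + total_hours * feature_share / stage_speedup
print(f"~{gpu_hours:.0f} h, i.e. about {gpu_hours / 24:.1f} days")  # ~79 h, still over 3 days
```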

Also, there are some issues with a 'race condition' when using GPU feature extraction, where some files (usually .npz files) are not written before they are called in a slightly later stage of processing, which can lead to failures. This only happens some of the time, and I have yet to figure out under which exact conditions it occurs.


Hello Gordon,

thank you very much for your very informative reply. It is pretty much exactly what I was hoping for :slight_smile:

It seems the more VRAM available the better, especially considering that I will probably upgrade to a newer drone with higher resolution at some point. The raw CUDA performance of the card, though, does not seem to influence overall processing time much?

That subjectively does not look like a whole lot of difference to me.

At least not while increasing the power consumption of the whole system, potentially creating new trouble getting CUDA to work properly, and also spending money on it.

Another big factor seems to be the working memory available to the CPU, maybe even more so than a CUDA-capable graphics card. At the moment, with "only" 32 GB of RAM, the paging file sees extensive use, and just relocating it to the NVMe already speeds up lengthy processes by a few hours in my case.

So after reading your reply and reconsidering my situation, I would almost rather just upgrade to 64 GB of RAM (I think that is the maximum for the Ryzen 7 5700G) and replace the NVMe with a bigger model to relocate and expand the paging file (swap file).

Since making aerial maps is only one of the many jobs I have here at the ecovillage, it seems most trouble-free and worthwhile to stay with CPU computing and instead make that work as smoothly as possible, by increasing the RAM and putting the paging file (I always want to call it a swap file) on an NVMe.

Though I did see that the NVMe writes a lot of gigabytes when the paging file is placed on it. But I can have a 128 or 256 GB swap file without problem, and NVMe drives are getting really cheap these days.
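
To see how hard the swap actually gets hit during a long run, a tiny sketch like this can log RAM and swap pressure (assuming the psutil package is installed; the interval is just an example). The swapped-in/out counters give an idea of how many gigabytes end up going through the NVMe:

```python
# Tiny sketch: log RAM and swap pressure every minute during a long ODM run.
# Assumes the psutil package is installed; the interval is only an example.
import time
import psutil

while True:
    ram = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print(f"RAM {ram.percent:.0f}%  |  "
          f"swap used {swap.used / 2**30:.1f} GiB  "
          f"(swapped in {swap.sin / 2**30:.1f} GiB, out {swap.sout / 2**30:.1f} GiB since boot)")
    time.sleep(60)
```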

OK, to answer my own question of which NVIDIA card would shorten the processing time most in my case: probably none.

I had hoped to cut processing times at least in half; that would have been worth some trouble and money. But as things stand, and with access to a decently fast 8-core (16-thread) CPU, it does not look worth the effort to me.

Though I may get an NVIDIA GeForce GTX 1650 anyway, to be able to do some Meshroom trials on a second PC. If I am bored, I may also try computing an ODM project on that secondary desktop with the CUDA card, just to see how it goes :slight_smile:

Thank you, Gordon, for your input and for sharing your real-life experience!


OK, testing finished!
Images were resized to allow GPU use for feature extraction.

Not using GPU (CPU only)
165 images, 03:17:58
Created on: 08/08/2022, 12:10:55
Processing Node: node-odm-1 (auto)
Options: auto-boundary: true, dem-resolution: 2, feature-quality: ultra, gps-accuracy: 6, mesh-octree-depth: 12, mesh-size: 300000, no-gpu: true, pc-filter: 5, pc-quality: high, resize-to: -1, use-3dmesh: true
Average GSD: 4.73 cm
Area: 238,162.51 m²
Reconstructed Points: 16,172,403

Using GPU for feature extraction
165 images, 02:58:02
Created on: 08/08/2022, 08:56:40
Processing Node: node-odm-1 (auto)
Options: auto-boundary: true, dem-resolution: 2, feature-quality: ultra, gps-accuracy: 6, mesh-octree-depth: 12, mesh-size: 300000, orthophoto-resolution: 1, pc-filter: 5, pc-quality: high, resize-to: -1, use-3dmesh: true
Average GSD: 4.38 cm
Area: 229,008.81 m²
Reconstructed Points: 13,659,424

So as you can see, there is not a huge saving in time, and the results are similar.
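
As an aside, runs like these can also be scripted against the processing node rather than clicked through WebODM. Here is a minimal sketch with the pyodm client, assuming a NodeODM node is reachable on localhost:3000 (adjust to your setup) and with the image list shortened to placeholders:

```python
# Minimal sketch: submit the same options to a NodeODM node with the pyodm client.
# Assumes pyodm is installed and a node is reachable at localhost:3000 (adjust to
# your setup); the image list here is shortened to placeholders.
from pyodm import Node

node = Node("localhost", 3000)
task = node.create_task(
    ["DJI_0001.JPG", "DJI_0002.JPG"],   # ... the full list of images
    {
        "auto-boundary": True,
        "dem-resolution": 2,
        "feature-quality": "ultra",
        "gps-accuracy": 6,
        "mesh-octree-depth": 12,
        "mesh-size": 300000,
        "pc-filter": 5,
        "pc-quality": "high",
        "use-3dmesh": True,
    },
)
task.wait_for_completion()
task.download_assets("./results")
```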


Hello Gordon,
yes, that drives the point home.

I am already looking for a good second-hand offer on 64 GB of RAM. A 1 TB NVMe is already on the way.

A question comes up when reading your responses:
do you always use ultra for feature-quality?
I have seen you mention it several times now.

It makes me wonder whether I should run my own processes on ultra more often. I did that a couple of times in the beginning but could not make out significant differences, except that it took waaaayy longer to compute.
I saw more noticeable changes just from running the exact same process twice: holes here and there on one run and not on the other, with exactly the same settings.

And thank you again for saving me a lot of time and a big disappointment. I was hoping for 2-3 times faster processing with CUDA, similar to what I see using QSV (Intel Quick Sync) on video material compared to x264/x265 on the CPU, though the video also bloats by a factor of 2 or 3 in size for comparable quality.

Hope you are having a good time!
Shiva


Hi Shiva, for moderately large datasets I sometimes run high/high to start with to see how it looks, then follow up with ultra/ultra, but the task duration really does blow right out sometimes. For a couple of hundred images I'll generally go ultra/ultra straight away if I'm confident the dataset is good.
For example, I've just arrived home after driving along our gravel road (12 km to the bitumen), which is in terrible condition, and stopped along the way to manually fly a short section of very potholed road. It's 219 images with heaps of overlap, the road covered in 4 passes, 2 flying sideways and 2 back and forth along the road, all at around -60 to -70°, so I'm expecting a nice DEM and 3D model, and I'm running ultra/ultra straight away for this task.
The difference between high and ultra for large-area orthophotos will only be noticeable when you zoom in enough to resolve the GSD; otherwise, when viewing the whole area on a computer monitor, they can look almost identical. It really depends on how much detail you need.

