GPU programming and overview assistance

Remembering what I had to go through to get anything to start compiling last time, I’m hoping for a little push this time.

As soon as I can get the book, and -some- of the previous work sorted and installed onto an external SSD, I will dive back into where I was before.

However, with version 2 having been announced, is that even worthwhile now? Have the overall design structures been modified / changed?

I will certainly need some guidance for V2, even if it duplicates what we did in V1. I don’t trust my memory well enough to strike out blind again, even with my lab notes.

I’ll probably get an nVidia 3070 board to help, and use the 2070 Super as an associated processor chain later.

I almost feel like I’m starting afresh, as I’ve not studied V2 at all yet. Where is a good place to begin???



No one has suggestions?

I got nothing, dude. I’m so far away from understanding this codebase, it isn’t even funny.

I don’t have much to add, but that there’s some work going on actively here:

When the OpenSfM work is done, we will need to update our version which is getting pretty old.


I wonder if @OwKal and team ASDC can have input here, based on expressing interest in the OSfM part ;)?


@adamsteer absolutely willing to help where we can on getting some GPU utilisation! I’m experimenting with getting NVidia docker working on kubernetes at the moment, it’s a few more steps than when running locally to get a working GPU in the container.

So, as discussed, the easiest win seems to be that Mapillary PR, followed by updating OpenSfM in ODM. That update is probably best left to those with an intimate understanding of the OpenSfM calls in ODM, but as part of getting more familiar with the codebase I might have a go at merging a newer OpenSfM and rebuilding. I’ll at least get some interesting errors, I’m sure.

@skypuppy regarding your topic, not sure of the background with what you have done so far, but to continue the discussion: have you already identified any other parts of the pipeline that are potential targets for running on a GPU?
How are you thinking of approaching this, i.e. are you thinking of writing custom GPU kernels for some algorithms, or using existing GPU-based libraries as drop-in replacements where possible?

For comparison with commercial software, Agisoft Metashape uses GPUs with:


With today’s tools, it just makes sense to re-use what is publicly available. When there IS nothing publicly available, then roll up the shirtsleeves, light a fire under the keyboard, and have at it.


  1. There are other tools, already in the public domain, that might be even better than what is currently in the ODM toolchain. OpenSfM is a good tool, but I’m keeping an eye out for what else is available that might work better while reducing MY work effort. :) That does not mean I won’t support what we already use, just that I’ll keep an eye out.

  2. Updating the existing ODM toolchain is nearly always a winner in my little book, as long as it doesn’t break the existing tools. :) I, personally, do not have in-depth experience with any of the existing tools and their versions, but that was not what I volunteered for. My overall goal with my little effort is to convert existing tools so that CUDA or other GPU code can be used by the ODM community.

  3. For my personal development environment, I’m trying to avoid Docker for exactly the reason you mention: GPUs and Docker don’t seem to get along well. I have built a computer just for this GPU effort, with major cash outlays. I only mention that to give an indication of how much I want to move my little part of this project to a successful end.

  4. Yes, I have identified the major subsections that need to be accomplished, at least of the current toolchain.

  5. Re Agisoft etc.: yes, the components of their toolchains differ, in various respects, from ODM. That’s neither good nor bad, just an observation.

  6. The guys who built ODM have obviously invested major efforts and are the major drivers behind its current success and utility. Having steered intense projects in my past, I feel for them. :) Most of my commercial efforts began with 5th generation Database Management system and QNX (almost a precursor to Linux). Photogrammetry is a recent addition to my little efforts, begun a couple of years ago.

  7. Having played with C since the little white book was published, I have some familiarity with it. Less with Python. Interesting that some components of ODM are based in C and others in Python. So there is that gap to fill, too, if needed. For now, the separate toolchain components live well with each other. That said, it does seem that some merging of the two languages will be required in the GPU effort. Other subcomponents, like Vulkan that you mention and its contrast with OpenGL, can present conflicts down the line, so that is another path to beware of.

  8. Don’t know if you read anything of my history, but in short: I had a few months into the study and code understanding when I got hit with a stroke. Just a couple of days before that, I made a dumb operator error and lost all the code snippets and online notes on the development computer. All I have of that now is my lab workbook and whatever is in my memory. None of that appears to be lost, but the stroke kind of “reset” my understanding of the ODM design layout. I’ll have to root around like a hog sniffing truffles to pick up any of my previous trail. :( It ain’t easy.



In looking at some recent nVidia documentation, they have released some libraries that include container support! Don’t know how or when, but someone (not me!) should follow up on that and see if it benefits us.

[side note] I’m beginning to lust after the Jetson Xavier unit with 384 CUDA cores. Yeah, I know it’s only small fry compared even to the 3,000-10,000 in video cards, but dang, 384 of them in only 10-15 watts!! I thoroughly enjoy playing with the Nano and its 128 cores but won’t even begin to consider that unit for ODM. However, a case might be made for the Xavier. For a stand-alone SoC for under $500 it might be handy in some corner cases. Contrast that with the $500-$600 for a full-up video card with 3,000-4,000 CUDA cores, in a desktop you already have, and the dollars begin to get interesting. :) Following that train of thought, 5 Nanos at $99 each gives you 640 cores, but with attendant transfer speed blocks, and it becomes another possible corner case. Still, it’s intriguing.
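To make the dollars-per-core comparison above concrete, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in the post; collapsing the quoted ranges to midpoints (3,500 cores, $550) and treating "under $500" as $500 are simplifying assumptions, and CUDA cores on different architectures are not directly comparable, so this says nothing about real ODM throughput.

```python
# Figures are the rough numbers quoted in the post, not benchmarks.
# Ranges are collapsed to midpoints; "under $500" is treated as $500.
options = {
    "Jetson Xavier": {"cores": 384, "usd": 500},
    "5x Jetson Nano": {"cores": 5 * 128, "usd": 5 * 99},
    "desktop video card": {"cores": 3500, "usd": 550},
}


def cores_per_dollar(cores, usd):
    """CUDA cores bought per dollar spent (ignores clocks, memory, transfers)."""
    return cores / usd


for name, spec in options.items():
    ratio = cores_per_dollar(spec["cores"], spec["usd"])
    print(f"{name}: {ratio:.2f} CUDA cores per dollar")
```

The ratio ignores power draw and the inter-board transfer cost mentioned for the multi-Nano case, which is exactly where the corner cases would be decided.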


Phooey. Can’t find the nvme drive. What am I gon’ do? Gloom, despair, and agony on me.

NVidia-docker has been around for a few years now; the only issues I’ve had in the past were related to using older Grid K2 GPUs that were not supported by the required recent drivers. We’re running via Kubernetes, which makes it a little bit harder but should not be a big deal.

Interested in what you have identified under point 4. I don’t think much of that is going to have changed at a low level since you last looked at it. Having a specific goal (or a series of them) towards getting part of the pipeline running with GPU assistance is going to be the key to making progress.

From the basic profiling I’ve done, it seems there could be quite a bit to be gained from more threaded parallel processing before even getting to the GPU stage, but I’m sure the low-hanging fruit is gone and there is a reason some slow processes are still running serially.
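For anyone wanting to repeat that kind of basic profiling, one simple approach is to compare CPU time with wall-clock time for a stage: a ratio near 1 suggests the stage is effectively serial, while a ratio near the core count suggests it is already well threaded. The sketch below is a minimal illustration using only the Python standard library; `busy_stage` is a hypothetical stand-in, not an ODM function, and `time.process_time()` only counts the current process, so stages that shell out to external binaries would need a different measurement.

```python
import time


def cpu_utilisation(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds, cpu/wall ratio)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads of this process
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return result, wall, cpu, ratio


def busy_stage(n):
    # Hypothetical stand-in for a serial, CPU-bound pipeline stage.
    total = 0
    for i in range(n):
        total += i * i
    return total


result, wall, cpu, ratio = cpu_utilisation(busy_stage, 2_000_000)
print(f"wall={wall:.2f}s cpu={cpu:.2f}s cpu/wall={ratio:.2f}")
```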

Sorry to hear about your stroke, hope you’re recovering well.


We’ve pushed “parallel processing” in the multi-core world just about as far as it can go, at least with today’s technology. Splitting the load among multiple machines is the next obvious step but it, too, only gets you so far. GPUs are almost the perfect solution in this problem domain.
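One common way to put a number on "only gets you so far" is Amdahl's law, which is not mentioned in the thread but is the usual model for this argument: if a fraction p of the work can be parallelised over n workers, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below assumes a hypothetical p of 0.9 purely for illustration; nobody here has measured that figure for ODM.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical: 90% of the runtime is parallelisable.
p = 0.9
for n in (4, 16, 64, 1024):
    print(f"{n:>5} workers -> at most {amdahl_speedup(p, n):.2f}x")
```

Under that assumption the speedup can never reach 10x however many cores, machines, or GPU threads are added, which is why the remaining serial stages matter as much as the hardware.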

Is everything in our pipeline that is CPU-bound already multi-threaded?

I doubt that very much. That’s mostly driven by the system admin, but the tools are mostly available. Except in the Windows world.

Nope. There are a few places in the toolchain that are not maxing out the CPU usage. The longest running portions are in OpenSfM, and I am hoping updates there will improve things.


Smathermather-cm, how would you do that? AFAIK, if the sysadmin of a system only allows, say, a single copy of a process to run at a time, you have to be root/sysadmin to change that so that multiple instances can run simultaneously. Then, you get into the thorny debates about “well my process is more important than yours, so cut that out,” and so on. Even if it’s Windows, I imagine there aren’t many users that even know how. So I’m confused yet again.

I think you might be conflating running multiple instances with a particular tool/program being able to run multithreaded. The latter has everything to do with the program logic itself and how processes, algorithms, etc. are written to take advantage of modern parallel processors, as opposed to single-threaded stepwise logic.
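A minimal sketch of that distinction: the program below is launched once, as a single instance, and splits its own work across a pool of threads from inside its own logic. `process_tile` is a hypothetical unit of work, not anything from ODM. Note also that in CPython, pure-Python code like this is limited by the global interpreter lock, so real speedups come when the threads spend their time in native code or I/O; the point here is only where the parallelism lives.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_tile(tile_id):
    # Hypothetical unit of work; a real tool would do image processing here.
    return tile_id, sum(i * i for i in range(10_000))


def run_multithreaded(tile_ids, max_workers=None):
    """Process all tiles inside ONE process, using a pool of worker threads."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though the tiles run concurrently.
        return list(pool.map(process_tile, tile_ids))


results = run_multithreaded(range(8))
print(len(results), "tiles processed by a single instance, PID", os.getpid())
```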

System load handling behaviors don’t factor in at this level (ideally).

Another potential untapped resource might be processing extensions such as MMX/SSE/AVX, which help speed up certain operations/instructions.
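As a first step, it is easy to check which of those extensions a given machine advertises. The sketch below parses `/proc/cpuinfo`-style text, so the live read at the end only works on Linux; the function names and the sample string are made up for illustration. It only reports what the CPU supports, not whether any library in the toolchain was actually compiled to use those instructions.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def simd_support(flags):
    """Map a few common x86 SIMD extension names to whether they are present."""
    wanted = ("mmx", "sse", "sse2", "sse4_2", "avx", "avx2")
    return {name: name in flags for name in wanted}


# Made-up sample so the sketch runs on any platform.
sample = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 avx\n"
print(simd_support(parse_cpu_flags(sample)))

try:
    with open("/proc/cpuinfo") as f:
        print(simd_support(parse_cpu_flags(f.read())))
except OSError:
    print("/proc/cpuinfo not available on this platform")
```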

Would it be a lot of work to support this on a generic level? Are there any funding initiatives on GPU support in the pipeline?

We tried, but failed to get traction.

I think it might be worth waiting a bit to look into this until after all the platform updates go in.


Ok, guys, y’all can count me out on GPU acceleration (unless I get a flash of genius somewhere besides from Uncle Jack Daniels). The GPU part is straightforward enough (mostly) :) but I can get no support (still) regarding some basics in the ODM design philosophy.

So, anyone want to buy an (almost) brand new $4,000 number cruncher?


Maybe taking a poke at ensuring our OpenCV pipeline is OpenCL accelerated when possible might be a great start.

We should have merged OpenCV 4.5.0 into main with the updates happening recently.

According to their example, there should be minimal to no changes required to take advantage of the OpenCL support.
Here’s a diff of their example of CPU code on the left, and OpenCL code on the right:

Even on weak iGPUs, a multiple-times speedup is expected compared to regular CPU code:
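This is not the OpenCV example referred to above, just a minimal sketch of what the switch can look like from Python, assuming the `cv2` and `numpy` packages are installed. The only difference between the two paths is wrapping the input in `cv2.UMat`, which lets OpenCV dispatch to OpenCL where a device is available and fall back to the CPU otherwise. The function name `detect_edges`, the synthetic test image, and the filter parameters are all made up for illustration; nothing here is taken from the ODM codebase, and no speedup is claimed.

```python
try:
    import cv2
    import numpy as np
except ImportError:
    cv2 = None  # OpenCV is not installed, so the sketch cannot run


def detect_edges(image, use_opencl=True):
    """Grayscale -> Gaussian blur -> Canny, optionally through OpenCV's UMat path."""
    cv2.ocl.setUseOpenCL(use_opencl)
    # Wrapping the array in UMat is the only change versus the plain CPU path.
    src = cv2.UMat(image) if use_opencl else image
    gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (7, 7), 1.5)
    edges = cv2.Canny(blurred, 0, 50)
    # UMat results must be copied back to a NumPy array with get().
    return edges.get() if use_opencl else edges


if cv2 is not None:
    print("OpenCL available:", cv2.ocl.haveOpenCL())
    # Synthetic image: a white rectangle on a black background.
    image = np.zeros((240, 320, 3), dtype=np.uint8)
    cv2.rectangle(image, (80, 60), (240, 180), (255, 255, 255), -1)
    edges = detect_edges(image, use_opencl=True)
    print(edges.shape, edges.dtype)
```

Whether this helps in practice depends on how much of the pipeline's OpenCV usage goes through functions with OpenCL implementations, and on the cost of copying data to and from the device.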
