ODM Windows Native version performance issues (2023)

Hi everyone,
For a few months now I’ve been trying to track down a problem I’ve been having with Windows Native ODM: the low performance of the native version compared to the Docker version. I’m aware that I’m not the first user to post about this topic, but those threads are now closed due to age and inactivity. (I’ll include links to some of them at the bottom of this post.) I was hoping that ODM 3.0 might somehow change this, but sadly I still see the same kind of issues.

Some more details:

The native version was installed using the installer that was given to people who bought the Docker installer back in 2021. I’ve really liked the native version because of the problems I had with Docker involving different drives and virtualization. I was able to work around them, but it was quite bothersome compared to the native version.

The WebODM version is 1.9.18, ODM version is 3.0.2. It’s running on Windows 10 Pro 21H2 build 19044.2604, with up-to-date drivers. The system specs are:

AMD Ryzen 9 5900X
64 GB DDR4 RAM
RTX 3060
WebODM on drive E:/, a 4 TB SATA SSD

My datasets were captured using:
DJI Mavic Mini, Air 2, and Air 2S, at full-resolution JPEG.

I’ve noticed the difference with datasets of all sizes, using the default preset as well as others. Hardware utilization is low, especially compared to the Docker version: CPU usually sits around 10%, RAM around 25% (with other applications running in the background), GPU around 15% (checked in Task Manager and OCCT), and SSD activity is basically 0–1%.

I noticed a possible speed-up when changing matcher-type from flann to bow, but I got inconsistent results, as noted in the docs, so this might just be a fluke.

GPU acceleration could be a factor as well, but since I experienced this problem for a few years before GPU acceleration was implemented in the native version, I don’t believe it’s the cause.

Something interesting I’ve noticed is that I was able to process far bigger datasets natively than with Docker. I don’t have the data to prove this, but if memory serves me right, I was only able to run datasets of at most ~3k photos on Docker. On native, I processed a dataset of 6,385 photos with the bow matcher, with OK results after 32 hours.

Right now, as a Hail Mary, I’m running my biggest dataset, 20,137 photos, on this computer, since I won’t be using it for a bit. After 36 hours it’s still on the matching step, but still showing activity in the task output. This is just to satisfy my curiosity, since I haven’t been able to figure out the cause of the poor performance.

I’m just a hobbyist, so there is always the chance that I’m missing something obvious. Since the computer being used is my main desktop, it’s far from the clean environment that pros might use for this kind of thing. But I don’t see what could be causing this issue besides an inherent problem or difference in the native version.

Thanks for reading the post; any insight will be greatly appreciated. Let me know if there is any relevant info I might have forgotten to include here.

Some questions you might have:

Q: Why not use docker then?
A: For me, Docker is clunky, and dealing with Hyper-V, WSL2, and all of that has taken quite a bit of my time and has even forced me to reinstall Windows a few times. Native is almost perfect; the only problem is the performance.

Q: Could you provide examples of the difference by running the same thing on both versions?
A: I could do this if it would help you find the problem, but since a test takes a few hours or days to run, and there are already examples that others have provided, I figured it wouldn’t be necessary.

Q: What settings have you tried changing from the defaults?
A: As mentioned before, I changed matcher-type. I’ve also tried using split to help with bigger datasets, but there was no performance difference (I imagine this just helps with RAM problems). No-gpu was also tested; I did notice increased CPU usage, but processing was still quite slow.
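For reference, the options mentioned above map onto ODM command-line flags roughly as follows. This is just a sketch: the flag names come from the ODM documentation, but the values and the `build_odm_cmd` helper are illustrative, not part of ODM.

```python
# Sketch: assembling an ODM command line with the options discussed above.
def build_odm_cmd(project_path, dataset, matcher="flann", split=None, no_gpu=False):
    cmd = ["python", "run.py", "--project-path", project_path]
    cmd += ["--matcher-type", matcher]   # "flann" is the default; "bow" is the alternative I tried
    if split is not None:
        cmd += ["--split", str(split)]   # split large datasets into submodels
    if no_gpu:
        cmd.append("--no-gpu")           # force CPU-only processing
    cmd.append(dataset)                  # dataset (project) folder name
    return cmd

print(build_odm_cmd("/datasets", "my_dataset", matcher="bow", split=400, no_gpu=True))
```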

Q: Do you really expect to be able to finish the 20k dataset?
A: No, I just wanted to give it a try.

Q: Have you tested other software with the same dataset?
A: Yes, I’ve tried Metashape with success. It’s much faster, but that’s to be expected from a commercial product, and I like trying the open-source options when available.

Q: What other hardware have you tested on?
A: I also tested this on my laptop with an i7-7700HQ, GTX 1060, and 16 GB of RAM, and my desktop has had different hardware configurations over the past few years. But beyond the expected hardware differences, the gap between the two versions persisted.

Previous posts with similar concerns:



Excellent issue report!

Sorry this affected you as well. As you’ve noted, this has affected some machines and we’re not able to reproduce it nor really determine what exactly is going on.

What it feels like is an issue with mixed-topology Intel CPUs (Thread Director) and/or really high core-count CPUs (10+?). I have not been able to substantiate this.

Given the age of the laptop system, did you notice fuller resource utilization there? On my i7-6700K, 32 GB RAM, GTX 1050 Ti, I saw pretty much full saturation on defaults under native Windows. Same goes for my Celeron N3450, 4 GB, HD 530.


Thanks for the reply!
I decided to run the same dataset on both computers. I reinstalled native WebODM on both machines, using the most recent installer available and the default preset. They are still running, and I’m not sure when they might finish, but here is the info I have so far:

Computer 1:
AMD Ryzen 9 5900X, 12 cores / 24 threads
64 GB DDR4-3600 RAM
4 TB Samsung 860 EVO SSD
RTX 3060 12 GB
Windows 10 Pro

Computer 2:
Intel i7-7700HQ, 4 cores / 8 threads
16 GB DDR4-2400 RAM
2 TB Seagate HDD
Intel HD 630 (8 GB shared) & GTX 1060 (6 GB dedicated)
Windows 10 Pro

While Task Manager is far from the best tool, I imagine it’s the one most people are used to. Things I’ve noticed:

  • Computer 2 shows slightly higher resource usage, possibly due to its slower hardware.
  • On Computer 2, the program seems to be using the iGPU, and half of that usage is the remote-desktop program I’m using. I’m sure it wouldn’t be hard to make it use the dedicated GPU, but for this first test I wanted to avoid changing anything.
  • Some threads are being used more than others, especially on Computer 1; this might just be how the system schedules tasks.
  • The HDD shows spikes of activity, possibly slowing things down, but it doesn’t look like a bottleneck.

I’ll keep observing both computers as they go through the different processing steps. Later on I’ll also try installing the Docker version to compare performance.

I’m just using a dataset I captured myself, but is there one that you usually use to diagnose and test things? If so, let me know and I can use it instead. Also, if you need any specific data, let me know.

I know you’re always busy with something, and I wouldn’t call this a major problem for the whole project, but I hope this info might be of help whenever this is looked into. :slight_smile:

By the way: I ran the 20k dataset for 170 hours, but it was still at the matching step, so I decided to cancel and do actually useful things. :stuck_out_tongue:


Continuing the previous post, here are the completion times for Computers 1 and 2.

Computer 1:

Computer 2:

As expected, the desktop’s performance far exceeds the laptop’s. While I missed most of the desktop’s processing since it happened overnight, I was able to catch some of the laptop’s and noticed something intriguing: after matching, hardware usage is much closer to what’s expected.

So I decided to run a small dataset on Computer 1, and this time logged the data using HWiNFO64.
Here is the project time info:

And graphs of the most relevant hardware usage (apologies for the ugly graphs):

The data logging started just before the dataset did. As you can see, for the greater part of the run the computer is not using much of its hardware, and from what I could gather, this low-usage period is the matching step.

It started at 10:07:42 and ended at 11:52:16,599, about 1 hour and 45 minutes, and sure enough that matches the low-usage window. (P.S.: I wasn’t running with a debug flag, so this timing is just for the matching step; I don’t have exact periods for most of the other steps.)
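As a sanity check on that arithmetic, the elapsed time can be computed directly from the two log timestamps. A quick sketch, assuming both fall on the same day and that the “,599” suffix is milliseconds:

```python
from datetime import datetime

# Timestamps copied from the log above; ",599" is parsed as fractional seconds.
start = datetime.strptime("10:07:42", "%H:%M:%S")
end = datetime.strptime("11:52:16,599", "%H:%M:%S,%f")

elapsed = end - start
print(elapsed)  # 1:44:34.599000
```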

I’ll now run some more tests on matching, using the ODM debug output to get more info, and once I have more to report I’ll reply to the thread.

I hope this info might help with finding the problems. If you have any suggestions or questions, please reply.

Edit: Huh, I could swear there was a debug flag on the options page. I guess not; for now I’ll just run the projects with the regular logs.


An update on what I’ve seen:

After some testing, this whole problem might just be me misremembering. I’ve tried Docker WebODM to get some more data, but I consistently get worse results. I’m not sure if something is wrong with my Docker/Windows installation, but it always uses all the RAM and no GPU when processing the same dataset as the one above. Here is a graph of the resource usage:

I’ve also run more tests than this one, such as reinstalling Docker and ODM, using smaller datasets, and trying different settings, but saw no noticeable difference.

I’ve done a little research on CUDA through Docker on Windows, but it seems more complicated than just changing some settings, and I currently don’t have the time to learn it. So this might be it for now. If I make any progress, I’ll reply here.
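For anyone who wants to do the same sanity check, one quick way to see whether Docker on a machine even has the NVIDIA runtime registered is to inspect the output of `docker info`. A minimal sketch; the `has_nvidia_runtime` helper is mine, not part of Docker or ODM:

```python
import subprocess

def has_nvidia_runtime(docker_info: str) -> bool:
    """Return True if an 'nvidia' runtime appears in `docker info` output."""
    for line in docker_info.splitlines():
        if line.strip().lower().startswith("runtimes:"):
            return "nvidia" in line.lower()
    return False

if __name__ == "__main__":
    try:
        out = subprocess.run(["docker", "info"], capture_output=True,
                             text=True, timeout=30).stdout
        print("nvidia runtime registered:", has_nvidia_runtime(out))
    except (FileNotFoundError, subprocess.TimeoutExpired):
        print("docker not available on this machine")
```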


Your Task Manager screen captures show a big difference in CPU speed, so take that into consideration.


Those are two different machines (I wanted to make sure it wasn’t a hardware problem). And it looks like it wasn’t.


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.