Typical processing speed on large datasets (>500 images)?

Hi Everyone,

First, I know processing speeds vary wildly based on hardware, but I wanted to share my experience and ask whether it's typical/expected or whether I might have a configuration issue (I'm new to running my own physical server).

I have a Dell R710 with an Intel Xeon 5650 @ 2.67 GHz and 64 GB of RAM. I'm running Proxmox, and I've set up an Ubuntu 18.04 VM with 48 GB of RAM and all of the CPU power allocated to it. I've installed WebODM via the Docker image.
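
For reference, the usual WebODM Docker install on Ubuntu looks roughly like this (just a sketch of the standard route from the WebODM README, not necessarily exactly what I ran):

# Sketch of the standard WebODM Docker install on Ubuntu
sudo apt-get update
sudo apt-get install -y git docker.io docker-compose
git clone https://github.com/OpenDroneMap/WebODM --config core.autocrlf=input --depth 1
cd WebODM
./webodm.sh start        # web UI on http://localhost:8000 by default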

I'm processing a 644-image dataset with the High Resolution preset and it's been running for over 16 hours. Is it normal/expected to take this long? For reference, I ran the same dataset on an AWS r5.xlarge instance (32 GB RAM, Intel Xeon Platinum 8175M @ 2.50 GHz) and it took 4.5 hours.

Is my machine really this much slower? I know the CPU in the AWS instance is much more powerful and the instance has an SSD, but I wasn't expecting this big a gap.

The dataset I’m processing can be found here. It’s the Back 9 dataset.

Thanks!


We're just starting to collect benchmark info (see this page) to answer questions like this, but it's still pretty early days. We should have some larger datasets up in a couple of weeks; at the moment there's only one result for >500 images.

It took ~7 hours for me to run the “Zoo” dataset (524 images, linked above) on a decent-but-not-new laptop with default settings and letting WebODM resize the photos to 2048px before processing. I have noticed that removing the resize can triple my processing time, or more.

So all in all… 16 hours on the High Resolution config may not be excessive, if you’re not downsizing the images first. (Are you?)
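
Side note: if you ever want to take the resize step out of WebODM entirely, you can pre-shrink a copy of the photos yourself, e.g. with ImageMagick. This is just a sketch with roughly the same effect, not what WebODM does internally:

# Resize a *copy* of the dataset so the longest side is at most 2048 px
# (mogrify keeps EXIF/GPS tags by default, which ODM needs)
mkdir resized
cp *.JPG resized/
cd resized
mogrify -resize "2048x2048>" *.JPG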


Thanks for the info. I'll try running the Zoo dataset on AWS and on my R710. I might also try running Ubuntu on bare metal rather than under the Proxmox hypervisor, just to rule out any configuration issues with Proxmox. No, I'm not downsizing: I set "Resize images" to "No".

OK, great. My processing machine is sitting idle, so I'll run that Back 9 set a couple of different ways and post results. If you're just comparing the two systems, you might also try a very small set (e.g., Toledo) for a quick and rough comparison.

Just finished processing the Zoo dataset with default settings and Resize 2048px on two machines:

  1. AWS r5.xlarge (32 GB RAM, Intel Xeon Platinum 8175M @ 2.50 GHz)
  2. Dell R710 (64 GB RAM, Intel Xeon 5650 @ 2.67 GHz)

r5.xlarge = 5h 4m
Dell R710 = 2h 28m

Kind of surprised the R710 finished that much faster. I guess the extra 32 GB of RAM really helps despite the much less powerful CPU.

I updated the GitHub odm-benchmarks page and made a pull request.

I'm running the Zoo dataset on both machines with the High Resolution config now. I'll post results.


Outstanding, thanks. I see the pull request and will merge.


Should we add I/O conditions to the benchmarks? Great work. I think it would be useful to record whether processing happened with fast I/O (local SSD) or slow I/O (network storage or slower spinning disks)… I wonder if I/O conditions contributed to @fpolig01's observation?

If I get a chance I'll have a crack at it… would a 'storage access' column with values like '7200 RPM HDD', 'local SSD', 'USB3 external', or 'S3 in a local datacentre' be enough?
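
Along with the label, a rough throughput number might be worth capturing too; something like this (a quick-and-dirty dd sketch; fio would be more rigorous):

# Quick sequential throughput check on the drive holding the ODM data
# (oflag/iflag=direct bypasses the page cache so the numbers reflect the disk)
dd if=/dev/zero of=./odm_io_test bs=1M count=1024 oflag=direct   # write
dd if=./odm_io_test of=/dev/null bs=1M iflag=direct              # read
rm ./odm_io_test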


Yep, I think adding I/O info is a great idea. The more info the better.

The results from running the Zoo dataset with the High Resolution config are interesting. The Dell R710 took 7h 49m, while the r5.xlarge is taking 15+ hours (it's still running). So it's almost the opposite of what I observed with the original 644-image dataset from my first post (although I was running Ubuntu on Proxmox then; now I'm running Ubuntu straight on the machine, with no hypervisor).

I noticed this message in both log files:

/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/process_executor.py:706: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
"timeout or by a memory leak.", UserWarning

and also, in both logs:

Error) GDAL failure (1) OGR_G_RemoveGeometry() not supported on polygons yet.
(pdal info filters.hexbin Error) GDAL failure (1) OGR_G_RemoveGeometry() not supported on polygons yet.

Not sure if these are already known/observed. I'm going to let the r5.xlarge finish; it's been at the step below for a while now.

running pdal pipeline -i /tmp/tmpL0t_VB.json > /dev/null 2>&1
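
(Since that step's output goes to /dev/null, about the only way to check whether it's still doing anything is something rough like the following; just a sketch.)

# Is the pdal process actually using CPU, and is the disk filling up?
# (pgrep prints nothing if no pdal pipeline process is running)
top -b -n 1 -p "$(pgrep -d, -f 'pdal pipeline')"
df -h /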


Well, this is embarrassing. It looks like I've been running out of disk space! In both cases where it took >15 hrs, I had only allocated 100 GB!

(screenshots omitted: no_space, nospace2)

This explains why the R710 on bare metal finished quickly: that run had the full drive available, not just 100 GB.
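
For anyone else hitting this, a couple of quick checks would have caught it early (a sketch; it assumes Docker keeps its data on the root filesystem, which is the default):

df -h /                 # free space on the root filesystem
docker system df        # how much of that is Docker images/containers/volumes
watch -n 60 df -h /     # keep an eye on it during a long run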


Ah! Good find. Filling the disk is surprisingly easy and I was just wrestling with that myself a couple of days ago.
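
When it does fill up, pruning unused Docker data usually claws back a good chunk of space. Roughly:

# Removes stopped containers, unused networks, dangling images, and build cache
docker system prune
# Adding --volumes also removes unused volumes, which CAN include WebODM task data
# if the containers are down, so only do that if you're sure:
# docker system prune --volumes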


Good call! I'm updating the columns and will include this.


@fpolig01 Were you able to successfully process the Back 9 dataset? I've tried a couple of ways but consistently get an exit code 1 after ~3 hrs. I expect this is due to my RAM limitation (16 GB). Just curious whether it completes for you now.

Yes, I was able to complete it on my R710 with 64 GB of RAM. It took 26 hours on the High Resolution settings!
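
If 16 GB is the limit on your side, one thing that sometimes gets a big dataset through is adding a large swap file. It will be much slower than real RAM, and I haven't tried it on Back 9 myself, so this is just a sketch:

# Hypothetical workaround for a RAM-limited machine: add a 32 GB swap file
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h    # confirm the swap is active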
