ODM Benchmark Data

I’ve started a GitHub repo for aggregating ODM benchmark data. It’s under my GitHub account, but if it seems useful to the community I would like to see it moved over to the ODM GitHub account. I’m happy to manage the contributions if that is useful.

EDIT 3/27: Official repo is now https://github.com/OpenDroneMap/odm-benchmarks

Please have a look and let me know how we might improve it. If anyone has benchmark data to add, post the info on this thread or send it to me in a PM and I will add it. You can also submit a pull request if that’s easier for you.

I selected 14 datasets to start. There are probably too many datasets in the 0-200 photo range. It would be good to add a couple of larger sets, too. I am still filling in some details here, but wanted to get it out to everyone for comment.

I’m interested in people’s opinions on everything, but especially:

  • Does the presentation of this info make sense?
  • Does using GitHub for this make sense?
  • Suggested changes to the list of datasets?
  • Other parameters we should capture?
2 Likes

I think GitHub is a fine place to start this, and I’m happy to pull it into an official repo. If you want larger datasets, check out the Red Cross ones here: American Red Cross projects

3 Likes

@smathermather-cm I think my benchmarks repo is ready for you to pull over as an official ODM repo, if you’re game. Let me know if you’d like to see any changes.

I still want to select and run a couple of larger datasets but I figure this is probably enough to start. If someone else could run a couple of these datasets and post their results here, I’d love to see how processing times compare across different systems.

1 Like

I’ll start running these as soon as processing space is available on my Windows machine with 12 GB of RAM.

Question: I have some absolutely immense datasets that I would be happy to share. Is there any interest in adding some “insane” options as a means to set a ceiling? I’ve got some that range from ~2,000 images up to ~15,000 images.

1 Like

@RainyRockies I would think so. 2-3 large datasets would be useful for finding the upper limits of some configurations, and also for benchmarking split/merge. I’m still working with a single processing node at the moment, but I’m planning to dig into ClusterODM in a few weeks.

I struggle with massive downloads sometimes, so splitting those datasets up into a few zipfiles might be helpful. Really appreciate the offer!

Check out DownThemAll for resumable downloads (if you are using a GUI). For wget, see e.g. https://www.cyberciti.biz/tips/wget-resume-broken-download.html and for curl: https://www.cyberciti.biz/faq/curl-command-resume-broken-download/
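
For a scripted download, here’s a minimal Python sketch of the same resume-on-retry idea, assuming the server honors HTTP Range requests (the URL and filename below are just placeholders):

```python
import os
import requests  # third-party: pip install requests

def resume_download(url, dest):
    """Download url to dest, resuming from a partial file if one exists."""
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={start}-"} if start else {}
    with requests.get(url, headers=headers, stream=True, timeout=60) as r:
        r.raise_for_status()
        # 206 means the server honored the Range header; otherwise start over
        mode = "ab" if start and r.status_code == 206 else "wb"
        with open(dest, mode) as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)

# resume_download("https://example.com/dataset_part1.zip", "dataset_part1.zip")
```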

That said – it is kinder to bundle things up into smaller zips. :slight_smile:

2 Likes

Cool! Fantastic work. This will be a great contribution.

I think you have to initiate the transfer and then I accept it. Let me know when you have initiated it, and I will also add you as a collaborator.

1 Like

OK, I found the transfer functions, but I think I need slightly expanded permissions on the ODM account beforehand. I initiated the transfer but am getting an error from GitHub: “You don’t have the permission to create public repositories on OpenDroneMap”

I’m following these instructions. I think this is the key bit: “you must have permission to create a repository in the target organization.” I can hop on Gitter to work through this, if that’s helpful, @smathermather-cm.

Added you to a benchmarking team. I haven’t done this since WebODM, so let me know if I need to make additional changes.

Success! https://github.com/OpenDroneMap/odm-benchmarks
Thanks @smathermather-cm

2 Likes

Looking good! Thanks, Corey! I’ll start running tests on my hardware and post them.

2 Likes

Ok, my main processing computer is free and I’ve started to process some of these datasets.

If I could make a small suggestion about the data fields: I think CPU clock speed, RAM clock speed, number of CPU cores, and storage type (SSD vs. HDD) could be valuable data to gather as well. Obviously not everybody will know how to find this info, so it definitely doesn’t have to be required in order to share your benchmarks. But in many other applications the difference between someone running a six-core i5 overclocked to 4.2 GHz and someone running a four-core i5 at 2.8 GHz is immense.
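
For anyone unsure where to find those numbers, here’s a quick Python sketch using the psutil package (just one way to gather the specs, not a requirement for submitting benchmarks):

```python
import platform
import psutil  # third-party: pip install psutil

def machine_specs():
    """Collect the hardware fields suggested above for a benchmark entry."""
    freq = psutil.cpu_freq()  # can be None on some platforms
    return {
        "cpu": platform.processor() or platform.machine(),
        "physical_cores": psutil.cpu_count(logical=False),
        "logical_cores": psutil.cpu_count(logical=True),
        "max_cpu_mhz": round(freq.max) if freq else None,
        "ram_gb": round(psutil.virtual_memory().total / 2**30, 1),
        "os": f"{platform.system()} {platform.release()}",
        # storage type (SSD vs. HDD) is easiest to note by hand
    }

print(machine_specs())
```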

edit: Also, I’m currently zipping up my big dataset for people with absolute beasts of servers/clusters to try to process. What is the best way to send it to you? I’m trying to break it up into separate downloads.

1 Like

Excellent! I struggled with which fields to include in the tables, as I think we need to balance readability with completeness. Maybe the answer is to capture a wide list of fields (as available) as raw data elsewhere in the repo, and make the main GitHub README an overview. I think your suggestions are good. Please do capture that info and I’ll try to do the same.

Another thing I have been pondering is how to capture and show certain config parameters. I’m just noting “3D Model, Resize 2048” but it would be better to capture individual parameters from the config line somehow. Those are the real variables for comparison.
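
For example, here’s a rough sketch of splitting an ODM command line into individual flag/value pairs that could become CSV columns (the flags shown are just illustrative):

```python
import shlex

def parse_odm_options(cmdline):
    """Turn an ODM command line into a {flag: value} dict."""
    params = {}
    key = None
    for tok in shlex.split(cmdline):
        if tok.startswith("--"):
            key = tok.lstrip("-")
            params[key] = True  # boolean flag unless a value follows
        elif key is not None:
            params[key] = tok
            key = None
    return params

print(parse_odm_options("--resize-to 2048 --mesh-size 200000 --dsm"))
# {'resize-to': '2048', 'mesh-size': '200000', 'dsm': True}
```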

Anyway, I appreciate your help with this @RainyRockies .

2 Likes

Hmm, good question. Ideally we make these available for others to process too. Let me give that some thought. I might start by mailing you a couple of SD cards, if that seems workable.

2 Likes

Yes, I like the idea of having a separation of the data and the summary. Then, one could include the full parameter set as well.

3 Likes

I’d be happy to pm you my address and make that work. I’m also happy to host them on Google drive for you to download, but I understand if using that much bandwidth isn’t workable for you.

Right now the zipped dataset is 40 GB, split into 8 separate files of ~4-6 GB each. I wish I had counted them before zipping, but I believe there are between 8,000 and 10,000 images. Certainly not the largest project I have, but the largest one I’ve successfully processed!

edit: Also looks like there’s currently a pretty large dataset available with ~7,000 images (ziegeleipark https://github.com/zivillian/odm_ziegeleipark). So maybe mine will only be of interest when people are consistently processing that dataset.

Also, on a separate note (geez Louise, not being critical), whoever mapped that Ziegeleipark dataset must have been flying incredibly low with immense overlap. They basically matched my number of photos in an area 1/9 the size.

Benchmark data refactoring is underway. I added a /data/ directory to store the raw benchmark results in a CSV, and will use that to generate the README.md. I also expanded the number of fields captured in the benchmark results, so we have better insight into the machine specs, ODM configuration, and processing results. Current CSV benchmark data is here on GitHub.

The intended use of this repo is now:

  • View README.md to see benchmark data (suits most needs)
  • View raw CSV data if you want more detail

Next up is writing a script to parse the CSV data and auto-generate the README.md.
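
Roughly something like this (a minimal sketch; the data/benchmarks.csv path and the column names are placeholders until the format settles):

```python
import csv

CSV_PATH = "data/benchmarks.csv"   # assumed location of the raw results
README_PATH = "README.md"
COLUMNS = ["Dataset", "Images", "Machine", "RAM", "ODM Version", "Options", "Time"]

def generate_readme():
    """Read the raw benchmark CSV and write a Markdown summary table."""
    with open(CSV_PATH, newline="") as f:
        rows = list(csv.DictReader(f))

    lines = ["# ODM Benchmarks", "",
             "| " + " | ".join(COLUMNS) + " |",
             "|" + "---|" * len(COLUMNS)]
    for row in rows:
        lines.append("| " + " | ".join(row.get(c, "") or "" for c in COLUMNS) + " |")

    with open(README_PATH, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    generate_readme()
```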

@fpolig01 - Would you confirm I translated your results correctly in the CSV file? You can post here if changes are needed, or submit a pull request.

@adamsteer @smathermather-cm @pierotofy - You may be interested in seeing the expanded field list in the CSV.

4 Likes

This looks like a much more flexible approach. Nice work.

3 Likes