Split large data set - guidelines and recommendations

Hi,

I was looking around the forum to fish for recommendations on a starting point for setting --split and --split-overlap. I came across a few specific threads, but thought I’d start a new topic on finding a good rule of thumb to start from. For instance, it would be great to have a recommendation for a 5000-image data set, such as --split = # images / 10 and --split-overlap = swath / 2, or anything else people have had good experience with. What about 10000 images? Larger data sets?

Suggestions?

Tnx !

Choose the largest number a node can safely process; fewer splits = better results.
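
To make that concrete, here’s a back-of-the-envelope sketch (my own, not ODM internals; the real partitioning groups images by location, so actual submodel counts can differ):

    # Back-of-the-envelope only: roughly how many submodels a given --split
    # value implies, ignoring the extra images pulled in by --split-overlap.
    import math

    def approx_submodels(total_images: int, split: int) -> int:
        """Approximate submodel count for a given --split value."""
        return math.ceil(total_images / split)

    for split in (500, 1000, 2500):
        print(f"5000 images, --split {split}: ~{approx_submodels(5000, split)} submodels")

So on a node that can handle ~2500 images, --split 2500 means merging 2 submodels instead of 10.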

2 Likes

Does memory consumption scale linearly with image count, megapixels per image, or total pixel count (images x MP)?

Would one be able to make a safe guesstimate based upon available RAM, image MP, and image count? To say, for instance: on 16GB you can safely process 3.2GP, so 200 x 16MP images per split.
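
In code form, and purely as a guesstimate (the 0.2 GP-per-GB ratio below is just that hypothetical 16GB => 3.2GP figure, not anything measured):

    # Purely hypothetical: gp_per_gb is the example ratio above (16GB -> 3.2GP),
    # not a measured ODM limit.
    def images_per_split(ram_gb: float, image_mp: float, gp_per_gb: float = 0.2) -> int:
        """Estimate how many images of a given MP size to put in one split."""
        budget_mp = ram_gb * gp_per_gb * 1000   # 16 GB -> 3200 megapixels
        return int(budget_mp / image_mp)        # 3200 MP / 16 MP -> 200 images

    print(images_per_split(ram_gb=16, image_mp=16))  # -> 200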

Thanks Saijin.
That’s exactly what I’m trying to put my finger on… a valid guesstimate based on HW params.
How did you arrive at 16GB => 3.2Gpix? Experience or some inherent functionality? What’s the transformation?

1 Like

@gast, that is merely an example number I pulled from nothing. I was just musing and wanted to use simple/round numbers for my example.

I’d need to do some profiling to get a better sense of what the relationship might be.

A good place to start is with the ClusterODM example slugs:

Specifically:

    "imageSizeMapping": [
        {"maxImages": 40, "slug": "s-2vcpu-2gb"},
        {"maxImages": 250, "slug": "s-4vcpu-8gb"},
        {"maxImages": 500, "slug": "s-6vcpu-16gb"},
        {"maxImages": 1500, "slug": "s-8vcpu-32gb"},
        {"maxImages": 2500, "slug": "s-16vcpu-64gb"},
        {"maxImages": 3500, "slug": "s-20vcpu-96gb"},
        {"maxImages": 5000, "slug": "s-24vcpu-128gb"}
1 Like

One thing I should add to this: it assumes a swapfile of 1.5-2x RAM, so for 128GB of RAM, one would have a 192-256GB swapfile.

1 Like

And that maxImages - does it assume a certain MP resolution per image, or is it tied solely to image count, so that a 1.2MP image counts the same as a 12MP image towards the RAM limit?

Good question - does ODM load all images into memory in their entirety when processing begins, or is there some other managed flow for loading images into memory?

1 Like

It’s pretty efficient for most steps. OpenSfM (if memory serves me) does a bit of streaming and writes out a lot to disk, but not in an IO-limiting way. I will be doing some profiling over the next week for a project, so I will update this thread as I know more, but I think the mvs-texturing step is the current memory bottleneck.

1 Like

Hmm, I don’t know. There may be some assumptions made based on the way Piero has shaped the available settings in the Lightning network. That said, using these settings on a previous project with auto-scaling, I don’t think I hit any memory issues, and I was using some pretty aggressive product resolution and depthmap resolution settings.

1 Like

I recently tried to run about 2600 images, each around 5MB, with
--split 500
--split-overlap 50
on a 50GB-memory machine, which ended up using 100GB of disk for writing (…and crashed since I didn’t have more disk) ;). So yes - OpenSfM does seem to use up disk.

1 Like

Haha! The (generous) estimate I use for disk usage is 10x during processing. So for 50GB, allocate 500GB. :man_facepalming:
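
As a trivial sketch of that rule (the 10x factor is my generous guess against the input data size, nothing measured):

    # 10x is a generous rule of thumb, not a measured constant.
    def scratch_disk_gb(input_gb: float, factor: float = 10.0) -> float:
        """Rough working-disk estimate during processing."""
        return input_gb * factor

    print(scratch_disk_gb(50))  # 500.0 -> allocate ~500GB for 50GB of input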

3 Likes