I am seeing a strange issue and wonder if others have hit it: depthmap calculation runs fine for a while, then eventually uses only one of the machine's many cores and stalls indefinitely. Is anyone else seeing this? I successfully processed this dataset in the recent past.
I’ve had it a few times under WSL2. If I rerun with max-concurrency 1, it will usually complete okay.
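For anyone who wants to try the same workaround outside WebODM, a plain-ODM rerun might look something like this (the dataset path and project name are placeholders for your own setup):

```shell
# Rerun an existing project with a single worker thread.
# /path/to/datasets and my_project are placeholders.
docker run -ti --rm \
  -v /path/to/datasets:/datasets \
  opendronemap/odm \
  --project-path /datasets my_project \
  --max-concurrency 1
```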
Is there a stack trace with it? (Task output.) I can check if I see it in the Lightning logs. Or does it just hang without finishing?
Task output is… unremarkable. It looks like it’s just using 1 of 96 threads, so I suppose maybe it will finish eventually, but a 14-minute job is taking hours, if it completes at all:
Estimated depth-maps 116 (6.81%, 57s, ETA 13m)...
Estimated depth-maps 117 (6.87%, 58s, ETA 13m)...
Estimated depth-maps 118 (6.93%, 58s, ETA 13m)...
Estimated depth-maps 119 (6.99%, 59s, ETA 13m)...
This is on my own machine, so no lightning.
And it’s a 96-core machine (or 48)? Just trying to gauge what kind of hardware this could be reproduced on.
It’s a 48 core/96 thread machine (4x12) that is remarkably similar to this: Dell PowerEdge R820 Server 2.40Ghz 48-Core 768GB 16x 1.2TB 12G H810 Ra – TechMikeNY (48-core poweredge 820 with 768GB RAM).
It did finish eventually. I am unclear on how long it hung on a single thread, but based on the 12 hours it took to process, I would hazard it spent the remainder of the depthmap calculations on a single thread.
Hi All - I am seeing this as well. Tasks that previously took around 1 hr have now been running for 12+ hrs, and I’m not sure whether I should just let them go. Any updates/thoughts on this?
Just found the warning as well…
“[WARNING] Failed to run process in parallel, retrying with a single thread…”
Hoping to find some sort of solution. I am running WebODM. Is there a way to revert to a previous version of ODM? This is the first time I’ve seen this issue, so I’m hoping that could work.
Thank you in advance.
Hi all, I’m facing this as well.
I posted about it earlier here. I thought it hung at the beginning, but recently I found it just uses a single core and runs forever.
I noticed the issue mostly happens on large or “hard” datasets (low-contrast images, like snow or sand, where the OpenSfM reconstruction is not good enough). I’m using Azure Dsv3-series machines (8 to 64 cores) for processing. If you check the attached image of the OpenSfM reconstruction result, there is a lot of noise; I guess that might be the issue? Or is it just hard for OpenMVS to handle low-contrast images?
I just tested with max-concurrency 1, and it does help.
Yes, but if you’ve got a big machine and want to use the cores, that’s not a great solution. It’s both good and bad to hear others are having the same issue.
Since I just encountered this issue again today, I took some time to test. I believe it has something to do with the noisy OpenSfM reconstruction (the image I posted above). So I manually removed those noisy points (simply removing points more than 130 m away from the closest camera) and ran OpenMVS again. This time it showed warnings and errors about some images not being legit, selected 118 out of 138 images to move on with, but the process went smoothly. I’m not sure exactly what’s going on in OpenMVS, but I hope this provides some insight to figure out this issue, thanks.
Very interesting. What settings (especially the matcher neighbors and matcher distance settings) are you using?
I’m realizing I didn’t start seeing this until I started using clusters of streetview images with exotic matcher-neighbors and matcher-distance settings.
Normally I use matcher-neighbors 24 or matcher-distance 150
So you could be getting matches that are too aggressive, too. Out of curiosity, what is your matcher? FLANN (the default) or BOW?
Could be, I’ll run some tests. I use FLANN
This is helpful. I am seeing something similar. How do you go about removing these points?
I edited the reconstruction.json before running the OpenMVS stage and removed the outlier points. Since there are lots of points, I wrote a Python script to do it.
Have that in a gist or something somewhere you could share?
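Here’s a minimal sketch of the idea (simplified, not the exact script). It assumes OpenSfM’s reconstruction.json layout: a JSON list of reconstructions, where each shot stores a world-to-camera axis-angle "rotation" and "translation", and each point stores "coordinates". The camera center is recovered as -Rᵀt, and any point farther than the threshold (130 m here, tune for your dataset) from every camera is dropped:

```python
import json
import math


def rotation_matrix(rvec):
    """Rodrigues formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in rvec))
    if theta < 1e-12:
        return [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    kx, ky, kz = (v / theta for v in rvec)
    c, s = math.cos(theta), math.sin(theta)
    C = 1 - c
    return [
        [c + kx * kx * C, kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C, ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]


def camera_center(shot):
    """OpenSfM stores the world->camera pose; the center is -R^T @ t."""
    R = rotation_matrix(shot["rotation"])
    t = shot["translation"]
    return [-(R[0][i] * t[0] + R[1][i] * t[1] + R[2][i] * t[2]) for i in range(3)]


def filter_points(path_in, path_out, max_dist=130.0):
    """Keep only points within max_dist of the closest camera."""
    with open(path_in) as f:
        recs = json.load(f)
    for rec in recs:
        centers = [camera_center(s) for s in rec["shots"].values()]
        kept = {}
        for pid, pt in rec["points"].items():
            xyz = tuple(pt["coordinates"])
            if min(math.dist(xyz, c) for c in centers) <= max_dist:
                kept[pid] = pt
        rec["points"] = kept
    with open(path_out, "w") as f:
        json.dump(recs, f)
```

Back up the original reconstruction.json before overwriting it, since the cleaned file is what the OpenMVS stage will pick up on rerun.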