Hello, I have drone video footage that is about 10 minutes long, captured by a DJI drone. I want to generate an orthophoto of the area, so I extracted individual frames from the video using ffmpeg. I imported the frames into WebODM and started processing with the ‘Fast Orthophoto’ option. However, the process failed with errors like these in the stack trace:
```
2022-05-05 16:02:24,897 DEBUG: Matching frame136.jpg and frame501.jpg. Matcher: WORDS (symmetric) T-desc: 0.775 Matches: FAILED
2022-05-05 16:02:24,946 DEBUG: Matching frame272.jpg and frame492.jpg. Matcher: WORDS (symmetric) T-desc: 0.151 Matches: FAILED
2022-05-05 16:02:25,045 DEBUG: Matching frame340.jpg and frame366.jpg. Matcher: WORDS (symmetric) T-desc: 1.581 Matches: FAILED
2022-05-05 16:02:25,158 DEBUG: Matching frame113.jpg and frame101.jpg. Matcher: WORDS (symmetric) T-desc: 0.918 T-robust: 0.005 T-total: 0.926 Matches: 2078 Robust: 2052 Success: True
```
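In case it's relevant, this is roughly how I extracted the frames (the frame rate, quality setting, and filenames here are just examples, not exactly what I ran):

```
ffmpeg -i DJI_0001.MP4 -vf fps=2 -qscale:v 2 frames/frame%04d.jpg
```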
How can I go about resolving this? Thanks in advance.
Thanks for the pointer. I've realized the frames I extracted don't have location metadata in them, or any other useful metadata; it seems ffmpeg doesn't carry the video's metadata over into the extracted frames. I'll refly the area and use still images instead. Thanks.
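For anyone else hitting this, exiftool will confirm whether GPS tags are present (the path below is just an example):

```
exiftool -gps:all frames/frame0001.jpg
```

If that prints nothing, the frame has no GPS EXIF tags.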
Setting aside the absolute georeference, though, is the per-image metadata really necessary for finding corresponding features between photos? I would think it helps seed the optimization, but my first orthophoto test with WebODM (to be clear, not the 3D model attempts described in the thread @smathermather linked to above) used video frames without GPS data and turned out OK. Maybe the issue in your case has more to do with overlap?
Somewhat related: I have been wondering whether, absent per-photo GPS metadata, ODM uses the fact that two photos are consecutively numbered to help the optimization. The way the “Matching fileA.jpg and fileB.jpg” messages pair seemingly arbitrary images suggests not.
GPS (and pitch/roll/yaw when available) helps a lot with optimization. When GPS isn’t available, we use a bag of words matching approach that’s much faster than brute force, but still slower than having some a priori estimate of camera position or pose.
For sufficiently small datasets or sufficiently large patience, it doesn’t matter, as long as all you want is a 3D model.
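To make the bag-of-words idea concrete, here's a toy sketch in Python with OpenCV (an illustration of the general technique, not ODM's actual implementation): cluster feature descriptors into a small visual vocabulary, describe each image by a histogram of its visual words, and only run full matching on the most similar-looking pairs instead of every pair.

```python
# Toy sketch of bag-of-words pair preselection (illustrative only, not
# ODM's implementation). Idea: compare cheap per-image word histograms
# first, then run expensive feature matching only on likely pairs.
import cv2
import numpy as np

def bow_histograms(image_paths, vocab_size=64):
    orb = cv2.ORB_create(nfeatures=1000)
    per_image = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        per_image.append(desc)
    # Build a small visual vocabulary by k-means over all descriptors.
    stacked = np.vstack([d for d in per_image if d is not None]).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, _, words = cv2.kmeans(stacked, vocab_size, None, criteria, 3,
                             cv2.KMEANS_PP_CENTERS)
    # Describe each image as a normalized histogram of nearest visual words.
    hists = []
    for desc in per_image:
        hist = np.zeros(vocab_size, dtype=np.float32)
        if desc is not None:
            dists = np.linalg.norm(
                desc.astype(np.float32)[:, None] - words[None], axis=2)
            for w in dists.argmin(axis=1):
                hist[w] += 1.0
            hist /= max(hist.sum(), 1.0)
        hists.append(hist)
    return hists

def candidate_pairs(hists, top_k=5):
    # Keep only each image's top_k most similar images as match candidates,
    # instead of attempting every possible pair (brute force).
    H = np.array(hists)
    sim = H @ H.T
    np.fill_diagonal(sim, -1.0)  # exclude self-matches
    pairs = set()
    for i in range(len(H)):
        for j in np.argsort(-sim[i])[:top_k]:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)
```

The real implementation is considerably more sophisticated, but the payoff is the same: candidate selection is cheap, so the expensive robust matching only runs on pairs that are likely to overlap.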
Thanks! I’m working on a project where we need to do this for DJI videos, and I used ffmpeg to extract the frames, Python to extract the location info from the SRT file, and then exiftool to add the tags to the extracted frames.
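In case it's useful, here's a rough sketch of the SRT-to-EXIF step. The regex is a guess at the fields, and DJI SRT formats vary between models and firmware, so you'd need to adapt it to your files; it also assumes the SRT has one entry per video frame (check yours) and that frame filenames are zero-padded so they sort in capture order:

```python
# Rough sketch: read lat/lon per subtitle entry from a DJI flight SRT and
# write GPS EXIF tags into the extracted frames with exiftool.
# The regex below is an assumption -- inspect your own SRT and adapt it.
import re
import subprocess
from pathlib import Path

GPS_RE = re.compile(r"\[latitude:\s*([-\d.]+)\]\s*\[longitude:\s*([-\d.]+)\]")

def parse_srt(srt_path):
    """Return one (lat, lon) tuple per subtitle entry, in order."""
    coords = []
    for block in Path(srt_path).read_text(encoding="utf-8").split("\n\n"):
        m = GPS_RE.search(block)
        if m:
            coords.append((float(m.group(1)), float(m.group(2))))
    return coords

def tag_frames(frames, coords, video_fps=30.0, extract_fps=2.0):
    """Map extracted frame i back to its subtitle entry and write GPS tags."""
    step = video_fps / extract_fps  # subtitle entries per extracted frame
    # sorted() assumes zero-padded names, e.g. frame0001.jpg, frame0002.jpg
    for i, frame in enumerate(sorted(frames)):
        lat, lon = coords[min(int(i * step), len(coords) - 1)]
        subprocess.run([
            "exiftool", "-overwrite_original",
            f"-GPSLatitude={abs(lat)}",
            "-GPSLatitudeRef=" + ("N" if lat >= 0 else "S"),
            f"-GPSLongitude={abs(lon)}",
            "-GPSLongitudeRef=" + ("E" if lon >= 0 else "W"),
            str(frame),
        ], check=True)

if __name__ == "__main__":
    tag_frames(list(Path("frames").glob("frame*.jpg")),
               parse_srt("DJI_0001.srt"))
```

exiftool's -overwrite_original skips writing the "_original" backup copies; drop that flag if you'd rather keep the untagged originals around.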
I wasn’t the pilot for the mission - honestly I just assumed that’s how DJI stored the info. The videos were MP4 and had an associated SRT file with the same name.