Process frames extracted from drone video in WebODM

Hello, I have drone video footage that is about 10 minutes long, acquired by a DJI drone. I want to generate an orthophoto of the area, so I extracted individual frames from the video using ffmpeg. I imported the frames into WebODM and started processing with the ‘Fast Orthophoto’ option. However, the process failed, with errors like these in the stack trace:

2022-05-05 16:02:24,897 DEBUG: Matching frame136.jpg and frame501.jpg.  Matcher: WORDS (symmetric) T-desc: 0.775 Matches: FAILED
2022-05-05 16:02:24,946 DEBUG: Matching frame272.jpg and frame492.jpg.  Matcher: WORDS (symmetric) T-desc: 0.151 Matches: FAILED
2022-05-05 16:02:25,045 DEBUG: Matching frame340.jpg and frame366.jpg.  Matcher: WORDS (symmetric) T-desc: 1.581 Matches: FAILED
2022-05-05 16:02:25,158 DEBUG: Matching frame113.jpg and frame101.jpg.  Matcher: WORDS (symmetric) T-desc: 0.918 T-robust: 0.005 T-total: 0.926 Matches: 2078 Robust: 2052 Success: True

How can I go about resolving this? Thanks in advance.
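For reference, the frames were extracted roughly as follows (a Python sketch wrapping ffmpeg; the file names and the 2 fps rate are placeholders, not my exact invocation):

```python
import subprocess

def ffmpeg_frame_cmd(video, out_pattern, fps=2.0):
    """Build an ffmpeg command that samples `fps` frames per second
    from the video as high-quality JPEGs (lower -qscale:v = higher quality)."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"fps={fps}",   # sample the video at this rate
        "-qscale:v", "2",      # JPEG quality; 2 is near-lossless
        out_pattern,           # e.g. frames/frame%04d.jpg
    ]

cmd = ffmpeg_frame_cmd("DJI_0001.MP4", "frames/frame%04d.jpg", fps=2.0)
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```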

That sounds very similar to this post:

Thanks for the pointer. I’ve realized the frames I extracted don’t have location metadata, or any other useful metadata, embedded in them. It seems metadata is lost whenever frames are extracted from video. I’ll refly the area and use still images instead. Thanks.

Other than getting an absolute georeference, though, is the per-image metadata really necessary for finding corresponding features between photos? I would think it helps in seeding the optimization, but my first orthophoto test with WebODM (to be clear, not the 3D model attempts described in the thread @smathermather linked above) used video frames without GPS data and turned out OK. Maybe the issue in your case has more to do with overlap?
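On the overlap point, here is a rough back-of-envelope check (my own sketch; the altitude, speed, and field-of-view numbers are assumptions you would replace with your flight’s values):

```python
import math

def forward_overlap(altitude_m, speed_ms, fps, fov_deg=73.7):
    """Fraction of along-track ground footprint shared by consecutive
    extracted frames. fov_deg is the camera's along-track field of view
    (73.7 deg is roughly a DJI wide-angle horizontal FOV; check your model)."""
    footprint = 2 * altitude_m * math.tan(math.radians(fov_deg / 2))
    spacing = speed_ms / fps  # ground distance flown between frames
    return max(0.0, 1 - spacing / footprint)

# e.g. flying at 50 m altitude and 5 m/s, extracting 2 frames per second
ov = forward_overlap(50, 5, 2)
print(f"forward overlap: {ov:.0%}")
```

If the number comes out well below the ~70–80% typically recommended for matching, extracting frames at a higher rate (or flying slower/higher) would help.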

Somewhat related: absent per-photo GPS metadata, will ODM use the fact that two photos are successively numbered to help the optimization? The way the “Matching fileA.jpg and fileB.jpg” messages seem to consider arbitrary pairs suggests not.

GPS (and pitch/roll/yaw when available) helps a lot with optimization. When GPS isn’t available, we use a bag of words matching approach that’s much faster than brute force, but still slower than having some a priori estimate of camera position or pose.

For sufficiently small datasets or sufficiently large patience, it doesn’t matter, as long as all you want is a 3D model.
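To make the pair-selection point concrete, here is a toy sketch (my own illustration, not ODM/OpenSfM’s actual code) of how a position prior shrinks the candidate match list compared to trying every pair:

```python
from itertools import combinations

def gps_candidate_pairs(positions, radius):
    """With a GPS prior, only images whose camera positions lie within
    `radius` metres of each other are attempted as matches, instead of
    every possible pair (brute force)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        if ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 <= radius:
            pairs.append((i, j))
    return pairs

# 100 cameras spaced 10 m apart along a single flight line
cams = [(10.0 * k, 0.0) for k in range(100)]
near = gps_candidate_pairs(cams, radius=30)
brute = 100 * 99 // 2
print(len(near), "candidate pairs with a GPS prior vs", brute, "brute-force pairs")
```

Without the prior, a matcher has to fall back on something like bag-of-words retrieval to prune the quadratic pair list, which is what the log excerpt above reflects.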


Is the GPS metadata stored in an SRT file? If so, you could add GPS tags to each image using the GPS data from the SRT file.
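For example, something along these lines could pull coordinates out of the SRT and stamp them into the JPEGs with exiftool (a sketch; the SRT telemetry format varies by DJI model and firmware, so the sample line and regex here are assumptions to adapt):

```python
import re
import subprocess

# Example telemetry line as found in some DJI .SRT files; the exact
# layout differs between models/firmware, so adjust the regex to match yours.
SAMPLE = "[latitude: -1.234567] [longitude: 36.789012] [rel_alt: 52.3]"

LATLON = re.compile(r"\[latitude:\s*([-\d.]+)\].*?\[longitude:\s*([-\d.]+)\]")

def parse_latlon(srt_text):
    """Return (lat, lon) from an SRT telemetry line, or None if absent."""
    m = LATLON.search(srt_text)
    return (float(m.group(1)), float(m.group(2))) if m else None

def tag_image(path, lat, lon):
    """Write GPS EXIF tags into an image with exiftool (must be installed)."""
    subprocess.run([
        "exiftool",
        f"-GPSLatitude={abs(lat)}",
        f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}",
        f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        "-overwrite_original", path,
    ], check=True)

lat, lon = parse_latlon(SAMPLE)
# tag_image("frames/frame0001.jpg", lat, lon)  # uncomment with exiftool installed
```

The remaining work is lining up SRT timestamps with the extracted frame numbers, which depends on the fps you used for extraction.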

Great pointer, Jack (and welcome!)

What systems have you come across that do this? For me, I only know of OpenCamera for Android.