Split and Merge discussion after a failed task

Someone encountered this error a few months ago, and I just hit it, but I had core dumps enabled when it happened.

My log said:
2022-07-24 23:14:53,851 INFO: Shots and/or GCPs are well-conditioned. Using naive 3D-3D alignment.
block_sparse_matrix.cc:80 Check failed: num_nonzeros_ >= 0
/code/SuperBuild/install/bin/opensfm/bin/opensfm: line 12: 34264 Aborted (core dumped) "$PYTHON" "$DIR"/opensfm_main.py "$@"

There’s a catch, though: the core file is 257 GB.

@pierotofy is there any point giving you this core dump or is it too big to be usable?

Probably too big to be useful. :)

Check your input images.

@pierotofy What kind of issues should I be looking for on the input images?

Look for images that might be difficult to reconstruct: a lack of features, too few patterns (or too many repeating patterns), blur, or bad GPS data.

I recall seeing a small number of blurred images so if the current processing task fails I’ll re-run after removing them from the dataset.
I didn’t spot any GPS lat/long metadata issues - the images were taken using a P4RTK. I used exiftool to extract the images’ GPS coordinates into a CSV file and imported that into Google Earth Pro, and checked the lat/long coordinates were ok.
I didn’t check the altitude - is variation in the altitude metadata likely to cause this type of error?

Second attempt failed after 74 hours, same error:
2022-07-28 11:52:49,570 INFO: Shots and/or GCPs are well-conditioned. Using naive 3D-3D alignment.
block_sparse_matrix.cc:80 Check failed: num_nonzeros_ >= 0
/code/SuperBuild/install/bin/opensfm/bin/opensfm: line 12: 27532 Aborted (core dumped) "$PYTHON" "$DIR"/opensfm_main.py "$@"

BTW the processing step where this is failing is:
[INFO] running "/code/SuperBuild/install/bin/opensfm/bin/opensfm" reconstruct "/var/www/data/66a94f7d-3d7c-4729-bf45-850e632cee87/opensfm"

It started this step at:
2022-07-25 19:05
and the failure occurred at:
2022-07-28 11:52
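
For reference, the time spent in the reconstruct step alone can be worked out from those two timestamps. A minimal sketch using only the Python standard library; the variable names are just for illustration:

```python
from datetime import datetime

# Timestamps copied from the log above (minute precision).
started = datetime(2022, 7, 25, 19, 5)
failed = datetime(2022, 7, 28, 11, 52)

elapsed = failed - started
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
print(f"reconstruct ran for {hours}h {remainder // 60}m before aborting")
```

That works out to a little under 65 hours in this one step.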

I’m currently looking for a way to analyze the ~7400 images in bulk to identify which ones are blurry so I can remove them from the dataset and try again.
Does anybody have a method already figured out for doing this?
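
One common heuristic for this kind of bulk check is the variance of the Laplacian: sharp images have strong edges, so the Laplacian response varies a lot, while blurred images give a low variance. Below is a rough sketch, not a tested workflow for this dataset. The function names are made up for illustration, image loading assumes Pillow is installed, and any cut-off score would have to be calibrated against images already known to be blurry.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grid of grayscale values.

    Lower values suggest fewer sharp edges, i.e. a possibly blurred image.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def load_gray(path, max_side=512):
    """Load an image as a downscaled grayscale grid (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so the scoring function has no dependencies

    with Image.open(path) as img:
        img = img.convert("L")
        img.thumbnail((max_side, max_side))  # downscale to keep pure-Python scoring fast
        width, height = img.size
        data = img.tobytes()  # one byte per pixel in mode "L"
    return [list(data[row * width:(row + 1) * width]) for row in range(height)]


def rank_by_sharpness(paths):
    """Return (score, path) pairs sorted so the likely-blurry images come first."""
    return sorted((laplacian_variance(load_gray(p)), p) for p in paths)
```

Sorting by score and reviewing only the lowest-scoring images by eye is probably safer than deleting anything automatically, since low-texture scenes (water, bare ground) also score low.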

Manually check them if you have a couple of hours free. For one job last year I looked through 23500 images for blurry ones; fortunately there weren’t any.

I’m manually checking them - figure I will have to do that anyway to be able to see if an algorithmic method works.
Out of the first 600 I’ve looked at I found 7 blurry, and most seemed to occur when the drone took a photo whilst rotating to line up another straight run.

Yes, a high yaw/angular rotation rate can certainly cause blurring away from the axis of rotation (which may not necessarily be within the FOV). Shorter exposure times would be the fix for that, if you are unable to reduce the yaw rate in the software.

It was a DJI P4-RTK (not mine) so I’ll ask the pilot if in future he can configure it so it doesn’t shoot whilst turning (i.e. between runs/passes), or reduce the yaw rate, or shorten the exposure time.

Took me about 8 hours to check ~7400 photos and remove the blurred ones.
Setting up a task to process the “clean” set of photos now…

I used exiftool to extract the altitude metadata in bulk into Excel, used formulas to spot outliers, and charted the data both with and without them. The typical variation was about 30cm (it was a P4RTK). Only one photo deviated by about 1.5m, and one other was a photo of the pilot taking a selfie (which I had already spotted when checking for blurry images, but at least exiftool makes it possible to catch things like that in an automated way). I removed both from the image set and restarted the processing…
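
The same outlier check can be scripted instead of done in Excel. The sketch below assumes a CSV exported with something like `exiftool -csv -n -GPSAltitude <folder>`, which gives `SourceFile` and `GPSAltitude` columns. The function name and the 1 m threshold are illustrative choices, not what was actually used here, and comparing against a single median only makes sense if all flights were flown at roughly the same altitude.

```python
import csv
import io
from statistics import median


def altitude_outliers(csv_text, threshold_m=1.0):
    """Split rows into altitude outliers and rows with no usable altitude.

    An outlier is any image whose GPSAltitude differs from the median
    altitude by more than threshold_m metres.
    """
    altitudes, unreadable = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.get("SourceFile", "")
        try:
            altitudes.append((name, float(row["GPSAltitude"])))
        except (KeyError, TypeError, ValueError):
            # Missing or non-numeric altitude: report it rather than skip it silently.
            unreadable.append(name)
    if not altitudes:
        return [], unreadable
    centre = median(alt for _, alt in altitudes)
    outliers = [(name, alt) for name, alt in altitudes if abs(alt - centre) > threshold_m]
    return outliers, unreadable
```

To run it on a real export, read the file first, e.g. `altitude_outliers(open("altitudes.csv", newline="").read())`, where `altitudes.csv` is a placeholder name. Images with no altitude tag at all end up in the second list.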

Oh dear - failed after 57 hours with same error message:
2022-08-02 10:37:11,715 INFO: -------------------------------------------------------
2022-08-02 10:37:12,941 INFO: Tile_2_DJI_0326.JPG resection inliers: 8686 / 20184
2022-08-02 10:37:13,110 INFO: Adding Tile_2_DJI_0326.JPG to the reconstruction
2022-08-02 10:37:18,282 INFO: Shots and/or GCPs are well-conditioned. Using naive 3D-3D alignment.
block_sparse_matrix.cc:80 Check failed: num_nonzeros_ >= 0
/code/SuperBuild/install/bin/opensfm/bin/opensfm: line 12: 101198 Aborted (core dumped) "$PYTHON" "$DIR"/opensfm_main.py "$@"

I’m not sure what to investigate next…does anyone have any suggestions?

What filesystem is this volume on? Have you checked SMART and fsck’d it?

It’s on a RAID volume (HP server hardware). I’ll have to do some reading to see how to run a fsck safely on it.

I’ve configured it to do a full filesystem check on boot. Rebooting now so we’ll see how long it takes to complete the fsck…

It didn’t look like a significant number of errors were found.
I’ve updated WebODM and nodeODM:GPU and am trying with a small subset of the images next.

Update:
I processed each of the image subsets (grouped by flight/sortie) one-by-one with no errors. That suggests two potential causes for the error when processing the full set of images, a) a software issue related to the quantity of images, or b) some difference between the sortie image sets.
I’m now trying to process the full image set with a split at 5000 images to see what happens. If it succeeds that suggests a) and if it fails that suggests b).

Failed after 40 hours, same error message:
2022-08-13 23:37:50,917 INFO: Shots and/or GCPs are well-conditioned. Using naive 3D-3D alignment.
block_sparse_matrix.cc:80 Check failed: num_nonzeros_ >= 0
/code/SuperBuild/install/bin/opensfm/bin/opensfm: line 12: 997876 Aborted (core dumped) "$PYTHON" "$DIR"/opensfm_main.py "$@"

Any suggestions on next steps to identify the root cause?
I’m thinking I could try processing two adjacent tiles, see if that works, then try 3 adjacent tiles, etc. It will take a long time, but I don’t know what else to try…
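
Since each run takes tens of hours, one way to cut down the number of runs is to bisect instead of adding one tile at a time. This is a swapped-in variation on the plan above, not something that has been tried here. It is only a sketch: `succeeds` is a hypothetical callback standing in for "process this subset and report whether it finished", and bisection only works if a failing group of tiles keeps failing when more tiles are added.

```python
def smallest_failing_prefix(tiles, succeeds):
    """Return the smallest n for which processing tiles[:n] fails, or None.

    tiles:    tiles ordered so that neighbours in the list are adjacent on the ground.
    succeeds: hypothetical callback that processes a list of tiles and
              returns True if the task completed without the error.
    Assumes that once a prefix fails, every longer prefix fails too.
    """
    if succeeds(tiles):
        return None  # the full set works, nothing to narrow down
    low, high = 1, len(tiles)  # invariant: tiles[:high] is known to fail
    while low < high:
        mid = (low + high) // 2
        if succeeds(tiles[:mid]):
            low = mid + 1  # failure needs more than mid tiles
        else:
            high = mid  # failure already appears within the first mid tiles
    return low
```

With 8 tiles this needs about 4 runs rather than up to 7. If the result is roughly the same image count regardless of which tiles are included, that would point towards a quantity-related software limit rather than a problem with a particular sortie.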
