Method for testing for changes among ODM versions

Huey · April 16, 2026, 10:16am

Orthomosaic Quality Assessment Framework

1. Background

Traditionally, orthomosaic quality assessment has been largely qualitative, relying on visual inspection. While useful, this approach becomes limiting when:

Comparing multiple versions at scale
Benchmarking performance improvements
Validating algorithmic changes (e.g., Fast-Ortho vs standard pipelines)
Identifying subtle differences requiring further investigation

To address this, we explored both localized and global comparison approaches using:

Structural Similarity (SSIM) — pixel-level structural consistency
ORB Feature Matching — geometric/feature-level consistency

Key Insight:
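Because photogrammetry pipelines rely on RANSAC, a randomized estimator, repeated runs on the same inputs are not guaranteed to be pixel-identical. Comparisons must therefore tolerate noise, misalignment, and outliers rather than assume perfect pixel alignment.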

Huey · April 24, 2026, 2:24am

2. Code Walk-through

a) Reprojection Alignment (`reproj_match`)

Ensures that comparisons are valid by forcing both orthomosaics onto the same:

CRS
Spatial resolution
Grid alignment

Why it matters:
Without this, SSIM comparisons would be meaningless due to pixel misalignment.

b) Image Normalization (`get_8bit_tif`)

Key steps:

Converts multi-band imagery → grayscale
Handles nodata properly
Normalizes values to 8-bit (0–255)
Returns both image and valid data mask

Important improvement:
Returning the valid_mask prevents comparing nodata regions.
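
The notebook's `get_8bit_tif` is not reproduced here. As a rough sketch of the steps listed above, assuming the raster has already been read into a NumPy array of shape (bands, rows, cols), something like the following would work. The function name `to_8bit_gray`, the plain band average, and the min-max stretch are assumptions for illustration, not the author's implementation.

```python
import numpy as np

def to_8bit_gray(bands, nodata=None):
    """Collapse a (bands, rows, cols) array to 8-bit grayscale plus a valid mask.

    Hypothetical helper: averages the bands and stretches the valid
    pixels linearly onto 0-255. Invalid pixels are left at 0.
    """
    data = np.asarray(bands, dtype=np.float64)
    if data.ndim == 2:                      # single-band input
        data = data[np.newaxis, ...]

    # A pixel is valid only if every band is finite and differs from nodata.
    valid = np.all(np.isfinite(data), axis=0)
    if nodata is not None:
        valid &= np.all(data != nodata, axis=0)

    gray = data.mean(axis=0)                # assumption: unweighted band mean
    out = np.zeros(gray.shape, dtype=np.uint8)
    if valid.any():
        lo, hi = gray[valid].min(), gray[valid].max()
        if hi > lo:
            scaled = (gray[valid] - lo) / (hi - lo) * 255.0
            out[valid] = np.round(scaled).astype(np.uint8)
    return out, valid
```

Computing the stretch limits from valid pixels only keeps a nodata value such as -9999 from compressing the real data into a narrow range.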

c) Masked Comparison

common_mask = mask_ref & mask_test

Only overlapping valid pixels are compared.

Why this is crucial:

Removes bias from missing data
Ensures fair comparison between outputs
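
One way the common mask might be applied is to average a per-pixel score map (for example, a local SSIM map) over the overlapping valid pixels only. This is a minimal sketch; `masked_mean` and the choice to return NaN when there is no overlap are assumptions, not taken from the notebook.

```python
import numpy as np

def masked_mean(score_map, mask_ref, mask_test):
    """Average a per-pixel score over pixels valid in both rasters."""
    common_mask = mask_ref & mask_test
    if not common_mask.any():
        # No overlapping valid data: report "not comparable" rather than 0.
        return float("nan")
    return float(score_map[common_mask].mean())
```

Returning NaN instead of zero keeps a pair with no overlap from being counted as a very poor match in later aggregation.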

d) ORB Similarity

Detects keypoints using ORB
Matches descriptors
Computes ratio of “good matches”

Interpretation:

Measures feature-level consistency
More robust to brightness and minor distortions
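
The post does not define "good matches" precisely. A common convention, assumed here, is Lowe's ratio test on the two nearest neighbours of each descriptor. The sketch below works on descriptors that have already been extracted (ORB descriptors are binary, typically 32 bytes stored as `uint8`) and uses brute-force Hamming distances in NumPy, so it is only practical for small keypoint sets. The names `hamming_matrix` and `good_match_ratio` and the 0.75 threshold are illustrative.

```python
import numpy as np

def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between rows of two uint8 descriptor arrays."""
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    # Count the differing bits in each pair of descriptors.
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

def good_match_ratio(desc_a, desc_b, ratio=0.75):
    """Fraction of descriptors in desc_a that pass the ratio test against desc_b."""
    if len(desc_a) == 0 or len(desc_b) < 2:
        return 0.0
    dist = hamming_matrix(desc_a, desc_b)
    two_nearest = np.sort(dist, axis=1)[:, :2]
    # Keep a match only if the best candidate is clearly better than the runner-up.
    good = two_nearest[:, 0] < ratio * two_nearest[:, 1]
    return float(good.mean())
```

Under this definition the score lies in [0, 1], and two identical descriptor sets score 1.0 as long as no descriptor is duplicated.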

e) Structural Similarity (SSIM)

Measures luminance, contrast, and structure similarity
Output range: [-1, 1]

Interpretation:

Closer to 1 → highly similar structure
Sensitive to pixel-level differences
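
To make the three terms concrete, here is a simplified single-window SSIM computed over all (optionally masked) pixels at once. Library implementations such as scikit-image's `structural_similarity` instead average SSIM over small local windows, so the numbers will differ; this sketch only illustrates the formula. The function name `global_ssim` is hypothetical, and the constants follow the usual defaults (K1 = 0.01, K2 = 0.03).

```python
import numpy as np

def global_ssim(img_a, img_b, mask=None, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (or over the masked pixels)."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if mask is not None:
        a, b = a[mask], b[mask]
    if a.size == 0:
        return float("nan")

    # Small constants that stabilise the ratio when means or variances are near 0.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()

    luminance = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_a + var_b + c2)
    return float(luminance * contrast_structure)
```

Identical inputs give 1, while an image compared against its own negative gives a negative value because the covariance term changes sign.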

f) Aggregation and Interpretation

Results are aggregated per version:

Mean SSIM
Mean ORB
Standard deviation

With a simple classification:

PASS / WARNING / FAIL

This provides a quick diagnostic layer on top of raw metrics.
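
The aggregation step could look roughly like the following. The thresholds (0.9 and 0.7) and the decision to classify on mean SSIM alone are placeholders chosen for illustration; the post does not state the actual cut-offs.

```python
from statistics import mean, stdev

def summarize(ssim_scores, orb_scores, pass_at=0.9, warn_at=0.7):
    """Aggregate per-pair scores for one version and attach a coarse status."""
    def spread(values):
        # Sample standard deviation needs at least two values.
        return stdev(values) if len(values) > 1 else 0.0

    ssim_mean = mean(ssim_scores)
    if ssim_mean >= pass_at:
        status = "PASS"
    elif ssim_mean >= warn_at:
        status = "WARNING"
    else:
        status = "FAIL"

    return {
        "ssim_mean": ssim_mean,
        "ssim_std": spread(ssim_scores),
        "orb_mean": mean(orb_scores),
        "orb_std": spread(orb_scores),
        "status": status,
    }
```

With these placeholder thresholds, the DEM baseline comparisons reported below (SSIM ~0.73–0.74) would be flagged as WARNING rather than FAIL.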

Huey · April 24, 2026, 2:25am

3. Results and Interpretation

1. DEM Comparisons

The DEM comparisons show very high within-version consistency: SSIM stays around 0.93–0.97 and ORB remains at 1.0, indicating strong repeatability across runs of the same version. In the progression analysis, similarity drops noticeably between versions 3.1.0 and 3.5.1 (SSIM ~0.73), suggesting a significant processing change, and then stabilizes again (~0.95–0.96) for later versions. The baseline comparison (3.0.0 vs. the others) confirms this pattern: newer versions diverge clearly from the baseline (SSIM ~0.73–0.74), which points to a systematic shift introduced after the early versions. ORB remains consistently high throughout, implying that structural features are preserved despite the elevation differences.

2. Orthophoto Comparison

The orthophoto comparisons show only moderate within-version consistency. SSIM ranges from ~0.40 to ~0.73, and ORB varies more, including a sharp drop in versions 3.5.4–3.5.5, which suggests sensitivity to radiometric or feature changes. In the progression analysis, similarity fluctuates rather than stabilizing: SSIM dips notably between early versions and partially recovers in later ones, while ORB varies between ~0.53 and ~0.74, indicating inconsistent feature matching across updates. The baseline comparison highlights the same instability, with significant divergence from the original version and, in some cases, missing or weak comparable results. This implies that orthophotos are more affected by processing changes than DEMs, particularly in texture, color balance, and feature detectability.

3. Standard Deviations Matter

Low variance → stable performance
Higher variance → inconsistent outputs across runs

Interpretation:

Stability can be as important as mean performance
A slightly lower mean with low variance may be preferable

4. Fast-Ortho vs Version Effects

Where applicable, the differences suggest that processing strategy (e.g., Fast-Ortho):

- Influences geometric consistency (ORB)
- Has less impact on structural similarity (SSIM)

5. Key Insight

A high ORB score does not guarantee a high SSIM score.

This reinforces that:

Pixel similarity ≠ geometric/feature consistency
Both metrics are complementary, not interchangeable

4. Conclusion and Future Work

Conclusion

This experiment demonstrates that it is possible to:

Build a quantitative benchmarking pipeline
Evaluate ODM outputs across versions and runs
Move beyond purely visual inspection

However, orthomosaics are inherently difficult to compare on a strict pixel-by-pixel basis.

Even small differences in:

Reprojection
Blending
Interpolation

can produce measurable variation without necessarily degrading quality.

Limitations of Current Approach

SSIM
- Sensitive to small pixel shifts
ORB
- Dependent on detectable features
- May fail in low-texture regions
Both
- Influenced by preprocessing choices (grayscale conversion, masking)

Potential Improvements / Future Directions

Additional Metrics
- Mutual Information
- PSNR
- Learned perceptual metrics (LPIPS)
Geospatial Consistency Checks
- Compare tie points or GCP residuals
- Evaluate alignment error directly
Patch-Based Comparison
- Compare tiles instead of full images
- Reduce sensitivity to global misalignment
Feature Matching with Geometric Verification
- Integrate RANSAC filtering
- Measure inlier ratios after validation
Semantic / Object-Based Comparison
- Compare extracted features (roads, buildings, vegetation indices)
- More meaningful than raw pixel comparison
Benchmark Dataset Creation
- Standardized ODM test datasets
- Known expected outputs for regression testing
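
As an illustration of two of the ideas above (PSNR and patch-based comparison), here is a small sketch that computes PSNR per tile over valid pixels. It assumes two aligned 8-bit grayscale arrays and a shared validity mask; the function name `tile_psnr`, the tile size, and the `min_valid` cut-off are hypothetical choices, not part of the current script.

```python
import math
import numpy as np

def tile_psnr(img_a, img_b, mask, tile=256, min_valid=0.5, data_range=255.0):
    """Return {(row, col): PSNR in dB} for each tile with enough valid pixels."""
    rows, cols = img_a.shape
    scores = {}
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            m = mask[r:r + tile, c:c + tile]
            # Skip tiles that are empty or mostly nodata.
            if m.sum() == 0 or m.mean() < min_valid:
                continue
            a = img_a[r:r + tile, c:c + tile][m].astype(np.float64)
            b = img_b[r:r + tile, c:c + tile][m].astype(np.float64)
            mse = float(np.mean((a - b) ** 2))
            # Identical tiles have zero error, i.e. unbounded PSNR.
            scores[(r, c)] = math.inf if mse == 0 else 10 * math.log10(data_range ** 2 / mse)
    return scores
```

Reporting a score per tile makes it possible to see where two outputs differ, not just by how much, which a single global number hides.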

Huey · April 24, 2026, 2:25am

Final Thought

This is an initial step toward a more rigorous evaluation framework for ODM outputs.
SSIM and ORB provide a useful starting point, but they are not sufficient on their own.

The broader goal should be:
A multi-metric, robust, and community-driven benchmarking approach for orthomosaic quality and consistency.

The script used for this analysis is available on GitHub: ODM_intergration_test/Working Similaritry rasyterio inversion-statistics.ipynb at main · hubertsamboko/ODM_intergration_test · GitHub

DodgySpaniard · April 24, 2026, 7:45am

Awesome job, this is definitely on our roadmap. We will do a thorough review of your changes and expand if needed.
