Method for testing changes across ODM versions

Orthomosaic Quality Assessment Framework

1. Background

Traditionally, orthomosaic quality assessment has been largely qualitative, relying on visual inspection. While useful, this approach becomes limiting when:

  • Comparing multiple versions at scale

  • Benchmarking performance improvements

  • Validating algorithmic changes (e.g., Fast-Ortho vs standard pipelines)

  • Identifying subtle differences requiring further investigation

To address this, we explored both localized and global comparison approaches using:

  • Structural Similarity (SSIM): pixel-level structural consistency

  • ORB Feature Matching: geometric/feature-level consistency

Key Insight:
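Photogrammetry pipelines rely on RANSAC, which is based on random sampling, so repeated runs (and different versions) do not produce bit-identical outputs. Comparisons must therefore tolerate noise, misalignment, and outliers rather than assume perfect pixel alignment.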



2. Code Walk-through

a) Reprojection Alignment (reproj_match)

Ensures that comparisons are valid by forcing both orthomosaics onto the same:

  • CRS

  • Spatial resolution

  • Grid alignment

Why it matters:
Without this step, SSIM would compare pixels that do not correspond to the same ground location, making the score meaningless.
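A minimal NumPy sketch of the grid-alignment part of this step, assuming both rasters already share a CRS and are north-up (no rotation). The `(x_min, pixel_width, y_max, pixel_height)` grid tuple and the name `align_to_reference` are hypothetical; the actual `reproj_match` may work differently, and changing the CRS as well would need a reprojection routine such as `rasterio.warp.reproject`. Nearest-neighbour sampling is used so that no new pixel values are invented.

```python
import numpy as np

def align_to_reference(src, src_grid, ref_shape, ref_grid, nodata=np.nan):
    """Nearest-neighbour resample `src` onto a reference grid.

    Grids are hypothetical (x_min, pixel_width, y_max, pixel_height) tuples
    for north-up rasters in the SAME CRS; pixel_height is negative.
    """
    sx0, sw, sy0, sh = src_grid
    rx0, rw, ry0, rh = ref_grid
    rows, cols = np.indices(ref_shape)
    # World coordinates of every reference pixel centre
    x = rx0 + (cols + 0.5) * rw
    y = ry0 + (rows + 0.5) * rh
    # Source pixel that contains each of those centres
    src_cols = np.floor((x - sx0) / sw).astype(int)
    src_rows = np.floor((y - sy0) / sh).astype(int)
    inside = ((src_rows >= 0) & (src_rows < src.shape[0])
              & (src_cols >= 0) & (src_cols < src.shape[1]))
    # Reference pixels that fall outside the source footprint become nodata
    out = np.full(ref_shape, nodata, dtype=float)
    out[inside] = src[src_rows[inside], src_cols[inside]]
    return out
```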


b) Image Normalization (get_8bit_tif)

Key steps:

  • Converts multi-band imagery → grayscale

  • Handles nodata properly

  • Normalizes values to 8-bit (0–255)

  • Returns both image and valid data mask

Important improvement:
Returning the valid_mask prevents comparing nodata regions.
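A sketch of what a helper like `get_8bit_tif` might do once the bands are in memory (for example from rasterio's `read()`, which returns a `(bands, rows, cols)` array). The name `to_8bit_gray`, the BT.601 luma weights, treating a fourth band as alpha, and the min/max stretch are assumptions, not the notebook's confirmed behaviour.

```python
import numpy as np

def to_8bit_gray(bands, nodata=None):
    """Convert a (bands, rows, cols) raster to 8-bit grayscale plus a validity mask."""
    bands = np.asarray(bands, dtype=float)
    if bands.ndim == 2:                     # single-band input, e.g. a DEM
        bands = bands[np.newaxis]
    # Invalid if any band is NaN/inf, or if every band equals the nodata value
    valid = np.all(np.isfinite(bands), axis=0)
    if nodata is not None:
        valid &= ~np.all(bands == nodata, axis=0)
    # Assumption: a 4-band raster is RGBA and alpha == 0 marks nodata
    if bands.shape[0] == 4:
        valid &= bands[3] > 0
    # ITU-R BT.601 luma weights for RGB(A); single-band data is used as-is
    if bands.shape[0] >= 3:
        gray = 0.299 * bands[0] + 0.587 * bands[1] + 0.114 * bands[2]
    else:
        gray = bands[0]
    out = np.zeros(gray.shape, dtype=np.uint8)
    if valid.any():
        # Stretch only the valid pixels to 0-255 so nodata cannot skew the range
        lo, hi = gray[valid].min(), gray[valid].max()
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        out[valid] = np.round((gray[valid] - lo) * scale).astype(np.uint8)
    return out, valid
```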


c) Masked Comparison

common_mask = mask_ref & mask_test

Only overlapping valid pixels are compared.

Why this is crucial:

  • Removes bias from missing data

  • Ensures fair comparison between outputs
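The masking step can be sketched as follows. Besides the overlapping pixels, it reports what fraction of the reference footprint the test raster also covers, so a change in coverage between versions stays visible instead of being silently masked out. The function name `masked_pair` is hypothetical.

```python
import numpy as np

def masked_pair(img_ref, img_test, mask_ref, mask_test):
    """Pixels that are valid in BOTH rasters, plus the overlap fraction."""
    common_mask = mask_ref & mask_test
    # Fraction of the reference footprint that the test raster also covers
    overlap = common_mask.sum() / max(int(mask_ref.sum()), 1)
    return img_ref[common_mask], img_test[common_mask], float(overlap)
```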


d) ORB Similarity

  • Detects keypoints using ORB

  • Matches descriptors

  • Computes the ratio of “good matches”

Interpretation:

  • Measures feature-level consistency

  • More robust to brightness changes and minor distortions


e) Structural Similarity (SSIM)

  • Measures luminance, contrast, and structure similarity

  • Output range: [-1, 1]

Interpretation:

  • Closer to 1 → highly similar structure

  • Sensitive to pixel-level differences
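For reference, the standard SSIM formula (Wang et al., 2004) with a uniform 7×7 window can be written directly in NumPy and combined with the common mask by averaging only over windows that contain no invalid pixel. This is an illustrative sketch with a hypothetical name (`masked_ssim`); it uses population rather than sample variance, so values will differ slightly from `skimage.metrics.structural_similarity`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_mean(img, win):
    # Mean over every win x win window ("valid" positions only)
    return sliding_window_view(img, (win, win)).mean(axis=(-2, -1))

def masked_ssim(a, b, common_mask, win=7, data_range=255.0):
    """Mean SSIM over windows that lie entirely inside common_mask."""
    a = a.astype(float)
    b = b.astype(float)
    c1 = (0.01 * data_range) ** 2  # stabilising constants, K1 = 0.01, K2 = 0.03
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = _window_mean(a, win), _window_mean(b, win)
    var_a = _window_mean(a * a, win) - mu_a ** 2
    var_b = _window_mean(b * b, win) - mu_b ** 2
    cov = _window_mean(a * b, win) - mu_a * mu_b
    # Luminance, contrast and structure terms combined into one ratio
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    # Keep only windows where every pixel is valid in both rasters
    inside = sliding_window_view(common_mask, (win, win)).all(axis=(-2, -1))
    return float(ssim[inside].mean()) if inside.any() else float("nan")
```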


f) Aggregation and Interpretation

Results are aggregated per version:

  • Mean SSIM

  • Mean ORB

  • Standard deviation

With a simple classification:

PASS / WARNING / FAIL

This provides a quick diagnostic layer on top of raw metrics.
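A sketch of the per-version aggregation. The PASS/WARNING/FAIL thresholds (0.90 and 0.70 on mean SSIM) are placeholders, since the post does not state the cut-offs used, and classifying on SSIM alone is a simplification. `summarize` is a hypothetical name.

```python
from statistics import mean, stdev

# Hypothetical thresholds; the original cut-offs are not stated
PASS_SSIM, WARN_SSIM = 0.90, 0.70

def summarize(scores_by_version):
    """scores_by_version: {version: [(ssim, orb), ...]} from repeated runs."""
    summary = {}
    for version, runs in scores_by_version.items():
        ssims = [s for s, _ in runs]
        orbs = [o for _, o in runs]
        m_ssim = mean(ssims)
        if m_ssim >= PASS_SSIM:
            status = "PASS"
        elif m_ssim >= WARN_SSIM:
            status = "WARNING"
        else:
            status = "FAIL"
        summary[version] = {
            "mean_ssim": m_ssim,
            "mean_orb": mean(orbs),
            # A sample standard deviation needs at least two runs
            "std_ssim": stdev(ssims) if len(ssims) > 1 else 0.0,
            "status": status,
        }
    return summary
```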



3. Results and Interpretation

1. DEM Comparisons

The DEM comparison results show very high within-version consistency: SSIM values stay around 0.93–0.97 and ORB remains at 1.0, indicating strong repeatability across runs of the same version. In the progression analysis, similarity drops noticeably between versions 3.1.0 and 3.5.1 (SSIM ~0.73), suggesting a significant processing change, after which it stabilizes again (~0.95–0.96) for later versions. The baseline comparison (3.0.0 vs. later versions) confirms this pattern, with a clear divergence from newer versions (SSIM ~0.73–0.74) that points to a systematic shift introduced after the early versions. ORB nevertheless remains consistently high, implying that structural features are preserved despite the elevation differences.

2. Orthophoto Comparison

The orthophoto comparison results indicate moderate within-version consistency: SSIM ranges from roughly 0.40 to 0.73, and ORB varies more, including a sharp drop in versions 3.5.4–3.5.5, suggesting sensitivity to radiometric or feature changes. In the progression analysis, similarity fluctuates rather than stabilizing: SSIM dips notably between early versions and partially recovers in later ones, while ORB varies between ~0.53 and ~0.74, indicating inconsistent feature matching across updates. The baseline comparison further highlights this instability, showing significant divergence from the original version, with some comparisons missing or producing only weak results. Overall, orthophotos appear more affected by processing changes than DEMs, particularly in texture, color balance, and feature detectability.


3. Standard Deviations Matter

  • Low variance → stable performance

  • Higher variance → inconsistent outputs across runs

Interpretation:

  • Stability can be as important as mean performance

  • A slightly lower mean with low variance may be preferable
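One hypothetical way to encode "a slightly lower mean with low variance may be preferable" is to rank versions by their mean minus k sample standard deviations. The score, k = 2, and the per-run values below are made up for illustration and are not taken from the results above.

```python
from statistics import mean, stdev

def stability_adjusted(scores, k=2.0):
    """Mean minus k sample standard deviations; needs at least two runs."""
    return mean(scores) - k * stdev(scores)

# Made-up per-run SSIM values for two hypothetical versions
version_a = [0.99, 0.80, 0.97]   # higher mean, large run-to-run spread
version_b = [0.91, 0.90, 0.92]   # slightly lower mean, stable

for name, runs in [("A", version_a), ("B", version_b)]:
    print(name, round(mean(runs), 3), round(stability_adjusted(runs), 3))
```

Under this score, version B ranks above version A even though A has the higher mean.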


4. Fast-Ortho vs Version Effects

Where applicable, the differences suggest that the processing strategy (e.g., Fast-Ortho):

  • Influences geometric consistency (ORB)

  • Has less impact on structural similarity (SSIM)


5. Key Insight

A high ORB score does not guarantee a high SSIM score

This reinforces that:

  • Pixel similarity ≠ geometric/feature consistency

  • Both metrics are complementary, not interchangeable


4. Conclusion and Future Work

Conclusion

This experiment demonstrates that it is possible to:

  • Build a quantitative benchmarking pipeline

  • Evaluate ODM outputs across versions and runs

  • Move beyond purely visual inspection

However, orthomosaics are inherently difficult to compare on a strict pixel-by-pixel basis.

Even small differences in:

  • Reprojection

  • Blending

  • Interpolation

can produce measurable variation without necessarily degrading quality.


Limitations of Current Approach

  • SSIM

    • Sensitive to small pixel shifts

  • ORB

    • Dependent on detectable features

    • May fail in low-texture regions

  • Both

    • Influenced by preprocessing choices (grayscale conversion, masking)

Potential Improvements / Future Directions

  1. Additional Metrics

    • Mutual Information

    • PSNR

    • Learned perceptual metrics (LPIPS)

  2. Geospatial Consistency Checks

    • Compare tie points or GCP residuals

    • Evaluate alignment error directly

  3. Patch-Based Comparison

    • Compare tiles instead of full images

    • Reduce sensitivity to global misalignment

  4. Feature Matching with Geometric Verification

    • Integrate RANSAC filtering

    • Measure inlier ratios after validation

  5. Semantic / Object-Based Comparison

    • Compare extracted features (roads, buildings, vegetation indices)

    • More meaningful than raw pixel comparison

  6. Benchmark Dataset Creation

    • Standardized ODM test datasets

    • Known expected outputs for regression testing
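For item 1, PSNR and histogram-based mutual information can both be computed on the same grayscale arrays used for SSIM; passing `img[common_mask]` restricts them to valid pixels. A minimal NumPy sketch with hypothetical function names; the bin count is an arbitrary choice. LPIPS requires a pretrained network and is not shown.

```python
import numpy as np

def psnr(a, b, data_range=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(data_range ** 2 / mse))

def mutual_information(a, b, bins=32):
    """Histogram-based mutual information (in nats) between two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a (column vector)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b (row vector)
    nz = p_ab > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))
```

Unlike SSIM, mutual information does not assume a linear relationship between the two images' intensities, so it is less affected by global color-balance changes between versions.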

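For item 2, one way to measure alignment error directly (not something the original notebook does) is phase correlation, which recovers a global integer-pixel translation between two rasters on the same grid; a residual RMSE helper covers the GCP or check-point case. Both functions are hypothetical sketches, and `skimage.registration.phase_cross_correlation` offers a sub-pixel alternative.

```python
import numpy as np

def phase_correlation_shift(ref, test):
    """Integer-pixel (row, col) displacement of `test` relative to `ref`."""
    F = np.fft.fft2(ref.astype(float))
    G = np.fft.fft2(test.astype(float))
    cross = G * np.conj(F)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    dims = np.array(corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    wrap = peak > dims // 2
    peak[wrap] -= dims[wrap]
    return tuple(int(v) for v in peak)

def residual_rmse(expected_xy, measured_xy):
    """RMSE of horizontal residuals, e.g. GCP positions located in an output."""
    d = np.asarray(measured_xy, float) - np.asarray(expected_xy, float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))
```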

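For item 3, a tile-based comparison might look like this. Pearson correlation is used as a stand-in per-tile score (SSIM or ORB could be swapped in), and the function name, tile size, and minimum-valid fraction are hypothetical defaults. Tiles with too little valid data come back as NaN, so the result can be displayed as a map of where two versions disagree.

```python
import numpy as np

def tile_scores(a, b, valid, tile=256, min_valid=0.5):
    """Pearson correlation per tile; NaN where a tile has too few valid pixels."""
    rows = range(0, a.shape[0], tile)
    cols = range(0, a.shape[1], tile)
    scores = np.full((len(rows), len(cols)), np.nan)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            m = valid[r:r + tile, c:c + tile]
            if m.mean() < min_valid:
                continue  # mostly nodata: leave as NaN
            x = a[r:r + tile, c:c + tile][m].astype(float)
            y = b[r:r + tile, c:c + tile][m].astype(float)
            if x.std() == 0 or y.std() == 0:
                continue  # correlation is undefined for constant tiles
            scores[i, j] = np.corrcoef(x, y)[0, 1]
    return scores
```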

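For item 5, a simple object-level check on RGB orthophotos is to compare vegetation masks derived from the Excess Green index (ExG = 2g − r − b on chromatic coordinates) via intersection-over-union. The threshold of 0.1 is an arbitrary placeholder, and the function names are hypothetical.

```python
import numpy as np

def excess_green(rgb):
    """ExG = 2g - r - b on chromatic coordinates; rgb is (3, rows, cols)."""
    rgb = rgb.astype(float)
    total = rgb.sum(axis=0)
    total[total == 0] = 1.0  # avoid division by zero on black pixels
    r, g, b = rgb / total
    return 2 * g - r - b

def vegetation_iou(rgb_ref, rgb_test, valid, threshold=0.1):
    """IoU of thresholded ExG vegetation masks within the common valid area."""
    veg_ref = (excess_green(rgb_ref) > threshold) & valid
    veg_test = (excess_green(rgb_test) > threshold) & valid
    union = np.logical_or(veg_ref, veg_test).sum()
    if union == 0:
        return 1.0  # neither output contains vegetation: treat as agreement
    return float(np.logical_and(veg_ref, veg_test).sum() / union)
```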
Final Thought

This is an initial step toward a more rigorous evaluation framework for ODM outputs.
SSIM and ORB provide a useful starting point, but they are not sufficient on their own.

The broader goal should be:
A multi-metric, robust, and community-driven benchmarking approach for orthomosaic quality and consistency.

The script used is available on GitHub in the hubertsamboko/ODM_intergration_test repository (main branch): Working Similaritry rasyterio inversion-statistics.ipynb


Awesome job, this is definitely on our roadmap. We will do a thorough review of your changes and expand if needed.
