Traditionally, orthomosaic quality assessment has been largely qualitative, relying on visual inspection. While useful, this approach becomes limiting when:
Comparing multiple versions at scale
Benchmarking performance improvements
Validating algorithmic changes (e.g., Fast-Ortho vs standard pipelines)
Identifying subtle differences requiring further investigation
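To address this, we explored both localized and global comparison approaches using SSIM (Structural Similarity Index Measure) and ORB (Oriented FAST and Rotated BRIEF) feature matching.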
Key Insight:
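Photogrammetry pipelines rely on RANSAC, a randomized estimator, so repeated runs on the same inputs are not guaranteed to produce pixel-identical outputs. Comparisons must therefore tolerate noise, misalignment, and outliers rather than assume perfect pixel alignment.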
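One way to build outlier tolerance into a comparison is to score tiles independently and summarize them with a robust statistic. The sketch below is a minimal illustration, not the method used for the results in this post. It assumes NumPy and two single-band rasters of the same shape that have already been co-registered. The function names (`global_ssim`, `tile_ssim`, `robust_summary`), the 64-pixel tile size, and the single-window SSIM formulation are all assumptions made for the example. Library implementations such as scikit-image's `structural_similarity` use a sliding window instead.

```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over two equally shaped 2-D arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Standard stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def tile_ssim(a, b, tile=64, data_range=1.0):
    """SSIM for each non-overlapping tile; returns a 1-D array of scores."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("rasters must have the same shape")
    height, width = a.shape
    scores = []
    # Partial tiles at the right and bottom edges are skipped.
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            scores.append(
                global_ssim(a[y:y + tile, x:x + tile],
                            b[y:y + tile, x:x + tile],
                            data_range)
            )
    return np.array(scores)


def robust_summary(scores):
    """Median and 10th percentile are less sensitive to a few bad tiles than the mean."""
    return {
        "median": float(np.median(scores)),
        "p10": float(np.percentile(scores, 10)),
        "mean": float(np.mean(scores)),
    }


# Synthetic demo: identical rasters except for one locally corrupted tile.
rng = np.random.default_rng(0)
reference = rng.random((256, 256))
candidate = reference.copy()
candidate[:64, :64] = rng.random((64, 64))
print(robust_summary(tile_ssim(reference, candidate, tile=64)))
```

In the synthetic demo, the corrupted tile lowers the mean while the median stays at the value of the unaffected tiles. That is the behavior wanted when a few regions are misaligned or noisy. Residual misalignment larger than a tile would still need a registration step first.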
The DEM comparison results show that within-version consistency is very high: SSIM stays between roughly 0.93 and 0.97 and ORB remains at 1.0, indicating strong repeatability across runs of the same version. In the progression analysis, similarity drops noticeably between versions 3.1.0 and 3.5.1 (SSIM ~0.73), suggesting a significant processing change, and then stabilizes again (~0.95–0.96) for later versions. The baseline comparison (3.0.0 vs. others) confirms this pattern: SSIM against newer versions falls to ~0.73–0.74, indicating a systematic shift introduced after the early versions. ORB remains consistently high throughout, implying that structural features are preserved despite the elevation differences.
The orthophoto comparison results indicate only moderate within-version consistency. SSIM ranges from roughly 0.40 to 0.73 and ORB varies more widely, including a sharp drop in versions 3.5.4–3.5.5, which suggests sensitivity to radiometric or feature changes. In the progression analysis, similarity fluctuates rather than stabilizing: SSIM dips notably between early versions and partially recovers in later ones, while ORB varies between ~0.53 and ~0.74, indicating inconsistent feature matching across updates. The baseline comparison further highlights this instability, showing significant divergence from the original version and, in some cases, missing or weak comparable results. This implies that orthophotos are more affected by processing changes than DEMs, particularly in texture, color balance, and feature detectability.
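The progression and baseline analyses described above amount to two ways of pairing versions: each version against its successor, and a fixed reference against every other version. The sketch below shows that pairing logic in plain Python. The helper names (`progression_pairs`, `baseline_pairs`, `score_pairs`, `flag_drops`), the `compare` callable, and the 0.8 threshold are hypothetical. The scores in the demo are placeholders for illustration, not measurements.

```python
def progression_pairs(versions):
    """Consecutive pairs: (v0, v1), (v1, v2), ..."""
    return list(zip(versions, versions[1:]))


def baseline_pairs(versions, baseline=None):
    """Pair a fixed baseline (default: the first version) with every other version."""
    base = versions[0] if baseline is None else baseline
    return [(base, v) for v in versions if v != base]


def score_pairs(pairs, compare):
    """Apply a similarity function compare(version_a, version_b) -> float to each pair."""
    return {pair: compare(*pair) for pair in pairs}


def flag_drops(scores, threshold):
    """Return the pairs whose similarity falls below the threshold."""
    return [pair for pair, score in scores.items() if score < threshold]


# Version labels are the ones mentioned in this post.
versions = ["3.0.0", "3.1.0", "3.5.1", "3.5.4", "3.5.5"]

# Placeholder scores for illustration only; in practice `compare` would load
# the two outputs and compute a metric such as SSIM.
placeholder_scores = {
    ("3.0.0", "3.1.0"): 0.95,
    ("3.1.0", "3.5.1"): 0.73,
    ("3.5.1", "3.5.4"): 0.95,
    ("3.5.4", "3.5.5"): 0.96,
}


def compare(a, b):
    return placeholder_scores[(a, b)]


scores = score_pairs(progression_pairs(versions), compare)
print(flag_drops(scores, threshold=0.8))
```

Keeping the pairing separate from the metric means the same harness can be reused as more metrics are added.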
This is an initial step toward a more rigorous evaluation framework for ODM outputs.
SSIM and ORB provide a useful starting point, but they are not sufficient on their own.
The broader goal should be:
A multi-metric, robust, and community-driven benchmarking approach for orthomosaic quality and consistency.