Hi Gabor,
As you may know, we can divide the initial stages into the following:
- Feature extraction (find things to match)
- Matching
- Structure from Motion (including bundle adjustment)
- Multi-view stereo
Sparse points are the 3D representation of the matches between photos from structure from motion.
Dense points are the ones extracted in the multi-view stereo stage, which uses filtered depth maps from the pairs of matched images to increase the density of points.
They are quantitative measures, but there may be better ways to discern improvements. Human visual acuity I find to be far more useful.
The other things to look at in the report are how even are the sparse points and are there important areas that are not well covered. This one is pretty even:

vs. this one which has no matches through the center (it’s a river, so this is understandable in this case):
It is useful to look at the survey data. How many images are available for the reconstruction at any given area. This is the same as the first dataset which has decent sparse matching, but as it is over a vegetated area, there are lots of occlusions. We like to see 4 or 5+ from all locations, but we have a fair amount of 2, 3, and missing coverage:
This doesn’t necessarily mean we have bad data, but it does mean that the data in the green areas will be much better than the data in other areas.
We can also look at the strength of the matching between images in Track Details, feature use across the image frame in Feature Details, and finally it’s useful to look at Camera Models Details and see whether we see a strong pattern in the camera model, which is not a good sign, or if it’s relatvely noise residuals which is a good sign – it means we probably got a representative self-calibration of the camera.
Looks like Saijin has you on the right track too. Cheers.