Point Cloud Classification Datasets?

Attempt #4.

15 million points, classified in ~6 minutes on a scrappy laptop.


Very cool.

What classes are being used? LAS/ASPRS normal ones?

The addition of Eye Dome Lighting helps legibility a ton, too.

Ground, vegetation, buildings and vehicles (the last one is not LAS/ASPRS).


Looking great!


Attempt #5 with the addition of local smoothing:


Attempt #6 with more training data, better features, memory efficiency tweaks:

Updated repository with new name: GitHub - uav4geo/OpenPointClass: Fast and memory efficient semantic segmentation of 3D point clouds

15 million points classified in 1 minute, 30 seconds on the same scrappy 4-core laptop (which, AFAIK, blows anything else out there out of the water).

A few more screenshots:


I’ve published the model on the GitHub page so others might play with it.

Note that this was trained on a very small number of datasets (2), so I would expect large improvements from creating more high-quality training data.


Looks cool! Adding to, and hopefully not repeating, what @vonnonn said: what about adding some geometric feature analysis, for example flatness, curvature…? The model is inferring those things from the training data anyway :slight_smile:

Can we use point neighbourhood geometric characteristics and color values to infer the shape of things and label them? (An old example: point-cloud-processing-static.) PDAL, again, has many filters which can be chained to do things like this. My experience with filters.covariancefeatures is that processing big, dense point clouds is a sit-back-and-wait affair :slight_smile:
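
For anyone wondering what "geometric features from a point neighbourhood" means in practice, here is a minimal sketch. It computes the covariance of a small set of neighbouring points, takes its three eigenvalues, and derives the commonly used linearity, planarity and sphericity ratios. The function names are made up for illustration; this is not the PDAL or OpenPointClass implementation, and a real tool would also need a neighbour search, which is left out here.

```python
import math

def covariance_3x3(points):
    """Return the 3x3 covariance matrix of a list of (x, y, z) tuples."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j]
    return [[value / n for value in row] for row in cov]

def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (closed-form solution)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Matrix is already diagonal.
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def shape_features(neighbourhood):
    """Linearity, planarity and sphericity of a point neighbourhood."""
    eigenvalues = symmetric_eigenvalues(covariance_3x3(neighbourhood))
    # Clamp tiny negative values caused by rounding, then re-sort.
    l1, l2, l3 = sorted((max(e, 0.0) for e in eigenvalues), reverse=True)
    if l1 == 0.0:
        return {"linearity": 0.0, "planarity": 0.0, "sphericity": 0.0}
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "sphericity": l3 / l1,
    }

# A flat patch of points should score high on planarity.
flat_patch = [(float(i), float(j), 0.0) for i in range(5) for j in range(5)]
print(shape_features(flat_patch))
```

Roof and ground patches tend to look planar, wires and edges linear, and vegetation more scattered, which is why these ratios are useful inputs for a classifier.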


It does all of that, but faster and using multiple scales. OpenPointClass/features.hpp at main · uav4geo/OpenPointClass · GitHub

This is not a neural network; it’s a random forest classifier with well-defined features, mostly parameter-free, that is able to generalize to different point densities and scenes. The features are extracted from a multi-scale pyramid, which uncovers details that cannot be captured at a single scale.

In short, suitable for automated workflows like ODM.
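
To give a rough idea of what a multi-scale pyramid can look like, here is a small sketch that downsamples a cloud with a voxel grid whose cell size doubles at each level. This is an assumption about one common way to build such a pyramid, not a description of what features.hpp actually does; `voxel_downsample` and `build_pyramid` are hypothetical names.

```python
from collections import defaultdict

def voxel_downsample(points, resolution):
    """Replace all points falling in the same cubic voxel with their centroid."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (int(x // resolution), int(y // resolution), int(z // resolution))
        voxels[key].append((x, y, z))
    return [
        tuple(sum(axis) / len(members) for axis in zip(*members))
        for members in voxels.values()
    ]

def build_pyramid(points, base_resolution, num_scales):
    """Return one downsampled cloud per scale; the voxel size doubles at each level."""
    return [
        voxel_downsample(points, base_resolution * 2 ** level)
        for level in range(num_scales)
    ]

# An 8 x 8 patch of points, one per unit cell.
cloud = [(i + 0.5, j + 0.5, 0.5) for i in range(8) for j in range(8)]
pyramid = build_pyramid(cloud, base_resolution=1.0, num_scales=3)
print([len(level) for level in pyramid])
```

Features computed on the coarser levels describe larger surroundings of a point at a fraction of the cost of searching a large radius in the full-resolution cloud.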


Maybe a bit late in the discussion but if you are looking for datasets how about the (complete) Netherlands :wink:

The Dutch government provides lidar datasets (.las / .laz) as open data. AHN4 (Actueel Hoogtebestand Nederland, roughly translated as “Recent Height File Netherlands”) also has classifications and is colorized with RGB values.


  1. Go to Geotiles.nl
  2. Pick a tile:
    a) Zoom in to a certain level and click on a tile (files are usually 4 GB), or
    b) Zoom in a bit further and click on one of the smaller tiles (files are usually 300 MB)
  3. Download the AHN4 laz file


I tried one of these laz files in the Potree viewer and was surprised that the point cloud had classes. It seems only the buildings class is working, but I hope it helps answer your question.

without buildings

only buildings
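
If you want to check which classes a tile really contains before relying on it, counting the classification codes is a quick test. The sketch below works on a plain list of codes and assumes you have already read them from the file with whatever LAS/LAZ reader you prefer; the file reading itself is left out. The code-to-name table is a subset of the standard ASPRS classes, and `class_histogram` and `split_by_class` are names made up for this example.

```python
from collections import Counter

# Subset of the standard ASPRS LAS classification codes.
# A dataset may use only some of these, or define extra codes of its own.
ASPRS_CLASSES = {
    0: "never classified",
    1: "unclassified",
    2: "ground",
    3: "low vegetation",
    4: "medium vegetation",
    5: "high vegetation",
    6: "building",
    9: "water",
}

def class_histogram(codes):
    """Count how many points carry each classification code."""
    counts = Counter(codes)
    return {
        ASPRS_CLASSES.get(code, f"code {code}"): count
        for code, count in sorted(counts.items())
    }

def split_by_class(points, codes, wanted):
    """Return (points whose code equals `wanted`, all remaining points)."""
    selected = [p for p, c in zip(points, codes) if c == wanted]
    remaining = [p for p, c in zip(points, codes) if c != wanted]
    return selected, remaining

# Toy data standing in for a real tile.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 5), (3, 0, 1), (4, 0, 6), (5, 0, 6)]
codes = [2, 2, 6, 1, 6, 6]
print(class_histogram(codes))
only_buildings, without_buildings = split_by_class(points, codes, wanted=6)
```

A tile where nearly everything apart from ground lands in "unclassified" is not going to be much use for training a vegetation class.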


The issue with the Geotiles.nl datasets is that they have not been manually classified (?)

They do not differentiate between buildings, vegetation, etc.

Maybe I just picked a “bad” tile, as the screenshot you showed seemed to have buildings classified separately from vegetation.

Edit: higher resolution tiles seem to have the proper building classification. :tada:


In CloudCompare there’s a function where you can calculate a value for every point depending on its relation to adjacent points. For example, if it’s part of a flat surface it gets a high number. I use that to decimate clouds.

There’s also a function for separating non-ground objects and ground.

I wonder if there’s something in there that can give you some clues on how to go further in your quest. It’s open source so maybe you can find some code.

Attempt #7 with yet more training data, better sampling strategies and parameter tweaks:


Attempt #8 using gradient boosted trees instead of random forest:


And finally, the result of completely automated classification (SMRF + OpenPointClass) as integrated in ODM:

With ground, vegetation and buildings mostly correctly classified (~14% error).

Vehicles and other gray points in the screenshot above are “unclassified” points (hopefully soon we’ll train more powerful models that can differentiate more classes, more reliably).

Enabled via --pc-classify (as usual, nothing changed).


Some misclassification (highlighted) is still expected, but I expect these results to improve as more training data is added.


It would also be interesting to apply regularization via graph cuts as it’s implemented in CGAL, although I suspect performance could suffer significantly and I’m not sure it’d be worth it in the end.
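
Graph cuts are too involved for a short example, so as a much simpler stand-in, here is a sketch of label regularization by majority vote: each point takes the most common label among its neighbours within a radius. This is not the CGAL method and not necessarily what OpenPointClass does for its local smoothing; `smooth_labels` is a hypothetical name, and the brute-force neighbour search is only suitable for tiny inputs.

```python
from collections import Counter

def smooth_labels(points, labels, radius):
    """Give each point the majority label of its neighbours within `radius`.

    Brute-force O(n^2) search; a real implementation would use a spatial index.
    On a tie, the point keeps its current label if it is among the winners.
    """
    radius_sq = radius * radius
    smoothed = []
    for (xi, yi, zi), current in zip(points, labels):
        votes = Counter()
        for (xj, yj, zj), label in zip(points, labels):
            dist_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if dist_sq <= radius_sq:
                votes[label] += 1  # includes the point's own vote
        best = max(votes.values())
        winners = [label for label, count in votes.items() if count == best]
        smoothed.append(current if current in winners else winners[0])
    return smoothed

# One stray "building" point in the middle of a strip of ground.
points = [(float(x), 0.0, 0.0) for x in range(5)]
labels = ["ground", "ground", "building", "ground", "ground"]
print(smooth_labels(points, labels, radius=1.5))
```

Unlike graph cuts, this looks at each neighbourhood independently and optimizes no global energy, so it is cheap but can erase small objects that really are there.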

This is brilliant work!

So can we support this by tracking down more classified point cloud datasets?
We can also maybe provide some high-end compute if that’s of use to you.

Yes; I’ll have a central repository for community contributions opened soon.

The classifier is actually pretty efficient both memory-wise and speed-wise, so it doesn’t require a whole lot of computing power (that said, we’ll see…)

