Point Cloud Classification Datasets?

Hey all :wave:

Currently exploring the task of semi-automated point cloud classification, i.e. labeling each point in a point cloud as belonging to one of several classes (ground, road, building, vegetation, etc.).

I’m having difficulties finding labeled point cloud datasets that could be used for training a classifier.

Does anyone know of good resources?


Found so far:


I’ve been thinking about training a neural network to do that, but I haven’t gotten to it yet, mainly for lack of time.

But I’m going to play a bit with R and hopefully get some value from that.


Take a look at ISPRS:

I haven’t used them myself, but they should include reference publications, so you can also see how well you’re doing against the published state of the art.

Edit: also these (some are 2D, some 3D, and a mix of LiDAR and photogrammetric):


Useful, thanks!

Also looking for suggestions on software for manually classifying point clouds. I know this can be done with CloudCompare, but perhaps there are more specialized tools.


I’m assuming that you want FOSS tooling, right?

I used to use Merrick MARS.

I have some LiDAR from my thesis processed in MARS still, and my municipality releases all their classified LiDAR from Pictometry as open data.


Preferably :slight_smile:


I imagine you could do it all with PDAL. They’ve recently added pdal-parallelizer, which should speed things up too.
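To make that concrete, here’s a minimal sketch of a PDAL pipeline that does ground classification with the SMRF filter and keeps only the ground-classified points. The stage names come from the PDAL filter docs; the filenames are placeholders. You’d save this JSON and run it with `pdal pipeline ground.json`.

```python
import json

# Minimal PDAL pipeline sketch: classify ground vs. non-ground with
# filters.smrf, then keep only points with ASPRS class 2 (ground).
# Filenames are placeholders.
pipeline = {
    "pipeline": [
        "input.las",
        {"type": "filters.smrf"},  # ground / non-ground classification
        {"type": "filters.range", "limits": "Classification[2:2]"},
        "ground_only.las",
    ]
}

print(json.dumps(pipeline, indent=2))
```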


PDAL has only ground/non-ground classifiers (see the Filters page on pdal.io), which we already use.

I’m (roughly) following the method of:

Timo Hackel, Jan D. Wegner, and Konrad Schindler. Fast semantic segmentation of 3D point clouds with strongly varying density. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Prague, Czech Republic, 3:177–184, 2016.

Which is for the most part conveniently implemented in CGAL.
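For anyone curious, the core of that method is feeding eigenvalue-based geometric features of each point’s neighborhood (computed at several scales) into a classifier. A rough numpy sketch of a few of those features for a single neighborhood at one scale (the paper uses many more, and this isn’t the CGAL implementation):

```python
import numpy as np

def covariance_features(neighbors):
    """Eigenvalue-based geometric features of a point's neighborhood,
    in the spirit of Hackel et al. (2016). `neighbors` is an (N, 3)
    array of the k nearest neighbors at one scale."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    # Eigenvalues sorted descending: l1 >= l2 >= l3 >= 0
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
    eps = 1e-12  # guard against division by zero
    return {
        "linearity":  (l1 - l2) / (l1 + eps),
        "planarity":  (l2 - l3) / (l1 + eps),
        "sphericity": l3 / (l1 + eps),
    }

# Points scattered on a flat plane should score high on planarity.
rng = np.random.default_rng(0)
plane = np.c_[rng.uniform(size=(100, 2)), np.zeros(100)]
feats = covariance_features(plane)
```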


First attempt:




Sure, but there are more options than that. You could use filters.range to query RGB values or height-above-ground values, or a standard-deviation limit based on some clustering function for finding shapes, and you can tag those queries for classifying, or probably set a new classification value (I haven’t tried this yet).

Here’s me just messing around with different filters in a pipeline. This isn’t a polished example, but it tags the original point cloud, samples it for faster ground classification, then merges the classified cloud back in to create a HAG (height above ground), which I then filter to the limits -0.5 m to 50 m. I imagine with a few more lines of code you could tag vertical intervals based on some max z value and RGB value to stratify vegetation.
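A rough sketch of a pipeline along those lines, as PDAL pipeline JSON (stage names from the PDAL docs, filenames and radii are placeholders; `filters.hag_nn` is the newer name for the HAG filter, older PDAL releases call it `filters.hag`):

```python
import json

# Sketch of the pipeline described above: keep the original cloud,
# ground-classify a thinned copy for speed, merge the two, compute
# height above ground, and drop points outside -0.5 m..50 m HAG.
pipeline = {
    "pipeline": [
        {"type": "readers.las", "filename": "input.las", "tag": "original"},
        {"type": "filters.sample", "radius": 1.0, "tag": "thinned"},  # subsample for speed
        {"type": "filters.smrf", "tag": "ground"},                    # ground classification
        {"type": "filters.merge", "inputs": ["original", "ground"]},
        {"type": "filters.hag_nn"},                                   # height above ground
        {"type": "filters.range", "limits": "HeightAboveGround[-0.5:50]"},
        {"type": "writers.las", "filename": "output.las"},
    ]
}

print(json.dumps(pipeline, indent=2))
```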

But your first attempt looks pretty great with that code!


It’s like CGAL Polyhedron, if I understand correctly. It needs some enhancements and tweaks to perform better.


Second attempt:


That’s very nice!

I hope to get that working in R.

I would need a function that searches for specific features and gives me a centroid point for each feature.

It would really help me in my field of work.


Attempt #3 (without local smoothing) after rewriting the classifier code from scratch without CGAL:


Oh, this is phenomenal!

Well, a bit noisy; I think there’s lots of room for improvement. I’m going to replace random forests with gradient boosting to see how that performs, try various scales, and I haven’t yet verified that all the features are computed correctly.


Attempt #4.

15 million points, classified in ~6 minutes on a scrappy laptop.


Very cool.

What classes are being used? The standard LAS/ASPRS ones?

The addition of Eye Dome Lighting helps legibility a ton, too.