That looks like it could be a good task for object detection and extraction. I’ve seen work on this using LiDAR-derived point clouds at my former university, so a quick search of the literature might turn up projects other folks are working on.
Another direction might be automatic segmentation of the 3D model.
If I understand you correctly, you may need two steps to get there:
1. Extract the building bodies from the 3D scene.
2. Convert the building roofs to polygons and compute the precise roof dimensions.
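For the second step, once a roof outline exists as a polygon, the dimensions are simple geometry. Here is a minimal sketch, assuming the outline is a simple (non-self-intersecting) polygon with vertices in order and in a metric coordinate system. The function names and the sample rectangle are made up for illustration, and the result is the planar (projected) area.

```python
import math

def polygon_area(vertices):
    """Shoelace formula: planar area of a simple polygon from ordered (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def edge_lengths(vertices):
    """Length of each polygon edge, in vertex order (last edge closes the ring)."""
    n = len(vertices)
    return [math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n)]

# Hypothetical 10 m x 6 m rectangular roof outline, coordinates in metres
roof = [(0.0, 0.0), (10.0, 0.0), (10.0, 6.0), (0.0, 6.0)]
print(polygon_area(roof))
print(edge_lengths(roof))
```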
As Saijin_Naib says, point cloud segmentation could be one way to extract the buildings, but the results depend on point cloud quality. You may need LiDAR to get a point cloud good enough for the segmentation.
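To give a feel for what point cloud segmentation can look like, here is a rough sketch of RANSAC plane fitting, which is one common way to pull planar surfaces such as roof faces out of a point cloud. This is my own illustration, not necessarily the method meant above. The threshold, iteration count, and synthetic points are all assumptions, and real data would need a proper library and tuning.

```python
import random

def fit_plane(p1, p2, p3):
    """Plane (a, b, c, d) with unit normal through three points, or None if collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal vector is the cross product of the two edge vectors
    a, b, c = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (a * a + b * b + c * c) ** 0.5
    if norm < 1e-12:
        return None
    a, b, c = a / norm, b / norm, c / norm
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return the plane with the most points within `threshold` of it, plus their indices."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(a * x + b * y + c * z + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Synthetic example: a flat roof at z = 5 m plus some lower clutter points
rng = random.Random(1)
roof_points = [(x * 0.5, y * 0.5, 5.0) for x in range(10) for y in range(10)]
clutter = [(rng.uniform(0, 5), rng.uniform(0, 5), rng.uniform(0, 4)) for _ in range(20)]
points = roof_points + clutter

plane, inliers = ransac_plane(points)
print(len(inliers), plane)
```

On noisy photogrammetry point clouds the threshold has to be much looser, which is part of why the point cloud quality matters so much.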
After the building extraction, you can do the roof vectorization. A convolutional neural network for segmentation, such as U-Net or SegNet, may help here (but you would need to convert the extracted buildings into 2D images and try to keep the coordinate information).
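On keeping the coordinate information: one simple approach is to rasterize the points onto a regular grid and store the grid origin and cell size alongside the image, so any pixel can be mapped back to world coordinates. The sketch below assumes a north-up grid with square cells. The function names, the max-height rule, and the sample points are my own choices for illustration.

```python
import math

def rasterize(points, cell_size):
    """Grid (x, y, z) points into a max-height raster.

    Returns the raster (rows of cells, None where empty) and a georeference
    tuple (origin_x, origin_y, cell_size) for the top-left corner.
    """
    min_x = min(p[0] for p in points)
    max_x = max(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_y = max(p[1] for p in points)
    cols = int(math.floor((max_x - min_x) / cell_size)) + 1
    rows = int(math.floor((max_y - min_y) / cell_size)) + 1
    grid = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        col = int((x - min_x) / cell_size)
        row = int((max_y - y) / cell_size)  # row 0 is the northern edge
        if grid[row][col] is None or z > grid[row][col]:
            grid[row][col] = z
    return grid, (min_x, max_y, cell_size)

def pixel_to_world(row, col, georef):
    """World (x, y) of the centre of a pixel, using the stored georeference."""
    origin_x, origin_y, cell = georef
    return origin_x + (col + 0.5) * cell, origin_y - (row + 0.5) * cell

# Hypothetical points in a projected CRS (metres)
points = [(100.0, 200.0, 3.0), (100.4, 200.0, 5.0), (102.0, 198.0, 7.0)]
grid, georef = rasterize(points, cell_size=1.0)
print(grid)
print(pixel_to_world(0, 0, georef))
```

Polygons predicted on the image can then be converted back with `pixel_to_world`, so the roof outlines end up in the same coordinates as the point cloud.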
(I have a question here: for the roof length/area, do you need the projected or the actual value? Your sample image seems to show the projected value. If that is what you need, you can get it from the orthophoto, which could make the detection, segmentation, and vectorization easier.)
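To show why the projected vs. actual distinction matters: for a single flat roof face with a uniform pitch, the actual (sloped) area is the projected area divided by the cosine of the pitch. The same factor applies to lengths measured along the slope, while lengths along the ridge are unchanged. A small sketch, where the function name and the numbers are just for illustration:

```python
import math

def actual_roof_area(projected_area, pitch_degrees):
    """Sloped area of one planar roof face from its projected (plan-view) area."""
    return projected_area / math.cos(math.radians(pitch_degrees))

# Hypothetical roof face: 60 m^2 in the orthophoto
for pitch in (0.0, 30.0, 45.0):
    print(pitch, round(actual_roof_area(60.0, pitch), 2))
```

So if the actual value is needed, the pitch has to come from the 3D data, since the orthophoto alone cannot provide it.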