Mapping 2D pixel values to 3D point cloud coordinates

Hello.

I am interested in mapping 2D pixel values from the original images to the point cloud. The inverse mapping (point cloud to pixel values) is quite straightforward using the built-in methods in OpenSfM:

shot = rec.shots[image]
pt2D = shot.project(pt3D)
pt2D_px = cam.normalized_to_pixel_coordinates(pt2D)

However, I have not managed to find suitable methods for mapping a 2D pixel in the original image to the corresponding point in the 3D point cloud. From my limited experience with Metashape, there is a direct approach for this mapping there, but here I have not been able to accomplish it even indirectly.

I have tried the following:

import open3d as o3d
from opensfm import dataset

nube = o3d.io.read_point_cloud('./opensfm/undistorted/openmvs/scene_dense_dense_filtered.ply')
data = dataset.DataSet('./opensfm')
rec = data.load_reconstruction()[0]
shot = rec.shots[image]
pose = shot.pose
cam = shot.camera
pt2D = cam.pixel_to_normalized_coordinates(pt2D_px)
bearing = cam.pixel_bearing(pt2D)                # unit ray in the camera frame
pt3D_world = pose.inverse().transform(bearing)   # still missing the depth scale

I understand I need some form of scaling with the depth of the point, but there are no details in the documentation on how to obtain the correct depth for the given pixel.
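For what it’s worth, here is a minimal sketch of how the back-projection could look if a per-pixel depth were available, for example from OpenSfM’s raw depthmaps. The image name, pixel coordinates, the depthmap scaling, and the assumption that the stored depth is a z-depth in the camera frame are all mine, not something confirmed by the documentation:

import os
from opensfm import dataset

data = dataset.DataSet('./opensfm')
udata = dataset.UndistortedDataSet(data, os.path.join(data.data_path, 'undistorted'))
rec = data.load_reconstruction()[0]

image = 'example.jpg'     # hypothetical image name
x, y = 1000.0, 800.0      # hypothetical pixel coordinates

shot = rec.shots[image]
cam = shot.camera
depth_map, plane, score, nghbr, nghbrs = udata.load_raw_depthmap(image)

# The raw depthmap is a downscaled image; map the pixel into its coordinates
# (assuming rows correspond to y and columns to x)
scale = depth_map.shape[1] / float(cam.width)
row, col = int(round(y * scale)), int(round(x * scale))
depth = depth_map[row, col]   # 0.0 means no depth was estimated for this pixel

# Back-project: assume depth is a z-depth, so scale a bearing normalized to z == 1
bearing = cam.pixel_bearing(cam.pixel_to_normalized_coordinates([x, y]))
pt3D_world = shot.pose.inverse().transform((bearing / bearing[2]) * depth)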

Thanks.

3 Likes

Welcome!

This seems like very interesting work. May I ask what you’re looking to accomplish with this “backwards” mapping from 2D to Point Cloud?

2 Likes

Thank you for the interest 🙂

I have several ideas in mind. One of them is to annotate a single image, using a bounding box for example, and then use the point cloud to automatically annotate the overlapping images.

1 Like

That’s really powerful stuff! Unfortunately, I’m not going to be a great resource at this point as my understanding of the code is pretty poor. Maybe someone more knowledgeable will pop in soon!

Would you be interested in pushing these abilities upstream once you get them working? I think they’d be a huge asset for our community!

2 Likes

I will be happy to.

2 Likes

Awesome!

From a cursory search, it looks like this was asked of OpenSfM a while back, and @Yannoun provided some nice pointers. Reading the above, though, it seems this is all old news to you.

1 Like

Indeed, I had found that answer. Unfortunately, it offers no guidelines or efficient implementation for either the depth estimation or the intersection with the point cloud.

1 Like

Architecturally, do you think that everything you need to accomplish is doable from within OpenSfM? You would just need to implement new functions?

1 Like

I believe so; however, I have not yet managed to implement a working POC.

1 Like

Hi,

I have managed to get a working POC. I will be happy to push my code once it is efficient enough.

Thank you.

3 Likes

Very cool! Very fast, too!

1 Like

This is really cool stuff! I would love to try out a feature like this. If you want another set of eyes on the code, I’m happy to volunteer.

2 Likes

@oringa Any chance you have a solution I could look at? I’m trying to get this working but I’m not quite there… here is what I’ve got so far using the resources linked above. I’m quite new to this, so any insight/help is welcome and much appreciated!

I’m also doing this in the context of WebODM… I’ve modified the NodeODM Docker image to output the OpenSfM data so that I can access it at the --media-dir location for running this new action below.

[edit] Code below… it’s pretty close, but the projection is still off a bit.

import os

import utm

# Note: module paths may differ slightly between OpenSfM versions.
from opensfm import dataset
from opensfm.dataset import DataSet
from opensfm.geo import ecef_from_lla, topocentric_from_lla
from opensfm.types import Reconstruction


def run_dataset(data: DataSet, filename, x, y, output):
    """Compute the 3D position of a 2D pixel.

    Args:
        filename: name of the shot (image) in the reconstruction
        x, y: pixel coordinates in the original image
        output: name of the undistorted data folder, e.g. "undistorted"
    """

    # Load the reconstruction, shot, pose and camera
    rec: Reconstruction = data.load_reconstruction()[0]
    shot = rec.shots[filename]
    pose = shot.pose
    camera = shot.camera

    # Normalized image coordinates of the pixel
    pt2d = camera.pixel_to_normalized_coordinates((x, y))

    # Bearing (unit ray in the camera frame)
    bearing = camera.pixel_bearing(pt2d)
    print('\nBearing')
    print(bearing)

    # Camera dimensions
    print('\nCamera Dimensions')
    print(f'{camera.width} x {camera.height}')

    # Load the undistorted dataset and the raw depthmap of this shot
    udata_path = os.path.join(data.data_path, output)
    udata = dataset.UndistortedDataSet(data, udata_path)
    dmap, plane, score, nghbr, nghbrs = udata.load_raw_depthmap(shot.id)
    print('\nDepthmap Size (width x height)')
    print(f'{dmap.shape[1]} x {dmap.shape[0]}')

    # The depthmap is a downscaled copy of the image; compute the scale factor
    print('\nMax of camera width/height')
    m = max(camera.width, camera.height)
    print(m)

    print('\nScaling for X/Y')
    scale = max(dmap.shape) / m
    print(scale)

    # Depthmap coordinates of the pixel (rows correspond to y, columns to x)
    print('\nDepth Coordinates')
    dx = int(round(scale * float(x)))
    dy = int(round(scale * float(y)))
    print(f'{dx} x {dy}')

    print('\nPoint Depth')
    depth = dmap[dy][dx]
    if depth == 0.0:
        depth = estimate_depth(dmap, dy, dx, 2)
    print(depth)

    # The raw depthmaps store z-depth, so normalize the bearing to z == 1
    # before scaling; the result is a point in the camera coordinate system
    pt3d = (bearing / bearing[2]) * depth
    print('\nPoint 3D (camera frame)')
    print(pt3d)

    # Move the point from the camera coordinate system to world coordinates
    pt3d_world = pose.inverse().transform(pt3d)
    print('\nPoint 3D World')
    print(pt3d_world)

    # Sanity check: project the world point back and compare with the input pixel
    pt2d_back = shot.project(pt3d_world)
    pt2d_back_px = camera.normalized_to_pixel_coordinates(pt2d_back)
    print('\nPixel Reverse Projection')
    print(pt2d_back_px)

    # GPS coordinates (lat, lon, alt)
    gps_world = rec.get_reference().to_lla(*pt3d_world)
    print('\nGPS World')
    print(gps_world)

    # ECEF from LLA
    ecef = ecef_from_lla(*gps_world)
    print('\nECEF')
    print(ecef)

    # Topocentric (relative to a 0/0/0 reference)
    topo = topocentric_from_lla(gps_world[0], gps_world[1], gps_world[2], 0, 0, 0)
    print('\nTopocentric')
    print(topo)

    # UTM
    utmdata = utm.from_latlon(gps_world[0], gps_world[1])
    print('\nUTM')
    print(utmdata)

    # 3D point annotation (easting, northing, altitude)
    print('\nPoint Annotation')
    print(f'{utmdata[0]}, {utmdata[1]}, {gps_world[2]}')


def estimate_depth(dmap, row, col, steps):
    """Fill a missing depth value by averaging non-zero neighbours."""
    step = 1
    depth = dmap[row][col]
    print('point depth')
    print(depth)
    if depth == 0.0:
        while step <= steps:
            total = 0.0
            divisor = 0
            for i in range(row - step, row + step + 1):
                for j in range(col - step, col + step + 1):
                    if dmap[i][j] > 0:
                        total += dmap[i][j]
                        divisor += 1
            if divisor > 0:
                depth = total / divisor
                print('average depth')
                print(depth)
                break
            step += 1
    return depth
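
For context, a hypothetical invocation of the above (the image name and pixel coordinates are made up):

data = DataSet('./opensfm')                                   # path used earlier in the thread
run_dataset(data, 'DJI_0001.JPG', 1500, 1200, 'undistorted')  # hypothetical image and pixel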
1 Like

Hi.

This was my general approach: given the origin of the camera (p1) and the normalized coordinates of a pixel, I compute a second point on the ray (p2) and calculate the shortest distance from the ray through p1 and p2 to all points in the point cloud. The code is relatively fast, though its performance could be improved with a better data structure, which I have not implemented yet. Let me know if that helps.

p1 = shot.pose.get_origin()                       # camera origin in world coordinates
p2 = cam.pixel_to_normalized_coordinates([x, y])  # normalized coordinates of the pixel
bearing = cam.pixel_bearing(p2)
bearing = bearing / bearing[2]                    # scale the bearing so that z == 1
p2 = shot.pose.inverse().transform(bearing)       # a second point on the ray, in world coordinates
points = np.asarray(nube.points)                  # the dense point cloud
res = np.linalg.norm(np.cross(p2 - p1, p1 - points), axis=1) / np.linalg.norm(p2 - p1)  # point-to-ray distances

From here, you can use nube.points[np.argmin(res)] to find the 3D coordinates of the point in the cloud corresponding to the original (x, y) pixel, and then project it into all the other images in your corpus to obtain the images in which that point appears.
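
As a rough illustration of that last step, here is a small sketch building on the snippet above; the variable names and the in-bounds check are my own, and it ignores occlusion and points behind the camera:

# Nearest cloud point to the ray, in world coordinates
pt3d = points[np.argmin(res)]

# Project it into every shot and keep the images where it lands inside the frame
visible_in = []
for shot_id, other in rec.shots.items():
    u, v = other.camera.normalized_to_pixel_coordinates(other.project(pt3d))
    if 0 <= u < other.camera.width and 0 <= v < other.camera.height:
        visible_in.append(shot_id)
print(visible_in)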

3 Likes

Hello,

Thank you for the explanation! I think this makes sense, but now I have a new problem. I’m using the WebODM viewer (Potree) to check whether the 3D point matches up to where it should be on the point cloud, by adding an annotation like this…

viewer.scene.addAnnotation([374278.6750081396, 4091938.6135286274, -171.18617447465658], {
    "title": "3D Point"
});

If I understand correctly, the 3D coordinate needs to be converted to UTM coordinates to add an annotation to the Potree point cloud viewer, so I’m doing the following based on various forum posts I’ve read… It gets close sometimes, but in most cases it is nowhere near where I expect the annotation to be based on the pixel I’ve selected.

my3dpt = nube.points[np.argmin(res)]
gps_world = rec.get_reference().to_lla(*my3dpt)
utmdata = utm.from_latlon(gps_world[0], gps_world[1])

What is the best method of verifying that the 3D point is actually projected to the expected location in the viewer?

Just for reference… I’d like to annotate (in the Potree viewer) multiple points from pixel coordinates, then show all the images containing each point when its annotation is clicked.
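
One way to sanity-check this before any UTM conversion is a round trip: reproject the chosen cloud point back into the source image with the same calls used earlier in the thread and compare it with the pixel you clicked. A small sketch:

# Round-trip check: cloud point -> pixel, compared with the original (x, y)
reproj_px = cam.normalized_to_pixel_coordinates(shot.project(my3dpt))
print('clicked pixel:    ', (x, y))
print('reprojected pixel:', reproj_px)
# If these disagree by more than a few pixels, the problem is in the 2D -> 3D step
# rather than in the UTM conversion for the Potree annotation.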

1 Like

Are you inserting a custom 3D point that was not present in the original point cloud? Did you try calculating the closest point in the original point cloud to that given point (this should be trivial in NumPy, though my solution does not cover your use case) and mapping it back to the image?
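
For reference, the closest-point lookup mentioned above could look like this; `query` is a hypothetical 3D point expressed in the same coordinate system as the cloud:

import numpy as np

points = np.asarray(nube.points)        # the dense cloud loaded earlier in the thread
query = np.array([10.0, -3.5, 2.0])     # hypothetical 3D point
closest = points[np.argmin(np.linalg.norm(points - query, axis=1))]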

1 Like

I just want to add custom annotations in the Potree viewer
(potree/Annotation.js at 37dc46a25ca56683e216c973dd117bc68b75ab0f · potree/potree · GitHub),
but I cannot figure out how to convert the 3D point to whatever format the Potree viewer is expecting.

I did try calculating the closest point, but it’s not even close after I convert the point to UTM format.

Like this: https://raw.githubusercontent.com/wilsonmichaelc/randomfiles/main/annotationexample.png

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.