I am interested in mapping 2D pixel values from the original images to the point cloud. The inverse mapping (point cloud to pixel values) is quite straightforward using the build-in methods in OpenSfM:

However, I did not manage to find the suitable methods to map a 2D pixel in the original image to the corresponding point in the 3D point cloud. From my limited experience in Metashape, there is a direct approach for this mapping, however, Iâ€™m still unable to accomplish this even indirectly.

I understand I need some form of scaling with the depth of the point, but there are no details in the documentation on how to obtain the correct depth for the given pixel.

I have several ideas in mind. One of them is to annotate a single image, using a bounding box for example, and use the point cloud to annotate the overlapping images automatically.

Thatâ€™s really powerful stuff! Unfortunately, Iâ€™m not going to be a great resource at this point as my understanding of the code is pretty poor. Maybe someone more knowledgeable will pop in soon!

Would you be interested in pushing these abilities upstream once you get them working? I think theyâ€™d be a huge asset for our community!

From a cursory search, it looks like this was at least asked of OpenSFM a while back, and it looks like @Yannoun provided some nice pointers. Reading the above, it looks like this is all old news to you, though.

Indeed I have found this answer. Unfortunately, there are no guidelines or an efficient implementation for the depth estimation nor the intersection with the point cloud.

@oringa Any chance you have a solution I could look at? Trying to get this working but Iâ€™m not quite thereâ€¦ here is what Iâ€™ve got so far using the resources linked above. Iâ€™m quite new to this so any insight/help is welcome and much appreciated!

Iâ€™m also doing this in the context of WebODMâ€¦ Iâ€™ve modified the NodeODM Docker image to output the opensfm data so that I can access it at --media-dir location for running this new action below.

[edit] codeâ€¦ its pretty close but still the projection is off a bit

def run_dataset(data: DataSet, filename, x, y, output):
"""Compute the position of a point in 3D from 2D pixel
Args:
output: undistorted
"""
# Load the reconstruction, shot, pose and camera
rec: Reconstruction = data.load_reconstruction()[0]
shot: Shot = rec.shots[filename]
pose: Pose = shot.pose
camera: Camera = shot.camera
# relative pixel coords
pt2d = camera.pixel_to_normalized_coordinates((x, y))
# bearing
bearing = camera.pixel_bearing(pt2d)
print('\nBearing')
print(bearing)
# bearing /= bearing[2]
# Camera Dimensions
print('\nCamera Dimensions')
print(f'{camera.width} x {camera.width}')
# Load depthmap dataset
udata_path = os.path.join(data.data_path, output)
udata = dataset.UndistortedDataSet(data, udata_path)
dMap, plane, score, nghbr, nghbrs = udata.load_raw_depthmap(shot.id)
print('\nDepthmap Size')
print(f'{dMap[0].size} x {dMap[1].size}')
print('\nMax of camera width/height')
m = max(camera.width, camera.height)
print(m)
print('\nScaling for X/Y')
scaleX = dMap[0].size / m
scaleY = dMap[1].size / m
print(f'{scaleX} / {scaleY}')
print('\nDepth Coordinates')
dx = int(round((scaleX * float(x)), 1))
dy = int(round((scaleY * float(y)), 1))
print(f'{dx} x {dy}')
print('\nPoint Depth')
depth = dMap[dx][dy]
if (depth == 0.0):
depth = estimate_depth(dMap, dx, dy, 2)
print(depth)
# adjust depth
pt3d = bearing * depth
print('\nPoint 3D (bearing * depth)')
print(pt3d)
##
xxxx = shot.project(pt3d)
pt2D_px = camera.normalized_to_pixel_coordinates(xxxx)
print('\nPixel Reverse Projection')
print(pt2D_px)
##
# this point is in the camera coordinate system
pt3d_world = pose.inverse().transform(pt3d)
print('\nPoint 3D World')
print(pt3d_world)
# gps coords
gps_world = rec.get_reference().to_lla(*pt3d_world)
print('\nGPS World')
print(gps_world)
# ECEF From LLA
ecef = ecef_from_lla(*gps_world)
print('\nECEF')
print(ecef)
# topocentric
topo = topocentric_from_lla(gps_world[0], gps_world[1], gps_world[2], 0, 0, 0)
print('\nTopocentric')
print(topo)
# UTM
utmdata = utm.from_latlon(gps_world[0], gps_world[1])
print('\nUTM')
print(utmdata)
# 3D Point Annotation
print('\nPoint Annotation')
print(f'{utmdata[0]}, {utmdata[1]}, {gps_world[2]}')
def estimate_depth(map, x, y, steps):
step = 1
depth = map[x][y]
print('point depth')
print(depth)
if depth != 0.0:
while(step <= steps):
sum = 0
devisor = 0
for i in range(x-step, x+step+1):
devisor += 1
for j in range(y-step, y+step+1):
sum = sum + map[i][j]
devisor += 1
average = sum / devisor
if depth != 0:
depth = (depth + average) / 2
else:
depth = average
print('average depth')
print(depth)
step += 1
return depth

This was my general approach: Given the origin of the camera (p1) and the normalized coordinates of a pixel (p2), I calculated the shortest distance from the ray given by p1,p2 to all points in the point cloud. This code is relatively fast however it is possible to increase its performance by using a better data structure, which I did not implement yet. Let me know if that helps.

From here, you can use nube.points[np.argmin(res)] in order to find the 3D coordinates of the point in the point cloud corresponding to the original x,y values of the pixel. and project it to all other images in your corpus to obtain the images in which that pixel appears.

Thank you for the explanation! I think this makes sense but now I have a new problem. Iâ€™m using the WebODM viewer (potree) to check if the 3d point matches up to where it should be on the point cloud by adding an annotation like thisâ€¦

If I understand correctly, the 3d coordinate needs to be converted to UTM coordinates to add an annotation to the potree point cloud viewer. So Iâ€™m doing the following based on various forum posts Iâ€™ve readâ€¦ It gets close sometimes but in most cases nowhere near where Iâ€™m expecting the annotation to be based on the pixel iâ€™ve selected.

What is the best method of verifying the 3D point is actually projected to the expected location in the viewer?

Just for referenceâ€¦ Iâ€™d like to annotate (in potree viewer) multiple points from pixel coordinates then show all images containing that point when the annotation is clicked.

Are you inserting a custom 3D point that was not present in the original point cloud? Did you try calculating the closest point in the original point cloud to that given point (should be trivial in numpy, however, my solution does not support your use case) and mapping it back to the image?