Split Merge on Cluster -- expected behavior

I’m running split-merge through WebODM → ClusterODM → nodes on a moderately large dataset. As I remember it, detect_features and match_features previously ran on both the primary and secondary nodes. In this run, they appear to be running only on the primary node. Does this mean it’s not running as a distributed cluster, or has the behavior changed?

[INFO]    Loading 22707 images
[INFO]    Found 22707 usable images
[INFO]    Parsing SRS header: WGS84 UTM 37S
[INFO]    Finished dataset stage
[INFO]    Running split stage
[INFO]    Large dataset detected (22707 photos) and split set at 3000. Preparing split merge.
[INFO]    Writing exif overrides
[WARNING] Legacy option --resize-to (this might be removed in a future version). Use --feature-quality instead.
[INFO]    PyOpenCL is missing (not a GPU build)
[INFO]    Altitude data detected, enabling it for GPS alignment
[INFO]    Enabling hybrid bundle adjustment
[INFO]    ['use_exif_size: no', 'flann_algorithm: KDTREE', 'feature_process_size: -1', 'feature_min_frames: 8000', 'processes: 40', 'matching_gps_neighbors: 12', 'matching_gps_distance: 0', 'optimize_camera_parameters: yes', 'undistorted_image_format: tif', 'bundle_outlier_filtering_type: AUTO', 'align_orientation_prior: vertical', 'triangulation_type: ROBUST', 'retriangulation_ratio: 2', 'camera_projection_type: BROWN', 'feature_type: SIFT', 'use_altitude_tag: yes', 'align_method: auto', 'bundle_interval: 100', 'bundle_new_points_ratio: 1.2', 'local_bundle_radius: 1', 'submodels_relpath: ../submodels/opensfm', 'submodel_relpath_template: ../submodels/submodel_%04d/opensfm', 'submodel_images_relpath_template: ../submodels/submodel_%04d/images', 'submodel_size: 3000', 'submodel_overlap: 150']
[INFO]    running /code/SuperBuild/src/opensfm/bin/opensfm extract_metadata "/var/www/data/4870e40f-1cc3-4b8b-ac4e-c6d9c3605f57/opensfm"
[INFO]    running /code/SuperBuild/src/opensfm/bin/opensfm detect_features "/var/www/data/4870e40f-1cc3-4b8b-ac4e-c6d9c3605f57/opensfm"
[INFO]    running /code/SuperBuild/src/opensfm/bin/opensfm match_features "/var/www/data/4870e40f-1cc3-4b8b-ac4e-c6d9c3605f57/opensfm"
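For a rough sense of how many work units this split produces, the sketch below estimates the submodel count from the values in the log (22707 images, submodel_size: 3000). It assumes the split stage groups images into ceil(N / submodel_size) clusters, which is my understanding of how OpenSfM's submodel creation picks its k-means cluster count but is not confirmed anywhere in this thread; the helper name is hypothetical. Also note that submodel_overlap: 150 is, as I understand the ODM docs, a radius (not an image count) within which nearby images are added to each cluster, so real submodels will be somewhat larger than the average printed here.

```python
import math


def estimate_submodels(n_images: int, submodel_size: int) -> int:
    """Hypothetical helper: cluster count, ASSUMING K = ceil(n_images / submodel_size)."""
    return math.ceil(n_images / submodel_size)


# Values taken from the log above.
N_IMAGES = 22_707
SUBMODEL_SIZE = 3_000  # split / submodel_size

k = estimate_submodels(N_IMAGES, SUBMODEL_SIZE)
avg = N_IMAGES / k  # average images per cluster BEFORE overlap images are added
print(f"~{k} submodels, ~{avg:.0f} images each before overlap")
```

With these numbers the estimate comes out to 8 submodels of roughly 2838 images each, before the overlap radius pulls in neighboring images.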

Interesting. I restarted it, but this time explicitly set a ClusterODM node, and now it’s behaving as expected:

[INFO]    Found 22707 usable images
[INFO]    Parsing SRS header: WGS84 UTM 37S
[INFO]    Finished dataset stage
[INFO]    Running split stage
[INFO]    Setting max-concurrency to 39 to better handle remote splits
[INFO]    Large dataset detected (22707 photos) and split set at 3000. Preparing split merge.

That’s a big dataset…

How long does it take to process?

I processed 91k images in ~2 weeks across 6 machines. This one will probably take a week, depending on machine availability; with more compute it could go faster. On a rented cluster, I could probably finish it in about 2 days just by spinning up more instances.
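The timing figures above can be turned into a back-of-the-envelope capacity estimate. This sketch assumes throughput scales linearly with the number of machines, which is likely optimistic because not every split-merge stage necessarily runs in parallel across nodes. The input figures are the approximate ones quoted in the thread ("91k", "~2 weeks", 6 machines); the constant and function names are illustrative, not part of any ODM tooling.

```python
import math

# Approximate figures quoted in the thread.
PREV_IMAGES = 91_000
PREV_DAYS = 14
PREV_MACHINES = 6

# Illustrative metric: images per machine per day, ASSUMING linear scaling.
images_per_machine_day = PREV_IMAGES / (PREV_DAYS * PREV_MACHINES)


def machine_days(n_images: int) -> float:
    """Total machine-days of work for n_images at the previous rate."""
    return n_images / images_per_machine_day


def machines_needed(n_images: int, target_days: float) -> int:
    """Smallest whole number of machines to finish within target_days."""
    return math.ceil(machine_days(n_images) / target_days)


if __name__ == "__main__":
    n = 22_707
    print(f"{images_per_machine_day:.0f} images per machine-day")
    print(f"{machine_days(n):.1f} machine-days for {n} images")
    print(f"{machine_days(n) / PREV_MACHINES:.1f} days on {PREV_MACHINES} machines")
    print(f"{machines_needed(n, 2)} machines to finish in about 2 days")
```

Under the linear assumption this works out to roughly 21 machine-days of work for this dataset, or about 11 machines for a 2-day turnaround; the gap between that and the "probably a week" estimate above reflects machine availability and any stages that don't scale out.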


That’s just amazing!

How did this project end? Is this a world record?


Ha! I think it is a world record; I haven’t found a larger project. I will be finishing up and reporting on this in the next couple of months.


@smathermather-cm Could you please clarify the difference in configuration between your first and second posts?

On my cluster, I’m using WebODM to manually select the cluster as the processing node, AND using the --sm-cluster flag in the settings to point to the same cluster.

I haven’t noticed the behavior you describe so I’m keen to test.

Congratulations!

I want to know more about this; it’s amazing.


I ran it with the processing node set to the cluster in the drop-down and without the --sm-cluster flag, and it ran as a distributed split-merge. Then I stopped it for reasons and restarted it from load dataset, and it started running as a single-node split-merge. Then I set the --sm-cluster flag, restarted, and it ran as a distributed split-merge.


So if I’m getting this right:

Cluster in the drop-down + --sm-cluster causes distributed split-merge.

Cluster in the drop-down and NO --sm-cluster option causes single-node split-merge (somewhere on the cluster?)

To me this isn’t intuitive, because if I point WebODM at the cluster, it should distribute regardless of the --sm-cluster option. The behavior you described is more like simply adding nodes to WebODM in a non-cluster configuration, where individual tasks are farmed out in their entirety. Perhaps when the user populates the processing node sidebar, WebODM does not automatically detect whether the node is a cluster, so the --sm-cluster option is still required to make it explicit.

This might be worth clarifying in the docs on administering a cluster from WebODM, or a candidate for config cleanup down the line.

Honestly, I think it’s a bug, but maybe only with the new rerun logic. It’s not a documentation issue: if you send a cluster a split job, it should distribute to nodes whether or not you specify the flag. That flag is only meant for the command line, where a standalone ODM instance acts as the parent of the process.
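To make the suspected bug concrete, the sketch below encodes the intended rule described above (a split job sent to ClusterODM distributes regardless of the flag; the flag covers command-line runs with ODM as the parent) and checks it against the three runs reported earlier in the thread. The Mode enum, expected_mode, and mismatches are hypothetical names for illustration only; this is not ODM's or WebODM's actual logic.

```python
from enum import Enum


class Mode(Enum):
    SINGLE_NODE = "single-node split-merge"
    DISTRIBUTED = "distributed split-merge"


def expected_mode(sent_to_cluster: bool, sm_cluster_flag: bool) -> Mode:
    """Hypothetical model of the intended rule: a split job sent to ClusterODM
    distributes regardless of the flag; the flag alone covers CLI runs."""
    if sent_to_cluster or sm_cluster_flag:
        return Mode.DISTRIBUTED
    return Mode.SINGLE_NODE


# Runs reported in the thread: (description, sent_to_cluster, flag, observed)
OBSERVED = [
    ("first run, cluster selected, no flag", True, False, Mode.DISTRIBUTED),
    ("restart from load dataset, no flag", True, False, Mode.SINGLE_NODE),
    ("restart with --sm-cluster set", True, True, Mode.DISTRIBUTED),
]


def mismatches(runs):
    """Descriptions of runs whose observed mode differs from the intended one."""
    return [desc for desc, cluster, flag, seen in runs
            if expected_mode(cluster, flag) is not seen]


for desc in mismatches(OBSERVED):
    print("unexpected:", desc)
```

Only the restarted run without the flag disagrees with the intended rule, which is consistent with the suspicion that the problem lies in the rerun path rather than in the documentation.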
