Dear OpenDroneMap Community,
Quan and I have had the pleasure of working with Steve this winter term on a project to adapt OpenDroneMap to run on HPC (high-performance computing) clusters. It has been an enjoyable experience for me personally, and I want to share our progress with all of you.
We started out not knowing much. I had only used ODM a few times before and hardly knew how it was structured. It was also my first exposure to anything related to HPC. I was afraid, but also curious about what I could learn and how I could use that knowledge to improve ODM.
Our main challenge in adapting ODM to HPC is that HPC environments are typically “rootless”: users do not have root privileges, so everything has to run without them. OpenDroneMap uses Docker to build and run its containers, and these Docker containers need root permissions to create directories, run executables, and so on. Docker therefore did not seem like a good fit for HPC. We considered two alternatives to Docker: Podman and Singularity/Apptainer.
Podman looked really promising to us at first. Its syntax is similar to Docker's, and it seemed like a rootless version of Docker, so we started with it. However, after a lot of implementation work and workarounds, it turned out to be less promising than we had hoped. First, Podman requires a lot of configuration, and you need the correct permissions and privileges before you can run anything with it. Second, even after we finished this configuration, Podman had trouble working with SLURM. It appears that NFS is not fully supported by Podman, so our experimentation came to a dead end. Podman works well if you run NodeODM and ClusterODM manually, but it struggles under a scheduling system like SLURM. We had to find another way.
That’s when we tried Apptainer. Apptainer only requires a basic installation, and it seems to be the go-to platform for HPC clusters. However, ODM did not run as expected on Apptainer. NodeODM and ClusterODM ran well, but when we started experimenting with starting a task, it failed halfway through. Steve suggested that the error might be caused by our use of a Docker container. Since Apptainer is designed to run its own native container images, it seemed worth building an Apptainer container and running that instead. We built an Apptainer container from a definition file ported from the Dockerfile, and it finally worked. It runs perfectly on HPC and with SLURM as well. After many experiments, we wrote up documentation, and we hope you can all benefit from our experience working on this awesome project.
With that, we were able to conclude our winter term with a success. At least, that is how I see this experience. I am really grateful to Steve and Quan, my colleagues on this project, and want to thank them for their help. I hope this thread offers some insight into our experience working to improve this awesome project. For all the ODM lovers out there, I hope this fuels your love for drones and computers and encourages you to keep building interesting projects in the future.