ODM 3.6.0 situation report

Hi all,

I am investigating the status of the 3.6.0 release. It seems like it has gone through several iterations, but no changes have been made since November.

As far as I can tell, the last attempt failed with:

Exception: Python bindings of GDAL 3.11.1 require at least libgdal 3.11.1, but 3.8.4 was found
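
For anyone triaging this in CI logs, here is a minimal sketch of how the message above can be parsed and the two versions compared. The helper names (`parse_version`, `bindings_compatible`, `explain_mismatch`) are my own for illustration and are not part of GDAL or ODM. It only inspects the error text and does not import `osgeo`.

```python
import re

# Pattern for the exception text quoted above (assumed to be stable across runs).
MISMATCH = re.compile(
    r"Python bindings of GDAL ([\d.]+) require at least libgdal ([\d.]+), "
    r"but ([\d.]+) was found"
)


def parse_version(text):
    """Turn '3.11.1' into (3, 11, 1) so the comparison is numeric.

    Comparing the raw strings would be wrong: '3.8.4' > '3.11.1' as text.
    """
    return tuple(int(part) for part in text.split("."))


def bindings_compatible(required_libgdal, found_libgdal):
    """True when the installed libgdal is at least the required version."""
    return parse_version(found_libgdal) >= parse_version(required_libgdal)


def explain_mismatch(message):
    """Extract the versions from the error message, or return None if it does not match."""
    match = MISMATCH.search(message)
    if match is None:
        return None
    bindings, required, found = match.groups()
    return {
        "bindings": bindings,
        "required": required,
        "found": found,
        "compatible": bindings_compatible(required, found),
    }


if __name__ == "__main__":
    log_line = (
        "Exception: Python bindings of GDAL 3.11.1 require at least "
        "libgdal 3.11.1, but 3.8.4 was found"
    )
    print(explain_mismatch(log_line))
```

Read this way, the failure says the system libgdal (3.8.4) is older than what the pip-installed bindings (3.11.1) were built for, so either the library has to move up or the bindings have to be pinned down to match.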

I am trying to reproduce locally (using GitHub Codespace), but it is taking a long time to iterate.

On top of that, it seems some decisions were made along the way, like updating several library versions and using a virtual environment. I assume the latter was due to U24 not allowing pip install. A lot of the information and discussion seems to live only on GitHub; maybe we can summarize the main rationale for the new version’s changes and the current state here.

On that topic, OpenSfM moved to conda during the update to Ubuntu 24.04; maybe we can follow a similar approach for ODM.

I recommend using ./start-dev-env.sh for all ODM development (fastest, probably).

The blocker is the docker release action: Merge pull request #1958 from OpenDroneMap/fix/gdal-3.11.1 · OpenDroneMap/ODM@7127aa2 · GitHub

I (personally) would start from the last stable release (3.5.6) and manually verify every update/change that has been done since. Sounds like a lot of work (it is), but everything needs to be reviewed anyway. At least starting from 3.5.6 you can be sure that things were working.

Edit: I’ll add, the 3.6.0 update probably broke things that not even OATS can detect. After the program builds, stuff needs to be tested quite exhaustively (OATS can help to some extent, but in some cases it is not a replacement for manual inspection).

On the conda part: so far it’s been a pleasure to use.

The only tricky thing is that I recommend pinning the exact build; otherwise, conda might pick a build that wasn’t built with some dependencies we’d like to use.

For example, you can get Ceres, but we want a Ceres built with METIS. Conda might silently switch to a newer build some day (because a new build landed in the forge) that wasn’t built with METIS.
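
To make the pinning point concrete, here is a small sketch that flags dependency specs that are not pinned down to the build. It assumes the usual conda `name=version=build` spec form. The package names, versions, and build strings below are hypothetical placeholders, not real conda-forge builds, and the helper names are made up for this example.

```python
def is_exact_build_pin(spec):
    """True for 'name=version=build' specs with no wildcards.

    'pkg=1.2.3' (version only) and 'pkg==1.2.3' (no build string)
    are treated as not pinned to a specific build.
    """
    parts = spec.split("=")
    if len(parts) != 3:
        return False
    # Every field must be non-empty and free of wildcards.
    return all(parts) and "*" not in spec


def unpinned_specs(specs):
    """Return the specs that could silently resolve to a different build."""
    return [spec for spec in specs if not is_exact_build_pin(spec)]


if __name__ == "__main__":
    # Hypothetical dependency list; the build strings are placeholders.
    dependencies = [
        "ceres-solver=2.2.0=hypothetical_metis_build_0",
        "ceres-solver=2.2.0",
        "gdal=3.11.*",
    ]
    for spec in unpinned_specs(dependencies):
        print("not pinned to an exact build:", spec)
```

A check like this could run in CI against the environment file, so a spec that lets the solver pick a different build gets caught before the image is built.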

Good to see you around :slight_smile:

As always, there are trade-offs. From what I gather, conda has gotten substantially better over the years. Given that many of our libraries rely on bindings, it may be best to adopt it sooner rather than later.

In principle, I agree with your advice, but in this case:

  • We are upgrading the OS, and usually there are discontinuities; some things get deprecated and force library updates, which cascade down into the projects. I do not see an easy one-change approach here, but a “leap of faith”. (If you see it differently, I am happy to listen)
  • It seems that Sam and Stephen did some testing and work on this, so I would like a summary of their findings. Maybe it would be useful, even if we end up taking a different road.

As a side note, I would like to define and document a “proper” release process. While I like automation, I agree that, at least for releases, a rigorous QA process should be followed. I believe you have invaluable knowledge that we could try to document for other folks to follow. The process and learnings will also be useful for contributors and for day-to-day development, to validate PRs.

Original pull request is here thanks to @nathanmolson’s very thoughtful and thorough work:

High level:

Started with a pull request pushing the base image to 24.04. Discussed vendored repo strategy, smoother library updates, the Windows build, and GDAL/NumPy versions and environments, with some patches for API changes in GDAL, scikit, and CODEM.

Rough play-by-play:

Followed on with pull requests to pull everything into master, but we neglected the portable builds.

So, if I’m remembering / reading through the history right, the issue is that we can docker build and run on a given CPU, and it passes testing in OATS, but the portable builds step, which is part of the release, is the one that is broken.

Edit: We did mostly manual testing up to #1958, where we did a combo of OATS and manual review.

Got it, so it is “just” the CI acting up.

I had a look; it seems that both the GPU and Portable GitHub Actions are failing (for different reasons).

  • Portable seems unable to find the correct GDAL version. It looks like we updated the apt repos, but I can’t find where we actually pull the library via apt-get.
  • GPU: it seems our libraries rely on CUDA headers. The Dockerfile is based on a CUDA 13.0 image, but there are 12.5 images with U24.04, so it should be fixable (at least that error).

I tried to reproduce the GitHub Actions runs using Codespaces, but the compilation seems to run out of memory (OOM). I’ll spend some cycles on this.

If I may ask, is there any reason why this last push was abandoned in Nov? It looks like you guys did a lot of heavy lifting already.

Sounds good. Let me know if you need a runner with more oomph. The CI build process is fairly slow as is. I can set something up on faster hardware if desired. Testing and pulling in Sam’s changes here will also help with build iteration timing, though maybe not OOM.

Yes, we did most of the hard work. Not abandoned, but paused. With WebODM pinned at pulling 3.5.6, I prioritized maintainer recruitment and getting budget in place. Hopefully we T’d you up for a good first win.

Quick update: I can now reproduce the issue “locally” (I am using GitHub Codespaces). Basically, the issue with the portable build is the cross-compilation to ARM64, which is missing some of the packages from the repo. I’ll dig around to see if there are arm64 repos with GDAL; otherwise we might need to compile from source.

I think you have summarised it all well - sorry for chiming in late!

I’m sure you have already found it, but the apt install is done based on the contents of the snapcraft YAML: ODM/snap/snapcraft24.yaml at master · OpenDroneMap/ODM · GitHub

In order to use GDAL 3.11.1 on Ubuntu 24.04, I added the ubuntugis/ubuntugis-unstable PPA: ODM/snap/snapcraft24.yaml at e2acf27e55a93d725016ab0e8c889d2cbfe06e14 · OpenDroneMap/ODM · GitHub

Everything in the main Dockerfile was working well, with tests via OATS all passing (confirmed by both me and @smathermather). The next issues to address were the portable and GPU builds, but I ran out of time there. Thanks for picking it up :smiley:

(Perhaps I was a bit hasty to assume the issue was the portable wrapper; see “Do we need to maintain support for old x86_64 CPUs (pre-2010)? i.e. Should we require AVX? · Issue #1960 · OpenDroneMap/ODM · GitHub”. That was next on my list to investigate, but I’m glad you found it’s actually an issue with the ARM builds instead.)