I am investigating the status of the 3.6.0 release. It seems like it has gone through several iterations, but no changes have been made since November.
As far as I can tell, the last attempt failed with:
Exception: Python bindings of GDAL 3.11.1 require at least libgdal 3.11.1, but 3.8.4 was found
I am trying to reproduce locally (using GitHub Codespaces), but it is taking a long time to iterate.
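For anyone else trying to reproduce this, the error above comes down to a version comparison between the Python bindings and the installed libgdal. Below is a minimal sketch of that check, useful as a quick sanity test before kicking off a long build. The function names `parse_version` and `bindings_compatible` are made up for illustration and are not part of GDAL or ODM; the version strings are the ones from the error message.

```python
import re


def parse_version(text):
    """Turn a version string such as '3.11.1' into a tuple of ints (3, 11, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def bindings_compatible(bindings_version, libgdal_version):
    """Return True when libgdal is at least as new as the Python bindings.

    Hypothetical helper: it only mirrors the rule stated in the error
    message, it does not call into GDAL itself.
    """
    return parse_version(libgdal_version) >= parse_version(bindings_version)


if __name__ == "__main__":
    # Versions taken from the failing build log.
    print(bindings_compatible("3.11.1", "3.8.4"))  # prints False
```

The versions are compared as integer tuples on purpose: compared as plain strings, "3.8.4" sorts after "3.11.1" and the check would wrongly pass. On the build machine, `gdal-config --version` reports the libgdal version to feed in as the second argument.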
On top of that, it seems some decisions were made along the way, like updating several library versions and using a virtual environment. I assume this was due to U24 not allowing a system-wide pip install. A lot of the information and discussion seems to be on GitHub only; maybe we can summarize here the main rationale for the new version changes and the current state.
On that topic, OpenSfM moved to conda during the update to Ubuntu 24.04; maybe we can follow a similar approach for ODM.
I (personally) would start from the last stable release (3.5.6) and manually verify every update/change that has been done since. Sounds like a lot of work (it is), but everything needs to be reviewed anyway. At least starting from 3.5.6 you can be sure that things were working.
Edit: I’ll add, the 3.6.0 update probably broke things that not even OATS can detect. After the program builds, stuff needs to be tested quite exhaustively (OATS can help to some extent, but it is not a replacement for manual testing and inspection in some cases).
For the conda part: so far it’s been a pleasure to use conda.
The only tricky thing is that I recommend pinning the exact build; otherwise, it might pick a build that wasn’t built with some of the dependencies we’d like to use.
For example, you can get Ceres, but we want a Ceres built with METIS. Conda might silently switch to a newer package some day (because of a new build in the forge) that wasn’t built with METIS.
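To make the pinning point concrete, here is a rough sketch of a check that flags dependency specs which are not pinned down to the build. It assumes the three-part `name=version=build` form that `conda env export` writes. The helper names and the example build string are hypothetical, not taken from OpenSfM's or ODM's actual environment files.

```python
def is_exact_build_pin(spec):
    """Return True for specs of the form name=version=build with no wildcards."""
    parts = spec.strip().split("=")
    if len(parts) != 3:
        return False
    _name, version, build = parts
    # Empty parts (e.g. 'name==1.0') or wildcards leave room for a silent swap.
    return all(parts) and "*" not in version and "*" not in build


def unpinned_specs(specs):
    """Return the specs that conda could resolve to a different build later."""
    return [spec for spec in specs if not is_exact_build_pin(spec)]


if __name__ == "__main__":
    # Hypothetical dependency list; the build string is invented for illustration.
    deps = [
        "ceres-solver=2.1.0=habc1234_0",
        "ceres-solver=2.1.0",
        "numpy>=1.26",
    ]
    print(unpinned_specs(deps))
```

Something like this could run in CI against the environment file, so that a loosened pin shows up in review rather than as a Ceres without METIS months later.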
As always, there are trade-offs. From what I gather, conda has gotten substantially better over the years. Given that many of our libraries rely on bindings, it may be best to adopt it sooner rather than later.
In principle, I agree with your advice, but in this case:
We are upgrading the OS, and usually there are discontinuities; some things get deprecated and force library updates, which cascade down into the projects. I do not see an easy one-change approach here, but rather a “leap of faith”. (If you see it differently, I am happy to listen.)
It seems that Sam and Stephen did some testing and work on this, so I would like a summary of their findings. Maybe it would be useful, even if we end up taking a different road.
As a side note, I would like to make and document a “proper” release process. While I like automation, I agree that, at least for releases, a rigorous QA should be followed. I believe you have invaluable knowledge that we could try to document for other folks to follow. The process and learnings will also be useful for contributors and for day-to-day development, to validate their PRs.
Original pull request is here thanks to @nathanmolson’s very thoughtful and thorough work:
High level:
Started with a pull request pushing the base image to 24.04. Discussed vendored repo strategy, the smoother library updates, the Windows build, GDAL/NumPy versions and environments, with some patches for API changes in GDAL, scikit, and CODEM.
So, if I’m remembering / reading through the history right, the issue is that we can docker build and run on a given CPU, and it passes testing in OATS, but the portable builds portion, which is part of the release, is the broken step.
Edit: We did mostly manual testing up to #1958, where we did a combo of OATS and manual review.
I had a look; it seems that both the GPU and Portable GitHub Actions are failing (for different reasons).
Portable seems unable to find the correct GDAL version. It seems like we updated the apt repos, but I can’t find anywhere where we actually pull the library with apt-get.
For GPU, it seems our library relies on CUDA headers. The Dockerfile is based on a CUDA 13.0 image, but there are 12.5 images with U24.04, so it should be fixable (at least that error).
I tried to reproduce the “GitHub Actions” run using Codespaces, but the compilation seems to run out of memory (OOM). I’ll spend some cycles on this.
If I may ask, is there any reason why this last push was abandoned in Nov? It looks like you guys did a lot of heavy lifting already.
Sounds good. Let me know if you need a runner with more oomph. The CI build process is fairly slow as is. I can set something up on faster hardware if desired. Testing and pulling in Sam’s changes here will also help with build iteration timing, though maybe not OOM.
Yes, we did most of the hard work. Not abandoned, but paused. With WebODM pinned at pulling 3.5.6, I prioritized maintainer recruitment and getting budget in place. Hopefully we teed you up for a good first win.
Quick update, I can reproduce the issue “locally” now (I am using GitHub Codespaces). Basically, the issue with portable build is the cross-compilation to ARM64, which is missing some of the packages from the repo. I’ll dig around to see if there are arm64 repos with gdal, or otherwise we might need to compile from source.
Everything in the main Dockerfile was working well, with tests via OATS all passing (confirmed by both me and @smathermather). The next issues to address were the portable and GPU builds, but I ran out of time there. Thanks for picking it up.