Segfault at the aligning submodels stage

I’m processing 5200 images on an EC2 instance with this specification:

  • Ubuntu 20.04
  • 32 vCPUs
  • 128 GB of memory
  • NVIDIA T4 Tensor Core GPU

I've tried running it with both the GPU and the regular ODM Docker images. In both cases, the last message in the container log is:
[INFO] Aligning submodels...

After that, the container exits with exit code 139. Its final state is:

    "State": {
        "Status": "exited",
        "Running": false,
        "Paused": false,
        "Restarting": false,
        "OOMKilled": false,
        "Dead": false,
        "Pid": 0,
        "ExitCode": 139,
        "Error": "",
        "StartedAt": "2021-07-27T08:04:25.465713163Z",
        "FinishedAt": "2021-07-27T11:12:40.418633498Z"
    },
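Docker follows the shell convention that an exit code above 128 means the process was killed by signal number (code minus 128). So 139 corresponds to signal 11, SIGSEGV, which is consistent with the kernel segfault message further down. A small sketch to decode such codes with Python's standard `signal` module (the helper name is hypothetical, and signal numbers are platform-specific; this assumes Linux):

```python
import signal

def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"exit code {code}: killed by unknown signal {code - 128}"
        return f"exit code {code}: killed by {sig.name}"
    return f"exit code {code}: normal exit with status {code}"

if __name__ == "__main__":
    # The ExitCode reported by docker inspect above.
    print(describe_exit_code(139))
```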

It was not OOM-killed. I was monitoring memory and GPU usage, and at the moment of failure there was plenty of free memory and the GPU was not in use.
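To line up the failure time in the kernel log (11:12:21) with a memory reading, a timestamped log on the host can help. A minimal sketch, assuming a Linux host that exposes `/proc/meminfo`; the function names are hypothetical. For the GPU side, `nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv -l 5` prints a comparable periodic log.

```python
import time
from pathlib import Path

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field: value in kB} dict."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def available_gib(text: str) -> float:
    """Return MemAvailable in GiB (meminfo reports kB, i.e. KiB)."""
    return parse_meminfo(text)["MemAvailable"] / (1024 ** 2)

def log_memory(samples: int, interval_s: float = 5.0) -> None:
    """Print a timestamped MemAvailable reading every interval_s seconds (Linux only)."""
    for _ in range(samples):
        text = Path("/proc/meminfo").read_text()
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(f"{stamp} MemAvailable={available_gib(text):.1f} GiB", flush=True)
        time.sleep(interval_s)
```

For example, `log_memory(samples=17280)` covers 24 hours at the default 5-second interval.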

The system log on the host contains this message:

Jul 27 11:12:21 ip-10-1-2-105 kernel: [12381.831006] python3[49809]: segfault at 53 ip 00007fdb4af87c9d sp 00007fffa3db67e0 error 4 in pymap.cpython-38-x86_64-linux-gnu.so[7fdb4af45000+9c000]
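The kernel line can be unpacked further. On x86, the `error` value is a page-fault error code bitmask (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: user mode rather than kernel mode), so `error 4` is a user-mode read of an unmapped page. Together with the very small fault address (0x53), that pattern usually points to a dereference of a null or near-null pointer inside the native `pymap` extension, which in ODM's stack likely belongs to OpenSfM. The sketch below, a hypothetical helper assuming the message format shown above, extracts those fields and the instruction pointer's offset within the mapped library:

```python
import re

# Matches the x86 kernel "segfault at ..." message format shown above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line: str) -> dict:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "offset_in_mapping": ip - base,
        # x86 page-fault error code bits
        "protection_violation": bool(err & 1),  # False: page not present
        "write": bool(err & 2),                 # False: read access
        "user_mode": bool(err & 4),
    }

if __name__ == "__main__":
    line = ("python3[49809]: segfault at 53 ip 00007fdb4af87c9d sp 00007fffa3db67e0 "
            "error 4 in pymap.cpython-38-x86_64-linux-gnu.so[7fdb4af45000+9c000]")
    info = decode_segfault(line)
    print(info["object"], hex(info["offset_in_mapping"]))
```

For this line the offset is 0x42c9d. With a copy of the same library that has symbols, `addr2line -f -e pymap.cpython-38-x86_64-linux-gnu.so 0x42c9d` may map it to a function. This assumes the reported mapping starts at the library's load base; if it is a later segment, that segment's file offset has to be added first.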

As mentioned, I ran the processing with both the ODM and ODM-GPU Docker images, and both end with the same error. I also tried setting it up on Ubuntu 18, with the same result.

My docker run command was:

    docker run -d -v /mnt/data:/datasets/code --gpus all opendronemap/odm:gpu --project-path /datasets --pc-las --split 100 --split-overlap 10

Do you have any insights on what could have gone wrong?


Have you tried increasing the overlap value? That seems like it might be too low.

I tried overlap values of 100, 150, 200, and 300. It fails the same way each time.


Does it process successfully without split-merge and without the LAS output?
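To answer that, the run can be repeated with the optional features removed one at a time. A minimal sketch, assuming the same image and mount as in the original command; `odm_command` and `run_foreground` are hypothetical helpers, and `-d` is dropped so that `docker run` stays in the foreground and returns the container's exit code:

```python
import subprocess
from typing import List, Optional

IMAGE = "opendronemap/odm:gpu"
MOUNT = "/mnt/data:/datasets/code"

def odm_command(split: Optional[int] = 100, split_overlap: int = 10,
                pc_las: bool = True) -> List[str]:
    """Build the original docker run command (minus -d), optionally without split-merge or LAS."""
    cmd = ["docker", "run", "-v", MOUNT, "--gpus", "all", IMAGE,
           "--project-path", "/datasets"]
    if pc_las:
        cmd.append("--pc-las")
    if split is not None:
        cmd += ["--split", str(split), "--split-overlap", str(split_overlap)]
    return cmd

def run_foreground(cmd: List[str]) -> int:
    """Run the container in the foreground and return its exit code."""
    return subprocess.run(cmd).returncode

# Variants: original, without LAS, and without both LAS and split-merge.
VARIANTS = {
    "original": odm_command(),
    "no_las": odm_command(pc_las=False),
    "no_split_no_las": odm_command(split=None, pc_las=False),
}

if __name__ == "__main__":
    for name, cmd in VARIANTS.items():
        print(name, " ".join(cmd))
```

Each variant can then be launched with, for example, `run_foreground(VARIANTS["no_split_no_las"])`. Since every variant points at the same project directory, starting each test from a clean copy of the images avoids picking up partial outputs from an earlier run.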
