Using a specific GPU with nodeodm

As I was looking to make nodeodm use the second GPU, which is connected via Thunderbolt 3, rather than the internal one where the X server runs, I found the following working approach:

Add the following to docker-compose.nodeodm.gpu.nvidia.yml for the node-odm service:

version: '2.1'
services:
  webapp:
    depends_on:
      - node-odm
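    # pass these variables through from the host environment (an entry without a value propagates the host's value)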
    environment:
      - WO_DEFAULT_NODES
      - NVIDIA_VISIBLE_DEVICES
  node-odm:
    image: opendronemap/nodeodm:gpu
    ports:
      - "3000"
    restart: unless-stopped
    oom_score_adj: 500
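    # reserve an NVIDIA GPU for this service (requires the NVIDIA container runtime)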
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]

Notice the deploy and environment sections.
When running NVIDIA_VISIBLE_DEVICES=1 ./webodm.sh start --gpu, nodeodm will use the proper GPU, in this case the GPU with id 1 on the host system:

As we can see, inside the container both GPU devices are still visible:
docker exec -it webodm_node-odm_1 watch -n 5 nvidia-smi

However, the container's nvidia-smi does not show the process information. I don't know why.
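A likely explanation is the container's separate PID namespace: nvidia-smi inside the container cannot map the host PIDs of the GPU processes, so it omits them. Checking from the host should still show them:

# run on the host, not inside the container
watch -n 5 nvidia-smi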

[INFO]    Running opensfm stage
[WARNING] Legacy option --resize-to (this might be removed in a future version). Use --feature-quality instead.
[INFO]    nvidia-smi detected
[INFO]    Using GPU for extracting SIFT features

When you have multiple GPUs, does that provide a way to tie each GPU to a specific nodeodm:gpu container?

Yes, this is what NVIDIA_VISIBLE_DEVICES (inside the container) should allow. Listing the env variable in the docker compose file without a value tells Docker to propagate it from the host system into the container, which lets you tie GPUs to containers from the host.
Log on to a running nodeodm:gpu container and look at the environment variables; that variable will normally be set to all.
Details can be found here:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html
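For example, to check the variable on a running node (using the default compose container name from above):

docker exec -it webodm_node-odm_1 env | grep NVIDIA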

Although I have two GPUs (Quadro M1000M 2GB internal, RTX 2080 8GB external), I tried to put multiple nodeodm containers on the external GPU.
The motivation for doing so was the observation that one nodeodm instance maxed out at around 500MB GPU memory usage. I used 3 containers and memory usage went up to ~1700MB.
So far so good.
But it seems that this does not work reliably. It might be that using the same GPU from different containers creates conflicts, but I could not yet figure out what exactly happens. The system just freezes completely: there is still disk activity, but the desktop is frozen and the only thing I can do is switch off the machine. I need to set up a second machine and check whether I can still reach it over the network.

I’ll have to try this - I have a K80, which contains two GPUs, so I should be able to allocate them to two separate nodeodm:gpu containers.

First try at it… I started two nodeodm:gpu containers, logged into each one via the Bash CLI, edited the environment variables, and started a processing task on each… both used GPU 0, despite the second container having NVIDIA_VISIBLE_DEVICES=1 set.

Can you try to set the env variable with the docker run command, like:
docker run -p ... -v ... --env NVIDIA_VISIBLE_DEVICES=1 opendronemap/nodeodm:gpu ... ?

I am not sure if --gpus is relevant too, but I did not set it.

Previously I used these:
docker run -dp 3001:3000 --gpus all --name nodeodmgpu1 opendronemap/nodeodm:gpu
docker run -dp 3002:3000 --gpus all --name nodeodmgpu2 opendronemap/nodeodm:gpu

For this test I used:
docker run -dp 3001:3000 --env NVIDIA_VISIBLE_DEVICES=0 --name nodeodmgpu01 opendronemap/nodeodm:gpu
docker run -dp 3002:3000 --env NVIDIA_VISIBLE_DEVICES=1 --name nodeodmgpu02 opendronemap/nodeodm:gpu
It appears the --gpus switch is needed; otherwise no GPU is used at all.

Next I tried with --gpus all:
docker run -dp 3001:3000 --gpus all --env NVIDIA_VISIBLE_DEVICES=0 --name nodeodmgpu01 opendronemap/nodeodm:gpu
docker run -dp 3002:3000 --gpus all --env NVIDIA_VISIBLE_DEVICES=1 --name nodeodmgpu02 opendronemap/nodeodm:gpu
but they both still used GPU 0.

Next I tried setting specific GPU IDs with the --gpus switch:
docker run -dp 3001:3000 --gpus 0 --env NVIDIA_VISIBLE_DEVICES=0 --name nodeodmgpu01 opendronemap/nodeodm:gpu
docker run -dp 3002:3000 --gpus 1 --env NVIDIA_VISIBLE_DEVICES=1 --name nodeodmgpu02 opendronemap/nodeodm:gpu
but they both still used GPU 0…so no success for me yet.
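One more thing that might be worth trying (I have not tested it): as far as I understand, Docker treats a bare number after --gpus as a GPU count rather than an ID, and the Docker docs select specific GPUs with a device= argument instead:

docker run -dp 3001:3000 --gpus device=0 --name nodeodmgpu01 opendronemap/nodeodm:gpu
docker run -dp 3002:3000 --gpus device=1 --name nodeodmgpu02 opendronemap/nodeodm:gpu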

This is what I did:

First nodeodm, tied to GPU 0:

Environment of first node:

[email protected]:/var/www$ env | grep 'NV.*'
NV_LIBCUBLAS_VERSION=11.3.1.68-1
NVIDIA_VISIBLE_DEVICES=0
NVIDIA_REQUIRE_CUDA=cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451
NV_NVTX_VERSION=11.2.67-1
NV_LIBCUSPARSE_VERSION=11.3.1.68-1
NV_LIBNPP_VERSION=11.2.1.68-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_LIBNPP_PACKAGE=libnpp-11-2=11.2.1.68-1
NV_CUDA_CUDART_VERSION=11.2.72-1
NV_LIBCUBLAS_PACKAGE=libcublas-11-2=11.3.1.68-1
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-2
NV_CUDA_LIB_VERSION=11.2.0-1
NVARCH=x86_64
NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-2
NV_LIBNCCL_PACKAGE=libnccl2=2.8.4-1+cuda11.2
NV_LIBNCCL_PACKAGE_NAME=libnccl2
NV_LIBNCCL_PACKAGE_VERSION=2.8.4-1
[email protected]:/var/www$

Second nodeodm, tied to GPU 1:

Environment of second node:

[email protected]:/var/www$ env | grep 'NV.*'
NV_LIBCUBLAS_VERSION=11.3.1.68-1
NVIDIA_VISIBLE_DEVICES=1
NVIDIA_REQUIRE_CUDA=cuda>=11.2 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451
NV_NVTX_VERSION=11.2.67-1
NV_LIBCUSPARSE_VERSION=11.3.1.68-1
NV_LIBNPP_VERSION=11.2.1.68-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_LIBNPP_PACKAGE=libnpp-11-2=11.2.1.68-1
NV_CUDA_CUDART_VERSION=11.2.72-1
NV_LIBCUBLAS_PACKAGE=libcublas-11-2=11.3.1.68-1
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-2
NV_CUDA_LIB_VERSION=11.2.0-1
NVARCH=x86_64
NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-2
NV_LIBNCCL_PACKAGE=libnccl2=2.8.4-1+cuda11.2
NV_LIBNCCL_PACKAGE_NAME=libnccl2
NV_LIBNCCL_PACKAGE_VERSION=2.8.4-1
[email protected]:/var/www$ 

Both nodes only see the assigned GPU.
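To double-check, nvidia-smi inside each container should list only one device (container names assumed to match the docker run commands above):

docker exec -it nodeodmgpu01 nvidia-smi -L
docker exec -it nodeodmgpu02 nvidia-smi -L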
Next starting WebODM:

[email protected]:~/FH/Master/Tools/WebODM$ ./webodm.sh start --default-nodes 0 --gpu
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] (rev a1)
0a:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 SUPER] (rev a1)
0a:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
0a:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
GPU_NVIDIA has been found
Checking for docker...   OK
Checking for docker-compose...   OK
Starting WebODM...

Using the following environment:
================================
Host: localhost
Port: 8000
Media directory: appmedia
SSL: NO
SSL key: 
SSL certificate: 
SSL insecure port redirect: 80
Celery Broker: redis://broker
Default Nodes: 0
================================
Make sure to issue a ./webodm.sh down if you decide to change the environment.

docker-compose -f docker-compose.yml up --remove-orphans
Removing orphan container "webodm_node-odm-1_1"
Removing orphan container "webodm_node-odm-2_1"
Removing orphan container "webodm_node-odm-3_1"
Starting db     ... done
Starting broker ... done
Starting worker ... done
Starting webapp ... done
Attaching to db, broker, worker, webapp
...
webapp    | Congratulations! └@(・◡・)@┐
webapp    | ==========================
webapp    | 
webapp    | If there are no errors, WebODM should be up and running!
webapp    | 
webapp    | Open a web browser and navigate to http://localhost:8000

Starting clusterodm:

[email protected]:~$ docker run --rm -dp 3000:3000 -p 8081:8080 -p 10000:10000 opendronemap/clusterodm
7fbf9324d646e9782dbccf68b85331c8c5305e0bbf8371692073885a1fa175f9
[email protected]:~$ telnet localhost 8081
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Welcome ::ffff:172.17.0.1:59704 ClusterODM:1.5.3
HELP for help
QUIT to quit
#> NODE LIST

#> NODE ADD 192.168.0.150 3001
OK
#> NODE ADD 192.168.0.150 3002
OK
#> NODE LIST
1) 192.168.0.150:3001 [online] [0/1] <engine: odm 2.8.8> <API: 2.2.0>
2) 192.168.0.150:3002 [online] [0/1] <engine: odm 2.8.8> <API: 2.2.0>

#> 

Adding clusterodm to WebODM:

And finally the dataset running on two nodeodm instances:


One thing I can think of is the NVIDIA container runtime, which needs to be installed. The package is named nvidia-docker2. When I tied the GPUs in my initial post, I added the device entry to the nvidia yml docker compose file, but I am not sure if that makes a difference. The NVIDIA runtime can be set as the default using Docker's daemon.json file, according to this page:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html
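For reference, setting the default runtime as described there means /etc/docker/daemon.json ends up looking roughly like this (restart Docker afterwards with sudo systemctl restart docker):

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}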

At least this is how it works for me. Give it a try with the Nvidia runtime.


What I noticed too: for the first time, the reconstruction introduced a huge issue which I never got before; the single-node quality was observably better than with clusterodm. One “area” was placed orthogonally to the rest of the reconstructed scene within the point cloud (the image is dark because I extracted the temperatures from thermal images and applied a b/w color palette). Maybe this is a result of the split/merge operation?

Also, the processing time increased by 2 minutes (20 min vs. 22 min) when using clusterodm and both GPUs, compared to running on a single nodeodm using only the faster GPU.

Thanks heaps for the detailed steps, I’ll give it a try and let you know how I go.
Legend!

I’ll have to investigate why I’m getting this error (my server OS is Ubuntu 22.04 LTS):

[email protected]:~$ sudo apt-get install -y nvidia-docker-2
Reading package lists… Done
Building dependency tree
Reading state information… Done
E: Unable to locate package nvidia-docker-2

Check this out:

The name of the package is nvidia-docker2, without the dash in front of 2.
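So on Ubuntu, assuming the NVIDIA container toolkit repository from the page linked above is already configured, it should just be:

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker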


Success!!!
Two separate tasks using separate GPU cores:

Thanks so much for your help @ChrisDAT !!!


Nice, congratulations 🙂
