ODM Docker image as base image does not work on SageMaker

Hi all,
I am trying to run ODM as a base image (basically, I wrote code on top of the ODM base image that downloads the data and then calls run.py). I am using an ml.m5d.24xlarge instance. The input data is around 600 MB, about 66 images. The same code (ODM as base image with my code on top of it) runs fine as a Docker container locally, so I am not sure what the issue is when I deploy it on SageMaker. Please see the attached screenshot and suggest what can be done to fix it. The command I am using is python run.py, which tries to produce all feasible outputs (I don't think that is the issue, because it runs perfectly fine locally). I am using the ODM 2.5.1 image. The ODM results directory is stored under /mnt in the container.

It fails with a "child returned 134" error.

The expected behavior is that it processes all the images and produces an orthomosaic at the end, but instead it gives the errors shown in the two screenshots above. I am using the ODM Docker image (version 2.5.1) as a base image and building my code on top of it (it downloads the data from S3 to the /mnt drive and then calls run.py), and deploying all of it as a Docker container on SageMaker.

The first recommendation would be to use the latest version of ODM (any reason you’re still using 2.5.1?)

For some reason I am not able to run apt update and install nginx when I use the latest version as the base image. I think the latest version uses Ubuntu 21 as its OS? Since I can't install the packages I need on the latest one, I am using the older version. Is there any way I can use the latest version and still update packages?

Definitely try to use the latest image, and start by finding out why you can’t update the packages. It will be easier than figuring out why the old version of ODM is not working.

Okay, I will try that. What I am doing is deploying ODM on SageMaker as an endpoint; whenever I call the endpoint, it runs the run.py file and I get this error.

Do you think it could be a permission issue? I am not sure whether, when I call the endpoint, it runs the container as a regular user or with root permissions; maybe because of that it cannot read the EXIF file and hence says no such file exists. Another thing: I am running the same ODM version locally and everything works fine, so I am wondering why it would give the error above when the same Docker image and the same data are used.
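One way to check the permission question would be to log the effective user and the access rights on the data directory from inside the endpoint, just before run.py is called. This is only a minimal sketch, assuming a Linux container; access_report is a hypothetical helper name and /mnt is simply the directory mentioned above, not anything ODM or SageMaker provides.

```python
import os

def access_report(path: str) -> dict:
    """Collect who the process runs as and what it may do with `path`.

    Hypothetical helper for debugging; not part of ODM or SageMaker.
    """
    uid = os.geteuid()  # effective user id of the current process (0 is root)
    return {
        "uid": uid,
        "is_root": uid == 0,
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }

# Assumption: /mnt is where the images are downloaded and results are written.
print(access_report("/mnt"))
```

Comparing this output between the local run and the endpoint run should show whether the two environments differ in user or in access to the directory.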

Another thing I have noticed: with the same Docker image version, when I build it on a SageMaker notebook and run it as a container, it runs fine. But the same image version, with the same code, gives the error above when I deploy it on SageMaker as an endpoint. I am not sure why.

Is your user in the docker group?

Child returned 137 → your machine is running out of memory. Add more RAM.
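For reading these numbers: by the usual shell convention, an exit status above 128 means the child was terminated by signal number (status - 128). A minimal sketch to decode it, assuming a Linux container; describe_exit_status is a hypothetical helper name, not part of ODM.

```python
import signal

def describe_exit_status(status: int) -> str:
    """Translate a shell-style exit status into a readable description."""
    if status > 128:
        signum = status - 128  # convention: 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"

print(describe_exit_status(137))
print(describe_exit_status(134))
```

A status of 137 maps to SIGKILL, which is the signal the kernel's out-of-memory killer sends. A status of 134 maps to SIGABRT, which usually means the process aborted itself; that can also follow a failed memory allocation, so checking memory usage is a reasonable first step in either case.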
