Overnight job just stops without any specific reason

I am using the subprocess module in Python to run ODM processing over multiple sets of images (we call them missions). Each folder holds roughly 600-1000 images (approx. 35 GB per folder/mission). When I run the job overnight on an EC2 instance (40 vCPUs and 160 GB of RAM), processing takes around 5 hours per folder, and when it moves on to subsequent folders the Python script just gets interrupted for no apparent reason (no logs are captured despite the try/except). Also, when I try to SSH into the instance afterwards, it says that the volume is busy, and I have to unmount and remount it before I can start the process manually again.

Following is the python code:
    # stitch_all_images.py
    import os
    import subprocess

    root = 's3_bucket'
    for i in os.listdir(root):
        for j in os.listdir(os.path.join(root, i)):
            # One docker/ODM run per folder; trailing backslashes keep the command on one shell line
            cmd = """docker run --cpus=40 -m=160g --memory-swap=-1 -it \
    --rm -v "/home/ubuntu/s3_bucket/{}":/datasets \
    opendronemap/odm \
    --project-path /datasets {} --build-overviews \
    --cog --orthophoto-compression JPEG --fast-orthophoto --optimize-disk-space""".format(i>
            # Log file named <i>_<j>.txt in the working directory
            out = open(i + '_' + j + '.txt', 'w')
            try:
                subprocess.call(cmd, shell=True, stdout=out)
            except Exception as e:
                out.write(str(e))
            # os.system(cmd)
            out.close()
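
One thing I noticed while re-reading this: `subprocess.call` returns the exit status instead of raising when the command fails, and only stdout is redirected, so the `except` branch would not fire on a docker or ODM error and nothing from stderr would reach the log file. Below is a minimal sketch of how each run could be logged more completely. The helper names `build_odm_cmd` and `run_and_log` and the parameters `mission_dir` and `project` are hypothetical, the flags are the ones from my command above, and leaving out `-it` is an assumption on my part (an unattended run may have no TTY attached). I have not confirmed that this addresses the interruption itself.

```python
import subprocess


def build_odm_cmd(mission_dir, project):
    """Build the docker/ODM command as an argument list (no shell quoting needed).

    mission_dir and project are hypothetical parameter names; -it is left out
    on the assumption that the job runs unattended.
    """
    return [
        "docker", "run", "--rm",
        "--cpus=40", "-m=160g", "--memory-swap=-1",
        "-v", f"{mission_dir}:/datasets",
        "opendronemap/odm",
        "--project-path", "/datasets", project,
        "--build-overviews", "--cog",
        "--orthophoto-compression", "JPEG",
        "--fast-orthophoto", "--optimize-disk-space",
    ]


def run_and_log(cmd, log_path):
    """Run cmd, send stdout and stderr to log_path, and record the exit code.

    Returns the exit code, or None if the command could not be started.
    """
    try:
        with open(log_path, "w") as log:
            # stderr=STDOUT merges error output into the same log file
            code = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode
        note = f"exit code: {code}"
    except OSError as exc:
        # Raised when the executable is missing or cannot be launched
        code = None
        note = f"could not start command: {exc}"
    # Reopen in append mode so the note always lands after the command output
    with open(log_path, "a") as log:
        log.write("\n" + note + "\n")
    return code


# Usage inside the existing loop would look roughly like:
#   code = run_and_log(build_odm_cmd(mission_dir, project), log_name)
#   if code != 0: decide whether to stop or continue with the next folder
```

With this, a non-zero exit code in the log would at least show whether docker/ODM failed on its own or the Python script was killed from outside before it could write anything.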
