Hi,
I’m trying to set up autoscaling in AWS, but so far no luck after reading and trying the previous links in the forum. I would really appreciate it if anyone with more experience could share it with me. It’s a long post, but thanks for your patience!
The quick question is: do I have to use a web interface to trigger the job? I know that in the structure picture WebODM sits on top of ClusterODM, but from the posts I’ve read it looks like people trigger a job from the command line alone. For example, a previous post uses this command:
docker run -ti -v "$(pwd)/images:/code/images" opendronemap/odm --split 2500 --sm-cluster http://youriphere:3000
If the answer to the question above is yes, you can skip the following part and jump to the summary of useful resources I found while testing. If the answer is no, here is what I’ve tried and the error I came across.
My steps are:
- Set up an EC2 instance with 16 GB RAM, just to be safe.
- Install the necessary packages, especially docker and docker-machine.
- Set up ClusterODM using:
git clone https://github.com/OpenDroneMap/ClusterODM
cd ClusterODM
npm install
* saved an AWS autoscaling config JSON in the same ClusterODM folder
* then ran
node index.js --asr configuration.json
- Set up NodeODM using the docker command
docker run -p 3001:3000 opendronemap/nodeodm
with the host port changed from 3000 to 3001 so that I can use it as a dummy node.
* added and locked the node using
telnet localhost 8080
> NODE ADD localhost 3001
> NODE LOCK 1
> NODE LIST
1) localhost:3001 [online] [0/2] <version 1.5.1> [L]
- Download a test dataset of 52 images, which ODM alone turned into an orthomosaic easily before, and process it with the command
docker run -ti -v /home/ubuntu/odm/917/code/images:/code/images opendronemap/odm --split 30 --split-overlap 3 --sm-cluster http://172.17.0.1:3000 --debug --verbose > ./output.txt
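A quick sanity check before submitting the job could look like the sketch below. It assumes curl is installed and that both ClusterODM (port 3000) and the dummy NodeODM node (port 3001) answer on the /info route of the NodeODM API. check_endpoint is a hypothetical helper name, not something that ships with ODM.

```shell
# check_endpoint is a hypothetical helper, not part of ODM.
# It prints "reachable: URL" when the URL returns a successful HTTP
# response within 5 seconds, and "unreachable: URL" otherwise.
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$1" 2>/dev/null; then
    echo "reachable: $1"
  else
    echo "unreachable: $1"
  fi
}
```

On the EC2 instance this would be called as check_endpoint http://localhost:3000/info for ClusterODM and check_endpoint http://localhost:3001/info for the dummy node.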
Here is where I start getting errors.
Before that, when ClusterODM starts running, it connects to the dummy node with a few warnings but no errors.
I did get a hello.txt file in S3; I don’t know if this is related.
When NodeODM starts running, there are also no errors.
However, when I use the docker command to point the processing to the cluster, it says “attempted to autoscale but failed”.
The ClusterODM screen shows the error messages, and I think the main issue is:
Cannot create machine: Error: docker-machine exited with code 1
I checked my EC2 history: instances were created, but they were terminated right away.
There might be a similar issue reported with DigitalOcean.
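One way to see the real error behind "exited with code 1" might be to run docker-machine by hand with its --debug flag, outside ClusterODM. The sketch below only prints such a command so it can be reviewed before running. The region, instance type and machine name are placeholders, dm_debug_cmd is a hypothetical helper, and the flags ClusterODM itself passes may differ from these.

```shell
# dm_debug_cmd is a hypothetical helper: it prints (does not run) a
# docker-machine command that creates one EC2 instance with debug logging.
# Arguments: region, instance type, engine install URL, machine name.
dm_debug_cmd() {
  printf '%s\n' "docker-machine --debug create --driver amazonec2 --amazonec2-region $1 --amazonec2-instance-type $2 --engine-install-url $3 $4"
}

# Placeholder values, not taken from my configuration.json.
dm_debug_cmd us-east-1 t3.small https://releases.rancher.com/install-docker/19.03.9.sh odm-debug-test
```

The amazonec2 driver can read credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and a test machine created this way can be removed again with docker-machine rm odm-debug-test.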
Could this be related to the shell script referenced in the configuration file? The entry is "engineInstallUrl": "\"https://releases.rancher.com/install-docker/19.03.9.sh\"",
I’m not really familiar with docker-machine, and it would be nice if someone could help me understand its purpose here and whether there’s a way to get around it.
Another related question is about the dummy node. Why do the images still get processed locally when the job is pointed at a cluster? The doc says:
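The value in that entry carries an extra pair of escaped quotes, so the decoded string itself starts and ends with a literal ". To test the install script by hand, the bare URL can be pulled out first. This is a minimal sketch; strip_quotes is a hypothetical helper name.

```shell
# strip_quotes is a hypothetical helper: it removes one leading and one
# trailing double quote from its argument, if present.
strip_quotes() {
  value=$1
  value=${value#\"}
  value=${value%\"}
  printf '%s\n' "$value"
}

# The decoded JSON value, including its literal surrounding quotes.
url="$(strip_quotes '"https://releases.rancher.com/install-docker/19.03.9.sh"')"
echo "$url"
```

With the bare URL in hand, something like curl -fsSL "$url" | head would show whether the script still downloads from the instance.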
You should always have at least one static NodeODM node attached to ClusterODM, even if you plan to use the autoscaler for all processing. If you setup auto scaling, you can’t have zero nodes and rely 100% on the autoscaler. You need to attach a NodeODM node to act as the “reference node” otherwise ClusterODM will not know how to handle certain requests (for the forwarding the UI, for validating options prior to spinning up an instance, etc.). For this purpose, you should add a “dummy” NodeODM node and lock it:
This way all tasks will be automatically forwarded to the autoscaler.
My understanding is that everything should be sent to the autoscaler and processed there. However, in my experience the first subset was processed locally and only the rest was sent to the autoscaler.
Summary
Links I found useful:
great summary about primary machine and secondary machine: How I set up clusterodm
autoscaling on aws
aws autoscaling
autoscale using odm
autoscaling using clusterodm + aws
some odm blogs
nodeODM repo
clusterODM repo