I’m trying to set up ClusterODM autoscaling using AWS.
I’ve got one EC2 instance running WebODM, and one EC2 instance running ClusterODM and a NodeODM locked dummy node.
I’ve added the ClusterODM server as a processing node on WebODM.
I’ve tried running a few test jobs using the new ClusterODM processing node, and confirmed that it passes them to the autoscaler and doesn’t use my locked dummy node.
Where I’m stuck at the moment is that, according to the CLI log, the ClusterODM autoscaler fails to create a machine. A new EC2 instance does appear in my AWS account’s EC2 instances dashboard with the correct security group, and it reaches the running state, but nothing happens after that.
After approximately 10 minutes of processing time, the job fails, and the CLI for ClusterODM says "Cannot create machine: Error: docker-machine exited with code 1".
As a new user I can only add one screenshot, so here’s the server CLI error message:
I’ve read through my AWS configuration file, and also tried recreating my ClusterODM server a few times and reinstalling dependencies.
I SSH’d over to one of the autoscaler instances during a test and found that Docker wasn’t installed. I believe docker-machine should handle this when it creates the machine, so perhaps that’s where the process is falling over?
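One check I could run from the ClusterODM server itself (not from my own machine) is whether the new instance is reachable on the ports docker-machine normally uses: 22 for SSH provisioning and 2376 for the Docker daemon over TLS. If port 22 is blocked from the ClusterODM host, docker-machine would never get as far as installing Docker, which might explain what I saw. The following is only a rough sketch: the helper names `port_is_open` and `check_instance` are my own, and the host passed in would be the instance's IP as seen from the ClusterODM server. Note that 2376 would only be expected to answer once Docker has actually been installed.

```python
import socket

# Ports docker-machine typically needs on the new instance (assumed defaults):
# 22 for SSH provisioning, 2376 for the Docker daemon over TLS.
DOCKER_MACHINE_PORTS = (22, 2376)


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_instance(host, ports=DOCKER_MACHINE_PORTS, timeout=5.0):
    """Map each port to whether it is reachable from this machine."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling `check_instance("<instance IP>")` on the ClusterODM server while the autoscaled instance is in the running state should show whether port 22 is reachable from there. If it is not, the security group's inbound rules or the subnet setup would be the next thing I'd look at.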
Is anyone able to advise on other things I should check/what the cause might be?