ClusterODM AWS Autoscaling - Stuck at Trying to Create Machine

Hello.

I’m trying to set up ClusterODM autoscaling using AWS.

I’ve got one EC2 instance running WebODM, and one EC2 instance running ClusterODM and a NodeODM locked dummy node.

I’ve added the ClusterODM server as a processing node on WebODM.

I’ve run a few test jobs using the new ClusterODM processing node, and confirmed that it hands them to the autoscaler rather than using my locked dummy node.

Where I’m stuck is that, according to the CLI log, the ClusterODM autoscaler fails to create a machine. I can see a new EC2 instance in my AWS account’s EC2 instances dashboard, with the correct security group, and it reaches a running state, but that’s it.

After approximately 10 minutes of processing time, the job fails, and the ClusterODM CLI says "Cannot create machine: Error: docker-machine exited with code 1".

As a new user I’m restricted to only being able to add one screenshot, so here’s the server CLI error message:

[Screenshot: ClusterODM server CLI error message]

I’ve had a read through my AWS configuration file, and also tried recreating my ClusterODM server a few times/reinstalling dependencies.

I SSH’d into one of the autoscaler instances during a test and found that Docker wasn’t installed. I believe docker-machine should handle this when it creates the machine, so perhaps that’s where the process is falling over?
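One way to test that hunch from the autoscaled instance itself is a quick outbound connectivity probe. This is only a rough sketch: the helper name `can_reach` is made up, and it assumes the Docker install is fetched from get.docker.com over HTTPS, which may not match every setup.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Assumed install host; swap in whatever your setup downloads from.
    print(can_reach("get.docker.com", 443))
```

If this prints False on the new instance but True elsewhere, blocked outbound traffic is a likely suspect.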

Is anyone able to advise on other things I should check/what the cause might be?

Thank you.

Welcome!

Sorry you’re having difficulty.

My knowledge in this space is pretty poor, so I apologize, but can you ensure that your user is part of the docker group so that it can manage the docker instance with the proper permissions?

No worries and thanks for the suggestion!

I’ve given that a try but no dice: same sticking point and result.

I’ve got ClusterODM running with Node.js (to simplify using the --asr parameter and passing in my configuration file) and NodeODM (the reference dummy node for ClusterODM) running with Docker. Could that be causing this issue?

I honestly do not know.

Hopefully someone more knowledgeable with these two tools stops by soon!

Hi there!

Quick update: I’ve managed to resolve this issue.

I went through a few iterations and made several changes in one go, so I can’t be 100% sure what worked, but I think it might have been as simple as updating the auto-scaled instance’s security group to allow outbound traffic.
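For anyone hitting the same thing, here is a rough Python sketch of how the outbound rules could be checked. It assumes the security group has already been fetched as a dict in the shape boto3’s `describe_security_groups` returns (an `IpPermissionsEgress` list with `IpProtocol`, `FromPort`, `ToPort` and `IpRanges`). The function name `allows_outbound` and the two sample groups are made up for illustration, and only IPv4 CIDR rules are considered.

```python
import ipaddress


def allows_outbound(security_group, port, dest_ip="8.8.8.8", protocol="tcp"):
    """Return True if any egress rule permits protocol traffic to dest_ip:port."""
    dest = ipaddress.ip_address(dest_ip)
    for rule in security_group.get("IpPermissionsEgress", []):
        proto = rule.get("IpProtocol")
        if proto == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif proto == protocol:
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue  # rule is for a different protocol
        if not port_ok:
            continue
        # Only IPv4 ranges are checked in this sketch.
        for ip_range in rule.get("IpRanges", []):
            if dest in ipaddress.ip_network(ip_range["CidrIp"], strict=False):
                return True
    return False


# Hypothetical sample data: one group with allow-all egress, one with none.
open_group = {
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
    ]
}
locked_group = {"IpPermissionsEgress": []}

print(allows_outbound(open_group, 443))    # True
print(allows_outbound(locked_group, 443))  # False
```

A group with no egress rules at all, like `locked_group` here, would leave the new instance unable to download anything, which fits what I saw with Docker never getting installed.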

All sorted.

Aha, allowing outbound traffic would help. :slight_smile:
