Cannot create node with docker-machine

So, I’ve tried about half a dozen different AMIs running AWS Linux, RHEL, and Ubuntu. Nothing gets me past the basics of docker-machine, as shown below. Following the docker-machine class, it appears to try spawning a machine and fails: it catches an exit of the process immediately.

info: Found docker-machine executable
info: Loaded 1 nodes
info: Loaded 2 routes
info: Starting http proxy on 3000
info: Trying to create machine… (1)
warn: Cannot create machine: Error: docker-machine exited with code 1
info: Trying to create machine… (2)
warn: Cannot create machine: Error: docker-machine exited with code 1
info: Trying to create machine… (3)
warn: Cannot create machine: Error: docker-machine exited with code 1
warn: Could not remove docker-machine, it’s likely that the machine was not created, but double-check!
warn: Cannot create node via autoscaling: Cannot create machine (attempted 3 times)
warn: Cannot forward task 5d400299-07f8-4327-8876-d9e22aa8d7f0 to processing node 10.0.0.181:3001: No nodes available (attempted to autoscale but failed). Try again later.

Exit code 1, going by Docker convention, suggests an application error. Looking through ClusterODM.log shows nothing of interest. Versioning has been an issue in other places. Are there specific versions of supporting packages that could be fouling this up? I’m currently running node 14.13.0, docker 20.10.7, and docker-machine 0.16.2. Spot is currently ‘false’, so there are no spot complexities. The node configuration starts with a single, locked node.

A few things I’ve considered and tried:

  1. Varying the OS versions.
  2. Making sure the user (ec2-user for RHEL and AWS Linux, ubuntu for Ubuntu) has permissions for EC2 and S3.
  3. Updating and logging into a provided AMI before creating a child AMI so that the creds are in .ssh.
  4. Ensuring that ports are open. I’ve allowed inbound traffic to the primary on 22 and 3000-3001. The private IPs are allowed 22, 443, 80, and 3000-3001. Are there other ports used by docker-machine?

If I can get past machine creation, I fully expect AWS Linux to fail, since the 19.03.9.sh engineInstallURL script doesn’t account for amzn. (BTW, that shell script is awfully similar to the Docker one at https://get.docker.com.) However, that doesn’t explain the success others have reported using an Ubuntu AMI. Obviously, the solution doesn’t rest on launching only AWS Linux instances, although supporting them would be a reasonable extension of the .sh docker installer.

I saw a VirtualBox driver used in docker-machine creation examples. Do I need VirtualBox installed? Are there specific OS releases that have worked?

Any insights would be greatly appreciated!

It’s likely a problem with docker-machine. Unfortunately, Docker Inc. decided to drop support for docker-machine some time ago, so we’re a bit in limbo with regard to support.

From the ClusterODM console, issue:

ASR VIEWCMD 50

It will give you the command that ClusterODM launches to spin up a machine. Start from there to troubleshoot any issue with docker-machine.
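One way to act on this is to run the printed command outside ClusterODM and keep its exit code and stderr, since the ClusterODM log only reports "exited with code 1". The snippet below is a minimal sketch in Python; `run_and_report` is a hypothetical helper name, not part of ClusterODM or docker-machine.

```python
import shlex
import subprocess
import sys


def run_and_report(command, timeout=600):
    """Run a command and return (exit_code, stdout, stderr).

    `command` may be a single string (split with shell-like rules)
    or an already-split list of arguments.
    """
    args = shlex.split(command) if isinstance(command, str) else list(command)
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout, result.stderr


# Usage sketch: paste the command printed by ASR VIEWCMD in place of the
# placeholder string below. Adding docker-machine's global --debug flag
# right after "docker-machine" makes its output more verbose.
#
#   code, out, err = run_and_report("<command from ASR VIEWCMD>")
#   print("exit code:", code)
#   print(err, file=sys.stderr)
```

Reading stderr from a manual run is usually what reveals the real cause, for example a missing default VPC or a rejected security group.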

Try the rancher fork, rancher/machine on GitHub ("Machine management for a container-centric world"), which seems to be receiving updates.

Thanks. I see that the docker-machine command generated by ASR VIEWCMD 50 assumes a default VPC/subnet, which I do not have set, so the generated command fails. Setting that up requires help from AWS.

Progress!

I successfully launched a machine manually using the ASR-generated command. I do have the VPC/subnet specified, but the primary discovery was that I had the security group wrong: it takes the name, not the ID!
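For anyone hitting the same thing, here is a minimal sketch of assembling the amazonec2 driver arguments with a guard against passing a security group ID. The `--amazonec2-*` flags are the driver's documented options; the function names and the `sg-` pattern check are my own assumptions (a heuristic, not something docker-machine does), and the VPC and subnet values in the usage comment are placeholders.

```python
import re


def looks_like_security_group_id(value):
    """Heuristic: AWS security group IDs have the shape sg-<hex digits>."""
    return re.fullmatch(r"sg-[0-9a-f]+", value) is not None


def build_create_args(name, region, zone, vpc_id, subnet_id, security_group):
    """Build a docker-machine create argument list for the amazonec2 driver.

    Raises ValueError if security_group looks like an ID, because the
    --amazonec2-security-group flag expects the group's name.
    """
    if looks_like_security_group_id(security_group):
        raise ValueError(
            "security group %r looks like an ID; pass the group name instead"
            % security_group
        )
    return [
        "docker-machine", "create",
        "--driver", "amazonec2",
        "--amazonec2-region", region,
        "--amazonec2-zone", zone,  # a single letter such as "c"
        "--amazonec2-vpc-id", vpc_id,
        "--amazonec2-subnet-id", subnet_id,
        "--amazonec2-security-group", security_group,
        name,
    ]


# Usage sketch with placeholder IDs:
#   build_create_args("debug-machine", "us-east-1", "c",
#                     "vpc-00000000", "subnet-00000000", "my-group-name")
```

Without a default VPC, the VPC and subnet have to be given explicitly, which is why they are required parameters here.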

I was able to test the generated machine by logging in as ec2-user and I see that docker is not loaded (as expected for AWS Linux).

Next stop, running from the config file, aws.json, with a different version of Linux.

Here’s the output from running docker-machine manually. I set retries to 10, but that had no effect on the timeout waiting for SSH.

Running pre-create checks…
Creating machine…
(debug-machine) Launching instance…
Waiting for machine to be running, this may take a few minutes…
Detecting operating system of created instance…
Waiting for SSH to be available…
Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded

The instance comes up and is accessible, but the process never reaches the Docker install step (19.03.9.sh).
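To narrow down whether the SSH wait is a network problem (security group, private vs. public address) or something later such as the SSH user or key, a plain TCP check from the ClusterODM host can help. This is a minimal sketch; `wait_for_port` is a hypothetical helper, and the retry and delay defaults are arbitrary. Port 22 is the one the log above is waiting on. Port 2376, which docker-machine conventionally uses for the Docker daemon over TLS, is worth checking as well, though I am assuming that default applies to this setup.

```python
import socket
import time


def wait_for_port(host, port, retries=10, delay=5.0, timeout=3.0):
    """Try to open a TCP connection to host:port up to `retries` times.

    Returns the 1-based attempt number that succeeded, or 0 if every
    attempt failed.
    """
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return attempt
        except OSError:
            # Refused, unreachable, or timed out: pause before the next try.
            if attempt < retries:
                time.sleep(delay)
    return 0


# Usage sketch, with the instance address as a placeholder:
#   wait_for_port("<instance address>", 22)
#   wait_for_port("<instance address>", 2376)
```

If port 22 is reachable from the host running docker-machine but the create step still times out, the address docker-machine is using or the SSH user for the AMI would be the next things to check.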

Since I don’t have a default VPC or subnet yet, I can’t test from the aws.json config. The manual command also requires specifying the zone (c) together with the region (us-east-1). I’m curious what the JSON spec looks like for the aws.json config files. Will it take any or all of the parameters in the docker-machine spec? Is that spec defined by docker-machine or by ODM?

I’ll spend some time trying to extend this and hopefully generate a pull request.

Thank you :pray:
