Autoscaling using ClusterODM + AWS

A key element of the Docker install shell script is lsb_release, and it is not part of the AWS Linux distro. For AWS Linux to be recognized (or have a hope of being recognized) by the Docker install script, it has to be added with:

sudo yum install redhat-lsb-core
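Once installed, what the install script will see can be checked directly (the lowercasing mirrors what get.docker.com-style scripts do with the distributor ID):

    lsb_release -a                                  # full release details
    lsb_release -si | tr '[:upper:]' '[:lower:]'    # distributor ID as the scripts compare it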

It remains to be seen whether this will allow AWS Linux to be recognized like RHEL/CentOS/Fedora or not - all of them resolve to the CentOS Docker install.

Note that this should be added to an instance of the target AMI and a new AMI created from it.
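If scripting it, baking the package into a new AMI with the AWS CLI looks roughly like this (instance ID and image name are placeholders):

    # on the running instance built from the target AMI
    sudo yum install -y redhat-lsb-core

    # then, from anywhere the AWS CLI is configured
    aws ec2 create-image \
        --instance-id i-0123456789abcdef0 \
        --name "target-ami-plus-lsb" \
        --description "target AMI with redhat-lsb-core added"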


So you’ve made a new IAM user through the dashboard with the required permissions?

I actually have the ec2-user with the required permissions. Like I said, I'm not sure what it means by "an account for ClusterODM to use". Since the shell goes back to get the user (ec2-user is a non-root user), this probably just means a non-root user in the docker group.
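If "an account for ClusterODM to use" does turn out to mean an IAM identity rather than an OS user, creating one from the CLI is straightforward (the user name is hypothetical; attach whatever EC2/S3 permissions the setup needs):

    aws iam create-user --user-name clusterodm
    aws iam create-access-key --user-name clusterodm   # returns the key pair to configure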


So, what I've embarked upon is an update to the ./19.03.9.sh script that installs the correct Docker version into the instance launched from the provided AMI. Since the Linux variants are nicely broken down in it, I'm simply adding "amazon", the lsb_release response of my instance, into the script; a sketch of the change follows. I'm then testing this. The yum categories rhel, centos, fedora (and now amazon) all resolve to the CentOS docker-ce repo. Unfortunately, I'm getting the error shown after the sketch, and the same would likely happen with all the Linux variants in this category.
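The edit, roughly - the script follows the get.docker.com pattern, so the variable names and repo handling here are illustrative rather than exact:

    # detect the distro the way get.docker.com-style scripts do
    lsb_dist="$(lsb_release -si | tr '[:upper:]' '[:lower:]')"

    case "$lsb_dist" in
        rhel|centos|fedora|amazon)   # "amazon" added for AWS Linux
            # all four resolve to the CentOS docker-ce repo
            sudo yum-config-manager --add-repo \
                https://download.docker.com/linux/centos/docker-ce.repo
            sudo yum install -y docker-ce-19.03.9 docker-ce-cli-19.03.9
            ;;
    esac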

./19.03.9.sh

One of the configured repositories failed (Docker CE Stable - x86_64),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:

 1. Contact the upstream for the repository and get them to fix the problem.

 2. Reconfigure the baseurl/etc. for the repository, to point to a working
    upstream. This is most often useful if you are using a newer
    distribution release than is supported by the repository (and the
    packages for the previous distribution release still work).

 3. Run the command with the repository temporarily disabled
        yum --disablerepo=docker-ce-stable ...

 4. Disable the repository permanently, so yum won't use it by default. Yum
    will then just ignore the repository until you permanently enable it
    again or use --enablerepo for temporary usage:

        yum-config-manager --disable docker-ce-stable
    or
        subscription-manager repos --disable=docker-ce-stable

 5. Configure the failing repository to be skipped, if it is unavailable.
    Note that yum will try to contact the repo. when it runs most commands,
    so will have to try and fail each time (and thus. yum will be be much
    slower). If it is a very temporary problem though, this is often a nice
    compromise:

        yum-config-manager --save --setopt=docker-ce-stable.skip_if_unavailable=true

failure: repodata/repomd.xml from docker-ce-stable: [Errno 256] No more mirrors to try.
https://download.docker.com/linux/centos/2/x86_64/stable/repodata/repomd.xml: [Errno 14] HTTPS Error 404 - Not Found
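The 404 is the clue: on AWS Linux 2, yum's $releasever expands to "2", so the CentOS baseurl resolves to .../linux/centos/2/..., a directory that doesn't exist upstream. A workaround I haven't verified end-to-end is to pin the repo to the CentOS 7 path after the script adds it:

    # pin $releasever to 7 so the baseurl points at a directory that exists
    sudo sed -i 's/\$releasever/7/g' /etc/yum.repos.d/docker-ce.repo
    sudo yum clean all
    sudo yum install -y docker-ce

This only fixes the repo path; whether the resulting packages behave on AWS Linux is a separate question.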


O.K. - now back from the exercise of getting local ClusterODM executing properly. See that thread for the fix - @pierotofy identified it; I was ahead on the Node versions and had to drop back to 14.17.0.

Now with the --asr flag set to my AWS config, docker-machine is trying to create a new machine! As expected with an AWS Linux AMI, that is going to fail - and it did.

info: Trying to create machine... (1)
warn: Cannot create machine: Error: docker-machine exited with code 1
warn: Could not remove docker-machine, it's likely that the machine was not created, but double-check!
warn: Cannot create node via autoscaling: Cannot create machine (attempted 1 times)
warn: Cannot forward task 29ae7c90-8214-45f9-ac22-8ca1de211884 to processing node 10.0.0.181:3001: No nodes available (attempted to autoscale but failed). Try again later.
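For reference, the shape of the setup: ClusterODM launched with the config passed via --asr. The field names below are from memory of ClusterODM's docs/aws.md and should be checked against the current docs; all values are placeholders:

    aws-config.json (sketch):

    {
        "provider": "aws",
        "accessKey": "CHANGEME",
        "secretKey": "CHANGEME",
        "region": "us-east-2",
        "securityGroup": "clusterodm-sg",
        "ami": "ami-0123456789abcdef0",
        "engineInstallUrl": "https://releases.rancher.com/install-docker/19.03.9.sh",
        "imageSizeMapping": [
            {"maxImages": 40, "slug": "t3.medium", "storage": 60}
        ]
    }

    node index.js --asr aws-config.json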

So, next try is to use an Ubuntu AMI...


So, I've tried about half a dozen different AMIs using AWS Linux, RHEL, and Ubuntu. Nothing gets me past the basics of docker-machine, as shown below. Following the docker-machine class code, it appears to try spawning a machine and fails.

info: Found docker-machine executable
info: Loaded 1 nodes
info: Loaded 2 routes
info: Starting http proxy on 3000
info: Trying to create machine... (1)
warn: Cannot create machine: Error: docker-machine exited with code 1
info: Trying to create machine... (2)
warn: Cannot create machine: Error: docker-machine exited with code 1
info: Trying to create machine... (3)
warn: Cannot create machine: Error: docker-machine exited with code 1
warn: Could not remove docker-machine, it's likely that the machine was not created, but double-check!
warn: Cannot create node via autoscaling: Cannot create machine (attempted 3 times)
warn: Cannot forward task 5d400299-07f8-4327-8876-d9e22aa8d7f0 to processing node 10.0.0.181:3001: No nodes available (attempted to autoscale but failed). Try again later.

Exit code 1, going by docker convention, suggests an application error. Looking through ClusterODM.log shows nothing of interest.
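One way to surface the real error behind exit code 1 would be to run the same create by hand with the amazonec2 driver in debug mode (all values below are placeholders):

    docker-machine --debug create --driver amazonec2 \
        --amazonec2-access-key "$AWS_ACCESS_KEY_ID" \
        --amazonec2-secret-key "$AWS_SECRET_ACCESS_KEY" \
        --amazonec2-region us-east-2 \
        --amazonec2-ami ami-0123456789abcdef0 \
        --amazonec2-ssh-user ubuntu \
        odm-test-node

If that prints a usable error (bad AMI, missing subnet, IAM denial), the same fix should apply to the autoscaler.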

A few things I’ve considered:

  1. Varying the OS versions
  2. Making sure the user (ec2-user for RHEL and AWS Linux, ubuntu for Ubuntu) has the correct permissions
  3. Updating and logging into an instance before creating another AMI from it, so that the credentials are in .ssh

If I can get past machine creation, then I fully expect AWS Linux to fail, since the 19.03.9.sh engineInstallURL doesn't account for amzn. (BTW - that shell script is awfully similar to the Docker one here: https://get.docker.com.) However, that doesn't explain the success others have reported using an Ubuntu AMI.

I would guess I'm missing something in my environment. I saw drivers used in docker-machine creation for VirtualBox. Do I need that installed? Are there specific OS releases that have worked?

Any insights would be greatly appreciated!

