For additional context, see my other posts on autoscaling. If anyone has a working autoscaled environment on AWS, I’d love to hear about it - specifically with respect to how the security environment is set up.
I’ve been trying to get autoscaling working in an AWS environment. Yesterday was spent working through the details of the 19.03.9.sh script (supported by Rancher) that installs the Docker environment. That script handles several different OSes - RHEL, CentOS, Debian, Fedora - but not Amazon Linux. I was testing on a fresh instance and running the script by hand to see what modifications would be needed to support my OS. Generally, not much change is required. The CentOS install path largely serves as the catch-all for CentOS, RHEL, Fedora, and OracleLinux, since they all use yum as their package manager. Among other things, the script uses lsb, which is not part of the Amazon Linux install, although that’s easy enough to get around by running ‘sudo yum install redhat-lsb-core’ and making that part of the environment before image creation. The version number that gets pulled is always ‘2’, the version reported by lsb (2/Karoo), and it winds up inserted into the CentOS repo path name for installation. On closer inspection, though, this appeared to come from my yum repos (/etc/yum.repos.d) rather than from the script’s path generation. I disabled that repo, and the install then attempted to use the correct CentOS repo, failing only because a repo dependency was missing.
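As a rough illustration of the OS detection issue described above, here is a minimal Python sketch that reads os-release style fields instead of relying on lsb, and decides which install branch a distro would fall into. It is not taken from the 19.03.9.sh script. The function names, the `YUM_FAMILY` set, and the `SAMPLE` text (modeled on what an Amazon Linux 2 instance typically reports) are assumptions made for illustration only.

```python
import shlex

# Hypothetical sample modeled on an Amazon Linux 2 /etc/os-release file.
SAMPLE = '''NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
'''

# Illustrative set of distro IDs that would share the yum-based install path.
YUM_FAMILY = {"centos", "rhel", "fedora", "ol"}
APT_FAMILY = {"debian", "ubuntu"}


def parse_os_release(text):
    """Parse KEY=value lines (optionally quoted) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips surrounding quotes
        fields[key] = parts[0] if parts else ""
    return fields


def install_branch(fields):
    """Pick a package-manager branch from ID, falling back to ID_LIKE."""
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    if any(i in YUM_FAMILY for i in ids):
        return "yum"
    if any(i in APT_FAMILY for i in ids):
        return "apt"
    return "unsupported"


if __name__ == "__main__":
    info = parse_os_release(SAMPLE)
    print(info["ID"], info["VERSION_ID"], install_branch(info))
```

The point of the sketch is that the distro ID alone is not a supported name, but ID_LIKE places it in the yum family. The reported version, "2", is the distro's own version and is not a CentOS release number, so it should not be substituted into a CentOS repo path as-is.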
Upstream of the Docker installation process, I still get connection errors when the autoscaled instance is launched. WebODM hangs while uploading images and eventually fails with a connection error. I haven’t found much by way of debug output so far. However, I never see a spot instance request or an instance appear on the AWS side, which makes me think I still have major issues on the IAM/permissions side of the house. I’ve got ssh, tcp 3000-3001, http, and https open in my security group for the instance/user, and I have confirmed in the engineInstallURL script that this is the user ($USER) used there. The user has a full set of privileges spanning S3, EC2, autoscaling, and IAM. When I launch ClusterODM with node, it confirms S3 access, docker machine, etc.
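One way to rule the security group in or out is to check its ingress rules against the ports listed above. The sketch below is a minimal example that assumes the rules are available as a list of dicts shaped like the `IpPermissions` field returned by the EC2 DescribeSecurityGroups call (`IpProtocol`, `FromPort`, `ToPort`). The function name and the sample rules are hypothetical, and it deliberately ignores source CIDR ranges.

```python
def uncovered_tcp_ports(ip_permissions, required_ports):
    """Return the required TCP ports that no ingress rule covers."""
    missing = []
    for port in required_ports:
        covered = False
        for rule in ip_permissions:
            proto = rule.get("IpProtocol")
            if proto == "-1":
                # "-1" means all protocols and all ports.
                covered = True
            elif proto == "tcp":
                low = rule.get("FromPort")
                high = rule.get("ToPort")
                if low is not None and high is not None and low <= port <= high:
                    covered = True
        if not covered:
            missing.append(port)
    return missing


# Ports named in the post: ssh, http, https, and tcp 3000-3001.
REQUIRED = [22, 80, 443, 3000, 3001]

if __name__ == "__main__":
    # Hypothetical rule set with https left out on purpose.
    sample_rules = [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22},
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80},
        {"IpProtocol": "tcp", "FromPort": 3000, "ToPort": 3001},
    ]
    print(uncovered_tcp_ports(sample_rules, REQUIRED))
```

An empty result only says the listed ports are open. It says nothing about IAM permissions, which is a separate check and the more likely suspect when no spot request ever shows up on the AWS side.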
Finally, as a pointer that others may recognize and that might help identify the failure point: I cannot get any alternative AMIs working, and I’ve tried many - Ubuntu, RHEL, SUSE, CentOS.