IaC build of WebODM, ClusterODM, NodeODM using Terraform, cloud-init and GitHub Actions

New build posted: GitHub - kendrickcc/odm-azure-wf3

It will be important to get rclone working locally and to have a good rclone.conf file, since that file is copied to the VM as part of the build.
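
A quick way to catch a bad rclone.conf before it gets copied to the VM is to confirm that it defines the remote the build expects. This is a minimal sketch that assumes rclone's INI-style config, where each remote starts with a `[name]` header line. The function name and the remote name `azblob` are hypothetical. If rclone is installed, `rclone listremotes --config <file>` gives the same answer.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: does rclone.conf define the named remote?
# Assumes rclone's INI-style layout, where each remote begins with "[name]".
has_rclone_remote() {
  local conf="$1" remote="$2"
  [[ -r "$conf" ]] || { echo "cannot read $conf" >&2; return 2; }
  # -F: treat the brackets literally; -x: the whole line must be "[remote]"
  grep -qxF "[$remote]" "$conf"
}
```

For example, run `has_rclone_remote ~/.config/rclone/rclone.conf azblob || echo "remote azblob is missing"` before starting the build.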

This is really interesting; I also work with ODM on Azure. What I really miss is the autoscaling feature, which is what brings the benefit of the cloud. There are proxies you can set up to expose an S3-compatible API in front of an Azure Storage account, so that may be worth a try.

What I did instead was build my own logic: an Azure serverless Function spins up a new virtual machine and runs a Docker node on it. The function is written with the ODM Python SDK. It is hooked up to a file storage account that is mounted to the function, so it can upload the images from that path. After the job is done, the results are downloaded via the webhook, and another function deletes the virtual machine. I have also written a function that calculates the cost of the job and sends it via mail/chat.
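
A per-job cost report could start from a simple estimate: elapsed wall-clock time multiplied by the VM's hourly price. This is only a rough sketch. The hourly rate is an input you would look up for your VM size and region, disk, storage, and egress charges are ignored, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough job cost: (end - start) seconds converted to hours, times an hourly
# rate supplied by the caller. Ignores disk, storage, and egress charges.
job_cost() {
  local start_epoch="$1" end_epoch="$2" hourly_rate="$3"
  awk -v s="$start_epoch" -v e="$end_epoch" -v r="$hourly_rate" \
    'BEGIN { printf "%.2f\n", (e - s) / 3600 * r }'
}
```

For example, capture `start=$(date +%s)` when the VM is created, then call `job_cost "$start" "$(date +%s)" <hourly rate>` after it is deleted.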

If it's of interest, I could share that code in a GitHub repo.

The only thing I don't like is that I have to write all the logic myself and handle every potential error. You don't want a VM left running, or even worse, a script that bugs out and spins up a bunch of VMs, especially when you are dealing with pricey GPUs.
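
One inexpensive safeguard against a script that spins up a bunch of VMs is to count the VMs already in the resource group before creating another. This is a minimal sketch using the Azure CLI. The function name, the limit, and the `AZ` override (which exists only so the function can be exercised without an Azure login) are assumptions. It does not protect against a single VM left running after a failed job.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to create another VM when the resource group
# already holds max_vms or more. AZ defaults to the az CLI.
guard_vm_count() {
  local rg="$1" max_vms="$2" count
  # JMESPath length(@) turns the VM list into a single number.
  count=$(${AZ:-az} vm list --resource-group "$rg" --query "length(@)" --output tsv) || return 2
  [[ "$count" =~ ^[0-9]+$ ]] || return 2
  if (( count >= max_vms )); then
    echo "refusing: $count VM(s) already in $rg (limit $max_vms)" >&2
    return 1
  fi
}
```

Call it right before the create step, and abort the run if it returns non-zero.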

So linking up with ODM Autoscaling for its cluster would be great!

Best Regards,
Rasmus

I would really like to see your repo. I looked a little at Kubernetes as well but was not sure it was a fit. I appreciate you chiming in.

Yeah, so far I don't have any IaC to set it up; I should probably create that so it's easy to deploy. It involves some managed identities and a .sh script to hook the file storage up to the Azure Function.

Yeah, I saw someone working on a fork with Kubernetes. I'm just curious how it would work. I can see WebODM deployed as a microservice on Kubernetes, but how would the logic scale with worker nodes?

With managed identities, this is where GitHub secrets can really help. Used in conjunction with GitHub Actions, they open up a lot of possibilities.

Managed identities are set up between the function and the resource group in Azure.

Other than pushing out the infrastructure for ODM as a CI/CD pipeline, can you describe how you want to use GitHub Actions?

By the way, I think pushing it out as IaC is a really good idea; it would be a great addition to the central ODM GitHub. Enter credentials and, boom, the ODM cluster is set up.

Well, basically it is CI/CD, except that I set the trigger to “on: workflow_dispatch”. So rather than an action kicking off whenever a commit is pushed, I trigger the workflow manually. This also lets me destroy on demand. With “workflow_dispatch”, you navigate to Actions, select the appropriate workflow, and then run it.
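
Besides the Run workflow button on the Actions tab, a `workflow_dispatch` workflow can also be triggered from a terminal with the GitHub CLI (`gh workflow run`, after `gh auth login`). This is a minimal sketch. The workflow file names `deploy.yml` and `destroy.yml` are hypothetical, and `GH` can be overridden so the command can be inspected without calling GitHub.

```shell
#!/usr/bin/env bash
# Trigger a workflow_dispatch workflow from the command line.
# GH defaults to the gh CLI; the ref defaults to the main branch.
run_workflow() {
  local workflow="$1" ref="${2:-main}"
  ${GH:-gh} workflow run "$workflow" --ref "$ref"
}
```

For example, `run_workflow deploy.yml` for push-button deploy and `run_workflow destroy.yml` for push-button destroy.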

Agreed, that's why I built this. I typically run this on my MacBook, but I really hate tying it up with these jobs. So I wanted something I could deploy repeatedly, knowing that I've got the latest ODM builds, and that I could deploy to different cloud vendors if needed. Push-button deploy, push-button destroy.

Yeah, brilliant! IaC ftw. I'll set mine up in Bicep! Perhaps Terraform as well!

@Cken I'm using your wf3, which is also brilliant for my needs! I've been able to create a one-click Windows batch workflow that runs through the Terraform commands to spin up the Azure resources, uses rclone to copy files from an Azure blob container (uploaded via Azure Storage Explorer), starts the docker run process, waits for docker run to complete, and then syncs the output files back to blob storage, where I can download them at my convenience. I then have another Windows .bat file I click to destroy the environment.

To do this, I added the following section to main.tf, in case it is helpful:
sudo --set-home --user=odm rclone copy [myrcloneremote:mycontainer] /odm/datasets/project --config="/home/odm/.config/rclone/rclone.conf"
sleep 1m
docker run -ti --detach --rm --name odm_container -v /odm/datasets/project:/datasets/code opendronemap/odm --project-path /datasets --pc-ept --build-overviews
sleep 1m
docker wait odm_container
sudo --set-home --user=odm rclone sync /odm/datasets/project [myrcloneremote:mycontainer] --config="/home/odm/.config/rclone/rclone.conf"
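
The `sleep 1m` lines should not be needed. `rclone copy` returns only when the copy has finished, and `docker wait` blocks until the container stops and then prints its exit code. One caveat with `--rm` is that a container that exits early may already be removed by the time `docker wait` runs. The hedged variant below keeps the container until its exit code has been read and only syncs results back if ODM succeeded. The function name is hypothetical, and the remote and config path are passed in as placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the job steps: copy in, run ODM, wait, sync out.
run_odm_job() {
  local remote="$1" conf="$2" project_dir="/odm/datasets/project" status
  sudo --set-home --user=odm rclone copy "$remote" "$project_dir" --config="$conf" || return 1
  # No --rm: keep the container until its exit code has been read.
  docker run --detach --name odm_container \
    -v "$project_dir":/datasets/code opendronemap/odm \
    --project-path /datasets --pc-ept --build-overviews || return 1
  # docker wait blocks until the container stops, then prints its exit code.
  status=$(docker wait odm_container)
  docker rm odm_container >/dev/null
  if [[ "$status" != 0 ]]; then
    echo "ODM exited with status $status; skipping sync" >&2
    return 1
  fi
  sudo --set-home --user=odm rclone sync "$project_dir" "$remote" --config="$conf"
}
```

For example, `run_odm_job "[myrcloneremote:mycontainer]" /home/odm/.config/rclone/rclone.conf`, with your own remote in place of the placeholder. Skipping the sync on failure is a design choice; if you would rather upload partial output and logs for debugging, sync in both cases and just report the status.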

The only remaining piece that I haven’t figured out yet is how to add an auto shutdown section to the terraform provisioning that I can set to something like midnight in my timezone, so at least it will stop the VM if I forget about it. Then I can destroy the environment when I return without incurring much cost.

That is great, and something I will certainly add to my builds. As for the shutdown, it could be as simple as adding a cron job to the build that shuts the VM down.

Append an entry to /etc/crontab:

0 1 * * * root /sbin/shutdown -h now

This will shut the VM down at 1:00 AM, VM time.
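
If the entry is written from the cloud-init script, a small helper keeps the time fields in the right order (minute first, then hour) and includes the user field that /etc/crontab lines need. This is a sketch. The function name is hypothetical, and the time is interpreted in the VM's own time zone.

```shell
#!/usr/bin/env bash
# Build an /etc/crontab entry that halts the VM daily at HOUR:MINUTE.
# 10# forces base 10 so inputs like "08" are not read as octal.
shutdown_cron_line() {
  local hour="$1" minute="${2:-0}"
  printf '%d %d * * * root /sbin/shutdown -h now\n' "$((10#$minute))" "$((10#$hour))"
}
# In the cloud-init script (running as root), something like:
#   shutdown_cron_line 1 >> /etc/crontab
```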

Or, even simpler, schedule it directly with shutdown:
shutdown -h 23:59

You will need to check the VM's clock and time zone to make sure the shutdown happens when you expect.
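
Azure Linux images commonly run on UTC, so a local shutdown time may need converting first. Running `timedatectl` on the VM shows the current zone, and `timedatectl set-timezone` is the alternative to converting. This is a minimal sketch that takes a whole-hour UTC offset. The function name is hypothetical, and it does not handle daylight saving, so the offset has to be updated by hand (-6 for CST, -5 for CDT).

```shell
#!/usr/bin/env bash
# Convert a local HH:MM to UTC, given the local UTC offset in whole hours.
to_utc() {
  local hh="${1%%:*}" mm="${1##*:}" offset="$2" total
  # 10# forces base 10 so "08" or "09" are not parsed as octal.
  total=$(( 10#$hh * 60 + 10#$mm - offset * 60 ))
  total=$(( (total % 1440 + 1440) % 1440 ))   # wrap into 00:00..23:59
  printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}
```

On a UTC VM, `shutdown -h "$(to_utc 23:59 -6)"` schedules the halt for 23:59 Central Standard Time.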

I was totally overcomplicating it. I suppose the shutdown could be added to the cloud-init script after a short delay, to accommodate the final rclone sync.

Actually, it looks like running the shutdown command from the script won't stop the charges. The VM still needs to be stopped (deallocated) manually or through Azure Resource Manager (ARM).

“If you shut down your VM from the OS, it will continue being billed. If you stop the VM from the management portal or through PowerShell, you won’t be billed for the VM. Please make sure that the VMs are in the status “Stopped (Deallocated)” in the Windows Azure management portal.”
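
From outside the VM, the Azure CLI can do the deallocation that an OS-level shutdown does not. This is a minimal sketch. The resource group and VM names are placeholders, and `AZ` can be overridden only so the command can be tested without Azure.

```shell
#!/usr/bin/env bash
# Deallocate (not just power off) a VM so compute billing stops.
# AZ defaults to the az CLI.
deallocate_vm() {
  local rg="$1" vm="$2"
  ${AZ:-az} vm deallocate --resource-group "$rg" --name "$vm"
}
```

Afterwards, `az vm show --show-details --resource-group <rg> --name <vm> --query powerState --output tsv` should report `VM deallocated` rather than `VM stopped`.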

Thanks for reminding me. So maybe something that can run as a scheduled job on Windows or a cron job on Mac, running terraform destroy locally?
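
On a Mac (or Linux), a per-user crontab line can run `terraform destroy` nightly from the repo directory; on Windows, Task Scheduler fills the same role. This is a sketch under a few assumptions. The repo path and log file are placeholders, Terraform state has to be available locally (it is not if the build only ever runs in GitHub Actions without a shared backend), and Azure credentials (an `az login` session or `ARM_*` environment variables) must be visible to cron. Because cron's PATH is minimal, passing the full path to terraform is safer.

```shell
#!/usr/bin/env bash
# Build a per-user crontab line (no user field) that destroys the environment
# nightly at HOUR:00. Paths are placeholders; TF defaults to "terraform".
destroy_cron_line() {
  local hour="$1" repo_dir="$2" tf="${3:-terraform}"
  printf '0 %d * * * cd %q && %q destroy -auto-approve >> %q 2>&1\n' \
    "$((10#$hour))" "$repo_dir" "$tf" "$repo_dir/destroy.log"
}
# Install by appending to the current user's crontab, for example:
#   (crontab -l 2>/dev/null; destroy_cron_line 1 "$HOME/odm-azure-wf3" /usr/local/bin/terraform) | crontab -
```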

What about this via Terraform? Configure Azure VM To Auto Shutdown Using Terraform – Learn IT And DevOps Daily (ntweekly.com)

I tried coding this in, but got stuck on how to define the virtual_machine_id.

Interesting. This is the Terraform document:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/dev_test_global_vm_shutdown_schedule

There appears to be a mistake in it, though, so I'll need to watch this closely; anything with dev_test can be buggy. For the build I provided, the code should look like the following:

Insert it at the very bottom of main.tf, or at line 127. It can even go above line 86, because Terraform does not care about the order of blocks. You could even create a whole separate .tf file just for this snippet, and then rename it to something without the .tf extension if you decide not to use it.
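
Terraform only loads `*.tf` (and `*.tf.json`) files from the working directory, so renaming the separate file is enough to switch the snippet off. This is a minimal sketch. The function name and the `shutdown.tf` / `.disabled` naming are hypothetical.

```shell
#!/usr/bin/env bash
# Toggle a standalone .tf file on or off by renaming it.
toggle_tf() {
  local file="$1"   # e.g. shutdown.tf
  if [[ -f "$file" ]]; then
    mv -- "$file" "$file.disabled" && echo "disabled $file"
  elif [[ -f "$file.disabled" ]]; then
    mv -- "$file.disabled" "$file" && echo "enabled $file"
  else
    echo "neither $file nor $file.disabled exists" >&2
    return 1
  fi
}
```

Note that if the resources in that file were already applied, the next `terraform plan` will propose destroying them once the file is disabled.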

resource "azurerm_dev_test_global_vm_shutdown_schedule" "nodeodm" {
  virtual_machine_id = azurerm_linux_virtual_machine.nodeodm.*.id
  location           = azurerm_resource_group.rg.location
  enabled            = true

  daily_recurrence_time = "2300"
  timezone              = "Central Standard Time"

  notification_settings {
    enabled         = false
  }
}

The Terraform documentation shows that you can add an email address to the notification_settings block, which may be worth checking out. You will need to change enabled from false to true.

Be sure to run terraform fmt and terraform plan before running the apply. This resource is part of the azurerm provider the build already uses, so no additional download should be needed (terraform init is what would fetch it otherwise). The plan will then show whether it works against the Linux VM, or whether there are any other issues. I don't have time today to really test it, but will try to do that soon.
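
The pre-apply checks can be bundled into one step. `terraform init` installs providers, `terraform fmt` rewrites files into canonical style, and `terraform validate` checks the configuration without contacting Azure. `terraform plan -out=tfplan` saves the plan so that `terraform apply tfplan` applies exactly what was reviewed. This is a minimal sketch. The function name is hypothetical, and `TF` can be overridden (e.g. `TF=echo`) to print the steps instead of running them.

```shell
#!/usr/bin/env bash
# Pre-apply checks, run from the Terraform working directory.
# Stops at the first step that fails.
tf_preflight() {
  local tf="${TF:-terraform}"
  $tf init -input=false &&
  $tf fmt &&
  $tf validate &&
  $tf plan -input=false -out=tfplan
}
# After reviewing the plan output:
#   terraform apply tfplan
```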

I really appreciate all the feedback!

I forgot about the count for the servers. I really need to test this to make sure I got the virtual_machine_id line correct.

Have you guys thought about how to add the auto scaling mechanism?

That was my original intent, but there only appears to be autoscaling code for AWS and DigitalOcean. I would really appreciate any input on how to do this in Azure.

This is where I got stuck trying to figure it out :) Here's the result of the code above:

│ Warning: Values for undeclared variables

│ In addition to the other similar warnings shown, 2 other variable(s) defined without being declared.


│ Error: Incorrect attribute value type

│ on main.tf line 87, in resource "azurerm_dev_test_global_vm_shutdown_schedule" "nodeodm":
│ 87: virtual_machine_id = azurerm_linux_virtual_machine.nodeodm.*.id
│ ├────────────────
│ │ azurerm_linux_virtual_machine.nodeodm is tuple with 1 element

│ Inappropriate value for attribute "virtual_machine_id": string required.
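
The error comes from the splat: `azurerm_linux_virtual_machine.nodeodm.*.id` yields a list with one ID per VM (here a one-element tuple), while `virtual_machine_id` accepts a single string. Because the VM resource uses `count`, one way to fix this is to give the schedule its own `count` and pick one VM per instance with `count.index`. The sketch below writes the corrected block to its own file, following the separate-.tf-file idea above. The file name `shutdown.tf` and the function name are hypothetical. If Terraform reports that `count` cannot be determined, reuse whatever variable sets the VM count instead of `length(...)`.

```shell
#!/usr/bin/env bash
# Write a count-aware shutdown schedule (one per NodeODM VM) to its own file.
write_shutdown_tf() {
  local out="${1:-shutdown.tf}"
  cat > "$out" <<'EOF'
resource "azurerm_dev_test_global_vm_shutdown_schedule" "nodeodm" {
  # One schedule per VM; [count.index] selects a single ID (a string).
  count              = length(azurerm_linux_virtual_machine.nodeodm)
  virtual_machine_id = azurerm_linux_virtual_machine.nodeodm[count.index].id
  location           = azurerm_resource_group.rg.location
  enabled            = true

  daily_recurrence_time = "2300"
  timezone              = "Central Standard Time"

  notification_settings {
    enabled = false
  }
}
EOF
}
```

With a single VM, `azurerm_linux_virtual_machine.nodeodm[0].id` would also satisfy the type check, but the `count` form keeps working when more servers are added.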
