ITWarrior

#1

What do you do?
Hi folks, I’m a Systems Integrator / Infrastructure Architect / Software Developer from the UK, but moved to New Zealand 7 years ago. I run my own IT company in New Zealand (and the UK), and we mostly do system design and system integration, usually through custom built software. We do a lot of work for ISPs, Government departments and the Military.

How did you get into drones?
My family came to visit at Christmas, and my brother brought his drone with him - a Mavic Pro. I was pretty impressed with how far these had come, having (poorly) flown model helicopters about a decade ago. My 30th birthday was in January, and my family bought me a Mavic Pro as a surprise. It only took a couple of flights to realise that there was no challenge in flying these drones at all, and I soon got bored. So I took it apart and had a look inside, and also started hacking at the firmware to see what these things could do. It turns out that, with a couple of software ‘fixes’, a Mavic Pro will do 91.1km/h on its original running gear, though the gimbal starts to really freak out above about 80km/h.

After getting tired of the “Gimbal motor overload” and “Overcurrent” warnings through my goofing around, I looked at what else could be done with this new toy without sending it to an early grave. I looked into aerial photography (real estate etc) as a side business but every man and his dog are doing that already, and it’s a race to the bottom on pricing - it’s already hardly worth starting the car for and the prices of drones are falling, making them available to more and more people. It’s also easy, so competition can come from anyone at any price. Not good news for business, so I ditched that idea before I’d even started.

After a chance meeting in February with a bloke in a pub who was doing aerial photography and wanted to get into surveying, but was put off by the frankly outrageous cost and limitations of the proprietary providers, I started looking into building an alternative. I stumbled across ODM during my research, saw that it had an API and was scalable, and thought I’d give it a try. I’m glad I did.

How are you using/hoping to use the software?
I’ve started a little side arm to my IT business, essentially providing the UI and compute in an easy to use way. The model is similar to Lightning, though I’ve added a wrapper around ODM to simplify a lot of stuff like managing GCPs, user accounts, parameters etc. I’ve kept things simple by not modifying ODM at all so future updates are easy to integrate.

What are you working on currently? Any projects you’d like to share or talk about?
So I’ve launched AeroSurvey now but with the cost of setup, compute, storage, cooling, electricity etc etc it’ll be a long, long time before it shows a profit, if ever, but I’ll keep improving it, especially if people start to use it properly. It was entertaining to make at least. I don’t have any other (semi-personal) projects on the horizon yet, though I’m flat-tack with work so I’m not exactly bored. We’ll see what the future brings…

#2

Love what you did with AeroSurvey! A very ingenious way to use the embedded iframe for model/map display. I assume you’re running it on a bare metal machine for storage, since you mention cooling costs?

Also didn’t know you could mod the mavic like that, I should look into it…

#3

Everything I have is on my own tin apart from a few very exceptional things. The cost of running things in the cloud, whether measured as a dollar amount or as a data sovereignty/protection risk, is astronomically high. It really bothers me how many companies have fallen for cloud marketing without doing all of the maths, or even part of the maths. In most cases they’ve pretty much swallowed the marketing in full as fact. I have customers come to me all the time saying “We want to move to the cloud” and when I ask “Why?” the number one answer is “That’s where IT is going/what everybody else is doing”, followed closely by “We’re trying to reduce our costs”. I should be shocked at the sheer ignorance of the middle managers making these kinds of purchasing decisions, but I’m not. It has been this way for eternity; people being too lazy to do the maths when spending other people’s (the business’) money.

Let me explain what I’m talking about with some simple 2 minute maths. Here’s an example of what a decent spec ODM processing node might cost on Amazon: an m5d.12xlarge, which has 48 cores, 192GB of RAM, and a 900GB SSD. All for the low, low price of $5/hour (correct as of March this year, so prices have probably gone up). That’s $120 a day, or $3,600 a month.

Look at the price of off-lease servers in your area - New Zealand is expensive generally so they might be even cheaper where you are, but you can get off-lease RAMless diskless R720s now for $120-150. That’s roughly equivalent to one day of Amazon compute (whether you use it or not) for a twin CPU, 12/16/20 core machine. For ECC DDR3, RAM is now at around $2-3 per GB. I’ve bought off-lease locally and also new from eBay where I couldn’t find any 16GB sticks locally - 16GB sticks are around $50 NZD on eBay brand new. So to match the Amazon spec, that’s 12x16GB, or (12x$50) = $600 for 192GB.

Disks are cheap now, you can get 1TB SSDs brand new from $200. So for a roughly equivalent compute node, you’re looking at about $920 - approximately 6 days of AWS compute time for an m5d.12xlarge.
Electricity and cooling come in at roughly $2-3 a day per server. Dual power/network datacenter rack costs are $1200/month for an ENTIRE 42U rack in Wellington (NZ), and Wellington is generally expensive compared to other parts of the world for rack space - but you’re still well under the cost of AWS, and the gap only widens with time as your one-off hardware costs are carried over several years.

So if you consider a full 42U rack of the same spec machines for a year:

42/2 (2U servers) = 21

21x$920 (compute) = $19,320
21x$3x365 (power) = $22,995
12 x $1200 (rack) = $14,400
Total $56,715

AWS price:
21x$3,600x12 (m5d.12xlarge) = $907,200

A MILLION DOLLARS!
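The sums above can be reproduced in a few lines of shell arithmetic. This is only a sketch using the figures quoted in this post (the variable names are made up for illustration, and the $3/day power figure is the upper end of the $2-3 estimate); swap in your own local prices to see where you land.

```shell
#!/usr/bin/env bash
# Figures are taken from the post above; variable names are illustrative only.
servers=$(( 42 / 2 ))              # 2U servers in a 42U rack
node_cost=$(( 120 + 600 + 200 ))   # off-lease chassis + 192GB RAM + 1TB SSD
power_per_day=3                    # upper end of the $2-3/day estimate
rack_per_month=1200                # full 42U rack, dual power/network
aws_per_hour=5                     # m5d.12xlarge price quoted in the post

# First-year cost of owning the rack (includes the one-off hardware purchase).
compute=$(( servers * node_cost ))
power=$(( servers * power_per_day * 365 ))
rack=$(( 12 * rack_per_month ))
own_total=$(( compute + power + rack ))

# Same number of instances on AWS, billed 24 hours a day, 30 days a month.
aws_per_month=$(( aws_per_hour * 24 * 30 ))
aws_total=$(( servers * aws_per_month * 12 ))

# Integer ratio, rounded to the nearest whole number.
ratio=$(( (aws_total + own_total / 2) / own_total ))

printf 'own rack: $%d/year\nAWS:      $%d/year (about %dx)\n' \
    "$own_total" "$aws_total" "$ratio"
```

With the numbers from this post it prints $56715 for the rack and $907200 for AWS, which is about 16x. From the second year on the hardware line drops out, so the gap gets wider still.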

Every time I’ve done an AWS calculation like this the numbers are way out vs doing it yourself. I just don’t understand what calculations people are doing (if any) to conclude that cloud is worth it, despite all of its other shortcomings (like data sovereignty, control, trust, increasing prices, the extreme difficulty of getting your data OUT, etc etc etc). And the calculations above are at the ‘large’ scale where people (marketing echo-chambers) are claiming that AWS is most effective. And don’t be fooled into thinking that AWS are using all current gen hardware - they’re not, because they don’t have to, because people aren’t demanding it. And even if they were, and you buy equivalent spec new machines, you’re still way under AWS costs for just one year.

It gets worse than that when you start talking tax. AWS are not GST/VAT registered in most countries, while the cost for your own gear is reduced by 15-20% on the above prices because you can reclaim the GST/VAT tax component. It gets worse still - you can depreciate the cost of your own assets and offset it against the business profits in future years; you can’t depreciate a service that’s provided to you by Amazon. So essentially, even sitting on a shelf, the compute you buy as physical hardware eventually balances out to zero through depreciation, which is considered a business loss. And consider the fact that AWS prices are climbing while your ongoing compute amounts to zero (because you already paid for it), and power cost is generally insignificant - you could have each server on its OWN generator and still come in under AWS. Also consider all the hidden extra bullshit that AWS charge, like network data charges, disk and snapshot charges, etc etc - I really do just want to violently shake people sometimes - “WAKE UP, IT’S A TRAP!”. And I don’t buy the argument that the staffing costs to run your own gear are expensive either; for a million bucks a year you can hire or contract a LOT of good people and still have change for a new Lamborghini every year that you can wrap around a tree.

I was in a meeting a couple of years back at a company where the General Manager of IT (salary $250k+) was talking about the ‘cost savings’ for cloud and had two spreadsheets - one for local, the other for cloud. The local sheet was an accurate reflection of what IT costs they had incurred from every wire to every IT staffer, and the cloud sheet was what he considered to be the equal, and the numbers were not that dissimilar but cloud was slightly cheaper. Ignoring for a moment the huge swathe of licence costs for various applications that were missing from the cloud sheet (the second largest cost in the local sheet), I asked about the largest cost: staffing. I asked “Will Amazon be provisioning and managing servers and supporting users on these new servers, or will you need to add all of the existing staff to the cloud sheet too?”. The room fell silent for a rather long time.

From the (generally piss-poor) calculations I’ve seen people do, they’re often including (rightly so) VMware in their operational costs, which is usually one of the larger expenses of local IT. But instead of recognising VMware as a tumour that needs removing from the environment, middle damagers look at replacing the entire environment instead like some cult leader. In other words, they haven’t correctly identified VMware as the money haemorrhaging problem and looked for alternatives to it first, like Proxmox.

Ever wondered how Bezos got to be the richest guy in the world by a HUGE margin in just a few years? Look no further than your middle manager deciding that “cloud” sounds like a cool idea because all the other middle damagers said it was a good idea. And don’t even get me started on Office 363!

It really rattles my cage.

But to answer your question, yes, bare metal :smiley:

I’ve got the dashboard, the job distributor and processing nodes all on KVM with all the storage on Ceph RBD. The reason for this is so that I can move everything around while it’s running to take nodes offline for maintenance or add resources; no need to terminate or pause running tasks. I’m happy to trade a few percent of performance for the migration benefit KVM provides. This design should also scale well and be fault tolerant. The piece that I haven’t done yet is being able to lock off a processing VM so that new tasks aren’t assigned to it, which would let me take VMs offline without impacting running jobs. Currently, all jobs are tipped into the API and I allow WebODM to choose the node, but I’ve got a plan to bypass the WebODM task provisioner, implement my own pre-queuing system at the front and select nodes using processing_node instead, which should allow nodes to be ‘drained’ without general service impact.
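To make the ‘drain’ idea concrete, here is a minimal sketch of the selection step such a pre-queue might use. It is not the actual implementation (that doesn’t exist yet): the `pick_node` function, the "node_id queue_count" input format and the list of draining IDs are all hypothetical. It picks the least-loaded node that is not marked as draining; the chosen ID is what would then be passed as processing_node when the task is created, with the exact request shape to be checked against the WebODM API docs.

```shell
#!/usr/bin/env bash
# Hypothetical pre-queue helper: choose a processing node for the next task,
# skipping any node that has been marked as draining.
#   $1     space-separated list of node IDs that must not receive new tasks
#   stdin  one "node_id queue_count" pair per line (assumed format)
# Prints the chosen node ID, or returns non-zero if no node is available.
pick_node() {
    local draining=" $1 "      # pad with spaces so whole IDs can be matched
    local best_id="" best_load="" id load
    while read -r id load; do
        [[ -z "$id" ]] && continue
        [[ "$draining" == *" $id "* ]] && continue   # node is being drained
        if [[ -z "$best_id" ]] || (( load < best_load )); then
            best_id=$id
            best_load=$load
        fi
    done
    [[ -n "$best_id" ]] && echo "$best_id"
}

# Example: node 2 is idle but draining, so the next-least-loaded node wins.
nodes=$'1 4\n2 0\n3 2'
pick_node "2" <<< "$nodes"
```

In the example, node 2 has the shortest queue but is skipped, so node 3 is chosen. Once a draining node’s queue reaches zero it can be migrated or shut down without touching running jobs.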

The disks under Ceph were 2TB spinning disks to a total of 24TB spread across the nodes as 2TB disks were the most cost effective capacity at the time. However, performance was shockingly poor, so I doubled the number of disks to try to improve this but it made little difference. I ended up ripping them all out and replacing them with 2TB QVO SSDs because it was random IO that was killing it so the QVOs should be fine as I don’t need sustained writes. So far, they’re OK, but I’m keeping my eye on them. Storage has been the most expensive component in the whole project by a huge margin.

The network is also very simple and inexpensive; it’s just gigabit ethernet connected to a decent sized switch. I’ve got the interfaces bonded and connected to an LACP-capable switch so I’m getting 2 gigabit throughput for Ceph, which is plenty for now. These Dells have 4 ethernet ports, so I have scope to grow to a theoretical 4 gig before I need to start looking at 10 gig hardware.
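For anyone wanting to try the same thing, a rough sketch of an LACP (802.3ad) bond using iproute2 follows. The interface names (`bond0`, `eno1`, `eno2`) are placeholders, and this is not necessarily how the bond on these servers was configured. The helper only prints the commands so they can be reviewed before being run as root; the switch ports need a matching LACP group, and settings made this way do not survive a reboot unless they are also added to the distro’s network config.

```shell
#!/usr/bin/env bash
# Print (rather than run) the iproute2 commands for an 802.3ad (LACP) bond.
#   $1    name of the bond interface to create
#   $2..  physical interfaces to enslave (placeholder names in the example)
bond_commands() {
    local bond=$1; shift
    echo "ip link add $bond type bond mode 802.3ad miimon 100"
    local nic
    for nic in "$@"; do
        # An interface must be down before it can join a bond.
        echo "ip link set $nic down"
        echo "ip link set $nic master $bond"
    done
    echo "ip link set $bond up"
}

bond_commands bond0 eno1 eno2
```

Bear in mind that LACP balances per flow, so a single TCP stream still tops out at one link’s speed; the aggregate figure only shows up with many concurrent connections, which is what Ceph traffic tends to look like.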

The VMs themselves are small, ~16GB provisioned as RBD, then the folders are CephFS (kernel) mounted for infinite online growth (exabytes). It’s all scripted, so rolling a new processing node takes minutes:

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -

sudo apt-add-repository "deb https://download.ceph.com/debian-luminous/ $(lsb_release -sc) main"
apt-get clean all
apt-get update
apt-get -y install ceph-common

mkdir -p /etc/ceph/
echo '<Key>' > /etc/ceph/admin.keyring
mkdir /www
mkdir /code
mkdir /webodm
mkdir /swap
cat << _EOF_ >> /etc/fstab
10.0.0.5:6789,10.0.0.6:6789,10.0.0.7:6789:/$(hostname -f)/www /www ceph name=admin,secretfile=/etc/ceph/admin.keyring,noatime,_netdev 0 0
10.0.0.5:6789,10.0.0.6:6789,10.0.0.7:6789:/$(hostname -f)/code /code ceph name=admin,secretfile=/etc/ceph/admin.keyring,noatime,_netdev 0 0
10.0.0.5:6789,10.0.0.6:6789,10.0.0.7:6789:/$(hostname -f)/webodm /webodm ceph name=admin,secretfile=/etc/ceph/admin.keyring,noatime,_netdev 0 0
10.0.0.5:6789,10.0.0.6:6789,10.0.0.7:6789:/$(hostname -f)/swap /swap ceph name=admin,secretfile=/etc/ceph/admin.keyring,noatime,_netdev 0 0
_EOF_
mount -a

# Create a 300GB swap file; lock down permissions before mkswap so it doesn't warn about them
dd if=/dev/zero of=/swap/swap bs=1G count=300
chmod 600 /swap/swap
mkswap /swap/swap
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
sysctl -p

echo '/swap/swap swap swap sw 0 0' >> /etc/fstab
swapoff -a
swapon -a

# Install from Native script
wget https://<Local_Store>/odm-native.sh
bash odm-native.sh
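The four fstab entries in the script differ only in the folder name, so one possible tidy-up is to generate them in a loop. The sketch below is an alternative layout, not the script actually in use; `ceph_fstab_lines` is a made-up helper name, and the monitor addresses and mount options are copied from the lines above. It only prints the entries, so the output can be checked before appending it to /etc/fstab.

```shell
#!/usr/bin/env bash
# Monitor addresses and mount options copied from the fstab lines in the post.
MONS="10.0.0.5:6789,10.0.0.6:6789,10.0.0.7:6789"
OPTS="name=admin,secretfile=/etc/ceph/admin.keyring,noatime,_netdev"

# Print one CephFS fstab entry per folder for the given host.
#   $1    host name used as the top-level CephFS directory
#   $2..  folder names, each mounted at /<folder>
ceph_fstab_lines() {
    local host=$1; shift
    local dir
    for dir in "$@"; do
        printf '%s:/%s/%s /%s ceph %s 0 0\n' \
            "$MONS" "$host" "$dir" "$dir" "$OPTS"
    done
}

# Example with a placeholder host name; the original script uses $(hostname -f).
ceph_fstab_lines node1.example www code webodm swap
```

Adding another mounted folder to a node then means adding one word to the call rather than pasting another long line.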
#4

What a great analysis. I’m also a big fan of bare metal setups, I think there are certain uses for the cloud (and there are many), but it’s not a silver bullet as many would like you to think it is. AWS in particular I believe is often sold to organizations and people that have no real need for the availability / features that it offers and they would be better off with other providers (or bare metal) and end up overpaying by a LOT.

Since you mention building a pre-queuing system, make sure to take a look at https://github.com/MasseranoLabs/nodeodm-proxy which is what we’re currently using for the lightning network. It’s interesting you mentioned the ability to lock a node, I’ve had this issue in the TODO list for a while now: https://github.com/MasseranoLabs/nodeodm-proxy/issues/11
