I’ve been running a few maps on AWS, and it’s been working just awesome. So far I’ve made a few maps from 200ish photos, and they’ve all taken a couple of hours to finish.
But my latest map, made from 995 images, never seems to finish. It’s been 24 hours so far, and nothing much is happening.
I’m running it with ClusterODM, using autoscaling to set up the processing nodes, so it’s running on a t3a.2xlarge machine (8 vCPU / 32 GB RAM). But the CPU has been more or less idling for the last few hours.
And the logs from the processing show nothing much going on:
2020-03-31 01:59:29,526 INFO: DJI_0302.JPG.modified.jpeg resection inliers: 5651 / 5651
2020-03-31 01:59:29,704 DEBUG: Ceres Solver Report: Iterations: 5, Initial cost: 8.765221e+02, Final cost: 3.457865e+02, Termination: CONVERGENCE
2020-03-31 01:59:29,708 INFO: Adding DJI_0302.JPG.modified.jpeg to the reconstruction
2020-03-31 01:59:30,157 INFO: Re-triangulating
2020-03-31 01:59:30,236 INFO: Shots and/or GCPs are well-conditionned. Using naive 3D-3D alignment.
2020-03-31 02:16:55,735 INFO: Shots and/or GCPs are well-conditionned. Using naive 3D-3D alignment.
2020-03-31 04:27:52,562 DEBUG: Ceres Solver Report: Iterations: 20, Initial cost: 7.536832e+04, Final cost: 4.117475e+04, Termination: CONVERGENCE
By comparing the logs from the old, working job (200ish images) with this one, I’ve concluded it’s not normal behaviour. In the working one, it took 1 minute from the last Ceres Solver Report before it moved on to Undistorting. Here the last three log lines are 17 minutes and then over 2 hours apart. So I deleted the job, and will do a reflight.
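For anyone wanting to spot this kind of stall without eyeballing timestamps, here is a rough Python sketch that flags long silences between consecutive log lines. It assumes every line starts with a timestamp in the same format as the excerpt above; `find_gaps`, `SAMPLE_LOG` and the 5-minute threshold are my own names and choices, not anything from ODM.

```python
from datetime import datetime, timedelta

# Last three lines of the log excerpt quoted in this thread.
SAMPLE_LOG = [
    "2020-03-31 01:59:30,236 INFO: Shots and/or GCPs are well-conditionned. Using naive 3D-3D alignment.",
    "2020-03-31 02:16:55,735 INFO: Shots and/or GCPs are well-conditionned. Using naive 3D-3D alignment.",
    "2020-03-31 04:27:52,562 DEBUG: Ceres Solver Report: Iterations: 20, Initial cost: 7.536832e+04, Final cost: 4.117475e+04, Termination: CONVERGENCE",
]

def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (gap, previous_line, next_line) for each pair of consecutive
    timestamped lines that are further apart than `threshold`."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        try:
            # Timestamps look like "2020-03-31 01:59:29,526" (23 characters).
            stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev_time is not None and stamp - prev_time > threshold:
            gaps.append((stamp - prev_time, prev_line, line))
        prev_time, prev_line = stamp, line
    return gaps

for gap, before, after in find_gaps(SAMPLE_LOG):
    print(f"{gap} of silence before: {after[:60]}")
```

Running the same function over the log of a job that finished normally gives a baseline for what counts as a suspicious gap on that dataset size.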
Possibly too little RAM for a job of that size. I’ve got a couple of jobs in the 1,500-image range running right now on different nodes, and they are using about 150-200 GB of RAM. Perhaps the ODM process was OOM-killed and never got to update the log.
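One way to test the OOM theory, assuming you can still get at the node’s kernel log (for example the output of `dmesg`), is to scan it for OOM-killer messages. The sketch below is only that: `find_oom_events` and the pattern are my own, and the exact wording of these kernel messages varies between kernel versions, so treat the regex as a guess to adjust.

```python
import re

# Assumed markers of an OOM kill in a Linux kernel log; the wording
# differs between kernel versions, so adjust as needed.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|oom_reaper", re.IGNORECASE)

def find_oom_events(kernel_log_lines):
    """Return the kernel log lines that look like OOM-killer activity."""
    return [
        line.rstrip("\n")
        for line in kernel_log_lines
        if OOM_PATTERN.search(line)
    ]
```

To use it, pass in any iterable of lines, such as an open file handle on a saved copy of the kernel log. An empty result does not rule out memory pressure (heavy swapping can also leave the CPU idling), but a hit naming the ODM process would pretty much settle it.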
Also, I’ve found that in general AWS uses the crappiest hardware it can get away with. I’ve seen VMs with 24-core 1.8 GHz Intels in them, like the E5-2448L. And of course, you’re paying by the minute, so what do they care that your jobs take twice as long because they use garbage CPUs? Bezos truly is a dirt ball when it comes to exploiting every morsel he can out of anything. I’d recommend using Digital Ocean or Vultr over AWS all day long. I use my own hardware, which works out cheapest of all, but seriously, AWS will rob you blind. It makes my blood boil how many AWS evangelists there are out there too - they’re generally either the people who don’t foot the bill, or who haven’t done the TCO calculations.
Run “cat /proc/cpuinfo” to see what you’re on. Here’s one from Vultr for comparison ($6/month node, flat rate):