So I’ve been running some trials on very large datasets on some very large nodes (nearly 1 TB of RAM). My biggest headache at the minute is storing the data during processing. I know that --media-dir exists, but I don’t think it’s enough, as the Docker volumes swell to extreme sizes. I’m trying to build the setup to scale out like this (edit: please excuse the formatting, it’s not easy in the editor):
Control Node (Ceph block + CephFS)
Proc 1 - Proc 2 - Proc 3
CephFS Shared Storage (i.e. infinite storage)
The drama starts when I try to get Docker to play nicely with CephFS, which it doesn’t. I could use block storage, but that doesn’t solve the problem of infinite size (i.e. one large share), and it’s hard to know how large to make the block volume for each processing node: 500 GB is eaten like it’s nothing, 2 TB is a snack…
How is everybody else doing this? Are there any other parameters that I can pass to ODM/Docker to force it to store its data on CephFS? Currently overlay2 freaks out when I try to place /var/lib/docker on CephFS.
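
One arrangement that might be worth trying, sketched here under assumptions: leave /var/lib/docker on local or block storage (overlay2 needs a backing filesystem it supports, such as ext4 or xfs) and bind-mount a CephFS directory into each processing container, so that only the task data lands on the share. The host path /mnt/cephfs/odm/proc1, the image name opendronemap/nodeodm and the container data path /var/www/data are assumptions for illustration, not something I have confirmed for this setup. The helper build_docker_run is hypothetical and only assembles the command line.

```python
import shlex


def build_docker_run(host_data_dir, container_data_dir, image, port=3000):
    """Return a `docker run` argv that bind-mounts host_data_dir into the container.

    Only the data directory is placed on the shared filesystem; Docker's own
    storage (/var/lib/docker) is left wherever the daemon already keeps it.
    """
    return [
        "docker", "run", "-d",
        "-p", f"{port}:{port}",                          # publish the node's port
        "-v", f"{host_data_dir}:{container_data_dir}",   # bind mount onto the share
        image,
    ]


if __name__ == "__main__":
    # All three values below are assumptions, adjust them to the real setup.
    cmd = build_docker_run(
        host_data_dir="/mnt/cephfs/odm/proc1",   # hypothetical CephFS mount point
        container_data_dir="/var/www/data",      # assumed data path inside the image
        image="opendronemap/nodeodm",            # assumed processing-node image
    )
    print(shlex.join(cmd))
```

Giving each processing node its own subdirectory on the share (proc1, proc2, proc3) would keep the nodes from writing over each other while still drawing from one large pool.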