I don’t recommend running this setup in a production environment. It’s an interesting prototype, but it gives no control over the bootstrap process (it essentially relies on brute-forcing a race condition on container startup), which can lead to unreliable and unpredictable behaviour.
Several months ago I was looking at deploying a MariaDB cluster using Galera into our Docker Swarm platform. The end goal was to implement a zero-touch solution for deploying and scaling MariaDB in Docker which could be fully automated.
Deploying MariaDB in Docker is straightforward using the official image. Performance tweaks are similar to those required when running in a traditional VM or bare metal environment, and the config for these tweaks can be baked into a new image or managed through environment variables.
There are, however, two main challenges when deploying an application like MariaDB in Docker: persistent storage and bootstrapping the cluster.
Usually, I’ll avoid persistent storage in Docker applications as it can cause issues with scaling across multiple nodes**, preferring instead to utilise external storage services such as our in-house S3 platform. This solution is perfect for object storage, but doesn’t lend itself to the constant disk access of database workloads.
Instead, we rolled out three SSD-equipped Docker hosts in different data centres and mounted those SSDs on Docker’s volume directory. For each new database instance, we create a new Docker volume (on the SSD) for database storage.
If we run MariaDB with this setup and then scale the service to more than one instance on the same node, things start to break because multiple instances of MariaDB end up accessing the same underlying storage. It’s therefore important that we limit the service to one container per database instance per node. We achieve this by running the service in Global mode across the three Docker hosts; Global mode runs a single container on every host in the swarm (or on a subset of hosts when used with Docker’s service constraints).
With most applications in Docker, we’d need to manage replication of data between all of the hosts in the swarm to ensure the application is in a consistent state wherever the containers are started. Fortunately, the key feature of MariaDB with Galera is application-level clustering, which takes care of state (data) synchronisation for us.
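As a rough sketch of that deployment (not necessarily the exact command we ran), a Global-mode service with a named volume and a placement constraint could be created like this. The function name, the `db-ssd` node label, and passing the root password as a plain environment variable are all assumptions made for illustration.

```shell
# Sketch only: create one MariaDB task on every node labelled db-ssd=true.
# create_db_service, the "db-ssd" label and the argument order are hypothetical.
#   $1: service name   $2: volume name   $3: root password
create_db_service() {
    docker service create \
        --name "$1" \
        --mode global \
        --constraint 'node.labels.db-ssd==true' \
        --mount "type=volume,source=$2,target=/var/lib/mysql" \
        --env "MYSQL_ROOT_PASSWORD=$3" \
        mariadb
}
```

The label would have to be applied to each SSD host beforehand, for example with `docker node update --label-add db-ssd=true <node>`. Because the service runs in Global mode, Swarm never schedules two tasks of the service on the same node, so each volume is only ever opened by one MariaDB instance.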
Bootstrapping the Cluster
The second challenge is to find a way to achieve our zero-touch goal. Getting to the point of deploying a MariaDB service and being able to scale it is relatively easy, but requires that we already have an operational Galera cluster in place for the new nodes to join.
By default, MariaDB with Galera clustering enabled will only join an existing cluster and won’t start a new cluster without manual intervention. This is a very sensible precaution on Galera’s part, as it prevents a split-brain scenario (where two independent instances of the cluster are running with diverging data) caused by network partitioning or by all of the nodes being restarted at the same time. The bootstrapping operation only needs to be performed on a single node; you’re essentially telling that node to form a new cluster on its own, which the remaining nodes then join.
The biggest challenge in our Docker environment is knowing when to bootstrap a new cluster, which requires a bit more intelligence. Automating the process of bootstrapping a single node with homogeneous containers is tricky because we need to infer the state of the cluster to identify a single instance from which to bootstrap the cluster.
Selecting a Container to Bootstrap the Cluster
When a new container starts we don’t have any shared state to help us identify the state of the cluster, so we need to rely on some external (to the application) factor to tell the container to bootstrap the cluster. To achieve this we can exploit Docker Swarm’s service networking model. Specifically, we can infer details about the state of our parent service and the other containers (tasks) running as part of that service.
Docker exposes DNS endpoints for the tasks running under each Service, and a call to getent hosts tasks.$SERVICE_NAME will return a list of the containers running as part of that service, with their IP addresses. In our entrypoint script, we attempt to bootstrap if we’re the “first” container (i.e., the container with the lowest IP address).
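The “lowest IP address” check can be sketched in a few lines of shell. This is an illustration rather than the script from the image: the function names are made up, it assumes IPv4 addresses, and it assumes the container already knows its own address on the service network.

```shell
# Print the numerically lowest IPv4 address from `getent hosts`-style input
# (one "ADDRESS NAME" pair per line) read from stdin.
lowest_ip() {
    awk '{ print $1 }' | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n | head -n 1
}

# Succeed if our own address ($1) is the lowest one in the task list on stdin.
is_first_container() {
    [ "$1" = "$(lowest_ip)" ]
}
```

In an entrypoint this might be used as `getent hosts "tasks.$SERVICE_NAME" | is_first_container "$MY_IP"`, where `MY_IP` is a hypothetical variable holding the container’s own address. The sort is numeric per octet, so 10.0.0.3 correctly ranks before 10.0.0.12, which a plain lexical sort would get wrong.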
Avoiding Split Brain
After we’ve decided we should attempt to bootstrap, we loop through the other containers running as part of the service, attempting to connect to each one in turn. If we can connect successfully, we check whether that instance is part of a running cluster, and if so we configure MariaDB to join that cluster.
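One way to perform that check is to ask the sibling for Galera’s `wsrep_cluster_status` status variable, which reports `Primary` on a node that belongs to a working cluster. The sketch below assumes this is the test being used, and that a probe account exists on every node; `peer_in_cluster`, `PROBE_USER` and `PROBE_PASSWORD` are hypothetical names.

```shell
# Succeed if the MariaDB instance at address $1 reports that it is part of a
# Primary (operational) Galera component. Connection failures count as "no".
# PROBE_USER and PROBE_PASSWORD are assumed to be set by the environment.
peer_in_cluster() {
    status="$(mysql --host="$1" --user="$PROBE_USER" --password="$PROBE_PASSWORD" \
        --connect-timeout=5 --batch --skip-column-names \
        --execute="SHOW STATUS LIKE 'wsrep_cluster_status'" 2>/dev/null)" || return 1
    # Batch output is "wsrep_cluster_status<TAB>VALUE"; compare the value.
    [ "$(printf '%s\n' "$status" | awk '{ print $2 }')" = "Primary" ]
}
```

Treating an unreachable sibling the same as a sibling outside a cluster keeps the loop simple, but it is also where the race condition mentioned at the top comes from: a sibling that is merely slow to start looks identical to one that has no cluster.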
If none of our sibling containers are part of an existing cluster, we proceed with bootstrapping a new one by configuring MariaDB appropriately. As the other containers restart (or as the service is scaled and new containers are added), they’ll configure themselves automatically to join our newly created cluster using the same DNS-based service discovery.
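Put together, the join-or-bootstrap decision for the “first” container might look like the following sketch. It is not the actual entrypoint: `galera_start_args` is a hypothetical helper, the probe is passed in by name so that any check can be plugged in, and the remaining Galera settings (provider, cluster name and so on) are assumed to be configured elsewhere. `--wsrep-new-cluster` and `--wsrep-cluster-address` are the standard server options for bootstrapping and joining respectively.

```shell
# Print the extra server argument this node should start with.
#   $1:  name of a command that succeeds when the given peer address is
#        already part of a running cluster
#   $2+: addresses of the sibling tasks
galera_start_args() {
    probe="$1"
    shift
    for peer in "$@"; do
        if "$probe" "$peer"; then
            # A live cluster exists: join it, listing every known sibling.
            peers="$(IFS=,; printf '%s' "$*")"
            printf '%s%s\n' '--wsrep-cluster-address=gcomm://' "$peers"
            return 0
        fi
    done
    # No sibling is in a cluster, so start a new one.
    printf '%s\n' '--wsrep-new-cluster'
}
```

The printed argument would then be appended to the server command line when the entrypoint finally starts MariaDB. Containers that are not “first” would skip the bootstrap branch and keep retrying the join instead.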
Building the Image
We’ve successfully implemented a fully automated solution for deploying and scaling MariaDB clusters using Galera that handles bootstrapping and persistent storage with no manual intervention required. You can see the resulting image here:
** This isn’t strictly true, but it’s a good generalisation for many applications.