MariaDB Galera rolling operations (paved road)
Paved road case study.
Scope & windows
Multi-node MariaDB Galera cluster behind a load balancer.
Rolling node ops during short windows; no write loss.
Role
Platform/automation engineer (lane design + implementation).
Approach
Opinionated lane with tags: preflight → change → validate.
Rolling execution with serial: 1 across the hostgroup; any_errors_fatal: true.
LB drain per node via HAProxy runtime socket; wait until 0 live conns (runtime JSON).
wsrep gates before/after: Synced, ready=ON, cluster Primary.
Validate with retries/delays; enable on LB only after gates pass; simple write probe.
Job logs = audit trail; runs pin a git ref for repeatability.
Results
Predictable windows; fewer manual steps; lower incident risk.
Clear pass/fail gates; easy to pause/rollback per node.
Auditable runs (who/what/which ref).
Confidentiality
Client artifacts can't be shared.
Examples are anonymized and recreated; configs, names, and IPs are placeholders.
Receipts use the actual stack and are representative.
Code snippets
Prereqs:
HAProxy admin socket (e.g. /run/haproxy/admin.sock)
socat & jq on LB nodes
MySQL auth via ~/.my.cnf or vault
1) Top-level playbook — roll 1-by-1, fail closed
Ansible - galera-rolling-updates.yml
2) Preflight.yml — cluster OK, then drain + wait to 0 live conns
Ansible - preflight.yml
3) change.yml — example node ops (service update/restart)
Ansible - change.yml
4) validate.yml — retries + enable only when green
Ansible - validate.yml
5) Rundeck job — run the whole hostgroup (serial in play)
Rundeck - job.yml