Galera – Information and Maintenance

Ringfree’s PBX databases run on Galera, a clustering scheme for MySQL/MariaDB. It is a high-availability, multi-master, self-healing solution for data redundancy. Ringfree’s Galera cluster consists of five (5) nodes in five (5) locations, distributed among three (3) infrastructure providers. The nodes are as follows:

  • galera-atl-02.ringfree.biz – Resides within Ringfree operated infrastructure in the Atlanta DC.
  • galera-dfw-03.ringfree.biz – Running on a vultr.com VPS operating in Dallas, TX, USA.
  • galera-nyc-02.ringfree.biz – Running on a digitalocean.com VPS operating in New York, NY, USA.
  • galera-sfo-02.ringfree.biz – Running on a digitalocean.com VPS operating in San Francisco, CA, USA.
  • galera-fra-02.ringfree.biz – Running on a digitalocean.com VPS operating in Frankfurt, Germany.

Accessing the Nodes

To gain access to the Atlanta node, first SSH into papalmainframe.ringfree.biz and run the rfc command to list the available servers. Enter node003.ringfree.biz, then run rfctl enter 4026 to gain root access.

To gain access to any of the other nodes, you can SSH into them directly.

Once on a node, the only relevant service running is MariaDB. Apache may also be running, but anything it currently serves is obsolete and has been out of service for quite some time. You can gain root access to the MariaDB instance using the following:

mysql -u root -p

Enter the appropriate password when prompted.

General Galera Cluster Information

There are two primary markers to observe when checking the health and status of the Galera cluster: the cluster size and the state UUID. If the cluster size is ever anything other than five (5), something is wrong, typically because one or more nodes have dropped out of the cluster. If the state UUID differs between nodes, those nodes are out of sync. To check the cluster size, run the following as the MariaDB root user:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

To check the state UUID for any given node, run the following as the MariaDB root user:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';

If you wish to check all available Galera status indicators, use a wildcard in the query:

SHOW GLOBAL STATUS LIKE 'wsrep_%';
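The checks above can be scripted. The following is a minimal sketch, not existing Ringfree tooling: the helper names (wsrep_value, size_ok, uuids_agree) and the EXPECTED_SIZE variable are hypothetical. It assumes the status output was produced with mysql's -N (no column names) and -B (tab-separated batch) options.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any existing tooling.
EXPECTED_SIZE=5   # the cluster is expected to contain five nodes

# Print the value of one status variable, given tab-separated
# "name<TAB>value" lines on stdin (the format `mysql -N -B` emits).
wsrep_value() {
    awk -F '\t' -v name="$1" '$1 == name { print $2 }'
}

# Succeed only if the reported cluster size equals the expected size.
size_ok() {
    [ "$1" = "$EXPECTED_SIZE" ]
}

# Read one state UUID per line on stdin (one line per node); succeed only
# if at least one line was given and all lines are identical.
uuids_agree() {
    [ "$(sort -u | wc -l | tr -d ' ')" = "1" ]
}

# Example use on a single node (prompts for the MariaDB root password):
#   status=$(mysql -u root -p -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';")
#   size=$(printf '%s\n' "$status" | wsrep_value wsrep_cluster_size)
#   size_ok "$size" || echo "cluster size is $size, expected $EXPECTED_SIZE"
```

To compare nodes, collect the wsrep_cluster_state_uuid value from each node, one per line, and pipe the result to uuids_agree; a non-zero exit status means at least two nodes disagree.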

For additional information, please check the official documentation.

Node Troubleshooting and Repair

By and large, Galera is extremely reliable and is generally both trouble- and maintenance-free. When a problem is identified, it usually follows a large-scale network outage or a longer-than-intended maintenance period with one of the third-party infrastructure providers.

If a node is not operating/syncing normally (as indicated by the cluster size and state UUID), often the fix is to simply restart the service:

sudo systemctl restart mysqld

Depending on how far out of sync the node is, the restart may time out several times before the service successfully starts up again. What has worked in the past is to attempt the restart twice, wait a couple of hours, and then attempt it a couple more times. If this doesn’t work, then the node has likely failed and you’ll need to refer to the official documentation on node failure and recovery.
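The restart routine described above (two attempts, a long wait, two more attempts) can be sketched as a small shell helper. This is an illustration under stated assumptions, not an established procedure: the function names and the SHORT_WAIT and LONG_WAIT variables are hypothetical, and the defaults of 60 seconds and two hours are guesses at "a couple of hours".

```shell
#!/bin/sh
# Sketch only: function and variable names are hypothetical.

# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns success as soon as one attempt succeeds.
retry() {
    attempts=$1
    pause=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Only pause if another attempt will follow.
        [ "$i" -lt "$attempts" ] && sleep "$pause"
        i=$((i + 1))
    done
    return 1
}

# Two attempts, a long wait, then two more attempts.
# SHORT_WAIT and LONG_WAIT are in seconds; the defaults are assumptions.
restart_with_backoff() {
    retry 2 "${SHORT_WAIT:-60}" "$@" && return 0
    sleep "${LONG_WAIT:-7200}"
    retry 2 "${SHORT_WAIT:-60}" "$@"
}

# Example use:
#   restart_with_backoff sudo systemctl restart mysqld \
#       || echo "restart failed; see the node failure and recovery docs"
```

A non-zero exit status after all four attempts corresponds to the case where the node has likely failed and manual recovery is needed.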