Easy database scalability with YaXAHA Cluster

March 27, 2022 · 5 minute read
Kirill Zinov

YaXAHA Cluster scales easily as your project grows, while remaining configurable enough to preserve the PostgreSQL functionality you would expect from a non-distributed database.

Easy scalability means that when you need, say, another 20% of capacity, you simply add the corresponding number of nodes to the cluster. You don’t need to work out how the load is distributed across the cluster or whether your hardware can handle it. Let’s take a look at what makes a database easy to scale with YaXAHA Cluster.

No dedicated master

Unlike widely adopted PostgreSQL replication solutions, YaXAHA Cluster has no dedicated master node. So, when scaling the cluster, you will not run into the problem of upgrading a single machine to absorb the growing flow of client requests.

Of course, someone still needs to make certain decisions within the cluster boundaries, much like the master node in a master-to-followers scheme, so from time to time YaXAHA Cluster elects a leader node. But this leader node is not like the master node in classic PostgreSQL replication, and here’s why:

  • The leader node is not required to execute the transaction.
  • The leader node’s only task is to make decisions.
  • The client is not required to contact the leader node.
  • Any healthy cluster node can start a transaction.
  • Any healthy node can be elected leader.

Instead of sending a transaction to a master server, the client is free to start it on any available cluster node: the one nearest to the client, or one randomly assigned from a list. The client may not even know that a leader node exists.
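For illustration, assuming each YaXAHA node exposes a standard PostgreSQL endpoint (the host names below are made up), a plain libpq multi-host connection string already lets the client reach any node, with no YaXAHA-specific machinery involved:

    postgresql://node1.example.com,node2.example.com,node3.example.com/appdb

libpq tries the listed hosts in order and connects to the first one that accepts; since PostgreSQL 16, appending ?load_balance_hosts=random makes it try them in random order instead, spreading clients across the cluster.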

You may be thinking that the leader node sounds a lot like a load balancer.

Just remember that the leader node makes fairly lightweight decisions within the cluster boundaries, much lighter than any transaction that actually changes the database. In effect, the leader node works as the cluster manager.

Let's move on, because the transaction lifecycle does not end with the leader-node.

Eventual consistency

The next thing that helps avoid a performance collapse when scaling the database is synchronous transaction replication on only a part of the cluster nodes, while the rest of the cluster duplicates transactions asynchronously, after the user has already received a response.

Suppose you decide that synchronous replication on three healthy nodes is enough to provide the required level of fault tolerance, and you have everything up and running. Later on, you run a successful promotional campaign, expand your branch network, or otherwise increase client traffic to the database.

The database load grows, and you decide to add a few more cluster nodes to stay in the green zone. Now the client application can start transactions on even more nodes, but the synchronous replication delay needed for fault tolerance remains the same: the originator node plus two more nodes.

As soon as the node that started the transaction receives the first two positive responses from its neighbors, the condition is met and the client application receives its response. The remaining cluster nodes perform the transaction asynchronously.
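For comparison only: stock PostgreSQL expresses a similar “any two acknowledgements” rule with quorum-based synchronous replication, though there the rule is tied to a dedicated primary. The snippet below is standard postgresql.conf syntax, not YaXAHA configuration:

    # On a stock PostgreSQL primary: a commit waits until ANY two
    # of the three listed standbys have confirmed the transaction.
    synchronous_standby_names = 'ANY 2 (node_b, node_c, node_d)'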

If part of the cluster fails or becomes unavailable for any reason, the remaining healthy nodes take over the load. And even if the cluster stops completely, data integrity will still be higher than after a master node failure under regular PostgreSQL replication.

Remember how we decided that each transaction must be executed on three different nodes before the client application receives a response?

Before you learn about the flexible configuration of YaXAHA Cluster, a short word is in order on how we prevent logical conflicts between transactions.

Transaction boundaries

This is probably a topic for a separate article, so to keep it brief: YaXAHA Cluster analyzes table dependencies to determine the boundaries of each transaction.

If a new transaction is about to make changes to the database that could conflict with the changes of a previous, still incomplete transaction, the new one waits its turn. This is the key point behind everything described earlier.
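A minimal sketch of such a conflict, using a made-up accounts table; this is standard SQL, nothing YaXAHA-specific:

    -- Transaction A, started on node 1 and not yet complete:
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    -- ... still running ...

    -- Transaction B, started on node 2, touches the same row, so its
    -- changes could conflict with A; per the rule above, the cluster
    -- makes B wait until A completes instead of letting the two race.
    BEGIN;
    UPDATE accounts SET balance = balance + 100 WHERE id = 42;
    COMMIT;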

True flexibility

YaXAHA Cluster is designed to let you configure replication for every table in every database. This makes it possible to have local databases and local tables on any cluster node that are not synchronized with the rest of the cluster.

For example, you can build a secure cluster that uses local audit tables to track changes, giving you a complete, detailed record of everything done on each and every cluster node. Or you can implement any other ideas of your own.

Synchronization is configured in detail in a special cluster settings table. The syntax is quite simple and similar to wildcards in filenames, but instead of a file name and extension, a pattern has three parts: database, schema, and table. For example: payments.*.global_*
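A few illustrative patterns; the names are invented, only the three-part database.schema.table wildcard syntax comes from the settings table described above:

    payments.*.global_*    -- in database "payments", any schema, tables starting with "global_"
    *.public.*             -- every table in the "public" schema of every database
    hq.audit.local_log     -- one exact table, no wildcards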

You can include databases and tables in synchronization with the rest of the cluster nodes, or exclude them from it. We also wrote a special SQL function to make setting up synchronization easy; a sketch of what such a call might look like follows.
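The function names and arguments below are hypothetical, invented purely for illustration; check the YaXAHA Cluster documentation for the actual call:

    -- Hypothetical helper: include matching tables in cluster-wide
    -- synchronization (the real function name and signature may differ).
    SELECT yaxaha_sync_include('payments.*.global_*');

    -- Hypothetical helper: keep local audit tables out of synchronization.
    SELECT yaxaha_sync_exclude('*.audit.local_*');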

Kirill Zinov
CIO / Principal Software Engineer