YaXAHA Cluster Architecture#

This section describes the internals of YaXAHA Cluster and the basic principles behind its design. We recommend this section to anyone using YaXAHA Cluster, especially in production.

Main Components#

Each cluster node is a PostgreSQL server instance with the YaXAHA Cluster package installed on top of it.

Here we cover only the components essential for understanding how YaXAHA Cluster works, deliberately leaving aside auxiliary tools such as the ytsetup utility. The main components included in the package are:

  • A specially patched version of PgBouncer
  • An extension for the PostgreSQL server
  • The YT-Server application

In the Concepts section we will also take a look at entities that appear during cluster operation but are not separate parts of the package; they nevertheless need to be considered to fully understand how the cluster works.

PgBouncer#

PgBouncer helps implement functionality that is currently not possible without modifying the PostgreSQL server source code. We decided that it is better to ship a slightly modified PgBouncer with our package than to make changes to the PostgreSQL server itself.

The point is that YaXAHA Cluster operates on transactions, not just on individual SQL queries. This brings benefits such as smart detection of transaction boundaries, logical integrity, and data consistency across the cluster. But it is only possible when every database query is wrapped in a transaction, even if the client's legacy system does not use transactions by design; this is where the modified PgBouncer comes in.
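For illustration only: conceptually, a bare statement sent by a legacy client is handled as if the client had issued explicit transaction boundaries itself. The table and values below are purely hypothetical.

```sql
-- A legacy client sends a bare statement (hypothetical table):
UPDATE orders SET status = 'paid' WHERE id = 42;

-- Conceptually, the cluster processes it as if it had been wrapped
-- in an explicit transaction:
BEGIN;
UPDATE orders SET status = 'paid' WHERE id = 42;
COMMIT;
```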

Besides PgBouncer, the YaXAHA Cluster package contains an extension for PostgreSQL and a server application, described below.

Extension#

The extension runs inside the PostgreSQL server address space and communicates only with the YT-Server application on the same node. Its role is to intercept the moment when a new transaction is ready to be reported to the rest of the cluster but has not yet been completed on this node.

At that moment, the extension notifies the server application about the new transaction and the upcoming data changes, then waits for a response before allowing PostgreSQL to complete the transaction locally.
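From the client's point of view, this interception is invisible: conceptually, COMMIT simply does not return until the response described above has arrived. A minimal sketch with a hypothetical table:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- hypothetical table
COMMIT;  -- conceptually, control returns only after the extension has
         -- notified YT-Server and received its response
```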

YT-Server Application#

The server application, ytserver, is responsible for communicating with the other cluster nodes and with the leader node. When a client application starts a transaction on a particular node, ytserver initiates the process of repeating this transaction on the other nodes of the cluster. The transaction is considered successful once the required number of cluster nodes have committed it.

In other words, part of the cluster must repeat the transaction synchronously with the initiator node in order to provide sufficient fault tolerance.

The required number of synchronous transaction copies can be configured in the settings table, both for the entire cluster and for individual tables. You can find examples in the Failover Settings section.
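As a purely hypothetical sketch (the real module and key names are documented in the Failover Settings section), changing a cluster-wide value would be an ordinary update of the configuration table described below:

```sql
-- Hypothetical module and key names; see Failover Settings for the real ones.
UPDATE public.yt_config
   SET value = '2'
 WHERE module = 'failover'
   AND key    = 'sync_copies';
-- Per-table overrides live in the same table; see Failover Settings.
```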

When the required number of cluster nodes confirm that they are ready to complete the transaction together with the initiator node, the transaction is committed on those nodes. The rest of the cluster may apply the transaction asynchronously.

Concepts#

In addition to the components of the YaXAHA Cluster package described above, a few more things need to be mentioned to get a complete picture of how the cluster works.

Configuration Table#

The settings table on each cluster node is created by the ytsetup utility. It is a simple key=value store, with an additional module column that arranges settings into categories.

By default, the ytsetup utility creates the configuration table public.yt_config in the postgres database.
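A quick way to inspect the current settings from any node, assuming the column names module, key, and value (the key=value pair plus the category column mentioned above):

```sql
-- Run against the postgres database; column names are assumed
-- from the description above.
SELECT module, key, value
  FROM public.yt_config
 ORDER BY module, key;
```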

Initiator Node#

The node that initially received the request from the client application to execute a transaction. Since YaXAHA Cluster does not have a dedicated master node, this can be any cluster node available to the client application.

Leader Node#

The leader node acts as the cluster manager: it keeps track of all transactions in the cluster and controls their sequential execution.

The leader node is elected by the cluster through voting. If the current leader becomes unavailable, the healthy cluster nodes hold a new leader election, so the cluster stays operational as long as it has enough healthy nodes.