In this chapter, we will cover the installation and minimal configuration of a cluster.
You can learn more about the architecture in the YaXAHA Cluster Architecture section. In short, YaXAHA Cluster consists of a PostgreSQL extension, a server application, and an optional (but recommended) patched version of PgBouncer.
YaXAHA Cluster's hardware requirements do not exceed those of PostgreSQL itself. It can run on any supported Linux distribution where one of the supported PostgreSQL versions is installed:
- Debian and RedHat based Linux distributions
- x86-64 platform
- PostgreSQL 13, 14, 15
Each cluster node is a PostgreSQL server instance with the YaXAHA Cluster package installed and configured.
Here, we will describe the steps required to turn a PostgreSQL server instance into a YaXAHA Cluster node.
We recommend installing YaXAHA Cluster using a package manager whenever possible, as this makes software updates easier.
If you already have PostgreSQL installed, you can go straight to installing YaXAHA Cluster. Otherwise, you must first install PostgreSQL and its command line client:
sudo apt install postgresql postgresql-client
Before installing YaXAHA Cluster, please download and install the public signing key:
wget -O - https://www.dbinvent.com/dist/DBinventLLC.gpg.key | sudo tee /etc/apt/trusted.gpg.d/DBinventLLC.asc
Then add our repository to your list of APT sources:
sudo sh -c 'echo "deb http://apt.dbinvent.com/focal focal non-free" > /etc/apt/sources.list.d/dbinvent.list'
Update your package list and install the YaXAHA Cluster package:
sudo apt update && sudo apt install yaxaha
PostgreSQL Extension Installation
After YaXAHA Cluster is installed, you need to install its extension into the PostgreSQL server and perform a minimal configuration.
The extension controls the transaction life cycle and helps YaXAHA Cluster to implement its functionality.
Currently, we do not recommend installing extensions on a running PostgreSQL server. Please stop the server before installing the extension.
To simplify installing the extension into PostgreSQL and performing the initial configuration, the YaXAHA Cluster package includes the ytsetup utility. Usually it is enough to run ytsetup without additional parameters.
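A minimal sketch of that step, assuming ytsetup is on the PATH after the package installation (any options the utility may accept are not covered here):

```shell
# Run the bundled setup utility with no extra parameters;
# it installs the extension and writes the initial yt.conf.
sudo ytsetup
```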
If everything went well, this PostgreSQL instance is now ready to become part of the cluster. All that remains is to inform the cluster nodes about each other and set up replication.
For a node to become part of a cluster, this node and the other nodes must be made aware of each other.
Each cluster node must have a unique identifier, which is generated by default by the ytsetup utility and stored in the yt.conf file. This file is usually located in the folder where PostgreSQL keeps its settings.
You can also get the node ID by running a SQL query.
By default, a Universally Unique Identifier version 4 is used as the node ID, but you can use any string you like. For example, in the integration tests that we suggest using as a cluster setup example, we give the nodes human-readable names instead of UUIDs.
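If you prefer to pre-generate a node ID yourself (for example, to record it in an inventory before setup), a version-4 UUID, the default format mentioned above, can be produced from the shell. This sketch uses Python's standard uuid module:

```shell
# Print a random version-4 UUID, the same format ytsetup uses by default.
python3 -c 'import uuid; print(uuid.uuid4())'
```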
Suppose you have installed YaXAHA Cluster on three nodes named Node1, Node2, and Node3, with the IP addresses 192.168.10.195, 192.168.10.196, and 192.168.10.197. Also, suppose we have decided that the cluster nodes will communicate on port 2000. Then the following rows need to be added to the settings table on each node:
insert into public.yt_config (name, module, value) values
  ('Node1', 'N', '192.168.10.195:2000'),
  ('Node2', 'N', '192.168.10.196:2000'),
  ('Node3', 'N', '192.168.10.197:2000');
After that, please restart each node:
systemctl restart ytserver
To set up replication, you need to call the yt_setup SQL function, which is created by the ytpgxt--1.1.sql file when the PostgreSQL extension is installed.
The yt_setup function accepts include and exclude patterns, an installation flag, and, optionally, the name of the settings table:
yt_setup(include varchar, exclude varchar, install bool, cfg_table_name varchar)
The patterns use LIKE-style syntax. In general, the table name is specified together with its schema, just as in a regular SQL query, for example:
| Pattern  | Meaning                                        |
|----------|------------------------------------------------|
| %.nice_% | Any schema, table name starts with "nice_"     |
| yt.%     | Any table within the "yt" schema               |
| %_log    | Any table whose name ends with "_log"          |
| yt.      | Same as yt.% in the include pattern, see above |
The include pattern must contain at least one of the pattern parts: an empty include pattern is not allowed, and a missing part is treated as "match any". The exclude pattern, on the other hand, may be empty.
Example of calling the yt_setup function on each node of the cluster:
select * from yt_setup('public.%', '%_log', true);
This query will include all tables of the current database in the public schema into the replication, except tables whose names end with _log. In the same way, you can disable replication for matching tables by calling yt_setup with the install flag set to false.
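As a sketch of the reverse operation, disabling replication means calling yt_setup with the install flag set to false; here the patterns are reused from the include example above:

```sql
-- Same patterns as the include example, but with the install flag
-- set to false, removing the matching tables from replication.
select * from yt_setup('public.%', '%_log', false);
```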
Let's try to create a table and enable the replication for it. If the cluster is running, we’ll stop it:
systemctl stop ytserver
In the psql command line prompt, create the yt_test table:
create table yt_test(id serial primary key, name varchar(20), value varchar(80));
Now let's enable replication of the yt_test table. Since the nodes are stopped, it is impossible to ask the YT Server App for the currently used settings table name, so we need to pass the settings table name as the last parameter of the yt_setup call:
select * from yt_setup('%.yt_test', '', true, 'public.yt_config');
Now let's start ytserver on all nodes:
systemctl start ytserver
After starting the cluster, we can proceed to insert some data into the table we just created.
Currently, the product is at the MVP stage, and until the release version we assume that users run PgBouncer or a transactional ORM framework. Therefore, when using psql directly, we need to emulate a transaction.
On any of the started nodes, run:
begin;
insert into yt_test(name, value) values ('test', 'first');
select yt_complete(true);
commit;
Let's check that the replication is working. To do so, run this query on other nodes:
select * from yt_test where name='test';
Since the other nodes have replayed the insert performed on the transaction's initiator node, you should get the following output:
 id | name | value
----+------+-------
  1 | test | first
(1 row)
At the moment, we do not provide any automated tools for convenient cluster manipulations at runtime or turning an existing database into a cluster.
By the release version, we plan to implement features for easier cluster management.
So far, to turn an existing database into a cluster, you need to stop the server, duplicate the database to other nodes, do the setup as described in the Node setup section, and start all the cluster nodes.
To duplicate all server data to other nodes, we recommend the pg_basebackup tool as a reliable solution using the built-in PostgreSQL replication protocol.
When you need to back up an entire server, pg_basebackup is much more efficient than transferring data with the pg_dump or pg_dumpall tools. However, it is not designed for copying individual databases.
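As a hedged sketch of that step, a full copy of the primary's data directory onto a new node might look like this; the replication user and the target data directory are assumptions, not values from this guide (only the Node1 address is taken from the example above):

```shell
# Stream a full base backup from the primary node into the local
# PostgreSQL data directory (run on the new node, with PostgreSQL stopped).
# -Fp: plain format, -Xs: stream WAL while copying, -P: show progress.
pg_basebackup -h 192.168.10.195 -U postgres \
  -D /var/lib/postgresql/14/main -Fp -Xs -P
```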
To deploy a new cluster from scratch, you need to repeat the steps described in the Node Setup chapter on all cluster nodes.