Getting Started#

In this chapter, we will cover the installation and minimal configuration of a cluster.

You can learn more about the architecture in the YaXAHA Cluster Architecture section. In short, YaXAHA Cluster consists of a PostgreSQL extension, a server application, and an optional but recommended patched version of PgBouncer.

Requirements#

YaXAHA Cluster's hardware requirements do not exceed those of PostgreSQL. It can run on any supported Linux distribution where one of the supported PostgreSQL versions is installed:

  • Debian- and RedHat-based Linux distributions
  • x86-64 platform
  • PostgreSQL 13, 14, 15
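
If you are not sure which PostgreSQL version a node is running, you can check it directly from psql:

select version();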

Node Setup#

Each cluster node is a PostgreSQL server instance with an installed and configured YaXAHA Cluster package.

This section describes the steps required to turn a PostgreSQL server instance into a YaXAHA Cluster node.

Package Installation#

We recommend installing YaXAHA Cluster using a package manager whenever possible, as this makes software updates easier.

If you already have PostgreSQL installed, you can go straight to installing YaXAHA Cluster. Otherwise, you must first install PostgreSQL and its command line client:

sudo apt install postgresql postgresql-client

Before installing YaXAHA Cluster, please download and install the public signing key:

wget -O - https://www.dbinvent.com/dist/DBinventLLC.gpg.key | sudo tee /etc/apt/trusted.gpg.d/DBinventLLC.asc

Then add our repository to your sources.list:

sudo sh -c 'echo "deb http://apt.dbinvent.com/focal focal non-free" > /etc/apt/sources.list.d/dbinvent.list'

Update your package list and install the YaXAHA Cluster package:

sudo apt update && sudo apt install yaxaha

PostgreSQL Extension Installation#

After YaXAHA Cluster is installed, you need to install its extension into the PostgreSQL server and perform a minimal configuration.

The extension controls the transaction life cycle and helps YaXAHA Cluster to implement its functionality.

Currently, we do not recommend installing extensions on a running PostgreSQL server. Please stop the server before installing the extension.

To simplify installing the extension in PostgreSQL and performing the initial configuration, we have included the ytsetup utility in the YaXAHA Cluster package. Usually it is enough to run ytsetup without additional parameters:

ytsetup

If everything went well, this PostgreSQL instance is now ready to be part of the cluster. All that remains is to inform the cluster nodes about each other and set up replication.
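
Once PostgreSQL is running again, a quick sanity check is to query the node ID setting registered by the extension (described in the Node Configuration section below):

show yt.node;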

Node Configuration#

In order for a node to become part of a cluster, this node and the other nodes must know about each other.

Each cluster node must have a unique identifier, which is generated by the ytsetup utility by default and stored in the yt.conf file. This file is usually located in the folder where PostgreSQL keeps its settings, for example:

/etc/postgresql/13/main/conf.d/yt.conf

You can also get the node ID by running a SQL query:

show yt.node;

By default, a version 4 Universally Unique Identifier (UUID) is used as the node ID, but you can use any string you like.

For example, in the integration tests, which we suggest using as a cluster setup reference, we give the nodes human-readable names rather than UUIDs.

Suppose you have installed YaXAHA Cluster on three nodes named Node1, Node2 and Node3 with the IP addresses 192.168.10.195, 192.168.10.196 and 192.168.10.197, and you have decided that the cluster nodes will communicate on port 2000. Then you need to add the following rows to the settings table public.yt_config on each node:

insert into public.yt_config (name, module, value) values
('Node1','N','192.168.10.195:2000'),
('Node2','N','192.168.10.196:2000'),
('Node3','N','192.168.10.197:2000');
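
Before restarting, you can check on each node that the rows were stored as expected (assuming, as in the insert above, that module 'N' marks node entries):

select name, value from public.yt_config where module = 'N';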

After that, please restart each node:

systemctl restart ytserver

Replication Configuration#

To set up replication, you need to call the yt_setup SQL function, which is created in the ytpgxt--1.1.sql file when the PostgreSQL extension is installed.

The yt_setup function accepts include and exclude patterns, an installation flag, and optionally the name of the settings table.

yt_setup(include varchar, exclude varchar, install bool, cfg_table_name varchar)

The patterns have LIKE-style syntax. In general, the table name is specified with its schema, just like when you write a regular SQL query, for example:

Pattern    Result
%.nice_%   Any schema, any table whose name starts with "nice_"
yt.%       Any table within the "yt" schema
%_log      Any table whose name ends with "_log"
yt.        Same as yt.% in the include pattern, see above
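
Since the patterns use LIKE syntax, you can preview which existing tables a pattern would match with an ordinary catalog query before calling yt_setup. This is only an illustration of the LIKE semantics; note that the underscore must be escaped to be matched literally:

select table_schema || '.' || table_name as matched_table
from information_schema.tables
where table_schema || '.' || table_name like '%.nice\_%';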

The include pattern must contain at least one of the two parts; an empty include pattern is not allowed, and a missing part is treated as "match any". The exclude pattern, on the other hand, may be empty.

An example of calling the yt_setup function on each node of the cluster:

select * from yt_setup('public.%', '%_log', true);

This query will include all tables of the current database in the public schema into the replication, except for the tables whose names end with _log. In the same way, you can disable replication by passing install=false.
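
For example, to stop replicating the tables included above, the same call can be repeated with the installation flag set to false (a sketch, assuming the patterns are interpreted the same way on removal):

select * from yt_setup('public.%', '%_log', false);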

Example#

Let's try to create a table and enable replication for it. If the cluster is running, we'll stop it first:

systemctl stop ytserver

At the psql command prompt, create the yt_test table:

create table yt_test(id serial primary key, name varchar(20), value varchar(80));

Now let's enable replication of the yt_test table. Since the nodes are stopped, the YT Server App cannot be asked for the name of the settings table currently in use, so we need to specify it as the last parameter of the yt_setup function:

select * from yt_setup('%.yt_test', '', true, 'public.yt_config');

Now let's start ytserver on all nodes:

systemctl start ytserver

After starting the cluster, we can proceed to insert some data into the table we just created.

Currently, the product is at the MVP stage, and until the release version we have allowed ourselves to assume that the user works through PgBouncer or a transactional ORM framework. Therefore, when using psql directly, we need to emulate a transaction.

On any of the started nodes, run:

begin;
insert into yt_test(name, value) values ('test', 'first');
select yt_complete(true);
commit;

Let's check that replication is working. To do so, run this query on the other nodes:

select * from yt_test where name='test';

Since the other nodes have replayed the insert from the transaction initiator node, you should get the following output:

id | name | value
----+------+-------
 1 | test | first
(1 row)
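
Subsequent writes from psql follow the same pattern: for example, an update to the same row would be wrapped as follows:

begin;
update yt_test set value = 'second' where name = 'test';
select yt_complete(true);
commit;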

Deploying a Cluster#

At the moment, we do not provide automated tools for manipulating a cluster at runtime or for turning an existing database into a cluster.

By the release version, we will implement features for easier cluster management.

So far, to turn an existing database into a cluster, you need to stop the server, duplicate the database to the other nodes, perform the setup described in the Node Setup section, and start all the cluster nodes.

To duplicate all server data to other nodes, we recommend the pg_basebackup tool, a reliable solution that uses the built-in PostgreSQL replication protocol.

When you need to copy an entire server, pg_basebackup is much more efficient than transferring data with the pg_dump or pg_dumpall tools. However, it is not designed for copying individual databases.

To deploy a new cluster from scratch, you need to repeat the steps described in the Node Setup chapter on all cluster nodes.