Lesson 2 - Load Balancing and Replication
24/02/2024 - 4 phút
Configuring PGpool-II is a crucial step in deploying a PostgreSQL database cluster. In this guide, we will walk through the steps to configure PGpool-II on an Ubuntu Linux operating system, helping you maximize the performance and high availability of your database.
Before we begin, we need to prepare 4 servers:
IP | Hostname | vCPU | RAM | DISK | OS |
---|---|---|---|---|---|
192.168.50.10 | pgpool-ii | 2 core | 4G | 60G | Ubuntu 22.04 |
192.168.50.11 | pg-master | 2 core | 4G | 60G | Ubuntu 22.04 |
192.168.50.12 | pg-slave-01 | 2 core | 4G | 60G | Ubuntu 22.04 |
192.168.50.13 | pg-slave-02 | 2 core | 4G | 60G | Ubuntu 22.04 |
Installation Architecture
Step 1: Configure Load Balancing and Replication
You need to configure Replication first, if you haven’t configured it, refer to Setting Up PostgreSQL 16 Replication.
Add the user replicator
:
Create the file pcp.conf
:
password_hash=$(pg_md5 PassKhongChilaPasss) && echo "postgre:$password_hash" >> /etc/pgpool2/pcp.conf
Next, configure the file pgpool.conf
on the pgpool2
server:
sudo chown pgpool:pgpool /etc/pgpool2/pgpool.conf
sudo chmod 0600 /etc/pgpool2/pgpool.conf
Then, open the file pgpool.conf
with a text editor nano
:
sudo nano /etc/pgpool2/pgpool.conf
Add the following configuration to the pgpool.conf
file:
listen_addresses = '*'
port = 9999
backend_hostname0 = '192.168.56.11'
backend_port0 = 5432
backend_weight0 = 0
backend_data_directory0 = '/home/ubuntu/postgresql/master'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'postgresql-master'
backend_hostname1 = '192.168.50.12'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/ubuntu/postgresql/slave-01'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'postgresql-slave-01'
backend_hostname2 = '192.168.50.13'
backend_port2 = 5432
backend_weight2 = 2
backend_data_directory2 = '/home/ubuntu/postgresql/slave-02'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name2 = 'postgresql-slave-02'
log_statement = on
log_per_node_statement = on
sr_check_user = 'replicator'
health_check_user = 'replicator'
health_check_period = 10
# Process management configuration
process_management_mode = static
process_management_strategy = lazy
num_init_children = 320
min_spare_children = 1000
max_spare_children = 5000
max_pool = 1000
In this configuration:
listen_addresses
is the IP address PGpool-II will listen on,port
is the port PGpool-II will listen on.backend_hostname0
,backend_port0
,backend_weight0
,backend_data_directory0
,backend_flag0
,backend_application_name0
are the connection details to thepostgresql-master
server.backend_hostname1
,backend_port1
,backend_weight1
,backend_data_directory1
,backend_flag1
,backend_application_name1
are the connection details to thepostgresql-slave-01
server.backend_hostname2
,backend_port2
,backend_weight2
,backend_data_directory2
,backend_flag2
,backend_application_name2
are the connection details to thepostgresql-slave-02
server.log_statement
andlog_per_node_statement
are the log query configurations.sr_check_user
andhealth_check_user
are the usernames for database server health checks.replicator
is the user we created in the step Setting Up PostgreSQL 16 Replication.backend_flag
is the flag that allowsFAILOVER
when thepostgresql-master
server fails,ALLOW_TO_FAILOVER
is the flag that allowsFAILOVER
when thepostgresql-slave-01
andpostgresql-slave-02
servers fail. Other flags:DISALLOW_TO_FAILOVER
: does not allowFAILOVER
ALWAYS_MASTER
: always thepostgresql-master
server
process_management_mode
is the process management mode,static
is the static mode,dynamic
is the dynamic mode.static
: The number of processes is statically managed and does not change.dynamic
: The number of processes is dynamically managed depending on the system load.
process_management_strategy
is the process management strategy,lazy
is the lazy mode,smart
is the smart mode.lazy
: Only create processes when needed.smart
: Create processes in advance to avoid delays when needed.
num_init_children
is the initial number of processes, if below it will create more processes.min_spare_children
is the minimum number of processes, if below it will create more processes.max_spare_children
is the maximum number of processes, if exceeded they will be killed.max_pool
is the maximum number of connections, if exceeded connections will be rejected.
Step 2: Run PGpool-II Configuration Check
Finally, run the PGpool-II configuration check:
sudo /usr/sbin/pgpool -n -f /etc/pgpool2/pgpool.conf -F /etc/pgpool2/pcp.conf
After configuration, you can check the status of PGpool-II by accessing it through pg4admin:
ip_address
is the IP address (192.168.56.5) of thepgpool2
server.username
is the default usernamepostgresql
.password
is the password for thepostgresql
account.
Step 3: Create Test Data
Here we will create a student
database and 10000
test data:
# Create the student database
CREATE TABLE student (
student_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100),
date_of_birth DATE,
grade_level INT CHECK (grade_level BETWEEN 1 AND 12)
);
WITH generated_data AS (
SELECT
substring(md5(random()::text) from 1 for 8) AS first_name,
substring(md5(random()::text) from 1 for 8) AS last_name,
concat(substring(md5(random()::text) from 1 for 6), '@example.com') AS email,
'1990-01-01'::date + random() * (age(now(), '1990-01-01'::date)) AS date_of_birth,
(random() * 11 + 1)::int AS grade_level
FROM generate_series(1,1000000) -- Generate 100 rows
)
INSERT INTO student (first_name, last_name, email, date_of_birth, grade_level)
SELECT * FROM generated_data;
As shown in the image, the INSERT
statement has been executed on the postgresql-master
server.
Next, we query with the SELECT
statement on PG4Admin:
SELECT * FROM student;
Then we check the log of the pgpool2
server:
As shown in the image, the SELECT
statement has been executed on the postgresql-slave-01
server through the pgpool2
server.
Thus, we have successfully configured PGpool-II for Load Balancing and Replication.