CSTech Cluster for PostgreSQL (Clup) User Manual v2.0

1. Features and How It Works

1.1 Features

CSTech Cluster for PostgreSQL (Clup) implements a high-availability and read/write-splitting solution for PostgreSQL database clusters.

Clup can manage multiple streaming-replication clusters. Each cluster is structured as follows:

  1. There is one primary database.
  2. There are multiple standby databases, kept in sync with the primary via streaming replication. The replication mode can be set to synchronous or asynchronous.
  3. There is a write VIP, which normally resides on the machine hosting the primary.
  4. There is a read VIP, which resides on a machine running the load balancer cstlb (a load balancer provided by CSTech). cstlb balances requests across the standby databases.
  5. To simplify deployment, cstlb is usually installed directly on two of the standby machines.
  6. Applications that need to write data must connect to the write VIP, which routes to the primary. Applications that are sensitive to read lag should also access the primary through the write VIP. Read-only workloads can reach the standbys through the read VIP; with multiple standbys, CSTech's load-balancing software cstlb spreads reads across them.
  7. Clup is installed on a separate machine, from which it probes each database. If a database is found to be unhealthy, Clup performs the appropriate switchover.
  8. When the primary fails, Clup automatically promotes one of the standbys to primary, providing high availability. At the same time it notifies cstlb to remove the promoted standby from the read load-balancing pool.
  9. When a standby fails and the read VIP happens to be on that machine, Clup moves the read VIP to another machine, and also removes the failed standby from the cstlb load balancer.

Currently the software only supports PostgreSQL 9.5 and later.
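The read/write split described above can be sketched conceptually (the VIPs below are this manual's example addresses; the function itself is purely illustrative, not part of Clup):

```python
# Conceptual sketch of the read/write split: writes and lag-sensitive reads
# go to the write VIP (the primary); other reads go to the read VIP, behind
# which cstlb balances across the standbys.
WRITE_VIP = "192.168.56.68"   # example write VIP from this manual
READ_VIP = "192.168.56.67"    # example read VIP from this manual

def choose_endpoint(is_write: bool, lag_sensitive: bool = False) -> str:
    if is_write or lag_sensitive:
        return WRITE_VIP
    return READ_VIP

print(choose_endpoint(is_write=True))                       # write VIP
print(choose_endpoint(is_write=False))                      # read VIP
print(choose_endpoint(is_write=False, lag_sensitive=True))  # write VIP
```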

2. Installation and Configuration

The installation and configuration below use the following environment as an example.

The machines are:

Hostname  IP             DB port  Data directory
clup      192.168.56.69  N/A      N/A
pg01      192.168.56.61  5432     /home/postgres/pgdata
pg02      192.168.56.62  5432     /home/postgres/pgdata
pg03      192.168.56.63  5432     /home/postgres/pgdata
pg04      192.168.56.64  5432     /home/postgres/pgdata

Cluster information:

  • write_vip: 192.168.56.68
  • read_vip: 192.168.56.67
  • Load balancer list: 192.168.56.62:8082, 192.168.56.63:8082

2.1 Installation Requirements

The arping package must be installed on every machine, because this software relies on the arping command.

PostgreSQL must be version 9.5 or later.
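A quick way to check these prerequisites on a host is to verify that the needed commands are on the PATH; a minimal sketch (the command list here is an assumption based on the requirements above):

```python
# Check that required external commands (e.g. arping, ssh) are installed.
import shutil

def missing_commands(cmds=("arping", "ssh")):
    """Return the subset of cmds not found on PATH."""
    return [c for c in cmds if shutil.which(c) is None]

# Example: report anything missing on this host.
missing = missing_commands()
if missing:
    print("please install:", ", ".join(missing))
else:
    print("all prerequisites present")
```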

2.2 Software Installation

Clup requires a dedicated machine, but its hardware needs are modest:

  • x86 server
  • CPU: PIII 800 MHz or better
  • 1 GB of RAM or more
  • 20 GB of disk or more

In this example the machine's hostname is clup and its IP address is 192.168.56.69.

Clup is installed under, and runs as, the root user.

There are two installation packages:

  • clupX.Y.Z.tar.xz, where X.Y.Z is the version number, e.g. 2.0.0
  • python3.6.tar.xz

Simply extract clupX.Y.Z.tar.xz and python3.6.tar.xz into the /opt directory.

This creates the following directory:

/opt/clupX.Y.Z

To make the commands easier to run, add /opt/clup/bin to the PATH environment variable, for example in .bash_profile:

PATH=$PATH:/opt/clup/bin
export PATH

Install cstlb on two of the standby machines. The cstlb package consists of a single executable, cstlb; copying it into the /opt/cstlb directory completes the installation:

mkdir -p /opt/cstlb
cp cstlb /opt/cstlb

2.3 Configuration

2.3.1 Host Configuration

Clup operates each database machine over passwordless ssh, so ssh trust for the root user must be set up between the clup machine and all the database machines, allowing ssh to any machine, including the local machine itself, without a password.

One simple way to do this is as follows.

Generate an ssh key on the first machine:

ssh-keygen

An actual run looks like this:

[root@clup ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
02:74:f1:34:f9:cb:47:da:ff:95:c3:78:ba:e6:37:1a root@pg04
The key's randomart image is:
+--[ RSA 2048]----+
|    . o.o.       |
|   . . o..       |
|    .   ..       |
|     .    . .    |
|      . S. =     |
|       .  + o o .|
|           . E =.|
|             .=oo|
|            o=+.o|
+-----------------+

Then change into the .ssh directory and append the contents of id_rsa.pub to authorized_keys:

cd .ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys

Now ssh to the local machine should no longer require a password:

[root@clup ~]# ssh 127.0.0.1
Last login: Sat Oct 28 15:06:03 2017 from 127.0.0.1
[root@clup ~]# exit
logout
Connection to 127.0.0.1 closed.

Then copy this machine's .ssh directory to all the other machines:

[root@clup ~]# scp -r .ssh root@192.168.56.62:/root/.
root@192.168.56.62's password:
id_rsa                                                                                                                                       100% 1679     1.6KB/s   00:00
id_rsa.pub                                                                                                                                   100%  391     0.4KB/s   00:00
authorized_keys                                                                                                                              100% 1011     1.0KB/s   00:00
known_hosts

For convenience, set up passwordless ssh for the database user postgres in the same way.

Other ways of setting up passwordless ssh between machines are described in various articles online and are not repeated here.

2.3.2 Database Configuration

First, set up the primary and standby databases.

The recovery.conf file on a standby looks like this:

standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'application_name=stb62 user=postgres host=192.168.56.61 port=5432 password=postgres sslmode=disable sslcompression=1'

In "primary_conninfo":

  • application_name=stb62: this must be different on every standby. A simple convention is "stb" plus the last part of the IP address, e.g. stb62 for 192.168.56.62.
  • "user=postgres password=postgres" must match the db_repl_user and db_repl_pass settings in the configuration file clup.conf.
  • host=192.168.56.61 port=5432: the IP address and port of the primary.
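The application_name convention above can be captured in a small helper; an illustrative sketch (not part of Clup), where the user and password must still match clup.conf:

```python
# Build a standby's primary_conninfo following the "stb" + last-octet convention.
def app_name(standby_ip: str) -> str:
    # e.g. "192.168.56.62" -> "stb62"
    return "stb" + standby_ip.rsplit(".", 1)[-1]

def primary_conninfo(standby_ip, primary_host, primary_port=5432,
                     user="postgres", password="postgres"):
    return ("application_name=%s user=%s host=%s port=%d password=%s "
            "sslmode=disable sslcompression=1"
            % (app_name(standby_ip), user, primary_host, primary_port, password))

print(primary_conninfo("192.168.56.62", "192.168.56.61"))
```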

Once the primary and standbys are up, run select * from pg_stat_replication; on the primary to confirm that all standbys are working:

[postgres@pg01 pgdata]$ psql
psql (10.4)
Type "help" for help.

postgres=# select * from pg_stat_replication;
 pid | usesysid | usename  | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_ls
n | flush_lsn | replay_lsn |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state
-----+----------+----------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+---------
--+-----------+------------+-----------------+-----------------+-----------------+---------------+------------
 472 |       10 | postgres | stb64            | 192.168.56.64 |                 |       39022 | 2018-07-25 09:08:55.797849+00 |         3603 | streaming | 0/1896FB8 | 0/1896FB
8 | 0/1896FB8 | 0/1896FB8  | 00:00:00.000632 | 00:00:00.003187 | 00:00:00.003188 |             0 | async
 471 |       10 | postgres | stb63            | 192.168.56.63 |                 |       47384 | 2018-07-25 09:08:51.84998+00  |         3604 | streaming | 0/1896FB8 | 0/1896FB
8 | 0/1896FB8 | 0/1896FB8  | 00:00:00.000738 | 00:00:00.004411 | 00:00:00.004412 |             0 | async
 470 |       10 | postgres | stb62            | 192.168.56.62 |                 |       40098 | 2018-07-25 09:08:47.956719+00 |         3604 | streaming | 0/1896FB8 | 0/1896FB
8 | 0/1896FB8 | 0/1896FB8  | 00:00:00.000846 | 00:00:00.003342 | 00:00:00.003343 |             0 | async
(3 rows)

Create the probe database cs_sys_ha on the primary. Note that its name, "cs_sys_ha", must match the probe_db_name setting in clup.conf (see the clup.conf description later):

CREATE DATABASE cs_sys_ha;

Then create the probe table in the cs_sys_ha database:

CREATE TABLE cs_sys_heartbeat(
  hb_time TIMESTAMP
);
insert into cs_sys_heartbeat values(now());
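Conceptually, the probe refreshes cs_sys_heartbeat.hb_time on each cycle and treats a node whose heartbeat cannot be refreshed as failed. The staleness check below is purely illustrative, not Clup's actual algorithm; the missed-probe threshold is an assumption:

```python
# Illustrative only: decide whether a heartbeat timestamp is stale relative
# to probe_interval. This is NOT Clup's real algorithm, just the idea.
from datetime import datetime, timedelta

PROBE_INTERVAL = 10  # seconds, matching probe_interval in clup.conf

def heartbeat_stale(hb_time: datetime, now: datetime,
                    missed_probes: int = 3) -> bool:
    """Treat the node as failed after several missed probe cycles (assumed)."""
    return now - hb_time > timedelta(seconds=PROBE_INTERVAL * missed_probes)

now = datetime(2018, 7, 25, 18, 0, 0)
print(heartbeat_stale(now - timedelta(seconds=5), now))    # fresh heartbeat
print(heartbeat_stale(now - timedelta(seconds=120), now))  # stale heartbeat
```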

2.3.3 Installing the Clup Package

Extract the packages clupX.Y.Z.tar.xz and python3.6.tar.xz into /opt on the clup machine; this completes the installation.

Check that python3.6 works. If it reports an error like the following:

[root@clup opt]# ./python3.6/bin/python3.6
./python3.6/bin/python3.6: /lib64/libcrypto.so.10: version `OPENSSL_1.0.2' not found (required by ./python3.6/bin/python3.6)

the installed OpenSSL is too old. Check its version:

[root@clup opt]# rpm -qa |grep ssl
openssl-1.0.2k-8.el7.x86_64
openssl-libs-1.0.2k-8.el7.x86_64
openssl-devel-1.0.2k-8.el7.x86_64

The OpenSSL version should be 1.0.2.

If it is not, run:

yum update
yum install openssl-devel

2.3.4 Installing cstlb

cstlb must be installed on all the database hosts. Extract the cstlbX.Y.Z.tar.xz archive into the /opt directory.

Extraction creates the directory /opt/cstlb, which contains the single executable cstlb.

2.3.5 Configuring Clup

First edit the configuration file /opt/clup/conf/clup.conf on the clup machine.

An example of this configuration file:

# format: key = value

# Network address; the cluster software's internal communication runs on this network
network=192.168.56.0

# Password for communication between the admin tools and the server
ha_rpc_pass = csha_cluster_pass

probe_db_name = cs_sys_ha
probe_user = postgres
probe_password = postgres

ha_db_user = postgres
ha_db_pass = postgres

db_repl_user = postgres
db_repl_pass = postgres


# Interval, in seconds, between database health checks
probe_interval = 10

# Lock TTL, in seconds
lock_ttl = 120

# Settings for the web management interface
http_auth = 1
http_user = admin
http_pass = cstech

# Token for the cstlb load balancer
cstlb_token = 1e82ff78-d73f-11e7-8a50-60f81dd129c2

Explanation of the settings:

  • network=192.168.56.0: the network the hosts are on.
  • ha_rpc_pass = csha_cluster_pass: password for communication between the admin tools and the server.
  • probe_db_name = cs_sys_ha: the database whose heartbeat table the probe updates to decide whether a node is working.
  • probe_user = postgres: database user the probe uses to update the table in cs_sys_ha.
  • probe_password = postgres: password of the probe database user.
  • ha_db_user = postgres: database user the program uses to gather information.
  • ha_db_pass = postgres: password of that database user.
  • db_repl_user = postgres: database user used for streaming replication between primary and standbys.
  • db_repl_pass = postgres: password of the replication user.
  • probe_interval = 10: interval, in seconds, between database health probes.
  • lock_ttl: maximum time the system holds the lock it takes during special operations such as failover.
  • http_auth: should be set to 1 on production systems; if set to 0, logging in to the web interface requires no password.
  • cstlb_token: the cstlb load balancer token. It must match the token cstlb was started with, so that the management program can use it to operate cstlb, e.g. to add or remove backend servers.

Configure the /opt/clup/conf/clup.conf file on the clup machine as described above.
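clup.conf uses a plain "key = value" format with "#" comments; reading such a file can be sketched as follows (illustrative, not Clup's actual loader):

```python
# Parse a clup.conf-style "key = value" file, skipping blanks and # comments.
def parse_conf(text: str) -> dict:
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

sample = """
# format: key = value
network=192.168.56.0
probe_db_name = cs_sys_ha
probe_interval = 10
"""
conf = parse_conf(sample)
print(conf["probe_db_name"])   # cs_sys_ha
```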

2.3.6 Starting Clup

Clup is installed on a separate machine, 192.168.56.69 in this example. Start it on that machine:

/opt/clup/bin/clupserver start

clupserver is the service that probes for failures; when it detects one, it carries out the high-availability switchover.

If the machine running Clup itself fails, simply redeploy Clup on another machine.

Then run "/opt/clup/bin/clupserver status" to check that Clup started successfully.

2.3.7 Starting the Load Balancer cstlb

Run the following on pg02 and pg03:

cd /opt/cstlb
export CSTLB_TOKEN=1e82ff78-d73f-11e7-8a50-60f81dd129c2
nohup ./cstlb >cstlb.log 2>&1 &

Note that the token in CSTLB_TOKEN=XXXX above must match the cstlb_token setting in clup.conf.

Started as above, cstlb's default service port is 5435: connections to the read VIP on port 5435 are load-balanced across all the standbys. cstlb's default admin port is 8082.
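The two entry points can be summarized as connection strings: writes go to the write VIP on the database port, while balanced reads go to the read VIP on cstlb's port 5435 (addresses from this manual's example; the dsn helper is illustrative):

```python
# Connection strings for the two access paths in this manual's example setup.
WRITE_VIP, READ_VIP = "192.168.56.68", "192.168.56.67"
DB_PORT, CSTLB_PORT = 5432, 5435

def dsn(host, port, dbname="postgres", user="postgres"):
    # libpq-style keyword/value connection string
    return "host=%s port=%d dbname=%s user=%s" % (host, port, dbname, user)

print(dsn(WRITE_VIP, DB_PORT))    # read/write: straight to the primary
print(dsn(READ_VIP, CSTLB_PORT))  # read-only: balanced across the standbys
```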

cstlb's usage help can be displayed by running cstlb --help:

[root@pg02 cstlb]# ./cstlb --help
Usage of ./cstlb:
  -backend value
      List of backend servers, need at least one
  -debug
      Enable debug mode
  -port int
      The port to listen for connections (default 5435)
  -url string
      Get load balance backends from url
  -webport int
      The port to listen for admin web (default 8082)

3. Usage

3.1 Management Interface

3.1.1 Using the Management Interface

After Clup starts successfully, log in to the web interface to initialize the cluster (i.e. add the cluster and specify the database information). The web interface is described below. Note that the web service binds to port 8080 by default; make sure port 8080 is not occupied by another application, or problems will occur.

Once Clup is deployed and started, open "http://192.168.xxx.xxx:8080" in a browser (substituting the address of the Clup machine) to reach the login page of the web management platform. The default username is admin and the password is cstech.
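Whether port 8080 is already occupied can be checked with a small script like this before starting Clup (a generic sketch, not part of Clup):

```python
# Check whether a TCP port is already in use on this host.
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 when something is listening and accepts
        return s.connect_ex((host, port)) == 0

if port_in_use(8080):
    print("port 8080 is occupied; free it before starting Clup")
else:
    print("port 8080 is free")
```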

3.1.2 Cluster Management Page

This page provides functions to add, freeze, unfreeze, view details of, and delete clusters.

Click "Add cluster" above to define a cluster. A newly added cluster is frozen by default, and the server-side checks do not yet verify whether its configured information is correct, so make sure the configuration is accurate after adding a cluster. In the database IP list field, supply the IP addresses of the primary and standby databases separated by ","; the first address is treated as the primary by default.

To freeze or unfreeze, first select the cluster to operate on, then click Freeze or Unfreeze. The server-side checks temporarily skip a frozen cluster, and resume checking it once it is unfrozen.

Choosing Details in the action menu shows the cluster's information, including the read/write VIPs, the load balancer list, the primary and standby databases, replication delay, and LSN details. The cluster information submenu supports modifying the load balancer list, but doing so on a running cluster is not recommended: the change only updates the configuration, and Clup must be restarted for it to take effect.

The primary/standby databases submenu shows database information and also provides functions to add a database, modify its information, remove, repair, and switch the primary. The Repair function only appears when a standby is down; fix the standby manually, then click Repair to add it back to the cluster. The Remove function only allows removing a standby when there are at least one primary and three standbys; in other words, Clup requires at least one primary and two standbys.

When adding a database, only the database's IP address, port, user name, and data directory need to be provided.

To switch the primary, select a healthy standby and click the "promote to primary" button; after confirmation, a popup shows the progress of the background task in real time.

Also take care not to remove a standby during peak business hours.

The delay/LSN submenu shows the current replication delay, timeline, and related information.

3.1.3 HA Log Page

This page displays log entries for cluster and database high-availability operations. Logs can be searched by various criteria.

Click "details" to see the operations a task performed in the background at the time.

3.1.4 License Key Page

A note on license keys: the open-source edition of Clup supports one cluster; a license key is required to manage multiple clusters. If you need one, please contact Hangzhou CSTech Co., Ltd.

3.2 Basic Command Usage

3.2.1 Cluster Service Commands

/opt/clup/bin/clupserver is the main program of CSTech Cluster for PostgreSQL.

Startup:

For a normal start, simply run "/opt/clup/bin/clupserver start".

To run in debug mode, the following options can be added:

  • -f: run in the foreground instead of as a daemon
  • -l debug: print debug-level log messages

Use /opt/clup/bin/clupserver status to check whether the cluster software is running:

(pyenv) [root@clup lib]# clupserver status
2017-12-08 08:58:39,817 INFO Clup v2.0.0  Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
2017-12-08 08:58:39,817 INFO Start loading configuration ...
2017-12-08 08:58:39,818 INFO Complete configuration loading.
Program(332) is running.

Run /opt/clup/bin/clupserver stop to stop the cluster software:

(pyenv) [root@clup lib]# clupserver stop
2017-12-08 08:59:11,450 INFO Clup v2.0.0  Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
2017-12-08 08:59:11,450 INFO Start loading configuration ...
2017-12-08 08:59:11,450 INFO Complete configuration loading.
Wait 20 second for program stopped...
Wait 20 second for program stopped...
Wait 20 second for program stopped...
Program stopped.

3.2.2 Using the clupadm Program

clupadm is the main administration tool. Its functions are listed below:

Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
usage: clup_adm.py <command> [options]
    command can be one of the following:
      show            : list all database.
      get_last_lsn    : get all database lsn.
      repl_delay      : show replication delay.
      log             : display log.
      froze           : froze the HA.
      repair          : repair fault node.
      unfroze         : unfroze the HA.
      switch          : switch primary database to another node
      change_meta     : change cluster meta data, it is dangerous, be careful!!!
      get_meta        : get cluster meta info.
      show_task       : show task information.
      task_log        : show task log.
  • show: display the nodes in the cluster.
  • get_last_lsn: display each node's latest LSN.
  • repl_delay: display the WAL replication delay of each standby.
  • log: display the local clupserver log.
  • froze: "freeze" the cluster; in this state no checks are performed, as if the cluster software were stopped.
  • unfroze: unfreeze the cluster.
  • switch: switch the primary database to another machine.
  • repair: repair a node that has gone offline.
  • get_meta: display the cluster's configuration metadata; for special situations only, rarely used.
  • change_meta: modify the cluster's metadata; for special situations only, rarely used.
  • show_task: long-running operations such as repair and switch run as background tasks, which keep running even if clupadm exits abnormally; this command lists those background tasks.
  • task_log: show the logs of long-running background tasks.

clupadm commands only work when clupserver is already running.

If clupserver has not been started, an error like the following appears:

[root@clup ~]# clupadm show
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
Can not connect clupserver: [Errno 111] Connection refused

Some example command runs are shown below.

clupadm get_last_lsn:

[root@clup ~]# clupadm get_last_lsn -c 2
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
                   db list
--------------------------------------------------
id        host       primary timeline       lsn
-- ----------------- ------- -------- ----------------
 1 192.168.56.61           1       14        0/18AC490
 2 192.168.56.62           0       14        0/18AC490
 3 192.168.56.63           0       14        0/18AC490
 4 192.168.56.64           0       14        0/18AC490

Note that the "2" in "-c 2" above specifies the cluster ID; cluster IDs appear in the output of "clupadm show".

The output above shows each database's timeline.
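An LSN such as "0/18AC490" consists of two hexadecimal halves; converting it to a byte offset makes gaps between nodes easy to compute. A sketch of the standard arithmetic (not a Clup API):

```python
# Convert a PostgreSQL LSN "X/Y" into an absolute byte position,
# and compute the byte gap between two nodes' LSNs.
def lsn_to_int(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_delay_bytes(primary_lsn: str, standby_lsn: str) -> int:
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)

print(hex(lsn_to_int("0/18AC490")))               # 0x18ac490
print(lsn_delay_bytes("0/18AC490", "0/18AC000"))  # 1168 bytes behind
```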

clupadm repl_delay:

[root@clup ~]# clupadm repl_delay -c 2
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
       host       primary   state   current_lsn  sent_delay  write_delay  flush_delay  replay_delay   is_sync     repl_state
----------------- ------- --------- ----------- ------------ ------------ ------------ ------------ ------------ ------------
192.168.56.61           1    Normal         N/A          N/A          N/A          N/A          N/A          N/A          N/A
192.168.56.62           0    Normal   0/18AC9E0            0            0            0            0        async    streaming
192.168.56.63           0    Normal   0/18AC9E0            0            0            0            0        async    streaming
192.168.56.64           0    Normal   0/18AC9E0            0            0            0            0        async    streaming

In the output above, the primary's row shows "N/A" in every column. The is_sync column indicates whether a standby replicates synchronously or asynchronously.

Pay attention to the "repl_state" column: anything other than "streaming" usually means replication is not flowing normally. For example, when a standby falls so far behind the primary that the WAL files it needs have already been removed on the primary, the state will not show "streaming".
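When parsing repl_delay-style output, standbys that are not "streaming" can be flagged automatically; an illustrative sketch over rows shaped like the table above:

```python
# Flag standbys that are not in the normal "streaming" state.
def unhealthy_standbys(rows):
    """rows: (host, is_primary, repl_state) tuples, as in repl_delay output."""
    return [host for host, is_primary, repl_state in rows
            if not is_primary and repl_state != "streaming"]

rows = [
    ("192.168.56.61", True,  "N/A"),       # primary: ignored
    ("192.168.56.62", False, "streaming"),
    ("192.168.56.63", False, "startup"),   # e.g. WAL already removed on primary
]
print(unhealthy_standbys(rows))  # ['192.168.56.63']
```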

The clupadm log command displays the log on the clupserver machine; internally it uses "tail -f", so press "Ctrl+C" to exit:

[root@clup clup]# clupadm log
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
tail -f /opt/clup/logs/clupserver.log
2018-07-25 18:06:02,314 INFO Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
2018-07-25 18:06:02,314 INFO Start loading configuration ...
2018-07-25 18:06:02,315 INFO Complete configuration loading.
2018-07-25 18:06:02,567 INFO ========== 乘数Cluster for PostgreSQL starting ==========
2018-07-25 18:06:02,571 INFO Start ha checking thread...
2018-07-25 18:06:02,577 INFO Start health checking thread...
2018-07-25 18:06:02,578 INFO Complete the startup of health checking thread.
2018-07-25 18:06:02,579 INFO Starting web server...
2018-07-25 18:06:02,608 INFO Web Server listen at port 8080...

3.3 Simulating Failures and Recovery

3.3.1 Database Down, Host Still Up

The cluster software detects that the database has failed, but since the host is still running it simply restarts the database automatically; this kind of failure needs no manual intervention.

Stop one of the databases (e.g. database 4):

[postgres@pg04 ~]$ pg_ctl stop
waiting for server to shut down.... done
server stopped

The log shows a message like "Host(192.168.56.64) is ok, only database(192.168.56.64:5432) failed, restart database ….":

[root@pg01 ~]# clupadm log
2018-07-25 18:07:35,922 INFO Cluster(2): Find database(192.168.56.64:5432) failed, begin failover ...
2018-07-25 18:07:36,081 INFO Cluster(2): Host(192.168.56.64) is ok, only database(192.168.56.64:5432) failed, restart database ....
2018-07-25 18:07:38,258 INFO Cluster(2): successful start up database(192.168.56.64:5432)

Looking at the database again, we find it has been restarted automatically:

[postgres@pg04 ~]$ ps -ef|grep postgres
root       816   241  0 18:07 pts/4    00:00:00 su - postgres
postgres   817   816  0 18:07 pts/4    00:00:00 -bash
postgres   868     1  0 18:07 ?        00:00:00 /usr/pgsql-10/bin/postgres -D /home/postgres/pgdata
postgres   869   868  0 18:07 ?        00:00:00 postgres: logger process
postgres   870   868  0 18:07 ?        00:00:00 postgres: startup process   recovering 0000000E0000000000000001
postgres   871   868  0 18:07 ?        00:00:00 postgres: checkpointer process
postgres   872   868  0 18:07 ?        00:00:00 postgres: writer process
postgres   873   868  0 18:07 ?        00:00:00 postgres: stats collector process
postgres   874   868  0 18:07 ?        00:00:00 postgres: wal receiver process   streaming 0/18B5738
postgres   899   817  0 18:08 pts/4    00:00:00 ps -ef
postgres   900   817  0 18:08 pts/4    00:00:00 grep --color=auto postgres

3.3.2 Standby Host Down

The cluster software detects that the standby database has failed and that its host is down as well. It automatically removes the database from the load balancer and marks the node as faulty.

We simulate this failure by taking a standby host down.

Before doing so, look at the load balancer's configuration:

[root@clup clup]# curl http://192.168.56.62:8082/backend/list?token=1e82ff78-d73f-11e7-8a50-60f81dd129c2
{"192.168.56.62:5432":{"State":0,"NextAddress":"192.168.56.64:5432","PreAddress":"192.168.56.63:5432"},"192.168.56.63:5432":{"State":0,"NextAddress":"192.168.56.62:5432","PreAddress":"192.168.56.64:5432"},"192.168.56.64:5432":{"State":0,"NextAddress":"192.168.56.63:5432","PreAddress":"192.168.56.62:5432"}}

As shown above, every standby's IP address is present. After the machine pg04 is powered off below, "192.168.56.64" will be removed from the backend list.
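The backend list returned by cstlb is JSON, so extracting the set of active backends is straightforward; a sketch parsing the reply shown above:

```python
# Parse cstlb's /backend/list reply and list the active backend addresses.
import json

reply = ('{"192.168.56.62:5432":{"State":0,"NextAddress":"192.168.56.64:5432",'
         '"PreAddress":"192.168.56.63:5432"},'
         '"192.168.56.63:5432":{"State":0,"NextAddress":"192.168.56.62:5432",'
         '"PreAddress":"192.168.56.64:5432"},'
         '"192.168.56.64:5432":{"State":0,"NextAddress":"192.168.56.63:5432",'
         '"PreAddress":"192.168.56.62:5432"}}')

backends = sorted(json.loads(reply))
print(backends)
# ['192.168.56.62:5432', '192.168.56.63:5432', '192.168.56.64:5432']
```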

Power off the pg04 host:

[root@pg04 ~]# poweroff
Connection to 192.168.56.64 closed by remote host.
Connection to 192.168.56.64 closed.

The log then shows the following:

2018-07-25 18:11:26,034 INFO Cluster(2): Find database(192.168.56.64:5432) failed, begin failover ...
2018-07-25 18:11:31,078 INFO Cluster(2): Host(192.168.56.64) is not ok, failover database(192.168.56.64:5432)...
2018-07-25 18:11:31,089 INFO Failover database(192.168.56.64:5432): read vip(192.168.56.67) switch to host(192.168.56.62).
2018-07-25 18:11:31,098 INFO Failover database(192.168.56.64:5432): begin remove bad host(192.168.56.64:5432) from cstlb ...
2018-07-25 18:11:31,113 INFO Failover database(192.168.56.64:5432): remove bad host(192.168.56.64:5432) from cstlb finished.
2018-07-25 18:11:31,124 INFO Failover database(192.168.56.64:5432): save node state to meta server...
2018-07-25 18:11:31,135 INFO Failover database(192.168.56.64:5432): save node state to meta server completed.
2018-07-25 18:11:31,141 INFO Failover database(192.168.56.64:5432): all commpleted.

The clupadm show command also shows that this node has become "Fault":

[root@clup clup]# clupadm show
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
                                                            cluster list
=====================================================================================================================================
cluster_id cluster_name   state       write_vip          read_vip       read_vip_host                  cstlb_list
---------- ------------ --------- ----------------- ----------------- ----------------- ----------------------------------------
         2 cluster01       Normal 192.168.56.68     192.168.56.67     192.168.56.62     ['192.168.56.62:8082', '192.168.56.63:8082']

                                                             db list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cluster_id db_id   state    primary        host        port  os_user               pgdata
---------- ----- ---------- ------- ----------------- ----- ---------- ------------------------------
         2     1 Normal           1 192.168.56.61      5432 postgres   /home/postgres/pgdata
         2     2 Normal           0 192.168.56.62      5432 postgres   /home/postgres/pgdata
         2     3 Normal           0 192.168.56.63      5432 postgres   /home/postgres/pgdata
         2     4 Fault            0 192.168.56.64      5432 postgres   /home/postgres/pgdata       

Note that in the last row above, "state" has changed from "Normal" to "Fault".

Looking at the cstlb configuration again, the entry for "192.168.56.64" is gone:

[root@clup clup]# curl http://192.168.56.62:8082/backend/list?token=1e82ff78-d73f-11e7-8a50-60f81dd129c2
{"192.168.56.62:5432":{"State":0,"NextAddress":"192.168.56.63:5432","PreAddress":"192.168.56.63:5432"},"192.168.56.63:5432":{"State":0,"NextAddress":"192.168.56.62:5432","PreAddress":"192.168.56.62:5432"}}

When the machine comes back, Clup does not add the database back automatically; we must start the standby manually:

[postgres@pg04 ~]$ pg_ctl start
waiting for server to start....2018-07-25 10:13:17.999 UTC [274] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2018-07-25 10:13:17.999 UTC [274] LOG:  listening on IPv6 address "::", port 5432
2018-07-25 10:13:18.002 UTC [274] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-07-25 10:13:18.010 UTC [274] LOG:  redirecting log output to logging collector process
2018-07-25 10:13:18.010 UTC [274] HINT:  Future log output will appear in directory "log".
 done
server started

After that, the clupadm repair command re-adds the node to the cluster:

[root@clup clup]# clupadm repair -c 2 -i 4
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
taskid=12, task_name=repair 2(dbid=4)
2018-07-25 18:14:04:INFO: Repair(cluster=2, dbid=4): begin get cluster lock...
2018-07-25 18:14:04:INFO: Repair(cluster=2, dbid=4): begin...
2018-07-25 18:14:04:INFO: Repair(cluster=2, dbid=4): find primary database ...
2018-07-25 18:14:04:INFO: Repair(cluster=2, dbid=4): primary database: {'id': 1, 'state': 1, 'os_user': 'postgres', 'pgdata': '/home/postgres/pgdata', 'is_primary': 1, 'repl_app_name': 'stb61', 'host': '192.168.56.61', 'port': 5432}
2018-07-25 18:14:04:INFO: Repair(cluster=2, dbid=4): find fault database ...
2018-07-25 18:14:04:INFO: Repair: fault database: {'id': 4, 'state': 2, 'os_user': 'postgres', 'pgdata': '/home/postgres/pgdata', 'is_primary': 0, 'repl_app_name': 'stb64', 'host': '192.168.56.64', 'port': 5432}
2018-07-25 18:14:04:INFO: Repair(cluster=2, host=192.168.56.64): checking repair database replication steaming is ok...
2018-07-25 18:14:04:INFO: Repair(cluster=2, host=192.168.56.64): create recovery.conf for this database...
2018-07-25 18:14:04:INFO: Repair(cluster=2, host=192.168.56.64): create recovery.conf for this database completed
2018-07-25 18:14:04:INFO: Repair(cluster=2, host=192.168.56.64): change fault database node to normal ...
2018-07-25 18:14:04:INFO: Repair(cluster=2, host=192.168.56.64): all completed.

Running clupadm show again now shows the node's state changed back from "Fault" to "Normal".

The load balancer shows the node added back:

[root@clup clup]# curl http://192.168.56.62:8082/backend/list?token=1e82ff78-d73f-11e7-8a50-60f81dd129c2
{"192.168.56.62:5432":{"State":0,"NextAddress":"192.168.56.64:5432","PreAddress":"192.168.56.63:5432"},"192.168.56.63:5432":{"State":0,"NextAddress":"192.168.56.62:5432","PreAddress":"192.168.56.64:5432"},"192.168.56.64:5432":{"State":0,"NextAddress":"192.168.56.63:5432","PreAddress":"192.168.56.62:5432"}}

3.3.3 Primary Host Down

The cluster software detects that the primary database has failed and that its host is down as well. It picks one of the standbys as the new primary and marks the old primary as faulty.

192.168.56.61 is currently the primary; power this machine off:

[root@pg01 ~]# poweroff
Connection to 192.168.56.61 closed by remote host.
Connection to 192.168.56.61 closed.

Then check the state with clupadm show:

[root@clup clup]# clupadm show
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
                                                            cluster list
=====================================================================================================================================
cluster_id cluster_name   state       write_vip          read_vip       read_vip_host                  cstlb_list
---------- ------------ --------- ----------------- ----------------- ----------------- ----------------------------------------
         2 cluster01       Normal 192.168.56.68     192.168.56.67     192.168.56.62     ['192.168.56.62:8082', '192.168.56.63:8082']

                                                             db list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cluster_id db_id   state    primary        host        port  os_user               pgdata
---------- ----- ---------- ------- ----------------- ----- ---------- ------------------------------
         2     1 Normal           1 192.168.56.61      5432 postgres   /home/postgres/pgdata
         2     2 Normal           0 192.168.56.62      5432 postgres   /home/postgres/pgdata
         2     3 Normal           0 192.168.56.63      5432 postgres   /home/postgres/pgdata
         2     4 Normal           0 192.168.56.64      5432 postgres   /home/postgres/pgdata
         
[root@clup clup]# clupadm show
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
                                                            cluster list
=====================================================================================================================================
cluster_id cluster_name   state       write_vip          read_vip       read_vip_host                  cstlb_list
---------- ------------ --------- ----------------- ----------------- ----------------- ----------------------------------------
         2 cluster01     Failover 192.168.56.68     192.168.56.67     192.168.56.62     ['192.168.56.62:8082', '192.168.56.63:8082']

                                                             db list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cluster_id db_id   state    primary        host        port  os_user               pgdata
---------- ----- ---------- ------- ----------------- ----- ---------- ------------------------------
         2     1 Fault            0 192.168.56.61      5432 postgres   /home/postgres/pgdata
         2     2 Normal           1 192.168.56.62      5432 postgres   /home/postgres/pgdata
         2     3 Normal           0 192.168.56.63      5432 postgres   /home/postgres/pgdata
         2     4 Normal           0 192.168.56.64      5432 postgres   /home/postgres/pgdata

The log also shows the following:

2018-07-25 18:17:16,315 INFO Cluster(2): Find database(192.168.56.61:5432) failed, begin failover ...
2018-07-25 18:17:19,480 INFO Cluster(2): Host(192.168.56.61) is not ok, failover database(192.168.56.61:5432)...
2018-07-25 18:17:19,510 INFO Switch primary database(192.168.56.61:5432): begin delete write vip(192.168.56.68) ...
2018-07-25 18:17:22,494 INFO Switch primary database(192.168.56.61:5432): delete write vip(192.168.56.68) completed.
2018-07-25 18:17:23,531 INFO Failover primary database(192.168.56.61:5432): switch to new host(192.168.56.62)...
2018-07-25 18:17:23,552 INFO Failover primary database(192.168.56.61:5432): save node state to meta server completed.
2018-07-25 18:17:23,561 INFO Failover primary database(192.168.56.61:5432): begin remove new primary database(192.168.56.62) from cstlb ...
2018-07-25 18:17:23,571 INFO Failover primary database(192.168.56.61:5432): remove new primary database(192.168.56.61) from cstlb finished.
2018-07-25 18:17:23,577 INFO Failover primary database(192.168.56.61:5432): change all standby database upper level primary database to host(192.168.56.62)...
2018-07-25 18:17:32,163 INFO Failover primary database(192.168.56.61:5432): change all standby database upper level primary database to host(192.168.56.62) completed.
2018-07-25 18:17:32,178 INFO Failover primary database(192.168.56.61:5432): Promote standby database(192.168.56.62) to primary...
2018-07-25 18:17:33,697 INFO Failover primary database(192.168.56.61:5432): Promote standby database(192.168.56.62) to primary completed.
2018-07-25 18:17:33,714 INFO Failover primary database(192.168.56.61:5432): switch to new host(192.168.56.62) completed.

On 192.168.56.62 we can see that the write VIP (192.168.56.68) has also moved to this machine:

[root@pg02 cstlb]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 00:16:3e:a7:13:6c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.56.62/24 brd 192.168.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.56.67/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.56.68/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fea7:136c/64 scope link
       valid_lft forever preferred_lft forever

After 192.168.56.61 comes back, it can only rejoin the cluster as a standby, so a standby must be rebuilt on this machine. If the standby has not been set up yet, running the repair command produces the following output:

[root@clup logs]# clupadm repair -c 2 -i 1
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
taskid=14, task_name=repair 2(dbid=1)
2018-07-25 18:20:22:INFO: Repair(cluster=2, dbid=1): begin get cluster lock...
2018-07-25 18:20:22:INFO: Repair(cluster=2, dbid=1): begin...
2018-07-25 18:20:22:INFO: Repair(cluster=2, dbid=1): find primary database ...
2018-07-25 18:20:22:INFO: Repair(cluster=2, dbid=1): primary database: {'id': 2, 'state': 1, 'os_user': 'postgres', 'pgdata': '/home/postgres/pgdata', 'is_primary': 1, 'repl_app_name': 'stb62', 'host': '192.168.56.62', 'port': 5432}
2018-07-25 18:20:22:INFO: Repair(cluster=2, dbid=1): find fault database ...
2018-07-25 18:20:22:INFO: Repair: fault database: {'id': 1, 'state': 2, 'os_user': 'postgres', 'pgdata': '/home/postgres/pgdata', 'is_primary': 0, 'repl_app_name': 'stb61', 'host': '192.168.56.61', 'port': 5432}
2018-07-25 18:20:22:INFO: Repair(cluster=2, host=192.168.56.61): checking repair database replication steaming is ok...
2018-07-25 18:20:22:INFO: Repair(cluster=2, host=192.168.56.61): repair database replication steaming not ok, please retry after configuration replication streaming.
2018-07-25 18:20:22:INFO: Repair(cluster=2, host=192.168.56.61):  Please set up a standby on the failed node first, then retry:

The command to build the standby:
pg_basebackup -D /home/postgres/pgdata -Upostgres -h 192.168.56.62 -p 5432 -x

The contents of recovery.conf:
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'application_name=stb61 user=postgres host=192.168.56.62 port=5432 password=****** sslmode=disable sslcompression=1'

Normally we cannot simply turn this database back into a standby: the old primary crashed abnormally and may be ahead of the former standbys. If we just add recovery.conf and start the old primary, it fails to establish streaming replication with the new primary, and the database log contains entries like these:

2018-07-25 10:23:09.298 UTC [297] DETAIL:  End of WAL reached on timeline 14 at 0/18BAA30.
2018-07-25 10:23:09.298 UTC [293] LOG:  new timeline 15 forked off current database system timeline 14 before current recovery point 0/18BAAA0
2018-07-25 10:23:14.285 UTC [297] LOG:  restarted WAL streaming at 0/1000000 on timeline 14
2018-07-25 10:23:14.300 UTC [297] LOG:  replication terminated by primary server
2018-07-25 10:23:14.300 UTC [297] DETAIL:  End of WAL reached on timeline 14 at 0/18BAA30.
2018-07-25 10:23:14.300 UTC [293] LOG:  new timeline 15 forked off current database system timeline 14 before current recovery point 0/18BAAA0
2018-07-25 10:23:19.291 UTC [297] LOG:  restarted WAL streaming at 0/1000000 on timeline 14
2018-07-25 10:23:19.310 UTC [297] LOG:  replication terminated by primary server
2018-07-25 10:23:19.310 UTC [297] DETAIL:  End of WAL reached on timeline 14 at 0/18BAA30.
2018-07-25 10:23:19.311 UTC [293] LOG:  new timeline 15 forked off current database system timeline 14 before current recovery point 0/18BAAA0

The message "new timeline X forked off current database system timeline Y before current recovery point 0/XXXXXXX" above indicates that the new primary and the old primary have diverged onto different timelines; the old primary can no longer be used as a standby of the new primary as-is.

clupadm repl_delay also shows this node's replication state stuck at "startup" instead of the normal "streaming":

[root@clup logs]# clupadm repl_delay -c 2
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
       host       primary   state   current_lsn  sent_delay  write_delay  flush_delay  replay_delay   is_sync     repl_state
----------------- ------- --------- ----------- ------------ ------------ ------------ ------------ ------------ ------------
192.168.56.61           0     Fault   0/18BD440        10768        10768        10656        10656        async      startup
192.168.56.62           1    Normal         N/A          N/A          N/A          N/A          N/A          N/A          N/A
192.168.56.63           0    Normal   0/18BD440            0            0            0            0        async    streaming
192.168.56.64           0    Normal   0/18BD440            0            0            0            0        async    streaming
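The delay columns above are byte counts derived from LSN positions. An LSN "X/Y" encodes a 64-bit position as X * 2^32 + Y, with both halves in hex, so a delay can be recomputed by hand; a small shell sketch (lsn_to_bytes is a hypothetical helper, not a clup command):

```shell
# Sketch: convert an LSN "X/Y" (hex halves) to a byte offset,
# then take the difference of two positions to get lag in bytes.
lsn_to_bytes() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( 16#$hi * 4294967296 + 16#$lo ))
}
# current_lsn of the cluster vs. where the faulted standby's WAL ended
a=$(lsn_to_bytes "0/18BD440")
b=$(lsn_to_bytes "0/18BAA30")
echo "lag bytes: $((a - b))"   # prints: lag bytes: 10768
```

The result, 10768 bytes, matches the sent_delay reported for 192.168.56.61 in the table above.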

At this point you can try pg_rewind to convert the old primary back into a standby; if pg_rewind cannot do it, the only remaining option is to rebuild the standby on this machine from scratch.

Before running pg_rewind, the database on the old primary machine must be shut down. The pg_rewind command is as follows:

[postgres@pg01 pg_log]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.56.62 port=5432 user=postgres dbname=template1 password=postgres' -P
connected to server
servers diverged at WAL position 0/B49CF78 on timeline 1
rewinding from last common checkpoint at 0/B49A850 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 151 MB (total source directory size is 187 MB)
155617/155617 kB (100%) copied
creating backup label and updating control file
syncing target data directory
initdb: could not stat file "/home/postgres/pgdata/postmaster.pid": No such file or directory
Done!

pg_rewind normally requires the parameter "wal_log_hints" to have been enabled on the target beforehand (see the PostgreSQL documentation for details). If it was not enabled, pg_rewind fails with an error like the following:

[postgres@pg01 ~]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.56.62 port=5432 user=postgres dbname=template1 password=postgres' -P
connected to server

target server needs to use either data checksums or "wal_log_hints = on"
Failure, exiting
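As the error says, pg_rewind can only rewind a cluster that was run with either data checksums or hint-bit WAL logging. As a precaution, the following postgresql.conf fragment can be set on all nodes before any failover ever happens (a sketch; wal_log_hints only takes effect after a database restart):

```
wal_log_hints = on    # write full-page hint-bit images to WAL; required by pg_rewind
```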

If the database is not very large, you can simply rebuild the standby with pg_basebackup:

[postgres@pg01 ~]$ pg_basebackup -D /home/postgres/pgdata -Upostgres -h 192.168.56.62 -p 5432 -X stream -P
Password:
32611/32611 kB (100%), 1/1 tablespace

Then configure the recovery.conf file with the following content:

standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'application_name=stb61 user=postgres host=192.168.56.62 port=5432 password=xxxxxx sslmode=disable sslcompression=1'

After the standby has been rebuilt on 192.168.56.61, go to the primary (192.168.56.62) and check whether streaming replication between this standby and the primary is working:

[postgres@pg02 pgdata]$ psql
psql (10.4)
Type "help" for help.

postgres=# select * from pg_stat_replication;
 pid  | usesysid | usename  | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_l
sn | flush_lsn | replay_lsn |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state
------+----------+----------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+--------
---+-----------+------------+-----------------+-----------------+-----------------+---------------+------------
 4732 |       10 | postgres | stb61            | 192.168.56.61 |                 |       57048 | 2018-07-25 10:33:17.407004+00 |         3962 | streaming | 0/4001160 | 0/40011
60 | 0/4001160 | 0/4001160  | 00:00:00.003356 | 00:00:00.004247 | 00:00:00.004248 |             0 | async
 3698 |       10 | postgres | stb64            | 192.168.56.64 |                 |       46080 | 2018-07-25 10:17:30.51685+00  |         3962 | streaming | 0/4001160 | 0/40011
60 | 0/4001160 | 0/4001160  | 00:00:00.004305 | 00:00:00.004305 | 00:00:00.004305 |             0 | async
 3697 |       10 | postgres | stb63            | 192.168.56.63 |                 |       50308 | 2018-07-25 10:17:26.065478+00 |         3963 | streaming | 0/4001160 | 0/40011
60 | 0/4001160 | 0/4001160  | 00:00:00.004358 | 00:00:00.004359 | 00:00:00.004359 |             0 | async
(3 rows)

You can also check with our "clupadm repl_delay" command:

[root@clup logs]# clupadm repl_delay -c 2
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
       host       primary   state   current_lsn  sent_delay  write_delay  flush_delay  replay_delay   is_sync     repl_state
----------------- ------- --------- ----------- ------------ ------------ ------------ ------------ ------------ ------------
192.168.56.61           0     Fault   0/40015F8            0            0            0            0        async    streaming
192.168.56.62           1    Normal         N/A          N/A          N/A          N/A          N/A          N/A          N/A
192.168.56.63           0    Normal   0/40015F8            0            0            0            0        async    streaming
192.168.56.64           0    Normal   0/40015F8            0            0            0            0        async    streaming

Once streaming replication is confirmed to be normal, use the repair command to rejoin the node to the cluster:

[root@clup ~]# clupadm repair -c 2 -i 1
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
taskid=15, task_name=repair 2(dbid=1)
2018-07-25 18:37:14:INFO: Repair(cluster=2, dbid=1): begin get cluster lock...
2018-07-25 18:37:14:INFO: Repair(cluster=2, dbid=1): begin...
2018-07-25 18:37:14:INFO: Repair(cluster=2, dbid=1): find primary database ...
2018-07-25 18:37:14:INFO: Repair(cluster=2, dbid=1): primary database: {'id': 2, 'state': 1, 'os_user': 'postgres', 'pgdata': '/home/postgres/pgdata', 'is_primary': 1, 'repl_app_name': 'stb62', 'host': '192.168.56.62', 'port': 5432}
2018-07-25 18:37:14:INFO: Repair(cluster=2, dbid=1): find fault database ...
2018-07-25 18:37:14:INFO: Repair: fault database: {'id': 1, 'state': 2, 'os_user': 'postgres', 'pgdata': '/home/postgres/pgdata', 'is_primary': 0, 'repl_app_name': 'stb61', 'host': '192.168.56.61', 'port': 5432}
2018-07-25 18:37:14:INFO: Repair(cluster=2, host=192.168.56.61): checking repair database replication steaming is ok...
2018-07-25 18:37:14:INFO: Repair(cluster=2, host=192.168.56.61): create recovery.conf for this database...
2018-07-25 18:37:15:INFO: Repair(cluster=2, host=192.168.56.61): create recovery.conf for this database completed
2018-07-25 18:37:15:INFO: Repair(cluster=2, host=192.168.56.61): change fault database node to normal ...
2018-07-25 18:37:15:INFO: Repair(cluster=2, host=192.168.56.61): all completed.

Then the clupadm show command shows that the node's state has changed to "Normal":

[root@clup ~]# clupadm show
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
                                                            cluster list
=====================================================================================================================================
cluster_id cluster_name   state       write_vip          read_vip       read_vip_host                  cstlb_list
---------- ------------ --------- ----------------- ----------------- ----------------- ----------------------------------------
         2 cluster01       Normal 192.168.56.68     192.168.56.67     192.168.56.62     ['192.168.56.62:8082', '192.168.56.63:8082']

                                                             db list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cluster_id db_id   state    primary        host        port  os_user               pgdata
---------- ----- ---------- ------- ----------------- ----- ---------- ------------------------------
         2     1 Normal           0 192.168.56.61      5432 postgres   /home/postgres/pgdata
         2     2 Normal           1 192.168.56.62      5432 postgres   /home/postgres/pgdata
         2     3 Normal           0 192.168.56.63      5432 postgres   /home/postgres/pgdata
         2     4 Normal           0 192.168.56.64      5432 postgres   /home/postgres/pgdata   

The cluster is now back to a normal state; note that the primary is now 192.168.56.62.

3.4.4 Manually switching the primary

Sometimes we need to switch the primary over to another machine. This is done with the switch command, as shown below:

clupadm switch -c 2 -i 1

The command above switches the primary to node 1. Running it returns the following:

[root@clup ~]# clupadm switch -c 2 -i 1
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
Task_id=16, task_info: switch_primary 2(dbid=1)
Please use "clupadm task_log -t 16" show switch detail log.

As prompted, run "clupadm task_log -t 16" to view the detailed log of the switchover:

[root@clup ~]# clupadm task_log -t 16
Clup v2.0.0 Copyright (c) 2018 HangZhou CSTech.Ltd. All rights reserved.
2018-07-25 18:38:29:INFO: Switch to primary(cluster=2, dbid=1): begin ...
2018-07-25 18:38:30:INFO: Switch to primary(cluster=2, dbid=1): begin get cluster lock...
2018-07-25 18:38:30:INFO: Switch to primary(cluster=2, dbid=1):  the cluster lock has been obtained.
2018-07-25 18:38:30:INFO: Switch to primary(cluster=2, host=192.168.56.61): find current primary database...
2018-07-25 18:38:30:INFO: Switch to primary(cluster=2, host=192.168.56.61): drop write vip from current primary database(192.168.56.62)
2018-07-25 18:38:30:INFO: Switch to primary(cluster=2, host=192.168.56.61): begin stopping current primary database(192.168.56.62)...
2018-07-25 18:38:31:INFO: Switch to primary(cluster=2, host=192.168.56.61): current primary database(192.168.56.62) stopped.
2018-07-25 18:38:31:INFO: Switch to primary(cluster=2, host=192.168.56.61): begin remove new primary database(192.168.56.61) from cstlb ...
2018-07-25 18:38:31:INFO: Switch to primary(cluster=2, host=192.168.56.61): remove new primary database(192.168.56.61) from cstlb finished.
2018-07-25 18:38:31:INFO: Switch to primary(cluster=2, host=192.168.56.61): stop new pirmary database then sync xlog from old primary ...
2018-07-25 18:38:33:INFO: Switch to primary(cluster=2, host=192.168.56.61): stop new pirmary database then sync xlog from old primary finished
2018-07-25 18:38:33:INFO: Switch to primary(cluster=2, host=192.168.56.61): restart new pirmary database and wait it is ready ...
2018-07-25 18:38:35:INFO: Switch to primary(cluster=2, host=192.168.56.61): new pirmary database started and it is ready.
2018-07-25 18:38:35:INFO: Switch to primary(cluster=2, host=192.168.56.61): begin add old primary database(192.168.56.62) to cstlb...
2018-07-25 18:38:35:INFO: Switch to primary(cluster=2, host=192.168.56.61): add old primary database(192.168.56.62) to cstlb finished.
2018-07-25 18:38:35:INFO: Switch to primary(cluster=2, host=192.168.56.61): promote new primary ...
2018-07-25 18:38:36:INFO: Switch to primary(cluster=2, host=192.168.56.61): promote new primary completed.
2018-07-25 18:38:36:INFO: Switch to primary(cluster=2, host=192.168.56.61): add primary vip(192.168.56.68) to new primary(192.168.56.61) ...
2018-07-25 18:38:39:INFO: Switch to primary(cluster=2, host=192.168.56.61): add primary vip(192.168.56.68) to new primary(192.168.56.61) completed.
2018-07-25 18:38:39:INFO: Switch to primary(cluster=2, host=192.168.56.61): change all standby database upper level primary database to new primary...
2018-07-25 18:38:49:INFO: Switch to primary(cluster=2, host=192.168.56.61): change all standby database upper level primary database to new primary completed.
2018-07-25 18:38:49:INFO: Switch to primary(cluster=2, host=192.168.56.61): success.

3.5 Notes

3.5.1 Notes on recovery.conf

When a switchover occurs, the old primary is converted into a standby, and a new recovery.conf file is generated automatically, overwriting the existing one. Therefore do not add custom settings to recovery.conf: they will be wiped out after every switchover.

3.5.2 Standby invalidation when it falls too far behind the primary in WAL

Normally, after the primary completes a checkpoint, the WAL segments from before that checkpoint are removed. If some of those segments have not yet been shipped to a standby, that standby becomes invalid, and it can no longer be promoted to primary when a failure occurs.

Workaround: set the following in the database configuration file postgresql.conf:

max_wal_size = 8GB
min_wal_size = 6GB

The point is to keep min_wal_size reasonably large, so that more WAL segments are recycled and kept on disk rather than removed at checkpoints, giving standbys more time to catch up.
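As a rough illustration (a sketch assuming the default 16MB WAL segment size), the min_wal_size setting above translates into a number of retained segments like this:

```shell
# Rough arithmetic only: with the default 16MB WAL segment size,
# min_wal_size = 6GB means roughly this many segments are recycled
# (kept on disk) rather than removed at a checkpoint.
seg_mb=16
min_wal_mb=$((6 * 1024))
echo "segments kept: $((min_wal_mb / seg_mb))"   # prints: segments kept: 384
```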