PostgreSQL 9.5: Quickly Recovering a Standby with pg_rewind

Posted by hersion on 2019-07-25 12:34

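Summary: The first pg_rewind run on the old primary fails with a self-explanatory error. Once wal_log_hints is enabled, a second run succeeds, the old primary starts in the standby role, and the test table is synchronized to it. Sections: environment preparation, primary/standby switchover, how pg_rewind works, references.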

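Anyone familiar with PG knows that a primary/standby switchover is not trivial and that the steps are strict: the primary must be shut down deliberately before the standby is promoted, otherwise bringing the old primary back up in the standby role is difficult. An earlier post, "PostgreSQL HOT-Standby primary/standby switchover", covered that procedure. PG 9.5 adds pg_rewind to the source tree: after a switchover, the old primary can be resynchronized with the new primary instead of being rebuilt as a standby from scratch. For large databases this saves a great deal of rebuild time.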

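pg_rewind copies data files and configuration files from the source cluster (the new primary) into the local target data directory. Because it does not need to read data blocks that have not changed, it is much faster than rebuilding the standby.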

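1. Environment Preparation

Streaming replication environment
192.168.2.37/1931 primary node (hostname db1)
192.168.2.38/1931 standby node (hostname db2)
Note: for setting up the streaming replication environment, see "PostgreSQL: Setting up streaming replication with pg_basebackup"; it is not covered here.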

--pg_rewind prerequisites
1 full_page_writes is set to on
2 wal_log_hints is set to on, or data checksums were enabled when the cluster was initialized
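
Both prerequisites can be checked before a switchover is attempted. The following is a minimal sketch, not part of the original procedure: rewind_prereqs_ok is a hypothetical helper that only evaluates three values passed to it, and the commands in the comments are one possible way to obtain those values.

```shell
# rewind_prereqs_ok FULL_PAGE_WRITES WAL_LOG_HINTS DATA_CHECKSUM_VERSION
# Hypothetical helper: returns 0 when the pg_rewind prerequisites listed
# above hold, 1 otherwise.
#
# One possible way to collect the arguments (assumes a running server and
# that psql/pg_controldata are on PATH):
#   fpw=$(psql -Atc "show full_page_writes")
#   hints=$(psql -Atc "show wal_log_hints")
#   cks=$(pg_controldata "$PGDATA" | sed -n 's/^Data page checksum version:[[:space:]]*//p')
#   rewind_prereqs_ok "$fpw" "$hints" "$cks" && echo "pg_rewind prerequisites met"
rewind_prereqs_ok() {
    fpw=$1
    hints=$2
    checksum_version=$3

    # Prerequisite 1: full_page_writes must be on.
    [ "$fpw" = "on" ] || return 1

    # Prerequisite 2: wal_log_hints = on, or data checksums enabled
    # (checksum version 1 in the control file).
    [ "$hints" = "on" ] || [ "$checksum_version" = "1" ]
}
```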

2. Primary/Standby Switchover

--Standby recovery.conf settings: run on db2

[pg95@db2 pg_root]$ grep ^[a-z] recovery.conf 
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.2.37 port=1931 user=repuser'           # e.g. 'host=localhost port=5432'

--Promote the standby: run on db2

[pg95@db2 pg_root]$ pg_ctl promote -D $PGDATA
server promoting

[pg95@db2 pg_root]$ pg_controldata | grep cluster
Database cluster state:               in production

--After the standby is promoted, create a test table and insert data

[pg95@db2 pg_root]$ psql
psql (9.5alpha1)
Type "help" for help.

postgres=# create table test_2(id int4);
CREATE TABLE
                   
postgres=# insert into test_2(id) select n from generate_series(1,10000) n;
INSERT 0 10000

--Stop the old primary: run on db1

[pg95@db1 ~]$ pg_controldata | grep cluster
Database cluster state:               in production

[pg95@db1 ~]$ pg_ctl stop -m fast -D $PGDATA
waiting for server to shut down....... done
server stopped

Note: after stopping the old primary, do not start it again in the standby role before running pg_rewind; otherwise pg_rewind fails with the error "target server must be shut down cleanly".
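
The note above can be turned into a pre-flight check. This is a minimal sketch under one assumption: that PostgreSQL 9.5's pg_rewind accepts a target only when pg_controldata reports the state "shut down" (a standby that was started and then stopped reports "shut down in recovery" instead). cluster_state and is_cleanly_shut_down are hypothetical helper names; both read pg_controldata output on stdin.

```shell
# Print the value of the "Database cluster state" line from
# pg_controldata output supplied on stdin.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Succeed only if the state is exactly "shut down", i.e. the server was
# stopped cleanly while running as a primary (assumed 9.5 requirement).
is_cleanly_shut_down() {
    [ "$(cluster_state)" = "shut down" ]
}

# Possible usage on db1, before pg_rewind:
#   pg_controldata "$PGDATA" | is_cleanly_shut_down || echo "do not run pg_rewind yet"
```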

--pg_rewind: run on db1

[pg95@db1 pg_root]$ pg_ctl stop -m fast -D $PGDATA
waiting for server to shut down......... done
server stopped

[pg95@db1 pg_root]$ pg_rewind --target-pgdata $PGDATA --source-server="host=192.168.2.38 port=1931 user=postgres dbname=postgres" -P 
connected to server
target server needs to use either data checksums or "wal_log_hints = on"

Note: pg_rewind fails with the error above; the message is self-explanatory.

--pg_rewind source code analysis

    /*
     * Target cluster need to use checksums or hint bit wal-logging, this to
     * prevent from data corruption that could occur because of hint bits.
     */
    if (ControlFile_target.data_checksum_version != PG_DATA_CHECKSUM_VERSION &&
        !ControlFile_target.wal_log_hints)
    {
        pg_fatal("target server needs to use either data checksums or \"wal_log_hints = on\"\n");
    }

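Note: the target cluster must either have been initialized with data checksums (at initdb time) or run with "wal_log_hints = on". Next, set wal_log_hints to on on both the primary and the standby, and restart the databases.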
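
One way to apply that setting is sketched below. set_conf_param is a hypothetical helper, not a PostgreSQL tool: it rewrites an existing (possibly commented-out) "name = value" line in a postgresql.conf-style file, or appends one if none exists. It assumes the parameter name contains no regex metacharacters and that a .bak copy of the file is acceptable.

```shell
# set_conf_param FILE NAME VALUE
# Hypothetical helper: set NAME = VALUE in a postgresql.conf-style file.
set_conf_param() {
    conf_file=$1
    name=$2
    value=$3

    if grep -Eq "^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*=" "$conf_file"; then
        # Replace every matching line, active or commented out; any
        # trailing comment on that line is dropped. A backup is kept
        # as FILE.bak.
        sed -i.bak -E "s|^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*=.*|${name} = ${value}|" "$conf_file"
    else
        # Parameter not present at all: append it.
        printf '%s = %s\n' "$name" "$value" >> "$conf_file"
    fi
}

# Possible usage on each node (wal_log_hints only takes effect after a restart):
#   set_conf_param "$PGDATA/postgresql.conf" wal_log_hints on
#   pg_ctl restart -m fast -D "$PGDATA"
```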

--Run pg_rewind again: run on db1

[pg95@db1 pg_root]$ pg_rewind --target-pgdata $PGDATA --source-server="host=192.168.2.38 port=1931 user=postgres dbname=postgres" -P
connected to server
The servers diverged at WAL position 0/1300CEB0 on timeline 5.
Rewinding from last common checkpoint at 0/1200008C on timeline 5
reading source file list
reading target file list
reading WAL in target
need to copy 59 MB (total source directory size is 76 MB)
61185/61185 kB (100%) copied
creating backup label and updating control file
Done!

Note: pg_rewind succeeded.

--Adjust recovery.conf: run on db1
[pg95@db1 ~]$ cd $PGDATA
[pg95@db1 pg_root]$ mv recovery.done recovery.conf

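Note: check whether primary_conninfo needs to be changed. It must now point at the new primary, db2 (192.168.2.38); because pg_rewind copies configuration files from the source, this file may still carry the old primary's address.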

[pg95@db1 pg_root]$ grep ^[a-z] recovery.conf 
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.2.38 port=1931 user=repuser'           # e.g. 'host=localhost port=5432'
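
Instead of editing the file by hand, the three settings shown above could be generated. This is a minimal sketch: write_recovery_conf is a hypothetical helper that prints a 9.5-style recovery.conf to stdout, using only the parameters that appear in this post.

```shell
# write_recovery_conf HOST PORT USER
# Hypothetical helper: print a minimal standby recovery.conf that follows
# the latest timeline and streams from the given primary.
write_recovery_conf() {
    host=$1
    port=$2
    user=$3

    printf "recovery_target_timeline = 'latest'\n"
    printf "standby_mode = on\n"
    printf "primary_conninfo = 'host=%s port=%s user=%s'\n" "$host" "$port" "$user"
}

# Possible usage on db1, pointing at the new primary db2:
#   write_recovery_conf 192.168.2.38 1931 repuser > "$PGDATA/recovery.conf"
```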

--Start the old primary: run on db1

[pg95@db1 pg_root]$ pg_ctl start -D $PGDATA
server starting

[pg95@db1 pg_root]$ pg_controldata | grep cluster
Database cluster state:               in archive recovery

--Verify the data: run on db1

[pg95@db1 pg_root]$ psql
psql (9.5alpha1)
Type "help" for help.

postgres=# select count(*) from test_2;
 count 
-------
 10000
(1 row)

Note: pg_rewind succeeded. The old primary is now running in the standby role, and the table test_2 has been synchronized to it.

3. How pg_rewind Works

The basic idea is to copy everything from the new cluster to the old cluster, except for the blocks that we know to be the same.

    1) Scan the WAL log of the old cluster, starting from the last checkpoint before the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that were touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off.

    2) Copy all those changed blocks from the new cluster to the old cluster.

    3) Copy all other files like clog, conf files etc. from the new cluster to old cluster. Everything except the relation files.

    4) Apply the WAL from the new cluster, starting from the checkpoint created at failover. (Strictly speaking, pg_rewind doesn't apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)
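
Step 1 boils down to building a de-duplicated list of touched blocks. The sketch below illustrates only that bookkeeping, on a made-up text format with one "relation_file block_number" pair per line; pg_rewind itself parses binary WAL records and does not use this format. blocks_to_copy and copy_size_bytes are hypothetical names, and 8192 is PostgreSQL's default block size.

```shell
# Read "relation_file block_number" pairs on stdin (one per WAL record
# that touched a block) and print each distinct block once, ordered by
# file and then by block number. This is the list that step 2 would copy.
blocks_to_copy() {
    sort -u -k1,1 -k2,2n
}

# Estimate how many bytes of relation data step 2 would copy:
# number of distinct blocks times the block size (default 8192).
copy_size_bytes() {
    block_size=${1:-8192}
    blocks_to_copy | awk -v bs="$block_size" 'END { print NR * bs }'
}
```

A block touched many times in the old cluster's WAL is still copied only once, which is why the amount copied depends on how many distinct blocks diverged rather than on the volume of WAL.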

4. References

PostgreSQL HOT-Standby primary/standby switchover
PostgreSQL: Setting up streaming replication with pg_basebackup
pg_rewind
