A production Oracle RAC database had two nodes. Node 2's local disk failed, the /u01 directory could not be opened, and all of node 2's grid and oracle software was lost. This article records the troubleshooting and recovery of node 2; the same steps also apply to deleting and adding a node.
grid: the GRID_HOME is set as ORACLE_HOME for the grid user; the path is /u01/11.2.0/grid (confirmed below).
▼▼▼
[root@wbtdb1 ~]# su - grid
[grid@wbtdb1 ~]$ echo $ORACLE_HOME
/u01/11.2.0/grid
[grid@wbtdb1 ~]$ echo $ORACLE_BASE
/u01/app/oracle
▼▼▼
[root@wbtdb2 ~]# su - oracle
[oracle@wbtdb2 ~]$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/db_1
[oracle@wbtdb2 ~]$ echo $ORACLE_BASE
/u01/app/oracle
[oracle@wbtdb2 ~]$
Seen from node 2, the software is gone: no oracle-related command can be executed, because the oracle software directories are corrupted.
▼▼▼
[grid@wbtdb2 ~]$ crsctl stat res -t
-bash: crsctl: command not found
[grid@wbtdb2 ~]$
▼▼▼
[grid@wbtdb1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHDG.dg
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.DATADG.dg
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.LISTENER.lsnr
ONLINE ONLINE wbtdb1
ONLINE INTERMEDIATE wbtdb2
ora.OCRVOTING.dg
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.asm
ONLINE ONLINE wbtdb1 Started
ONLINE ONLINE wbtdb2 Started
ora.gsd
OFFLINE OFFLINE wbtdb1
OFFLINE OFFLINE wbtdb2
ora.net1.network
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.ons
ONLINE ONLINE wbtdb1
ONLINE INTERMEDIATE wbtdb2
ora.registry.acfs
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE wbtdb1
ora.cvu
1 ONLINE ONLINE wbtdb2
ora.oc4j
1 ONLINE INTERMEDIATE wbtdb2
ora.scan1.vip
1 ONLINE ONLINE wbtdb1
ora.wbtdb.db
1 ONLINE ONLINE wbtdb1 Open
2 ONLINE ONLINE wbtdb2 Open
ora.wbtdb1.vip
1 ONLINE ONLINE wbtdb1
ora.wbtdb2.vip
1 ONLINE ONLINE wbtdb2
[grid@wbtdb1 ~]$
Delete the node 2 oracle instance and update the ORACLE_HOME node list in the inventory.
The node 2 server is dead, so run dbca from node 1 as the oracle user.
▼▼▼
[root@wbtdb1 ~]# xhost +
access control disabled, clients can connect from any host
[root@wbtdb1 ~]# export DISPLAY=192.168.1.234:0.0
[root@wbtdb1 ~]# su - oracle
[oracle@wbtdb1 ~]$ xhost +
access control disabled, clients can connect from any host
xhost: must be on local machine to enable or disable access control.
[oracle@wbtdb1 ~]$
[oracle@wbtdb1 ~]$ dbca
The general syntax is as follows:
▼▼▼
dbca -silent -deleteInstance [-nodeList node_name] -gdbName gdb_name -instanceName instance_name -sysDBAUserName sysdba -sysDBAPassword password

Where:
node_name is the node being deleted
gdb_name is the global database name (check it with: select * from global_name;)
instance_name is the instance being deleted
sysdba is an oracle user with SYSDBA privilege
password is that user's password
[oracle@wbtdb1 ~]$ dbca -silent -deleteInstance -nodeList wbtdb2 -gdbName wbtdb -instanceName wbtdb2 -sysDBAUserName sys -sysDBAPassword oracle
Deleting instance
1% complete
2% complete
6% complete
13% complete
20% complete
26% complete
33% complete
40% complete
46% complete
53% complete
60% complete
66% complete
Completing instance management.
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/wbtdb.log" for further details.
On node 1, switch to the oracle user:
▼▼▼
[oracle@wbtdb1 db_1]$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/db_1
[oracle@wbtdb1 db_1]$ cd $ORACLE_HOME/oui/bin
[oracle@wbtdb1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1 "CLUSTER_NODES={wbtdb1}"
-- list only the nodes being kept (the healthy nodes)
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4093 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
UpdateNodeList was successful.
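To confirm the inventory node list really changed, you can inspect the central inventory directly (a hedged check; the inventory location comes from the runInstaller output above):
▼▼▼
# each Oracle home's node list is recorded in the central inventory;
# after the update, the db_1 home should list only wbtdb1
grep -B1 -A3 'db_1' /u01/app/oraInventory/ContentsXML/inventory.xml
Next, check the redo threads in the database: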
▼▼▼
set line 200
select thread#,status,instance from v$thread;
THREAD# STATUS INSTANCE
---------- ----------- ---------------
1 OPEN wbtdb1
If node 2's redo logs still exist, use the following command:
▼▼▼
ALTER DATABASE DISABLE THREAD 2;
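Before dropping anything, you can check which redo log groups still belong to thread 2; a minimal sketch (the group numbers in the commented drop statements are hypothetical):
▼▼▼
-- any rows with THREAD# = 2 belong to the deleted instance
select thread#, group#, status from v$log order by thread#, group#;
-- once the thread is disabled, its log groups can be dropped;
-- group numbers 3 and 4 below are hypothetical, use the ones reported above
-- alter database drop logfile group 3;
-- alter database drop logfile group 4;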
Verify the database information registered in the OCR; the syntax is as follows:
▼▼▼
srvctl config database -d db_unique_name
▼▼▼
[oracle@wbtdb1 ~]$ srvctl config database -d wbtdb
Database unique name: wbtdb
Database name: wbtdb
Oracle home: /u01/app/oracle/product/11.2.0/db_1
Oracle user: oracle
Spfile: +DATADG/wbtdb/spfilewbtdb.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: wbtdb
Database instances: wbtdb1
Disk Groups: DATADG
Mount point paths:
Services:
Type: RAC
Database is administrator managed
On node 1, as the grid user:
▼▼▼
[grid@wbtdb1 ~]$ srvctl status listener -l listener -n wbtdb2
Listener LISTENER is enabled on node(s): wbtdb2
Listener LISTENER is running on node(s): wbtdb2
▼▼▼
[grid@wbtdb1 ~]$ srvctl disable listener -l listener -n wbtdb2
[grid@wbtdb1 ~]$ srvctl stop listener -l listener -n wbtdb2
PRCR-1014 : Failed to stop resource ora.LISTENER.lsnr
PRCR-1065 : Failed to stop resource ora.LISTENER.lsnr
CRS-2675: Stop of ora.LISTENER.lsnr on wbtdb2 failed
CRS-2678: ora.LISTENER.lsnr on wbtdb2 has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
[grid@wbtdb1 ~]$ srvctl status listener -l listener -n wbtdb2
Listener LISTENER is disabled on node(s): wbtdb2
Listener LISTENER is not running on node(s): wbtdb2
▼▼▼
$./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={remaining_node_list}"
▼▼▼
cd $ORACLE_HOME/oui/bin
[grid@wbtdb1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={wbtdb1}"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4095 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
UpdateNodeList was successful.
3.1 Confirm whether the node state is Unpinned
Run as root or as grid:
▼▼▼
[grid@wbtdb1 bin]$ olsnodes -s -t
wbtdb1 Active Unpinned
wbtdb2 Active Unpinned
[grid@wbtdb1 bin]$
The official documentation gives the following syntax.
Special note: much of the material circulating online is wrong here. If the node is already Unpinned, there is no need to run the unpin command at all.
▼▼▼
crsctl unpin css -n node_name
For example:
▼▼▼
crsctl unpin css -n wbtdb2
/u01/11.2.0/grid/bin/crsctl unpin css -n wbtdb2
crsctl status res -t
First stop node 2's VIP (vip_name is the VIP name defined in /etc/hosts; in this environment, wbtdb2-vip):
▼▼▼
srvctl stop vip -i vip_name -f
▼▼▼
[root@wbtdb1 ~]# /u01/11.2.0/grid/bin/srvctl stop vip -i wbtdb2-vip -f
[root@wbtdb1 ~]#
Remove the VIP configuration: srvctl remove vip -i vip_name -f
[root@wbtdb1 ~]# /u01/11.2.0/grid/bin/srvctl remove vip -i wbtdb2-vip -f
Check the VIP:
/u01/11.2.0/grid/bin/crsctl status res -t
▼▼▼
[grid@wbtdb1 ~]$ olsnodes -s -t
wbtdb1 Active Unpinned
wbtdb2 Active Unpinned
On the healthy node 1, run the delete-node command as root:
▼▼▼
[root@wbtdb1 ~]# /u01/11.2.0/grid/bin/crsctl delete node -n wbtdb2
CRS-4661: Node wbtdb2 successfully deleted.
Verify:
▼▼▼
[grid@wbtdb1 ~]$ olsnodes -s -t
wbtdb1 Active Unpinned
Note: if deleting node 2 fails with CRS-4658 and CRS-4000 errors, just kill node 2's CRS-related processes and retry.
▼▼▼
[root@wbtdb1 ~]# /u01/11.2.0/grid/bin/crsctl delete node -n wbtdb2
CRS-4658: The clusterware stack on node wbtdb2 is not completely down.
CRS-4000: Command Delete failed, or completed with errors.
The deletion failed, so check the CRS status:
▼▼▼
[root@wbtdb1 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHDG.dg
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.DATADG.dg
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.LISTENER.lsnr
ONLINE ONLINE wbtdb1
OFFLINE UNKNOWN wbtdb2
ora.OCRVOTING.dg
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.asm
ONLINE ONLINE wbtdb1 Started
ONLINE ONLINE wbtdb2 Started
ora.gsd
OFFLINE OFFLINE wbtdb1
OFFLINE OFFLINE wbtdb2
ora.net1.network
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
ora.ons
ONLINE ONLINE wbtdb1
OFFLINE UNKNOWN wbtdb2
ora.registry.acfs
ONLINE ONLINE wbtdb1
ONLINE ONLINE wbtdb2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE wbtdb1
ora.cvu
1 ONLINE ONLINE wbtdb2
ora.oc4j
1 ONLINE INTERMEDIATE wbtdb2
ora.scan1.vip
1 ONLINE ONLINE wbtdb1
ora.wbtdb.db
1 ONLINE ONLINE wbtdb1 Open
ora.wbtdb1.vip
1 ONLINE ONLINE wbtdb1
[root@wbtdb1 bin]#
All of node 2's resources have stopped, but the CRS processes on node 2 are still running. This is because the software directory was lost while CRS was running: the binaries are gone, but the processes were never cleaned up. Once the CRS processes are killed, node 2 can be deleted.
▼▼▼
[root@wbtdb2 ~]# ps -ef|grep d.bin
root 2546 1 4 13:10 ? 00:03:22 /u01/11.2.0/grid/bin/ohasd.bin reboot
grid 3632 1 0 13:11 ? 00:00:06 /u01/11.2.0/grid/bin/oraagent.bin
grid 3643 1 0 13:11 ? 00:00:00 /u01/11.2.0/grid/bin/mdnsd.bin
grid 3673 1 0 13:11 ? 00:00:00 /u01/11.2.0/grid/bin/gpnpd.bin
grid 3683 1 0 13:11 ? 00:00:07 /u01/11.2.0/grid/bin/gipcd.bin
root 3685 1 0 13:11 ? 00:00:06 /u01/11.2.0/grid/bin/orarootagent.bin
root 3698 1 1 13:11 ? 00:00:46 /u01/11.2.0/grid/bin/osysmond.bin
root 3717 1 0 13:11 ? 00:00:02 /u01/11.2.0/grid/bin/cssdmonitor
root 3740 1 0 13:11 ? 00:00:02 /u01/11.2.0/grid/bin/cssdagent
grid 3751 1 0 13:11 ? 00:00:09 /u01/11.2.0/grid/bin/ocssd.bin
root 3974 1 0 13:11 ? 00:00:07 /u01/11.2.0/grid/bin/octssd.bin reboot
grid 3997 1 0 13:11 ? 00:00:07 /u01/11.2.0/grid/bin/evmd.bin
root 4408 1 30 13:12 ? 00:22:08 /u01/11.2.0/grid/bin/crsd.bin reboot
grid 4484 3997 0 13:12 ? 00:00:00 /u01/11.2.0/grid/bin/evmlogger.bin -o /u01/11.2.0/grid/evm/log/evmlogger.info -l /u01/11.2.0/grid/evm/log/evmlogger.log
grid 4519 1 27 13:12 ? 00:19:39 /u01/11.2.0/grid/bin/oraagent.bin
root 4525 1 22 13:12 ? 00:16:20 /u01/11.2.0/grid/bin/orarootagent.bin
grid 4712 1 0 13:12 ? 00:00:00 /u01/11.2.0/grid/bin/scriptagent.bin
grid 4814 1 0 13:12 ? 00:00:00 /u01/11.2.0/grid/bin/tnslsnr LISTENER -inherit
root 23792 5354 0 14:24 pts/0 00:00:00 grep d.bin
[root@wbtdb2 ~]# kill -9 2546 3632 3643 3673 3683 3685 3698 3717 3740 3751 3997 3974 4519 4525 4712 4814
[root@wbtdb2 ~]# ps -ef|grep d.bin
root 30270 5354 0 14:25 pts/0 00:00:00 grep d.bin
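If you would rather not pick the PIDs by hand, a hedged one-liner (assuming the same /u01/11.2.0/grid path as above) collects and kills the remaining GI daemons in one pass:
▼▼▼
# forcibly terminate every process started from the grid home's bin directory;
# only safe here because node 2 is being removed from the cluster anyway
ps -ef | grep '/u01/11.2.0/grid/bin' | grep -v grep | awk '{print $2}' | xargs -r kill -9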
3.4 Update the cluster node information
▼▼▼
[grid@wbtdb1 ~]$ cd $ORACLE_HOME/oui/bin
[grid@wbtdb1 bin]$ echo $ORACLE_HOME
/u01/11.2.0/grid
$ ./runInstaller -updateNodeList ORACLE_HOME=Grid_home "CLUSTER_NODES={remaining_nodes_list}" CRS=TRUE -silent
▼▼▼
$ ./runInstaller -updateNodeList ORACLE_HOME=Grid_home "CLUSTER_NODES={rac1,rac3,...}" CRS=TRUE -silent
▼▼▼
[grid@wbtdb1 bin]$ /u01/11.2.0/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={wbtdb1}" CRS=TRUE -silent
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4095 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
UpdateNodeList was successful.
-- Check:
▼▼▼
[grid@wbtdb1 bin]$ olsnodes -s -t
wbtdb1 Active Unpinned
[grid@wbtdb1 bin]$
▼▼▼
[grid@wbtdb1 bin]$ cluvfy stage -post nodedel -n wbtdb2 -verbose
Performing post-checks for node removal
Checking CRS integrity...
Clusterware version consistency passed
The Oracle Clusterware is healthy on node "wbtdb1"
CRS integrity check passed
Result:
Node removal check passed
Post-check for node removal was successful.
[grid@wbtdb1 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHDG.dg
ONLINE ONLINE wbtdb1
ora.DATADG.dg
ONLINE ONLINE wbtdb1
ora.LISTENER.lsnr
ONLINE ONLINE wbtdb1
ora.OCRVOTING.dg
ONLINE ONLINE wbtdb1
ora.asm
ONLINE ONLINE wbtdb1 Started
ora.gsd
OFFLINE OFFLINE wbtdb1
ora.net1.network
ONLINE ONLINE wbtdb1
ora.ons
ONLINE ONLINE wbtdb1
ora.registry.acfs
ONLINE ONLINE wbtdb1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE wbtdb1
ora.cvu
1 ONLINE ONLINE wbtdb1
ora.oc4j
1 ONLINE ONLINE wbtdb1
ora.scan1.vip
1 ONLINE ONLINE wbtdb1
ora.wbtdb.db
1 ONLINE ONLINE wbtdb1 Open
ora.wbtdb1.vip
1 ONLINE ONLINE wbtdb1
[grid@wbtdb1 ~]$
At this point, node 2's information has been completely removed from the cluster!
You can verify the remaining cluster resources and the instance status yourself, as sketched below.
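A minimal verification pass might look like this (a sketch; the database name wbtdb follows this article's environment):
▼▼▼
# cluster resources as seen from the surviving node
/u01/11.2.0/grid/bin/crsctl stat res -t
# instance status as registered in the clusterware
srvctl status database -d wbtdb
# inside the database, only thread 1 should remain open
sqlplus -S / as sysdba <<'EOF'
select thread#, status, instance from v$thread;
EOF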