Environment: 11.2.0.4 RAC on linux-x86-64
The cluster alert log (on node 2, per the paths below) showed node 1 not responding to OCR requests, yet node 1's own cluster alert log contained no errors at all.
▼▼▼
2019-08-08 08:18:33.474:
[/u01/oracle/11.2.0.4/grid/bin/oraagent.bin(7471)]CRS-5818:Aborted command check for resource ora.LISTENER_SCAN1.lsnr. Details at (:CRSAGF00113:) {1:8722:25962} in /u01/oracle/11.2.0.4/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-08-08 08:21:33.504:
[/u01/oracle/11.2.0.4/grid/bin/oraagent.bin(7471)]CRS-5818:Aborted command check for resource ora.LISTENER_SCAN1.lsnr. Details at (:CRSAGF00113:) {1:8722:25962} in /u01/oracle/11.2.0.4/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-08-08 08:21:33.873:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:23:07.028:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:24:33.534:
[/u01/oracle/11.2.0.4/grid/bin/oraagent.bin(7471)]CRS-5818:Aborted command check for resource ora.LISTENER_SCAN1.lsnr. Details at (:CRSAGF00113:) {1:8722:25962} in /u01/oracle/11.2.0.4/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-08-08 08:24:40.196:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:24:56.120:
[/u01/oracle/11.2.0.4/grid/bin/orarootagent.bin(6818)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0
2019-08-08 08:26:13.355:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:27:33.573:
[/u01/oracle/11.2.0.4/grid/bin/oraagent.bin(7471)]CRS-5818:Aborted command check for resource ora.LISTENER_SCAN1.lsnr. Details at (:CRSAGF00113:) {1:8722:25962} in /u01/oracle/11.2.0.4/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-08-08 08:27:46.562:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:29:19.768:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:29:26.214:
[crsd(7362)]CRS-5825:Agent /u01/oracle/11.2.0.4/grid/bin/oraagent_grid is unresponsive and will be restarted. Details at (:CRSAGF00131:) {2:3004:2} in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:29:26.270:
[crsd(7362)]CRS-5825:Agent /u01/oracle/11.2.0.4/grid/bin/orarootagent_root is unresponsive and will be restarted. Details at (:CRSAGF00131:) {2:3004:2} in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:29:32.890:
[crsd(7362)]CRS-5825:Agent /u01/oracle/11.2.0.4/grid/bin/oraagent_oracle is unresponsive and will be restarted. Details at (:CRSAGF00131:) {2:3004:2} in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:30:52.932:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
2019-08-08 08:32:26.110:
[crsd(7362)]CRS-1008:Node db1 is not responding to OCR requests. Details in /u01/oracle/11.2.0.4/grid/log/db2/crsd/crsd.log.
The errors are all gipc-related, so we suspected that crsd's failure to start was tied to a gipc communication problem. We checked the network interfaces, routing, firewall, and SELinux, and found nothing abnormal, so we dug further into the gipc logs.
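The network-side checks mentioned above can be sketched as a quick triage script. This is a minimal sketch only: the interface name `eth7` and subnet `172.16.100.0` come from the CLSINET lines in the node-1 gipcd log quoted later; the peer IP is a placeholder you must substitute.

```shell
#!/bin/sh
# Quick triage of the usual network suspects before blaming gipc itself.
# eth7 / 172.16.100.0 are taken from this cluster's gipcd log; substitute
# your own interconnect interface and subnet.

# 1. Interconnect NIC up and addressed?
ip addr show eth7

# 2. Interconnect and HAIP (169.254.x) routes present?
ip route | grep -E '172\.16\.100\.|169\.254\.'

# 3. Any firewall rules that could filter interconnect traffic?
iptables -L -n

# 4. SELinux enforcing?
getenforce

# 5. Peer reachable over the private network? (substitute the peer's IP)
# ping -c 3 -I eth7 <peer-interconnect-ip>
```

All of these came back clean in our case, which is what pushed the investigation toward gipcd itself.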
▼▼▼
2019-08-08 09:06:37.851: [UiServer][1329567488] CS(0x7f91d400d240)set Properties ( grid,0x7f92181a90f0)
2019-08-08 09:06:37.862: [UiServer][1331668736]{2:3004:13874} Sending message to PE. ctx= 0x7f91e001d3c0, Client PID: 6787
2019-08-08 09:12:16.990: [GIPCHDEM][2006959872] gipchaDaemonProcessDisconnect: daemon has disconnected endp 0x7f9268009e20 [0000000000000179] { gipcEndpoint : localAddr ipc, remoteAddr ipc://gipcd_db2, numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 6821, readyRef (nil), ready 0, wobj 0x7f926800be60, sendp 0x7f926800bca0flags 0xa61e, usrFlags 0x24020 }, hctx 0x1a36ec0 [0000000000000010] { gipchaContext : host db2, name 1183-8050-f823-3e21, luid 2b043d36-00000000, numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }
2019-08-08 09:12:16.990: [GIPCHDEM][2006959872] gipchaDaemonWork: daemon thread exiting by request hctx 0x1a36ec0 [0000000000000010] { gipchaContext : host db2, name 1183-8050-f823-3e21, luid 2b043d36-00000000, numNode 1, numInf 1, usrFlags 0x0, flags 0xd }
2019-08-08 09:12:16.990: [GIPCHDEM][2006959872] gipchaDaemonThreadEntry: daemon thread exiting state gipchaThreadStateDead (4)
2019-08-08 09:12:16.990: [GIPCXCPT][1962833664] gipchaInternalResolve: failed to resolve ret gipcretDaemonLost (34), host db1, port CLSFRAME_1, hctx 0x1a36ec0 [0000000000000010] { gipchaContext : host db2, name 1183-8050-f823-3e21, luid 2b043d36-00000000, numNode 1, numInf 1, usrFlags 0x0, flags 0xd }, ret gipcretDaemonLost (34)
2019-08-08 09:12:16.990: [GIPCHGEN][1962833664] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 806]: EXCEPTION[ ret gipcretDaemonLost (34) ] failed to resolve ctx 0x1a36ec0 [0000000000000010] { gipchaContext : host db2, name 1183-8050-f823-3e21, luid 2b043d36-00000000, numNode 1, numInf 1, usrFlags 0x0, flags 0xd }, host db1, port CLSFRAME_1, flags 0x0
2019-08-08 09:12:16.991: [GIPCXCPT][1962833664] gipcmodGipcConnect: cannot connect to address that failed internal resolve 0x7f920c0085d0 [0000000001d09778] { gipcAddress : name gipcha://db1:CLSFRAME_1, objFlags 0x0, addrFlags 0x1 }, ret gipcretAddressNotFound (21)
2019-08-08 09:12:16.991: [GIPCXCPT][1962833664] gipchaInternalEndpointClose: endpoint close failed due to failure in worker thread 0x7f920c00af20 [0000000001d09789] { gipchaEndpoint : port 0a6b-7001-e948-0386, peer :, srcCid 00000000-00000000, dstCid 00000000-00000000, numSend 0, maxSend 100, groupListType 1, hagroup 0x1cb1c70, usrFlags 0x4000, flags 0x0 }, ret gipcretDaemonLost (34)
2019-08-08 09:12:16.991: [GIPCHGEN][1962833664] gipchaEndpointCloseF [gipcmodGipcDisconnect : gipcmodGipc.c : 925]: EXCEPTION[ ret gipcretDaemonLost (34) ] failed to close endpoint ctx 0x1a36ec0 [0000000000000010] { gipchaContext : host db2, name 1183-8050-f823-3e21, luid 2b043d36-00000000, numNode 1, numInf 1, usrFlags 0x0, flags 0xd }, endp 0x7f920c00af20 [0000000001d09789] { gipchaEndpoint : port 0a6b-7001-e948-0386, peer :, srcCid 00000000-00000000, dstCid 00000000-00000000, numSend 0, maxSend 100, groupListType 1, hagroup 0x1cb1c70, usrFlags 0x4000, flags 0x0 }, flags 0x0
2019-08-08 09:12:16.991: [GIPCTRAC][1962833664] gipcConnectF [clsCclGipcConnect : clsCclCommHandler.c : 3966]: EXCEPTION[ ret gipcretAddressNotFound (21) ] failed connect endp 0x7f920c007a80 [0000000001d09774] { gipcEndpoint : localAddr gipcha://db2:0a6b-7001-e948-0386, remoteAddr gipcha://db1:CLSFRAME_1, numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7f920c009520, sendp (nil)flags 0xb860a, usrFlags 0x20020 }, addr 0x7f920c00aa10 [0000000001d0977b] { gipcAddress : name gipcha://db1:CLSFRAME_1, objFlags 0x0, addrFlags 0x0 }, cookie 0x7f9214009f50, flags 0x0
2019-08-08 09:12:16.991: [ CRSCCL][1962833664]clsCclGipcConnect: gipcConnect() failed to connect. rc= 21.
2019-08-08 09:12:16.991: [ CRSCCL][1962833664]clsCclEstablishP2PConn: Failed to establish connection with node db1
Node 2 gipcd process log:
▼▼▼
2019-08-08 08:22:33.642: [GIPCDCLT][3304625920] gipcdClientThread: req from local client of type gipcdmsgtypeLookupName, endp 0000000000000742
2019-08-08 08:22:33.642: [GIPCDCLT][3304625920] gipcdClientLookupName: Received type(gipcdmsgtypeLookupName), endp(0000000000000742), len(1032), buf(0x7f05bc0905c8):[hostname(db2), portstr: (0a6b-7001-e948-0386), haname(), retStatus(gipcretSuccess)]
2019-08-08 08:22:33.642: [ GIPCLIB][3304625920] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x7f05bc02eb50, idxPtr:0x7f05bc030460, key:0x7f05c4f87b10, flags:0x0
2019-08-08 08:22:33.642: [GIPCDCLT][3304625920] gipcdClientThread: Sent req to local client on endp 0000000000000742:(type(gipcdmsgtypeLookupNameAck), endp(0000000000000742), len(1032), buf(0x7f05bc26c440):[hostname(db2), portstr: (0a6b-7001-e948-0386), haname(), retStatus(gipcretKeyNotFound)])
2019-08-08 08:22:33.643: [GIPCDCLT][3304625920] gipcdClientThread: req from local client of type gipcdmsgtypeCreateName, endp 0000000000000742
2019-08-08 08:22:33.643: [GIPCDCLT][3304625920] gipcdClientCreateName: Received type(gipcdmsgtypeCreateName), endp(0000000000000742), len(1032), buf(0x7f05bc0905c8):[hostname(db2), portstr: (0a6b-7001-e948-0386), haname(1183-8050-f823-3e21), retStatus(gipcretSuccess)]
2019-08-08 08:22:33.643: [GIPCDCLT][3304625920] gipcdClientThread: Sent req to local client on endp 0000000000000742:(type(gipcdmsgtypeCreateNameAck), endp(0000000000000742), len(1032), buf(0x7f05bc26c440):[hostname(), portstr: (), haname(), retStatus(gipcretSuccess)])
2019-08-08 08:22:33.644: [GIPCDCLT][3304625920] gipcdClientThread: req from local client of type gipcdmsgtypeLookupName, endp 0000000000000742
2019-08-08 08:22:33.644: [GIPCDCLT][3304625920] gipcdClientLookupName: Received type(gipcdmsgtypeLookupName), endp(0000000000000742), len(1032), buf(0x7f05bc0905c8):[hostname(db1), portstr: (CLSFRAME_1), haname(), retStatus(gipcretSuccess)]
2019-08-08 08:22:33.644: [ GIPCLIB][3304625920] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x7f05bc02eb50, idxPtr:0x7f05bc0525c0, key:0x7f05c4f87b10, flags:0x0
2019-08-08 08:22:33.644: [GIPCDCLT][3304625920] gipcdEnqueueMsgForNode: Enqueuing for NodeThread (gipcdReqTypeLookupName)
2019-08-08 08:22:33.644: [GIPCDNDE][3302524672] gipcdProcessClientRequest: Dequeued req for host (db1), type(gipcdReqTypeLookupName), id (0000000000000742, 0000000000000000)
2019-08-08 08:22:33.644: [GIPCDNDE][3302524672] gipcdSendReq: recvd msg clnt header: (req:[hostname(db1), id (0000000000000742, 0000000000000000), len(352), req cookie(3155644168), type(gipcdReqTypeLookupName)])
2019-08-08 08:22:33.644: [ GIPCLIB][3302524672] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x7f05b4007a10, idxPtr:0x7f05b4026f60, key:0x7f05bc26d2d8, flags:0x0
2019-08-08 08:22:33.646: [ GIPCLIB][3302524672] gipclibMapSearch: gipcMapSearch() -> gipcMapGetNodeAddr() failed: ret:gipcretKeyNotFound (36), ht:0x7f05b4007a10, idxPtr:0x7f05b4026f60, key:0x7f05bc26d2d8, flags:0x0
Node 1 gipcd process log:
▼▼▼
2019-08-08 08:23:05.717: [GIPCDCLT][193832704] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000032d
2019-08-08 08:23:06.130: [GIPCDCLT][193832704] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000b46
2019-08-08 08:23:06.185: [GIPCDCLT][193832704] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000046c
2019-08-08 08:23:06.584: [ CLSINET][189630208] Returning NETDATA: 1 interfaces
2019-08-08 08:23:06.584: [ CLSINET][189630208] # 0 Interface eth7,ip=172.16.100.49,mac=08-94-ef-68-7d-5b,mask=255.255.255.0,net=172.16.100.0,use=cluster_interconnect
2019-08-08 08:23:06.719: [GIPCHALO][187528960] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30060, msg 0x7f65f4054e88 { len 1160, seq 4, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 3, flags 0x1, srcLuid 7216c33f-91fd2a9d, dstLuid 00000000-00000000, msgId 4 }, node 0x7f65e81f3230 { host db2, haName gipcd_ha_name, srcLuid d3211c9e-689d3940, dstLuid 7216c33f-91fd2a9d numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 3173499080, sentRegister 1, localMonitor 1, flags 0x0 }
2019-08-08 08:23:06.719: [GIPCHALO][187528960] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30060, msg 0x7f65f4026d08 { len 1160, seq 4, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 3, flags 0x1, srcLuid 7216c33f-91fd2a9d, dstLuid 00000000-00000000, msgId 4 }, node 0x7f65e81f3230 { host db2, haName gipcd_ha_name, srcLuid d3211c9e-689d3940, dstLuid 7216c33f-91fd2a9d numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 3173499080, sentRegister 1, localMonitor 1, flags 0x0 }
2019-08-08 08:23:06.719: [GIPCHALO][187528960] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30060, msg 0x7f65f4020f88 { len 1160, seq 4, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 3, flags 0x1, srcLuid 7216c33f-91fd2a9d, dstLuid 00000000-00000000, msgId 4 }, node 0x7f65e81f3230 { host db2, haName gipcd_ha_name, srcLuid d3211c9e-689d3940, dstLuid 7216c33f-91fd2a9d numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 3173499080, sentRegister 1, localMonitor 1, flags 0x0 }
2019-08-08 08:23:07.720: [GIPCHALO][187528960] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30060, msg 0x7f65f40522c8 { len 1160, seq 5, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 4, flags 0x1, srcLuid 7216c33f-91fd2a9d, dstLuid 00000000-00000000, msgId 5 }, node 0x7f65e81f3230 { host db2, haName gipcd_ha_name, srcLuid d3211c9e-689d3940, dstLuid 7216c33f-91fd2a9d numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [32 : 32], createTime 3173499080, sentRegister 1, localMonitor 1, flags 0x0 }
2019-08-08 08:23:07.720: [GIPCHALO][187528960] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30060, msg 0x7f65f4029598 { len 1160, seq 5, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 4, flags 0x1, srcLuid 7216c33f-91fd2a9d, dstLuid 00000000-00000000, msgId 5 }, node 0x7f65e81f3230 { host db2, haName gipcd_ha_name, srcLuid d3211c9e-689d3940, dstLuid 7216c33f-91fd2a9d numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [32 : 32], createTime 3173499080, sentRegister 1, localMonitor 1, flags 0x0 }
A search of MOS turned up the following note, which matches our symptoms exactly; the problem is caused by a bug:
CRSD not starting on non-first node: gipchaLowerDropMsg: dropping because of sequence timeout (Doc ID 2189195.1)
The temporary workaround MOS offers is to stop and start both nodes, but that requires a business outage.
To recover quickly without interrupting the business, we drew on past experience and took the following steps:
Kill the gipcd.bin process on node 1. Since gipcd only handles inter-node communication, it respawns automatically after being killed and does not affect database state; the cluster's evm and crsd resource states may flap briefly (harmless). Once it was killed, node 2 was able to start normally.
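What we did on node 1 can be sketched as follows. Run as root; the `ora.gipcd` resource name and the respawn-by-ohasd behavior are standard 11.2 GI, but treat this as a sketch rather than a procedure:

```shell
#!/bin/sh
# Hedged sketch of the workaround: kill gipcd.bin on node 1 and let the
# clusterware respawn it. Expect a brief flap of evm/crsd resource states,
# but no database impact.

pid=$(pgrep -x gipcd.bin)          # exact process-name match
if [ -n "$pid" ]; then
    echo "killing gipcd.bin pid $pid"
    kill -9 "$pid"
fi

# ohasd restarts ora.gipcd automatically; watch it come back
# (crsctl assumed on PATH from the grid home)
crsctl stat res ora.gipcd -init
```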
Apply patch 17452638 to fix the bug permanently.
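Applying the fix might look like this. A sketch only: the grid home path follows the logs above, but the staging directory `/tmp/17452638` is an assumption, and on 11.2 `opatch auto` handles stopping and restarting the stack itself:

```shell
#!/bin/sh
# Hedged sketch: check for / apply patch 17452638 on the GI home.
# /tmp/17452638 is an assumed location where the patch zip was extracted.

GRID_HOME=/u01/oracle/11.2.0.4/grid

# Already installed?
if "$GRID_HOME/OPatch/opatch" lsinventory | grep -q 17452638; then
    echo "patch 17452638 already applied"
    exit 0
fi

# As root: apply to the grid home (one node at a time for a rolling apply)
"$GRID_HOME/OPatch/opatch" auto /tmp/17452638 -oh "$GRID_HOME"
```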