国产xxxx99真实实拍_久久不雅视频_高清韩国a级特黄毛片_嗯老师别我我受不了了小说

資訊專欄INFORMATION COLUMN

雙節(jié)點RAC實例2 HANG 故障分析一例

IT那活兒 / 1857人閱讀
雙節(jié)點RAC實例2 HANG 故障分析一例

一、環(huán)境:

操作系統(tǒng)為:AIX 5.3

數(shù)據(jù)庫版本:Oracle 10.2.0.5


二、第一現(xiàn)場:

1.主機(jī)能夠登入,系統(tǒng)負(fù)載很低

2.數(shù)據(jù)庫實例2登錄不進(jìn)去,實例1能正常提供服務(wù)

3.業(yè)務(wù)不受到影響



三、故障處理:


1、基本狀態(tài)查看

首先查看下主機(jī)情況:

$ uptime

 06:06PM  up 232 days,  19:31,  6 users, load average: 6.15, 6.71, 6.69

 節(jié)點2已經(jīng)運行232天,平均負(fù)載6點多,相對比較輕,最近時刻的負(fù)載也沒有超過7,因此主機(jī)狀態(tài)基本正常。接下來查看是否有大量換頁出現(xiàn):


$ vmstat 3 10

 

System configuration: lcpu=16mem=79360MB

 

kthr   memory              page              faults        cpu   

----- ----------------------------------- ------------ -----------

 r b   avm   fre re  pi  po fr   sr  cy in   sy  cs us sy id wa

 6  04932557 13865294   0   0   0   0   0   0 246 273723 268060 17 1172  0

 5  04932540 13865311   0   0   0   0   0   0 279 265072 263893 17 1172  0

 5  04932503 13865349   0   0   0   0   0   0 311 265326 264567 17 1172  0

 5  04932511 13865340   0   0   0   0   0   0 597 340525 264203 18 1270  0

 8  04932523 13865328   0   0   0   0   0   0 327 265519 260997 17 1172  0

…….



我們同樣發(fā)現(xiàn),CPU內(nèi)存都相對空閑,沒有換頁發(fā)生。

$ ps -ef|grep ora|wc -l

 

    367


數(shù)據(jù)庫進(jìn)程300多,對于16CPU,80G內(nèi)存的主機(jī)系統(tǒng)來說,一切似乎都很正常。

 

 

 

2、信息搜集

鑒于常規(guī)的數(shù)據(jù)庫登錄手段無法進(jìn)入實例2,無法得知數(shù)據(jù)庫狀態(tài),因此采用Oracle10g開始提供的“后門“,進(jìn)入數(shù)據(jù)庫進(jìn)行信息搜集:

$ sqlplus -prelim / as sysdba

 

SQL*Plus: Release 10.2.0.5.0 -Production on Thu Jun 14 18:01:13 2012

 

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

 

SQL>

SQL> oradebug hanganalyze 3

Hang Analysis in /oracle/admin/bxxx/udump/bxxx2_ora_1011948.trc

SQL> oradebug setmypid

Statement processed.

SQL> oradebug dump systemstate 266

Statement processed.

SQL> set time on

18:02:57 SQL> set timing on

18:03:01 SQL>  oradebug dump systemstate 266

Statement processed.

18:04:13 SQL> oradebug hanganalyze5

Hang Analysis in /oracle/admin/bxxx/udump/bxxx2_ora_1011948.trc

18:04:24 SQL>

18:04:51 SQL>  oradebug dump systemstate 266

Statement processed.

18:07:12 SQL> exit

Disconnected from ORACLE


由于其余數(shù)據(jù)庫實例正常,因此信息搜集我們僅限于本實例,以免影響到其他實例的正常運行。



3、實例重啟
信息搜集完畢,在征得應(yīng)用部門及DBA確認(rèn)后,重新啟動實例2。

先強(qiáng)制關(guān)閉:

$ sqlplus -prelim / as sysdba

 

SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Jun 14 18:10:11 2012

 

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

 

SQL> shutdown abort

ORACLE instance shut down.

SQL> exit

Disconnected from ORACLE

 

再正常啟動:

$ sqlplus / as sysdba

 

SQL*Plus: Release 10.2.0.5.0 - Productionon Thu Jun 14 18:10:34 2012

 

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

 

Connected to an idle instance.

 

SQL> startup

ORACLE instance started.

 

Total System Global Area 1610612736bytes

Fixed Size                  2096736 bytes

Variable Size            1308623264 bytes

Database Buffers          285212672 bytes

Redo Buffers               14680064 bytes

Database mounted.

Database opened.

實例重啟后,所有應(yīng)用能正常連接到實例2。

接下來進(jìn)行原因的深入分析,以防同樣的故障再次出現(xiàn)。

4、故障分析

 

首先我們來看hanganalyze的信息輸出:

*** 2012-06-14 18:01:25.716

==============

HANG ANALYSIS:

==============

Found 177 objects waitingfor

<1/1014/60449/0x6fbfedb8/925832/SGA:allocation forcing componen>

Found 33 objects waiting for

<1/1093/1/0x6fbe9868/295728/NoWait>

Open chains found:

Chain 1 : :

<1/1093/1/0x6fbe9868/295728/NoWait>

 -- <1/1014/60449/0x6fbfedb8/925832/SGA:allocation forcing componen>

 -- <1/674/1309/0x6fc258d8/689108/librarycache load lock>

本次的故障原因從這個輸出就一目了然,Oracle進(jìn)行內(nèi)部內(nèi)存自動分配時,遲遲沒有結(jié)束,導(dǎo)致177個對象等待著新的內(nèi)存空間。

進(jìn)一步的從systemdump輸出:

 

…….
waiting for SGA: allocation forcing component growthwait_time=0, seconds since wait started=3
….

SO: 700000050f2a528, type: 50, owner:70000006d500910, flag: INIT/-/-/0x00

     row cache enqueue: count=1 session=70000006d45d6e0object=700000058c461b8, mode=S

     savepoint=0x7a5dd06

     row cache parent object: address=700000058c461b8cid=8(dc_objects)

     hash=875b7c55 typ=11 transaction=0 flags=00008000

     own=700000058c46288[700000050f2a558,700000050f2a558]

……

可以看出,由于Oracle在進(jìn)行內(nèi)存分配時,將一些對象從內(nèi)存中踢出去了,新的SQL解析時,需要裝載新的對象,然而沒有可用內(nèi)存空間,因此數(shù)據(jù)庫實例hang住。

 

 

根據(jù)以往經(jīng)驗,發(fā)生這種等待,通常是采用了Oracle的ASMM(自動共享內(nèi)存管理),即SGA自動管理引起。

從實例2的告警日志alert_bxxx2.log,我們發(fā)現(xiàn):

Thu Jun 14 18:10:38 BEIST 2012

Starting ORACLE instance (normal)

sskgpgetexecname failed to get name

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 en1 172.16.1.0configured from OCR for use as a cluster interconnect

Interface type 1 en8 10.153.246.128configured from OCR for use as  a publicinterface

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_1 parameterdefault value as /oracle/product/10.2.0/db/dbs/arch

LICENSE_MAX_USERS = 0

SYS auditing is disabled

ksdpec: called for event 13740 priorto event group initialization

Starting up ORACLE RDBMS Version:10.2.0.5.0.

System parameters with non-defaultvalues:

 processes                = 1000

…….

 nls_territory            = CHINA

  sga_target               = 1610612736

 control_files            =/oradata1/control01.ctl, /oradata2/control02.ctl, /oradata3/control03.ctl

 db_block_size            = 8192

 __db_cache_size          =285212672

 compatible               =10.2.0.5.0

 db_files                 = 2000

 db_file_multiblock_read_count= 16

 cluster_database         = TRUE

 cluster_database_instances= 2

 thread                   = 2

…….

db_name                  = bxxx

 open_cursors             = 300

 pga_aggregate_target     =8311013376

 

確實采用了SGA自動管理(sga_target不為0)。

且總共80G的物理內(nèi)存,只分配給Oracle SGA 2G不到,而本機(jī)唯一的任務(wù)就是確保Oracle系統(tǒng)正常。

5故障分析建議

 

結(jié)合本次故障發(fā)生的原因及新炬在移動行業(yè)維護(hù)經(jīng)驗,我們給出如下三點配置修改建議:

1.將SGA管理改為手動。

2.數(shù)據(jù)庫會話在400個左右事,將SGA從2G增加到40G,其中:

shared_pool_size=1G

db_cache_size=35G

SGA_MAX_SIZE=40G

SGA_TARGET=0

3.關(guān)閉DRM特性:

_gc_affinity_time=0                                  # Only if DBversion is 10.1 or 10.2

_gc_undo_affinity=FALSE                       # Only if Db version is10.2

 

注:上述建議都需要停止數(shù)據(jù)庫,因此需安排計劃性停機(jī)。

 

同時,我們開發(fā)了數(shù)據(jù)庫實例hang住時的自動采集腳本,供數(shù)據(jù)庫發(fā)生hang住的情況下,快速搜集相關(guān)信息(存成文件放到Oracle用戶下的目錄即可):

#

#auto_hang_analyze.sh

#created by shsnc @20120614

#

#any question please sendemailto:master@shsnc.com

#

#!/usr/bin/ksh

 

. ~/.profile

 

#自動終止上次運行的進(jìn)程

 

for line in `ps -ef|grepauto_hang_analyze.sh|grep -v grep|awk {print $2}`

 

do

 

      for line1 in `ps -ef|grep $line|grep sqlplus|awk {print $2}`

 

      do

 

         for line2 in `ps -ef|grep $line1|grepLOCAL=YES|awk {print $2}`

 

         do

 

           echo $line2

 

           kill -9 $line2

 

         done

 

      done

 

done

 

 

 

count=`ps -ef |grep $0 |grep -v grep|wc -l`

 

if [[ count -gt 2 ]]; then

 

       echo $0 already running!

 

       exit

 

fi

 

echo `date`

 

# 判斷是否需要進(jìn)行自動執(zhí)行hang analyze

 

sqlplus -prelim  / as sysdba <

 

set feedback off

 

set termout off;

 

ttitle off;

 

btitle off;

 

set heading off

 

set timing off;

 

set verify off;

 

set echo off;

 

spool session_event_cnt.out

 

select get_event_cnt from dual;

 

spool off;

 

exit

 

EOF

 

event_cnt=`grep -v SQLsession_event_cnt.out|awk {print $1}`

 

if [[ event_cnt -lt 30 ]]; then

 

       echo no need hang analyze!

 

       exit

 

fi

 

#進(jìn)行hang analyze

 

 

 

sqlplus -prelim / as sysdba<

 

set feedback off

 

set termout off;

 

ttitle off;

 

btitle off;

 

set heading off

 

set timing off;

 

set verify off;

 

set echo off;

 

spool hang_analyze.out

 

oradebug setmypid

 

oradebug unlimit;

 

oradebug hanganalyze 3;

oradebug dump systemstate  1;

 

 

 

spool off;

 

exit

 

EOF

 

#取hang analyze 文件名

 

hanganalyze_file=`grep -i -E HangAnalysis hang_analyze.out|awk {print $4}`

 

if test -z "$hanganalyze_file";then

 

 exit

 

fi

 

echo $hanganalyze_file

 

#生成自動kill腳本

 

awk{if(index($0,"Found")>0) {printf"%s ",$0} else {print$0}} $hanganalyze_file|grep Found|awk {if($2>20) {{split($7,A,"/")}{ print "ps -ef|grep "A[5]"|grepLOCAL=NO|awk 47{print "

 

-9 "$2}47|xargs kill"}}}>kill_hang_process.sh

 

 

 

#執(zhí)行自動終止腳本并備份腳本

 

if [ `cat kill_hang_process.sh|wc -l`-gt 0 ]

 

then

 

sh kill_hang_process.sh

 

echo $hanganalyze_file>>kill_hang_process.sh

 

cp kill_hang_process.sh ./ak_log/kill_hang_process.sh.`date+%b_%d_%H_%M_%S`

 

fi

 

echo `date`



文章版權(quán)歸作者所有,未經(jīng)允許請勿轉(zhuǎn)載,若此文章存在違規(guī)行為,您可以聯(lián)系管理員刪除。

轉(zhuǎn)載請注明本文地址:http://specialneedsforspecialkids.com/yun/130246.html

相關(guān)文章

發(fā)表評論

0條評論

IT那活兒

|高級講師

TA的文章

閱讀更多
最新活動
閱讀需要支付1元查看
<