在一個風和日麗陽光明媚的早上,剛到公司屁股還沒有坐熱,就接到報告:XX系統一條SQL執行效率很低,趕緊上去經過一番犀利操作,啪啪啪敲下熟悉的命令,獲取SQL信息如下:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dkusf44y9g1yv, child number 0
-------------------------------------
SELECT A.NO_ID,A.BILL_ID1 FROM MM_USER_INFO A, MM_ATOBSTESTSTATE_INFO B
WHERE A.NO_ID=B.NO_ID AND B.DONETIME < ADD_MONTHS(SYSDATE,-6) AND
B.RUN_IDNO=W AND A.KKK_SERVICE_ID=1111
Plan hash value: 3801554394 ------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 64561 (100)| | | |
|* 1 | HASH JOIN | | 255K| 11M| 9744K| 64561 (2)| 00:12:55 | | |
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| MM_USER_INFO | 255K| 6741K| | 26342 (1)| 00:05:17 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_MMUSER_SERVICEID | 255K| | | 649 (1)| 00:00:08 | | |
| 4 | PARTITION RANGE ALL | | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
|* 5 | TABLE ACCESS FULL | MM_ATOBSTESTSTATE_INFO | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NO_ID"="B"."NO_ID")
3 - access("A"."KKK_SERVICE_ID"=1111)
5 - filter(("B"."RUN_IDNO"=W AND "B"."DONETIME" 26 rows selected.
|
從執行計劃上大體可以看出,這條語句走HASHJOIN,主要慢在ID=2和ID=5。值得注意一點:這里的執行計劃是指標都是估算的,估算的東西就是可能不準確,特別是執行計劃有問題的時候,正是因為各種不準的指標才導致執行計劃走的不好,那么我們要分析問題的根源,最好可以通過PredicateInformation里的條件去計算真實的Rows,或用gather_plan_statistics或altersession setstatistics_level=all去看A-Rows,A-Time等信息,從而準確判斷慢在哪一步,最終找出慢的根源。
通過上述理論指導,下面趕緊看下真實執行計劃(這里有真實的Starts,A-Rows等):
Planhash value: 3801554394
------------------------------------------------------------------------------------------------------------------------------------------------------------
|Id | Operation | Name |Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 842 |00:01:18.55 | 670K| 570K| | | |
|* 1 | HASH JOIN | | 1 | 255K| 842 |00:01:18.55 | 670K| 570K| 130M| 13M| 126M (0)|
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| UR_USER_INFO | 1 | 255K| 2113K|00:00:41.52 | 299K| 200K| | | |
|* 3 | INDEX RANGE SCAN | IDX_USERINFO_SERV_ID | 1 | 255K| 2113K|00:00:07.14 | 12844 | 12843 | | | |
| 4 | PARTITION RANGE ALL | | 1 | 475K| 842 |00:00:34.24 | 370K| 370K| | | |
|* 5 | TABLE ACCESS FULL | UR_CRMTOBOSSSTATE_INFO | 17 | 475K| 842 |00:00:34.23 | 370K| 370K| | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
1- access("A"."ID_NO"="B"."ID_NO")
3- access("A"."MASTER_SERV_ID"=2063)
5- filter(("B"."RUN_CODE"=W AND"B"."OP_TIME"
通過查看真實執行計劃,一眼定位出的確慢在ID=2和ID=5兩個步驟,其中ID=2回表慢,從索引查找7s回表變為41s(也就是回表操作需要34s),而ID=5是全表慢。其中ID=5的估算行數是47.5w行,實際上只有842行(這就是估算的行和真實行差別超大,一般統計信息不準),因此適合建立索引,通過下面分析適合建立組合索引。
SQL> select count(*) from MM_ATOBSTESTSTATE_INFO
2 where DONETIME < ADD_MONTHS(SYSDATE, -6);
COUNT(*)
----------
14539090
SQL> select/*+parallel(16)*/ count(*) from MM_ATOBSTESTSTATE_INFO
2 where RUN_IDNO = W;
COUNT(*)
----------
59675
SQL> select/*+parallel(16)*/ count(*) from MM_ATOBSTESTSTATE_INFO
2 where RUN_IDNO = W and DONETIME < ADD_MONTHS(SYSDATE,-6);
COUNT(*)
----------
842
|
索引創建如下:
CREATE INDEX "OPERTIADM"."IDX1_MM_ATOBSTESTSTATE_INFO" ON "OPERTIADM"."MM_ATOBSTESTSTATE_INFO" ("RUN_IDNO","DONETIME")
LOCAL TABLESPACE "TBS_IDX_TKO" parallel 16;
alter index "OPERTIADM"."IDX1_MM_ATOBSTESTSTATE_INFO" noparallel;
|
然后令人郁悶的事情發生了,執行計劃,它竟然沒有變:
Execution Plan
----------------------------------------------------------
Plan hash value: 3801554394
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 255K| 11M| | 64561 (2)| 00:12:55 | | |
|* 1 | HASH JOIN | | 255K| 11M| 9744K| 64561 (2)| 00:12:55 | | |
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| MM_USER_INFO | 255K| 6741K| | 26342 (1)| 00:05:17 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_MMUSER_SERVICEID | 255K| | | 649 (1)| 00:00:08 | | |
| 4 | PARTITION RANGE ALL | | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
|* 5 | TABLE ACCESS FULL | MM_ATOBSTESTSTATE_INFO | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NO_ID"="B"."NO_ID")
3 - access("A"."KKK_SERVICE_ID"=1111)
5 - filter("B"."RUN_IDNO"=W AND "B"."DONETIME"<="" add_months(sysdate@!,-6))
|
回頭一想也正常,執行計劃未變,主要是ID=5的cardinality估算不準確。既然cardinality不準確,那么就收集統計信息,收集完統計信息的執行計劃如下:
Plan hash value: 1403561594 ---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 0 | | 0 |00:00:00.01 | 0 | 0 |
|* 1 | TABLE ACCESS BY LOCAL INDEX ROWID | MM_USER_INFO | 1 | 1 | 842 |00:00:10.81 | 302K| 5 |
| 2 | NESTED LOOPS | | 1 | 1349K| 1685 |00:00:10.81 | 301K| 5 |
| 3 | PARTITION RANGE SUBQUERY | | 1 | 1349K| 842 |00:00:10.80 | 300K| 5 |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| MM_ATOBSTESTSTATE_INFO | 6 | 1349K| 842 |00:00:00.01 | 842 | 0 |
|* 5 | INDEX RANGE SCAN | IDX1_MM_ATOBSTESTSTATE_INFO | 6 | 1349K| 842 |00:00:00.01 | 77 | 0 |
| 6 | PARTITION RANGE ITERATOR | | 842 | 1 | 842 |00:00:00.01 | 1741 | 0 |
|* 7 | INDEX UNIQUE SCAN | PK_MM_USER_INFO | 842 | 1 | 842 |00:00:00.01 | 1741 | 0 |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."KKK_SERVICE_ID"=1111)
5 - access("B"."RUN_IDNO"=W AND "B"."DONETIME" 7 - access("A"."NO_ID"="B"."NO_ID")
|
現在走NESTEDLOOPS了,而且使用了剛建立的索引,并且原來慢的ID=2變成被驅動了,走UNIQUESCAN,但是還是需要10s多,效率沒有啥提升,通過執行計劃可以看出主要慢在 PARTITIONRANGE SUBQUERY上,這個查詢轉換(QueryTransformation)的玩意,沒有啥好辦法了,只能關閉這個功能對應的參數,當然,最好的是使用SQLPROFILE綁定到這條語句,在語句級進行修改,類似加了下面的HINTS:
SELECT/*+leading(b) use_nl(a) index(b IDX1_MM_ATOBSTESTSTATE_INFO)
opt_param(_subquery_pruning_enabled false)*/ A.NO_ID, A.BILL_ID1
FROM MM_USER_INFO A, MM_ATOBSTESTSTATE_INFO B WHERE A.NO_ID = B.NO_ID
AND B.DONETIME < ADD_MONTHS(SYSDATE, -6) AND B.RUN_IDNO = W
AND A.KKK_SERVICE_ID = 1111
|
然后用上述加了HINTS的SQL執行計劃,綁定到原SQL,最終SQL執行計劃如下:
Plan hash value: 4290111086 ------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 0 | | 0 |00:00:00.01 | 0 | |* 1 | TABLE ACCESS BY LOCAL INDEX ROWID | MM_USER_INFO | 1 | 1 | 842 |00:00:00.02 | 3436 | | 2 | NESTED LOOPS | | 1 | 1349K| 1685 |00:00:00.02 | 2594 | | 3 | PARTITION RANGE ALL | | 1 | 1349K| 842 |00:00:00.01 | 853 | | 4 | TABLE ACCESS BY LOCAL INDEX ROWID| MM_ATOBSTESTSTATE_INFO | 17 | 1349K| 842 |00:00:00.01 | 853 | |* 5 | INDEX RANGE SCAN | IDX1_MM_ATOBSTESTSTATE_INFO | 17 | 1349K| 842 |00:00:00.01 | 88 | | 6 | PARTITION RANGE ITERATOR | | 842 | 1 | 842 |00:00:00.01 | 1741 | |* 7 | INDEX UNIQUE SCAN | PK_MM_USER_INFO | 842 | 1 | 842 |00:00:00.01 | 1741 | ------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("A"."KKK_SERVICE_ID"=1111) 5 - access("B"."RUN_IDNO"=W AND "B"."DONETIME" 7 - access("A"."NO_ID"="B"."NO_ID")
|
非常完美,使用到我們建立的索引,并且消除了PARTITIONRANGESUBQUERY,執行效率由原來的10s多降低為0.01s,效率提升上千倍。SQL優化的方式N種,唯有準確找到問題根源才能快速解決,這里我通過分析真實的執行計劃快速找到問題的ROOTCAUSE,從而解決之。
總結下這個案例:先通過分析得知需要建立索引,但是建立索引后執行計劃未變,發現是cardinality估算不準,那么收集統計信息,收集完畢后,走索引和NL,但是卻出現了PARTITIONRANGE SUBQUERY影響效率,通過收集統計信息等方式已經解決不了,那么只能在語句級先關閉這個參數,通過SQLPROFILE綁定,從而達到解決問題的目的。
附:關于PARITITION的詳細內容可以參考
VLDBand Partitioning Guide :
https://docs.oracle.com/en/database/oracle/oracle-database/19/vldbg/index.html
里面的AdvancedPartition Pruning Techniques有關于PARTITIONRANGE SUBQUERY的內容。