PostgreSQL黑科技BRIN索引

IT那活兒發(fā)布于2023-01-11 13:20 / 3631人閱讀

PostgreSQL黑科技BRIN索引

前言

Oracle的位圖索引適用于列的基數(shù)很少，可枚舉，重復(fù)值很多，數(shù)據(jù)不會(huì)被經(jīng)常更新的列。

它的索引結(jié)構(gòu)是一個(gè)鍵值對(duì)應(yīng)很多行(rowid)，對(duì)于報(bào)表類數(shù)據(jù)庫(kù)，重復(fù)率高的數(shù)據(jù)，特定類型的查詢例如count、or、and等邏輯操作，只需要進(jìn)行位運(yùn)算即可得到我們需要的結(jié)果，可以說是相當(dāng)?shù)男省?/p>

最近項(xiàng)目在進(jìn)行上云工作，有許多Oracle到PostgreSQL的遷移要做。其中涉及到位圖索引，

然而PostgreSQL沒有位圖索引。怎么改造這種索引，來實(shí)現(xiàn)相應(yīng)的索引場(chǎng)景呢？

今天我們來聊一聊PostgreSQL中的黑科技Brin索引。

索引原理

BRIN索引是塊級(jí)索引，有別于B-TREE等索引，BRIN記錄并不是以行號(hào)為單位記錄索引明細(xì)，而是記錄每個(gè)數(shù)據(jù)塊或者每段連續(xù)的數(shù)據(jù)塊的統(tǒng)計(jì)信息。因此BRIN索引空間占用特別的小，對(duì)數(shù)據(jù)寫入、更新、刪除的影響也很小。

BRIN索引的掃描原理很簡(jiǎn)單，掃描BRIN的元數(shù)據(jù)，根據(jù)元數(shù)據(jù)和用戶輸入的條件進(jìn)行比較，過濾不符合條件的HEAPPAGE，只掃描需要掃描的HEAPPAGE。

如果數(shù)據(jù)排列的比較隨機(jī)時(shí)，那么索引效果就非常差。達(dá)不到索引快速掃描的效果。

索引測(cè)試

我們創(chuàng)建兩張表，一張順序插入，一張亂序插入：

--順序插入tab_brin1：

otter_pg=# create table tab_brin1(id int,name varchar(40),c_time timestamp);

CREATE TABLE

otter_pg=# insert into tab_brin1 select *,md5(random()::text),clock_timestamp() from generate_series(1,10000000);

INSERT 0 10000000

--亂序插入tab_brin2：

otter_pg=# create table tab_brin2(id int,name varchar(40),c_time timestamp);

CREATE TABLE

tter_pg=# insert into tab_brin2 select (random()*(10^6))::integer,md5(random()::text),timestamp 2019-01-10 20:00:00 + random() * (timestamp 2019-01-20 20:00:00 - timestamp 2021-01-10 10:00:00) from generate_series(1,10000000);

INSERT 0 10000000

--兩張表都創(chuàng)建BRIN索引和BTREE索引

otter_pg=# create index idx1_tab_brin1 on tab_brin1 using brin(c_time);

CREATE INDEX

otter_pg=# create index idx1_tab_brin2 on tab_brin2 using brin(c_time);

CREATE INDEX

otter_pg=# create index idx2_tab_brin1 on tab_brin1 using btree(c_time);

CREATE INDEX

otter_pg=# create index idx2_tab_brin2 on tab_brin1 using btree(c_time);

CREATE INDEX

--我們看看索引大小

可以看到表為700M，BTREE索引需要214M，而BRIN索引只有40K。

otter_pg=# select pg_size_pretty(pg_relation_size(tab_brin1));

pg_size_pretty

----------------

730 MB

otter_pg=# select pg_size_pretty(pg_relation_size(idx1_tab_brin1));

pg_size_pretty

----------------

40 kB

otter_pg=# select pg_size_pretty(pg_relation_size(idx2_tab_brin1));

pg_size_pretty

----------------

214 MB

--我們?cè)趤砜纯碆RIN索引的使用。

首先看看兩表的離散度,如下可以看出tab_brin1表的邏輯順序和物理順序一致性更好些。

otter_pg=# select correlation from pg_stats where tablename=tab_brin1;

correlation

---------------------

0.0048282277

otter_pg=# select correlation from pg_stats where tablename=tab_brin2;

correlation

---------------------

0.0010042704

-0.002086642

0.006167772

對(duì)比下使用兩表BRIN索引時(shí)的效率，這里我們需要?jiǎng)h除前面創(chuàng)建的BTREE索引。

--tab_brin1的執(zhí)行計(jì)劃如下：可以看到耗時(shí)0.6ms。

--tab_brin2的執(zhí)行計(jì)劃如下：可以看到耗時(shí)21ms。

經(jīng)過分析，物理順序和邏輯順序越一致，該列更適合建立BRIN索引。

Pages_per_range參數(shù)調(diào)優(yōu)

BRIN索引有一個(gè)參數(shù)pages_per_range可以用來近一步提升Brin索引的性能。

pages_per_range是粒度，默認(rèn)為128(表示每128個(gè)數(shù)據(jù)塊統(tǒng)計(jì)一次邊界)，它影響B(tài)RIN索引的精確度和 BRIN索引的大小。

--精度為1時(shí)，耗時(shí)46.6ms

otter_pg=# create index idx1_tab_brin1 on tab_brin1 using brin(c_time) with (pages_per_range=1);

CREATE INDEX

otter_pg=#

otter_pg=# explain (analyze ,verbose,timing,costs,buffers)

select * from tab_brin1 where c_time between 2019-01-10 20:00:00 and 2020-01-10 20:00:00;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-----

Bitmap Heap Scan on public.tab_brin1 (cost=424.40..543.34 rows=1 width=45) (actual time=46.544..46.544 rows=0 loops=1)

Output: id, name, c_time

Recheck Cond: ((tab_brin1.c_time >= 2019-01-10 20:00:00::timestamp without time zone) AND (tab_brin1.c_time <= 2020-01-10 20:00:00::timestamp without time zone))

Buffers: shared hit=527

-> Bitmap Index Scan on idx1_tab_brin1 (cost=0.00..424.40 rows=107 width=0) (actual time=46.536..46.536 rows=0 loops=1)

Index Cond: ((tab_brin1.c_time >= 2019-01-10 20:00:00::timestamp without time zone) AND (tab_brin1.c_time <= 2020-01-10 20:00:00::timestamp without time zo

ne))

Buffers: shared hit=527

Planning Time: 0.632 ms

Execution Time: 46.639 ms

(9 rows)

--精度為50時(shí)，耗時(shí)1.18ms

otter_pg=# create index idx1_tab_brin1 on tab_brin1 using brin(c_time) with (pages_per_range=50);

CREATE INDEX

otter_pg=# explain (analyze ,verbose,timing,costs,buffers)

select * from tab_brin1 where c_time between 2019-01-10 20:00:00 and 2020-01-10 20:00:00;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-----

Bitmap Heap Scan on public.tab_brin1 (cost=10.91..5688.47 rows=1 width=45) (actual time=1.115..1.115 rows=0 loops=1)

Output: id, name, c_time

Recheck Cond: ((tab_brin1.c_time >= 2019-01-10 20:00:00::timestamp without time zone) AND (tab_brin1.c_time <= 2020-01-10 20:00:00::timestamp without time zone))

Buffers: shared hit=11

-> Bitmap Index Scan on idx1_tab_brin1 (cost=0.00..10.91 rows=5348 width=0) (actual time=1.105..1.105 rows=0 loops=1)

Index Cond: ((tab_brin1.c_time >= 2019-01-10 20:00:00::timestamp without time zone) AND (tab_brin1.c_time <= 2020-01-10 20:00:00::timestamp without time zo

ne))

Buffers: shared hit=11

Planning Time: 0.566 ms

Execution Time: 1.186 ms

(9 rows)

pages_per_range定義數(shù)據(jù)塊的數(shù)量，為BRIN索引的每條記錄統(tǒng)計(jì)的數(shù)據(jù)塊范圍。默認(rèn)值為128。

如果這個(gè)值很大，則索引就會(huì)很小，索引掃描就會(huì)很迅速，但是后續(xù)內(nèi)存中的Recheck就會(huì)很多，因?yàn)榘汛罅康牟幌嚓P(guān)數(shù)據(jù)拉到內(nèi)存中了。

如果這個(gè)值很小，索引的過濾性越好，但索引也會(huì)越大。由于每篩選一次字段PostgreSQL 都要掃描全部的BRIN索引，所花費(fèi)的時(shí)間也會(huì)變長(zhǎng)，因此需要根據(jù)表的大小與應(yīng)用場(chǎng)景去調(diào)整其值的大小。

BRIN主要適用于類似時(shí)序數(shù)據(jù)之類的，有著天然的順序，而且都是添加寫的場(chǎng)景。相比于BTREE索引，它的體積小得多，非常適用于大數(shù)據(jù)量的場(chǎng)景。

文章版權(quán)歸作者所有，未經(jīng)允許請(qǐng)勿轉(zhuǎn)載,若此文章存在違規(guī)行為，您可以聯(lián)系管理員刪除。

轉(zhuǎn)載請(qǐng)注明本文地址：http://specialneedsforspecialkids.com/yun/130048.html

發(fā)表評(píng)論

登陸后可評(píng)論

0條評(píng)論

IT那活兒

男|高級(jí)講師

我要關(guān)注我要私信

TA的文章

消息中間件故障分析一例

閱讀 1346·2023-01-11 13:20
RAC雙節(jié)點(diǎn)crash回復(fù)一例

閱讀 1684·2023-01-11 13:20
ORA-600處理一例

閱讀 1132·2023-01-11 13:20
雙節(jié)點(diǎn)RAC實(shí)例2 HANG 故障分析一例

閱讀 1858·2023-01-11 13:20
RAC集群節(jié)點(diǎn)1重啟分析一例

閱讀 4100·2023-01-11 13:20
CRS啟動(dòng)報(bào)錯(cuò)CRS-1656處理分享

閱讀 2704·2023-01-11 13:20
oracle 12CR2打補(bǔ)丁報(bào)錯(cuò)處理一例

閱讀 1385·2023-01-11 13:20
分布式緩存組件故障分析及監(jiān)控優(yōu)化

閱讀 3597·2023-01-11 13:20

国产xxxx99真实实拍_久久不雅视频_高清韩国a级特黄毛片_嗯老师别我我受不了了小说

資訊專欄INFORMATION COLUMN

上云采購(gòu)季！| 2核2G4M爆款云服務(wù)器低至59元/年，更有多臺(tái)、長(zhǎng)期優(yōu)惠，快來選購(gòu)！

PostgreSQL黑科技BRIN索引

索引測(cè)試

Pages_per_range參數(shù)調(diào)優(yōu)

相關(guān)文章

PostgreSQL JSONB 使用入門

PostgreSQL JSONB 使用入門

**新書推薦 |《PostgreSQL實(shí)戰(zhàn)》出版(提供樣章下載)**

發(fā)表評(píng)論

0條評(píng)論

IT那活兒

男|高級(jí)講師

TA的文章

消息中間件故障分析一例

RAC雙節(jié)點(diǎn)crash回復(fù)一例

ORA-600處理一例

雙節(jié)點(diǎn)RAC實(shí)例2 HANG 故障分析一例

RAC集群節(jié)點(diǎn)1重啟分析一例

CRS啟動(dòng)報(bào)錯(cuò)CRS-1656處理分享

oracle 12CR2打補(bǔ)丁報(bào)錯(cuò)處理一例

分布式緩存組件故障分析及監(jiān)控優(yōu)化

最新活動(dòng)