Getting Started with MongoDB: Indexes

jsbintask, published 2019-06-26 16:57 / 2145 views

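Summary: For a point query the sort direction does not matter, because an index can be traversed in either direction. hint() specifies which index to use. The _id index is a unique index and cannot be dropped. Once a document covered by a TTL index expires, it is deleted.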

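An index is like the table of contents of a book. Without one, finding a passage means paging through the whole book, which is very slow. With one, you can jump straight to the right section, and lookups become dramatically faster.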

Introduction to indexes

First open a terminal and run mongo. By default the shell connects to the database named test.

$ mongo
MongoDB shell version: 2.4.9
connecting to: test
> show collections
>

Running show collections (or show tables) confirms that the database is empty.

Then run the following code in the mongo shell:

> for(var i=0;i<100000;i++) {
... db.users.insert({username:"user"+i})
... }
> show collections
system.indexes
users
>

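Listing the collections again shows two new ones, system.indexes and users. The former holds index metadata; the latter is the newly created collection.
The users collection now contains 100,000 documents.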

> db.users.find()
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e4"), "username" : "user0" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e5"), "username" : "user1" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e6"), "username" : "user2" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e7"), "username" : "user3" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e8"), "username" : "user4" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e9"), "username" : "user5" }

Now look up any one of these documents, for example:

> db.users.find({username: "user1234"})
{ "_id" : ObjectId("5694d5db8fad9e319c5b48b6"), "username" : "user1234" }

The document is found, but to see how the query was executed, append the explain method:

> db.users.find({username: "user1234"}).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 100000,
    "nscanned" : 100000,
    "nscannedObjectsAllPlans" : 100000,
    "nscannedAllPlans" : 100000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 30,
    "indexBounds" : {
        
    },
    "server" : "root:27017"
}

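There are many fields; for now we only care about two of them: "nscanned" : 100000 and "millis" : 30.
nscanned is the total number of documents MongoDB scanned to complete the query. As you can see, every document in the collection was scanned, and the query took 30 milliseconds in total.
With 10 million documents, scanning all of them on every query would take a correspondingly long time.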

For queries like this, an index is a very good solution.

> db.users.ensureIndex({"username": 1})

The number 1 or -1 is the sort direction of the index. For a single-field index either one works.
Now look up user1234 again:

> db.users.ensureIndex({"username": 1})
> db.users.find({username: "user1234"}).explain()
{
    "cursor" : "BtreeCursor username_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 1,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "username" : [
            [
                "user1234",
                "user1234"
            ]
        ]
    },
    "server" : "root:27017"
}

The result is striking: the query completes almost instantly, because with the index MongoDB examined only one document instead of 100,000.
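To make the difference concrete, here is a minimal sketch in plain JavaScript (no database needed). It is not how MongoDB is implemented: the sorted array and binary search are a hypothetical stand-in for the B-tree, and the function names collectionScan, buildIndex and indexLookup are made up for illustration.

```javascript
// Build 100000 documents shaped like the ones inserted above.
const docs = [];
for (let i = 0; i < 100000; i++) {
  docs.push({ _id: i, username: "user" + i });
}

// Collection scan: examine every document, as BasicCursor does.
function collectionScan(collection, username) {
  let scanned = 0;
  const matches = [];
  for (const doc of collection) {
    scanned++;
    if (doc.username === username) matches.push(doc);
  }
  return { matches, scanned };
}

// Hypothetical stand-in for the B-tree: [key, position] pairs sorted by key.
function buildIndex(collection, field) {
  return collection
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search over the sorted keys; counts key comparisons, not documents.
// Assumes keys are unique, as the usernames here are.
function indexLookup(collection, index, key) {
  let lo = 0;
  let hi = index.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons++;
    if (index[mid][0] === key) {
      return { matches: [collection[index[mid][1]]], comparisons };
    }
    if (index[mid][0] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return { matches: [], comparisons };
}

const scan = collectionScan(docs, "user1234");
const usernameIndex = buildIndex(docs, "username");
const viaIndex = indexLookup(docs, usernameIndex, "user1234");
console.log(scan.scanned, viaIndex.comparisons);
```

The scan touches all 100,000 documents, while the lookup needs at most 17 key comparisons, which is the same order-of-magnitude gap that the two explain outputs show.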

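Indexes do have a cost, of course: every index you add makes each write operation (insert, update, delete) take longer, because when data changes MongoDB must update not only the document but also every index on the collection. For this reason MongoDB limits each collection to at most 64 indexes. As a rule of thumb, a given collection should not have more than a couple of indexes.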

Tip

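If a query is very common, or is causing a performance bottleneck, building an index on the relevant field (for example username) is a good choice. If the query is only run by administrators, who do not care much how long it takes, the field should not be indexed.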

Compound indexes

Index values are stored in a defined order, so sorting documents by the index key is very fast.

db.users.find().sort({"age": 1, "username": 1})

This sorts by age first and then by username, so an index on username alone does not help much here. To optimize this sort, build an index on both age and username.

db.users.ensureIndex({"age":1, "username": 1})

This creates a compound index (an index built on more than one field). It is very useful when the query conditions involve several keys.

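With the compound index in place, every index entry contains an age value and a username value, and points to the location of the document on disk.
Entries are in strictly ascending order of age; entries with the same age are ordered by ascending username.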
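The ordering of the entries can be sketched in plain JavaScript. This is only an illustration under assumptions: the sample ages and usernames are made up, and an array sorted with a two-level comparator stands in for the real index structure.

```javascript
// Sample documents; the ages and names are made up for illustration.
const users = [
  { _id: 1, age: 22, username: "user3" },
  { _id: 2, age: 21, username: "user9" },
  { _id: 3, age: 21, username: "user1" },
  { _id: 4, age: 22, username: "user0" },
];

// Compare two index entries by age first, then by username.
function compareEntries(a, b) {
  if (a.age !== b.age) return a.age - b.age;
  if (a.username === b.username) return 0;
  return a.username < b.username ? -1 : 1;
}

// Each entry holds both key values plus a reference to the document.
const ageUsernameIndex = users
  .map((doc) => ({ age: doc.age, username: doc.username, ref: doc._id }))
  .sort(compareEntries);

console.log(ageUsernameIndex.map((e) => `${e.age}/${e.username}`).join(" "));
// prints "21/user1 21/user9 22/user0 22/user3"
```

Within each age the usernames are already ascending, which is what lets later queries read sorted results straight from the index.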

Query types

Point query

A point query looks for a single value (although several documents may contain that value).

db.users.find({"age": 21}).sort({"username": -1})

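We already have the compound index on age and username, built in ascending order (the number 1). Suppose the collection still holds 100,000 documents and we run the point query {age: 21}. Many people may be 21, so more than one document matches. sort({"username": -1}) then asks for those documents in descending order of username. Remember that the index was built with "username": 1, which is ascending (smallest first). To get descending order, MongoDB only has to start at the last index entry for age 21 and walk backwards.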

Here the sort direction does not matter: MongoDB can traverse an index in either direction.

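In short, a compound index is very efficient for point queries: MongoDB jumps straight to the requested age and returns the results without a separate sort step.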
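A small sketch of the backward walk, in plain JavaScript. The entries and the helper name pointQueryDescending are assumptions for illustration; a real index would seek to the end of the age range instead of stepping over later entries one by one.

```javascript
// Index entries for {"age": 1, "username": 1}, already in index order.
// The values are made up for illustration.
const index = [
  { age: 20, username: "user7" },
  { age: 21, username: "user1" },
  { age: 21, username: "user4" },
  { age: 21, username: "user9" },
  { age: 22, username: "user0" },
];

// Return the usernames for one age in descending order by walking the
// ascending index backwards. No sort call is needed.
function pointQueryDescending(entries, age) {
  const result = [];
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].age > age) continue; // still past the range
    if (entries[i].age < age) break; // left the range, stop walking
    result.push(entries[i].username);
  }
  return result;
}

console.log(pointQueryDescending(index, 21));
// prints [ 'user9', 'user4', 'user1' ]
```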

Multi-value query

db.users.find({"age": {"$gte": 21, "$lte": 30}})

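A multi-value query finds documents that match any of several values; it can be thought of as a series of point queries.
The query above looks for ages between 21 and 30. MongoDB uses the first key of the index, "age", to find the matching documents, and the results normally come back in index order.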

db.users.find({"age": {"$gte": 21, "$lte": 30}}).sort({"username": 1})

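This is the same query, but this time the results must be sorted.
Without sort, the results come back in index order: first age 21, then age 22, and so on, and documents that share an age are ordered by username (0-9, A-Z). So for sort({"username": 1}), which orders all results by username ascending, MongoDB has to sort the results in memory before returning them. This is less efficient than the previous query.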

With very few documents, of course, the sort costs little.
If the result set is large, for example more than 32 MB, MongoDB refuses to sort that much data.
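Why the extra sort is needed can be seen in a few lines of plain JavaScript. The entries below are made up, and rangeScan and isSorted are hypothetical helpers, not MongoDB functions.

```javascript
// Entries of the {"age": 1, "username": 1} index, in index order.
// The values are made up for illustration.
const index = [
  { age: 21, username: "user5" },
  { age: 21, username: "user8" },
  { age: 22, username: "user2" },
  { age: 23, username: "user6" },
  { age: 30, username: "user1" },
  { age: 31, username: "user0" },
];

// Read the usernames of all entries whose age lies in [min, max].
function rangeScan(entries, min, max) {
  return entries
    .filter((e) => e.age >= min && e.age <= max)
    .map((e) => e.username);
}

// True when the values are in ascending order.
function isSorted(values) {
  for (let i = 1; i < values.length; i++) {
    if (values[i - 1] > values[i]) return false;
  }
  return true;
}

const inIndexOrder = rangeScan(index, 21, 30);
console.log(inIndexOrder.join(" "), isSorted(inIndexOrder));
// prints "user5 user8 user2 user6 user1 false"

// The extra step: an in-memory sort of the whole matching set.
const sortedByUsername = [...inIndexOrder].sort();
console.log(sortedByUsername.join(" "));
// prints "user1 user2 user5 user6 user8"
```

Usernames are ordered only inside each age, so across a range of ages the stream is not sorted by username and the whole result has to be collected before it can be ordered.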

There is another solution

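You can also create a second index, {"username": 1, "age": 1}. Because username comes first, the entries are already in username order, so sorting by username requires no extra work. The price is that MongoDB must walk the whole index to find the entries whose age matches, so the search itself takes longer.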

But which one is more efficient?

With more than one index available, how do you choose which one to use?

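It depends. With no limit on the result, the second index avoids the sort but has to search the whole collection, which takes far longer than the first approach. When only part of the result is returned (for example with limit(1000)), a new winner emerges.

>db.users.find({"age": {"$gte": 21, "$lte": 30}}).
sort({"username": 1}).
limit(1000).
hint({"age": 1, "username": 1}).
explain()["millis"]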

2031ms

>db.users.find({"age": {"$gte": 21, "$lte": 30}}).
sort({"username": 1}).
limit(1000).
hint({"username": 1, "age": 1}).
explain()["millis"]

181ms

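hint lets you specify which index to use.
So this approach has a real advantage. In typical scenarios we do not fetch all of the data, only the most recent part, so the second index is usually the more efficient one.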
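The effect of the limit can be imitated in plain JavaScript by counting how many index entries each plan has to examine. Everything here is an assumption for illustration: the data is synthetic, and planAgeFirst and planUsernameFirst are simplified models of the two hinted queries, not MongoDB's planner.

```javascript
// Made-up data: 10000 users whose ages cycle through 18..47.
const users = [];
for (let i = 0; i < 10000; i++) {
  users.push({ username: "user" + String(i).padStart(5, "0"), age: 18 + (i % 30) });
}

const byName = (a, b) =>
  a.username < b.username ? -1 : a.username > b.username ? 1 : 0;

// Plan A, index {age, username}: read the whole age range, sort it in
// memory by username, then apply the limit.
function planAgeFirst(docs, min, max, limit) {
  const index = [...docs].sort((a, b) => a.age - b.age || byName(a, b));
  const matched = index.filter((e) => e.age >= min && e.age <= max);
  return { examined: matched.length, result: matched.sort(byName).slice(0, limit) };
}

// Plan B, index {username, age}: walk in username order, keep entries
// whose age matches, and stop as soon as the limit is reached.
function planUsernameFirst(docs, min, max, limit) {
  const index = [...docs].sort((a, b) => byName(a, b) || a.age - b.age);
  const result = [];
  let examined = 0;
  for (const e of index) {
    if (result.length === limit) break;
    examined++;
    if (e.age >= min && e.age <= max) result.push(e);
  }
  return { examined, result };
}

const planA = planAgeFirst(users, 21, 30, 100);
const planB = planUsernameFirst(users, 21, 30, 100);
console.log(planA.examined, planB.examined);
```

Both plans return the same 100 documents, but on this synthetic data plan B examines far fewer entries because it can stop early and never sorts.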

Index types

Single-key index

The most ordinary kind of index, for example:

db.users.ensureIndex({"username": 1})

Unique index

A unique index guarantees that the indexed key has a distinct value in every document of the collection.

db.users.ensureIndex({"username": 1}, {"unique": true})

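If you insert two documents that are both named Zhang San, the second insert fails. The index on _id is itself a unique index, and it cannot be dropped.
The Mongoose framework offers something very similar: when defining a schema you can specify unique: true.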

company: { // company name
    type: String,
    required: true,
    unique: true
}
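The behaviour of a unique index can be imitated in a few lines of plain JavaScript. UniqueCollection is a hypothetical class written for this sketch; a Set plays the role of the index, and the error message is not MongoDB's.

```javascript
// Hypothetical in-memory collection that enforces a unique key on one field.
class UniqueCollection {
  constructor(field) {
    this.field = field;
    this.docs = [];
    this.seen = new Set(); // plays the role of the unique index
  }

  // Store the document, or throw if its key value is already present.
  insert(doc) {
    const key = doc[this.field];
    if (this.seen.has(key)) {
      throw new Error(`duplicate key: ${this.field} = ${JSON.stringify(key)}`);
    }
    this.seen.add(key);
    this.docs.push(doc);
  }
}

const users = new UniqueCollection("username");
users.insert({ username: "Zhang San" });

let rejected = false;
try {
  users.insert({ username: "Zhang San" }); // same name again
} catch (err) {
  rejected = true;
  console.log(err.message);
}
console.log(users.docs.length, rejected);
// prints "1 true"
```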

Multikey index

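If the value of an indexed key is an array in some document, the index is marked as a multikey index.
For example, suppose the members collection contains these three documents: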

> db.members.find()
{ "_id" : ObjectId("1"), "tags" : [  "ame",  "fear",  "big" ] }
{ "_id" : ObjectId("2"), "tags" : [  "ame",  "fear",  "big",  "chi" ] }
{ "_id" : ObjectId("3"), "tags" : [  "ame",  "jr",  "big",  "chi" ] }

When I search for tags = "jr", the database examines every document, so nscanned is 3, and one document is returned, so n is 1.

>db.members.find({tags: "jr"}).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 1,
    "nscanned" : 3,
}

Then create the index:

> db.members.ensureIndex({tags:1})

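Searching for tags = "jr" again now gives nscanned = 1, and isMultiKey has changed from false to true. This shows that MongoDB builds index keys for the array, that is, it indexes every element of the array.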

> db.members.find({tags: "jr"}).explain()
{
    "cursor" : "BtreeCursor tags_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
}
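The idea of one index entry per array element can be sketched in plain JavaScript. A Map from tag to document ids stands in for the index here; buildMultikeyIndex is a made-up helper, and the numeric _id values replace the ObjectIds above for brevity.

```javascript
// The three documents from the members collection, with simplified ids.
const members = [
  { _id: 1, tags: ["ame", "fear", "big"] },
  { _id: 2, tags: ["ame", "fear", "big", "chi"] },
  { _id: 3, tags: ["ame", "jr", "big", "chi"] },
];

// One index entry per distinct array element: tag -> list of document ids.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of new Set(doc[field])) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagsIndex = buildMultikeyIndex(members, "tags");
console.log(tagsIndex.get("jr")); // prints [ 3 ]
console.log(tagsIndex.get("chi")); // prints [ 2, 3 ]
```

Looking up "jr" touches a single entry and leads to a single document, which matches nscanned = 1 in the explain output.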

TTL (expiring) index

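A TTL index is an index whose entries expire after a set period. When an entry expires, the corresponding document is deleted. It suits data that becomes invalid after a while, such as user login sessions or stored logs.
It is created much like a single-key index, with one extra parameter, expireAfterSeconds, given in seconds.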

db.collectionName.ensureIndex({key: 1}, {expireAfterSeconds: 10})

First create the index; documents will be deleted after 30 seconds:

> db.members.ensureIndex({time:1}, {expireAfterSeconds: 30})

Insert a document:

> db.members.insert({time: new Date()})

Query:

> db.members.find()

{ "_id" : ObjectId("4"), "time" : ISODate("2016-01-16T12:27:20.171Z") }

Query again 30 seconds later and the document is gone.

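The indexed value must be a date (ISODate, for example new Date()); values of any other type are never deleted automatically.
A TTL index cannot be a compound index.
The deletion time is not exact: the background task that removes expired documents runs once every 60 seconds, and the deletion itself takes some time, so expect some delay.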
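The expiry rule can be sketched as a single sweep function in plain JavaScript. ttlSweep is a hypothetical helper that models one pass of the background task; the second document, with a string instead of a Date, is made up to show that non-date values are left alone.

```javascript
// One pass of a simulated TTL task: keep a document unless its date field
// is at least expireAfterSeconds older than `now`.
function ttlSweep(docs, field, expireAfterSeconds, now) {
  return docs.filter((doc) => {
    const value = doc[field];
    if (!(value instanceof Date)) return true; // not a date: never expires
    return value.getTime() + expireAfterSeconds * 1000 > now.getTime();
  });
}

const insertedAt = new Date("2016-01-16T12:27:20.171Z");
const members = [
  { _id: 4, time: insertedAt },
  { _id: 5, time: "2016-01-16T12:27:20.171Z" }, // a string, not a Date
];

// Sweep 10 seconds and 40 seconds after the insert, with a 30 second TTL.
const after10s = ttlSweep(members, "time", 30, new Date(insertedAt.getTime() + 10000));
const after40s = ttlSweep(members, "time", 30, new Date(insertedAt.getTime() + 40000));
console.log(after10s.length, after40s.length);
// prints "2 1"
```

Because the real task only runs periodically, a document can outlive its TTL until the next pass, which is the inexactness described above.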

Sparse index

The sparse option creates a sparse index, and it can be combined with unique:

>db.users.ensureIndex({"email": 1}, {"unique": true, "sparse": true})

The following comes from the official documentation:

Sparse Index with Unique Constraint

Consider a collection scores that contains the following documents:

{ "_id" : ObjectId("523b6e32fb408eea0eec2647"), "userid" : "newbie" }
{ "_id" : ObjectId("523b6e61fb408eea0eec2648"), "userid" : "abby", "score" : 82 }
{ "_id" : ObjectId("523b6e6ffb408eea0eec2649"), "userid" : "nina", "score" : 90 }

You could create an index with a unique constraint and sparse filter on the score field using the following operation:

db.scores.createIndex( { score: 1 } , { sparse: true, unique: true } )

This index would permit the insertion of documents that had unique values for the score field or did not include a score field.
In other words, documents with a distinct score, and documents with no score field at all, can be inserted.

As such, given the existing documents in the scores collection, the index permits the following insert operations:
The following inserts succeed:

db.scores.insert( { "userid": "AAAAAAA", "score": 43 } )
db.scores.insert( { "userid": "BBBBBBB", "score": 34 } )
db.scores.insert( { "userid": "CCCCCCC" } )
db.scores.insert( { "userid": "DDDDDDD" } )

However, the index would not permit the addition of the following documents since documents already exists with score value of 82 and 90:

db.scores.insert( { "userid": "AAAAAAA", "score": 82 } )
db.scores.insert( { "userid": "BBBBBBB", "score": 90 } )
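The combined rule can be imitated in plain JavaScript. SparseUniqueIndex is a hypothetical class for this sketch: documents without the field get no index entry and are never checked, while documents with the field must carry a value not seen before.

```javascript
// Hypothetical model of an index created with { sparse: true, unique: true }.
class SparseUniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
  }

  // Returns true if the document may be inserted, recording its key if any.
  accept(doc) {
    if (!(this.field in doc)) return true; // sparse: no entry, no check
    if (this.keys.has(doc[this.field])) return false; // unique: reject duplicate
    this.keys.add(doc[this.field]);
    return true;
  }
}

const scoreIndex = new SparseUniqueIndex("score");
const outcomes = [
  { userid: "newbie" },
  { userid: "abby", score: 82 },
  { userid: "nina", score: 90 },
  { userid: "AAAAAAA", score: 43 },
  { userid: "CCCCCCC" },
  { userid: "DDDDDDD" },
  { userid: "AAAAAAA", score: 82 }, // score 82 already exists
].map((doc) => scoreIndex.accept(doc));

console.log(outcomes.join(" "));
// prints "true true true true true true false"
```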

Index management

In the 2.4 shell used here, the system.indexes collection contains the details of every index:

db.system.indexes.find()

1. Creating indexes

Mongo shell

ensureIndex()

createIndex()

example

db.users.ensureIndex({"username": 1})

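To build an index in the background, so that the database can still handle reads and writes while the index is being built, specify the background option.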

db.test.ensureIndex({"username":1},{"background":true})

Schema

var animalSchema = new Schema({
  name: String,
  type: String,
  tags: { type: [String], index: true } // field level
});

animalSchema.index({ name: 1, type: -1 }); // schema level

For a Mongoose Schema, the official documentation advises against creating indexes automatically in production:

When your application starts up, Mongoose automatically calls ensureIndex for each defined index in your schema. Mongoose will call ensureIndex for each index sequentially, and emit an "index" event on the model when all the ensureIndex calls succeeded or when there was an error. While nice for development, it is recommended this behavior be disabled in production since index creation can cause a significant performance impact. Disable the behavior by setting the autoIndex option of your schema to false, or globally on the connection by setting the option config.autoIndex to false.

2. Viewing indexes with getIndexes()

db.collectionName.getIndexes()

db.users.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.users",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "username" : 1
        },
        "ns" : "test.users",
        "name" : "username_1"
    }
]

The v field is used only internally; it identifies the index version.
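The name field in this output follows a simple pattern: each key and its direction joined with underscores. A minimal sketch in plain JavaScript, where defaultIndexName is a made-up helper that reproduces the default naming for simple ascending and descending keys:

```javascript
// Derive the default index name from a key specification,
// for example { username: 1 } -> "username_1".
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

console.log(defaultIndexName({ _id: 1 })); // prints "_id_1"
console.log(defaultIndexName({ username: 1 })); // prints "username_1"
console.log(defaultIndexName({ age: 1, username: -1 })); // prints "age_1_username_-1"
```

The built-in index on _id is the exception: as the output shows, it is named "_id_" rather than "_id_1".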

3. Dropping indexes with dropIndex()

> db.users.dropIndex("username_1")
{ "nIndexesWas" : 2, "ok" : 1 }

or

> db.users.dropIndex({"username":1})

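Copyright belongs to the author; do not reproduce without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reproducing, please cite the address of this article: http://specialneedsforspecialkids.com/yun/18804.html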
