Using the collections module in Python

Posted by xorpay on 2019-06-26 18:39 / 1,918 views

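This article walks through every class in the collections module and the methods each one provides, looking at both the source code and performance.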

To find out what a module is for and which classes it offers, read its __init__.py:

"""This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.

* namedtuple   factory function for creating tuple subclasses with named fields
* deque        list-like container with fast appends and pops on either end
* ChainMap     dict-like class for creating a single view of multiple mappings
* Counter      dict subclass for counting hashable objects
* OrderedDict  dict subclass that remembers the order entries were added
* defaultdict  dict subclass that calls a factory function to supply missing values
* UserDict     wrapper around dictionary objects for easier dict subclassing
* UserList     wrapper around list objects for easier list subclassing
* UserString   wrapper around string objects for easier string subclassing

"""

__all__ = ["deque", "defaultdict", "namedtuple", "UserDict", "UserList",
            "UserString", "Counter", "OrderedDict", "ChainMap"]

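The collections module implements specialized container datatypes that can stand in for Python's general-purpose built-ins dict, list, set, and tuple. Put simply, it layers extra functionality on top of the basic data types.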

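1. deque

Purpose: a double-ended queue, a list-like container that supports O(1) insertion and removal at both the head and the tail.

A double-ended queue can be operated on from either end. The difference from Python's built-in list is that inserting or removing at the head takes O(1) time rather than O(N). Here is an example: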

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "liao gao xiang"

"""
Keep the last n elements
"""
from collections import deque


def search(file, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for l in file:
        if pattern in l:
            yield l, previous_lines  # a generator decouples the search logic from the code that consumes the results
        previous_lines.append(l)


with open("file.txt", mode="r", encoding="utf-8") as f:
    for line, prevlines in search(f, "Python", 5):
        for pline in prevlines:
            print(pline, end="")
        print(line, end="")

d = deque()
d.append(1)
d.append("2")
print(len(d))
print(d[0], d[1])
d.extendleft([0])
print(d)
d.extend([6, 7, 8])
print(d)

d2 = deque("12345")
print(len(d2))
d2.popleft()
print(d2)
d2.pop()
print(d2)

# Inserting or removing at either end of a deque is O(1), whereas inserting or removing at the front of a list is O(N)
d3 = deque(maxlen=2)
d3.append(1)
d3.append(2)
print(d3)
d3.append(3)
print(d3)

The output is as follows:

人生苦短
我用Python
2
1 2
deque([0, 1, '2'])
deque([0, 1, '2', 6, 7, 8])
5
deque(['2', '3', '4', '5'])
deque(['2', '3', '4'])
deque([1, 2], maxlen=2)
deque([2, 3], maxlen=2)

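So if you often operate on the head of a list, deque is the better choice. The full set of deque methods is listed below; trying each one out is the quickest way to learn them.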
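To make the head-of-list cost concrete, here is a minimal timing sketch. The helper names drain_list and drain_deque are hypothetical, and the absolute timings depend on the machine, so no expected numbers are shown; the point is only that both helpers produce the same result while the list version slows down much faster as n grows.

```python
from collections import deque
from timeit import timeit


def drain_list(n):
    """Pop n items from the front of a list; each pop(0) shifts the remaining items."""
    items = list(range(n))
    out = []
    while items:
        out.append(items.pop(0))
    return out


def drain_deque(n):
    """Pop n items from the front of a deque; each popleft() is O(1)."""
    items = deque(range(n))
    out = []
    while items:
        out.append(items.popleft())
    return out


n = 10_000
print("same result:", drain_list(n) == drain_deque(n))
print("list :", timeit(lambda: drain_list(n), number=3))
print("deque:", timeit(lambda: drain_deque(n), number=3))
```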

class deque(object):
    """
    deque([iterable[, maxlen]]) --> deque object
    
    A list-like sequence optimized for data accesses near its endpoints.
    """
    def append(self, *args, **kwargs): # real signature unknown
        """ Add an element to the right side of the deque. """
        pass

    def appendleft(self, *args, **kwargs): # real signature unknown
        """ Add an element to the left side of the deque. """
        pass

    def clear(self, *args, **kwargs): # real signature unknown
        """ Remove all elements from the deque. """
        pass

    def copy(self, *args, **kwargs): # real signature unknown
        """ Return a shallow copy of a deque. """
        pass

    def count(self, value): # real signature unknown; restored from __doc__
        """ D.count(value) -> integer -- return number of occurrences of value """
        return 0

    def extend(self, *args, **kwargs): # real signature unknown
        """ Extend the right side of the deque with elements from the iterable """
        pass

    def extendleft(self, *args, **kwargs): # real signature unknown
        """ Extend the left side of the deque with elements from the iterable """
        pass

    def index(self, value, start=None, stop=None): # real signature unknown; restored from __doc__
        """
        D.index(value, [start, [stop]]) -> integer -- return first index of value.
        Raises ValueError if the value is not present.
        """
        return 0

    def insert(self, index, p_object): # real signature unknown; restored from __doc__
        """ D.insert(index, object) -- insert object before index """
        pass

    def pop(self, *args, **kwargs): # real signature unknown
        """ Remove and return the rightmost element. """
        pass

    def popleft(self, *args, **kwargs): # real signature unknown
        """ Remove and return the leftmost element. """
        pass

    def remove(self, value): # real signature unknown; restored from __doc__
        """ D.remove(value) -- remove first occurrence of value. """
        pass

    def reverse(self): # real signature unknown; restored from __doc__
        """ D.reverse() -- reverse *IN PLACE* """
        pass

    def rotate(self, *args, **kwargs): # real signature unknown
        """ Rotate the deque n steps to the right (default n=1).  If n is negative, rotates left. """
        pass

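A reminder: some methods modify the deque in place and return None. For example, reverse() reverses the deque, and rotate(1) shifts every element one position to the right, moving the tail element to the head.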
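A short sketch of the in-place behavior described above. A common slip is writing d = d.reverse(), which rebinds d to None; the example shows the return value explicitly.

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

result = d.rotate(1)   # shifts right by one, in place
print(result)          # None
print(d)               # deque([5, 1, 2, 3, 4])

d.rotate(-2)           # a negative n rotates to the left
print(d)               # deque([2, 3, 4, 5, 1])

d.reverse()            # also in place, also returns None
print(d)               # deque([1, 5, 4, 3, 2])

# reversed() leaves the deque untouched and returns an iterator instead
print(list(reversed(d)))  # [2, 3, 4, 5, 1]
```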

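2. defaultdict

Purpose: a dictionary with default values. Its parent class is the built-in dict.

What is the benefit of a dictionary with default values? Building a multi-valued mapping (one key mapped to several values) is simple in principle, but if you implement it yourself, initializing the value for each new key gets tedious. You might write something like this: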

d = {}
for key, value in pairs:
    if key not in d:
        d[key] = []
    d[key].append(value)

With defaultdict the code is more concise:

d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)

A defaultdict automatically initializes the value for each key the first time the key is accessed, so you only need to focus on adding elements. For example:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "liao gao xiang"

# Map each key to multiple values
from collections import defaultdict

d = defaultdict(list)
print(d)
d["a"].append([1, 2, 3])
d["b"].append(2)
d["c"].append(3)

print(d)

d = defaultdict(set)
print(d)
d["a"].add(1)
d["a"].add(2)
d["b"].add(4)

print(d)

The output is as follows:

defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {'a': [[1, 2, 3]], 'b': [2], 'c': [3]})
defaultdict(<class 'set'>, {})
defaultdict(<class 'set'>, {'a': {1, 2}, 'b': {4}})
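One detail worth seeing in isolation: the default factory runs only on d[key] lookups, and that lookup inserts the key. Membership tests and get() do not trigger it. A minimal sketch using defaultdict(int) as a counter:

```python
from collections import defaultdict

counts = defaultdict(int)      # int() returns 0, so missing keys start at 0
for ch in "banana":
    counts[ch] += 1
print(dict(counts))            # {'b': 1, 'a': 3, 'n': 2}

print(counts.get("z"))         # None: get() does not call the factory
print("z" in counts)           # False
print(counts["z"])             # 0: indexing calls the factory and stores the result
print("z" in counts)           # True
```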

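3. namedtuple()

Purpose: create tuples with named fields. It is a factory function.

namedtuple produces data objects whose elements can be accessed by name. It is typically used to make code more readable, and is especially handy when working with tuple-shaped data.

Suppose we have a data structure in which each record is a three-element tuple. With namedtuple we can easily turn those tuples into a more readable, easier-to-use structure.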

from collections import namedtuple

websites = [
    ("Sohu", "http://www.sohu.com/", u"張朝陽"),
    ("Sina", "http://www.sina.com.cn/", u"王志東"),
    ("163", "http://www.163.com/", u"丁磊")
]

Website = namedtuple("Website", ["name", "url", "founder"])

for website in websites:
    website = Website._make(website)
    print(website)


# Output (Python 3):
Website(name='Sohu', url='http://www.sohu.com/', founder='張朝陽')
Website(name='Sina', url='http://www.sina.com.cn/', founder='王志東')
Website(name='163', url='http://www.163.com/', founder='丁磊')

Note that namedtuple is a factory function, not a class: each call returns a new tuple subclass.
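A brief sketch of what the generated class offers beyond a plain tuple. The Point type here is a hypothetical example, not part of the original article.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])   # returns a new tuple subclass
p = Point(3, y=4)

print(issubclass(Point, tuple))   # True
print(p.x, p[1])                  # 3 4: access by name or by index
x, y = p                          # unpacks like any tuple

moved = p._replace(x=10)          # tuples are immutable, so this returns a new Point
print(moved)                      # Point(x=10, y=4)
print(p)                          # Point(x=3, y=4)

print(Point._fields)              # ('x', 'y')
print(dict(p._asdict()))          # {'x': 3, 'y': 4}
```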

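4. Counter

Purpose: count hashable objects. Its parent class is the built-in dict.

A typical use is finding the most frequent elements in a sequence. Suppose you have a list of words and want to know which word occurs most often: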

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "liao gao xiang"

from collections import Counter

words = [
    "look", "into", "my", "eyes", "look", "into", "my", "eyes",
    "the", "eyes", "the", "eyes", "the", "eyes", "not", "around", "the",
    "eyes", "don't", "look", "around", "the", "eyes", "look", "into",
    "my", "eyes", "you're", "under"
]

word_counts = Counter(words)

# The three most common words
top_three = word_counts.most_common(3)
print(top_three)
# Outputs [('eyes', 8), ('the', 5), ('look', 4)]
print(word_counts["eyes"])

morewords = ["why", "are", "you", "not", "looking", "in", "my", "eyes"]

# To increment the counts manually, simply use addition:
for word in morewords:
    print(word)
    word_counts[word] += 1
print(word_counts["eyes"])

The result is as follows:

[('eyes', 8), ('the', 5), ('look', 4)]
8
why
are
you
not
looking
in
my
eyes
9

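Because Counter inherits from dict, it has every method dict has (the same goes for defaultdict and OrderedDict). Counter itself implements or overrides these six methods (fromkeys is overridden only to raise NotImplementedError):

most_common(self, n=None)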

elements(self)

fromkeys(cls, iterable, v=None)

update(*args, **kwds)

subtract(*args, **kwds)

copy(self)
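The methods listed above can be exercised in a few lines. This is a small sketch on toy data rather than a full tour:

```python
from collections import Counter

c = Counter("abracadabra")
print(c.most_common(1))          # [('a', 5)]
print(c["z"])                    # 0: a missing key counts as zero, no KeyError

c = Counter(a=2, b=1)
c.update(["a", "c"])             # update() adds counts: a=3, b=1, c=1
c.subtract({"b": 3})             # subtract() may go negative: b=-2
print(dict(c))                   # {'a': 3, 'b': -2, 'c': 1}
print(sorted(c.elements()))      # ['a', 'a', 'a', 'c']: counts below 1 are skipped

try:
    Counter.fromkeys("ab", 1)    # deliberately unsupported on Counter
except NotImplementedError as exc:
    print("fromkeys is disabled:", exc)
```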

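5. OrderedDict

Purpose: a dictionary that remembers insertion order. Its parent class is the built-in dict.

An OrderedDict preserves the order in which items were inserted when you iterate over it. Internally it maintains a doubly linked list ordered by key insertion. Each newly inserted item is appended to the end of the list, and reassigning an existing key does not change its position. (Since Python 3.7 the built-in dict also preserves insertion order, but OrderedDict still adds move_to_end(), popitem(last=False), and order-sensitive equality.)

Be aware that an OrderedDict can be roughly twice the size of a plain dictionary because of the extra linked list it maintains. So if you are building a data structure that holds a large number of OrderedDict instances (for example, reading 100,000 lines of CSV data into a list of OrderedDict objects), weigh carefully whether the benefits of OrderedDict outweigh the extra memory cost.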
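The memory overhead can be checked roughly with sys.getsizeof. This is a sketch that assumes CPython, where getsizeof reports only the container itself (not the keys and values it references); the exact ratio varies by Python version, so no expected numbers are shown.

```python
import sys
from collections import OrderedDict

n = 100_000
plain = {i: None for i in range(n)}
ordered = OrderedDict((i, None) for i in range(n))

plain_size = sys.getsizeof(plain)
ordered_size = sys.getsizeof(ordered)

# Prints the two sizes in bytes and their ratio
print(plain_size, ordered_size, round(ordered_size / plain_size, 2))
```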

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "liao gao xiang"

from collections import OrderedDict

d = OrderedDict()
d["foo"] = 1
d["bar"] = 2
d["spam"] = 3
d["grok"] = 4
# d["bar"] = 22  # reassigning an existing key does not change its position
for key in d:
    print(key, d[key])

print(d)

import json

print(json.dumps(d))

The result is as follows:

foo 1
bar 2
spam 3
grok 4
OrderedDict([('foo', 1), ('bar', 2), ('spam', 3), ('grok', 4)])
{"foo": 1, "bar": 2, "spam": 3, "grok": 4}

OrderedDict implements or overrides the methods below. What does each one do? That is left as homework for the reader ^_^

clear(self)

popitem(self, last=True)

move_to_end(self, key, last=True)

keys(self)

items(self)

values(self)

pop(self, key, default=__marker)

setdefault(self, key, default=None)

copy(self)

fromkeys(cls, iterable, value=None)
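As a head start on that homework, here is a small sketch of the two methods that have no plain-dict equivalent, move_to_end() and popitem(last=...), plus the order-sensitive equality check:

```python
from collections import OrderedDict

d = OrderedDict([("foo", 1), ("bar", 2), ("spam", 3)])

d.move_to_end("foo")               # move an existing key to the end
print(list(d))                     # ['bar', 'spam', 'foo']

d.move_to_end("spam", last=False)  # or to the front
print(list(d))                     # ['spam', 'bar', 'foo']

print(d.popitem())                 # ('foo', 1): LIFO by default
print(d.popitem(last=False))       # ('spam', 3): FIFO when last=False
print(list(d))                     # ['bar']

# Two OrderedDicts compare equal only if their order matches too
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))   # False
print(dict(a=1, b=2) == dict(b=2, a=1))                 # True
```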

6. ChainMap

Purpose: group multiple mappings into a single view. A dict-like class.

A simple example:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "liao gao xiang"

from collections import ChainMap
from itertools import chain

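# Iterating over elements from several containers
a = [1, 2, 3, 4]
b = ("x", "y", "z")
c = {1, "a"}

# Option 1: itertools.chain walks every element of each iterable in turn
for i in chain(a, b, c):
    print(i)
print("--------------")
# Option 2: ChainMap expects mappings; iterating over it yields each distinct key once
for j in ChainMap({"x": 1, "y": 2}, {"y": 20, "z": 30}):
    print(j)

# Neither approach requires merging the containers into a new list or dict first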

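A ChainMap takes multiple dictionaries and makes them behave logically as one. The dictionaries are not actually merged, however: ChainMap simply keeps a list of the underlying dictionaries and redefines the common dictionary operations to scan that list. Most dictionary operations work as expected. For example: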

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# __author__ = "liao gao xiang"

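# Combine multiple dictionaries and mappings
a = {"x": 1, "z": 3}
b = {"y": 2, "z": 4}
# Suppose you have to perform lookups in both dictionaries
# (for example, look in a first and fall back to b if the key is missing).
# A simple solution is the ChainMap class from the collections module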
from collections import ChainMap

c = ChainMap(a, b)

print(c)
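a["x"] = 11  # ChainMap holds references, so an update to an original dict shows up in the combined view

print(c)  # lookups search the dicts in order
print(c["x"])
print(c["y"])
print(c["z"])

# Updates and deletions always affect the first dict in the list.
c["z"] = 10
c["w"] = 40
del c["x"]
print(a)
# del c["y"] would raise a KeyError, because "y" is not in the first dict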

# ChainMap is very useful for scoped variables in a programming language
# (such as globals, locals, and so on). A few methods make this easy:
values = ChainMap()  # starts with a single empty dict by default
print("\t", values)
values["x"] = 1
values = values.new_child()  # push a new empty dict
values["x"] = 2
values = values.new_child()
values["x"] = 30
# values = values.new_child()
print(values, values["x"])  # values["x"] is the most recently added value
values = values.parents  # drop the most recently added dict
print(values["x"])
values = values.parents
print(values)

a = {"x": 1, "y": 2}
b = {"y": 2, "z": 3}
merge = dict(b)
merge.update(a)
print(merge["x"], merge["y"], merge["z"])
a["x"] = 11
print(merge["x"])

The output is as follows:

ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})
ChainMap({'x': 11, 'z': 3}, {'y': 2, 'z': 4})
11
2
3
{'z': 10, 'w': 40}
     ChainMap({})
ChainMap({'x': 30}, {'x': 2}, {'x': 1}) 30
2
ChainMap({'x': 1})
1 2 3
1

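As an alternative to ChainMap, you might consider merging the two dictionaries with update(). That works, but it requires creating a completely separate dictionary object (or destructively modifying an existing one). In addition, later changes to the original dictionaries are not reflected in the merged dictionary.

ChainMap implements or overrides the following methods: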

get(self, key, default=None)

fromkeys(cls, iterable, *args)

copy(self)

new_child(self, m=None)

parents(self)

popitem(self)

pop(self, key, *args)

clear(self)
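The internal list of dictionaries mentioned earlier is exposed as the maps attribute. The sketch below uses it in a layered-settings lookup; the defaults and overrides dictionaries are hypothetical illustration data.

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}
settings = ChainMap(overrides, defaults)   # earlier mappings win on lookup

print(settings["user"], settings["color"])   # admin red
print(settings.maps[0] is overrides)         # True: references, not copies
print(sorted(settings))                      # ['color', 'user']: each key once

defaults["color"] = "blue"                   # a change to a source dict is visible at once
print(settings["color"])                     # blue

child = settings.new_child({"color": "green"})   # push a new first mapping
print(child["color"], child.parents["color"])    # green blue
print(len(child.maps))                           # 3
```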

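7. UserDict, UserList, UserString

These three classes wrap dict, list, and str respectively, and exist mainly to make it easier to implement your own data types. In Python 2 they lived in three separate modules, UserDict, UserList, and UserString, and had to be imported with something like from UserDict import UserDict. In Python 3 they were moved into the collections module. All three are base classes: to extend one of these types, just inherit from the corresponding class.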
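A minimal sketch of the inheritance approach, assuming we want a dictionary that treats string keys case-insensitively. LowerKeyDict is a hypothetical class written for this illustration. UserDict keeps the real storage in its data attribute, and its constructor and update() go through __setitem__, so overriding a handful of methods is enough.

```python
from collections import UserDict


class LowerKeyDict(UserDict):
    """Hypothetical example: a dict that lower-cases string keys."""

    @staticmethod
    def _norm(key):
        return key.lower() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        self.data[self._norm(key)] = value

    def __getitem__(self, key):
        return self.data[self._norm(key)]

    def __delitem__(self, key):
        del self.data[self._norm(key)]

    def __contains__(self, key):
        return self._norm(key) in self.data


headers = LowerKeyDict({"Content-Type": "text/html"})   # the constructor uses __setitem__
headers["ACCEPT"] = "*/*"

print(headers["content-type"])   # text/html
print("Accept" in headers)       # True
print(dict(headers))             # {'content-type': 'text/html', 'accept': '*/*'}
```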
