The Python re Module (Regular Expressions)

Cheriselalala, published 2019-07-31 10:15 / 1122 views


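Matching flags

re.ASCII
Same as re.A; the corresponding inline flag is (?a). Makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only. The flag is meaningful only for str patterns and is ignored for bytes patterns. For backward compatibility the re.UNICODE (re.U) flag and its inline form (?u) still exist, but they are redundant in Python 3 because str patterns match Unicode by default.

re.DEBUG
Displays debug information about the compiled expression. It has no inline flag.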

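re.IGNORECASE
Same as re.I; the corresponding inline flag is (?i). Performs case-insensitive matching, so an expression such as [A-Z] also matches the lowercase letters a-z. Full Unicode matching (for example "Ü" matching "ü") also works, unless re.ASCII is given to disable non-ASCII matches.

The current locale does not change the effect of this flag unless re.LOCALE is also set.

For str patterns, [a-z] or [A-Z] combined with IGNORECASE matches the 52 ASCII letters plus 4 non-ASCII letters: "İ" (U+0130), "ı" (U+0131), "ſ" (U+017F) and "K" (U+212A, Kelvin sign). With re.ASCII, only the letters a-z and A-Z are matched.

re.LOCALE
Same as re.L; the corresponding inline flag is (?L). Its use is discouraged.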

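re.MULTILINE
Same as re.M; the corresponding inline flag is (?m). Multi-line mode changes the behaviour of the metacharacters ^ and $.
By default ^ matches only at the start of the string; with this flag it also matches at the start of every line (right after each newline). By default $ matches only at the end of the string; with this flag it also matches at the end of every line (right before each newline).

re.DOTALL
Same as re.S; the corresponding inline flag is (?s). In this mode the metacharacter . matches any character at all, including a newline.

re.VERBOSE
Same as re.X; the corresponding inline flag is (?x). Verbose mode lets you lay a pattern out with whitespace and # comments to make it more readable; the extra whitespace and the comments are ignored when the pattern is compiled.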
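As a rough illustration of the flags described above, the following sketch applies a few of them to a made-up two-line string. The sample text and variable names are assumptions chosen for this example only.

```python
import re

text = "first line\nSecond line"  # hypothetical sample input

# MULTILINE: ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# DOTALL: "." also matches the newline, so one match can span both lines.
across_lines = re.search(r"first.*Second", text, flags=re.DOTALL)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
number = re.compile(r"""
    (\d+)      # integer part
    \.         # literal dot
    (\d+)      # fractional part
""", re.VERBOSE)

# Flags combine with |; the inline form (?im) has the same effect.
combined = re.findall(r"^second", text, flags=re.IGNORECASE | re.MULTILINE)
inline = re.findall(r"(?im)^second", text)

print(line_starts)
print(across_lines is not None)
print(number.match("24.1632").groups())
print(combined, inline)
```

Inline flags are handy when the pattern is stored as plain text (for example in a configuration file) and there is no place to pass a flags argument.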

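Module-level functions

re.compile(pattern, flags=0)

Compiles the regular expression pattern and returns a pattern object (re.Pattern in Python 3.7 and later, shown as SRE_Pattern in earlier versions). The flags argument selects the matching flags.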

re.search(pattern, string, flags=0)

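Scans string for the first location where pattern produces a match and returns a Match object (re.Match in Python 3.7 and later, shown as SRE_Match in earlier versions). Returns None if no position in the string matches.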

re.match(pattern, string, flags=0)

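If zero or more characters at the beginning of string match pattern, returns a Match object. Returns None if they do not.

Even in MULTILINE mode, match() only matches at the beginning of the string, not at the beginning of each line.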

re.fullmatch(pattern, string, flags=0)

If the whole string matches pattern, returns a Match object. Returns None if it does not.
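The difference between search(), match() and fullmatch() is easiest to see side by side. The sketch below uses an invented two-line string; the variable names are illustrative only.

```python
import re

text = "one\ntwo three"  # hypothetical sample input

found = re.search(r"two", text)                    # scans the whole string
at_start = re.match(r"two", text)                  # None: "two" is not at index 0
multiline_match = re.match(r"^two", text, re.M)    # still None, even with MULTILINE
multiline_search = re.search(r"^two", text, re.M)  # search() does find the line start
partial = re.fullmatch(r"one", text)               # None: the pattern must cover everything
whole = re.fullmatch(r"one\ntwo three", text)      # matches the entire string

print(found.span(), at_start, multiline_match)
print(multiline_search.span(), partial, whole.span())
```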

re.split(pattern, string, maxsplit=0, flags=0)

Splits string by the occurrences of pattern and returns the resulting list. If maxsplit is nonzero, at most maxsplit splits are made and the remainder of string is returned as the final list element. If pattern contains capturing groups (...), the text of every group is also included in the returned list.

>>> re.split(r"\W+", "Words, words, words.")
['Words', 'words', 'words', '']

>>> re.split(r"(\W+)", "Words, words, words.")
['Words', ', ', 'words', ', ', 'words', '.', '']

>>> re.split(r"\W+", "Words, words, words.", maxsplit=1)
['Words', 'words, words.']

>>> re.split("[a-f]+", "0a3B9", flags=re.IGNORECASE)
['0', '3', '9']

If pattern matches at the start of the string, the first list element is an empty string; likewise, if it matches at the end of the string, the last list element is an empty string:

>>> re.split(r"(\W+)", "...words, words...")
['', '...', 'words', ', ', 'words', '...', '']

re.findall(pattern, string, flags=0)

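Returns all non-overlapping matches of pattern in string as a list, in the order they are found. If pattern contains exactly one group, the list holds the text matched by that group; if it contains more than one group, each list element is a tuple of the groups. An empty list means nothing matched.

>>> content = "333STR1666STR299"

>>> regex = r"([A-Z]+(\d))"
>>> re.findall(regex, content)
[('STR1', '1'), ('STR2', '2')]

>>> regex1 = r"[A-Z]+(\d)"
>>> re.findall(regex1, content)
['1', '2']

# If the pattern has no groups, the whole match is treated as one group
>>> regex2 = r"[A-Z]+\d"
>>> re.findall(regex2, content)
['STR1', 'STR2']

>>> regex3 = r"([A-Z]+\d)"
>>> re.findall(regex3, content)
['STR1', 'STR2']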

re.finditer(pattern, string, flags=0)

Finds all non-overlapping matches and returns an iterator whose elements are Match objects. An empty iterator means nothing matched.

import re

content = "333STR1666STR299"
regex = r"([A-Z]+(\d))"
result = re.finditer(regex, content)
for i in result:
    print(i.group(0))

# STR1
# STR2
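Because finditer() yields Match objects rather than strings, each hit also carries its groups and its position. A minimal sketch, reusing the sample string from above (the name hits is made up for this example):

```python
import re

content = "333STR1666STR299"

# For every match keep the full text, the inner digit group and the span.
hits = [
    (m.group(0), m.group(2), m.span())
    for m in re.finditer(r"([A-Z]+(\d))", content)
]

for full, digit, (start, end) in hits:
    print(full, digit, start, end)
```

findall() would return only the group texts; finditer() is the better fit when the offsets are needed, for example to highlight matches in the original string.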

re.sub(pattern, repl, string, count=0, flags=0)

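Searches string for pattern, replaces every match with repl and returns the new string. If the pattern is not found, string is returned unchanged.

count is a non-negative integer giving the maximum number of replacements; the default 0 replaces every match.

repl can be a string or a function. If it is a string, all backslash escapes in it are processed: \n becomes a newline and a backreference such as \6 is replaced with the text matched by group 6 of pattern. Unknown escapes of ASCII letters raise an error (since Python 3.7), while other unknown escapes such as \& are left unchanged:

>>> re.sub(r"def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):",
...        r"static PyObject*\npy_\1(void)\n{",
...        "def myfunc():")
'static PyObject*\npy_myfunc(void)\n{'

If repl is a function, it is called once for every match with a single Match object as its argument, and it returns the replacement string:

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == "-": return " "
...     else: return "-"
...
>>> re.sub("-{1,2}", dashrepl, "pro----gram-files")
'pro--gram files'
>>> re.sub(r"\sAND\s", " & ", "Baked Beans And Spam", flags=re.IGNORECASE)
'Baked Beans & Spam'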

re.subn(pattern, repl, string, count=0, flags=0)

Same as sub(), but returns the tuple (new_string, number_of_subs_made).
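The following sketch shows the count argument, a named backreference in repl, and the extra return value of subn(). The input strings and variable names are invented for illustration.

```python
import re

# count limits how many matches are replaced (0, the default, means all).
first_two = re.sub(r"a", "X", "aaaa", count=2)

# \g<name> in repl refers to a named group of the pattern.
swapped = re.sub(r"(?P<day>\d+)-(?P<month>\d+)", r"\g<month>/\g<day>", "31-12")

# subn() additionally reports how many replacements were made.
collapsed, replacements = re.subn(r"\s+", " ", "a   b \t c")

print(first_two, swapped, collapsed, replacements)
```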

re.escape(pattern)

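Escapes the special characters in pattern. This is useful when an arbitrary literal string that may contain regular expression metacharacters has to be matched as-is.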
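A typical use of re.escape() is matching text that comes from a user. In the sketch below the strings user_input and haystack are hypothetical; without escaping, the characters +, ( , ) and ? would be interpreted as regex syntax.

```python
import re

user_input = "1+1=2 (really?)"          # hypothetical literal text to look for
haystack = "is 1+1=2 (really?) true"    # hypothetical text to search in

# Used directly as a pattern, the metacharacters change the meaning.
unescaped = re.search(user_input, haystack)

# Escaped, the text is matched literally.
safe = re.search(re.escape(user_input), haystack)

print(unescaped)
print(safe.group(0))
```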

re.purge()

Clears the regular expression cache.

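Exceptions

exception re.error(msg, pattern=None, pos=None)

Attributes

msg: the unformatted error message

pattern: the regular expression pattern

pos: the index in pattern where compilation failed (may be None)

lineno: the line corresponding to pos (may be None)

colno: the column corresponding to pos (may be None)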
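These attributes can be read when a pattern fails to compile. The snippet below is only a sketch: bad_pattern is a deliberately broken, made-up pattern, and the exact wording of msg depends on the Python version, so it is printed rather than relied on.

```python
import re

bad_pattern = r"(\d+"   # hypothetical broken pattern: the group is never closed
details = None

try:
    re.compile(bad_pattern)
except re.error as exc:
    # Collect the attributes for logging or user feedback.
    details = {
        "msg": exc.msg,
        "pattern": exc.pattern,
        "pos": exc.pos,
        "lineno": exc.lineno,
        "colno": exc.colno,
    }

print(details)
```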

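Pattern objects

Methods

Pattern.search(string[, pos[, endpos]])

Similar to the module-level search(). pos and endpos restrict the search: only the first endpos characters of string are considered, and matching starts at index pos. If endpos is less than pos, no match is found and None is returned.

Pattern.match(string[, pos[, endpos]])

Similar to the module-level match(). pos and endpos have the same meaning as for search().

>>> pattern = re.compile("o")
>>> pattern.match("dog")      # No match as "o" is not at the start of "dog".
>>> pattern.match("dog", 1)   # Match as "o" is the 2nd character of "dog".
<re.Match object; span=(1, 2), match='o'>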

Pattern.fullmatch(string[, pos[, endpos]])

Similar to the module-level fullmatch(). pos and endpos have the same meaning as for search().

>>> pattern = re.compile("o[gh]")
>>> pattern.fullmatch("dog")      # No match as "o" is not at the start of "dog".
>>> pattern.fullmatch("ogre")     # No match as not the full string matches.
>>> pattern.fullmatch("doggie", 1, 3)   # Matches within given limits.
<re.Match object; span=(1, 3), match='og'>

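Pattern.split(string, maxsplit=0)

Identical to the module-level split().

Pattern.findall(string[, pos[, endpos]])

Similar to the module-level findall(). pos and endpos have the same meaning as for search().

Pattern.finditer(string[, pos[, endpos]])

Similar to the module-level finditer(). pos and endpos have the same meaning as for search().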
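Passing pos is not quite the same as slicing the string first: ^ still refers to the real start of the string. The sketch below illustrates this and the effect of endpos; all names and sample strings are made up for the example.

```python
import re

starts_with_o = re.compile(r"^o")
sliced = starts_with_o.search("dog"[1:])    # matches: the slice "og" is a new string
with_pos = starts_with_o.search("dog", 1)   # None: ^ is anchored to the real start

# Only the region string[3:8] ("20 30") is searched.
digits = re.compile(r"\d+")
limited = digits.findall("10 20 30 40", 3, 8)

# endpos == pos leaves an empty region, which a pattern that can match
# the empty string still matches; endpos < pos never matches.
maybe_x = re.compile(r"x*")
empty_region = maybe_x.search("abc", 1, 1)
reversed_bounds = maybe_x.search("abc", 2, 1)

print(sliced, with_pos, limited, empty_region, reversed_bounds)
```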

Pattern.sub(repl, string, count=0)

Identical to the module-level sub().

Pattern.subn(repl, string, count=0)

Identical to the module-level subn().

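Attributes

Pattern.flags: an integer describing the matching flags. It combines the flags argument passed to compile(), any inline (?...) flags in the pattern, and implicit flags such as re.UNICODE when the pattern is a Unicode string.

>>> re.UNICODE        # repr as printed by Python 3.7
<RegexFlag.UNICODE: 32>

>>> re.IGNORECASE
<RegexFlag.IGNORECASE: 2>

# 32 + 2
>>> re.compile("", flags=re.IGNORECASE).flags
34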

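Pattern.groups: the number of capturing groups in the pattern

Pattern.groupindex: a mapping from the names of all named groups, defined with (?P<name>...), to their group numbers. It is empty if the pattern has no named groups.

>>> pattern = re.compile(r"(?P<first_name>\w+) (?P<last_name>\w+)")
>>> pattern.groupindex
mappingproxy({'first_name': 1, 'last_name': 2})

Pattern.pattern: the pattern string from which the pattern object was compiled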

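Match objects

Methods

Match.expand(template)

Returns the string obtained by doing backslash substitution on template. Escapes such as \n are converted to the corresponding characters, and numeric backreferences (\1, \2) and named backreferences (\g<1>, \g<name>) are replaced with the contents of the corresponding group:

>>> m = re.search("(b)+(z)?", "cba")
>>> m
<re.Match object; span=(1, 2), match='b'>
>>> m.expand(r"ab\1")
'abb'
>>> m.expand(r"ab\2")
'ab'
>>> print(m.expand(r"ab\n"))
ab

>>>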

Match.group([group1, ...])

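Returns one or more subgroups of the match. With a single argument the result is a single string; with several arguments the result is a tuple with one item per argument.

(1) If the argument is 0, the return value is the entire string matched by the pattern.

(2) If the argument is in the range 1-99, the return value is the string matched by the corresponding group.

(3) If the argument is negative or larger than the number of groups defined in the pattern, IndexError is raised.

(4) If the corresponding group did not take part in the match, the result is None.

(5) If a group matched several times, only its last match is returned.

>>> m = re.match(r"(\w+) (\w+)(\d+)?", "Isaac Newton, physicist")

>>> m.group(0)       # (1)
'Isaac Newton'

>>> m.group(1)       # (2)
'Isaac'
>>> m.group(2)       # (2)
'Newton'
>>> m.group(1, 2)    # Multiple arguments give us a tuple.
('Isaac', 'Newton')

>>> type(m.group(3)) # (4)
<class 'NoneType'>

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # (5)
'c3'

If the regular expression uses the (?P<name>...) syntax, group() also accepts group names; an unknown group name raises IndexError:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group("first_name")
'Malcolm'
>>> m.group("last_name")
'Reynolds'

# Named groups can still be accessed by index
>>> m.group(1)
'Malcolm'
>>> m.group(2)
'Reynolds'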

Match.__getitem__(g)

Equivalent to group(); it offers a simpler way to access individual groups:

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m[0]       # The entire match
'Isaac Newton'
>>> m[1]       # The first parenthesized subgroup.
'Isaac'
>>> m[2]       # The second parenthesized subgroup.
'Newton'

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m["first_name"]
'Malcolm'

Match.groups(default=None)

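Returns a tuple containing all the subgroups of the match; its length equals the number of groups in the pattern, and it is empty if the pattern has no groups. The default argument is used for groups that did not take part in the match and defaults to None:

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()      # Second group defaults to None.
('24', None)
>>> m.groups("0")   # Now, the second group defaults to "0".
('24', '0')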

Match.groupdict(default=None)

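Returns a dictionary whose keys are the group names defined in the pattern and whose values are the texts matched by those groups; it is empty if the pattern has no named groups. The default argument is used for groups that did not take part in the match and defaults to None:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}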
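The default argument of groupdict() matters when a named group is optional. A small sketch, assuming a made-up key=value format in which the value may be missing:

```python
import re

# Hypothetical "key=value" format; the value part is optional.
setting = re.compile(r"(?P<key>\w+)=(?P<value>\w+)?")

complete = setting.match("retries=3").groupdict()
missing = setting.match("timeout=").groupdict()
missing_with_default = setting.match("timeout=").groupdict(default="")

print(complete)
print(missing)
print(missing_with_default)
```

Supplying a default avoids None checks later on when every value is expected to be a string.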

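Match.start([group])
Match.end([group])

Return the start and end indices, in the original string, of the substring matched by group.

(1) group defaults to 0, meaning the whole match.

(2) A return value of -1 means that group exists but did not take part in the match.

(3) If m.start(group) equals m.end(group), group matched an empty string.

>>> m = re.match(r"(\w+) (\w+)(\d)?", "Isaac Newton, physicist")
>>> m
<re.Match object; span=(0, 12), match='Isaac Newton'>

# (1)
>>> m.start()
0
>>> m.end()
12

# (2)
>>> type(m[3])
<class 'NoneType'>
>>> m.start(3)
-1
>>> m.end(3)
-1

# (3) with (\d*) instead of (\d)? the third group matches an empty string
>>> m = re.match(r"(\w+) (\w+)(\d*)", "Isaac Newton, physicist")
>>> m[3]
''
>>> m.start(3)
12
>>> m.end(3)
12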

Match.span([group])

Returns the tuple (m.start(group), m.end(group)). If group did not take part in the match, the result is (-1, -1). group defaults to 0, meaning the whole match.
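The indices returned by span() can be used directly for slicing. In the sketch below the pattern and the bracket-style highlighting are illustrative choices, not something prescribed by the re module.

```python
import re

text = "Isaac Newton, physicist"
m = re.search(r"(\w+), (\w+)", text)

# span(1) gives the offsets of the first group inside the original string.
start, end = m.span(1)
surname = text[start:end]

# Rebuild the string with the matched group marked.
highlighted = text[:start] + "[" + surname + "]" + text[end:]

print(m.span(), m.span(1), surname)
print(highlighted)
```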

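Attributes

Match.pos: the value of pos that was passed to the search(), match() or fullmatch() method of the Pattern object

Match.endpos: the value of endpos that was passed to the search(), match() or fullmatch() method of the Pattern object

Match.lastindex: the integer index of the last capturing group that matched, or None if no group matched at all.

>>> m = re.search(r"a(z)?", "ab")
>>> type(m.lastindex)
<class 'NoneType'>

>>> m = re.match(r"(\w+) (\w+)(\d)?", "Isaac Newton, physicist")
>>> m.lastindex
2

Match.lastgroup: the name of the last capturing group that matched, or None if that group has no name or no group matched at all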
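One common use of lastgroup is a small tokenizer: each alternative of the pattern is a named group, and lastgroup reports which alternative produced the match. The token names and the input expression below are assumptions made for this sketch.

```python
import re

# Each alternative is a named group; exactly one of them matches per token.
token_re = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+\-*/=])"
)

# finditer() skips the characters (here: spaces) that no alternative matches.
tokens = [(m.lastgroup, m.group()) for m in token_re.finditer("x = 3 + y2")]

print(tokens)
```

This sketch silently ignores unknown characters; a real tokenizer would normally add a catch-all group to report them.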

Match.re: the Pattern object whose method produced this Match object

Match.string: the original string that was passed to the matching method

What's new in the re module in Python 3.7

Non-empty matches can now start just after a previous empty match:

# before Python 3.7
>>> re.sub("x*", "-", "abxd")
'-a-b-d-'

# Python 3.7
>>> re.sub("x*", "-", "abxd")
'-a-b--d-'

Unknown escapes in repl consisting of "\" and an ASCII letter now are errors:

# before Python 3.7
>>> print(re.sub(r"\w+", r"\d", "ab&xd&"))
\d&\d&

# Python 3.7
>>> print(re.sub(r"\w+", r"\d", "ab&xd&"))
...
re.error: bad escape \d at position 0

Only characters that can have special meaning in a regular expression are escaped:

# before Python 3.7
>>> print(re.escape("!#$%&"))
\!\#\$\%\&

# Python 3.7
>>> print(re.escape("!#$%&"))
!\#\$%\&

Added support of splitting on a pattern that could match an empty string:

# before Python 3.7
>>> re.split(r"\b", "Words, words, words.")
...
ValueError: split() requires a non-empty pattern match.

>>> re.split(r"\W*", "...words...")
['', 'words', '']

>>> re.split(r"(\W*)", "...words...")
['', '...', 'words', '...', '']

# Python 3.7
>>> re.split(r"\b", "Words, words, words.")
['', 'Words', ', ', 'words', ', ', 'words', '.']

>>> re.split(r"\W*", "...words...")
['', '', 'w', 'o', 'r', 'd', 's', '', '']

>>> re.split(r"(\W*)", "...words...")
['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']

Added support of copy.copy() and copy.deepcopy(). Compiled Pattern objects and Match objects are considered atomic.
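"Atomic" here means that copying hands back the same object instead of building a new one. A minimal sketch, assuming Python 3.7 or later (the sample pattern and string are made up):

```python
import copy
import re

pattern = re.compile(r"\d+")
m = pattern.search("abc 123")

# Copies of Pattern and Match objects are the objects themselves.
same_pattern = copy.copy(pattern) is pattern
same_match = copy.deepcopy(m) is m

# A container is still copied, but the pattern inside it is shared.
config = {"number": pattern}
config_copy = copy.deepcopy(config)
shared_inside = config_copy["number"] is pattern

print(same_pattern, same_match, config_copy is config, shared_inside)
```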
