Security issues to watch out for when using the exec function

Posted by B0B0 on 2019-06-21 16:29

As is well known, Python's exec function executes a string containing Python source code:

>>> code = """
... a = "hello"
... print(a)
... """
>>> exec(code)
hello
>>> a
'hello'

exec is very powerful, so use it with caution. If you really must use it, pay attention to the security issues below.

Global variables and built-in functions

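By default, the code run by exec can read the local and global variables that are in scope where exec is called, and it can modify global variables as well. If the code passed to exec is generated from user-submitted data, this default behavior is a security risk.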
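A minimal sketch of that default behavior, assuming a hypothetical run_untrusted helper and a module-level counter: the string handed to exec reads a local variable of the calling function and rebinds a module global, even though neither was passed to it explicitly.

```python
# A module-level variable that the "untrusted" snippet should not be able to touch.
counter = 0


def run_untrusted(code):
    secret = "s3cr3t"  # a local variable of the caller
    # No globals/locals passed: exec uses this function's globals and a
    # snapshot of its locals, so the snippet can read `secret`.
    exec(code)


# The snippet reads the caller's local variable and rebinds a module global.
run_untrusted("global counter\ncounter = len(secret)")
print(counter)  # 6
```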

How can you change this default behavior? Pass two more arguments, a globals dict and a locals dict, when calling exec (see my earlier article on exec for details):

>>> g = {}
>>> l = {"b": "world"}
>>> exec("hello = 'hello' + b", g, l)
>>> l
{'b': 'world', 'hello': 'helloworld'}
>>> g
{'__builtins__': {...}}
>>> hello
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
...
NameError: name 'hello' is not defined

To restrict access to built-in functions, define the __builtins__ key in the globals argument:

>>> g = {}
>>> l = {}
>>> exec("a = int('1')", g, l)
>>> l
{'a': 1}

>>> g = {"__builtins__": {}}
>>> exec("a = int('1')", g, l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'int' is not defined
>>>
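Instead of removing every built-in, a hypothetical allow-list can expose only the few that the executed code needs. In the sketch below, SAFE_BUILTINS and run_with_allowed_builtins are illustrative names, not a standard API; the allow-list is copied into a fresh globals dict on every call. Like the empty __builtins__ above, this only controls which names resolve; it is not a sandbox.

```python
# Hypothetical allow-list: only these built-ins are exposed to the executed code.
SAFE_BUILTINS = {"int": int, "str": str, "len": len}


def run_with_allowed_builtins(code):
    """Execute `code` in fresh namespaces where only SAFE_BUILTINS are available."""
    g = {"__builtins__": dict(SAFE_BUILTINS)}  # copy so the code cannot alter the allow-list
    l = {}
    exec(code, g, l)
    return l


print(run_with_allowed_builtins("a = int('1')"))  # {'a': 1}

try:
    run_with_allowed_builtins("f = open('data.txt')")
except NameError as exc:
    print(exc)  # name 'open' is not defined
```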

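We have now blocked access to and modification of global variables, and blocked the built-in functions. Does that make it safe? Unfortunately not: there are still other ways to reach the built-in functions, and even os.system.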

Other routes to the built-ins and os.system

Through a function object:

>>> def a(): pass
...
>>> a.__globals__["__builtins__"]
<module 'builtins' (built-in)>
>>> a.__globals__["__builtins__"].open
<built-in function open>

Through a built-in type object:

>>> for cls in {}.__class__.__base__.__subclasses__():
...     if cls.__name__ == "WarningMessage":
...         b = cls.__init__.__globals__["__builtins__"]
...         b["open"]
...
<built-in function open>
>>>

Getting os.system:

>>> cls = [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == "_wrap_close"][0]
>>> cls.__init__.__globals__["path"].os
<module 'os' from '...'>
>>>
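The REPL examples above run in an ordinary interpreter session, so it is worth checking that the type-object route still works inside the restricted namespaces from the previous section. In the sketch below, the executed code gets an empty __builtins__ and empty locals, yet it reaches os.system by walking object.__subclasses__() to the _wrap_close class (a CPython implementation detail of the os module). It only fetches the function and never calls it.

```python
# The same namespaces as in the restricted example: no globals, no built-ins.
g = {"__builtins__": {}}
l = {}

# The snippet uses no built-in names at all, only attribute access on a literal.
payload = """
for cls in ().__class__.__base__.__subclasses__():
    if cls.__name__ == "_wrap_close":
        system = cls.__init__.__globals__["system"]
        break
"""

exec(payload, g, l)
print(l["system"])  # the real os.system, reached despite the empty __builtins__
```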

How can we defend against these two approaches? One option is to forbid access to any attribute whose name starts with _:

If you control how code is generated, perform the check while generating it.

~~If you don't, analyze the compiled code with the dis module~~ (dis cannot analyze the code of nested functions)

Use the tokenize module:

    In [68]: from io import BytesIO
       ....: from tokenize import tokenize
    In [69]: code = """
       ....: a = "b"
       ....: a.__str__
       ....: def b():
       ....:     b.__get__
       ....: """
    In [70]: t = tokenize(BytesIO(code.encode()).readline)
    In [71]: for x in t:
       ....:     print(x)
       ....:
    TokenInfo(type=59 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
    TokenInfo(type=58 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
    TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line='a = "b"\n')
    TokenInfo(type=53 (OP), string='=', start=(2, 2), end=(2, 3), line='a = "b"\n')
    TokenInfo(type=3 (STRING), string='"b"', start=(2, 4), end=(2, 7), line='a = "b"\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 7), end=(2, 8), line='a = "b"\n')
    TokenInfo(type=1 (NAME), string='a', start=(3, 0), end=(3, 1), line='a.__str__\n')
    TokenInfo(type=53 (OP), string='.', start=(3, 1), end=(3, 2), line='a.__str__\n')
    TokenInfo(type=1 (NAME), string='__str__', start=(3, 2), end=(3, 9), line='a.__str__\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 9), end=(3, 10), line='a.__str__\n')
    TokenInfo(type=1 (NAME), string='def', start=(4, 0), end=(4, 3), line='def b():\n')
    TokenInfo(type=1 (NAME), string='b', start=(4, 4), end=(4, 5), line='def b():\n')
    TokenInfo(type=53 (OP), string='(', start=(4, 5), end=(4, 6), line='def b():\n')
    TokenInfo(type=53 (OP), string=')', start=(4, 6), end=(4, 7), line='def b():\n')
    TokenInfo(type=53 (OP), string=':', start=(4, 7), end=(4, 8), line='def b():\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 8), end=(4, 9), line='def b():\n')
    TokenInfo(type=5 (INDENT), string='    ', start=(5, 0), end=(5, 4), line='    b.__get__\n')
    TokenInfo(type=1 (NAME), string='b', start=(5, 4), end=(5, 5), line='    b.__get__\n')
    TokenInfo(type=53 (OP), string='.', start=(5, 5), end=(5, 6), line='    b.__get__\n')
    TokenInfo(type=1 (NAME), string='__get__', start=(5, 6), end=(5, 13), line='    b.__get__\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(5, 13), end=(5, 14), line='    b.__get__\n')
    TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='')
    TokenInfo(type=0 (ENDMARKER), string='', start=(6, 0), end=(6, 0), line='')
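The same kind of dump can be printed more compactly with tokenize.tok_name. The sketch below tokenizes a hypothetical snippet in which the attribute access is split across two lines inside parentheses. There, an NL token sits between the "." and the attribute name, so a checker should compare against the previous significant token (skipping NL and COMMENT) rather than strictly the previous one.

```python
import io
import tokenize

# Hypothetical snippet: the attribute access is split across lines inside parentheses.
source = "x = (a.\n     __class__)\n"

tokens = list(tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline))
for tok in tokens:
    # tok_name maps a numeric token type to its name; OP covers all operators.
    print(tokenize.tok_name[tok.type], repr(tok.string))
```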

From the output above we can see that when type is OP and string is ".", the next token is the name of the attribute after the dot. So the check can be written like this:

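    import io
    import tokenize


    def check_unsafe_attributes(string):
        """Raise AttributeError if `string` accesses an attribute starting with "_"."""
        tokens = tokenize.tokenize(io.BytesIO(string.encode("utf-8")).readline)
        pre_tok = None  # the previous significant token
        for tok in tokens:
            if tok.type in (tokenize.NL, tokenize.COMMENT):
                # Inside brackets, line breaks and comments can sit between "." and the name.
                continue
            if (tok.type == tokenize.NAME and tok.string.startswith("_")
                    and pre_tok is not None
                    and pre_tok.type == tokenize.OP and pre_tok.string == "."):
                msg = "access to attribute '{0}' is unsafe.".format(tok.string)
                raise AttributeError(msg)
            pre_tok = tok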
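The tokenize-based check has one blind spot worth knowing about: before Python 3.12, an f-string comes back as a single STRING token, so an expression such as f"{().__class__}" never produces the "." OP token the check looks for. Below is a sketch of the same rule using the ast module instead. This is a swapped-in technique, not the method above, and check_unsafe_attributes_ast is a hypothetical name. It walks the parsed tree, which includes the expressions inside f-strings. Neither version is a complete sandbox: both inspect attribute syntax only, not attribute names passed as strings (for example to getattr, if that built-in is exposed).

```python
import ast


def check_unsafe_attributes_ast(source):
    """Raise AttributeError if `source` accesses an attribute starting with '_'.

    Sketch only: it inspects ast.Attribute nodes, so attribute names passed
    as strings (e.g. to getattr) are not detected.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):  # visits every node, including f-string parts
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise AttributeError("access to attribute '{0}' is unsafe.".format(node.attr))


# No error: 'upper' does not start with '_' (the code is only parsed, never run).
check_unsafe_attributes_ast("a = 'b'\nprint(a.upper())")

for bad in ["().__class__", "(a.\n    __class__)", "f'{().__class__}'"]:
    try:
        check_unsafe_attributes_ast(bad)
    except AttributeError as exc:
        print(exc)  # access to attribute '__class__' is unsafe.
```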

Those are all the exec security issues I know of. If you know of others, please let me know in the comments.