Python Comprehensive Study, Part 5: Pandas

Miracle, published 2019-07-30 17:38 / 571 views

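Summary: The latter form selects the data between the two labels, inclusive of both. Setting by position with loc and iloc: we can use a position or a label to locate the value to modify. From the material above, we learn how to assign values or add data wherever we want in a DataFrame.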

This part takes a deeper look at how to use Pandas.

1. Selecting Data

First, build a 6x4 matrix of data.

import pandas as pd
import numpy as np

dates = pd.date_range("20180830", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates, columns=["A","B","C","D"])
print(df)

Output:

             A   B   C   D
2018-08-30   0   1   2   3
2018-08-31   4   5   6   7
2018-09-01   8   9  10  11
2018-09-02  12  13  14  15
2018-09-03  16  17  18  19
2018-09-04  20  21  22  23

Simple selection

To select data from a DataFrame, the two ways shown below achieve the same result:

print(df["A"])
print(df.A)

"""
2018-08-30     0
2018-08-31     4
2018-09-01     8
2018-09-02    12
2018-09-03    16
2018-09-04    20
Freq: D, Name: A, dtype: int64
"""

Slicing selects a span of several rows:

print(df[0:3])
 
"""
            A  B   C   D
2018-08-30  0  1   2   3
2018-08-31  4  5   6   7
2018-09-01  8  9  10  11
"""

print(df["20180830":"20180901"])

"""
            A  B   C   D
2018-08-30  0  1   2   3
2018-08-31  4  5   6   7
2018-09-01  8  9  10  11
"""

Note that df[3:3] would be an empty object. The latter form selects the data between the labels 20180830 and 20180901, inclusive of both labels.
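
The difference between the two slices above can be sketched as follows. This version uses the explicit iloc (position) and loc (label) indexers instead of df[...], and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20180830", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=["A", "B", "C", "D"])

by_position = df.iloc[0:3]                 # positions 0, 1, 2; the stop position is excluded
by_label = df.loc["20180830":"20180901"]   # label slice; both end labels are included
empty = df.iloc[3:3]                       # start == stop selects no rows

print(len(by_position), len(by_label), len(empty))
print(by_position.equals(by_label))
```

Both slices return the same three rows here only because the end label happens to sit at position 2.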

Selecting by label: loc

We can likewise select data by label with loc. The examples here select a single row by its label, or select one row or all rows (: means all rows) and then one or more columns.

print(df.loc["20180831"])
"""
A    4
B    5
C    6
D    7
Name: 2018-08-31 00:00:00, dtype: int64
"""

print(df.loc[:,["A","B"]])
"""
             A   B
2018-08-30   0   1
2018-08-31   4   5
2018-09-01   8   9
2018-09-02  12  13
2018-09-03  16  17
2018-09-04  20  21
"""

print(df.loc["20180831",["A","B"]])
"""
A    4
B    5
Name: 2018-08-31 00:00:00, dtype: int64
"""

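As a further sketch of label-based selection, loc also accepts label slices on both axes and a boolean mask for the rows. The frame is rebuilt so the block runs on its own; the names day, block and mask_rows are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20180830", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=["A", "B", "C", "D"])

day = pd.Timestamp("2018-08-31")
cell = df.loc[day, "B"]                               # one row label, one column label -> scalar
block = df.loc["2018-08-31":"2018-09-02", "B":"C"]    # label slices include both ends
mask_rows = df.loc[df["A"] > 8, ["A", "D"]]           # boolean mask on rows, list of columns

print(cell)
print(block)
print(mask_rows)
```
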
Selecting by position: iloc

We can also select by position with iloc, which lets us pick the data we need in different situations: a single element, a contiguous range, or non-adjacent rows.

print(df.iloc[3,1])
# 13

print(df.iloc[3:5,1:3])
"""
             B   C
2018-09-02  13  14
2018-09-03  17  18
"""

print(df.iloc[[1,3,5],1:3])
"""
             B   C
2018-08-31   5   6
2018-09-02  13  14
2018-09-04  21  22
"""

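Mixing the two: ix

Older pandas versions also offered a mixed indexer, ix, which accepted positions and labels together. It was deprecated in pandas 0.20 and removed in 1.0, so the same selection (columns "A" and "C", first three rows) is written here by chaining iloc with a column list.

# Formerly: print(df.ix[:3,["A","C"]])
print(df.iloc[:3][["A","C"]])
"""
            A   C
2018-08-30  0   2
2018-08-31  4   6
2018-09-01  8  10
"""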

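Since ix is no longer available in current pandas, here is a hedged sketch of three ways to mix a positional row selection with a label-based column selection. All three use only iloc, loc, and Index.get_indexer; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20180830", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=["A", "B", "C", "D"])

chained = df.iloc[:3][["A", "C"]]                            # positions first, then labels
via_loc = df.loc[df.index[:3], ["A", "C"]]                   # turn positions into labels
via_iloc = df.iloc[:3, df.columns.get_indexer(["A", "C"])]   # turn labels into positions

print(chained.equals(via_loc) and chained.equals(via_iloc))
```

The single-call forms (via_loc, via_iloc) are usually preferable when the selection is later assigned to, because chained indexing may operate on a copy.
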
2. Setting Values

With pandas we can change values in the data as needed, or add new columns that are empty or hold values.

First, build a 6x4 matrix of data.

# -*- coding:utf-8 -*-

"""
@author: Corwien
@file: pd_value.py
@time: 18/8/31 00:59
"""

import pandas as pd
import numpy as np

dates = pd.date_range("20180101", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates, columns=["A","B","C","D"])

print(df)

"""
             A   B   C   D
2018-01-01   0   1   2   3
2018-01-02   4   5   6   7
2018-01-03   8   9  10  11
2018-01-04  12  13  14  15
2018-01-05  16  17  18  19
2018-01-06  20  21  22  23
"""

Setting by position: loc and iloc

We can use a position or a label to locate the value to modify.

df.iloc[2,3] = 1111
df.loc["20180103", "B"] = 2222

print(df)

Output:

             A     B   C     D
2018-01-01   0     1   2     3
2018-01-02   4     5   6     7
2018-01-03   8  2222  10  1111
2018-01-04  12    13  14    15
2018-01-05  16    17  18    19
2018-01-06  20    21  22    23

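Setting by condition

Suppose we want to change values in column B depending on column A: wherever A is greater than 4, set the corresponding value in B to 0.

# Use a single .loc call; the chained form df.B[df.A>4] = 0 may modify a temporary copy
df.loc[df.A > 4, "B"] = 0
print(df)

Original data: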

             A     B   C     D
2018-01-01   0     1   2     3
2018-01-02   4     5   6     7
2018-01-03   8  2222  10  1111
2018-01-04  12    13  14    15
2018-01-05  16    17  18    19
2018-01-06  20    21  22    23

Data after the change:

             A  B   C     D
2018-01-01   0  1   2     3
2018-01-02   4  5   6     7
2018-01-03   8  0  10  1111
2018-01-04  12  0  14    15
2018-01-05  16  0  18    19
2018-01-06  20  0  22    23
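
A minimal sketch of conditional assignment on a fresh copy of the 6x4 frame: the row mask and the target column go into one loc call, and Series.where is shown as an alternative that builds a new column. The replacement value -1 for column C is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20180101", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=["A", "B", "C", "D"])

df.loc[df["A"] > 4, "B"] = 0                  # rows chosen by A, column B assigned in place
df["C"] = df["C"].where(df["A"] <= 4, -1)     # keep C where A <= 4, otherwise use -1

print(df)
```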

Setting by row or column

To operate on a whole column at once, for example to add a column "F" and set all of it to NaN:

df["F"] = np.nan
"""
             A  B   C     D   F
2018-01-01   0  1   2     3 NaN
2018-01-02   4  5   6     7 NaN
2018-01-03   8  0  10  1111 NaN
2018-01-04  12  0  14    15 NaN
2018-01-05  16  0  18    19 NaN
2018-01-06  20  0  22    23 NaN
"""

Adding data

The same approach can also add a Series as a new column. The Series is aligned on the index, so its index labels must match those of the DataFrame.

             A  B   C     D   F  E
2018-01-01   0  1   2     3 NaN  1
2018-01-02   4  5   6     7 NaN  2
2018-01-03   8  0  10  1111 NaN  3
2018-01-04  12  0  14    15 NaN  4
2018-01-05  16  0  18    19 NaN  5
2018-01-06  20  0  22    23 NaN  6
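
The statement that creates column E is not shown above. The following is only a sketch that would produce such a column, assuming the Series carries the same date index as the frame; the shifted Series and the column name G are hypothetical additions to show what happens when the labels do not line up.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20180101", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=["A", "B", "C", "D"])

# Same index as df: every value lands on its matching row
df["E"] = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Hypothetical Series whose index starts three days later: labels without a match become NaN
shifted = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20180104", periods=6))
df["G"] = shifted

print(df[["E", "G"]])
```

Assignment aligns on index labels rather than on position, which is why the first three rows of G are NaN.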

With the above, we have learned how to assign values or add data wherever we want in a DataFrame.

3. Handling Missing Data

Creating a matrix that contains NaN

When importing or processing data, we sometimes end up with empty or NaN values. How to drop or fill these NaN values is the topic of this part.

Build a 6x4 matrix of data and set two positions to NaN.

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates, columns=["A","B","C","D"])
df.iloc[0,1] = np.nan
df.iloc[1,2] = np.nan
"""
             A     B     C   D
2013-01-01   0   NaN   2.0   3
2013-01-02   4   5.0   NaN   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
"""

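df.dropna()

To drop the rows or columns that contain NaN, use dropna. It returns a new DataFrame and leaves df unchanged.

df.dropna(
    axis=0,     # 0: operate on rows; 1: operate on columns
    how="any"   # "any": drop if any NaN is present; "all": drop only if every value is NaN
    )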
"""
             A     B     C   D
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
"""

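To compare the dropna options side by side, here is a small sketch on the same 6x4 layout. The frame is created with a float dtype (an assumption made here so that inserting NaN needs no dtype change), and subset is an additional dropna parameter not used in the text above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=["A", "B", "C", "D"])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

rows_any = df.dropna(axis=0, how="any")    # drop rows with at least one NaN
rows_all = df.dropna(axis=0, how="all")    # drop rows only if every value is NaN
cols_any = df.dropna(axis=1, how="any")    # drop columns with at least one NaN
rows_b = df.dropna(subset=["B"])           # only look at column B when deciding

print(rows_any.shape, rows_all.shape, list(cols_any.columns), len(rows_b))
```
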
df.fillna()

To replace NaN values with another value, for example 0:

df.fillna(value=0)
"""
             A     B     C   D
2013-01-01   0   0.0   2.0   3
2013-01-02   4   5.0   0.0   7
2013-01-03   8   9.0  10.0  11
2013-01-04  12  13.0  14.0  15
2013-01-05  16  17.0  18.0  19
2013-01-06  20  21.0  22.0  23
"""

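fillna also accepts a dict that maps column names to fill values. The sketch below fills column B with its mean and column C with -1; both fill choices are arbitrary illustrations, and the float dtype is an assumption made to keep the NaN insertion simple.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=["A", "B", "C", "D"])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# Per-column fill values; mean() skips NaN by default
filled = df.fillna(value={"B": df["B"].mean(), "C": -1})

print(filled)
```
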
df.isnull()

Check for missing data (NaN); True marks a missing value:

df.isnull() 
"""
                A      B      C      D
2013-01-01  False   True  False  False
2013-01-02  False  False   True  False
2013-01-03  False  False  False  False
2013-01-04  False  False  False  False
2013-01-05  False  False  False  False
2013-01-06  False  False  False  False
"""

To check whether the data contains any NaN at all (returns True if so):

df.isnull().values.any()
# True
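
Beyond a single yes/no answer, the boolean mask can be summed to count missing values per column. A minimal sketch, again assuming a float frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape((6, 4)),
                  index=dates, columns=["A", "B", "C", "D"])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

has_nan = bool(df.isnull().values.any())   # any NaN anywhere in the frame
per_column = df.isnull().sum()             # True counts as 1, so this counts NaN per column

print(has_nan)
print(per_column)
```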

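4. Import and Export

Overview

pandas can read and write many data formats, such as CSV, Excel, JSON, HTML, and pickle; see the official documentation for details.

Reading CSV

import pandas as pd  # load the module

# Read the CSV file
data = pd.read_csv("student.csv")

# Print data
print(data)

Output: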

/Users/kaiyiwang/anaconda2/bin/python /Users/kaiyiwang/Code/python/baseLearn/pandas/pd_csv.py
    Student ID  name   age  gender
0         1100  Kelly   22  Female
1         1101    Clo   21  Female
2         1102  Tilly   22  Female
3         1103   Tony   24    Male
4         1104  David   20    Male
5         1105  Catty   22  Female
6         1106      M    3  Female
7         1107      N   43    Male
8         1108      A   13    Male
9         1109      S   12    Male
10        1110  David   33    Male
11        1111     Dw    3  Female
12        1112      Q   23    Male
13        1113      W   21  Female

Saving the data as pickle

data.to_pickle("student.pickle")
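
The read and write calls can be tried without an existing student.csv. The sketch below is a stand-in: it builds a three-row frame from the first rows of the output above, writes it to a temporary directory, and reads it back. The temporary paths are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for student.csv, using the first three rows shown above
data = pd.DataFrame({
    "Student ID": [1100, 1101, 1102],
    "name": ["Kelly", "Clo", "Tilly"],
    "age": [22, 21, 22],
    "gender": ["Female", "Female", "Female"],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "student.csv"
    pickle_path = Path(tmp) / "student.pickle"

    data.to_csv(csv_path, index=False)       # index=False keeps the row numbers out of the file
    from_csv = pd.read_csv(csv_path)

    data.to_pickle(pickle_path)
    from_pickle = pd.read_pickle(pickle_path)

print(from_csv)
print(from_pickle.equals(data))
```

Pickle preserves dtypes and the index exactly, while CSV is plain text and has its types re-inferred on reading.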

5. Merging with concat

When pandas works with several datasets, they often need to be combined. concat is a basic way to do so, and it has a number of parameters that can be adjusted to produce the form of data you want.

axis (merge direction)

axis=0 is the default, so when no argument is given the function uses axis=0.

# -*- coding:utf-8 -*-

"""
@author: Corwien
@file: pd_concat.py
@time: 18/9/1 10:28
"""
   
import pandas as pd  # load the module
import numpy as np

# Define the datasets
df1 = pd.DataFrame(np.ones((3,4))*0, columns=["a","b","c","d"])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=["a","b","c","d"])
df3 = pd.DataFrame(np.ones((3,4))*2, columns=["a","b","c","d"])

# print(df1)

# Concatenate vertically with concat
res = pd.concat([df1, df2, df3], axis=0)

print(res)

Output:

     a    b    c    d
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
0  2.0  2.0  2.0  2.0
1  2.0  2.0  2.0  2.0
2  2.0  2.0  2.0  2.0

Looking closely, the index of the result is 0, 1, 2, 0, 1, 2, 0, 1, 2. To reset the index, see the next example.

ignore_index (reset the index)

# Continuing the previous example, set ignore_index to True
res = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

# Print the result
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  1.0  1.0  1.0
# 4  1.0  1.0  1.0  1.0
# 5  1.0  1.0  1.0  1.0
# 6  2.0  2.0  2.0  2.0
# 7  2.0  2.0  2.0  2.0
# 8  2.0  2.0  2.0  2.0

The index of the result becomes 0, 1, 2, 3, 4, 5, 6, 7, 8.
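
Why the repeated index matters can be seen from a label lookup. In this sketch (variable names illustrative), label 0 matches three rows in the plain concatenation but exactly one row after ignore_index=True.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=["a", "b", "c", "d"])
df2 = pd.DataFrame(np.ones((3, 4)), columns=["a", "b", "c", "d"])
df3 = pd.DataFrame(np.full((3, 4), 2.0), columns=["a", "b", "c", "d"])

stacked = pd.concat([df1, df2, df3], axis=0)
renumbered = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

print(stacked.index.is_unique)      # repeated labels 0, 1, 2
print(stacked.loc[0])               # a DataFrame: every row labelled 0
print(renumbered.index.is_unique)
print(renumbered.loc[0])            # a Series: the single row labelled 0
```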

join (merge mode)

join="outer" is the default, so when no argument is given the function uses join="outer". In this mode the frames are stacked vertically by column: columns with the same name are merged top to bottom, columns unique to one frame each get their own column, and positions that had no value are filled with NaN.

# Define the datasets
df1 = pd.DataFrame(np.ones((3,4))*0, columns=["a","b","c","d"], index=[1,2,3])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=["b","c","d","e"], index=[2,3,4])

print(df1)
print("======\n")

print(df2)
print("======\n")

# Vertical "outer" concatenation of df1 and df2
res = pd.concat([df1, df2], axis=0, join="outer")

print(res)

Output:

     a    b    c    d
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0
======

     b    c    d    e
2  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0
======

     a    b    c    d    e
1  0.0  0.0  0.0  0.0  NaN
2  0.0  0.0  0.0  0.0  NaN
3  0.0  0.0  0.0  0.0  NaN
2  NaN  1.0  1.0  1.0  1.0
3  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  1.0  1.0  1.0

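With join="inner", the principle is the same as in the previous example, but only the columns the frames have in common are merged; the others are discarded.

# Continuing the previous example

# Vertical "inner" concatenation of df1 and df2
res = pd.concat([df1, df2], axis=0, join="inner")

# Print the result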
print(res)
#     b    c    d
# 1  0.0  0.0  0.0
# 2  0.0  0.0  0.0
# 3  0.0  0.0  0.0
# 2  1.0  1.0  1.0
# 3  1.0  1.0  1.0
# 4  1.0  1.0  1.0

# Reset the index and print the result
res = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)
print(res)
#     b    c    d
# 0  0.0  0.0  0.0
# 1  0.0  0.0  0.0
# 2  0.0  0.0  0.0
# 3  1.0  1.0  1.0
# 4  1.0  1.0  1.0
# 5  1.0  1.0  1.0
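
The effect of join on the resulting columns can be checked directly. A short sketch with the same two frames:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=["a", "b", "c", "d"], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=["b", "c", "d", "e"], index=[2, 3, 4])

outer = pd.concat([df1, df2], axis=0, join="outer")   # union of the column names
inner = pd.concat([df1, df2], axis=0, join="inner")   # intersection of the column names

print(sorted(outer.columns), int(outer["e"].isna().sum()))
print(sorted(inner.columns), int(inner.isna().sum().sum()))
```

The inner result has no NaN because every kept column exists in both inputs.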

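join_axes (merge according to axes)

# Define the datasets
df1 = pd.DataFrame(np.ones((3,4))*0, columns=["a","b","c","d"], index=[1,2,3])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=["b","c","d","e"], index=[2,3,4])

# Concatenate horizontally, keeping only the rows of `df1.index`.
# The join_axes argument was removed in pandas 1.0; reindex gives the same result.
# Formerly: pd.concat([df1, df2], axis=1, join_axes=[df1.index])
res = pd.concat([df1, df2], axis=1).reindex(df1.index)

# Print the result
print(res)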
#     a    b    c    d    b    c    d    e
# 1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
# 2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# 3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0

# Without restricting the rows to df1.index, print the result
res = pd.concat([df1, df2], axis=1)
print(res)
#     a    b    c    d    b    c    d    e
# 1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
# 2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# 3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
# 4  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0
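
For horizontal concatenation there are three row sets one might want: the union of both indexes, only the rows of df1, or only the rows present in both. The sketch below shows one way to get each with current pandas; reindex stands in for the older join_axes argument, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=["a", "b", "c", "d"], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)), columns=["b", "c", "d", "e"], index=[2, 3, 4])

union_rows = pd.concat([df1, df2], axis=1)                      # default join="outer"
left_rows = pd.concat([df1, df2], axis=1).reindex(df1.index)    # keep only df1's rows
common_rows = pd.concat([df1, df2], axis=1, join="inner")       # rows present in both

print(union_rows.index.tolist())
print(left_rows.index.tolist())
print(common_rows.index.tolist())
```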

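append (adding data)

append only merges vertically, not horizontally. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat covers the same cases.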

import pandas as pd
import numpy as np

# Define the datasets
df1 = pd.DataFrame(np.ones((3,4))*0, columns=["a","b","c","d"])
df2 = pd.DataFrame(np.ones((3,4))*1, columns=["a","b","c","d"])
df3 = pd.DataFrame(np.ones((3,4))*1, columns=["a","b","c","d"])
s1 = pd.Series([1,2,3,4], index=["a","b","c","d"])

# Stack df2 below df1, reset the index, and print the result
# (df1.append(df2, ignore_index=True) in pandas < 2.0)
res = pd.concat([df1, df2], ignore_index=True)
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  1.0  1.0  1.0
# 4  1.0  1.0  1.0  1.0
# 5  1.0  1.0  1.0  1.0

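# Merge several frames: stack df2 and df3 below df1, reset the index, and print the result
# (df1.append([df2, df3], ignore_index=True) in pandas < 2.0)
res = pd.concat([df1, df2, df3], ignore_index=True)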
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  1.0  1.0  1.0
# 4  1.0  1.0  1.0  1.0
# 5  1.0  1.0  1.0  1.0
# 6  1.0  1.0  1.0  1.0
# 7  1.0  1.0  1.0  1.0
# 8  1.0  1.0  1.0  1.0

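# Merge a Series: add s1 to df1 as a new row, reset the index, and print the result
# (df1.append(s1, ignore_index=True) in pandas < 2.0)
res = pd.concat([df1, s1.to_frame().T], ignore_index=True)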
print(res)
#     a    b    c    d
# 0  0.0  0.0  0.0  0.0
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
# 3  1.0  2.0  3.0  4.0

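6. Merging with merge

merge in pandas is similar to concat, but it is mainly used to combine two datasets that share key columns or a common index. It is also commonly used in database-style processing.

Merging on one key

import pandas as pd

# Define the datasets and print them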
left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                             "A": ["A0", "A1", "A2", "A3"],
                             "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                              "C": ["C0", "C1", "C2", "C3"],
                              "D": ["D0", "D1", "D2", "D3"]})

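print(left)
# (Older pandas sorted dict keys alphabetically, giving the column order below;
#  with pandas >= 0.23 on Python >= 3.6 the columns print in insertion order: key, A, B.)
#    A   B key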
# 0  A0  B0  K0
# 1  A1  B1  K1
# 2  A2  B2  K2
# 3  A3  B3  K3

print(right)
#    C   D key
# 0  C0  D0  K0
# 1  C1  D1  K1
# 2  C2  D2  K2
# 3  C3  D3  K3

# Merge on the key column and print the result
res = pd.merge(left, right, on="key")

print(res)
#    A   B key   C   D
# 0  A0  B0  K0  C0  D0
# 1  A1  B1  K1  C1  D1
# 2  A2  B2  K2  C2  D2
# 3  A3  B3  K3  C3  D3

Merging on two keys

There are four merge methods, how = ["left", "right", "outer", "inner"]; the default is how="inner".

import pandas as pd

# Define the datasets and print them
left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                      "key2": ["K0", "K1", "K0", "K1"],
                      "A": ["A0", "A1", "A2", "A3"],
                      "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                       "key2": ["K0", "K0", "K0", "K0"],
                       "C": ["C0", "C1", "C2", "C3"],
                       "D": ["D0", "D1", "D2", "D3"]})

print(left)
#    A   B key1 key2
# 0  A0  B0   K0   K0
# 1  A1  B1   K0   K1
# 2  A2  B2   K1   K0
# 3  A3  B3   K2   K1

print(right)
#    C   D key1 key2
# 0  C0  D0   K0   K0
# 1  C1  D1   K1   K0
# 2  C2  D2   K1   K0
# 3  C3  D3   K2   K0

# Merge on the key1 and key2 columns and print the four results ["left", "right", "outer", "inner"]
res = pd.merge(left, right, on=["key1", "key2"], how="inner")
print(res)
#    A   B key1 key2   C   D
# 0  A0  B0   K0   K0  C0  D0
# 1  A2  B2   K1   K0  C1  D1
# 2  A2  B2   K1   K0  C2  D2

res = pd.merge(left, right, on=["key1", "key2"], how="outer")
print(res)
#     A    B key1 key2    C    D
# 0   A0   B0   K0   K0   C0   D0
# 1   A1   B1   K0   K1  NaN  NaN
# 2   A2   B2   K1   K0   C1   D1
# 3   A2   B2   K1   K0   C2   D2
# 4   A3   B3   K2   K1  NaN  NaN
# 5  NaN  NaN   K2   K0   C3   D3

res = pd.merge(left, right, on=["key1", "key2"], how="left")
print(res)
#    A   B key1 key2    C    D
# 0  A0  B0   K0   K0   C0   D0
# 1  A1  B1   K0   K1  NaN  NaN
# 2  A2  B2   K1   K0   C1   D1
# 3  A2  B2   K1   K0   C2   D2
# 4  A3  B3   K2   K1  NaN  NaN

res = pd.merge(left, right, on=["key1", "key2"], how="right")
print(res)
#     A    B key1 key2   C   D
# 0   A0   B0   K0   K0  C0  D0
# 1   A2   B2   K1   K0  C1  D1
# 2   A2   B2   K1   K0  C2  D2
# 3  NaN  NaN   K2   K0  C3  D3
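
A compact way to compare the four methods is to count the rows each one returns. This sketch rebuilds the same left and right frames and collects the counts in a dict; row_counts is an illustrative name.

```python
import pandas as pd

left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})

# One merge per method; only the number of resulting rows is kept
row_counts = {
    how: len(pd.merge(left, right, on=["key1", "key2"], how=how))
    for how in ["inner", "outer", "left", "right"]
}
print(row_counts)  # {'inner': 3, 'outer': 6, 'left': 5, 'right': 4}
```

The key pair (K1, K0) appears once on the left and twice on the right, so it contributes two rows to every result.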

Indicator

indicator=True adds a new column that records where each merged row came from.

import pandas as pd

# Define the datasets and print them
df1 = pd.DataFrame({"col1":[0,1], "col_left":["a","b"]})
df2 = pd.DataFrame({"col1":[1,2,2],"col_right":[2,2,2]})

print(df1)
#   col1 col_left
# 0     0        a
# 1     1        b

print(df2)
#   col1  col_right
# 0     1          2
# 1     2          2
# 2     2          2

# Merge on col1 with indicator=True, then print the result
res = pd.merge(df1, df2, on="col1", how="outer", indicator=True)
print(res)
#   col1 col_left  col_right      _merge
# 0   0.0        a        NaN   left_only
# 1   1.0        b        2.0        both
# 2   2.0      NaN        2.0  right_only
# 3   2.0      NaN        2.0  right_only

# Give the indicator column a custom name, and print the result
res = pd.merge(df1, df2, on="col1", how="outer", indicator="indicator_column")
print(res)
#   col1 col_left  col_right indicator_column
# 0   0.0        a        NaN        left_only
# 1   1.0        b        2.0             both
# 2   2.0      NaN        2.0       right_only
# 3   2.0      NaN        2.0       right_only
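
One common use of the indicator column is to keep only the rows that failed to match. A hedged sketch with the same two frames, filtering on the default _merge column:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

res = pd.merge(df1, df2, on="col1", how="outer", indicator=True)

left_only = res[res["_merge"] == "left_only"]   # rows of df1 with no partner in df2
matched = res[res["_merge"] == "both"]          # rows found in both frames

print(left_only)
print(matched)
```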

Merging on the index

import pandas as pd

# Define the datasets and print them
left = pd.DataFrame({"A": ["A0", "A1", "A2"],
                     "B": ["B0", "B1", "B2"]},
                     index=["K0", "K1", "K2"])
right = pd.DataFrame({"C": ["C0", "C2", "C3"],
                      "D": ["D0", "D2", "D3"]},
                     index=["K0", "K2", "K3"])

print(left)
#     A   B
# K0  A0  B0
# K1  A1  B1
# K2  A2  B2

print(right)
#     C   D
# K0  C0  D0
# K2  C2  D2
# K3  C3  D3

# Merge on the indexes of the left and right datasets with how="outer", and print the result
res = pd.merge(left, right, left_index=True, right_index=True, how="outer")
print(res)
#      A    B    C    D
# K0   A0   B0   C0   D0
# K1   A1   B1  NaN  NaN
# K2   A2   B2   C2   D2
# K3  NaN  NaN   C3   D3

# Merge on the indexes of the left and right datasets with how="inner", and print the result
res = pd.merge(left, right, left_index=True, right_index=True, how="inner")
print(res)
#     A   B   C   D
# K0  A0  B0  C0  D0
# K2  A2  B2  C2  D2
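
For index-on-index merges, DataFrame.join is a shorter spelling. The sketch below compares it with the merge call used above; the results are sorted by index before comparing so the check does not depend on row order.

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2"],
                     "B": ["B0", "B1", "B2"]},
                    index=["K0", "K1", "K2"])
right = pd.DataFrame({"C": ["C0", "C2", "C3"],
                      "D": ["D0", "D2", "D3"]},
                     index=["K0", "K2", "K3"])

via_merge = pd.merge(left, right, left_index=True, right_index=True, how="outer")
via_join = left.join(right, how="outer")   # joins on the index by default

print(via_join.sort_index().equals(via_merge.sort_index()))
print(via_join)
```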

Resolving overlapping column names

import pandas as pd

# Define the datasets
boys = pd.DataFrame({"k": ["K0", "K1", "K2"], "age": [1, 2, 3]})
girls = pd.DataFrame({"k": ["K0", "K0", "K3"], "age": [4, 5, 6]})

# Use suffixes to resolve the overlapping column names
res = pd.merge(boys, girls, on="k", suffixes=["_boy", "_girl"], how="inner")
print(res)
#    age_boy   k  age_girl
# 0        1  K0         4
# 1        1  K0         5
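
When suffixes is not given, merge falls back to the default suffixes _x and _y for overlapping column names. A short sketch comparing the default with the custom names:

```python
import pandas as pd

boys = pd.DataFrame({"k": ["K0", "K1", "K2"], "age": [1, 2, 3]})
girls = pd.DataFrame({"k": ["K0", "K0", "K3"], "age": [4, 5, 6]})

default = pd.merge(boys, girls, on="k", how="inner")                             # age_x, age_y
named = pd.merge(boys, girls, on="k", how="inner", suffixes=("_boy", "_girl"))   # age_boy, age_girl

print(sorted(default.columns))
print(sorted(named.columns))
```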

7. Plotting with plot

Now let's learn how to visualize data. First import the modules we need: besides pandas, we use numpy to generate some data, and matplotlib is used here only to show the figure, via plt.show().

Creating a Series

This is one-dimensional data: we randomly generate 1000 values. A Series index defaults to integers starting from 0, but here it is assigned explicitly to make that clearer.

# -*- coding:utf-8 -*-

"""
@author: Corwien
@file: pd_plot.py
@time: 18/9/1 10:59
"""
   
import pandas as pd  # load the module
import numpy as np
import matplotlib.pyplot as plt

# Randomly generate 1000 values
data = pd.Series(np.random.randn(1000), index=np.arange(1000))

# print(data)
# To make the effect easier to see, take the cumulative sum of the data
# (cumsum returns a new Series, so assign the result back)
data = data.cumsum()

# print("\n=======\n")
# print(data)

# pandas data can be plotted directly
data.plot()

plt.show()

Printed data:

0      1.055920
1      2.151946
2      0.376157
3     -1.279114
4      0.584658
5      1.178072
6      0.873750
7     -1.039058
8     -0.892274
9     -0.532982
10     0.040962

         ...   

990    0.663714
991    0.013612
992   -1.993561
993    0.238042
994    0.696388
995    1.275367
996   -1.660392
997   -0.795660
998    1.062841
999    0.200333
Length: 1000, dtype: float64
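
One detail worth checking in the plotting step is that cumsum does not modify the Series in place. The sketch below keeps the raw values and the running total in separate variables and plots only the total. The fixed seed and the non-interactive Agg backend are assumptions made so the block runs without a display; in an interactive session, call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                       # fixed seed, for repeatability only
data = pd.Series(rng.standard_normal(1000), index=np.arange(1000))

walk = data.cumsum()        # new Series holding the running total; data is unchanged
ax = walk.plot()            # returns the matplotlib Axes the line was drawn on

print(data.equals(walk))
print(len(ax.lines))
```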

DataFrame visualization

We generate a 1000x4 DataFrame and take its cumulative sum.

data = pd.DataFrame(
    np.random.randn(1000,4),
    index=np.arange(1000),
    columns=list("ABCD")
    )
data = data.cumsum()  # cumsum returns a new DataFrame, so assign it back
data.plot()
plt.show()

Output:

            A         B         C         D
0   -0.240516  1.689101  2.195897 -1.011582
1   -1.067106  1.908657 -0.534270  0.016602
2   -0.239367  0.033567 -0.782701  0.746416
3   -0.104149 -0.756916 -0.984102  0.126436
4   -3.228259 -0.380957 -0.129879  0.738176
5    0.454551 -0.213664  0.200234  0.920599
6   -0.931042  0.731300 -1.424736  0.185456
7    1.823043  0.333958 -0.375364  0.371867
8   -1.407975  0.209401 -1.387218 -0.236411
9   -0.286918 -0.599334 -1.266337 -0.707990
10  -0.205903 -0.942891  1.650707  0.467071

..        ...       ...       ...       ...

994 -1.143698  1.159974 -0.433339 -0.705888
995  0.507159 -0.295003  0.534483 -0.925546
996  1.470531 -0.484951  0.087811 -1.393423
997 -0.225130  0.717332 -0.117851 -0.849506
998 -1.078925 -0.688264 -0.133773 -0.803970
999 -0.589185  0.649868  1.436989 -0.553600

[1000 rows x 4 columns]

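This is the four-column data we just generated; because there are four series, each one is plotted as its own line. plot accepts many parameters; see the pandas documentation for the details.

Besides plot, I also often use scatter, which draws a scatter plot. First, here are plotting methods that pandas provides:

bar

hist

box

kde

area

scatter

hexbin

We will not go through them one by one today; the focus is on plot and scatter. Because scatter takes just two coordinates, x and y, we can assign a data column to each of them.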

ax = data.plot.scatter(x="A",y="B",color="DarkBlue",label="Class1")

Then we can draw another set on the same ax, choosing a different data column, color, and label.

# Draw this data on the ax created above
data.plot.scatter(x="A",y="C",color="LightGreen",label="Class2",ax=ax)
plt.show()

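Below is the figure I plotted.

These are the two presentation styles covered today: a line plot and a scatter plot.

Copyright belongs to the author; do not reproduce without permission. If this article violates any rules, you may contact the administrator to have it removed.

When reposting, please cite this article's address: http://specialneedsforspecialkids.com/yun/42320.html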
