How to find and remove duplicate data with Python pandas

89542767, published 2022-10-08 14:56 / 715 reads

This article explains how to use Python pandas to find duplicate data and then remove it. The details follow.

Preface

When processing data with Python pandas, we frequently run into duplicated data. In that case we need to locate the duplicated entries and, usually, remove them. pandas provides two efficient methods for this: duplicated() and drop_duplicates().

1. duplicated()

duplicated() is available on three kinds of objects: pandas.DataFrame.duplicated, pandas.Series.duplicated and pandas.Index.duplicated. They are used in much the same way. The first two return a boolean Series, and the last one returns a boolean numpy.ndarray.
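As a quick check of the return types described above, here is a minimal sketch. The three-row frame is made-up sample data (a shortened version of the article's example), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; rows 0 and 1 are identical.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie"],
    "style": ["cup", "cup", "pack"],
})

frame_mask = df.duplicated()                     # boolean Series, one entry per row
series_mask = df["brand"].duplicated()           # boolean Series, one entry per value
index_mask = pd.Index(df["brand"]).duplicated()  # boolean numpy.ndarray

print(type(frame_mask).__name__)
print(type(series_mask).__name__)
print(type(index_mask).__name__)
```

Because the Index variant returns a plain array rather than a Series, it carries no labels of its own; it lines up with the index by position.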

DataFrame.duplicated(subset=None, keep='first')

subset: defaults to None. A column label or sequence of column labels to consider when identifying duplicates. With None, all columns are compared.

keep: defaults to 'first'. Determines which occurrences are marked as duplicates.

'first': mark every duplicate as True except the first occurrence

'last': mark every duplicate as True except the last occurrence

False: mark all duplicated entries as True (including the first occurrence)

Series.duplicated(keep='first')

keep: same as keep in DataFrame.duplicated

Index.duplicated(keep='first')

keep: same as keep in DataFrame.duplicated

Example:

import pandas as pd
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})
df

     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

df.duplicated()

0    False
1     True
2    False
3    False
4    False
dtype: bool

df.duplicated(keep='last')

0     True
1    False
2    False
3    False
4    False
dtype: bool

df.duplicated(keep=False)

0     True
1     True
2    False
3    False
4    False
dtype: bool

df.duplicated(subset=['brand'])

0    False
1     True
2    False
3     True
4     True
dtype: bool
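The boolean masks shown above become useful once they are used to select rows. The following is a minimal sketch of that step, reusing the article's sample data; the variable names dup_rows, repeat_pairs and n_repeats are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Every row that belongs to a group of fully identical rows.
dup_rows = df[df.duplicated(keep=False)]

# Rows whose (brand, style) pair already appeared in an earlier row.
repeat_pairs = df[df.duplicated(subset=["brand", "style"])]

# Number of rows flagged as repeats of an earlier row.
n_repeats = int(df.duplicated().sum())

print(dup_rows)
print(repeat_pairs)
print(n_repeats)
```

Inspecting the flagged rows this way before deleting anything is a cheap safeguard, since the choice of subset changes which rows count as duplicates.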

Marking duplicates on an Index:

df = df.set_index('brand')
df

        style  rating
brand
Yum Yum   cup     4.0
Yum Yum   cup     4.0
Indomie   cup     3.5
Indomie  pack    15.0
Indomie  pack     5.0

df.index.duplicated()

array([False,  True, False,  True,  True])
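Since Index.duplicated() returns a plain boolean array, one common way to drop rows with repeated index labels is to negate that array and use it as a row filter. This is a sketch under the assumption that keeping the first row per label is what you want; first_per_brand is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
}).set_index("brand")

# ~ inverts the mask, so only the first row for each index label is kept.
first_per_brand = df[~df.index.duplicated(keep="first")]

print(first_per_brand)
```

Note that this compares index labels only; the column values of the discarded rows are not considered.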

2. drop_duplicates()

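drop_duplicates() works like duplicated(), except that it removes the duplicated entries directly instead of marking them. Only the parameters whose meaning differs are described below.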

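DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

subset: same as in duplicated()

keep: same as in duplicated()

inplace: same as inplace in other pandas functions. If True, the existing DataFrame is modified and None is returned; if False (the default), a new DataFrame is returned.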

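Compared with Series.duplicated(), Series.drop_duplicates() likewise adds an inplace parameter, with the meaning described above. Index.drop_duplicates() takes the same parameters as Index.duplicated(), so it is not covered again here. An example follows: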

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})
df

     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

df.drop_duplicates()

     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
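The subset and keep parameters behave the same way here as in duplicated(). The sketch below reuses the article's sample data to show that, and also shows that dropping duplicates is equivalent to filtering with the negated duplicated() mask. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# Keep only the last row for each brand.
last_per_brand = df.drop_duplicates(subset=["brand"], keep="last")

# The same result, built by hand from the duplicated() mask.
via_mask = df[~df.duplicated(subset=["brand"], keep="last")]

# keep=False removes every member of a duplicate group, not just the repeats.
no_dup_groups = df.drop_duplicates(keep=False)

print(last_per_brand)
print(no_dup_groups)
```

The original row labels are preserved in each result; call reset_index(drop=True) afterwards if a fresh 0..n-1 index is needed.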

df.drop_duplicates(inplace=True)  # modifies df in place and returns None
df

     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
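Two points from this section are easy to trip over: with inplace=True the call itself returns None, and the Series and Index variants work the same way on one-dimensional data. A minimal sketch follows; the Series and Index values are made-up sample data and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# With inplace=True, df itself is changed and the call returns None,
# so assigning the result back to df would lose the data.
result = df.drop_duplicates(inplace=True)

# Series: repeated values are dropped, original labels are kept.
brands = pd.Series(["Yum Yum", "Yum Yum", "Indomie"])
unique_brands = brands.drop_duplicates()

# Index: keep='last' retains the final occurrence of each label.
labels = pd.Index(["a", "b", "a"])
unique_labels = labels.drop_duplicates(keep="last")

print(result)
print(len(df))
print(unique_brands)
print(unique_labels)
```

Writing df = df.drop_duplicates() without inplace avoids the None pitfall altogether and is often the simpler choice.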

That concludes this introduction to handling duplicates in pandas. I hope it is helpful to readers.


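Copyright of this article belongs to the author. Please do not reproduce it without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reproducing, please cite this article's address: http://specialneedsforspecialkids.com/yun/127960.html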
