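This post collects the CSV file read/write operations, and the conversions between CSV files and common Python data structures (such as dicts and lists), that I used in an earlier data-matching experiment done in Python.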
(Python Version 2.7)
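Converting between CSV files and lists
  Converting a list to a CSV file
  Converting a list of dicts to a CSV file

Converting a list to a CSV file
The most basic conversion: write each element of the list to its own row of the CSV file.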
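    import csv

    def list2csv(lst, out_file):
        # Python 2.7: open the file in binary mode for the csv module;
        # the with block closes the file when writing is done
        with open(out_file, "wb") as f:
            wr = csv.writer(f, quoting=csv.QUOTE_ALL)
            for word in lst:
                wr.writerow([word])

Converting a list of dicts to a CSV file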
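This is the typical CSV case: the first row is usually a header naming each field, and every following row holds the values for those fields. When reading such a file, each row is commonly stored as a dict (keys are the header fields, values are that row's values), for example: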
    my_list = [
        {"players.vis_name": "Khazri", "players.role": "Midfielder",
         "players.country": "Tunisia", "players.last_name": "Khazri",
         "players.player_id": "989", "players.first_name": "Wahbi",
         "players.date_of_birth": "08/02/1991", "players.team": "Bordeaux"},
        {"players.vis_name": "Khazri", "players.role": "Midfielder",
         "players.country": "Tunisia", "players.last_name": "Khazri",
         "players.player_id": "989", "players.first_name": "Wahbi",
         "players.date_of_birth": "08/02/1991", "players.team": "Sunderland"},
        {"players.vis_name": "Lewis Baker", "players.role": "Midfielder",
         "players.country": "England", "players.last_name": "Baker",
         "players.player_id": "9574", "players.first_name": "Lewis",
         "players.date_of_birth": "25/04/1995", "players.team": "Vitesse"},
    ]
All of these dicts end up stored in one list. What follows is the reverse process: writing such a list of dicts back out to a CSV file.
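    import csv

    # write a list of dicts (all sharing the same keys) to csv
    def nestedlist2csv(lst, out_file):
        with open(out_file, "wb") as f:  # Python 2.7: binary mode for csv
            w = csv.writer(f)
            # take the header from the first dict so it is written automatically
            fieldnames = lst[0].keys()
            w.writerow(fieldnames)
            for row in lst:
                # index by fieldnames so every row follows the header order
                w.writerow([row[k] for k in fieldnames])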
Note that fieldnames carries the keys, that is, the header fields written as the first row.
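As a cross-check of the same idea, here is a minimal sketch that uses csv.DictWriter, which writes the header and aligns every row to it. It is written for Python 3 (not the Python 2.7 used in this post), it writes to an in-memory buffer instead of a file, and the helper name dicts_to_csv_text is made up for illustration.

```python
import csv
import io


def dicts_to_csv_text(rows, fieldnames=None):
    """Serialize a list of dicts to CSV text with a header row (hypothetical helper)."""
    if not rows:
        return ""
    if fieldnames is None:
        # Assumption: every dict has the same keys as the first one.
        fieldnames = list(rows[0].keys())
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # values are looked up by key, so key order per dict does not matter
    return buf.getvalue()


sample = [
    {"players.player_id": "989", "players.team": "Bordeaux"},
    {"players.team": "Sunderland", "players.player_id": "989"},
]
csv_text = dicts_to_csv_text(sample)
print(csv_text)
```

When writing to a real file in Python 3, the csv documentation recommends opening it with open(path, "w", newline="") rather than "wb".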
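Converting between CSV files and dicts
  Converting a CSV file to a dict
    First row holds the keys, the remaining rows hold the values
    Each row is a key/value record
  Converting a CSV file to a two-level dict
  Converting a dict to a CSV file
    First row holds the keys, the remaining rows hold the values
    Each row is a key/value record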
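Converting a CSV file to a dict
For the common case where the first row holds the field names and the remaining rows hold the values: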
    import csv

    # convert csv file to dict
    # @params:
    # key/value: the columns of the original csv file to use as the key
    #            and the value of the dict
    def csv2dict(in_file, key, value):
        new_dict = {}
        with open(in_file, "rb") as f:  # Python 2.7
            # read the header row first, then hand it to DictReader
            reader = csv.reader(f, delimiter=",")
            fieldnames = next(reader)
            reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=",")
            for row in reader:
                new_dict[row[key]] = row[value]
        return new_dict
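In new_dict[row[key]] = row[value], "key" and "value" are column names taken from the first row of the CSV file. Note that this assumes a simple CSV file in which the column chosen as the key is unique. Otherwise, later rows overwrite earlier rows with the same key and data is lost. If the key column does contain duplicates, build a dict of lists instead, with each value being a list; see the dict-of-lists code in this post.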
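The overwrite problem is easy to see on a tiny input. The sketch below is Python 3 and reads from an in-memory string; the rows are made-up sample data in which the "country" column is deliberately not unique.

```python
import csv
import io

# Hypothetical data: "China" appears twice in the key column.
text = "country,name\nChina,danny\nAmerica,Lancelot\nChina,li\n"

plain = {}    # one value per key: later rows replace earlier ones
grouped = {}  # one list per key: every row is kept
for row in csv.DictReader(io.StringIO(text)):
    plain[row["country"]] = row["name"]
    grouped.setdefault(row["country"], []).append(row["name"])

print(plain)
print(grouped)
```

The plain dict silently keeps only the last name seen for "China", while the grouped dict keeps both.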
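For the special case where every row is a key/value pair
Here the first column is taken as the key of the resulting dict and the second column as the value; change this as needed.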
    import csv

    # convert csv file to dict (one key-value pair per row)
    def row_csv2dict(csv_file):
        dict_club = {}
        with open(csv_file) as f:
            reader = csv.reader(f, delimiter=",")
            for row in reader:
                # column 0 is the key, column 1 is the value
                dict_club[row[0]] = row[1]
        return dict_club
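[Update]
Build a dict whose values are lists. This mainly suits the case where the values of certain CSV columns should be collected under the value of another column.
In other words, the data does not fit a plain dict because one key maps to more than one value.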
    import csv

    # build a dict of lists like {key: [...elements of lst_inner_value...]}
    # key is a column name of the csv file
    # lst_inner_value is a list of column names of the csv file
    def build_list_dict(source_file, key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:  # Python 2.7
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                for element in lst_inner_value:
                    new_dict.setdefault(row[key], []).append(row[element])
        return new_dict

    # sample:
    # test_club = build_list_dict("test_info.csv", "season", ["move from", "move to"])
    # print test_club

Converting a CSV file to a two-level dict
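This is usually for special purposes: it structures the CSV data further by taking the values of one column (field) as keys and building a sub-dict from the remaining key/value pairs as the value. It is typically used in matching, to filter on the outer key first and so build a hierarchy that improves accuracy.
For example, suppose the CSV file has the following records (shown as a table):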
id | name | age | country |
---|---|---|---|
1 | danny | 21 | China |
2 | Lancelot | 22 | America |
... | ... | ... | ... |
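After the two-level conversion (assuming country and then name as the two levels) we get the following dict:
    dct = {"China": {"danny": {"id": "1", "age": "21"}},
           "America": {"Lancelot": {"id": "2", "age": "22"}}}
The code is as follows: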
    import csv

    # build a specific nested dict from a csv file (country, then name)
    def build_level2_dict(source_file):
        new_dict = {}
        with open(source_file, "rb") as csv_file:  # Python 2.7
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                # fetch the sub-dict for this country, or start a new one
                item = new_dict.get(row["country"], dict())
                item[row["name"]] = {k: row[k] for k in ("id", "age")}
                new_dict[row["country"]] = item
        return new_dict
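To illustrate the "filter on the outer level first" use mentioned above, here is a small lookup sketch over the example dict. The lookup helper is hypothetical and not part of the original toolkit; the point is only that a missing country or name yields None instead of raising KeyError.

```python
# The two-level dict from the example above.
dct = {
    "China": {"danny": {"id": "1", "age": "21"}},
    "America": {"Lancelot": {"id": "2", "age": "22"}},
}


def lookup(nested, country, name):
    """Return the record for (country, name), or None if either level is missing."""
    return nested.get(country, {}).get(name)


print(lookup(dct, "China", "danny"))     # record found under the right country
print(lookup(dct, "China", "Lancelot"))  # right name, wrong country: no match
```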
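[Update]
After further improvement, a more flexible way to build the two-level dict is available: instead of editing the code inside the function, you pass in the key and value columns. There are two variants; pick the one you need.
In the first variant, the key of each level and the innermost value are each explicitly set to the values of a chosen column: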
    # build specific nested dict from csv files
    # @params:
    # source_file
    # outer_key: the outer level key of nested dict
    # inner_key: the inner level key of nested dict
    # inner_value: set the inner value for the inner key
    def build_level2_dict2(source_file, outer_key, inner_key, inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                item[row[inner_key]] = row[inner_value]
                new_dict[row[outer_key]] = item
        return new_dict
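In the second variant, you specify the first-level and second-level keys, and the remaining key/value pairs of each CSV row are stored as the innermost value: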
    import csv

    # build specific nested dict from csv files
    # @params:
    # source_file
    # outer_key: the outer level key of nested dict
    # inner_key: the inner level key of nested dict; the remaining
    #            key-value pairs are stored as the value of the inner key
    def build_level2_dict(source_file, outer_key, inner_key):
        new_dict = {}
        with open(source_file, "rb") as csv_file:  # Python 2.7
            data = csv.DictReader(csv_file, delimiter=",")
            # DictReader reads the header itself; keep every column
            # except the two used as keys
            inner_keyset = [k for k in data.fieldnames
                            if k not in (outer_key, inner_key)]
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                item[row[inner_key]] = {k: row[k] for k in inner_keyset}
                new_dict[row[outer_key]] = item
        return new_dict
There is another way to build a two-level dict, using the pop() method. Personally I find it less intuitive than the one above, but here it is:
    import csv
    from collections import defaultdict

    def build_dict(source_file):
        projects = defaultdict(dict)
        # if there is no header within the csv file you need to set the header
        # and utilize fieldnames parameter in csv.DictReader method
        # headers = ["id", "name", "age", "country"]
        with open(source_file, "rb") as fp:  # Python 2.7
            reader = csv.DictReader(fp, dialect="excel", skipinitialspace=True)
            for rowdict in reader:
                # extra cells beyond the header are collected under the None key
                if None in rowdict:
                    del rowdict[None]
                nationality = rowdict.pop("country")
                name = rowdict.pop("name")
                # whatever is left in rowdict becomes the innermost value
                projects[nationality][name] = rowdict
        return dict(projects)
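[Update]
Here are some more ways to build a two-level dict. They are mainly for CSV files that do not fit a plain dict structure because some keys map to multiple values, so either the values are kept in a list inside the dict, or each group of key/value pairs is kept as an entry in a list.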
    # build specific nested dict from csv files
    # @params:
    # source_file
    # outer_key: the outer level key of nested dict
    # lst_inner_value: a list of column names, for the circumstance that the
    #                  inner values of the same outer_key are not distinct
    # {outer_key: [{pairs of lst_inner_value}]}
    def build_level2_dict3(source_file, outer_key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                new_dict.setdefault(row[outer_key], []).append(
                    {k: row[k] for k in lst_inner_value})
        return new_dict
    # build specific nested dict from csv files
    # @params:
    # source_file
    # outer_key: the outer level key of nested dict
    # lst_inner_value: a list of column names, for the circumstance that the
    #                  inner values of the same outer_key are not distinct
    # {outer_key: {key of lst_inner_value: [...value of lst_inner_value...]}}
    def build_level2_dict4(source_file, outer_key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                # print row
                item = new_dict.get(row[outer_key], dict())
                # item.setdefault("move from", []).append(row["move from"])
                # item.setdefault("move to", []).append(row["move to"])
                for element in lst_inner_value:
                    item.setdefault(element, []).append(row[element])
                new_dict[row[outer_key]] = item
        return new_dict
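To make the difference between the result shapes of build_level2_dict3 and build_level2_dict4 concrete, the sketch below applies the same two grouping steps to a couple of in-memory rows. It is Python 3, skips the file reading entirely, and the rows are made-up sample data that only reuse the column names from the commented sample call ("season", "move from", "move to").

```python
# Hypothetical rows, as csv.DictReader would yield them.
rows = [
    {"season": "2015", "move from": "Bordeaux", "move to": "Sunderland"},
    {"season": "2015", "move from": "Chelsea", "move to": "Vitesse"},
]
cols = ["move from", "move to"]

by_record = {}  # shape of build_level2_dict3: {outer: [{col: value, ...}, ...]}
by_column = {}  # shape of build_level2_dict4: {outer: {col: [value, ...]}}
for row in rows:
    by_record.setdefault(row["season"], []).append({k: row[k] for k in cols})
    item = by_column.setdefault(row["season"], {})
    for k in cols:
        item.setdefault(k, []).append(row[k])

print(by_record)
print(by_column)
```

The first shape keeps each row's values together, which suits per-record matching; the second collects values per column, which suits membership checks on a single column.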
    import csv

    # build specific nested dict from csv files
    # @params:
    # source_file
    # outer_key: the outer level key of nested dict
    # lst_inner_key: a single column name whose values become the inner keys
    # lst_inner_value: a single column name, for the circumstance that the
    #                  values under the same inner key are not distinct
    # {outer_key: {inner key: [...inner values...]}}
    def build_list_dict2(source_file, outer_key, lst_inner_key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:  # Python 2.7
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                # print row
                item = new_dict.get(row[outer_key], dict())
                item.setdefault(row[lst_inner_key], []).append(row[lst_inner_value])
                new_dict[row[outer_key]] = item
        return new_dict

    # dct = build_list_dict2("test_info.csv", "season", "move from", "move to")

Building a three-level dict
Similarly, a three-level or even deeper dict can be built from a CSV file. The approach is the same as above, so I will just paste the code.
    import csv

    # build specific nested dict from csv files
    # a dict like {outer_key: {inner_key1: {inner_key2: {rest_key: rest_value...}}}}
    # the params are column names of the csv file, chosen as you like
    def build_level3_dict(source_file, outer_key, inner_key1, inner_key2):
        new_dict = {}
        with open(source_file, "rb") as csv_file:  # Python 2.7
            data = csv.DictReader(csv_file, delimiter=",")
            # keep every column except the three used as keys
            inner_keyset = [k for k in data.fieldnames
                            if k not in (outer_key, inner_key1, inner_key2)]
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                sub_item = item.get(row[inner_key1], dict())
                sub_item[row[inner_key2]] = {k: row[k] for k in inner_keyset}
                item[row[inner_key1]] = sub_item
                new_dict[row[outer_key]] = item
        return new_dict

    # build specific nested dict from csv files
    # a dict like {outer_key: {inner_key1: {inner_key2: inner_value}}}
    # the params are column names of the csv file, chosen as you like
    def build_level3_dict2(source_file, outer_key, inner_key1, inner_key2, inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:  # Python 2.7
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                sub_item = item.get(row[inner_key1], dict())
                sub_item[row[inner_key2]] = row[inner_value]
                item[row[inner_key1]] = sub_item
                new_dict[row[outer_key]] = item
        return new_dict
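Again there are two variants for different needs: one stores the remaining key/value pairs unchanged as the innermost value, the other keeps only the column you ask for.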
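Since the two-level and three-level functions differ only in how many key columns they peel off, the pattern can be generalized to any depth. The following is a sketch of that generalization, not code from the original toolkit: it is Python 3, the function name build_nested_dict is made up, and it assumes key_columns is a non-empty list of existing column names.

```python
import csv
import io


def build_nested_dict(rows, key_columns):
    """Nest rows by key_columns in order; the remaining columns become the leaf dict."""
    nested = {}
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in key_columns}
        node = nested
        # Walk (and create) one dict level per key column except the last.
        for col in key_columns[:-1]:
            node = node.setdefault(row[col], {})
        # The last key column points at the leftover key/value pairs.
        node[row[key_columns[-1]]] = rest
    return nested


# The sample table from this post, read from an in-memory string.
text = "id,name,age,country\n1,danny,21,China\n2,Lancelot,22,America\n"
nested = build_nested_dict(csv.DictReader(io.StringIO(text)),
                           ["country", "age", "name"])
print(nested)
```

As with the fixed-depth versions, rows that share the full key path overwrite each other.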
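There is one more special case: when the innermost value should not be a single element but a list holding several elements that share the same key, only the innermost assignment needs to change.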
    # build specific nested dict from csv files
    # a dict like {outer_key: {inner_key1: {inner_key2: [inner_value]}}}
    # for multiple inner_value with the same inner_key2, thus gather them in a list
    # the params are column names of the csv file, chosen as you like
    def build_level3_dict3(source_file, outer_key, inner_key1, inner_key2, inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                sub_item = item.get(row[inner_key1], dict())
                sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
                item[row[inner_key1]] = sub_item
                new_dict[row[outer_key]] = item
        return new_dict
The core of it is this line:
    sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
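For readers unfamiliar with dict.setdefault: it returns the existing value for a key, or stores and returns the given default when the key is missing, so the append always lands in the right list. A minimal sketch with made-up keys and values, alongside the equivalent collections.defaultdict form:

```python
from collections import defaultdict

sub_item = {}
sub_item.setdefault("k", []).append("v1")  # "k" is missing: a new list is stored, then appended to
sub_item.setdefault("k", []).append("v2")  # "k" exists: the stored list is returned and reused

# The same grouping with defaultdict, which creates the list on first access.
grouped = defaultdict(list)
grouped["k"].append("v1")
grouped["k"].append("v2")

print(sub_item)
print(dict(grouped))
```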
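Converting a dict to a CSV file
  Each row is a key/value record
  First row holds the keys, the next row holds the values
  Writing out a dict of lists

These are the reverse of the CSV-to-dict conversions above. They are simple enough that I will just paste the code.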
    import csv

    # each row is a key/value record
    def dict2csv(dct, out_file):
        with open(out_file, "wb") as f:  # Python 2.7
            w = csv.writer(f)
            # write each key/value pair on a separate row
            w.writerows(dct.items())
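    import csv

    # first row holds the keys, the next row holds the values
    # (same function name as the variant above; keep only the one you need)
    def dict2csv(dct, out_file):
        with open(out_file, "wb") as f:  # Python 2.7
            w = csv.writer(f)
            # write all keys on one row and all values on the next
            w.writerow(dct.keys())
            w.writerow(dct.values())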
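A quick way to check the key/value-per-row form is a round trip: write a dict out, read it back, and compare. The sketch below is Python 3 (files are opened in text mode with newline="" instead of "wb"/"rb"), writes to a temporary directory, and uses the hypothetical name dict2csv_rows to avoid clashing with the functions above. Note that everything comes back as strings.

```python
import csv
import os
import tempfile


def dict2csv_rows(mapping, path):
    """Write one key/value pair per row (Python 3 variant, hypothetical name)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(mapping.items())


def row_csv2dict(path):
    """Read a two-column CSV back into a dict: column 0 is the key, column 1 the value."""
    with open(path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f)}


clubs = {"Khazri": "Sunderland", "Baker": "Vitesse"}
path = os.path.join(tempfile.mkdtemp(), "clubs.csv")
dict2csv_rows(clubs, path)
restored = row_csv2dict(path)
print(restored == clubs)
```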
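Writing out a dict of lists is actually not that common. The reverse is more usual, that is, loading a regular CSV file into a dict of lists (a dict whose keys come from the header row, with the remaining rows supplying, column by column, the list of values for each key). Still, if you do need to save such a structure as a CSV file, it can be done as follows: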
    import csv
    import pandas as pd
    from collections import OrderedDict

    dct = OrderedDict()
    dct["a"] = [1, 2, 3, 4]
    dct["b"] = [5, 6, 7, 8]
    dct["c"] = [9, 10, 11, 12]

    header = dct.keys()
    # turn the dict of lists into a list of dicts, one per output row
    rows = pd.DataFrame(dct).to_dict("records")

    with open("outTest.csv", "wb") as f:  # Python 2.7
        f.write(",".join(header))
        f.write("\n")
        for data in rows:
            f.write(",".join(str(data[h]) for h in header))
            f.write("\n")
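The pandas call in the snippet above is only used to turn the dict of lists into a list of dicts. If pulling in pandas just for that step is undesirable, the same transposition can be sketched with zip. This is an alternative written for Python 3, not the approach used in the original snippet, and it builds the CSV text in memory instead of writing outTest.csv.

```python
from collections import OrderedDict

dct = OrderedDict()
dct["a"] = [1, 2, 3, 4]
dct["b"] = [5, 6, 7, 8]
dct["c"] = [9, 10, 11, 12]

header = list(dct.keys())
# zip(*columns) yields one tuple per output row; pair each tuple with the header.
rows = [dict(zip(header, values)) for values in zip(*dct.values())]

lines = [",".join(header)]
lines += [",".join(str(row[h]) for h in header) for row in rows]
csv_text = "\n".join(lines) + "\n"
print(csv_text)
```

Note that zip stops at the shortest column, so this assumes all lists have the same length.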
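Three packages are imported here. csv is the usual package for reading and writing CSV files, OrderedDict keeps the columns in their original order in the output CSV file, and pandas handles the intermediate step of turning a dict of lists into a list of dicts. For example, it converts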
    [("a", [1, 2, 3, 4]), ("b", [5, 6, 7, 8]), ("c", [9, 10, 11, 12])]
to
    [{"a": 1, "c": 9, "b": 5}, {"a": 2, "c": 10, "b": 6}, {"a": 3, "c": 11, "b": 7}, {"a": 4, "c": 12, "b": 8}]

Reading CSV files with unusual delimiters
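This section is for CSV files with unusual delimiters. Normally a CSV file uses one delimiter throughout and nothing special is needed (the operations above all assume , as the only delimiter). A file whose header row is delimited by , while all value rows are delimited by ; has to be read a little differently. In general, convert it row by row into dicts and then work on those. The code is as follows: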
    import csv

    def func(id_list, input_file, output_file):
        wanted = set(id_list)  # build the lookup set once, not once per row
        with open(input_file, "rb") as f:  # Python 2.7
            # the delimiter for the header is "," while the rows use ";"
            reader = csv.reader(f, delimiter=",")
            fieldnames = next(reader)
            reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=";")
            rows = [row for row in reader if row["players.player_id"] in wanted]
            # operation on rows...
Change the delimiters as needed.
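For a version that can be tried without a data file, here is a Python 3 sketch of the same two-step read with the delimiters as parameters. The function name read_mixed_delimiters and the sample text are made up for illustration; only the column name players.player_id and the sample values are taken from the data shown earlier in this post.

```python
import csv
import io


def read_mixed_delimiters(f, header_delimiter=",", row_delimiter=";"):
    """Parse the first line with one delimiter and all remaining lines with another."""
    header = next(csv.reader([f.readline()], delimiter=header_delimiter))
    return list(csv.DictReader(f, fieldnames=header, delimiter=row_delimiter))


# Hypothetical file content: "," in the header row, ";" in the value rows.
text = "players.player_id,players.team\n989;Bordeaux\n9574;Vitesse\n"
rows = read_mixed_delimiters(io.StringIO(text))

wanted = {"989"}
selected = [row for row in rows if row["players.player_id"] in wanted]
print(selected)
```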
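These are roughly all the CSV problems I ran into during the experiment. Most of them can be solved by searching stackoverflow or by asking there yourself; the people there are very helpful. Later I will summarize some of the other data processing from the experiment, such as format conversion, de-duplication and duplicate detection.
Finally, the source code is published on github in csv_toolkit. Feel free to play with it.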
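Changelog
1. 2016-12-22: improved the two-level dict construction to make it more flexible
2. 2016-12-24 14:55:30: added three-level dict construction
3. 2017-01-09 11:26:59: the innermost level can now hold a list of elements from a specified column
4. 2017-01-16 10:29:44: added dict-of-lists construction; added special two-level dicts (for keeping multiple values under the same key)
5. 2017-02-09 10:54:41: added a new two-level dict of lists
6. 2017-02-10 11:18:01: improved the simple CSV-to-dict code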