P05-2 Load and dump ubike json data

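This example loads Taipei City's real-time YouBike (ubike) data. The data is stored as JSON, but it is not a list of dicts. It is a dict of dicts, and an extra outer dictionary wraps the data together with a return code that reports whether the transfer succeeded. So, compared with typical examples, this one has to locate the node that holds the data, and when decoding it into Python objects it also has to deal with the two levels of dictionaries in the data.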
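To make the dict-of-dicts shape concrete, here is a minimal sketch that flattens it into the more common list-of-dicts form. It uses a small hand-written sample that mirrors the structure printed later on this page (an outer dict with retCode and retVal, and station IDs as second-level keys). The sample and the helper name to_list_of_dicts are illustrative, not part of the original code.

```python
# Hand-written sample mirroring the structure of the ubike feed shown on this page:
# an outer wrapper {"retCode": ..., "retVal": {...}} whose retVal maps
# station IDs to per-station dicts. Trimmed values, not live data.
sample = {
    "retCode": 1,
    "retVal": {
        "0001": {"sno": "0001", "sna": "捷運市政府站(3號出口)", "tot": "84", "sbi": "74"},
        "0002": {"sno": "0002", "sna": "捷運國父紀念館站(2號出口)", "tot": "16", "sbi": "4"},
    },
}


def to_list_of_dicts(payload):
    """Hypothetical helper: unwrap retVal and drop the station-ID keys."""
    stations = payload["retVal"]    # the node that actually holds the data
    return list(stations.values())  # one dict per station, in the original order


records = to_list_of_dicts(sample)
print(len(records))       # 2
print(records[0]["sna"])  # 捷運市政府站(3號出口)
```

In the sample, each second-level key equals the record's own sno field, so dropping the keys loses nothing. If that ever stops holding, build the list from stations.items() instead so the key is kept.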

Read and Write JSON

Step 1. Use requests to get data from the internet

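In the code below, I use the requests package to send a GET request with get(), and then print some of what the request returns to confirm that I actually received the data I asked for.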

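import requests
import json

# Send a GET request for the real-time YouBike JSON feed
response = requests.get('https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json')
print(response)              # the Response object, showing its status code
print(response.status_code)  # 200 means the request succeeded
print(response.headers)      # metadata returned by the server
print(type(response)) # <class 'requests.models.Response'>
print(type(response.text)) # <class 'str'>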
<Response [200]>
200
{'Content-Length': '30165', 'Content-Type': 'application/octet-stream', 'Content-Encoding': 'gzip', 'Content-MD5': 'EFQuHfXNKFZPZRyOMXoYhA==', 'Last-Modified': 'Sun, 30 Oct 2022 10:53:49 GMT', 'ETag': '0x8DABA6507095944', 'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': 'd3747082-d01e-00fe-5a50-f4c6b6000000', 'x-ms-version': '2009-09-19', 'x-ms-lease-status': 'unlocked', 'x-ms-blob-type': 'BlockBlob', 'Access-Control-Allow-Origin': '*', 'Date': 'Sun, 01 Oct 2023 10:18:55 GMT'}
<class 'requests.models.Response'>
<class 'str'>

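Printing the first 500 characters shows that the text contains only curly braces (no square-bracket lists), and that outside the data (the braces that retVal maps to) there is one more layer: the braces that hold retCode and retVal.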
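Because the data sits one level down, it is worth checking the outer code before reaching into retVal. A minimal sketch, assuming that a retCode of 1 means success (1 is the value in the response above; this page does not document any other codes). The helper extract_retval is hypothetical.

```python
import json


def extract_retval(text, ok_code=1):
    """Hypothetical helper: parse the response text and return the retVal dict.

    Assumes retCode == ok_code (1 by default) means success.
    """
    payload = json.loads(text)
    code = payload.get("retCode")
    if code != ok_code:
        raise ValueError(f"unexpected retCode: {code!r}")
    return payload["retVal"]


raw = '{"retCode":1,"retVal":{"0001":{"sno":"0001","sarea":"信義區"}}}'
stations = extract_retval(raw)
print(list(stations))             # ['0001']
print(stations["0001"]["sarea"])  # 信義區
```

With the live feed, the same call would be extract_retval(response.text).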

response.text[:500]
'{"retCode":1,"retVal":{"0001":{"sno": "0001", "sna": "捷運市政府站(3號出口)", "tot": "84", "sbi": "74", "sarea": "信義區", "mday": "20221030185227", "lat": "25.0408578889", "lng": "121.567904444", "ar": "忠孝東路/松仁路(東南側)", "sareaen": "Xinyi Dist.", "snaen": "MRT Taipei City Hall Stataion(Exit 3)-2", "aren": "The S.W. side of Road Zhongxiao East Road & Road Chung Yan.", "bemp": "10", "act": "1"},"0002":{"sno": "0002", "sna": "捷運國父紀念館站(2號出口)", "tot": "16", "sbi": "4", "sarea": "大安區", "mday": "20221030185231", "l'

Step 2. Convert JSON text to Python objects

json_obj = json.loads(response.text)
print(type(json_obj))
print(type(json_obj["retVal"]))
print(type(json_obj["retCode"]))
print(len(json_obj["retVal"]))
<class 'dict'>
<class 'dict'>
<class 'int'>
369
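Note that json.loads only converts types as they appear in the text. In the sample record above, values such as "tot": "84" and "lat": "25.0408578889" are JSON strings, so they stay str in Python. Below is a sketch of converting a few fields to numbers. It assumes tot, sbi and bemp are counts and lat/lng are coordinates, which is inferred from the sample values rather than from official field documentation. coerce_station and the field lists are hypothetical.

```python
# Field lists are assumptions based on the sample record, not an official schema.
INT_FIELDS = ("tot", "sbi", "bemp")
FLOAT_FIELDS = ("lat", "lng")


def coerce_station(record):
    """Hypothetical helper: return a copy with numeric-looking fields converted."""
    out = dict(record)  # shallow copy, so the original record is left untouched
    for key in INT_FIELDS:
        if key in out:
            out[key] = int(out[key])
    for key in FLOAT_FIELDS:
        if key in out:
            out[key] = float(out[key])
    return out


station = {"sno": "0001", "tot": "84", "sbi": "74", "bemp": "10",
           "lat": "25.0408578889", "lng": "121.567904444"}
clean = coerce_station(station)
print(clean["sbi"] + clean["bemp"])  # 84
```

In this sample, sbi plus bemp equals tot, which is consistent with the counting interpretation but does not prove it.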

Step 3. Save data to JSON


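# Write with an explicit encoding and close the file when done;
# ensure_ascii=False keeps the Chinese station names readable in the file.
with open('ubike.json', 'w', encoding='utf-8') as fout:
    json.dump(json_obj, fout, ensure_ascii=False)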
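A quick way to confirm that the saved file holds the same data is a round trip: write with json.dump, read back with json.load, and compare. This sketch writes to a temporary directory so it does not overwrite ubike.json, and the small data dict is a stand-in for json_obj.

```python
import json
import os
import tempfile

# Small stand-in for json_obj with the same two-level shape.
data = {"retCode": 1, "retVal": {"0001": {"sno": "0001", "sarea": "信義區"}}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ubike_sample.json")
    # Write, then read back while the temporary directory still exists.
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(data, fout, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as fin:
        loaded = json.load(fin)

print(loaded == data)  # True
```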

Load/Dump Pickle

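Pickle works somewhat like a cache file: whatever you dump out comes back exactly as it was when you load it in, so there is no need to parse it again. Even if the original data type …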
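The "comes back exactly as it was" point is easiest to see next to JSON. A minimal comparison using a made-up value that contains a tuple and an integer key: JSON has no tuple type and only allows string keys, so the json module turns the tuple into a list and the key 1 into "1", while pickle restores both.

```python
import json
import pickle

# Made-up value using types JSON cannot represent exactly.
original = {"coords": (25.04, 121.57), 1: "int key"}

via_json = json.loads(json.dumps(original))        # tuple -> list, 1 -> "1"
via_pickle = pickle.loads(pickle.dumps(original))  # types preserved

print(via_json)                # {'coords': [25.04, 121.57], '1': 'int key'}
print(via_pickle == original)  # True
```

The trade-off is that a pickle file is Python-specific, while a JSON file can be read by other languages and tools.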

Dump to Pickle

import pickle
with open("output.p", "wb") as fout:
    pickle.dump(json_obj, fout)

Load Pickle data

with open("output.p", "rb") as fin:
    ubike_data = pickle.load(fin)

Convert CSV to JSON

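Now suppose I have a CSV file and want to convert it to JSON by hand. How would I do that? First, take the first row of the CSV and store it as the keys. Then pair those keys with the values of each remaining row, producing one dictionary per row. Finally, append() each row's dictionary to a list. The result is JSON with a list-of-dictionaries structure.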
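The manual procedure described above is what the standard library's csv.DictReader does: it takes the first row as keys and yields one dict per remaining row. Below is a sketch using an in-memory stand-in for the CSV file. The column names and values are made up, since the real drug_156_2.csv columns are not shown here. It also shows why a plain split(",") breaks on quoted fields that contain commas.

```python
import csv
import io

# In-memory stand-in for a CSV file; columns and values are made up.
csv_text = 'id,name,note\n1,alpha,"first, with a comma"\n2,beta,\n'

# Naive splitting cuts the quoted field in two.
naive_row = csv_text.splitlines()[1].split(",")
print(len(naive_row))  # 4

# DictReader: the first row becomes the keys, each later row becomes one dict.
reader = csv.DictReader(io.StringIO(csv_text))
json_list = list(reader)
print(len(json_list))        # 2
print(json_list[0]["note"])  # first, with a comma
```

For a real file, the csv documentation recommends opening it with newline="", for example open("drug_156_2.csv", newline="", encoding="utf-8-sig").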

Step 1: Read CSV into a list of lists

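all_list = []
# utf-8-sig drops the byte-order mark some editors put at the start of the file
with open("drug_156_2.csv", "r", encoding="utf-8-sig") as f:
    for line in f.read().splitlines():
        if not line:  # skip blank lines, e.g. the one left by a trailing newline
            continue
        row = line.split(",")  # naive split: quoted fields containing commas break
        all_list.append(row)
print(all_list[1][-2])  # second-to-last field of the first data row
print(all_list[1])      # the first data row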

keys = all_list[0]
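The encoding="utf-8-sig" argument matters for this header row. If the file starts with a UTF-8 byte-order mark (BOM), plain "utf-8" keeps it as an invisible \ufeff at the start of the first key, so keys[0] would not match the column name. A small sketch with an in-memory byte string standing in for the file; the column names are made up.

```python
# Bytes standing in for a CSV file that begins with a UTF-8 BOM.
raw = "\ufeffname,value\nA,1\n".encode("utf-8")

plain_keys = raw.decode("utf-8").splitlines()[0].split(",")
sig_keys = raw.decode("utf-8-sig").splitlines()[0].split(",")

print(plain_keys[0] == "name")  # False: the key is "\ufeffname"
print(sig_keys[0] == "name")    # True: utf-8-sig strips the BOM
```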

Step 2: Convert the list of lists to a list of dicts

json_list = []
for row in all_list[1:]:
    temp_dict = dict(zip(keys, row))
    json_list.append(temp_dict)
json_list[:3]
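One thing to watch in dict(zip(keys, row)) is that zip stops at the shorter input. A row with too few fields silently loses keys, and a row with extra fields silently loses values. Below is a sketch of a stricter pairing, where row_to_dict is a hypothetical helper. On Python 3.10 and later, zip(keys, row, strict=True) raises ValueError for the same mismatch.

```python
keys = ["a", "b", "c"]
short_row = ["1", "2"]  # a row missing its last field

# zip() stops at the shorter input, so key "c" silently disappears.
print(dict(zip(keys, short_row)))  # {'a': '1', 'b': '2'}

# A blank line split on "," becomes a one-field row.
print("".split(","))  # ['']


def row_to_dict(keys, row):
    """Hypothetical helper: pair keys and values, rejecting length mismatches."""
    if len(row) != len(keys):
        raise ValueError(f"expected {len(keys)} fields, got {len(row)}: {row!r}")
    return dict(zip(keys, row))


print(row_to_dict(keys, ["1", "2", "3"]))  # {'a': '1', 'b': '2', 'c': '3'}
```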

Step 3: Dump to JSON


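# Save the list of dicts built from the CSV; the output filename is chosen for illustration
with open('drug_156_2.json', 'w', encoding='utf-8') as fout:
    json.dump(json_list, fout, ensure_ascii=False, indent=2)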
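When the data contains Chinese text, ensure_ascii decides how it appears in the output. By default, json.dump and json.dumps escape every non-ASCII character as \uXXXX. With ensure_ascii=False the characters are written as-is, and the file should then be opened with an explicit encoding such as utf-8. Here is a small comparison using one field from the ubike sample.

```python
import json

record = {"sarea": "信義區"}

escaped = json.dumps(record)                       # default ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keep the characters as-is

print(escaped)   # {"sarea": "\u4fe1\u7fa9\u5340"}
print(readable)  # {"sarea": "信義區"}
```

Both strings decode back to the same dict. Only the human readability of the saved file differs.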