Files
ailearning/docs/da/066.md
2020-10-19 21:08:55 +08:00

250 lines
3.2 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CSV 文件和 csv 模块
标准库中有自带的 `csv` (逗号分隔值) 模块处理 `csv` 格式的文件:
In [1]:
```py
import csv
```
## 读 csv 文件
假设我们有这样的一个文件:
In [2]:
```py
%%file data.csv
"alpha 1", 100, -1.443
"beat 3", 12, -0.0934
"gamma 3a", 192, -0.6621
"delta 2a", 15, -4.515
```
```py
Writing data.csv
```
打开这个文件,并产生一个文件 reader
In [3]:
```py
fp = open("data.csv")
r = csv.reader(fp)
```
可以按行迭代数据:
In [4]:
```py
for row in r:
print row
fp.close()
```
```py
['alpha 1', ' 100', ' -1.443']
['beat 3', ' 12', ' -0.0934']
['gamma 3a', ' 192', ' -0.6621']
['delta 2a', ' 15', ' -4.515']
```
默认数据内容都被当作字符串处理,不过可以自己进行处理:
In [5]:
```py
data = []
with open('data.csv') as fp:
r = csv.reader(fp)
for row in r:
data.append([row[0], int(row[1]), float(row[2])])
data
```
Out[5]:
```py
[['alpha 1', 100, -1.443],
['beat 3', 12, -0.0934],
['gamma 3a', 192, -0.6621],
['delta 2a', 15, -4.515]]
```
In [6]:
```py
import os
os.remove('data.csv')
```
## 写 csv 文件
可以使用 `csv.writer` 写入文件,不过相应地,传入的应该是以写方式打开的文件,不过一般要用 `'wb'` 即二进制写入方式,防止出现换行不正确的问题:
In [7]:
```py
data = [('one', 1, 1.5), ('two', 2, 8.0)]
with open('out.csv', 'wb') as fp:
w = csv.writer(fp)
w.writerows(data)
```
显示结果:
In [8]:
```py
!cat 'out.csv'
```
```py
one,1,1.5
two,2,8.0
```
## 更换分隔符
默认情况下,`csv` 模块默认 `csv` 文件都是由 `excel` 产生的,实际中可能会遇到这样的问题:
In [9]:
```py
data = [('one, \"real\" string', 1, 1.5), ('two', 2, 8.0)]
with open('out.csv', 'wb') as fp:
w = csv.writer(fp)
w.writerows(data)
```
In [10]:
```py
!cat 'out.csv'
```
```py
"one, ""real"" string",1,1.5
two,2,8.0
```
可以修改分隔符来处理这组数据:
In [11]:
```py
data = [('one, \"real\" string', 1, 1.5), ('two', 2, 8.0)]
with open('out.psv', 'wb') as fp:
w = csv.writer(fp, delimiter="|")
w.writerows(data)
```
In [12]:
```py
!cat 'out.psv'
```
```py
"one, ""real"" string"|1|1.5
two|2|8.0
```
In [13]:
```py
import os
os.remove('out.psv')
os.remove('out.csv')
```
## 其他选项
`numpy.loadtxt()``pandas.read_csv()` 可以用来读写包含很多数值数据的 `csv` 文件:
In [14]:
```py
%%file trades.csv
Order,Date,Stock,Quantity,Price
A0001,2013-12-01,AAPL,1000,203.4
A0002,2013-12-01,MSFT,1500,167.5
A0003,2013-12-02,GOOG,1500,167.5
```
```py
Writing trades.csv
```
使用 `pandas` 进行处理,生成一个 `DataFrame` 对象:
In [15]:
```py
import pandas
df = pandas.read_csv('trades.csv', index_col=0)
print df
```
```py
Date Stock Quantity Price
Order
A0001 2013-12-01 AAPL 1000 203.4
A0002 2013-12-01 MSFT 1500 167.5
A0003 2013-12-02 GOOG 1500 167.5
```
通过名字进行索引:
In [16]:
```py
df['Quantity'] * df['Price']
```
Out[16]:
```py
Order
A0001 203400
A0002 251250
A0003 251250
dtype: float64
```
In [17]:
```py
import os
os.remove('trades.csv')
```