mirror of
https://github.com/apachecn/ailearning.git
synced 2026-02-09 21:36:27 +08:00
186 lines
3.3 KiB
Markdown
186 lines
3.3 KiB
Markdown
# pickle, cPickle 模块:序列化 Python 对象
|
||
|
||
`pickle` 模块实现了一种算法,可以将任意一个 `Python` 对象转化为一系列的字节,也可以将这些字节重构为一个有相同特征的新对象。
|
||
|
||
由于字节可以被传输或者存储,因此 `pickle` 事实上实现了传递或者保存 `Python` 对象的功能。
|
||
|
||
`cPickle` 使用 `C` 而不是 `Python` 实现了相同的算法,因此速度上要比 `pickle` 快一些。但是它不允许用户从 `pickle` 派生子类。如果子类对你的使用来说无关紧要,那么 `cPickle` 是个更好的选择。
|
||
|
||
In [1]:
|
||
|
||
```py
|
||
try:
|
||
import cPickle as pickle
|
||
except:
|
||
import pickle
|
||
|
||
```
|
||
|
||
## 编码和解码
|
||
|
||
使用 `pickle.dumps()` 可以将一个对象转换为字符串(`dump string`):
|
||
|
||
In [2]:
|
||
|
||
```py
|
||
data = [ { 'a':'A', 'b':2, 'c':3.0 } ]
|
||
|
||
data_string = pickle.dumps(data)
|
||
|
||
print "DATA:"
|
||
print data
|
||
print "PICKLE:"
|
||
print data_string
|
||
|
||
```
|
||
|
||
```py
|
||
DATA:
|
||
[{'a': 'A', 'c': 3.0, 'b': 2}]
|
||
PICKLE:
|
||
(lp1
|
||
(dp2
|
||
S'a'
|
||
S'A'
|
||
sS'c'
|
||
F3
|
||
sS'b'
|
||
I2
|
||
sa.
|
||
|
||
```
|
||
|
||
虽然 `pickle` 编码的字符串并不一定可读,但是我们可以用 `pickle.loads()` 来从这个字符串中恢复原对象中的内容(`load string`):
|
||
|
||
In [3]:
|
||
|
||
```py
|
||
data_from_string = pickle.loads(data_string)
|
||
|
||
print data_from_string
|
||
|
||
```
|
||
|
||
```py
|
||
[{'a': 'A', 'c': 3.0, 'b': 2}]
|
||
|
||
```
|
||
|
||
## 编码协议
|
||
|
||
`dumps` 可以接受一个可省略的 `protocol` 参数(默认为 0),目前有 3 种编码方式:
|
||
|
||
* 0:原始的 `ASCII` 编码格式
|
||
* 1:二进制编码格式
|
||
* 2:更有效的二进制编码格式
|
||
|
||
当前最高级的编码可以通过 `HIGHEST_PROTOCOL` 查看:
|
||
|
||
In [4]:
|
||
|
||
```py
|
||
print pickle.HIGHEST_PROTOCOL
|
||
|
||
```
|
||
|
||
```py
|
||
2
|
||
|
||
```
|
||
|
||
例如:
|
||
|
||
In [5]:
|
||
|
||
```py
|
||
data_string_1 = pickle.dumps(data, 1)
|
||
|
||
print "Pickle 1:", data_string_1
|
||
|
||
data_string_2 = pickle.dumps(data, 2)
|
||
|
||
print "Pickle 2:", data_string_2
|
||
|
||
```
|
||
|
||
```py
|
||
Pickle 1: ]q<EFBFBD>}q<EFBFBD>(U<EFBFBD>aU<EFBFBD>AU<EFBFBD>cG@<EFBFBD>U<EFBFBD>bK<EFBFBD>ua.
|
||
Pickle 2: <EFBFBD><EFBFBD>]q<EFBFBD>}q<EFBFBD>(U<EFBFBD>aU<EFBFBD>AU<EFBFBD>cG@<EFBFBD>U<EFBFBD>bK<EFBFBD>ua.
|
||
|
||
```
|
||
|
||
如果 `protocol` 参数指定为负数,那么将调用当前的最高级的编码协议进行编码:
|
||
|
||
In [6]:
|
||
|
||
```py
|
||
print pickle.dumps(data, -1)
|
||
|
||
```
|
||
|
||
```py
|
||
<EFBFBD><EFBFBD>]q<EFBFBD>}q<EFBFBD>(U<EFBFBD>aU<EFBFBD>AU<EFBFBD>cG@<EFBFBD>U<EFBFBD>bK<EFBFBD>ua.
|
||
|
||
```
|
||
|
||
从这些格式中恢复对象时,不需要指定所用的协议,`pickle.load()` 会自动识别:
|
||
|
||
In [7]:
|
||
|
||
```py
|
||
print "Load 1:", pickle.loads(data_string_1)
|
||
print "Load 2:", pickle.loads(data_string_2)
|
||
|
||
```
|
||
|
||
```py
|
||
Load 1: [{'a': 'A', 'c': 3.0, 'b': 2}]
|
||
Load 2: [{'a': 'A', 'c': 3.0, 'b': 2}]
|
||
|
||
```
|
||
|
||
## 存储和读取 pickle 文件
|
||
|
||
除了将对象转换为字符串这种方式,`pickle` 还支持将对象写入一个文件中,通常我们将这个文件命名为 `xxx.pkl`,以表示它是一个 `pickle` 文件:
|
||
|
||
存储和读取的函数分别为:
|
||
|
||
* `pickle.dump(obj, file, protocol=0)` 将对象序列化并存入 `file` 文件中
|
||
* `pickle.load(file)` 从 `file` 文件中的内容恢复对象
|
||
|
||
将对象存入文件:
|
||
|
||
In [8]:
|
||
|
||
```py
|
||
with open("data.pkl", "wb") as f:
|
||
pickle.dump(data, f)
|
||
|
||
```
|
||
|
||
从文件中读取:
|
||
|
||
In [9]:
|
||
|
||
```py
|
||
with open("data.pkl") as f:
|
||
data_from_file = pickle.load(f)
|
||
|
||
print data_from_file
|
||
|
||
```
|
||
|
||
```py
|
||
[{'a': 'A', 'c': 3.0, 'b': 2}]
|
||
|
||
```
|
||
|
||
清理生成的文件:
|
||
|
||
In [10]:
|
||
|
||
```py
|
||
import os
|
||
os.remove("data.pkl")
|
||
|
||
``` |