mirror of
https://github.com/apachecn/ailearning.git
synced 2026-02-03 18:34:17 +08:00
# Structured Arrays

Suppose we want to store the following data:

| | name | age | wgt |
| --- | --- | --- | --- |
| 0 | dan | 1 | 23.1 |
| 1 | ann | 0 | 25.1 |
| 2 | sam | 2 | 8.3 |

We want a one-dimensional array in which every element has three attributes, `name`, `age`, and `wgt`. This is what structured arrays are for.

In [1]:

```py
import numpy as np
```

Define an array `a`:

| 0 | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1.0 | 2.0 | 3.0 | 4.0 |

In [2]:

```py
a = np.array([1.0, 2.0, 3.0, 4.0], np.float32)
```

Use the `view` method to reinterpret the memory behind `a` as complex numbers:

In [3]:

```py
a.view(np.complex64)
```

Out[3]:

```py
array([ 1.+2.j,  3.+4.j], dtype=complex64)
```

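| 0 | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1.0 | 2.0 | 3.0 | 4.0 |
| real | imag | real | imag |

In effect, a complex number can be treated as a struct whose first part is the real component and whose second part is the imaginary component, so this array can already be seen as a structured array.

In other words, simply interpreting the same block of memory in a different way is enough to get the behavior of a structured array.
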
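The reinterpretation described above can be checked directly. The following is a minimal sketch (the names `c`, `shares_memory`, and `first_two` are illustrative, not from the original notebook) suggesting that `view` does not copy data: the complex view and the float array share one buffer, so a write through the view shows up in the original array.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0], np.float32)

# Reinterpret each pair of float32 values as one complex64 value.
c = a.view(np.complex64)

# The view and the original array use the same underlying buffer.
shares_memory = np.shares_memory(a, c)

# Writing through the complex view changes the underlying floats.
c[0] = 10 + 20j
first_two = a[:2].tolist()
```

Here `first_two` holds the real and imaginary parts that were just written, which is what one would expect if no copy was made.
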
| 0 | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1.0 | 2.0 | 3.0 | 4.0 |
| mass | vol | mass | vol |

For example, we can interpret the first float of each pair as a mass and the second as a velocity (the `vol` field). The same memory can then be seen as a sequence of structs with two fields, mass and velocity.

In [4]:

```py
my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])
```

In [5]:

```py
a.view(my_dtype)
```

Out[5]:

```py
array([(1.0, 2.0), (3.0, 4.0)],
      dtype=[('mass', '<f4'), ('vol', '<f4')])
```

Here we used `dtype` to create a custom structured type and then used that type to reinterpret the memory occupied by `a`.

In the output, `f4` denotes a 4-byte float and `<` denotes little-endian byte order.

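A structured `dtype` can be inspected to see how each record is laid out. The sketch below rebuilds the same `my_dtype` and reads its field names, record size, and the byte offset of `vol`; the variable names (`field_names`, `record_size`, `little`, `big`) are illustrative only.

```python
import numpy as np

my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])

field_names = my_dtype.names        # names of the fields, in order
record_size = my_dtype.itemsize     # bytes per record: two 4-byte floats

# `fields` maps each name to a tuple starting with (dtype, byte offset).
vol_type, vol_offset = my_dtype.fields['vol'][:2]

# The byte order can be spelled out explicitly in the type string.
little = np.dtype('<f4')   # little-endian 4-byte float
big = np.dtype('>f4')      # big-endian 4-byte float
```

Both `little` and `big` describe 4-byte floats; they differ only in the order in which the bytes are stored.
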
With this custom structured type we can initialize a structured array like this:

In [6]:

```py
my_data = np.array([(1, 1), (1, 2), (2, 1), (1, 3)], my_dtype)

print(my_data)
```

```py
[(1.0, 1.0) (1.0, 2.0) (2.0, 1.0) (1.0, 3.0)]
```

The first element:

In [7]:

```py
my_data[0]
```

Out[7]:

```py
(1.0, 1.0)
```

To get the velocity of the first element, index it by field name:

In [8]:

```py
my_data[0]['vol']
```

Out[8]:

```py
1.0
```

Get the mass of every element:

In [9]:

```py
my_data['mass']
```

Out[9]:

```py
array([ 1.,  1.,  2.,  1.], dtype=float32)
```

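Indexing by field name returns a view rather than a copy, so a whole field can be updated in place. A minimal sketch, reusing the same `my_dtype` and data as above (the names `masses` and `rows` are illustrative):

```python
import numpy as np

my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])
my_data = np.array([(1, 1), (1, 2), (2, 1), (1, 3)], my_dtype)

# A single-field index is a view onto the 'mass' values of every record.
masses = my_data['mass']
masses *= 2                 # doubles the masses inside my_data as well

# Assigning a scalar to a field broadcasts it to every record.
my_data['vol'] = 0.5

rows = my_data.tolist()     # list of (mass, vol) tuples
```
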
Sort with a custom ordering, first by velocity and then by mass:

In [10]:

```py
my_data.sort(order=('vol', 'mass'))

print(my_data)
```

```py
[(1.0, 1.0) (2.0, 1.0) (1.0, 2.0) (1.0, 3.0)]
```

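The `sort` method reorders the array in place. When the original order should be kept, `np.sort` and `np.argsort` accept the same `order` argument. A small sketch under that assumption, with illustrative names `sorted_copy` and `idx`:

```python
import numpy as np

my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])
my_data = np.array([(1, 1), (1, 2), (2, 1), (1, 3)], my_dtype)

# np.sort returns a sorted copy and leaves my_data untouched.
sorted_copy = np.sort(my_data, order=['vol', 'mass'])

# np.argsort returns the indices that would sort the records.
idx = np.argsort(my_data, order=['vol', 'mass'])
```

Indexing with the result, as in `my_data[idx]`, gives the same records as `sorted_copy`.
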
Returning to the original example, define a structured type for a person:

In [11]:

```py
person_dtype = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])
```

Check how many bytes the type occupies:

In [12]:

```py
person_dtype.itemsize
```

Out[12]:

```py
22
```

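The value 22 is consistent with a 10-byte string, a 4-byte integer, and an 8-byte float stored back to back. The sketch below spells those widths out explicitly (`<i4`, `<f8`) so the arithmetic is visible; the names `packed`, `native`, and `aligned` are illustrative. It assumes that on platforms where the plain `'int'` code is 8 bytes wide, the same definition would report 26 instead.

```python
import numpy as np

# Explicit widths: 10-byte string + 4-byte int + 8-byte float.
packed = np.dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
packed_size = packed.itemsize
offsets = {name: packed.fields[name][1] for name in packed.names}

# The plain 'int' code follows the platform's default integer width.
native = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])
native_size = native.itemsize

# align=True pads the fields the way a C compiler would lay out a struct.
aligned = np.dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')],
                   align=True)
aligned_size = aligned.itemsize
```

By default NumPy packs the fields without padding, which is why the offsets are simply running sums of the field sizes.
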
Create an empty 3 x 4 structured array with room for 12 people:

In [13]:

```py
people = np.empty((3, 4), person_dtype)
```

Assign each field separately:

In [14]:

```py
people['name'] = [['Brad', 'Jane', 'John', 'Fred'],
                  ['Henry', 'George', 'Brain', 'Amy'],
                  ['Ron', 'Susan', 'Jennife', 'Jill']]
```

In [15]:

```py
people['age'] = [[33, 25, 47, 54],
                 [29, 61, 32, 27],
                 [19, 33, 18, 54]]
```

In [16]:

```py
people['weight'] = [[135., 105., 255., 140.],
                    [154., 202., 137., 187.],
                    [188., 135., 88., 145.]]
```

In [17]:

```py
print(people)
```

```py
[[('Brad', 33, 135.0) ('Jane', 25, 105.0) ('John', 47, 255.0)
  ('Fred', 54, 140.0)]
 [('Henry', 29, 154.0) ('George', 61, 202.0) ('Brain', 32, 137.0)
  ('Amy', 27, 187.0)]
 [('Ron', 19, 188.0) ('Susan', 33, 135.0) ('Jennife', 18, 88.0)
  ('Jill', 54, 145.0)]]
```

In [18]:

```py
people[-1, -1]
```

Out[18]:

```py
('Jill', 54, 145.0)
```

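Because each field behaves like an ordinary array, a structured array can also be filtered with a boolean mask built from one field. A minimal sketch, assuming a smaller, hypothetical four-person array and a `U10` (unicode) name field instead of `S10` so that the names come back as plain strings:

```python
import numpy as np

# Hypothetical variant of person_dtype with explicit widths.
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

people = np.array([('Brad', 33, 135.0),
                   ('Fred', 54, 140.0),
                   ('George', 61, 202.0),
                   ('Amy', 27, 187.0)], dtype=person_dtype)

# Keep only the records whose age exceeds 50.
over_50 = people[people['age'] > 50]
names_over_50 = over_50['name'].tolist()

# Reductions work on a single field just as on a normal array.
mean_weight = float(people['weight'].mean())
```
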
## Reading structured arrays from text

Suppose we have a file like this:

In [19]:

```py
%%writefile people.txt
name age weight
amy 11 38.2
john 10 40.3
bill 12 21.2
```

```py
Writing people.txt
```

Use `loadtxt` with an explicit data type to read a structured array from this file:

In [20]:

```py
person_dtype = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])

people = np.loadtxt('people.txt',
                    skiprows=1,           # skip the header line
                    dtype=person_dtype)

people
```

Out[20]:

```py
array([('amy', 11, 38.2), ('john', 10, 40.3), ('bill', 12, 21.2)],
      dtype=[('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
```

Look at the `name` field:

In [21]:

```py
people['name']
```

Out[21]:

```py
array(['amy', 'john', 'bill'],
      dtype='|S10')
```

Delete the file:

In [22]:

```py
import os
os.remove('people.txt')
```

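The same `loadtxt` call can be tried without touching the file system, since `loadtxt` also accepts file-like objects. The following sketch feeds it an in-memory `io.StringIO` buffer; the `U10` name field and explicit `i4`/`f8` widths are assumptions made here so that the result does not depend on the platform or on bytes versus text handling.

```python
import io
import numpy as np

# Same content as people.txt, held in memory.
text = u"""name age weight
amy 11 38.2
john 10 40.3
bill 12 21.2
"""

person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# skiprows=1 skips the header line, as in the file-based example.
people = np.loadtxt(io.StringIO(text), skiprows=1, dtype=person_dtype)

names = people['name'].tolist()
total_age = int(people['age'].sum())
```
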
Now consider the following file:

In [23]:

```py
%%writefile wood.csv
item,material,number
100,oak,33
110,maple,14
120,oak,7
145,birch,3
```

```py
Writing wood.csv
```

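Define a converter function that maps the material attribute to an integer:

In [24]:

```py
tree_to_int = dict(oak=1,
                   maple=2,
                   birch=3)

def convert(s):
    # Depending on the NumPy version, the converter may receive bytes or str.
    if isinstance(s, bytes):
        s = s.decode()
    # Unknown materials map to 0.
    return tree_to_int.get(s, 0)
```
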
Load the data with `genfromtxt`, which can read the field names automatically from the first line:

In [25]:

```py
data = np.genfromtxt('wood.csv',
                     delimiter=',',           # comma-separated
                     dtype=int,               # data type (np.int was removed in newer NumPy)
                     names=True,              # read field names from the first line
                     converters={1: convert}  # convert column 1 (material) to an integer
                     )
```

In [26]:

```py
data
```

Out[26]:

```py
array([(100, 1, 33), (110, 2, 14), (120, 1, 7), (145, 3, 3)],
      dtype=[('item', '<i4'), ('material', '<i4'), ('number', '<i4')])
```

Look at a field:

In [27]:

```py
data['material']
```

Out[27]:

```py
array([1, 2, 1, 3])
```

Delete the file:

In [28]:

```py
os.remove('wood.csv')
```

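When the material names should be kept as text rather than mapped to integers, `genfromtxt` can be asked to infer a type per column with `dtype=None`. This is a sketch rather than the notebook's approach: it reads the same CSV content from an in-memory buffer and assumes a NumPy version whose `genfromtxt` supports the `encoding` parameter.

```python
import io
import numpy as np

# Same content as wood.csv, held in memory.
csv_text = u"""item,material,number
100,oak,33
110,maple,14
120,oak,7
145,birch,3
"""

data = np.genfromtxt(io.StringIO(csv_text),
                     delimiter=',',
                     names=True,        # field names from the first line
                     dtype=None,        # infer a type for each column
                     encoding='utf-8')  # decode text columns as str

materials = data['material'].tolist()
total_number = int(data['number'].sum())
```

The integer columns are still inferred as integers, while `material` becomes a unicode string field.
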
## Nested types

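Sometimes a field of a structured array itself contains a nested structure. For example, suppose we want to record the position and mass of a particle in a two-dimensional plane:

| position | position | mass |
| --- | --- | --- |
| x | y | |

Its type can then be defined with nesting, like this:
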
In [29]:

```py
particle_dtype = np.dtype([('position', [('x', 'float'),
                                         ('y', 'float')]),
                           ('mass', 'float')
                          ])
```

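A nested type like this can also be filled directly from nested tuples, and nested fields are reached by chaining field names. A minimal sketch with two hypothetical particles (explicit `f8` widths are used here so that the record size is the same on every platform):

```python
import numpy as np

particle_dtype = np.dtype([('position', [('x', 'f8'),
                                         ('y', 'f8')]),
                           ('mass', 'f8')])

# Each record is ((x, y), mass): the inner tuple fills the nested field.
particles = np.array([((2.0, 3.0), 42.0),
                      ((4.5, -6.4), 23.3)], dtype=particle_dtype)

record_size = particle_dtype.itemsize          # three 8-byte floats
ys = particles['position']['y'].tolist()       # chained field access

# Chained single-field indexing yields views, so in-place updates stick.
particles['position']['x'] += 1.0
xs = particles['position']['x'].tolist()
```
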
Suppose the data file looks like this:

In [30]:

```py
%%writefile data.txt
2.0 3.0 42.0
2.1 4.3 32.5
1.2 4.6 32.3
4.5 -6.4 23.3
```

```py
Overwriting data.txt
```

Read the data:

In [31]:

```py
data = np.loadtxt('data.txt', dtype=particle_dtype)
```

In [32]:

```py
data
```

Out[32]:

```py
array([((2.0, 3.0), 42.0), ((2.1, 4.3), 32.5), ((1.2, 4.6), 32.3),
       ((4.5, -6.4), 23.3)],
      dtype=[('position', [('x', '<f8'), ('y', '<f8')]), ('mass', '<f8')])
```

Look at the `x` coordinate of the positions:

In [33]:

```py
data['position']['x']
```

Out[33]:

```py
array([ 2. ,  2.1,  1.2,  4.5])
```

Delete the generated file:

In [34]:

```py
os.remove('data.txt')
```