ailearning/docs/da/048.md

# 结构化数组

假设我们要保存这样的数据：

|  | name | age | wgt |
| --- | --- | --- | --- |
| 0 | dan | 1 | 23.1 |
| 1 | ann | 0 | 25.1 |
| 2 | sam | 2 | 8.3 |

希望定义一个一维数组，每个元素有三个属性 `name, age, wgt`，此时我们需要使用结构化数组。

In [1]:

```py
import numpy as np

```

定义数组 `a`：

| 0 | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1.0 | 2.0 | 3.0 | 4.0 |

In [2]:

```py
a = np.array([1.0,2.0,3.0,4.0], np.float32)

```

使用 `view` 方法，将 `a` 对应的内存按照复数来解释：

In [3]:

```py
a.view(np.complex64)

```

Out[3]:

```py
array([ 1.+2.j,  3.+4.j], dtype=complex64)
```

| 0 | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1.0 | 2.0 | 3.0 | 4.0 |
| real | imag | real | imag |

事实上，我们可以把复数看成一个结构体，第一部分是实部，第二部分是虚部，这样这个数组便可以看成是一个结构化数组。

换句话说，我们只需要换种方式解释这段内存，便可以得到结构化数组的效果！

| 0 | 1 | 2 | 3 |
| --- | --- | --- | --- |
| 1.0 | 2.0 | 3.0 | 4.0 |
| mass | vol | mass | vol |

例如，我们可以将第一个浮点数解释为质量，第二个浮点数解释为速度，则这段内存还可以看成是包含两个域（质量和速度）的结构体。

In [4]:

```py
my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])

```

In [5]:

```py
a.view(my_dtype)

```

Out[5]:

```py
array([(1.0, 2.0), (3.0, 4.0)],
      dtype=[('mass', '<f4'), ('vol', '<f4')])
```

这里，我们使用 `dtype` 创造了自定义的结构类型，然后用自定义的结构来解释数组 `a` 所占的内存。

这里 `f4` 表示四字节浮点数，`<` 表示小字节序。

利用这个自定义的结构类型，我们可以这样初始化结构化数组：

In [6]:

```py
my_data = np.array([(1,1), (1,2), (2,1), (1,3)], my_dtype)

print my_data

```

```py
[(1.0, 1.0) (1.0, 2.0) (2.0, 1.0) (1.0, 3.0)]

```

第一个元素：

In [7]:

```py
my_data[0]

```

Out[7]:

```py
(1.0, 1.0)
```

得到第一个元素的速度信息，可以使用域的名称来索引：

In [8]:

```py
my_data[0]['vol']

```

Out[8]:

```py
1.0
```

得到所有的质量信息：

In [9]:

```py
my_data['mass']

```

Out[9]:

```py
array([ 1.,  1.,  2.,  1.], dtype=float32)
```

自定义排序规则，先按速度，再按质量：

In [10]:

```py
my_data.sort(order=('vol', 'mass'))

print my_data

```

```py
[(1.0, 1.0) (2.0, 1.0) (1.0, 2.0) (1.0, 3.0)]

```

回到最初的例子，定义一个人的结构类型：

In [11]:

```py
person_dtype = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])

```

查看类型所占字节数：

In [12]:

```py
person_dtype.itemsize

```

Out[12]:

```py
22
```

产生一个 3 x 4 共12人的空结构体数组：

In [13]:

```py
people = np.empty((3,4), person_dtype)

```

分别赋值：

In [14]:

```py
people['name'] = [['Brad', 'Jane', 'John', 'Fred'],
                  ['Henry', 'George', 'Brain', 'Amy'],
                  ['Ron', 'Susan', 'Jennife', 'Jill']]

```

In [15]:

```py
people['age'] = [[33, 25, 47, 54],
                 [29, 61, 32, 27],
                 [19, 33, 18, 54]]

```

In [16]:

```py
people['weight'] = [[135., 105., 255., 140.],
                    [154., 202., 137., 187.],
                    [188., 135., 88., 145.]]

```

In [17]:

```py
print people

```

```py
[[('Brad', 33, 135.0) ('Jane', 25, 105.0) ('John', 47, 255.0)
  ('Fred', 54, 140.0)]
 [('Henry', 29, 154.0) ('George', 61, 202.0) ('Brain', 32, 137.0)
  ('Amy', 27, 187.0)]
 [('Ron', 19, 188.0) ('Susan', 33, 135.0) ('Jennife', 18, 88.0)
  ('Jill', 54, 145.0)]]

```

In [18]:

```py
people[-1,-1]

```

Out[18]:

```py
('Jill', 54, 145.0)
```

## 从文本中读取结构化数组

我们有这样一个文件：

In [19]:

```py
%%writefile people.txt
name age weight
amy 11 38.2
john 10 40.3
bill 12 21.2

```

```py
Writing people.txt

```

利用 `loadtxt` 指定数据类型，从这个文件中读取结构化数组：

In [20]:

```py
person_dtype = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])

people = np.loadtxt('people.txt',
                    skiprows=1,
                    dtype=person_dtype)

people

```

Out[20]:

```py
array([('amy', 11, 38.2), ('john', 10, 40.3), ('bill', 12, 21.2)],
      dtype=[('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
```

查看 `name` 域：

In [21]:

```py
people['name']

```

Out[21]:

```py
array(['amy', 'john', 'bill'],
      dtype='|S10')
```

删除文件：

In [22]:

```py
import os
os.remove('people.txt')

```

对于下面的文件：

In [23]:

```py
%%writefile wood.csv
item,material,number
100,oak,33
110,maple,14
120,oak,7
145,birch,3

```

```py
Writing wood.csv

```

定义转换函数处理材料属性，使之对应一个整数：

In [24]:

```py
tree_to_int = dict(oak = 1,
                   maple=2,
                   birch=3)

def convert(s):
    return tree_to_int.get(s, 0)

```

使用 `genfromtxt` 载入数据，可以自动从第一行读入属性名称：

In [25]:

```py
data = np.genfromtxt('wood.csv',
                     delimiter=',', # 逗号分隔
                     dtype=np.int, # 数据类型
                     names=True,   # 从第一行读入域名
                     converters={1:convert}
                    )

```

In [26]:

```py
data

```

Out[26]:

```py
array([(100, 1, 33), (110, 2, 14), (120, 1, 7), (145, 3, 3)],
      dtype=[('item', '<i4'), ('material', '<i4'), ('number', '<i4')])
```

查看域：

In [27]:

```py
data['material']

```

Out[27]:

```py
array([1, 2, 1, 3])
```

删除文件：

In [28]:

```py
os.remove('wood.csv')

```

## 嵌套类型

有时候，结构数组中的域可能包含嵌套的结构，例如，在我们希望在二维平面上纪录一个质点的位置和质量：

| position | mass |
| x | y |

那么它的类型可以这样嵌套定义：

In [29]:

```py
particle_dtype = np.dtype([('position', [('x', 'float'),
                                         ('y', 'float')]),
                           ('mass', 'float')
                          ])

```

假设数据文件如下：

In [30]:

```py
%%writefile data.txt
2.0 3.0 42.0
2.1 4.3 32.5
1.2 4.6 32.3
4.5 -6.4 23.3

```

```py
Overwriting data.txt

```

读取数据：

In [31]:

```py
data = np.loadtxt('data.txt', dtype=particle_dtype)

```

In [32]:

```py
data

```

Out[32]:

```py
array([((2.0, 3.0), 42.0), ((2.1, 4.3), 32.5), ((1.2, 4.6), 32.3),
       ((4.5, -6.4), 23.3)],
      dtype=[('position', [('x', '<f8'), ('y', '<f8')]), ('mass', '<f8')])
```

查看位置的 `x` 轴：

In [33]:

```py
data['position']['x']

```

Out[33]:

```py
array([ 2\. ,  2.1,  1.2,  4.5])
```

删除生成的文件：

In [34]:

```py
os.remove('data.txt')

```