mirror of
https://github.com/apachecn/ailearning.git
synced 2026-02-04 02:44:48 +08:00
308 lines
3.8 KiB
Markdown
308 lines
3.8 KiB
Markdown
# 数组排序
|
||
|
||
In [1]:
|
||
|
||
```py
|
||
%pylab
|
||
|
||
```
|
||
|
||
```py
|
||
Using matplotlib backend: Qt4Agg
|
||
Populating the interactive namespace from numpy and matplotlib
|
||
|
||
```
|
||
|
||
## sort 函数
|
||
|
||
先看这个例子:
|
||
|
||
In [2]:
|
||
|
||
```py
|
||
names = array(['bob', 'sue', 'jan', 'ad'])
|
||
weights = array([20.8, 93.2, 53.4, 61.8])
|
||
|
||
sort(weights)
|
||
|
||
```
|
||
|
||
Out[2]:
|
||
|
||
```py
|
||
array([ 20.8, 53.4, 61.8, 93.2])
|
||
```
|
||
|
||
`sort` 返回的结果是从小到大排列的。
|
||
|
||
## argsort 函数
|
||
|
||
`argsort` 返回从小到大的排列在数组中的索引位置:
|
||
|
||
In [3]:
|
||
|
||
```py
|
||
ordered_indices = argsort(weights)
|
||
ordered_indices
|
||
|
||
```
|
||
|
||
Out[3]:
|
||
|
||
```py
|
||
array([0, 2, 3, 1], dtype=int64)
|
||
```
|
||
|
||
可以用它来进行索引:
|
||
|
||
In [4]:
|
||
|
||
```py
|
||
weights[ordered_indices]
|
||
|
||
```
|
||
|
||
Out[4]:
|
||
|
||
```py
|
||
array([ 20.8, 53.4, 61.8, 93.2])
|
||
```
|
||
|
||
In [5]:
|
||
|
||
```py
|
||
names[ordered_indices]
|
||
|
||
```
|
||
|
||
Out[5]:
|
||
|
||
```py
|
||
array(['bob', 'jan', 'ad', 'sue'],
|
||
dtype='|S3')
|
||
```
|
||
|
||
使用函数并不会改变原来数组的值:
|
||
|
||
In [6]:
|
||
|
||
```py
|
||
weights
|
||
|
||
```
|
||
|
||
Out[6]:
|
||
|
||
```py
|
||
array([ 20.8, 93.2, 53.4, 61.8])
|
||
```
|
||
|
||
## sort 和 argsort 方法
|
||
|
||
数组也支持方法操作:
|
||
|
||
In [7]:
|
||
|
||
```py
|
||
data = array([20.8, 93.2, 53.4, 61.8])
|
||
data.argsort()
|
||
|
||
```
|
||
|
||
Out[7]:
|
||
|
||
```py
|
||
array([0, 2, 3, 1], dtype=int64)
|
||
```
|
||
|
||
`argsort` 方法与 `argsort` 函数的使用没什么区别,也不会改变数组的值。
|
||
|
||
In [8]:
|
||
|
||
```py
|
||
data
|
||
|
||
```
|
||
|
||
Out[8]:
|
||
|
||
```py
|
||
array([ 20.8, 93.2, 53.4, 61.8])
|
||
```
|
||
|
||
但是 `sort`方法会改变数组的值:
|
||
|
||
In [9]:
|
||
|
||
```py
|
||
data.sort()
|
||
|
||
```
|
||
|
||
In [10]:
|
||
|
||
```py
|
||
data
|
||
|
||
```
|
||
|
||
Out[10]:
|
||
|
||
```py
|
||
array([ 20.8, 53.4, 61.8, 93.2])
|
||
```
|
||
|
||
## 二维数组排序
|
||
|
||
对于多维数组,sort方法默认沿着最后一维开始排序:
|
||
|
||
In [11]:
|
||
|
||
```py
|
||
a = array([
|
||
[.2, .1, .5],
|
||
[.4, .8, .3],
|
||
[.9, .6, .7]
|
||
])
|
||
a
|
||
|
||
```
|
||
|
||
Out[11]:
|
||
|
||
```py
|
||
array([[ 0.2, 0.1, 0.5],
|
||
[ 0.4, 0.8, 0.3],
|
||
[ 0.9, 0.6, 0.7]])
|
||
```
|
||
|
||
对于二维数组,默认相当于对每一行进行排序:
|
||
|
||
In [12]:
|
||
|
||
```py
|
||
sort(a)
|
||
|
||
```
|
||
|
||
Out[12]:
|
||
|
||
```py
|
||
array([[ 0.1, 0.2, 0.5],
|
||
[ 0.3, 0.4, 0.8],
|
||
[ 0.6, 0.7, 0.9]])
|
||
```
|
||
|
||
改变轴,对每一列进行排序:
|
||
|
||
In [13]:
|
||
|
||
```py
|
||
sort(a, axis = 0)
|
||
|
||
```
|
||
|
||
Out[13]:
|
||
|
||
```py
|
||
array([[ 0.2, 0.1, 0.3],
|
||
[ 0.4, 0.6, 0.5],
|
||
[ 0.9, 0.8, 0.7]])
|
||
```
|
||
|
||
## searchsorted 函数
|
||
|
||
```py
|
||
searchsorted(sorted_array, values)
|
||
```
|
||
|
||
`searchsorted` 接受两个参数,其中,第一个必需是已排序的数组。
|
||
|
||
In [14]:
|
||
|
||
```py
|
||
sorted_array = linspace(0,1,5)
|
||
values = array([.1,.8,.3,.12,.5,.25])
|
||
|
||
```
|
||
|
||
In [15]:
|
||
|
||
```py
|
||
searchsorted(sorted_array, values)
|
||
|
||
```
|
||
|
||
Out[15]:
|
||
|
||
```py
|
||
array([1, 4, 2, 1, 2, 1], dtype=int64)
|
||
```
|
||
|
||
排序数组:
|
||
|
||
| 0 | 1 | 2 | 3 | 4 |
|
||
| --- | --- | --- | --- | --- |
|
||
| 0.0 | 0.25 | 0.5 | 0.75 | 1.0 |
|
||
|
||
数值:
|
||
|
||
| 值 | 0.1 | 0.8 | 0.3 | 0.12 | 0.5 | 0.25 |
|
||
| --- | --- | --- | --- | --- | --- | --- |
|
||
| 插入位置 | 1 | 4 | 2 | 1 | 2 | 1 |
|
||
|
||
`searchsorted` 返回的值相当于保持第一个数组的排序性质不变,将第二个数组中的值插入第一个数组中的位置:
|
||
|
||
例如 `0.1` 在 [0.0, 0.25) 之间,所以插入时应当放在第一个数组的索引 `1` 处,故第一个返回值为 `1`。
|
||
|
||
In [16]:
|
||
|
||
```py
|
||
from numpy.random import rand
|
||
data = rand(100)
|
||
data.sort()
|
||
|
||
```
|
||
|
||
不加括号,默认是元组:
|
||
|
||
In [17]:
|
||
|
||
```py
|
||
bounds = .4, .6
|
||
bounds
|
||
|
||
```
|
||
|
||
Out[17]:
|
||
|
||
```py
|
||
(0.4, 0.6)
|
||
```
|
||
|
||
返回这两个值对应的插入位置:
|
||
|
||
In [18]:
|
||
|
||
```py
|
||
low_idx, high_idx = searchsorted(data, bounds)
|
||
|
||
```
|
||
|
||
利用插入位置,将数组中所有在这两个值之间的值提取出来:
|
||
|
||
In [19]:
|
||
|
||
```py
|
||
data[low_idx:high_idx]
|
||
|
||
```
|
||
|
||
Out[19]:
|
||
|
||
```py
|
||
array([ 0.41122674, 0.4395727 , 0.45609773, 0.45707137, 0.45772076,
|
||
0.46029997, 0.46757401, 0.47525517, 0.4969198 , 0.53068779,
|
||
0.55764166, 0.56288568, 0.56506548, 0.57003042, 0.58035233,
|
||
0.59279233, 0.59548555])
|
||
``` |