mirror of
https://github.com/Visualize-ML/Book4_Power-of-Matrix.git
synced 2026-02-02 18:21:08 +08:00
412 lines
8.7 KiB
Plaintext
412 lines
8.7 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "73bd968b-d970-4a05-94ef-4e7abf990827",
|
||
"metadata": {},
|
||
"source": [
|
||
"Chapter 02\n",
|
||
"\n",
|
||
"# 余弦距离\n",
|
||
"Book_4《矩阵力量》 | 鸢尾花书:从加减乘除到机器学习 (第二版)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a4d2d126-5db0-4038-a1dc-9a83708d2220",
|
||
"metadata": {},
|
||
"source": [
|
||
"此代码加载了鸢尾花数据集,并提取了数据集中的几个特定数据点以计算它们之间的余弦距离和夹角余弦值。选取的特征包含所有特征,而数据点分别为第1、2、51、101个观测。\n",
|
||
"\n",
|
||
"### 计算公式\n",
|
||
"代码使用余弦距离和余弦相似度来度量两个向量之间的相似性。\n",
|
||
"\n",
|
||
"1. 余弦距离公式:\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\text{cosine\\_distance} = 1 - \\frac{x_1 \\cdot x_2}{\\|x_1\\| \\|x_2\\|}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"2. 向量间夹角的余弦值:\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\cos \\theta = \\frac{x_1 \\cdot x_2}{\\|x_1\\| \\|x_2\\|}\n",
|
||
"$$\n",
|
||
"\n",
|
||
"余弦距离反映了两个向量间的角度差异,而夹角余弦值则用于计算向量的相似性。"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "6aec7dfe-a3d4-4cb9-b921-b3a137455aca",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 导入所需库"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "83ff9c7b-6adb-4fe1-91ef-4325850851b2",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from scipy.spatial import distance # 导入SciPy库,用于计算向量间的距离\n",
|
||
"from sklearn import datasets # 导入sklearn数据集模块\n",
|
||
"import numpy as np # 导入NumPy库,用于数值计算"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "4723d4a7-92b7-4c8d-8dcc-e678a7daded0",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 导入鸢尾花数据集"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "f4654f70-41cf-40b9-ad1d-5e39c337a40a",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"iris = datasets.load_iris() # 加载鸢尾花数据集"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a7d90350-e6b8-4f35-a30f-f17e3042bbfa",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 使用前两个特征:萼片长度和萼片宽度"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "111624e1-5071-43b5-ad7f-f91c1ea86f1f",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"X = iris.data[:, :] # 提取所有特征数据"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "08b685c0-3e21-4c99-8f19-d15fc50ba309",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 提取4个数据点"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "af961fce-c345-4d97-a8d8-dc1fd27f75e6",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([5.1, 3.5, 1.4, 0.2])"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_data = X[0, :] # 第一个数据点\n",
|
||
"x1_data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "98593e09-12b0-43a1-81c0-97751c3f44cf",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([4.9, 3. , 1.4, 0.2])"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x2_data = X[1, :] # 第二个数据点\n",
|
||
"x2_data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "261efd1f-06bc-4361-996b-62a2f621234a",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([7. , 3.2, 4.7, 1.4])"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x51_data = X[50, :] # 第51个数据点\n",
|
||
"x51_data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "637692dd-8df7-472c-97e5-c1be1475c891",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([6.3, 3.3, 6. , 2.5])"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x101_data = X[100, :] # 第101个数据点\n",
|
||
"x101_data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a14578f6-7866-4fea-8354-2633c0a50df7",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 计算余弦距离和夹角余弦值"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "d73204a7-6235-4449-a3ea-90d677c04fe6",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.0014208364959781283"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_x2_cos_dist = distance.cosine(x1_data, x2_data) \n",
|
||
"# 计算x1和x2的余弦距离\n",
|
||
"x1_x2_cos_dist"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "1b3df27e-df51-4564-a192-b3d0544fd820",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"x1_norm = np.linalg.norm(x1_data) # 计算x1的L2范数"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"id": "efb9a6c3-ccfa-4000-8135-677fdd68c81d",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"x2_norm = np.linalg.norm(x2_data) # 计算x2的L2范数"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"id": "06c48d91-451f-404c-8e11-6006943c4a38",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"37.489999999999995"
|
||
]
|
||
},
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_dot_x2 = x1_data.T @ x2_data # 计算x1和x2的点积\n",
|
||
"x1_dot_x2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "5818da3f-985c-4c48-8fc9-159ec8067f2e",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.9985791635040218"
|
||
]
|
||
},
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_x2_cos = x1_dot_x2 / x1_norm / x2_norm # 计算x1和x2的夹角余弦值\n",
|
||
"x1_x2_cos"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"id": "f3e0969d-ffbf-4c10-95ee-69898c57dc0c",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.0014208364959782394"
|
||
]
|
||
},
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"1 - x1_x2_cos"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"id": "62c0e4a9-c455-4290-9389-a2ee56c2adc1",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.0014208364959781283"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_x2_cos_dist = distance.cosine(x1_data, x2_data) # 计算x1和x2的余弦距离\n",
|
||
"x1_x2_cos_dist"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "46e0f1db-9989-4908-89de-ffe0f0a8f431",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.07161964128508791"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_x51_cos_dist = distance.cosine(x1_data, x51_data) # 计算x1和x51的余弦距离\n",
|
||
"x1_x51_cos_dist"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"id": "0f516143-46a8-42ba-8e31-fe4b08d428de",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.1399186683412712"
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"x1_x101_cos_dist = distance.cosine(x1_data, x101_data) # 计算x1和x101的余弦距离\n",
|
||
"x1_x101_cos_dist"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "85a80909-2aac-49ed-bb7a-f8cc6b80ee7d",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "ecd322f4-f919-4be2-adc3-69d28ef25e69",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.12.7"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|