Iris Series: Visualize Math -- From Arithmetic Basics to Machine Learning 79be5dda7d Add files via upload
2025-02-01 17:06:45 +08:00

{
"cells": [
{
"cell_type": "markdown",
"id": "73bd968b-d970-4a05-94ef-4e7abf990827",
"metadata": {},
"source": [
"Chapter 02\n",
"\n",
"# Cosine Distance\n",
"Book_4 《矩阵力量》 (Power of Matrix) | Iris Series: From Arithmetic Basics to Machine Learning (2nd Edition)"
]
},
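{
"cell_type": "markdown",
"id": "a4d2d126-5db0-4038-a1dc-9a83708d2220",
"metadata": {},
"source": [
"This code loads the Iris dataset and extracts a few specific data points to compute the cosine distance and the cosine of the angle between them. All four features are used, and the data points are the 1st, 2nd, 51st, and 101st observations.\n",
"\n",
"### Formulas\n",
"The code uses cosine distance and cosine similarity to measure how similar two vectors are.\n",
"\n",
"1. Cosine distance:\n",
"\n",
"$$\n",
"\\text{cosine\\_distance} = 1 - \\frac{x_1 \\cdot x_2}{\\|x_1\\| \\|x_2\\|}\n",
"$$\n",
"\n",
"2. Cosine of the angle between the vectors:\n",
"\n",
"$$\n",
"\\cos \\theta = \\frac{x_1 \\cdot x_2}{\\|x_1\\| \\|x_2\\|}\n",
"$$\n",
"\n",
"Cosine distance grows with the angle between two vectors, while the cosine of the angle (the cosine similarity) measures how closely their directions align. The two are related by cosine distance = 1 - cosine similarity."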
]
},
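{
"cell_type": "markdown",
"id": "added-scale-invariance-note",
"metadata": {},
"source": [
"Both formulas depend only on the angle between the vectors, not on their lengths. As a minimal sketch of that property, the next cell checks that rescaling one vector leaves the cosine distance unchanged. The helper `cosine_distance` and the toy vectors `a` and `b` are illustrative assumptions, not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-scale-invariance-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def cosine_distance(u, v):\n",
"    # Hypothetical helper: 1 - (u . v) / (||u|| * ||v||) for two 1-D vectors\n",
"    u = np.asarray(u, dtype=float)\n",
"    v = np.asarray(v, dtype=float)\n",
"    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))\n",
"\n",
"\n",
"a = np.array([1.0, 0.0])  # toy vector along the first axis\n",
"b = np.array([1.0, 1.0])  # toy vector at 45 degrees to a\n",
"\n",
"d_ab = cosine_distance(a, b)  # distance for the original pair\n",
"d_scaled = cosine_distance(a, 10 * b)  # same direction, ten times the length\n",
"\n",
"# The two distances should agree up to floating-point rounding\n",
"print(d_ab, d_scaled, np.isclose(d_ab, d_scaled))"
]
},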
{
"cell_type": "markdown",
"id": "6aec7dfe-a3d4-4cb9-b921-b3a137455aca",
"metadata": {},
"source": [
"## Import the required libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "83ff9c7b-6adb-4fe1-91ef-4325850851b2",
"metadata": {},
"outputs": [],
"source": [
"from scipy.spatial import distance # SciPy module for distances between vectors\n",
"from sklearn import datasets # scikit-learn datasets module\n",
"import numpy as np # NumPy for numerical computation"
]
},
{
"cell_type": "markdown",
"id": "4723d4a7-92b7-4c8d-8dcc-e678a7daded0",
"metadata": {},
"source": [
"## Load the Iris dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f4654f70-41cf-40b9-ad1d-5e39c337a40a",
"metadata": {},
"outputs": [],
"source": [
"iris = datasets.load_iris() # load the Iris dataset"
]
},
{
"cell_type": "markdown",
"id": "a7d90350-e6b8-4f35-a30f-f17e3042bbfa",
"metadata": {},
"source": [
"## Use all four features"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "111624e1-5071-43b5-ad7f-f91c1ea86f1f",
"metadata": {},
"outputs": [],
"source": [
"X = iris.data[:, :] # take all rows and all four feature columns"
]
},
{
"cell_type": "markdown",
"id": "08b685c0-3e21-4c99-8f19-d15fc50ba309",
"metadata": {},
"source": [
"## Extract four data points"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "af961fce-c345-4d97-a8d8-dc1fd27f75e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5.1, 3.5, 1.4, 0.2])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_data = X[0, :] # 1st data point\n",
"x1_data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "98593e09-12b0-43a1-81c0-97751c3f44cf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4.9, 3. , 1.4, 0.2])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x2_data = X[1, :] # 2nd data point\n",
"x2_data"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "261efd1f-06bc-4361-996b-62a2f621234a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([7. , 3.2, 4.7, 1.4])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x51_data = X[50, :] # 51st data point\n",
"x51_data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "637692dd-8df7-472c-97e5-c1be1475c891",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([6.3, 3.3, 6. , 2.5])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x101_data = X[100, :] # 101st data point\n",
"x101_data"
]
},
{
"cell_type": "markdown",
"id": "a14578f6-7866-4fea-8354-2633c0a50df7",
"metadata": {},
"source": [
"## Compute the cosine distance and the cosine of the angle"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d73204a7-6235-4449-a3ea-90d677c04fe6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0014208364959781283"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_x2_cos_dist = distance.cosine(x1_data, x2_data) # cosine distance between x1 and x2\n",
"x1_x2_cos_dist"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1b3df27e-df51-4564-a192-b3d0544fd820",
"metadata": {},
"outputs": [],
"source": [
"x1_norm = np.linalg.norm(x1_data) # L2 norm of x1"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "efb9a6c3-ccfa-4000-8135-677fdd68c81d",
"metadata": {},
"outputs": [],
"source": [
"x2_norm = np.linalg.norm(x2_data) # L2 norm of x2"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "06c48d91-451f-404c-8e11-6006943c4a38",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"37.489999999999995"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_dot_x2 = x1_data @ x2_data # dot product of x1 and x2 (1-D arrays need no transpose)\n",
"x1_dot_x2"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5818da3f-985c-4c48-8fc9-159ec8067f2e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9985791635040218"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_x2_cos = x1_dot_x2 / x1_norm / x2_norm # cosine of the angle between x1 and x2\n",
"x1_x2_cos"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "f3e0969d-ffbf-4c10-95ee-69898c57dc0c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0014208364959782394"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"1 - x1_x2_cos # cosine distance recovered by hand from the cosine of the angle"
]
},
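{
"cell_type": "markdown",
"id": "added-float-compare-note",
"metadata": {},
"source": [
"The hand-computed value `1 - x1_x2_cos` and the SciPy value agree only up to floating-point rounding: they differ in the last few digits. A hedged sketch of a tolerance-based comparison follows. The two vectors are retyped from the outputs shown earlier, and the names `x1`, `x2`, `d_scipy`, and `d_manual` are chosen here for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-float-compare-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.spatial import distance\n",
"\n",
"# Values retyped from the notebook outputs for observations 1 and 2\n",
"x1 = np.array([5.1, 3.5, 1.4, 0.2])\n",
"x2 = np.array([4.9, 3.0, 1.4, 0.2])\n",
"\n",
"d_scipy = distance.cosine(x1, x2)  # library result\n",
"d_manual = 1 - (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))  # formula by hand\n",
"\n",
"# Exact equality can fail because the operations are rounded in a different order,\n",
"# so compare with a tolerance instead\n",
"print(d_scipy == d_manual)\n",
"print(np.isclose(d_scipy, d_manual))"
]
},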
{
"cell_type": "code",
"execution_count": 15,
"id": "62c0e4a9-c455-4290-9389-a2ee56c2adc1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0014208364959781283"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_x2_cos_dist = distance.cosine(x1_data, x2_data) # cosine distance between x1 and x2\n",
"x1_x2_cos_dist"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "46e0f1db-9989-4908-89de-ffe0f0a8f431",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.07161964128508791"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_x51_cos_dist = distance.cosine(x1_data, x51_data) # cosine distance between x1 and x51\n",
"x1_x51_cos_dist"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0f516143-46a8-42ba-8e31-fe4b08d428de",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.1399186683412712"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1_x101_cos_dist = distance.cosine(x1_data, x101_data) # cosine distance between x1 and x101\n",
"x1_x101_cos_dist"
]
},
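{
"cell_type": "markdown",
"id": "added-pairwise-matrix-note",
"metadata": {},
"source": [
"The pairwise calls above can also be written as a single distance matrix. The next cell is one possible sketch, assuming the four observations are retyped from the outputs shown earlier. The names `points`, `cos_dist`, and `angles_deg` are introduced here for illustration. It uses `scipy.spatial.distance.pdist` with `metric='cosine'` and `squareform`, then converts each distance to an angle in degrees. Row and column order follows observations 1, 2, 51, and 101."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-pairwise-matrix-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.spatial.distance import pdist, squareform\n",
"\n",
"# Values retyped from the notebook outputs\n",
"points = np.array([\n",
"    [5.1, 3.5, 1.4, 0.2],  # observation 1\n",
"    [4.9, 3.0, 1.4, 0.2],  # observation 2\n",
"    [7.0, 3.2, 4.7, 1.4],  # observation 51\n",
"    [6.3, 3.3, 6.0, 2.5],  # observation 101\n",
"])\n",
"\n",
"# Symmetric 4 x 4 matrix of cosine distances with zeros on the diagonal\n",
"cos_dist = squareform(pdist(points, metric='cosine'))\n",
"\n",
"# cos(theta) = 1 - distance; clip guards against rounding just outside [-1, 1]\n",
"angles_deg = np.degrees(np.arccos(np.clip(1.0 - cos_dist, -1.0, 1.0)))\n",
"\n",
"print(np.round(cos_dist, 4))\n",
"print(np.round(angles_deg, 2))"
]
},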
{
"cell_type": "code",
"execution_count": null,
"id": "ecd322f4-f919-4be2-adc3-69d28ef25e69",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}