Update through 6. SVM: finding the maximum margin

2026-05-09 15:52:33 +08:00 · 2017-04-10 16:20:22 +08:00
parent fda805f077
commit ca5860c9f7
4 changed files with 62 additions and 6 deletions
--- a/docs/6.支持向量机.md
+++ b/docs/6.支持向量机.md
@@ -1,12 +1,68 @@
-
 # 6) Support Vector Machines

-* Basic concepts
+![Support vector machines: title image](/images/6.SVM/SVM_1.jpg)
+
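+## Concepts of Support Vector Machines
+
+> Support Vector Machines (SVM)
+
+* Support vectors are the points closest to the separating hyperplane.
+* "Machine" here refers to an algorithm, not a physical machine.
+* SVMs have many implementations; one of the most popular is the `Sequential Minimal Optimization (SMO)` algorithm.
+* Later sections introduce `kernel functions`, a technique that extends SVMs to a wider range of datasets.
+* Hands-on project: revisit the handwriting recognition example from Chapter 1 and check whether an SVM improves recognition accuracy.
+* Note: `The geometric meaning of an SVM is fairly intuitive, but the algorithm is complex to implement and involves deriving many mathematical formulas.`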
+
+```
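+Pros: low generalization error, modest computational cost, easy-to-interpret results.
+Cons: sensitive to parameter tuning and to the choice of kernel function; without modification, the basic classifier only handles binary classification.
+Works with: numeric and nominal values.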
+```
+
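+## Separating data with the maximum margin
+
+* If a dataset can be completely separated by drawing a single straight line, it is called `linearly separable` data.
+* The line that separates the data is called the `separating hyperplane`.
+* What if the dataset has 1024 dimensions? Then a 1023-dimensional object is needed to separate it. In general, separating N-dimensional data requires an (N-1)-dimensional object, called a `hyperplane`, which serves as the decision boundary for classification.
+* ![Separating hyperplane](/images/6.SVM/SVM_2_separating-hyperplane.jpg)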
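The idea of a hyperplane as a decision boundary can be sketched in a few lines of Python. This is an illustration only and is not part of the commit: it assumes the hyperplane is written as `w · x - b = 0`, and the helper names `side_of_hyperplane` and `is_separated_by`, as well as the sample points, are hypothetical.

```python
def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def side_of_hyperplane(w, b, x):
    """Return +1 or -1 for the side of w . x - b = 0 that x lies on, 0 if on it."""
    value = dot(w, x) - b
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def is_separated_by(w, b, points, labels):
    """True if every point falls on the side matching its +1/-1 label."""
    return all(side_of_hyperplane(w, b, x) == y for x, y in zip(points, labels))


# Made-up 2-D data: two points per class.
points = [(1.0, 1.0), (2.0, 0.5), (4.0, 4.0), (5.0, 3.5)]
labels = [-1, -1, 1, 1]

# Candidate boundary: the line x1 + x2 = 5 (a 1-D hyperplane in 2-D space).
w, b = (1.0, 1.0), 5.0
separated = is_separated_by(w, b, points, labels)
print(separated)
```

The same functions work unchanged for 1024-dimensional points, since `w` and `x` are just sequences of equal length.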
+
+## Finding the maximum margin
+
+> Why look for the maximum margin
+
+```
+Excerpted from: http://slideplayer.com/slide/8610144  (slide 12)
+Support Vector Machines: Slide 12. Copyright © 2001, 2003, Andrew W. Moore
+Why Maximum Margin?
+(Figure legend: one marker denotes +1, the other denotes -1.)
+f(x, w, b) = sign(w · x - b)
+The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
+This is the simplest kind of SVM (called an LSVM).
+Support vectors are those datapoints that the margin pushes up against.
+
+1.Intuitively this feels safest. 
+2.If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction) this gives us least chance of causing a misclassification. 
+3.CV is easy since the model is immune to removal of any non-support-vector datapoints. 
+4.There’s some theory that this is a good thing. 
+5.Empirically it works very very well. 
+
+* * *
+
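+1. The widest margin is intuitively the safest choice.
+2. If the boundary's location is slightly off (shifted along its perpendicular direction), a wide margin gives the smallest chance of misclassification.
+3. Cross-validation (CV) is easy, because removing any datapoint that is not a support vector leaves the model unchanged.
+4. There is theory suggesting that a maximum margin is a good thing.
+5. In practice it works very well.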
+```
+
+* Choosing D separates the data much better than B or C, for the five reasons listed above.
+![Linearly separable](/images/6.SVM/SVM_3_linearly-separable.jpg)
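To make "D is better than B or C" concrete, one can compare the geometric margin of each candidate boundary, that is, the distance from the boundary to its nearest correctly classified point. The sketch below is only an illustration under stated assumptions: the points and the three candidate lines labelled `B`, `C`, and `D` are made up and are not read off the figure, the classifier follows the `sign(w · x - b)` convention from the quoted slide, and `geometric_margin` is a hypothetical helper. It only ranks given candidates; it is not the SMO algorithm.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w . x - b) / ||w|| over all points.

    A negative result means at least one point is misclassified.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) - b) / norm
        for x, y in zip(points, labels)
    )


# Made-up linearly separable data.
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidate boundaries, each given as (w, b).
candidates = {
    "B": ((1.0, 0.0), 2.5),  # vertical line x1 = 2.5
    "C": ((0.0, 1.0), 2.0),  # horizontal line x2 = 2
    "D": ((1.0, 1.0), 5.0),  # diagonal line x1 + x2 = 5
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, best)
```

All three candidates separate this data, but the diagonal line keeps every point farthest from the boundary, so it is the one a maximum margin classifier would prefer among them.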
+
+> How to find the maximum margin
+
+
+
+
+
+
+

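-    * Suppose there are two groups of data and we can separate them with a single line; this line is called the separating hyperplane
-    (it is called a separating hyperplane when the number of dimensions is high).
-    * We want to find the points closest to the separating hyperplane and make sure they are as far from it as possible.
-    * Support vectors are the points closest to the separating hyperplane.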

 * The goal of a support vector machine is to find the optimal separating hyperplane that maximizes the margin of the training data.

--- a/images/6.SVM/SVM_1.jpg
+++ b/images/6.SVM/SVM_1.jpg
--- a/images/6.SVM/SVM_2_separating-hyperplane.jpg
+++ b/images/6.SVM/SVM_2_separating-hyperplane.jpg
--- a/images/6.SVM/SVM_3_linearly-separable.jpg
+++ b/images/6.SVM/SVM_3_linearly-separable.jpg