Code cleanup

2026-04-03 19:07:41 +08:00 · 2021-03-20 17:35:14 +08:00
parent 893aed5d16
commit 3cfd4f47c5
230 changed files with 1027931 additions and 0 deletions
--- a/机器学习/殷康龙/机器学习实战1源代码/1.MLFoundation/NumPy.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/1.MLFoundation/NumPy.py
@@ -56,3 +56,5 @@ print(myEye - eye(4))
 '''
 If the code above runs without errors, NumPy is installed correctly.
 '''
+
+# %%
--- a/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/init.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/init.py
--- a/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/k-means.md
+++ b/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/k-means.md
--- a/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/kMeans.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/kMeans.py
--- a/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/kMeansSklearn.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/kMeansSklearn.py
--- a/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/test.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/10.kmeans/test.txt
--- a/机器学习/殷康龙/机器学习实战1源代码/11.Apriori/apriori.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/11.Apriori/apriori.py
--- a/机器学习/殷康龙/机器学习实战1源代码/12.FrequentPattemTree/fpGrowth.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/12.FrequentPattemTree/fpGrowth.py
--- a/机器学习/殷康龙/机器学习实战1源代码/13.PCA/pca.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/13.PCA/pca.py
--- a/机器学习/殷康龙/机器学习实战1源代码/14.SVD/svdRecommend.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/14.SVD/svdRecommend.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrMean.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrMean.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrMeanMapper.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrMeanMapper.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrMeanReducer.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrMeanReducer.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrSVM.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrSVM.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrSVMkickStart.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/mrSVMkickStart.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/pegasos.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/pegasos.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/proximalSVM.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/proximalSVM.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/py27dbg.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/py27dbg.py
--- a/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/wc.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/15.BigData_MapReduce/wc.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/RS-itemcf.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/RS-itemcf.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/RS-usercf.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/RS-usercf.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/python/Recommender.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/python/Recommender.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo-cf-item-test.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo-cf-item-test.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo-item.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo-item.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo-user.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo-user.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/sklearn-RS-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_evaluation_model.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_evaluation_model.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_graph-based.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_graph-based.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_lfm.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_lfm.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_基于物品.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_基于物品.py
--- a/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_基于用户.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/16.RecommenderSystems/test_基于用户.py
--- a/机器学习/殷康龙/机器学习实战1源代码/2.KNN/kNN.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/2.KNN/kNN.py
@@ -323,3 +323,5 @@ if __name__ == '__main__':
    # test1()
    # datingClassTest()
    handwritingClassTest()
+
+# %%
--- a/机器学习/殷康龙/机器学习实战1源代码/2.KNN/sklearn-knn-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/2.KNN/sklearn-knn-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/DTSklearn.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/DTSklearn.py
--- a/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/DecisionTree.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/DecisionTree.py
--- a/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/decisionTreePlot.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/decisionTreePlot.py
--- a/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/skelearn_dts_regressor_demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/skelearn_dts_regressor_demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/sklearn_dts_classify_demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/3.DecisionTree/sklearn_dts_classify_demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/4.NaiveBayes/bayes.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/4.NaiveBayes/bayes.py
--- a/机器学习/殷康龙/机器学习实战1源代码/4.NaiveBayes/sklearn-nb-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/4.NaiveBayes/sklearn-nb-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/5.Logistic/logistic.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/5.Logistic/logistic.py
--- a/机器学习/殷康龙/机器学习实战1源代码/5.Logistic/sklearn_logisticRegression_demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/5.Logistic/sklearn_logisticRegression_demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/6.SVM/sklearn-svm-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/6.SVM/sklearn-svm-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/6.SVM/svm-complete.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/6.SVM/svm-complete.py
--- a/机器学习/殷康龙/机器学习实战1源代码/6.SVM/svm-complete_Non-Kernel.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/6.SVM/svm-complete_Non-Kernel.py
--- a/机器学习/殷康龙/机器学习实战1源代码/6.SVM/svm-simple.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/6.SVM/svm-simple.py
--- a/机器学习/殷康龙/机器学习实战1源代码/7.AdaBoost/adaboost.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/7.AdaBoost/adaboost.py
--- a/机器学习/殷康龙/机器学习实战1源代码/7.AdaBoost/roc_test.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/7.AdaBoost/roc_test.py
--- a/机器学习/殷康龙/机器学习实战1源代码/7.AdaBoost/sklearn-adaboost-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/7.AdaBoost/sklearn-adaboost-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/7.RandomForest/randomForest.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/7.RandomForest/randomForest.py
--- a/机器学习/殷康龙/机器学习实战1源代码/8.Regression/regression.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/8.Regression/regression.py
--- a/机器学习/殷康龙/机器学习实战1源代码/8.Regression/sklearn-regression-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/8.Regression/sklearn-regression-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/RTSklearn.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/RTSklearn.py
--- a/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/regTrees.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/regTrees.py
--- a/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/sklearn-regressTree-demo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/sklearn-regressTree-demo.py
--- a/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/treeExplore.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/9.RegTrees/treeExplore.py
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/README.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/README.txt
@@ -0,0 +1,8 @@
+This folder contains the code used to create the plots in the examples.  
+The code is not very difficult; however, I never meant for it to go out to readers. 
+It’s not the cleanest code, nor is it very well documented.  Most of the time I threw 
+together a dirty hack to make the plots look right, with no thought about 
+efficiency or readability.   I’m providing it as-is. If you have a question on 
+how it works or why I did something, please ask; I will be more than happy to answer 
+any questions.  
+Peter Harrington
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/createDist.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/createDist.py
@@ -0,0 +1,65 @@
+'''
+Created on Oct 6, 2010
+
+@author: Peter
+'''
+from numpy import *
+import matplotlib
+import matplotlib.pyplot as plt
+from matplotlib.patches import Rectangle
+
+
+n = 1000 #number of points to create
+xcord = zeros((n))
+ycord = zeros((n))
+markers =[]
+colors =[]
+fw = open('testSet.txt','w')
+for i in range(n):
+    [r0,r1] = random.standard_normal(2)
+    myClass = random.uniform(0,1)
+    if (myClass <= 0.16):
+        fFlyer = random.uniform(22000, 60000)
+        tats = 3 + 1.6*r1
+        markers.append(20)
+        colors.append(2.1)
+        classLabel = 1 #'didntLike'
+        print("%d, %f, class1" % (fFlyer, tats))
+    elif ((myClass > 0.16) and (myClass <= 0.33)):
+        fFlyer = 6000*r0 + 70000
+        tats = 10 + 3*r1 + 2*r0
+        markers.append(20)
+        colors.append(1.1)
+        classLabel = 1 #'didntLike'
+        print("%d, %f, class1" % (fFlyer, tats))
+    elif ((myClass > 0.33) and (myClass <= 0.66)):
+        fFlyer = 5000*r0 + 10000
+        tats = 3 + 2.8*r1
+        markers.append(30)
+        colors.append(1.1)
+        classLabel = 2 #'smallDoses'
+        print("%d, %f, class2" % (fFlyer, tats))
+    else:
+        fFlyer = 10000*r0 + 35000
+        tats = 10 + 2.0*r1
+        markers.append(50)
+        colors.append(0.1)
+        classLabel = 3 #'largeDoses'
+        print("%d, %f, class3" % (fFlyer, tats))
+    if (tats < 0): tats =0
+    if (fFlyer < 0): fFlyer =0
+    xcord[i] = fFlyer; ycord[i]=tats
+    fw.write("%d\t%f\t%f\t%d\n" % (fFlyer, tats, random.uniform(0.0, 1.7), classLabel))
+
+fw.close()
+fig = plt.figure()
+ax = fig.add_subplot(111)
+ax.scatter(xcord,ycord, c=colors, s=markers)
+type1 = ax.scatter([-10], [-10], s=20, c='red')
+type2 = ax.scatter([-10], [-15], s=30, c='green')
+type3 = ax.scatter([-10], [-20], s=50, c='blue')
+ax.legend([type1, type2, type3], ["Class 1", "Class 2", "Class 3"], loc=2)
+#ax.axis([-5000,100000,-2,25])
+plt.xlabel('Frequent Flyer Miles Earned Per Year')
+plt.ylabel('Percentage of Body Covered By Tattoos')
+plt.show()
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/createDist2.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/createDist2.py
@@ -0,0 +1,68 @@
+'''
+Created on Oct 6, 2010
+
+@author: Peter
+'''
+from numpy import *
+import matplotlib
+import matplotlib.pyplot as plt
+from matplotlib.patches import Rectangle
+
+
+n = 1000 #number of points to create
+xcord1 = []; ycord1 = []
+xcord2 = []; ycord2 = []
+xcord3 = []; ycord3 = []
+markers =[]
+colors =[]
+fw = open('testSet.txt','w')
+for i in range(n):
+    [r0,r1] = random.standard_normal(2)
+    myClass = random.uniform(0,1)
+    if (myClass <= 0.16):
+        fFlyer = random.uniform(22000, 60000)
+        tats = 3 + 1.6*r1
+        markers.append(20)
+        colors.append(2.1)
+        classLabel = 1 #'didntLike'
+        xcord1.append(fFlyer); ycord1.append(tats)
+    elif ((myClass > 0.16) and (myClass <= 0.33)):
+        fFlyer = 6000*r0 + 70000
+        tats = 10 + 3*r1 + 2*r0
+        markers.append(20)
+        colors.append(1.1)
+        classLabel = 1 #'didntLike'
+        if (tats < 0): tats =0
+        if (fFlyer < 0): fFlyer =0
+        xcord1.append(fFlyer); ycord1.append(tats)
+    elif ((myClass > 0.33) and (myClass <= 0.66)):
+        fFlyer = 5000*r0 + 10000
+        tats = 3 + 2.8*r1
+        markers.append(30)
+        colors.append(1.1)
+        classLabel = 2 #'smallDoses'
+        if (tats < 0): tats =0
+        if (fFlyer < 0): fFlyer =0
+        xcord2.append(fFlyer); ycord2.append(tats)
+    else:
+        fFlyer = 10000*r0 + 35000
+        tats = 10 + 2.0*r1
+        markers.append(50)
+        colors.append(0.1)
+        classLabel = 3 #'largeDoses'
+        if (tats < 0): tats =0
+        if (fFlyer < 0): fFlyer =0
+        xcord3.append(fFlyer); ycord3.append(tats)    
+
+fw.close()
+fig = plt.figure()
+ax = fig.add_subplot(111)
+#ax.scatter(xcord,ycord, c=colors, s=markers)
+type1 = ax.scatter(xcord1, ycord1, s=20, c='red')
+type2 = ax.scatter(xcord2, ycord2, s=30, c='green')
+type3 = ax.scatter(xcord3, ycord3, s=50, c='blue')
+ax.legend([type1, type2, type3], ["Did Not Like", "Liked in Small Doses", "Liked in Large Doses"], loc=2)
+ax.axis([-5000,100000,-2,25])
+plt.xlabel('Frequent Flyer Miles Earned Per Year')
+plt.ylabel('Percentage of Time Spent Playing Video Games')
+plt.show()
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/createFirstPlot.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/createFirstPlot.py
@@ -0,0 +1,18 @@
+'''
+Created on Oct 27, 2010
+
+@author: Peter
+'''
+from numpy import *
+import kNN
+import matplotlib
+import matplotlib.pyplot as plt
+fig = plt.figure()
+ax = fig.add_subplot(111)
+datingDataMat,datingLabels = kNN.file2matrix('datingTestSet.txt')
+#ax.scatter(datingDataMat[:,1], datingDataMat[:,2])
+ax.scatter(datingDataMat[:,1], datingDataMat[:,2], 15.0*array(datingLabels), 15.0*array(datingLabels))
+ax.axis([-2,25,-0.2,2.0])
+plt.xlabel('Percentage of Time Spent Playing Video Games')
+plt.ylabel('Liters of Ice Cream Consumed Per Week')
+plt.show()
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/testSet.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/EXTRAS/testSet.txt
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/README.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/README.txt
@@ -0,0 +1,3 @@
+The code for the examples in Ch. 2 is contained in the Python module kNN.py.
+The examples assume that datingTestSet.txt is in the current working directory.  
+The folders testDigits and trainingDigits are also assumed to be in this folder.  
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/datingTestSet.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/datingTestSet.txt
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/datingTestSet2.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/datingTestSet2.txt
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/digits.zip
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/digits.zip
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/kNN.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/kNN.py
@@ -0,0 +1,108 @@
+'''
+Created on Sep 16, 2010
+kNN: k Nearest Neighbors
+
+Input:      inX: vector to compare to existing dataset (1xN)
+            dataSet: size m data set of known vectors (NxM)
+            labels: data set labels (1xM vector)
+            k: number of neighbors to use for comparison (should be an odd number)
+            
+Output:     the most popular class label
+
+@author: pbharrin
+'''
+from numpy import *
+import operator
+from os import listdir
+
+def classify0(inX, dataSet, labels, k):
+    dataSetSize = dataSet.shape[0]
+    diffMat = tile(inX, (dataSetSize,1)) - dataSet
+    sqDiffMat = diffMat**2
+    sqDistances = sqDiffMat.sum(axis=1)
+    distances = sqDistances**0.5
+    sortedDistIndicies = distances.argsort()     
+    classCount={}          
+    for i in range(k):
+        voteIlabel = labels[sortedDistIndicies[i]]
+        classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
+    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
+    return sortedClassCount[0][0]
+
+def createDataSet():
+    group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
+    labels = ['A','A','B','B']
+    return group, labels
+
+def file2matrix(filename):
+    fr = open(filename)
+    numberOfLines = len(fr.readlines())         #get the number of lines in the file
+    returnMat = zeros((numberOfLines,3))        #prepare matrix to return
+    classLabelVector = []                       #prepare labels return   
+    fr = open(filename)
+    index = 0
+    for line in fr.readlines():
+        line = line.strip()
+        listFromLine = line.split('\t')
+        returnMat[index,:] = listFromLine[0:3]
+        classLabelVector.append(int(listFromLine[-1]))
+        index += 1
+    return returnMat,classLabelVector
+    
+def autoNorm(dataSet):
+    minVals = dataSet.min(0)
+    maxVals = dataSet.max(0)
+    ranges = maxVals - minVals
+    normDataSet = zeros(shape(dataSet))
+    m = dataSet.shape[0]
+    normDataSet = dataSet - tile(minVals, (m,1))
+    normDataSet = normDataSet/tile(ranges, (m,1))   #element wise divide
+    return normDataSet, ranges, minVals
+   
+def datingClassTest():
+    hoRatio = 0.50      #hold out 50% of the data for testing
+    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')       #load data set from file
+    normMat, ranges, minVals = autoNorm(datingDataMat)
+    m = normMat.shape[0]
+    numTestVecs = int(m*hoRatio)
+    errorCount = 0.0
+    for i in range(numTestVecs):
+        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:],datingLabels[numTestVecs:m],3)
+        print("the classifier came back with: %d, the real answer is: %d" % (classifierResult, datingLabels[i]))
+        if (classifierResult != datingLabels[i]): errorCount += 1.0
+    print("the total error rate is: %f" % (errorCount/float(numTestVecs)))
+    print(errorCount)
+    
+def img2vector(filename):
+    returnVect = zeros((1,1024))
+    fr = open(filename)
+    for i in range(32):
+        lineStr = fr.readline()
+        for j in range(32):
+            returnVect[0,32*i+j] = int(lineStr[j])
+    return returnVect
+
+def handwritingClassTest():
+    hwLabels = []
+    trainingFileList = listdir('trainingDigits')           #load the training set
+    m = len(trainingFileList)
+    trainingMat = zeros((m,1024))
+    for i in range(m):
+        fileNameStr = trainingFileList[i]
+        fileStr = fileNameStr.split('.')[0]     #take off .txt
+        classNumStr = int(fileStr.split('_')[0])
+        hwLabels.append(classNumStr)
+        trainingMat[i,:] = img2vector('trainingDigits/%s' % fileNameStr)
+    testFileList = listdir('testDigits')        #iterate through the test set
+    errorCount = 0.0
+    mTest = len(testFileList)
+    for i in range(mTest):
+        fileNameStr = testFileList[i]
+        fileStr = fileNameStr.split('.')[0]     #take off .txt
+        classNumStr = int(fileStr.split('_')[0])
+        vectorUnderTest = img2vector('testDigits/%s' % fileNameStr)
+        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
+        print("the classifier came back with: %d, the real answer is: %d" % (classifierResult, classNumStr))
+        if (classifierResult != classNumStr): errorCount += 1.0
+    print("\nthe total number of errors is: %d" % errorCount)
+    print("\nthe total error rate is: %f" % (errorCount/float(mTest)))
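Note on classify0: the voting logic above can be sketched without NumPy for quick experiments. This is a minimal illustration, not the book's implementation; the name classify_knn and the use of collections.Counter are assumptions introduced here, and the sample data mirrors createDataSet().

```python
import math
from collections import Counter


def classify_knn(in_x, data_set, labels, k):
    """Hypothetical stdlib-only counterpart of classify0 (illustrative name)."""
    # Euclidean distance from in_x to every known vector.
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(in_x, row)))
        for row in data_set
    ]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote among the k nearest neighbours.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


group = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']
near_origin = classify_knn([0.0, 0.0], group, labels, 3)
near_ones = classify_knn([1.0, 1.2], group, labels, 3)
print(near_origin, near_ones)
```

With k=3 the query at the origin has two 'B' neighbours and one 'A' neighbour, so the vote goes to 'B'; the query near (1, 1.2) goes to 'A'.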
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/kNN.pyc
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/kNN.pyc
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch02/testSet.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch02/testSet.txt
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch03/classifierStorage.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch03/classifierStorage.txt
@@ -0,0 +1,18 @@
+(dp0
+S'booze'
+p1
+(dp2
+I0
+S'no'
+p3
+sI1
+(dp4
+S'weed'
+p5
+(dp6
+I0
+g3
+sI1
+S'yes'
+p7
+ssss.
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch03/lenses.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch03/lenses.txt
@@ -0,0 +1,24 @@
+young	myope	no	reduced	no lenses
+young	myope	no	normal	soft
+young	myope	yes	reduced	no lenses
+young	myope	yes	normal	hard
+young	hyper	no	reduced	no lenses
+young	hyper	no	normal	soft
+young	hyper	yes	reduced	no lenses
+young	hyper	yes	normal	hard
+pre	myope	no	reduced	no lenses
+pre	myope	no	normal	soft
+pre	myope	yes	reduced	no lenses
+pre	myope	yes	normal	hard
+pre	hyper	no	reduced	no lenses
+pre	hyper	no	normal	soft
+pre	hyper	yes	reduced	no lenses
+pre	hyper	yes	normal	no lenses
+presbyopic	myope	no	reduced	no lenses
+presbyopic	myope	no	normal	no lenses
+presbyopic	myope	yes	reduced	no lenses
+presbyopic	myope	yes	normal	hard
+presbyopic	hyper	no	reduced	no lenses
+presbyopic	hyper	no	normal	soft
+presbyopic	hyper	yes	reduced	no lenses
+presbyopic	hyper	yes	normal	no lenses
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch03/treePlotter.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch03/treePlotter.py
@@ -0,0 +1,88 @@
+'''
+Created on Oct 14, 2010
+
+@author: Peter Harrington
+'''
+import matplotlib.pyplot as plt
+
+decisionNode = dict(boxstyle="sawtooth", fc="0.8")
+leafNode = dict(boxstyle="round4", fc="0.8")
+arrow_args = dict(arrowstyle="<-")
+
+def getNumLeafs(myTree):
+    numLeafs = 0
+    firstStr = list(myTree.keys())[0]
+    secondDict = myTree[firstStr]
+    for key in secondDict.keys():
+        if type(secondDict[key]).__name__=='dict':#test to see if the nodes are dictionaries, if not they are leaf nodes
+            numLeafs += getNumLeafs(secondDict[key])
+        else:   numLeafs +=1
+    return numLeafs
+
+def getTreeDepth(myTree):
+    maxDepth = 0
+    firstStr = list(myTree.keys())[0]
+    secondDict = myTree[firstStr]
+    for key in secondDict.keys():
+        if type(secondDict[key]).__name__=='dict':#test to see if the nodes are dictionaries, if not they are leaf nodes
+            thisDepth = 1 + getTreeDepth(secondDict[key])
+        else:   thisDepth = 1
+        if thisDepth > maxDepth: maxDepth = thisDepth
+    return maxDepth
+
+def plotNode(nodeTxt, centerPt, parentPt, nodeType):
+    createPlot.ax1.annotate(nodeTxt, xy=parentPt,  xycoords='axes fraction',
+             xytext=centerPt, textcoords='axes fraction',
+             va="center", ha="center", bbox=nodeType, arrowprops=arrow_args )
+    
+def plotMidText(cntrPt, parentPt, txtString):
+    xMid = (parentPt[0]-cntrPt[0])/2.0 + cntrPt[0]
+    yMid = (parentPt[1]-cntrPt[1])/2.0 + cntrPt[1]
+    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)
+
+def plotTree(myTree, parentPt, nodeTxt):#if the first key tells you what feat was split on
+    numLeafs = getNumLeafs(myTree)  #this determines the x width of this tree
+    depth = getTreeDepth(myTree)
+    firstStr = list(myTree.keys())[0]     #the text label for this node should be this
+    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs))/2.0/plotTree.totalW, plotTree.yOff)
+    plotMidText(cntrPt, parentPt, nodeTxt)
+    plotNode(firstStr, cntrPt, parentPt, decisionNode)
+    secondDict = myTree[firstStr]
+    plotTree.yOff = plotTree.yOff - 1.0/plotTree.totalD
+    for key in secondDict.keys():
+        if type(secondDict[key]).__name__=='dict':#test to see if the nodes are dictionaries, if not they are leaf nodes   
+            plotTree(secondDict[key],cntrPt,str(key))        #recursion
+        else:   #it's a leaf node print the leaf node
+            plotTree.xOff = plotTree.xOff + 1.0/plotTree.totalW
+            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
+            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
+    plotTree.yOff = plotTree.yOff + 1.0/plotTree.totalD
+#if you do get a dictionary you know it's a tree, and the first element will be another dict
+
+def createPlot(inTree):
+    fig = plt.figure(1, facecolor='white')
+    fig.clf()
+    axprops = dict(xticks=[], yticks=[])
+    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)    #no ticks
+    #createPlot.ax1 = plt.subplot(111, frameon=False) #ticks for demo purposes 
+    plotTree.totalW = float(getNumLeafs(inTree))
+    plotTree.totalD = float(getTreeDepth(inTree))
+    plotTree.xOff = -0.5/plotTree.totalW; plotTree.yOff = 1.0;
+    plotTree(inTree, (0.5,1.0), '')
+    plt.show()
+
+#def createPlot():
+#    fig = plt.figure(1, facecolor='white')
+#    fig.clf()
+#    createPlot.ax1 = plt.subplot(111, frameon=False) #ticks for demo purposes 
+#    plotNode('a decision node', (0.5, 0.1), (0.1, 0.5), decisionNode)
+#    plotNode('a leaf node', (0.8, 0.1), (0.3, 0.8), leafNode)
+#    plt.show()
+
+def retrieveTree(i):
+    listOfTrees =[{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}},
+                  {'no surfacing': {0: 'no', 1: {'flippers': {0: {'head': {0: 'no', 1: 'yes'}}, 1: 'no'}}}}
+                  ]
+    return listOfTrees[i]
+
+#createPlot(thisTree)
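Note on the tree representation: treePlotter.py treats a tree as a nested dict with a single root key whose value maps feature values to either a subtree (dict) or a leaf label. The sketch below restates the leaf and depth counting under that assumption; count_leaves and tree_depth are illustrative names, not functions from the commit, and the sample trees are copied from retrieveTree().

```python
def count_leaves(tree):
    """Count leaf nodes of a nested-dict tree with one root key per level."""
    root = next(iter(tree))            # the single feature name at this level
    total = 0
    for child in tree[root].values():
        if isinstance(child, dict):    # a dict is a subtree
            total += count_leaves(child)
        else:                          # anything else is a leaf label
            total += 1
    return total


def tree_depth(tree):
    """Depth measured in decision nodes along the longest path."""
    root = next(iter(tree))
    deepest = 0
    for child in tree[root].values():
        depth = 1 + tree_depth(child) if isinstance(child, dict) else 1
        deepest = max(deepest, depth)
    return deepest


tree0 = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
tree1 = {'no surfacing': {0: 'no',
                          1: {'flippers': {0: {'head': {0: 'no', 1: 'yes'}},
                                           1: 'no'}}}}
print(count_leaves(tree0), tree_depth(tree0))
print(count_leaves(tree1), tree_depth(tree1))
```

These two numbers are what createPlot uses as totalW and totalD to space nodes along the x and y axes.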
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch03/treePlotter.pyc
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch03/treePlotter.pyc
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch03/trees.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch03/trees.py
@@ -0,0 +1,105 @@
+'''
+Created on Oct 12, 2010
+Decision Tree Source Code for Machine Learning in Action Ch. 3
+@author: Peter Harrington
+'''
+from math import log
+import operator
+
+def createDataSet():
+    dataSet = [[1, 1, 'yes'],
+               [1, 1, 'yes'],
+               [1, 0, 'no'],
+               [0, 1, 'no'],
+               [0, 1, 'no']]
+    labels = ['no surfacing','flippers']
+    #change to discrete values
+    return dataSet, labels
+
+def calcShannonEnt(dataSet):
+    numEntries = len(dataSet)
+    labelCounts = {}
+    for featVec in dataSet: #count the unique class labels and their occurrences
+        currentLabel = featVec[-1]
+        if currentLabel not in labelCounts.keys(): labelCounts[currentLabel] = 0
+        labelCounts[currentLabel] += 1
+    shannonEnt = 0.0
+    for key in labelCounts:
+        prob = float(labelCounts[key])/numEntries
+        shannonEnt -= prob * log(prob,2) #log base 2
+    return shannonEnt
+    
+def splitDataSet(dataSet, axis, value):
+    retDataSet = []
+    for featVec in dataSet:
+        if featVec[axis] == value:
+            reducedFeatVec = featVec[:axis]     #chop out axis used for splitting
+            reducedFeatVec.extend(featVec[axis+1:])
+            retDataSet.append(reducedFeatVec)
+    return retDataSet
+    
+def chooseBestFeatureToSplit(dataSet):
+    numFeatures = len(dataSet[0]) - 1      #the last column is used for the labels
+    baseEntropy = calcShannonEnt(dataSet)
+    bestInfoGain = 0.0; bestFeature = -1
+    for i in range(numFeatures):        #iterate over all the features
+        featList = [example[i] for example in dataSet]#create a list of all the examples of this feature
+        uniqueVals = set(featList)       #get a set of unique values
+        newEntropy = 0.0
+        for value in uniqueVals:
+            subDataSet = splitDataSet(dataSet, i, value)
+            prob = len(subDataSet)/float(len(dataSet))
+            newEntropy += prob * calcShannonEnt(subDataSet)     
+        infoGain = baseEntropy - newEntropy     #calculate the info gain; ie reduction in entropy
+        if (infoGain > bestInfoGain):       #compare this to the best gain so far
+            bestInfoGain = infoGain         #if better than current best, set to best
+            bestFeature = i
+    return bestFeature                      #returns an integer
+
+def majorityCnt(classList):
+    classCount={}
+    for vote in classList:
+        if vote not in classCount.keys(): classCount[vote] = 0
+        classCount[vote] += 1
+    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
+    return sortedClassCount[0][0]
+
+def createTree(dataSet,labels):
+    classList = [example[-1] for example in dataSet]
+    if classList.count(classList[0]) == len(classList): 
+        return classList[0]#stop splitting when all of the classes are equal
+    if len(dataSet[0]) == 1: #stop splitting when there are no more features in dataSet
+        return majorityCnt(classList)
+    bestFeat = chooseBestFeatureToSplit(dataSet)
+    bestFeatLabel = labels[bestFeat]
+    myTree = {bestFeatLabel:{}}
+    del(labels[bestFeat])
+    featValues = [example[bestFeat] for example in dataSet]
+    uniqueVals = set(featValues)
+    for value in uniqueVals:
+        subLabels = labels[:]       #copy all of labels, so trees don't mess up existing labels
+        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value),subLabels)
+    return myTree                            
+    
+def classify(inputTree,featLabels,testVec):
+    firstStr = list(inputTree.keys())[0]
+    secondDict = inputTree[firstStr]
+    featIndex = featLabels.index(firstStr)
+    key = testVec[featIndex]
+    valueOfFeat = secondDict[key]
+    if isinstance(valueOfFeat, dict): 
+        classLabel = classify(valueOfFeat, featLabels, testVec)
+    else: classLabel = valueOfFeat
+    return classLabel
+
+def storeTree(inputTree,filename):
+    import pickle
+    #pickle streams are binary, so open the file in 'wb' mode
+    with open(filename,'wb') as fw:
+        pickle.dump(inputTree,fw)
+    
+def grabTree(filename):
+    import pickle
+    with open(filename,'rb') as fr:   #binary mode to match storeTree
+        return pickle.load(fr)
+    
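Note on calcShannonEnt and chooseBestFeatureToSplit: the entropy and information-gain computation can be checked by hand on the five-row sample from createDataSet(). The sketch below is an illustrative restatement, not the committed code; shannon_entropy and info_gain are hypothetical names.

```python
from collections import Counter
from math import log2


def shannon_entropy(rows):
    # The class label is the last element of each row.
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def info_gain(rows, axis):
    # Entropy before the split minus the weighted entropy after it.
    base = shannon_entropy(rows)
    remainder = 0.0
    for value in {row[axis] for row in rows}:
        subset = [row for row in rows if row[axis] == value]
        remainder += len(subset) / len(rows) * shannon_entropy(subset)
    return base - remainder


data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
base_entropy = shannon_entropy(data)
gains = [info_gain(data, axis) for axis in range(2)]
best_axis = max(range(2), key=gains.__getitem__)
print(base_entropy, gains, best_axis)
```

With two 'yes' and three 'no' labels the base entropy is about 0.971 bits. Splitting on feature 0 ('no surfacing') removes more of it than splitting on feature 1 ('flippers'), which is why createTree puts 'no surfacing' at the root.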
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch03/trees.pyc
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch03/trees.pyc
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/EXTRAS/README.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/EXTRAS/README.txt
@@ -0,0 +1,8 @@
+This folder contains the code used to create the plots in the examples.  
+The code is not very difficult; however, I never meant for it to go out to readers. 
+It’s not the cleanest code, nor is it very well documented.  Most of the time I threw 
+together a dirty hack to make the plots look right, with no thought about 
+efficiency or readability.   I’m providing it as-is. If you have a question on 
+how it works or why I did something, please ask; I will be more than happy to answer 
+any questions.  
+Peter Harrington
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/EXTRAS/create2Normal.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/EXTRAS/create2Normal.py
@@ -0,0 +1,40 @@
+'''
+Created on Oct 6, 2010
+
+@author: Peter
+'''
+from numpy import *
+import matplotlib
+import matplotlib.pyplot as plt
+
+n = 1000 #number of points to create
+xcord0 = []
+ycord0 = []
+xcord1 = []
+ycord1 = []
+markers =[]
+colors =[]
+fw = open('testSet.txt','w')
+for i in range(n):
+    [r0,r1] = random.standard_normal(2)
+    myClass = random.uniform(0,1)
+    if (myClass <= 0.5):
+        fFlyer = r0 + 9.0
+        tats = 1.0*r1 + fFlyer - 9.0
+        xcord0.append(fFlyer)
+        ycord0.append(tats)
+    else:
+        fFlyer = r0 + 2.0
+        tats = r1+fFlyer - 2.0
+        xcord1.append(fFlyer)
+        ycord1.append(tats)
+    #fw.write("%f\t%f\t%d\n" % (fFlyer, tats, classLabel))
+
+fw.close()
+fig = plt.figure()
+ax = fig.add_subplot(111)
+#ax.scatter(xcord,ycord, c=colors, s=markers)
+ax.scatter(xcord0,ycord0, marker='^', s=90)
+ax.scatter(xcord1,ycord1, marker='o', s=50, c='red')
+plt.plot([0,1], label='going up')
+plt.show()
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/EXTRAS/monoDemo.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/EXTRAS/monoDemo.py
@@ -0,0 +1,24 @@
+'''
+Created on Oct 6, 2010
+Shows monotonicity of a function and the log of that function
+@author: Peter
+'''
+from numpy import *
+import matplotlib
+import matplotlib.pyplot as plt
+
+t = arange(0.0, 0.5, 0.01)
+s = sin(2*pi*t)
+logS = log(s)
+
+fig = plt.figure()
+ax = fig.add_subplot(211)
+ax.plot(t,s)
+ax.set_ylabel('f(x)')
+ax.set_xlabel('x')
+
+ax = fig.add_subplot(212)
+ax.plot(t,logS)
+ax.set_ylabel('ln(f(x))')
+ax.set_xlabel('x')
+plt.show()
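Note on monoDemo.py: the point of the two plots is that the logarithm is monotonically increasing, so f(x) and ln(f(x)) peak at the same x. That property is what lets bayes.py compare sums of log-probabilities instead of products. The numeric check below is a small sketch; it samples t from 0.01 to 0.49 to avoid log(0) at t = 0, and the variable names are illustrative.

```python
import math

# Sample points strictly inside (0, 0.5), where sin(2*pi*t) is positive.
ts = [i / 100 for i in range(1, 50)]
f_vals = [math.sin(2 * math.pi * t) for t in ts]
log_vals = [math.log(v) for v in f_vals]

# Index of the maximum of f and of ln(f).
argmax_f = max(range(len(ts)), key=f_vals.__getitem__)
argmax_log = max(range(len(ts)), key=log_vals.__getitem__)

print(ts[argmax_f], ts[argmax_log])
```

Both maxima fall on the same sample, t = 0.25, even though the values of f and ln(f) differ.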
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/bayes.py
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/bayes.py
@@ -0,0 +1,171 @@
+'''
+Created on Oct 19, 2010
+
+@author: Peter
+'''
+from numpy import *
+
+def loadDataSet():
+    postingList=[['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
+                 ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
+                 ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
+                 ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
+                 ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
+                 ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
+    classVec = [0,1,0,1,0,1]    #1 is abusive, 0 not
+    return postingList,classVec
+                 
+def createVocabList(dataSet):
+    vocabSet = set([])  #create empty set
+    for document in dataSet:
+        vocabSet = vocabSet | set(document) #union of the two sets
+    return list(vocabSet)
+
+def setOfWords2Vec(vocabList, inputSet):
+    returnVec = [0]*len(vocabList)
+    for word in inputSet:
+        if word in vocabList:
+            returnVec[vocabList.index(word)] = 1
+        else: print("the word: %s is not in my Vocabulary!" % word)
+    return returnVec
+
+def trainNB0(trainMatrix,trainCategory):
+    numTrainDocs = len(trainMatrix)
+    numWords = len(trainMatrix[0])
+    pAbusive = sum(trainCategory)/float(numTrainDocs)
+    p0Num = ones(numWords); p1Num = ones(numWords)      #change to ones() 
+    p0Denom = 2.0; p1Denom = 2.0                        #change to 2.0
+    for i in range(numTrainDocs):
+        if trainCategory[i] == 1:
+            p1Num += trainMatrix[i]
+            p1Denom += sum(trainMatrix[i])
+        else:
+            p0Num += trainMatrix[i]
+            p0Denom += sum(trainMatrix[i])
+    p1Vect = log(p1Num/p1Denom)          #change to log()
+    p0Vect = log(p0Num/p0Denom)          #change to log()
+    return p0Vect,p1Vect,pAbusive
+
+def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
+    p1 = sum(vec2Classify * p1Vec) + log(pClass1)    #element-wise mult
+    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
+    if p1 > p0:
+        return 1
+    else: 
+        return 0
+    
+def bagOfWords2VecMN(vocabList, inputSet):
+    returnVec = [0]*len(vocabList)
+    for word in inputSet:
+        if word in vocabList:
+            returnVec[vocabList.index(word)] += 1
+    return returnVec
+
+def testingNB():
+    listOPosts,listClasses = loadDataSet()
+    myVocabList = createVocabList(listOPosts)
+    trainMat=[]
+    for postinDoc in listOPosts:
+        trainMat.append(setOfWords2Vec(myVocabList, postinDoc))
+    p0V,p1V,pAb = trainNB0(array(trainMat),array(listClasses))
+    testEntry = ['love', 'my', 'dalmation']
+    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
+    print('%s classified as: %s' % (testEntry, classifyNB(thisDoc,p0V,p1V,pAb)))
+    testEntry = ['stupid', 'garbage']
+    thisDoc = array(setOfWords2Vec(myVocabList, testEntry))
+    print('%s classified as: %s' % (testEntry, classifyNB(thisDoc,p0V,p1V,pAb)))
+
+def textParse(bigString):    #input is a big string, output is a list of lowercase tokens
+    import re
+    listOfTokens = re.split(r'\W+', bigString)   #\W+ rather than \W*, so the pattern never matches an empty string
+    return [tok.lower() for tok in listOfTokens if len(tok) > 2] 
+    
+def spamTest():
+    docList=[]; classList = []; fullText =[]
+    for i in range(1,26):
+        wordList = textParse(open('email/spam/%d.txt' % i).read())
+        docList.append(wordList)
+        fullText.extend(wordList)
+        classList.append(1)
+        wordList = textParse(open('email/ham/%d.txt' % i).read())
+        docList.append(wordList)
+        fullText.extend(wordList)
+        classList.append(0)
+    vocabList = createVocabList(docList)#create vocabulary
+    trainingSet = list(range(50)); testSet=[]     #a list, so held-out indices can be deleted below
+    for i in range(10):
+        randIndex = int(random.uniform(0,len(trainingSet)))
+        testSet.append(trainingSet[randIndex])
+        del(trainingSet[randIndex])  
+    trainMat=[]; trainClasses = []
+    for docIndex in trainingSet:#train the classifier (get probs) trainNB0
+        trainMat.append(bagOfWords2VecMN(vocabList, docList[docIndex]))
+        trainClasses.append(classList[docIndex])
+    p0V,p1V,pSpam = trainNB0(array(trainMat),array(trainClasses))
+    errorCount = 0
+    for docIndex in testSet:        #classify the remaining items
+        wordVector = bagOfWords2VecMN(vocabList, docList[docIndex])
+        if classifyNB(array(wordVector),p0V,p1V,pSpam) != classList[docIndex]:
+            errorCount += 1
+            print("classification error %s" % (docList[docIndex],))
+    print('the error rate is: %s' % (float(errorCount)/len(testSet)))
+    #return vocabList,fullText
+
+def calcMostFreq(vocabList,fullText):
+    import operator
+    freqDict = {}
+    for token in vocabList:
+        freqDict[token]=fullText.count(token)
+    sortedFreq = sorted(freqDict.items(), key=operator.itemgetter(1), reverse=True)
+    return sortedFreq[:30]       
+
+def localWords(feed1,feed0):
+    import feedparser
+    docList=[]; classList = []; fullText =[]
+    minLen = min(len(feed1['entries']),len(feed0['entries']))
+    for i in range(minLen):
+        wordList = textParse(feed1['entries'][i]['summary'])
+        docList.append(wordList)
+        fullText.extend(wordList)
+        classList.append(1) #NY is class 1
+        wordList = textParse(feed0['entries'][i]['summary'])
+        docList.append(wordList)
+        fullText.extend(wordList)
+        classList.append(0)
+    vocabList = createVocabList(docList)#create vocabulary
+    top30Words = calcMostFreq(vocabList,fullText)   #remove top 30 words
+    for pairW in top30Words:
+        if pairW[0] in vocabList: vocabList.remove(pairW[0])
+    trainingSet = list(range(2*minLen)); testSet=[]     #a list, so held-out indices can be deleted below
+    for i in range(20):
+        randIndex = int(random.uniform(0,len(trainingSet)))
+        testSet.append(trainingSet[randIndex])
+        del(trainingSet[randIndex])  
+    trainMat=[]; trainClasses = []
+    for docIndex in trainingSet:#train the classifier (get probs) trainNB0
+        trainMat.append(bagOfWords2VecMN(vocabList, docList[docIndex]))
+        trainClasses.append(classList[docIndex])
+    p0V,p1V,pSpam = trainNB0(array(trainMat),array(trainClasses))
+    errorCount = 0
+    for docIndex in testSet:        #classify the remaining items
+        wordVector = bagOfWords2VecMN(vocabList, docList[docIndex])
+        if classifyNB(array(wordVector),p0V,p1V,pSpam) != classList[docIndex]:
+            errorCount += 1
+    print('the error rate is: %s' % (float(errorCount)/len(testSet)))
+    return vocabList,p0V,p1V
+
+def getTopWords(ny,sf):
+    import operator
+    vocabList,p0V,p1V=localWords(ny,sf)
+    topNY=[]; topSF=[]
+    for i in range(len(p0V)):
+        if p0V[i] > -6.0 : topSF.append((vocabList[i],p0V[i]))
+        if p1V[i] > -6.0 : topNY.append((vocabList[i],p1V[i]))
+    sortedSF = sorted(topSF, key=lambda pair: pair[1], reverse=True)
+    print("SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**SF**")
+    for item in sortedSF:
+        print(item[0])
+    sortedNY = sorted(topNY, key=lambda pair: pair[1], reverse=True)
+    print("NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**NY**")
+    for item in sortedNY:
+        print(item[0])
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/bayes.pyc
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/bayes.pyc
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email.zip
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email.zip
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/1.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/1.txt
@@ -0,0 +1,8 @@
+Hi Peter,
+
+With Jose out of town, do you want to
+meet once in a while to keep things
+going and do some interesting stuff?
+
+Let me know
+Eugene
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/10.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/10.txt
@@ -0,0 +1,4 @@
+Ryan Whybrew commented on your status.
+
+Ryan wrote:
+"turd ferguson or butt horn."
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/11.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/11.txt
@@ -0,0 +1,8 @@
+Arvind Thirumalai commented on your status.
+
+Arvind wrote:
+""you know""
+
+
+Reply to this email to comment on this status.
+
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/12.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/12.txt
@@ -0,0 +1,11 @@
+Thanks Peter.
+
+I'll definitely check in on this. How is your book
+going? I heard chapter 1 came in and it was in 
+good shape. ;-)
+
+I hope you are doing well.
+
+Cheers,
+
+Troy
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/13.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/13.txt
@@ -0,0 +1,10 @@
+Jay Stepp commented on your status.
+
+Jay wrote:
+""to the" ???"
+
+
+Reply to this email to comment on this status.
+
+To see the comment thread, follow the link below:
+
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/14.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/14.txt
@@ -0,0 +1,10 @@
+LinkedIn
+
+Kerry Haloney requested to add you as a connection on LinkedIn:
+
+Peter,
+
+I'd like to add you to my professional network on LinkedIn.
+
+- Kerry Haloney
+ 
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/15.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/15.txt
@@ -0,0 +1,9 @@
+Hi Peter,
+ 
+The hotels are the ones that rent out the tent. They are all lined up on the hotel grounds : )) So much for being one with nature, more like being one with a couple dozen tour groups and nature.
+I have about 100M of pictures from that trip. I can go through them and get you jpgs of my favorite scenic pictures.
+ 
+Where are you and Jocelyn now? New York? Will you come to Tokyo for Chinese New Year? Perhaps to see the two of you then. I will go to Thailand for winter holiday to see my mom : )
+ 
+Take care,
+D
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/16.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/16.txt
@@ -0,0 +1 @@
+yeah I am ready.  I may not be here because Jar Jar has plane tickets to Germany for me.  
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/17.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/17.txt
@@ -0,0 +1,11 @@
+Benoit Mandelbrot 1924-2010
+
+Benoit Mandelbrot 1924-2010
+
+Wilmott Team
+
+Benoit Mandelbrot, the mathematician, the father of fractal mathematics, and advocate of more sophisticated modelling in quantitative finance, died on 14th October 2010 aged 85.
+
+Wilmott magazine has often featured Mandelbrot, his ideas, and the work of others inspired by his fundamental insights.
+
+You must be logged on to view these articles from past issues of Wilmott Magazine.
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/18.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/18.txt
@@ -0,0 +1,8 @@
+Hi Peter,
+
+    Sure thing.  Sounds good.  Let me know what time would be good for you.
+I will come prepared with some ideas and we can go from there.
+
+Regards,
+
+-Vivek.
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/19.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/19.txt
@@ -0,0 +1,10 @@
+LinkedIn
+
+Julius O requested to add you as a connection on LinkedIn:
+
+Hi Peter.
+
+Looking forward to the book!
+
+ 
+Accept 	View invitation from Julius O
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/2.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/2.txt
@@ -0,0 +1,3 @@
+Yay to you both doing fine!
+
+I'm working on an MBA in Design Strategy at CCA (top art school.)  It's a new program focusing on more of a right-brained creative and strategic approach to management.  I'm an 1/8 of the way done today!
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/20.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/20.txt
@@ -0,0 +1,5 @@
+I've thought about this and think it's possible. We should get another
+lunch. I have a car now and could come pick you up this time. Does
+this wednesday work? 11:50?
+
+Can I have a signed copy of you book?
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/21.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/21.txt
@@ -0,0 +1,6 @@
+we saw this on the way to the coast...thought u might like it
+
+hangzhou is huge, one day wasn't enough, but we got a glimpse...
+
+we went inside the china pavilion at expo, it is pretty interesting,
+each province has an exhibit...
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/22.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/22.txt
@@ -0,0 +1,7 @@
+Hi Hommies,
+
+Just got a phone call from the roofer, they will come and spaying the foaming today. it will be dusty. pls close all the doors and windows.
+Could you help me to close my bathroom window, cat window and the sliding door behind the TV?
+I don't know how can those 2 cats survive......
+
+Sorry for any inconvenience!
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/23.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/23.txt
@@ -0,0 +1,7 @@
+
+SciFinance now automatically generates GPU-enabled pricing & risk model source code that runs up to 50-300x faster than serial code using a new NVIDIA Fermi-class Tesla 20-Series GPU.
+
+SciFinance® is a derivatives pricing and risk model development tool that automatically generates C/C++ and GPU-enabled source code from concise, high-level model specifications. No parallel computing or CUDA programming expertise is required.
+
+SciFinance's automatic, GPU-enabled Monte Carlo pricing model source code generation capabilities have been significantly extended in the latest release. This includes:
+
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/24.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/24.txt
@@ -0,0 +1 @@
+Ok I will be there by 10:00 at the latest.
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/25.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/25.txt
@@ -0,0 +1,2 @@
+That is cold.  Is there going to be a retirement party?  
+Are the leaves changing color?
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/3.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/3.txt
@@ -0,0 +1,8 @@
+WHat is going on there?
+I talked to John on email.  We talked about some computer stuff that's it.
+
+I went bike riding in the rain, it was not that cold.
+
+We went to the museum in SF yesterday it was $3 to get in and they had
+free food.  At the same time was a SF Giants game, when we got done we
+had to take the train with all the Giants fans, they are 1/2 drunk.
--- a/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/4.txt
+++ b/机器学习/殷康龙/机器学习实战1源代码/Ch04/email/ham/4.txt
@@ -0,0 +1,3 @@
+Yo.  I've been working on my running website.  I'm using jquery and the jqplot plugin.  I'm not too far away from having a prototype to launch.  
+
+You used jqplot right?  If not, I think you would like it.