
1 Support Vector Machines

1.1 Example Dataset 1

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
from sklearn import svm

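Most SVM libraries automatically add the extra feature x₀ = 1 and handle the intercept term θ₀ for you, so there is no need to add them manually.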
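As a small illustration of what adding the extra feature means, the sketch below checks by hand that a linear model with a separate intercept b gives the same value as one where a constant feature x₀ = 1 is prepended and b is folded into the weight vector as θ₀. The weights and the sample are made-up numbers, not values from the exercise.

```python
# Hypothetical weights and sample, chosen only for illustration.
w = [0.5, -1.5]      # weights for the two real features
b = 2.0              # intercept (plays the role of theta_0)
x = [4.0, 1.0]       # one sample with two features

# Form 1: intercept kept separate from the feature weights.
f_separate = sum(wi * xi for wi, xi in zip(w, x)) + b

# Form 2: prepend x0 = 1 and fold the intercept into theta.
theta = [b] + w
x_aug = [1.0] + x
f_augmented = sum(ti * xi for ti, xi in zip(theta, x_aug))

print(f_separate, f_augmented)  # 2.5 2.5
```

Both forms describe the same model, which is why the data can be passed to the library without a column of ones.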

mat = loadmat("./data/ex6data1.mat")
print(mat.keys())
# dict_keys(["__header__", "__version__", "__globals__", "X", "y"])
X = mat["X"]
y = mat["y"]

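def plotData(X, y):
    """Scatter-plot the two features, colored by class label."""
    plt.figure(figsize=(8,5))
    plt.scatter(X[:,0], X[:,1], c=y.flatten(), cmap="rainbow")
    plt.xlabel("X1")
    plt.ylabel("X2")
    # no plt.legend() here: the scatter call defines no labeled artists
plotData(X, y)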

Andrew Ng Machine Learning Exercise: SVM (Support Vector Machines)

def plotBoundary(clf, X):
    """plot decision boundary"""
    x_min, x_max = X[:,0].min()*1.2, X[:,0].max()*1.1
    y_min, y_max = X[:,1].min()*1.1,X[:,1].max()*1.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500),
                         np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)

# C is passed by keyword: recent scikit-learn versions reject positional arguments
models = [svm.SVC(C=C, kernel="linear") for C in [1, 100]]
clfs = [model.fit(X, y.ravel()) for model in models]

titles = ["SVM Decision Boundary with C = {} (Example Dataset 1)".format(C) for C in [1, 100]]
for model, title in zip(clfs, titles):
    plotData(X, y)  # plotData already opens a new figure
    plotBoundary(model, X)
    plt.title(title)


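As the figures above show, when C is large the model penalizes misclassification more heavily: it is stricter, misclassifies fewer points, and the margin is narrower.

When C is small the penalty for misclassification is lighter: the model is more lenient, tolerates some misclassified points, and the margin is wider.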
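A rough way to see the effect of C is to evaluate the soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses), for two candidate boundaries and compare which one scores lower. The sketch below does this on made-up 1-D data with two hand-picked candidates; it only compares these two candidates and does not solve the SVM optimization problem.

```python
def svm_objective(w, b, C, xs, ys):
    """Soft-margin objective for 1-D inputs: 0.5 * w^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * hinge

# Made-up 1-D data: the positive point at x = 0 sits close to the negatives.
xs = [-3.0, -2.0, 0.0, 2.0, 3.0]
ys = [-1, -1, 1, 1, 1]

wide = (0.5, 0.0)    # margin edges at x = -2 and x = 2; the point at x = 0 violates the margin
narrow = (1.0, 1.0)  # margin edges at x = -2 and x = 0; no violations

for C in (0.1, 100):
    obj_wide = svm_objective(*wide, C, xs, ys)
    obj_narrow = svm_objective(*narrow, C, xs, ys)
    preferred = "wide" if obj_wide < obj_narrow else "narrow"
    print("C = {}: wide = {:.3f}, narrow = {:.3f}, preferred = {}".format(
        C, obj_wide, obj_narrow, preferred))
```

With C = 0.1 the wide-margin candidate has the lower objective even though it violates the margin for one point; with C = 100 that violation becomes too expensive and the narrow-margin candidate wins.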

1.2 SVM with Gaussian Kernels

In this part we use an SVM for non-linear classification, with a Gaussian kernel.

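To find a non-linear decision boundary with an SVM, we first need to implement the Gaussian kernel. The Gaussian kernel can be thought of as a similarity function that measures the "distance" between a pair of samples $(x^{(i)}, x^{(j)})$:

$$K_{gaussian}(x^{(i)}, x^{(j)}) = \exp\left(-\frac{\lVert x^{(i)} - x^{(j)} \rVert^2}{2\sigma^2}\right)$$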

Here we can simply use the kernel built into sklearn's svm module.
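To make the similarity-function reading concrete, the sketch below evaluates a Gaussian kernel on a few hand-picked points: identical points score 1 and distant points score close to 0. It also checks the conversion gamma = 1 / (2 * sigma^2), which is how sklearn's rbf kernel, defined as exp(-gamma * ||x - x'||^2), lines up with the sigma parameterization. The helper names and sample points are illustrative only.

```python
import math

def gauss_kernel(x1, x2, sigma):
    """Gaussian kernel in the sigma parameterization."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def rbf_kernel(x1, x2, gamma):
    """Same kernel in the gamma parameterization used by sklearn's rbf kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

sigma = 2.0
gamma = 1 / (2 * sigma ** 2)  # conversion between the two parameterizations

a = [1, 2, 1]
b = [0, 4, -1]
k_sigma = gauss_kernel(a, b, sigma)
k_gamma = rbf_kernel(a, b, gamma)

same_point = gauss_kernel(a, a, sigma)               # identical points: similarity 1
far_apart = gauss_kernel(a, [100, 100, 100], sigma)  # distant points: similarity near 0

print(k_sigma, k_gamma, same_point, far_apart)
```

A smaller sigma (larger gamma) makes the similarity fall off faster with distance, which tends to give a more wiggly decision boundary.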

1.2.1 Gaussian Kernel

def gaussKernel(x1, x2, sigma):
    return np.exp(- ((x1 - x2) ** 2).sum() / (2 * sigma ** 2))
gaussKernel(np.array([1, 2, 1]),np.array([0, 4, -1]), 2.)  # 0.32465246735834974

1.2.2 Example Dataset 2

mat = loadmat("./data/ex6data2.mat")
X2 = mat["X"]
y2 = mat["y"]

plotData(X2, y2)


sigma = 0.1
# sklearn's rbf kernel is exp(-gamma * ||x - x'||^2), so gamma = 1 / (2 * sigma^2)
gamma = np.power(sigma,-2.)/2
clf = svm.SVC(C=1, kernel="rbf", gamma=gamma)
model = clf.fit(X2, y2.flatten())
plotData(X2, y2)
plotBoundary(model, X2)


1.2.3 Example Dataset 3

mat3 = loadmat("data/ex6data3.mat")
X3, y3 = mat3["X"], mat3["y"]
Xval, yval = mat3["Xval"], mat3["yval"]
plotData(X3, y3)


Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)
sigmavalues = Cvalues
best_pair, best_score = (0, 0), 0
for C in Cvalues:
    for sigma in sigmavalues:
        gamma = np.power(sigma,-2.)/2
        model = svm.SVC(C=C,kernel="rbf",gamma=gamma)
        model.fit(X3, y3.flatten())
        this_score = model.score(Xval, yval)
        if this_score > best_score:
            best_score = this_score
            best_pair = (C, sigma)
print("best_pair={}, best_score={}".format(best_pair, best_score))
# best_pair=(1.0, 0.1), best_score=0.965

model = svm.SVC(C=1., kernel="rbf", gamma = np.power(.1, -2.)/2)
model.fit(X3, y3.flatten())
plotData(X3, y3)
plotBoundary(model, X3)


# A plotting exercise of my own, unrelated to the assignment; included as a plotting reference.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
# three linearly separable points
X = np.array([[3,3],[4,3],[1,1]])
Y = np.array([1,1,-1])
# fit the model
clf = svm.SVC(kernel="linear")
clf.fit(X, Y)
# get the separating hyperplane
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - (clf.intercept_[0]) / w[1]
# plot the parallels to the separating hyperplane that pass through the
# support vectors
b = clf.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = clf.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])
# plot the line, the points, and the nearest vectors to the plane
plt.figure(figsize=(8,5))
plt.plot(xx, yy, "k-")
plt.plot(xx, yy_down, "k--")
plt.plot(xx, yy_up, "k--")
# circle the support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=150, facecolors="none", edgecolors="k", linewidths=1.5)
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.rainbow)
plt.axis("tight")
plt.show()
print(clf.decision_function(X))


[ 1. 1.5 -1. ]
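These printed values can be checked by hand. For the three points used here, working through the margin conditions gives w = (0.5, 0.5) and b = -2 (derived by hand for this sketch, not read from the fitted model), and the decision function is f(x) = w·x + b:

```python
# Hand-derived hyperplane for the three toy points (an assumption of this sketch).
w = (0.5, 0.5)
b = -2.0
points = [(3, 3), (4, 3), (1, 1)]

# f(x) = w . x + b for each point
values = [w[0] * x1 + w[1] * x2 + b for x1, x2 in points]
print(values)  # [1.0, 1.5, -1.0]
```

The points with |f(x)| = 1, namely (3, 3) and (1, 1), lie exactly on the margin and are the support vectors; (4, 3) lies beyond it.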

2 Spam Classification

2.1 Preprocessing Emails

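In this part we build a spam classifier with an SVM. Each email must be converted into an n-dimensional feature vector, and the classifier decides whether a given email x is spam (y=1) or not spam (y=0).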

Take a look at an example from the dataset:

with open("data/emailSample1.txt", "r") as f:
    email = f.read()
    print(email)

> Anyone knows how much it costs to host a web portal ?
>
Well, it depends on how many visitors you're expecting.
This can be anywhere from less than 10 bucks a month to a couple of $100. 
You should checkout http://www.rackspace.com/ or perhaps Amazon EC2 
if youre running something big..
To unsubscribe yourself from this mailing list, send an email to:
[email protected]

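As you can see, the email contains a URL, an email address (at the end), numbers, and dollar amounts. Many emails contain these kinds of elements, but the specific content differs from one email to the next. A common way to handle emails is therefore to normalize these values, so that all URLs are treated the same and all numbers are treated the same.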

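For example, we replace every URL with the single string "httpaddr" to indicate that a URL was present, without keeping the specific URL. This usually improves the performance of a spam classifier, because spammers often randomize their URLs, so the odds of seeing any particular URL again in a new piece of spam are very small.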
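As a small illustration of why this helps, the sketch below normalizes two made-up messages that differ only in their randomized URL. The function name and the sample messages are hypothetical.

```python
import re

def normalize_urls(text):
    """Replace every http(s) URL with the placeholder token 'httpaddr'."""
    return re.sub(r"(http|https)://[^\s]*", "httpaddr", text)

# Two made-up messages that differ only in their randomized URL.
spam_a = "claim your prize at http://win-4821.example.com/x7f now"
spam_b = "claim your prize at https://prize-9930.example.org/q2k now"

print(normalize_urls(spam_a))  # claim your prize at httpaddr now
print(normalize_urls(spam_b))  # claim your prize at httpaddr now
```

After normalization both messages produce the same tokens, so the classifier can learn from the presence of a URL rather than from one specific address.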

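We can apply the following processing steps:

  1. Lower-casing: convert the entire email to lower case.
  2. Stripping HTML: remove all HTML tags and keep only the content.
  3. Normalizing URLs: replace all URLs with the string "httpaddr".
  4. Normalizing Email Addresses: replace all email addresses with "emailaddr".
  5. Normalizing Dollars: replace all dollar signs ($) with "dollar".
  6. Normalizing Numbers: replace all numbers with "number".
  7. Word Stemming: reduce every word to its stem. For example, "discount", "discounts", "discounted" and "discounting" are all replaced with "discount".
  8. Removal of non-words: remove non-words and punctuation, and collapse all white space (tabs, newlines, spaces) into a single space.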

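%matplotlib inline
import numpy as np
import pandas as pd  # used below to read vocab.txt
import matplotlib.pyplot as plt
from scipy.io import loadmat
from sklearn import svm
import re  # regular expressions for email processing
# A usable English stemming algorithm (Porter stemmer)
from stemming.porter2 import stem
# This stemmer seems closer to the code used in the assignment; results are about the same as the one above
import nltk, nltk.stem.porter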

def processEmail(email):
    """Apply every step except Word Stemming and Removal of non-words."""
    email = email.lower()
    email = re.sub(r"<[^<>]+>", " ", email)  # match "<", then anything that is not "<" or ">", up to the closing ">", i.e. <...>
    email = re.sub(r"(http|https)://[^\s]*", "httpaddr", email)  # match everything after "//" up to the first whitespace character
    email = re.sub(r"[^\s]+@[^\s]+", "emailaddr", email)
    email = re.sub(r"[$]+", "dollar", email)
    email = re.sub(r"[\d]+", "number", email)
    return email

Next comes stemming and removing non-word content.

def email2TokenList(email):
    """Preprocess the email and return a clean list of word tokens."""
    # I'll use the NLTK stemmer because it more accurately duplicates the
    # performance of the OCTAVE implementation in the assignment
    stemmer = nltk.stem.porter.PorterStemmer()
    email = processEmail(email)
    # Split the email into individual words; re.split() accepts multiple delimiters
    tokens = re.split(r"[ @$/#.\-:&*+=\[\]?!(){},'\">_<;%]", email)
    # Walk through every piece produced by the split
    tokenlist = []
    for token in tokens:
        # Remove any non-alphanumeric characters
        token = re.sub(r"[^a-zA-Z0-9]", "", token)
        # Skip empty strings, which contain no characters at all
        if not len(token): continue
        # Use the Porter stemmer to extract the word stem
        stemmed = stemmer.stem(token)
        tokenlist.append(stemmed)
    return tokenlist

2.1.1 Vocabulary List

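After preprocessing the emails, we have a list of processed words. The next step is to choose which words we want to use in the classifier and which ones to leave out.

We have a vocabulary list, vocab.txt, which stores 1899 words that occur frequently in practice.

We need to work out which words from vocab.txt appear in the processed email and return their indices in vocab.txt; these are the word indices we want for training.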
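The mapping from tokens to vocabulary indices, and from indices to a binary feature vector, can be sketched on a toy example. The seven-word vocabulary and the token list below are made up for illustration; the real vocab.txt has 1899 entries.

```python
# A tiny made-up vocabulary standing in for vocab.txt.
vocab = ["anyon", "cost", "dollar", "host", "httpaddr", "number", "web"]

# Tokens as they might look after preprocessing and stemming (hypothetical).
tokens = ["anyon", "know", "how", "much", "it", "cost", "to", "host", "a", "web", "portal"]

# Membership tests against a set are O(1), which avoids rescanning the token list per word.
token_set = set(tokens)
indices = [i for i, word in enumerate(vocab) if word in token_set]

# Binary feature vector: 1 where the vocabulary word occurs in the email, else 0.
vector = [0] * len(vocab)
for i in indices:
    vector[i] = 1

print(indices)  # [0, 1, 3, 6]
print(vector)   # [1, 1, 0, 1, 0, 0, 1]
```

Words that are not in the vocabulary, such as "portal" here, are simply ignored, and the indices are 0-based as in the rest of the Python code.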

def email2VocabIndices(email, vocab):
    """Return the indices of the vocabulary words that occur in the email."""
    token = email2TokenList(email)
    index = [i for i in range(len(vocab)) if vocab[i] in token ]
    return index

2.2 Extracting Features from Emails

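def email2FeatureVector(email):
    """
    Convert an email into a word vector of length n, the size of the vocabulary.
    Positions whose word occurs in the email are set to 1, all others to 0.
    """
    df = pd.read_table("data/vocab.txt", names=["words"])
    vocab = df["words"].to_numpy()  # 1-D array of words; as_matrix() was removed from pandas
    vector = np.zeros(len(vocab))  # init vector
    vocab_indices = email2VocabIndices(email, vocab)  # indices of the words that occur
    # Set the entries at those indices to 1
    for i in vocab_indices:
        vector[i] = 1
    return vector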

vector = email2FeatureVector(email)
print("length of vector = {}\nnum of non-zero = {}".format(len(vector), int(vector.sum())))

length of vector = 1899

num of non-zero = 45

2.3 Training SVM for Spam Classification

Load the pre-extracted feature vectors and their labels, which are already split into a training set and a test set.

# Training set
mat1 = loadmat("data/spamTrain.mat")
X, y = mat1["X"], mat1["y"]
# Test set
mat2 = loadmat("data/spamTest.mat")  # loadmat was imported directly; scipy.io itself is not in scope
Xtest, ytest = mat2["Xtest"], mat2["ytest"]

clf = svm.SVC(C=0.1, kernel="linear")
clf.fit(X, y.ravel())  # sklearn expects a 1-D label array

2.4 Top Predictors for Spam

predTrain = clf.score(X, y)
predTest = clf.score(Xtest, ytest)
predTrain, predTest

(0.99825, 0.989)
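The two numbers above are the training and test accuracy. To list the top predictors themselves, one common approach for a linear SVM is to rank the vocabulary words by their learned weight; with the model above the weights would be available as clf.coef_[0]. A minimal sketch of the ranking step, using made-up weights and words rather than the fitted model:

```python
# Hypothetical weights and vocabulary; in the exercise they would come from
# clf.coef_[0] and vocab.txt respectively.
weights = [0.12, -0.40, 0.51, 0.03, 0.47, -0.08]
vocab = ["click", "thank", "our", "meet", "remov", "list"]

# Sort word indices by weight, largest first, and keep the top 3.
order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
top_words = [(vocab[i], weights[i]) for i in order[:3]]

for word, weight in top_words:
    print("{:<8} {:+.2f}".format(word, weight))
```

Words with large positive weights push the decision toward spam (y=1), while words with large negative weights push it toward non-spam.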


Original article: https://blog.csdn.net/Cowry5/article/details/80465922
