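This article walks through an example of basic K-means, a clustering algorithm, implemented in Python. It is shared here for reference; the details follow.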
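Basic K-means: Choose K initial centroids, where K is a user-specified parameter, namely the desired number of clusters. In each iteration, every point is assigned to its nearest centroid, and the set of points assigned to the same centroid forms a cluster. Then each cluster's centroid is updated from the points assigned to that cluster. The assignment and update steps repeat until the centroids no longer change appreciably.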
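To make the two steps concrete, here is a minimal sketch of a single assignment-and-update pass. The four toy points, the two starting centroids, and the helper name `squared_distance` are assumptions chosen for illustration; they are not taken from the article's data.

```python
# Toy data (hypothetical): four 2-D points and K = 2 starting centroids.
points = [(1, 1), (2, 1), (8, 9), (9, 8)]
centroids = [(0, 0), (10, 10)]


def squared_distance(p, q):
    """Squared Euclidean distance; enough for comparing which centroid is nearer."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# Assignment step: each point joins the cluster of its nearest centroid.
clusters = [[] for _ in centroids]
for p in points:
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(p, centroids[i]))
    clusters[nearest].append(p)

# Update step: each centroid moves to the mean of the points assigned to it.
new_centroids = [
    (sum(p[0] for p in members) / len(members),
     sum(p[1] for p in members) / len(members))
    for members in clusters
]

print(clusters)       # [[(1, 1), (2, 1)], [(8, 9), (9, 8)]]
print(new_centroids)  # [(1.5, 1.0), (8.5, 8.5)]
```

Repeating these two steps with `new_centroids` as the next starting point gives the full algorithm.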
# coding=utf-8
import pylab as pl

# Each line of the "points" file holds one point in the form "x#y"
points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in open("points", "r")]

# Specify three initial centroids
currentCenter1 = [20, 190]; currentCenter2 = [120, 90]; currentCenter3 = [170, 140]

pl.plot([currentCenter1[0]], [currentCenter1[1]], 'ok')
pl.plot([currentCenter2[0]], [currentCenter2[1]], 'ok')
pl.plot([currentCenter3[0]], [currentCenter3[1]], 'ok')

# Record the trajectory of each cluster's centroid across iterations
center1 = [currentCenter1]; center2 = [currentCenter2]; center3 = [currentCenter3]

# The three clusters
group1 = []; group2 = []; group3 = []

for runtime in range(50):
    group1 = []; group2 = []; group3 = []
    for eachpoint in points:
        # Compute the squared distance from the point to each of the three centroids
        distance1 = pow(abs(eachpoint[0] - currentCenter1[0]), 2) + pow(abs(eachpoint[1] - currentCenter1[1]), 2)
        distance2 = pow(abs(eachpoint[0] - currentCenter2[0]), 2) + pow(abs(eachpoint[1] - currentCenter2[1]), 2)
        distance3 = pow(abs(eachpoint[0] - currentCenter3[0]), 2) + pow(abs(eachpoint[1] - currentCenter3[1]), 2)

        # Assign the point to the cluster of its nearest centroid
        mindis = min(distance1, distance2, distance3)
        if mindis == distance1:
            group1.append(eachpoint)
        elif mindis == distance2:
            group2.append(eachpoint)
        else:
            group3.append(eachpoint)

    # After all points are assigned, update each cluster's centroid
    # (this assumes no cluster ends up empty; an empty cluster raises ZeroDivisionError)
    currentCenter1 = [sum([eachpoint[0] for eachpoint in group1]) / len(group1), sum([eachpoint[1] for eachpoint in group1]) / len(group1)]
    currentCenter2 = [sum([eachpoint[0] for eachpoint in group2]) / len(group2), sum([eachpoint[1] for eachpoint in group2]) / len(group2)]
    currentCenter3 = [sum([eachpoint[0] for eachpoint in group3]) / len(group3), sum([eachpoint[1] for eachpoint in group3]) / len(group3)]

    # Record this update of the centroids
    center1.append(currentCenter1)
    center2.append(currentCenter2)
    center3.append(currentCenter3)

# Plot all points, colored by the cluster each one belongs to
pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')

# Plot the trajectory of each cluster's centroid
for center in [center1, center2, center3]:
    pl.plot([eachcenter[0] for eachcenter in center], [eachcenter[1] for eachcenter in center], 'k')

pl.show()
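The script above always runs 50 iterations and hard-codes three clusters. As a hedged variation, not the article's own code, the sketch below accepts any number of centroids and stops once no centroid moves more than a small tolerance, which matches the stopping rule in the description. The function name `kmeans`, the parameters `max_iter` and `tol`, and the toy data are all assumptions made for illustration. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def kmeans(points, centroids, max_iter=50, tol=1e-6):
    """Basic K-means on 2-D points (illustrative sketch; names and defaults are assumed).

    Returns the final centroids and the list of clusters, one list per centroid.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                # Design choice for this sketch: an empty cluster keeps its old centroid.
                new_centroids.append(old)

        # Stop once the largest centroid movement is within the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, clusters


# Usage on hypothetical toy data with K = 2.
toy_points = [(1, 1), (2, 1), (8, 9), (9, 8)]
final_centroids, final_clusters = kmeans(toy_points, [(0, 0), (10, 10)])
print(final_centroids)  # [(1.5, 1.0), (8.5, 8.5)]
```

Keeping the old centroid for an empty cluster is only one possible policy; it avoids the division by zero that the fixed three-cluster script would hit in that situation.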
A screenshot of the output is shown below:
I hope this article is helpful for your Python programming.