Big data prediction is the core application of big data: it extends prediction in the traditional sense to "nowcasting". Its advantage is that it turns a very difficult prediction problem into a relatively simple description problem, which traditional small data sets cannot do. From a prediction standpoint, the results are not just simple, objective conclusions for handling day-to-day business; they can also inform business decisions.
In the past, decisions relied mainly on the 20% of data that is structured, whereas big data prediction can also draw on the other 80%, which is unstructured. Big data prediction offers more data dimensions, higher data frequency, and greater data breadth. Compared with the small-data era, the thinking behind it changes in three ways: full samples rather than sampling; efficiency rather than exactness; correlation rather than causation.
Today we will use Python to build a GUI tool that integrates the main steps of data prediction with visualization. The data used here comes from an experiment; for general use, you can simply read your own data from a file. The result looks like this:
Preparation
We use Python 3.6.5 and the following modules:
- sklearn: model training, saving and loading the model, and the overall algorithm framework.
- numpy: matrix operations on the data.
- matplotlib: visualizing how well the model fits.
- Pillow: loading images into the GUI.
- Pandas: reading the data files (the code below uses read_excel).
- Tkinter: building the GUI window.
Training the model and the training GUI window
After comparing several algorithms, we found that simple multiple linear regression from sklearn fits this data well.
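As a rough illustration of what multiple linear regression with sklearn does, the sketch below fits LinearRegression to synthetic data. The data, the coefficients, and the names X_demo, y_demo, true_coef and demo_reg are made up for this example; they are not the experiment's data or the tool's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 50 samples, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 4.0  # noise-free linear target

# Fit one coefficient per feature plus an intercept
demo_reg = LinearRegression()
demo_reg.fit(X_demo, y_demo)

# With noise-free data the fit should recover the coefficients and intercept
print(np.round(demo_reg.coef_, 3))
print(round(float(demo_reg.intercept_), 3))
# Coefficient of determination (R^2) on the training data
print(demo_reg.score(X_demo, y_demo))
```

On real measurements the fit will not be exact, so the R^2 value from score() is a quick way to judge whether a linear model is adequate.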
(1) First, read the data. A file-selection function picks the file, and the data is then loaded:
import tkinter as tk
import tkinter.filedialog
from tkinter import Tk, StringVar

import numpy as np
import pandas as pd
from PIL import Image, ImageTk


def selectPath():
    '''Let the user choose a file and store its path.'''
    # askopenfilename returns the path with forward slashes, which Python
    # also accepts on Windows, so the slashes do not need to be replaced
    path_ = tkinter.filedialog.askopenfilename()
    # Store the chosen path in the StringVar bound to the entry box
    path.set(path_)
    return path


def train():
    # Assumption: the file name is the path chosen in the GUI
    FILENAME = path.get()
    # Read columns A to I of the spreadsheet into a DataFrame
    data = pd.read_excel(FILENAME, header=0, usecols="A,B,C,D,E,F,G,H,I")
    # Convert the DataFrame to an array
    DataArray = data.values
    # The last column (years in use) is the label
    Y = DataArray[:, 8]
    # The other eight columns are the independent variables
    X = DataArray[:, 0:8]
    # Strip the "年" (year) suffix and convert each label to an integer
    for i in range(len(Y)):
        Y[i] = int(Y[i].replace("年", ""))
    X = np.array(X, dtype=float)  # numeric feature matrix
    Y = np.array(Y, dtype=float)  # numeric label vector
    # Model fitting and plotting continue in step (2)


root = Tk()
root.geometry("+500+260")
# Background image
canvas = tk.Canvas(root, width=600, height=200, bd=0, highlightthickness=0)
imgpath = '1.jpg'
img = Image.open(imgpath)
photo = ImageTk.PhotoImage(img)
# Position of the background image on the canvas
canvas.create_image(700, 400, image=photo)
canvas.pack()
path = StringVar()
# Label, entry box and buttons; create_window below places them on the
# canvas, so they do not need a separate pack() call
label1 = tk.Label(text="Target path:")
e1 = tk.Entry(textvariable=path)
bn1 = tk.Button(text="Select path", command=selectPath)
# test is the prediction callback from the prediction section; it must be
# defined before this point in the final script
bn2 = tk.Button(text="Train model", command=train)
bn3 = tk.Button(text="Predict", command=test)
# Place the widgets on top of the background image
canvas.create_window(50, 50, width=150, height=30, window=label1)
canvas.create_window(280, 50, width=300, height=30, window=e1)
canvas.create_window(510, 50, width=150, height=30, window=bn1)
canvas.create_window(50, 100, width=150, height=30, window=bn2)
canvas.create_window(510, 100, width=150, height=30, window=bn3)
root.mainloop()
The result looks like this:
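The loading code strips the "年" (year) suffix from every label before converting it to an integer. A minimal sketch of that step as a standalone helper follows; the name parse_years and the sample labels are hypothetical and only serve to show the conversion.

```python
def parse_years(value):
    """Convert a label such as "5年" to the integer 5."""
    text = str(value).strip()
    # Drop the trailing year character if it is present
    if text.endswith("年"):
        text = text[:-1]
    # int() raises ValueError for anything that is not a whole number
    return int(text)


# Hypothetical sample labels, with and without the suffix
labels = ["3年", " 12年 ", "7"]
parsed = [parse_years(v) for v in labels]
print(parsed)
```

Keeping the conversion in one function makes it easy to report which cell is malformed instead of failing somewhere inside the training loop.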
(2) Next, fit the model and visualize the result:
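# Imports needed by this part (place them at the top of the script)
from tkinter import messagebox
import matplotlib.pyplot as plt
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg, NavigationToolbar2Tk
from sklearn.linear_model import LinearRegression

    # Continuation of the training callback: X and Y are the arrays loaded in step (1)
    # Keep the fitted model in a global so the prediction callbacks can use it
    global reg
    # Fit the model
    reg = LinearRegression()
    reg.fit(X, Y)
    # Predictions on the training data
    Y_predict = reg.predict(X)
    print(Y_predict)
    # Sample indices for the horizontal axis
    x_label = list(range(len(Y)))
    # Plot
    fig, ax = plt.subplots()
    # Scatter plot of the true values
    ax.scatter(x_label, Y)
    # Scatter plot of the predicted values
    ax.scatter(x_label, Y_predict)
    # Line through the predicted values
    ax.plot(x_label, Y_predict)
    # Axis labels
    ax.set_xlabel('Model fit: predicted vs. true values')
    ax.set_ylabel('Blue: true values; orange: predicted values')
    # Show the figure in tkinter: create a canvas that belongs to root and draw fig on it
    canvas = FigureCanvasTkAgg(fig, master=root)
    canvas.draw()  # show() is deprecated, so use draw()
    canvas.get_tk_widget().pack()
    # Show matplotlib's navigation toolbar (it is not displayed by default)
    toolbar = NavigationToolbar2Tk(canvas, root)
    toolbar.update()
    # Pop-up message
    messagebox.showinfo(title='Model status', message="Model training finished!")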
The result is shown below:
Model prediction and usage
Prediction works in two ways: entering a single sample by hand, or reading samples from a file.
Manual input requires more GUI widgets. The code is as follows:
# Imports needed by this part (place them at the top of the script)
import os
from tkinter import Toplevel, Label, Button, messagebox


def test():
    # Child window
    LOVE = Toplevel(root)
    LOVE.geometry("+100+260")
    LOVE.title("Model test")  # title is a method, so call it instead of assigning to it
    # Names of the eight input features
    label = ["Rising-edge slope (V/us)", "Falling-edge slope (V/us)", "Pulse width (ns)",
             "Low-state level (mV)", "Low-level variance (mV^2) x10^-3",
             "High-state level (V)", "High-level variance (V^2)", "Signal quality factor"]
    Label(LOVE, text="1. Predict from entered parameters", font=("Microsoft YaHei", 20)).grid(row=0, column=0)
    # One caption and one entry box per feature
    entries = []
    for col, name in enumerate(label):
        Label(LOVE, text=name, font=("Microsoft YaHei", 10)).grid(row=1, column=col)
        en = tk.Entry(LOVE, font=("Microsoft YaHei", 8))
        en.grid(row=2, column=col)
        entries.append(en)
    # Blank spacer row
    Label(LOVE, text=" ", font=("Microsoft YaHei", 10)).grid(row=3, column=0)

    # Predict from the values typed into the entry boxes
    def pp():
        # float() rather than int() so that decimal inputs are accepted
        x = np.array([float(en.get()) for en in entries])
        # predict expects a 2-D array: one row per sample
        predict = reg.predict(np.array([x]))
        Label(LOVE, text="Predicted years in use: " + str(predict[0]) + " years",
              font=("Microsoft YaHei", 10)).grid(row=4, column=3)
        print(predict)

    Button(LOVE, text="Predict:", font=("Microsoft YaHei", 15), command=pp).grid(row=4, column=0)
    Label(LOVE, text="2. Predict from a file", font=("Microsoft YaHei", 20)).grid(row=5, column=0)
    path1 = StringVar()
    label1 = tk.Label(LOVE, text="Target path:", font=("Microsoft YaHei", 10))
    label1.grid(row=6, column=0)
    e1 = tk.Entry(LOVE, textvariable=path1, font=("Microsoft YaHei", 10))
    e1.grid(row=6, column=2)

        # Closing lines of ppt(), the file-prediction callback whose opening
        # lines appear in the next listing: write every prediction to two files
        n = 0
        with open("預測結果.txt", "a") as f_detail, open("result.txt", "a") as f_plain:
            for i in predict_value:
                line = str(label) + " are " + str(X[n]) + ", predicted result: " + str(i) + " years\n"
                print(line)
                f_detail.write(line)       # detailed file: features and prediction
                f_plain.write(str(i) + "\n")  # plain file: prediction only
                n += 1
        messagebox.showinfo(title='Model status', message="The predictions are saved in TXT files in the current folder!")
        # Open both files with their default application (this works on Windows)
        os.system("result.txt")
        os.system("預測結果.txt")

    Button(LOVE, text="Predict:", font=("Microsoft YaHei", 15), command=ppt).grid(row=7, column=0)
The result looks like this:
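Values typed into entry boxes arrive as strings, so a single empty or mistyped field makes the numeric conversion fail. The sketch below shows one way to validate the eight inputs before they reach the model; parse_features and its error messages are hypothetical and not part of the original tool.

```python
def parse_features(raw_values, expected=8):
    """Turn the strings typed into the entry boxes into a list of floats."""
    if len(raw_values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(raw_values)))
    features = []
    for position, raw in enumerate(raw_values, start=1):
        try:
            # float() accepts both whole numbers and decimals
            features.append(float(raw.strip()))
        except ValueError:
            # Report which field is wrong instead of a bare conversion error
            raise ValueError("field %d is not a number: %r" % (position, raw))
    return features


# Hypothetical input, as it would come back from eight Entry.get() calls
typed = ["1", "2.5", " 3 ", "4", "5", "6", "7", "8"]
print(parse_features(typed))
```

In the GUI, the callback could catch the ValueError and show its message with messagebox.showerror rather than letting the exception go unnoticed in the console.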
Selecting a file for prediction works much like reading the training data. The code is as follows:
    # Continuation of the prediction window code: choose a file and predict every row in it
    def selectPath1():
        # askopenfilename returns the path with forward slashes, which Python
        # also accepts on Windows, so the slashes do not need to be replaced
        path_ = tkinter.filedialog.askopenfilename()
        # Store the chosen path in the StringVar bound to the entry box
        path1.set(path_)
        return path1

    bn1 = tk.Button(LOVE, text="Select path", font=("Microsoft YaHei", 10), command=selectPath1)
    bn1.grid(row=6, column=6)

    def ppt():
        # Remove result files left over from a previous run
        for name in ("預測結果.txt", "result.txt"):
            try:
                os.remove(name)
            except OSError:
                pass
        # File name chosen in the GUI
        FILENAME = path1.get()
        # Show floats with three decimals instead of scientific notation
        pd.set_option('display.float_format', lambda x: '%.3f' % x)
        # Print arrays in full instead of truncating them
        np.set_printoptions(threshold=np.inf)
        # Read columns A to H of the spreadsheet into a DataFrame
        data = pd.read_excel(FILENAME, header=0, usecols="A,B,C,D,E,F,G,H")
        # Convert the DataFrame to an array
        DataArray = data.values
        # All eight columns are the independent variables
        X = DataArray[:, 0:8]
        predict_value = reg.predict(X)
        print(predict_value)
        # The loop that writes the results to the TXT files (shown in the previous listing) follows here
The result:
Because predicting from a file produces many results, they are written directly to TXT files for easy viewing.
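Writing the predictions to a text file can be sketched as follows. This is an illustration rather than the tool's own code: save_predictions, the three-decimal format and the temporary directory are assumptions made for the example.

```python
import os
import tempfile


def save_predictions(predictions, out_path):
    """Write one prediction per line, formatted to three decimals."""
    # Mode "w" overwrites an existing file, so old results need no manual cleanup
    with open(out_path, "w", encoding="utf-8") as f:
        for value in predictions:
            f.write("%.3f\n" % value)
    return out_path


# Hypothetical predictions written to a temporary directory
demo_dir = tempfile.mkdtemp()
demo_path = save_predictions([4.98765, 12.0], os.path.join(demo_dir, "result.txt"))
with open(demo_path, encoding="utf-8") as f:
    saved_lines = f.read().splitlines()
print(saved_lines)
```

Opening the file once inside a with block closes it reliably even if a write fails, and avoids reopening the file for every row.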
Original link: https://developer.51cto.com/art/202008/624349.htm?utm_source=tuicool&utm_medium=referral