Why is the training error much higher than the test error?
A Keras model has two modes: training mode and testing mode. Regularization mechanisms such as Dropout and L1/L2 weight penalties are turned off at testing time.
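As a minimal illustration (my own sketch, not from the original article), classic TF1-era Keras exposes the learning phase through keras.backend, which lets you run the same model in either mode by hand; the toy model below is an assumption:

import numpy as np
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout

# A toy model whose forward pass differs between the two modes.
model = Sequential([Dense(64, activation='relu', input_shape=(10,)),
                    Dropout(0.5),
                    Dense(1)])

# K.function lets us feed the learning phase explicitly:
# 1 = training mode (Dropout active), 0 = test mode (Dropout off).
get_output = K.function([model.input, K.learning_phase()], [model.output])

x = np.random.rand(4, 10)
train_out = get_output([x, 1])[0]  # stochastic: Dropout is applied
test_out = get_output([x, 0])[0]   # deterministic: Dropout disabled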
Besides, the training error is the average of the errors over each batch of training data. Because the model improves over the course of an epoch, batches at the start of an epoch have larger errors than batches at the end. The test error, on the other hand, is computed once at the end of each epoch, using the model in its end-of-epoch state, when the network produces smaller errors.
[Tips] You can record the training error and test error of each epoch (for example via a callback or the History object) and plot them together. A large gap between the training-error curve and the test-error curve suggests your model is overfitting. This issue, of course, is not specific to Keras.
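A minimal sketch of such a plot (the compiled model and the arrays X_train, y_train, X_test, y_test are assumed to exist already):

import matplotlib.pyplot as plt

# fit() returns a History object whose .history dict records the
# per-epoch loss on the training data and on the validation data.
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=50, batch_size=32)

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='test loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()  # a wide gap between the two curves suggests overfitting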
The Keras Chinese documentation points out this pitfall. In my view, it comes down to how dropout is implemented in the network: a dropout layer can use either the vanilla ("forward") implementation, where activations are scaled by the keep probability at test time, or the inverted implementation, where activations are scaled by 1/(1-p) at training time. This determines whether the probability p enters the computation during training or during testing.
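To make the distinction concrete, here is a small NumPy sketch of the two variants (my own illustration, not Keras source code); in the inverted form, the test-time forward pass needs no change:

import numpy as np

def vanilla_dropout(x, p, training):
    # p is the drop probability; the scaling happens at *test* time.
    if training:
        return x * (np.random.rand(*x.shape) >= p)
    return x * (1.0 - p)  # compensate for units dropped during training

def inverted_dropout(x, p, training):
    # The scaling by 1/(1-p) happens at *training* time instead,
    # so the test-time forward pass is left untouched.
    if training:
        mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
        return x * mask
    return x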
Things to watch out for when fine-tuning with pre-trained weights:
Do not attach your own freshly randomly-initialized layers directly on top of pre-trained layers and fine-tune everything at once (see the sketch after the quoted notes below).
In order to perform fine-tuning, all layers should start with properly trained weights: for instance you should not slap a randomly initialized fully-connected network on top of a pre-trained convolutional base. This is because the large gradient updates triggered by the randomly initialized weights would wreck the learned weights in the convolutional base. In our case this is why we first train the top-level classifier, and only then start fine-tuning convolutional weights alongside it.
We choose to only fine-tune the last convolutional block rather than the entire network in order to prevent overfitting, since the entire network would have a very large entropic capacity and thus a strong tendency to overfit. The features learned by low-level convolutional blocks are more general, less abstract than those found higher up, so it is sensible to keep the first few blocks fixed (more general features) and only fine-tune the last one (more specialized features).
Fine-tuning should be done with a very slow learning rate, and typically with the SGD optimizer rather than an adaptive learning-rate optimizer such as RMSProp. This is to make sure that the magnitude of the updates stays very small, so as not to wreck the previously learned features.
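Putting these three points together, a hedged sketch of the workflow with a VGG16 base from keras.applications (the input shape, head architecture, and learning rate are illustrative assumptions, not values from the original article):

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense
from keras.optimizers import SGD

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(150, 150, 3))

# A new classifier head, randomly initialized.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)
model = Model(base.input, out)

# Step 1: freeze the entire convolutional base and train only the
# new head, so its random weights settle before any fine-tuning.
for layer in base.layers:
    layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
# model.fit(...) on your data here

# Step 2: unfreeze only the last convolutional block ('block5' in
# VGG16) and fine-tune it together with the head, using SGD with a
# very small learning rate so the updates stay gentle.
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
# model.fit(...) again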
Additional note: with keras.models.Model, predict does not return class labels
When building a network with Sequential, there are two prediction functions, predict and predict_classes: the former returns predicted class probabilities, the latter returns concrete class labels. With keras.models.Model (the functional API), however, you will find there is only a predict function and no predict_classes that returns labels, so we write a replacement ourselves. The rewrite is as follows:
import keras
from keras.models import Model

def my_predict_classes(predict_data):
    # Multi-class output: pick the index of the largest probability.
    if predict_data.shape[-1] > 1:
        return predict_data.argmax(axis=-1)
    # Binary output: threshold the single probability at 0.5.
    else:
        return (predict_data > 0.5).astype('int32')

# Network construction omitted here...
model = Model(data_input, label_output)
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Nadam(lr=0.002),
              metrics=['accuracy'])
model.summary()

y_predict = model.predict(X_test)
y_pre = my_predict_classes(y_predict)
With this, y_pre contains the concrete class labels.
Original article: https://blog.csdn.net/xiaojiajia007/article/details/73771311