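Deep Learning - IMDb Dataset - Overfitting
Continuing the earlier IMDb exercise: the hardest problem in deep learning is that a model performs well on the training data but not on data it has never seen. This is called overfitting, and much of the work in deep learning consists of tuning the model to reduce it.
Here are three approaches to try:
- Reduce the network size
This is the easiest to apply: just change the hidden-layer dimension. The fewer hidden units the network has, the more slowly it overfits (although a network that is too small will underfit).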
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Change it to:
model = models.Sequential()
model.add(layers.Dense(8, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(8, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
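To make the size reduction concrete, the sketch below counts the trainable parameters of both networks. `dense_params` and `count_params` are hypothetical helper names introduced here for illustration; they assume every Dense layer has a weight matrix plus one bias per unit, which is the Keras default.

```python
def dense_params(n_inputs, n_units):
    """Parameters of one Dense layer: weight matrix plus one bias per unit."""
    return n_inputs * n_units + n_units


def count_params(input_dim, layer_units):
    """Total parameters of a stack of Dense layers (hypothetical helper)."""
    total = 0
    n_inputs = input_dim
    for n_units in layer_units:
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units  # this layer's output feeds the next layer
    return total


original = count_params(10000, [16, 16, 1])  # the 16-unit network
smaller = count_params(10000, [8, 8, 1])     # the 8-unit network
print(original, smaller)  # 160305 80089
```

The first layer dominates the count, so halving the hidden units roughly halves the number of parameters the model can use to memorize the training set.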
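- Add weight regularization
See the book:
A common way to reduce overfitting is to force the model's weights to take only small values, which limits the model's complexity and makes the distribution of weight values more regular. This is called weight regularization, and it is implemented by adding to the network's loss function a cost associated with large weight values. This cost comes in two forms.
L1 regularization: the added cost is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).
L2 regularization: the added cost is proportional to the square of the weight coefficients (the squared L2 norm of the weights). In neural networks, L2 regularization is also called weight decay. Don't let the different names confuse you: weight decay and L2 regularization are mathematically the same.
With only the L2 penalty: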
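As a rough illustration of the two costs described above, here is a minimal NumPy sketch. The helper names `l1_penalty` and `l2_penalty`, the weight values, and the `data_loss` figure are all made up for this example; the formulas (factor times the sum of absolute values, and factor times the sum of squares) are, to my understanding, what `regularizers.l1` and `regularizers.l2` add per layer in Keras.

```python
import numpy as np


def l1_penalty(weights, factor):
    """L1 cost: factor times the sum of absolute weight values."""
    return factor * np.sum(np.abs(weights))


def l2_penalty(weights, factor):
    """L2 cost: factor times the sum of squared weight values."""
    return factor * np.sum(np.square(weights))


# Made-up weight matrix and data loss, for illustration only.
w = np.array([[1.0, -2.0], [0.5, 0.0]])
data_loss = 0.30

l1_cost = l1_penalty(w, 0.001)  # 0.001 * (1 + 2 + 0.5)
l2_cost = l2_penalty(w, 0.001)  # 0.001 * (1 + 4 + 0.25)
total_loss = data_loss + l1_cost + l2_cost
print(l1_cost, l2_cost, total_loss)
```

Because the penalty grows with the size of the weights, the optimizer is pushed toward smaller weights whenever doing so does not hurt the data loss too much.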
from keras import regularizers  # needed in addition to models and layers

# Each hidden layer adds 0.001 * sum(w ** 2) of its kernel to the loss.
model_regularized = models.Sequential()
model_regularized.add(layers.Dense(16, activation='relu',
                                   kernel_regularizer=regularizers.l2(0.001),
                                   input_shape=(10000,)))
model_regularized.add(layers.Dense(16,
                                   kernel_regularizer=regularizers.l2(0.001),
                                   activation='relu'))
model_regularized.add(layers.Dense(1, activation='sigmoid'))
With both the L1 and L2 penalties:
model_regularized = models.Sequential()
model_regularized.add(layers.Dense(16, activation='relu',
                                   kernel_regularizer=regularizers.l1_l2(l1=0.001, l2=0.001),
                                   input_shape=(10000,)))
model_regularized.add(layers.Dense(16,
                                   kernel_regularizer=regularizers.l1_l2(l1=0.001, l2=0.001),
                                   activation='relu'))
model_regularized.add(layers.Dense(1, activation='sigmoid'))
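In the left plot only L2 is applied; in the right plot both L1 and L2 are applied.
Applying both L1 and L2 makes the model overfit much more slowly. The left plot also shows that the model with L2 weight regularization is less prone to overfitting than the unregularized model (its validation loss curve rises more slowly).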
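One simple way to put a number on "overfits more slowly" is to find the epoch at which the validation loss is lowest, since the loss rises again after that point. The sketch below assumes this heuristic; `overfit_onset` is a hypothetical helper, and the two loss lists are invented placeholder values, not results from the runs described here. With a real Keras run you would pass `history.history['val_loss']` instead.

```python
def overfit_onset(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Invented placeholder curves, for illustration only (not measured results).
baseline_val_loss = [0.40, 0.31, 0.29, 0.33, 0.38, 0.45]
regularized_val_loss = [0.45, 0.38, 0.34, 0.32, 0.33, 0.35]

print(overfit_onset(baseline_val_loss))     # 3
print(overfit_onset(regularized_val_loss))  # 4
```

A later minimum, followed by a shallower rise, is what the flatter curves in the plots correspond to.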
- Add dropout regularization
- See the paper
- "ImageNet Classification with Deep Convolutional Neural Networks"
- by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton
- 4.2 Dropout
- Combining the predictions of many different models is a very successful way to reduce test errors, but it appears to be too expensive for big neural networks that already take several days to train. There is, however, a very efficient version of model combination that only costs about a factor of two during training. The recently-introduced technique, called “dropout” , consists of setting to zero the output of each hidden neuron with probability 0.5. The neurons which are “dropped out” in this way do not contribute to the forward pass and do not participate in back propagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons. At test time, we use all the neurons but multiply their outputs by 0.5, which is a reasonable approximation to taking the geometric mean of the predictive distributions produced by the exponentially-many dropout networks.
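- During training, half of the hidden-layer neurons are randomly dropped, so those neurons contribute nothing to the forward pass and take no part in backpropagation. Each time an input is presented, the network therefore samples a different architecture, and all of these architectures share weights.
- Dropout exploits this: dropping half of a hidden layer's neurons each time is equivalent to training on many different networks. This reduces the dependence between neurons, because a neuron cannot rely on a few particular other neurons (that is, the neurons it is connected to in adjacent layers).
- In Keras, adding model.add(layers.Dropout(0.5)) is enough to achieve this; the details of how the function is defined will be explained in a later post.
- Now let's add dropout layers to the IMDb model: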
model_dropout = models.Sequential()
model_dropout.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model_dropout.add(layers.Dropout(0.5))
model_dropout.add(layers.Dense(16, activation='relu'))
model_dropout.add(layers.Dropout(0.5))
model_dropout.add(layers.Dense(1, activation='sigmoid'))
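The mechanism in the paper excerpt above can be sketched in a few lines of NumPy. This is only an illustration of the scheme as the paper describes it (zero units at training time, scale outputs at test time); `dropout_train`, `dropout_test`, and the sample activations are hypothetical names and values introduced here, not Keras internals.

```python
import numpy as np


def dropout_train(activations, rate, rng):
    """Training pass: zero each unit independently with probability `rate`."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask


def dropout_test(activations, rate):
    """Test pass (paper's scheme): keep every unit, scale by the keep probability."""
    return activations * (1.0 - rate)


rng = np.random.default_rng(0)
# Made-up hidden-layer outputs for a batch of two samples.
layer_output = np.array([[0.2, 1.5, 0.7, 0.9],
                         [1.1, 0.3, 0.8, 0.4]])

train_out = dropout_train(layer_output, 0.5, rng)  # some entries become 0
test_out = dropout_test(layer_output, 0.5)         # every entry is halved
print(train_out)
print(test_out)
```

As far as I know, the Keras `Dropout` layer uses the equivalent "inverted" variant instead: it scales the surviving units up by 1 / (1 - rate) during training and leaves the outputs untouched at inference time, so no test-time scaling is needed.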
To summarize, common ways to prevent overfitting in neural networks include:
- Get more training data
- Reduce the network capacity
- Add weight regularization
- Add dropout
Complete code: dropout experiment
from keras.datasets import imdb

# Keep only the 10,000 most frequent words.
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Multi-hot encode each review into a vector of length `dimension`.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

from keras import models
from keras import layers

# Baseline model without dropout.
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

# Same architecture with a dropout layer after each hidden layer.
model_dropout = models.Sequential()
model_dropout.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model_dropout.add(layers.Dropout(0.5))
model_dropout.add(layers.Dense(16, activation='relu'))
model_dropout.add(layers.Dropout(0.5))
model_dropout.add(layers.Dense(1, activation='sigmoid'))

# Hold out the first 10,000 training samples for validation.
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model_dropout.compile(optimizer='rmsprop',
                      loss='binary_crossentropy',
                      metrics=['accuracy'])

history_dropout = model_dropout.fit(partial_x_train, partial_y_train,
                                    epochs=20, batch_size=512,
                                    validation_data=(x_val, y_val))
history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))

import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import matplotlib
matplotlib.use("TkAgg")
from matplotlib import pyplot as plt

history_dropout_dict = history_dropout.history
history_dict = history.history
dropout_val_loss_value = history_dropout_dict['val_loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(val_loss_values) + 1)

# Compare the validation loss of the two models epoch by epoch.
plt.plot(epochs, dropout_val_loss_value, 'b+', label='Dropout validation loss value')
plt.plot(epochs, val_loss_values, 'g3', label='Validation loss')
plt.title('dropout validation loss and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()