PyOD 0.8 : Examples : AutoEncoder (Commentary)
PyOD 0.8 : Examples : AutoEncoder
Complete example : examples/auto_encoder_example.py
Generating synthetic data
Generate sample data with pyod.utils.data.generate_data():
from pyod.utils.data import generate_data
contamination = 0.1 # percentage of outliers
n_train = 20000 # number of training points
n_test = 2000 # number of testing points
n_features = 300 # number of features
# Generate sample data
X_train, y_train, X_test, y_test = generate_data(
n_train=n_train,
n_test=n_test,
n_features=n_features,
contamination=contamination,
random_state=42)
Check the shape and values of X_train and y_train:
print(X_train.shape)
print(y_train.shape)
(20000, 300)
(20000,)
X_train[:10]
array([[6.43365854, 5.5091683 , 5.04469788, ..., 4.98920813, 6.08796866, 5.65703627],
       [6.98114644, 4.97019307, 7.24011768, ..., 4.13407401, 4.17437525, 7.14246591],
       [6.96879306, 5.29747338, 5.29666367, ..., 5.97531553, 6.40414268, 5.8399228 ],
       ...,
       [5.13676552, 6.62890142, 6.14075622, ..., 5.42330645, 5.68013833, 7.49193446],
       [6.09141558, 5.30143243, 7.09624952, ..., 6.32592813, 7.31717914, 7.4945297 ],
       [5.74924769, 6.76427622, 7.10854915, ..., 6.38070765, 6.23367069, 6.3011638 ]])
y_train[:10]
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
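As a quick sanity check, the label vector should contain exactly a contamination fraction of 1s. generate_data() appends the outliers after the inliers, which is why the first labels shown above are all 0:

print(y_train.mean())  # 0.1 -- fraction of outliers matches contamination
print(y_train[-5:])    # [1. 1. 1. 1. 1.] -- outliers sit at the end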
Model training
Import and initialize the pyod.models.auto_encoder.AutoEncoder detector, then fit the model.
An autoencoder (AE) is a type of neural network for learning useful data representations in an unsupervised manner. Similar to PCA, it can be used to detect outlying objects in the data by computing the reconstruction error.
Reference:
- Charu C Aggarwal. Outlier analysis. In Data mining, 75–79. Springer, 2015.
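To make "reconstruction error as the outlier score" concrete, here is a minimal sketch (illustrative only, not PyOD's internal implementation): given any trained autoencoder model that exposes a predict() method and standardized data X, each sample's score is the distance between the sample and its own reconstruction.

import numpy as np

def reconstruction_scores(model, X):
    # Reconstruct every sample and use the per-sample Euclidean error as
    # the outlier score; poorly reconstructed samples score high.
    X_hat = model.predict(X)
    return np.linalg.norm(X - X_hat, axis=1)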
from pyod.models.auto_encoder import AutoEncoder
# train AutoEncoder detector
clf_name = 'AutoEncoder'
clf = AutoEncoder(epochs=30, contamination=contamination)
clf.fit(X_train)
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 300) 90300 _________________________________________________________________ dropout (Dropout) (None, 300) 0 _________________________________________________________________ dense_1 (Dense) (None, 300) 90300 _________________________________________________________________ dropout_1 (Dropout) (None, 300) 0 _________________________________________________________________ dense_2 (Dense) (None, 64) 19264 _________________________________________________________________ dropout_2 (Dropout) (None, 64) 0 _________________________________________________________________ dense_3 (Dense) (None, 32) 2080 _________________________________________________________________ dropout_3 (Dropout) (None, 32) 0 _________________________________________________________________ dense_4 (Dense) (None, 32) 1056 _________________________________________________________________ dropout_4 (Dropout) (None, 32) 0 _________________________________________________________________ dense_5 (Dense) (None, 64) 2112 _________________________________________________________________ dropout_5 (Dropout) (None, 64) 0 _________________________________________________________________ dense_6 (Dense) (None, 300) 19500 ================================================================= Total params: 224,612 Trainable params: 224,612 Non-trainable params: 0 _________________________________________________________________ None Epoch 1/30 563/563 [==============================] - 2s 3ms/step - loss: 3.9799 - val_loss: 1.6536 Epoch 2/30 563/563 [==============================] - 1s 2ms/step - loss: 1.3378 - val_loss: 1.2611 Epoch 3/30 563/563 [==============================] - 1s 2ms/step - loss: 1.1653 - val_loss: 1.1830 Epoch 4/30 563/563 [==============================] - 1s 2ms/step - loss: 1.1163 - val_loss: 1.1421 Epoch 5/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0902 - val_loss: 1.1188 Epoch 6/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0745 - val_loss: 1.1108 Epoch 7/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0609 - val_loss: 1.0937 Epoch 8/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0519 - val_loss: 1.0851 Epoch 9/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0439 - val_loss: 1.0823 Epoch 10/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0372 - val_loss: 1.0715 Epoch 11/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0309 - val_loss: 1.0658 Epoch 12/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0247 - val_loss: 1.0612 Epoch 13/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0193 - val_loss: 1.0576 Epoch 14/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0158 - val_loss: 1.0543 Epoch 15/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0129 - val_loss: 1.0523 Epoch 16/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0103 - val_loss: 1.0495 Epoch 17/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0081 - val_loss: 1.0476 Epoch 18/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0062 - val_loss: 1.0460 Epoch 19/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0045 - val_loss: 1.0445 Epoch 20/30 563/563 
[==============================] - 1s 2ms/step - loss: 1.0031 - val_loss: 1.0433 Epoch 21/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0019 - val_loss: 1.0422 Epoch 22/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0008 - val_loss: 1.0413 Epoch 23/30 563/563 [==============================] - 1s 2ms/step - loss: 1.0001 - val_loss: 1.0405 Epoch 24/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9993 - val_loss: 1.0399 Epoch 25/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9987 - val_loss: 1.0394 Epoch 26/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9982 - val_loss: 1.0389 Epoch 27/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9978 - val_loss: 1.0386 Epoch 28/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9975 - val_loss: 1.0383 Epoch 29/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9972 - val_loss: 1.0380 Epoch 30/30 563/563 [==============================] - 1s 2ms/step - loss: 0.9969 - val_loss: 1.0378
AutoEncoder(batch_size=32, contamination=0.1, dropout_rate=0.2, epochs=30,
      hidden_activation='relu', hidden_neurons=[64, 32, 32, 64],
      l2_regularizer=0.1, loss=<function mean_squared_error at 0x...>,
      optimizer='adam', output_activation='sigmoid', preprocessing=True,
      random_state=None, validation_size=0.1, verbose=1)
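The repr above shows the defaults: a symmetric [64, 32, 32, 64] bottleneck with ReLU activations and dropout. A wider or deeper network can be specified through hidden_neurons; a sketch (the layer sizes here are illustrative, not a recommendation):

# hidden_neurons lists the hidden-layer sizes from the input side to the
# output side; the final reconstruction layer back to n_features is added
# by the detector itself, as seen in the model summary above.
clf_wide = AutoEncoder(hidden_neurons=[128, 64, 64, 128],
                       epochs=30,
                       contamination=contamination)
clf_wide.fit(X_train)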
Get the predicted labels and outlier scores of the training data:
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores
y_train_pred
array([0, 0, 0, ..., 1, 1, 1])
y_train_scores[:10]
array([7.71661709, 8.3933272 , 8.03062351, 8.3012123 , 7.29930043, 7.8202035 , 8.25040261, 7.89435037, 8.68701496, 7.54829144])
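The binary labels are obtained by thresholding the raw scores: during fit(), PyOD sets clf.threshold_ so that roughly a contamination fraction of the training samples is flagged. The relationship can be checked directly:

import numpy as np

# labels_ should equal decision_scores_ thresholded at clf.threshold_
derived = (clf.decision_scores_ > clf.threshold_).astype(int)
print(np.array_equal(clf.labels_, derived))  # True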
Prediction and evaluation
First, check the ground-truth labels:
y_test
array([0., 0., 0., ..., 1., 1., 1.])
Make predictions on the test data:
y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
y_test_scores = clf.decision_function(X_test) # outlier scores
y_test_pred
array([0, 0, 0, ..., 1, 1, 1])
y_test_scores[:10]
array([7.88748478, 7.4678574 , 7.68665623, 7.7947202 , 7.81147175, 7.44892412, 7.35455911, 7.74632114, 7.46644309, 8.08000442])
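Besides hard labels and raw scores, PyOD detectors also expose predict_proba(), which rescales the raw scores into outlier probabilities:

# Each row is [P(inlier), P(outlier)] for one test sample;
# method='linear' min-max scales against the training scores.
y_test_proba = clf.predict_proba(X_test, method='linear')
print(y_test_proba[:3])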
ROC and Precision @ Rank n
Evaluate the predictions using pyod.utils.data.evaluate_print().
from pyod.utils.data import evaluate_print
# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, y_train, y_train_scores)
print("\nOn Test Data:")
evaluate_print(clf_name, y_test, y_test_scores)
On Training Data:
AutoEncoder ROC:1.0, precision @ rank n:1.0

On Test Data:
AutoEncoder ROC:1.0, precision @ rank n:1.0
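evaluate_print() reports two metrics: ROC-AUC and precision @ rank n, i.e. the precision among the n highest-scored points, where n is the number of true outliers. The same numbers can be reproduced with the underlying utilities:

from sklearn.metrics import roc_auc_score
from pyod.utils.utility import precision_n_scores

print('ROC:', roc_auc_score(y_test, y_test_scores))
print('precision @ rank n:', precision_n_scores(y_test, y_test_scores))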
That's all.