Anomaly Detection by Auto Encoder (Deep Learning) in PyOD

Goal

This post shows how to detect anomalies with the AutoEncoder (deep learning) model in PyOD, using Keras / TensorFlow as the backend.

Side note

To train and use AutoEncoder, I had to downgrade TensorFlow from 2.0.0beta to 1.13.1 because I got the error AttributeError: module 'tensorflow' has no attribute 'get_default_graph'. For more detail, see Stack Overflow - keras issues#12379.
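
Before downgrading, it can help to check whether the installed TensorFlow still exposes the attribute named in the error. The snippet below is a minimal sketch, not part of the original notebook; the helper name module_has_attr is hypothetical, and it only reports what it finds without changing the environment.

```python
import importlib
import importlib.util


def module_has_attr(module_name, attr_name):
    """Return True/False if the module can be imported, or None if it cannot."""
    if importlib.util.find_spec(module_name) is None:
        return None  # module is not installed in this environment
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # installed but not importable
    return hasattr(module, attr_name)


# TensorFlow 2.x moved get_default_graph under tf.compat.v1, so a Keras
# version that still calls tf.get_default_graph() fails with the
# AttributeError quoted above.
status = module_has_attr("tensorflow", "get_default_graph")
messages = {
    None: "tensorflow is not importable here",
    True: "tf.get_default_graph is available",
    False: "tf.get_default_graph is missing (the case that raises the error)",
}
print(messages[status])
```

If the last message is printed, pinning TensorFlow to a 1.x release, as described above, is one way around the problem.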

Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# PyOD
from tensorflow.keras import backend as k
from pyod.utils.data import generate_data
from pyod.models.auto_encoder import AutoEncoder
from keras.utils import plot_model
Using TensorFlow backend.
In [18]:
from keras.utils.vis_utils import model_to_dot 
from IPython.display import SVG

Create data

In [2]:
X_train, y_train = generate_data(behaviour='new', n_features=300, train_only=True)
df_train = pd.DataFrame(X_train)
df_train['y'] = y_train
In [3]:
df_train.shape
Out[3]:
(1000, 301)
In [4]:
df_train.head()
Out[4]:
0 1 2 3 4 5 6 7 8 9 ... 291 292 293 294 295 296 297 298 299 y
0 9.037948 8.992582 9.055109 9.022991 9.153327 9.016862 8.897888 8.845657 9.244953 8.774448 ... 9.075581 8.943452 9.121035 9.021359 8.914599 9.048999 8.911534 8.667534 8.961102 0.0
1 8.919313 8.815112 9.039712 8.956851 8.780776 8.913145 9.188461 9.094610 9.197732 8.900870 ... 8.907852 8.879664 8.819942 9.258486 9.002013 8.792095 8.744538 8.912435 8.825495 0.0
2 9.046329 9.153483 8.885572 8.925652 9.254798 8.993653 9.021508 9.358523 8.814769 9.136656 ... 9.147606 9.076697 9.194071 9.270966 9.262870 8.942777 8.889404 8.828650 8.920056 0.0
3 9.140573 8.917022 8.963374 9.064541 8.957604 8.839891 8.884479 8.947889 9.040254 8.912530 ... 9.102977 8.755141 8.945593 9.077484 9.041634 8.999024 8.966309 9.036304 9.021694 0.0
4 8.918198 9.094257 8.927479 8.967691 8.937096 8.653955 9.339410 9.265347 9.141084 8.921681 ... 9.057729 9.226690 8.942625 8.889547 8.960173 8.951798 8.814305 9.137144 9.019543 0.0

5 rows × 301 columns
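
To get a feel for what such a data set looks like without PyOD installed, here is a rough stand-in written with NumPy only. It is a sketch under assumptions, not PyOD's generate_data implementation: the function name make_toy_data, the Gaussian inliers centred at 9, the uniform outliers, and the 10% outlier share are all choices made for illustration.

```python
import numpy as np


def make_toy_data(n_samples=1000, n_features=300, contamination=0.1, seed=0):
    """Hypothetical stand-in for generate_data: Gaussian inliers, uniform outliers."""
    rng = np.random.default_rng(seed)
    n_outliers = int(n_samples * contamination)
    n_inliers = n_samples - n_outliers
    # Inliers: a tight cluster around 9, similar to the values printed above.
    inliers = rng.normal(loc=9.0, scale=0.15, size=(n_inliers, n_features))
    # Outliers: spread uniformly over a much wider range (an assumption).
    outliers = rng.uniform(low=-10.0, high=10.0, size=(n_outliers, n_features))
    X = np.vstack([inliers, outliers])
    # Label 0.0 marks an inlier, 1.0 an outlier, matching the 'y' column above.
    y = np.concatenate([np.zeros(n_inliers), np.ones(n_outliers)])
    return X, y


X_toy, y_toy = make_toy_data()
print(X_toy.shape, int(y_toy.sum()))
```

The result has the same shape as X_train (1000 rows, 300 features), with 100 rows labelled as outliers.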

In [26]:
df_train.iloc[:, :-1].plot(legend=None, title='Original 300 Dimension Data');
In [25]:
sns.scatterplot(x=0, y=1, data=df_train);
plt.title('Original data: first 2 of 300 dimensions');

Train an autoencoder model

Configuration

In [7]:
contamination = 0.1 
epochs = 30

Train a model

In [8]:
clf = AutoEncoder(epochs=epochs, contamination=contamination)
clf.fit(X_train)
WARNING:tensorflow:From /Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 300)               90300     
_________________________________________________________________
dropout_1 (Dropout)          (None, 300)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 300)               90300     
_________________________________________________________________
dropout_2 (Dropout)          (None, 300)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 64)                19264     
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_4 (Dense)              (None, 32)                2080      
_________________________________________________________________
dropout_4 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 32)                1056      
_________________________________________________________________
dropout_5 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 64)                2112      
_________________________________________________________________
dropout_6 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_7 (Dense)              (None, 300)               19500     
=================================================================
Total params: 224,612
Trainable params: 224,612
Non-trainable params: 0
_________________________________________________________________
None
WARNING:tensorflow:From /Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 900 samples, validate on 100 samples
Epoch 1/30
900/900 [==============================] - 1s 1ms/step - loss: 409.0245 - val_loss: 283.5180
Epoch 2/30
900/900 [==============================] - 0s 168us/step - loss: 245.5365 - val_loss: 171.2701
Epoch 3/30
900/900 [==============================] - 0s 167us/step - loss: 111.1675 - val_loss: 76.3028
Epoch 4/30
900/900 [==============================] - 0s 180us/step - loss: 65.6427 - val_loss: 59.7676
Epoch 5/30
900/900 [==============================] - 0s 165us/step - loss: 48.5084 - val_loss: 51.0341
Epoch 6/30
900/900 [==============================] - 0s 166us/step - loss: 38.3552 - val_loss: 45.0255
Epoch 7/30
900/900 [==============================] - 0s 164us/step - loss: 31.8401 - val_loss: 41.7299
Epoch 8/30
900/900 [==============================] - 0s 164us/step - loss: 27.0036 - val_loss: 38.4174
Epoch 9/30
900/900 [==============================] - 0s 161us/step - loss: 23.6374 - val_loss: 36.7532
Epoch 10/30
900/900 [==============================] - 0s 158us/step - loss: 21.0485 - val_loss: 34.0263
Epoch 11/30
900/900 [==============================] - 0s 161us/step - loss: 19.0564 - val_loss: 33.1850
Epoch 12/30
900/900 [==============================] - 0s 159us/step - loss: 17.1225 - val_loss: 33.4618
Epoch 13/30
900/900 [==============================] - 0s 159us/step - loss: 15.9848 - val_loss: 31.3626
Epoch 14/30
900/900 [==============================] - 0s 164us/step - loss: 14.8303 - val_loss: 31.7367
Epoch 15/30
900/900 [==============================] - 0s 164us/step - loss: 13.7908 - val_loss: 31.2365
Epoch 16/30
900/900 [==============================] - 0s 160us/step - loss: 12.9701 - val_loss: 31.1924
Epoch 17/30
900/900 [==============================] - 0s 159us/step - loss: 12.3456 - val_loss: 30.0029
Epoch 18/30
900/900 [==============================] - 0s 169us/step - loss: 11.6284 - val_loss: 29.8570
Epoch 19/30
900/900 [==============================] - 0s 163us/step - loss: 11.2143 - val_loss: 29.3987
Epoch 20/30
900/900 [==============================] - 0s 159us/step - loss: 10.6979 - val_loss: 29.3927
Epoch 21/30
900/900 [==============================] - 0s 179us/step - loss: 10.3287 - val_loss: 28.8785
Epoch 22/30
900/900 [==============================] - 0s 163us/step - loss: 9.9254 - val_loss: 28.8537
Epoch 23/30
900/900 [==============================] - 0s 163us/step - loss: 9.5745 - val_loss: 28.8735
Epoch 24/30
900/900 [==============================] - 0s 181us/step - loss: 9.2493 - val_loss: 28.1487
Epoch 25/30
900/900 [==============================] - 0s 164us/step - loss: 8.8706 - val_loss: 28.4695
Epoch 26/30
900/900 [==============================] - 0s 168us/step - loss: 8.5600 - val_loss: 28.3033
Epoch 27/30
900/900 [==============================] - 0s 163us/step - loss: 8.3266 - val_loss: 28.2146
Epoch 28/30
900/900 [==============================] - 0s 168us/step - loss: 8.0563 - val_loss: 27.9876
Epoch 29/30
900/900 [==============================] - 0s 157us/step - loss: 7.9045 - val_loss: 28.0378
Epoch 30/30
900/900 [==============================] - 0s 154us/step - loss: 7.6259 - val_loss: 27.9198
Out[8]:
AutoEncoder(batch_size=32, contamination=0.1, dropout_rate=0.2, epochs=30,
      hidden_activation='relu', hidden_neurons=[64, 32, 32, 64],
      l2_regularizer=0.1,
      loss=<function mean_squared_error at 0x12a56b9d8>, optimizer='adam',
      output_activation='sigmoid', preprocessing=True, random_state=None,
      validation_size=0.1, verbose=1)
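
The parameter counts in the layer summary can be checked by hand: a fully connected layer with n_in inputs and n_out outputs has n_in * n_out weights plus n_out biases. The sketch below redoes that arithmetic for the layer widths printed above; the helper name dense_params is only illustrative.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


# Layer widths read off the summary above: input, then dense_1 ... dense_7.
widths = [300, 300, 300, 64, 32, 32, 64, 300]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)

print(per_layer)  # [90300, 90300, 19264, 2080, 1056, 2112, 19500]
print(total)      # 224612
```

The total matches the "Total params: 224,612" line; the dropout layers contribute no parameters.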

Calculate anomaly scores

In [9]:
y_train_pred = clf.labels_  
y_train_scores = clf.decision_scores_ 
In [10]:
y_train_pred[:5]
Out[10]:
array([0, 0, 0, 0, 0])
In [11]:
y_train_scores[: 5]
Out[11]:
array([4.29377109, 4.35682625, 4.28639239, 4.27729013, 4.25157335])
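
An autoencoder flags a sample as anomalous when it reconstructs that sample poorly. As a minimal sketch of the idea, assuming the score is the Euclidean distance between each input row and its reconstruction, the calculation looks like this. The function name and the toy arrays are made up for illustration; the notebook's real scores come from clf.decision_scores_.

```python
import numpy as np


def reconstruction_scores(X, X_reconstructed):
    """Per-row Euclidean distance between the input and its reconstruction."""
    X = np.asarray(X, dtype=float)
    X_reconstructed = np.asarray(X_reconstructed, dtype=float)
    return np.sqrt(((X - X_reconstructed) ** 2).sum(axis=1))


# Toy input and a made-up "reconstruction": the first row is reproduced
# exactly, the other two are increasingly far off.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
X_hat = np.array([[1.0, 2.0], [3.0, 1.0], [3.0, 4.0]])

scores = reconstruction_scores(X_demo, X_hat)
print(scores)  # [0. 3. 5.]
```

The larger the distance, the worse the reconstruction and the more anomalous the row is considered.
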
In [12]:
plt.plot(y_train_scores);
plt.axhline(y=clf.threshold_, c='r', ls='dotted', label='threshold');
plt.title('Anomaly Scores with automatically calculated threshold');
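
The threshold drawn above follows from the contamination setting. To my understanding, PyOD takes the (1 - contamination) percentile of the training scores as the threshold and labels every score above it as an outlier. The following is a sketch of that rule with NumPy, not a copy of PyOD's code; threshold_and_labels and the demo scores are illustrative names and values.

```python
import numpy as np


def threshold_and_labels(scores, contamination=0.1):
    """Percentile-based threshold and the resulting 0/1 outlier labels."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)  # 1 = outlier, 0 = inlier
    return threshold, labels


demo_scores = np.arange(1.0, 11.0)  # ten scores: 1, 2, ..., 10
threshold, labels = threshold_and_labels(demo_scores, contamination=0.1)
print(threshold)
print(labels)
```

With contamination = 0.1, only the highest of the ten demo scores ends up above the threshold, so about 10% of the training samples are labelled 1.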

Visualize anomaly scores

In [13]:
sns.scatterplot(x=0, y=1, hue=y_train_scores, data=df_train, palette='RdBu_r');
plt.title('Anomaly Scores on the first 2 of 300 dimensions');
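
The scatter plot above uses raw feature columns 0 and 1, which are only two of the 300 dimensions. If a PCA view is wanted instead, one option is to project the data onto its first two principal components. The sketch below does this with NumPy's SVD; pca_2d is a hypothetical helper and the random matrix merely stands in for X_train.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 300))  # random stand-in for X_train
Z = pca_2d(X_demo)
print(Z.shape)  # (50, 2)
```

With the notebook's variables, Z = pca_2d(X_train) would give two columns that can be plotted against each other and coloured by y_train_scores.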

Training and validation loss history

In [14]:
pd.DataFrame.from_dict(clf.history_).plot(title='Error Loss History');
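
Besides plotting, the history can be summarised numerically. The sketch below assumes the history is a dict with 'loss' and 'val_loss' lists, as the DataFrame call above suggests, and uses the values from the training log in this post. The helper summarize_history is hypothetical.

```python
def summarize_history(history):
    """Return (best epoch, best val_loss, final val_loss minus final loss)."""
    val = history["val_loss"]
    best_index = min(range(len(val)), key=val.__getitem__)
    final_gap = val[-1] - history["loss"][-1]
    return best_index + 1, val[best_index], final_gap  # epochs are 1-based


# Values copied from the 30-epoch training log above.
history = {
    "loss": [409.0245, 245.5365, 111.1675, 65.6427, 48.5084, 38.3552,
             31.8401, 27.0036, 23.6374, 21.0485, 19.0564, 17.1225,
             15.9848, 14.8303, 13.7908, 12.9701, 12.3456, 11.6284,
             11.2143, 10.6979, 10.3287, 9.9254, 9.5745, 9.2493,
             8.8706, 8.5600, 8.3266, 8.0563, 7.9045, 7.6259],
    "val_loss": [283.5180, 171.2701, 76.3028, 59.7676, 51.0341, 45.0255,
                 41.7299, 38.4174, 36.7532, 34.0263, 33.1850, 33.4618,
                 31.3626, 31.7367, 31.2365, 31.1924, 30.0029, 29.8570,
                 29.3987, 29.3927, 28.8785, 28.8537, 28.8735, 28.1487,
                 28.4695, 28.3033, 28.2146, 27.9876, 28.0378, 27.9198],
}

best_epoch, best_val, final_gap = summarize_history(history)
print(best_epoch, best_val, round(final_gap, 4))
```

In this run the lowest validation loss occurs in the last epoch, while the training loss keeps falling and the validation loss flattens out near 28.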

Visualize the model

In [22]:
SVG(model_to_dot(clf.model_, show_shapes=True, show_layer_names=True, rankdir='TB').create(prog='dot', format='svg'))
Out[22]:
[Figure: model graph, top to bottom: dense_1 (300 -> 300) -> dropout_1 -> dense_2 (300 -> 300) -> dropout_2 -> dense_3 (300 -> 64) -> dropout_3 -> dense_4 (64 -> 32) -> dropout_4 -> dense_5 (32 -> 32) -> dropout_5 -> dense_6 (32 -> 64) -> dropout_6 -> dense_7 (64 -> 300)]
