Anomaly Detection by Auto Encoder (Deep Learning) in PyOD

h1ros

Jun 29, 2019, 7:21:18 AM

Goal¶

This post introduces how to detect anomalies using an autoencoder (deep learning) in PyOD, with Keras / TensorFlow as the backend.

Anomaly Detection by PCA in PyOD

h1ros

Jun 28, 2019, 7:36:59 AM

Goal¶

This post introduces how to detect anomalies using PCA in PyOD.

Libraries¶

In [31]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# PyOD
from pyod.utils.data import generate_data, get_outliers_inliers, evaluate_print
from pyod.models.pca import PCA
from pyod.utils.example import visualize

Create the data¶

In [66]:

X_train, y_train = generate_data(behaviour='new', n_features=5, train_only=True)
df_train = pd.DataFrame(X_train)
df_train['y'] = y_train

In [50]:

df_train.head()

Out[50]:

	0	1	2	3	4
0	5.475324	4.882372	5.337351	5.376340	4.104947
1	5.244566	5.626358	5.356578	4.341500	4.856838
2	4.597031	5.787669	5.959738	5.823086	6.012408
3	4.637728	4.639901	5.400144	6.074926	4.627883
4	4.639908	4.667926	6.077212	5.012901	3.718718

In [57]:

sns.scatterplot(x=0, y=1, hue='y', data=df_train);
plt.title('Ground Truth');

Train an unsupervised PCA¶

In [52]:

clf = PCA()
clf.fit(X_train)

Out[52]:

PCA(contamination=0.1, copy=True, iterated_power='auto', n_components=None,
  n_selected_components=None, random_state=None, standardization=True,
  svd_solver='auto', tol=0.0, weighted=True, whiten=False)
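To give some intuition for what the fitted detector measures, here is a minimal NumPy sketch of a PCA-style anomaly score. It is a simplified illustration and not PyOD's actual implementation: the function name `pca_anomaly_scores` and the demo data are assumptions made for this example. The idea is to project standardized samples onto the principal axes and weight each squared projection by the inverse of that axis's variance, so deviations along low-variance directions count more. When all components are kept, this equals the squared Mahalanobis distance.

```python
import numpy as np

def pca_anomaly_scores(X):
    """Simplified PCA-style outlier score (hypothetical helper, not PyOD code)."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Principal axes (rows of vt) and their variances from the SVD of Z
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s ** 2 / (len(Z) - 1)
    # Project onto the principal axes and weight by inverse variance
    proj = Z @ vt.T
    return np.sum(proj ** 2 / eigvals, axis=1)

# Assumed demo data: a tight Gaussian cluster plus uniformly scattered outliers
rng = np.random.default_rng(0)
inliers = rng.normal(loc=5.0, scale=0.3, size=(180, 5))
outliers = rng.uniform(low=-5.0, high=15.0, size=(20, 5))
X_demo = np.vstack([inliers, outliers])
scores_demo = pca_anomaly_scores(X_demo)
```

In this toy setup the last 20 rows are the scattered points, so their scores should on average be far larger than those of the first 180 rows.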

Evaluate training score¶

In [65]:

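# Binary labels (0: inlier, 1: outlier) and raw anomaly scores for the training data
y_train_pred = clf.labels_
y_train_scores = clf.decision_scores_
# Color each training point by its anomaly score
sns.scatterplot(x=0, y=1, hue=y_train_scores, data=df_train, palette='RdBu_r');
plt.title('Anomaly Scores by PCA');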
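The binary labels and the continuous scores above are linked through the `contamination` parameter (0.1 in the fitted model): as I understand PyOD's documentation, it is used to set a threshold on the scores. The following sketch shows one way such a threshold can be derived with a percentile cut. The helper `labels_from_scores` and the demo scores are hypothetical and only meant to illustrate the idea.

```python
import numpy as np

def labels_from_scores(scores, contamination=0.1):
    """Flag the top `contamination` fraction of scores as outliers (illustrative helper)."""
    scores = np.asarray(scores, dtype=float)
    # Threshold at the (1 - contamination) percentile of the scores
    threshold = np.percentile(scores, 100 * (1 - contamination))
    # 1 marks an outlier, 0 an inlier
    labels = (scores > threshold).astype(int)
    return labels, threshold

# Assumed demo scores: 90 low "inlier" scores followed by 10 high "outlier" scores
rng = np.random.default_rng(42)
scores_demo = np.concatenate([rng.normal(1.0, 0.2, 90), rng.normal(5.0, 0.5, 10)])
labels_demo, threshold_demo = labels_from_scores(scores_demo, contamination=0.1)
```

Note that a percentile cut always flags roughly the requested fraction of points, whether or not the data really contains that many anomalies, so `contamination` is best treated as a prior guess.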

Make Simulated Data For Anomaly Detection

h1ros

Jun 27, 2019, 5:43:54 AM

Goal¶

This post introduces how to create simulated data for anomaly detection using PyOD, an outlier detection package.

Libraries¶

In [58]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# PyOD
from pyod.utils.data import generate_data, get_outliers_inliers

Create an anomaly dataset¶

Create random data with 5 features¶

In [21]:

# With behaviour='new', the return order is X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = generate_data(behaviour='new', n_features=5)
df_tr = pd.DataFrame(X_train)
df_tr['y'] = y_train
df_te = pd.DataFrame(X_test)
df_te['y'] = y_test
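For readers who want to see what this kind of simulated dataset looks like without PyOD, here is a rough NumPy stand-in. It is not PyOD's `generate_data` implementation: the function `make_toy_anomaly_data`, its parameters, and the chosen distributions (a tight Gaussian cluster for inliers, a wide uniform box for outliers) are assumptions made for illustration.

```python
import numpy as np

def make_toy_anomaly_data(n_samples=200, n_features=5, contamination=0.1,
                          offset=3.0, random_state=0):
    """Toy labeled dataset: label 0 for inliers, 1 for outliers (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    n_outliers = int(round(n_samples * contamination))
    n_inliers = n_samples - n_outliers
    # Inliers: a tight Gaussian cluster centered at `offset`
    X_in = rng.normal(loc=offset, scale=0.3, size=(n_inliers, n_features))
    # Outliers: spread uniformly over a much wider range
    X_out = rng.uniform(low=-offset, high=3 * offset, size=(n_outliers, n_features))
    X = np.vstack([X_in, X_out])
    y = np.concatenate([np.zeros(n_inliers), np.ones(n_outliers)])
    return X, y

X_toy, y_toy = make_toy_anomaly_data()
# Split the samples by label into outliers and inliers
X_outliers, X_inliers = X_toy[y_toy == 1], X_toy[y_toy == 0]
```

The label-based split at the end is the kind of separation the imported `get_outliers_inliers` helper is meant for, here done by hand with boolean masks.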

In [22]:

df_tr.head()

Out[22]:

	0	1	2	3	4
0	2.392715	3.084379	2.972580	2.907177	3.155727
1	3.185049	2.789920	2.648234	3.062398	2.673828
2	3.683184	3.169288	2.973224	2.725969	2.213359
3	2.928545	2.823802	2.888037	3.109228	2.813928
4	3.112898	3.365741	2.599102	3.090721	3.391458

Visualize created anomaly data¶

In [57]:

axes = df_tr.plot(subplots=True, figsize=(16, 8), title='Simulated Anomaly Data for Training');
plt.tight_layout(rect=[0, 0.03, 1, 0.95])

In [56]:

axes = df_te.plot(subplots=True, figsize=(16, 8), title='Simulated Anomaly Data for Test');
plt.tight_layout(rect=[0, 0.03, 1, 0.95])