
Explain Iris classification with SHAP

Goal

This post shows how to explain an Iris classifier's predictions with SHAP.

Libraries

In [8]:
import sklearn.neighbors
from sklearn.model_selection import train_test_split
import numpy as np
import shap
import time

# Load the JavaScript needed to render SHAP force plots in the notebook
shap.initjs()

Load Iris Data

In [2]:
X_train, X_test, Y_train, Y_test = train_test_split(
    *shap.datasets.iris(), test_size=0.2, random_state=0)
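
As a quick sanity check (my addition, not in the original notebook), the 80/20 split of the 150 Iris rows leaves 120 training samples and 30 test samples, which matches the background-size warning shown later:

# Not in the original post: confirm the split sizes
print(X_train.shape, X_test.shape)  # expected: (120, 4) (30, 4)
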
In [3]:
# Predictor 
X_train.head()
Out[3]:
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
137                6.4               3.1                5.5               1.8
84                 5.4               3.0                4.5               1.5
27                 5.2               3.5                1.5               0.2
127                6.1               3.0                4.9               1.8
132                6.4               2.8                5.6               2.2
In [19]:
# Label 
Y_train[:5]
Out[19]:
array([2, 1, 0, 2, 2])
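
The labels are plain integers. As an aside (not in the original post), shap.datasets.iris() is the classic scikit-learn Iris data, so the species names behind 0/1/2 can be checked directly:

# Assumption: the usual sklearn encoding 0=setosa, 1=versicolor, 2=virginica
from sklearn.datasets import load_iris
print(load_iris().target_names)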

Train a k-nearest neighbors classifier

In [4]:
clf = sklearn.neighbors.KNeighborsClassifier()
clf.fit(X_train, Y_train)
Out[4]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')
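
Before explaining the classifier, it is worth a quick check that it actually fits the held-out data; this line is my addition, not part of the original notebook:

# Mean accuracy on the test set (exact value depends on the split)
print(clf.score(X_test, Y_test))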

Create an explainer

In [10]:
explainer = shap.KernelExplainer(clf.predict_proba, X_train)
Using 120 background data samples could cause slower run times. Consider using shap.kmeans(data, K) to summarize the background as K weighted samples.

Use X summarized by k-means

In [21]:
X_train_summary = shap.kmeans(X_train, 50)
explainer = shap.KernelExplainer(clf.predict_proba, X_train_summary)
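
The time module imported earlier can illustrate why summarizing the background matters. A rough sketch of the comparison (variable names are mine; timings are machine-dependent):

# Explain one sample against the full 120-row background
full_explainer = shap.KernelExplainer(clf.predict_proba, X_train)
start = time.time()
full_explainer.shap_values(X_test.iloc[0, :])
print("full background: %.2f s" % (time.time() - start))

# Explain the same sample against the 50 weighted k-means centers
start = time.time()
explainer.shap_values(X_test.iloc[0, :])
print("k-means summary: %.2f s" % (time.time() - start))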

Explain one test prediction

In [22]:
shap_values = explainer.shap_values(X_test.iloc[0, :])
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0, :])
/Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/shap/explainers/kernel.py:545: UserWarning: l1_reg="auto" is deprecated and in the next version (v0.29) the behavior will change from a conditional use of AIC to simply "num_features(10)"!
  "l1_reg=\"auto\" is deprecated and in the next version (v0.29) the behavior will change from a " \
Out[22]:
(Interactive SHAP force plot for the first test sample; rendered in the original notebook.)
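
Beyond a single prediction, the same explainer can score the whole test set and shap.summary_plot can show global feature importance. A sketch of that follow-up step (not run in the original notebook):

# SHAP values for every test row; KernelExplainer returns one array per class
shap_values_test = explainer.shap_values(X_test)
shap.summary_plot(shap_values_test, X_test)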