Interpretability of predictions for Boston Housing using SHAP
Goal¶
This post introduces how to interpret predictions for the Boston Housing dataset using shap.
What is SHAP?¶
SHAP is a library that makes the predictions of machine learning models interpretable: it shows which feature variables affect a predicted value and by how much. Concretely, it computes SHAP values, i.e., how much each feature variable increases or decreases the prediction for a given sample.
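As a toy illustration of this additive structure (the numbers below are made up, not from any model), a sample's prediction is the base value plus the sum of its SHAP values:
import numpy as np
# hypothetical numbers for one sample (not from a real model)
base_value = 22.5                         # average model output over the data
shap_vals = np.array([1.2, -0.8, 3.1])    # per-feature contributions
prediction = base_value + shap_vals.sum()
print(prediction)  # 26.0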
Libraries¶
import xgboost
import shap

# load the JavaScript needed to render interactive SHAP plots in a notebook
shap.initjs()
Load Boston Housing Dataset¶
X, y = shap.datasets.boston()
X[:5]
y[:5]
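A quick sanity check of the loaded data (the Boston Housing dataset has 506 samples and 13 feature columns, with y holding the median house prices):
print(X.shape)               # (506, 13)
print(X.columns.tolist())    # ['CRIM', 'ZN', ..., 'LSTAT']
print(y.shape)               # (506,)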
Train a predictor with xgboost¶
# train a gradient boosting regressor with a small learning rate
d_param = {
    "learning_rate": 0.01
}
model = xgboost.train(params=d_param,
                      dtrain=xgboost.DMatrix(X, label=y),
                      num_boost_round=100)
Create an explainer¶
# TreeExplainer computes SHAP values efficiently for tree-based models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
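Before plotting, it is worth checking the additivity property on the actual model: expected_value is the base value (roughly the average model output), and for each sample the base value plus its SHAP values should match the prediction. A minimal check:
import numpy as np
preds = model.predict(xgboost.DMatrix(X))
# base value + per-sample SHAP values should reproduce the predictions
print(np.allclose(preds, explainer.expected_value + shap_values.sum(axis=1),
                  atol=1e-3))  # expected: True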
Outcome of SHAP¶
Single prediction explainer¶
The visualization below shows the explanation for a single prediction, the one for the i-th sample.
- red: positive impacts on the prediction
- blue: negative impacts on the prediction
i = 0
shap.force_plot(explainer.expected_value, shap_values[i,:], X.iloc[i,:])
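The same explanation can also be read off numerically (a sketch): sort the i-th sample's SHAP values by magnitude to see which features push the prediction up or down the most.
import pandas as pd
contrib = pd.Series(shap_values[i, :], index=X.columns)
# largest positive and negative contributions for sample i
print(contrib.reindex(contrib.abs().sort_values(ascending=False).index).head())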
All prediction explainers¶
All of the per-sample explanations above can be plotted together in one graph, as below.
shap.summary_plot(shap_values, X, plot_type="violin")
Variable importance¶
The variable importance shown below aggregates the plot above: for each feature, it takes the mean of the absolute SHAP values over all data points.
shap.summary_plot(shap_values, X, plot_type="bar")
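The bar plot's ranking can be reproduced by hand, as a sketch of the aggregation it performs:
import numpy as np
import pandas as pd
# mean absolute SHAP value per feature, highest first
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False).head())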
Force Plot¶
Another way of visualizing SHAP values is to stack the per-sample force plots into a single plot, ordered across samples or by feature values.
shap.force_plot(explainer.expected_value, shap_values, X)
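This stacked plot is interactive JavaScript, so it renders in a notebook; outside one, it can be written to a standalone HTML file (assuming shap.save_html is available in the installed version):
# write the interactive plot to a file (save_html is assumed available here)
plot = shap.force_plot(explainer.expected_value, shap_values, X)
shap.save_html("force_plot.html", plot)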
Dependence Plot¶
This plot shows a feature's values against their SHAP values as a scatter plot, colored by a second feature that is selected automatically as the one that best separates the points, i.e., the one that interacts most strongly with the plotted feature.
# specify by the index of the features
shap.dependence_plot(ind=12, shap_values=shap_values, features=X)
# specify by the feature name
shap.dependence_plot(ind="RM", shap_values=shap_values, features=X)
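The coloring variable can also be pinned manually rather than auto-selected, via the interaction_index argument (it accepts a feature name or index); for example, using LSTAT as an assumed choice of coloring feature:
# fix the coloring variable to LSTAT instead of the automatic choice
shap.dependence_plot(ind="RM", shap_values=shap_values, features=X,
                     interaction_index="LSTAT")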