Posts about Machine Learning

Introduction to Bayesian Optimization

Goal

This notebook aims to introduce how Bayesian Optimization works using the bayesian-optimization module.

Bayesian Optimization is a method for estimating an unknown function when we can choose an arbitrary input $x$ and obtain the response of the function at that input. At each step, Bayesian Optimization yields the mean and a confidence interval of the function we are looking for, so you can stop early or decide to keep iterating.

This post covers a first toy example of Bayesian Optimization: we define a "black-box" function and show, step by step, how Bayesian Optimization estimates it.

Reference

Libraries

In [41]:
from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction
import numpy as np
import warnings
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline

Unknown Function

We can estimate any function here. As an example, we will use a 1-D function defined by the following equation:

$$f(x) = 3e^{-(x-3)^{2}} - e^{-(x-2)^2} + 2 e^{-(x+3)^2}$$
In [2]:
def unknown_func(x):
    return 3 * np.exp(-(x-3) **2) - np.exp(-(x-2) **2) + 2 * np.exp(-(x + 3) **2)

If we visualize the unknown function (as a reference), we get the plot below. Note that we are not supposed to know this plot, since the function is a "black box".

In [4]:
x = np.linspace(-6, 6, 10000).reshape(-1, 1)
y = unknown_func(x)

plt.plot(x, y);
plt.title('1-D Unknown Function to be estimated');
plt.xlabel('X');
plt.ylabel('Response from the function');

Bayesian Optimization

First of all, we need to create a BayesianOptimization object by passing the function f we want to estimate, together with its input boundary pbounds.

In [5]:
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': (-6, 6)}, verbose=0)
optimizer
Out[5]:
<bayes_opt.bayesian_optimization.BayesianOptimization at 0x11ab55dd8>

Then, we can start to explore this function by trying different inputs.

  • init_points is the number of random initial points to start with.
  • n_iter is the number of iterations. optimizer.maximize holds its state, so whenever you execute it again, it continues from the last iteration (a minimal call is sketched below).
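
As a minimal sketch (the counts below are arbitrary), one round of exploration looks like this; optimizer.max holds the best observation found so far:

optimizer.maximize(init_points=2, n_iter=5)  # 2 random probes, then 5 guided steps
print(optimizer.max)  # best target value and the input that produced it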

Helper functions

In [26]:
def posterior(optimizer, x_obs, y_obs, grid):
    optimizer._gp.fit(x_obs, y_obs)

    mu, sigma = optimizer._gp.predict(grid, return_std=True)
    return mu, sigma

def plot_gp(optimizer, x, y, fig=None, xlim=None):
    if fig is None:
        fig = plt.figure(figsize=(16, 10))
    steps = len(optimizer.space)
    fig.suptitle(
        'Gaussian Process and Utility Function After {} Steps'.format(steps),
        fontsize=30
    )
    
    gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1]) 
    axis = plt.subplot(gs[0])
    acq = plt.subplot(gs[1])
    
    x_obs = np.array([[res["params"]["x"]] for res in optimizer.res])
    y_obs = np.array([res["target"] for res in optimizer.res])
    
    mu, sigma = posterior(optimizer, x_obs, y_obs, x)
    axis.plot(x, y, linewidth=3, label='Target')
    axis.plot(x_obs.flatten(), y_obs, 'D', markersize=8, label=u'Observations', color='r')
    axis.plot(x, mu, '--', color='k', label='Prediction')

    axis.fill(np.concatenate([x, x[::-1]]), 
              np.concatenate([mu - 1.9600 * sigma, (mu + 1.9600 * sigma)[::-1]]),
        alpha=.3, fc='C0', ec='None', label='95% confidence interval')
    if xlim is not None:
        axis.set_xlim(xlim)
    axis.set_ylim((None, None))
    axis.set_ylabel('f(x)', fontdict={'size':20})
    axis.set_xlabel('x', fontdict={'size':20})
    
    utility_function = UtilityFunction(kind="ucb", kappa=5, xi=0)
    utility = utility_function.utility(x, optimizer._gp, 0)
    acq.plot(x, utility, label='Utility Function', color='C3')
    acq.plot(x[np.argmax(utility)], np.max(utility), 'o', markersize=15, 
             label=u'Next Best Guess', markerfacecolor='gold', markeredgecolor='k', markeredgewidth=1)
    if xlim is not None:
        acq.set_xlim(xlim)
    acq.set_ylim((np.min(utility) , np.max(utility) + 0.5))
    acq.set_ylabel('Utility', fontdict={'size':20})
    acq.set_xlabel('x', fontdict={'size':20})
    
    axis.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    acq.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    return fig
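
In plot_gp, the lower panel shows the upper confidence bound (UCB) utility used to pick the next point. With kind="ucb", kappa=5, and xi=0 as above, it reduces to

$$a_{\mathrm{UCB}}(x) = \mu(x) + \kappa \, \sigma(x)$$

where $\mu$ and $\sigma$ are the posterior mean and standard deviation of the Gaussian process; the "Next Best Guess" is the $x$ that maximizes this utility.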

Visualize the iterative step

In [54]:
fig = plt.figure(figsize=(16, 10))
xlim = (-6, 6)
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': xlim}, verbose=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i in range(15):
        # probe one new point per iteration, then re-draw the GP and utility
        optimizer.maximize(init_points=0, n_iter=1, kappa=5)
        fig = plot_gp(optimizer, x, y, fig=fig, xlim=xlim)
        display(plt.gcf())
        clear_output(wait=True)

(Animated output: the Gaussian process posterior and the utility function updating over the 15 steps.)

Time Series Forecasting for Daily Births Dataset by Prophet

Goal

This post aims to introduce how to forecast the time series data for daily births dataset using Prophet.

Prophet Forecasting Model

$$ y(t) = g(t) + s(t) + h(t) + \epsilon_t $$
  • $g(t)$: a growth term, which is a trend curve
  • $s(t)$: a seasonality term, which periodically changes
  • $h(t)$: a holiday term, which indicates irregular events (given by users)
  • $\epsilon_t$: an error term
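
Since we fit the model with seasonality_mode='multiplicative' below, the seasonality and holiday terms scale the trend rather than being added to it, roughly:

$$ y(t) = g(t)\left(1 + s(t) + h(t)\right) + \epsilon_t $$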

Reference

Libraries

In [55]:
import pandas as pd
import numpy as np
import fbprophet
from fbprophet.plot import add_changepoints_to_plot
import warnings
import matplotlib.pyplot as plt
%matplotlib inline

Load the daily female births dataset

In [30]:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv"
df = pd.read_csv(url, parse_dates=['Date'], date_parser=pd.to_datetime)
df.columns = ['ds', 'y']
df.head()
Out[30]:
ds y
0 1959-01-01 35
1 1959-01-02 32
2 1959-01-03 30
3 1959-01-04 31
4 1959-01-05 44

Visualize the raw data

In [40]:
plt.plot(df['ds'], df['y']);
plt.title('Daily Female Births in 1959');

Create a Prophet instance

In [76]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    m = fbprophet.Prophet(yearly_seasonality=True, daily_seasonality=False, 
                          changepoint_range=0.9, 
                          changepoint_prior_scale=0.5,
                          seasonality_mode='multiplicative')
    m.fit(df);
In [77]:
future = m.make_future_dataframe(periods=50, freq='d')
forecast = m.predict(future)
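
The resulting forecast is a DataFrame whose columns include the point forecast yhat and its uncertainty interval:

forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()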

Visualize

By Component

In [78]:
m.plot_components(forecast);

Plot raw data with forecast

In [79]:
m.plot(forecast);

Change point detection

No change points are detected.

In [80]:
fig = m.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), m, forecast);

Gradient descent implementation from scratch

Goal

This post aims to introduce how to implement Gradient Descent from scratch.

image

Reference

Libraries

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Define the function to be explored

In [2]:
def surface_function(x):
    return np.sum(x**2)
In [63]:
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-3.0, 3.0, delta)
X, Y = np.meshgrid(x, y)
Z = np.add(X**2, Y**2)
fig, ax = plt.subplots()
CS = ax.contour(X, Y, Z)
ax.clabel(CS, inline=1, fontsize=10);

Compute gradient given a function
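
The implementation below uses the central-difference approximation: for each coordinate $x_i$,

$$\frac{\partial f}{\partial x_i} \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h}$$

where $e_i$ is the $i$-th unit vector and $h$ is a small step size (here $h = 10^{-5}$).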

In [36]:
def numerical_gradient(f, x0):
    """Central-difference approximation of the gradient of f at x0."""
    h = 1e-5
    grad = np.zeros_like(x0)

    for i in range(x0.size):
        tmp = x0[i]

        # compute f(x0 + h) along the i-th coordinate
        x0[i] = tmp + h
        fx0_p = f(x0)

        # compute f(x0 - h) along the i-th coordinate
        x0[i] = tmp - h
        fx0_m = f(x0)

        grad[i] = (fx0_p - fx0_m) / (2*h)
        x0[i] = tmp  # restore the original value

    return grad
In [37]:
numerical_gradient(surface_function, np.array([0., 1.]))
Out[37]:
array([0., 2.])

Define gradient descent function
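
Gradient descent starts from init_x and repeatedly steps against the gradient, scaled by the learning rate $\eta$ (lr below):

$$x \leftarrow x - \eta \, \nabla f(x)$$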

In [84]:
class GradientDescent:
    def __init__(self):
        self.trace = []
    
    def gradient_descent(self, f, init_x, lr=0.01, steps=200):
        x = init_x
        for i in range(steps):
            self.trace.append(list(x))
            gradient = numerical_gradient(f, x)
            x -= lr * gradient

        return x
In [85]:
init = np.array([3., 3.])
gd = GradientDescent()
gd.gradient_descent(surface_function, init)
Out[85]:
array([0.05276384, 0.05276384])

Plot the trace on the surface function

In [89]:
df_trace = pd.DataFrame(gd.trace, columns=['x', 'y']).reset_index()
df_trace.head()
Out[89]:
index x y
0 0 3.000000 3.000000
1 1 2.940000 2.940000
2 2 2.881200 2.881200
3 3 2.823576 2.823576
4 4 2.767104 2.767104
In [106]:
fig, ax = plt.subplots(figsize=(8, 5))
CS = ax.contour(X, Y, Z)
ax.clabel(CS, inline=1, fontsize=10);
df_trace.iloc[::15, :].plot(kind='scatter', x='x', y='y', c='index', cmap='Reds', marker='x', s=200, 
              ax=ax, title='Trace of Gradient Descent');

Saving Machine Learning Models by joblib

Goal

This post aims to introduce how to save the machine learning model using joblib.

Libraries

In [4]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
import joblib

Create the data

In [5]:
boston = load_boston()
X, y = boston.data, boston.target

Train a model

In [6]:
reg = LinearRegression()
reg.fit(X, y)
Out[6]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

Save the model as a pickle file

In [7]:
joblib.dump(reg, 'linear_regression.pkl')
Out[7]:
['linear_regression.pkl']
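
As a side note, joblib.dump also accepts a compress argument that trades CPU time for a smaller file; a minimal sketch (the file name here is arbitrary):

joblib.dump(reg, 'linear_regression_compressed.pkl', compress=3)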

Check the saved model

In [11]:
!ls | grep linear*pkl
linear_regression.pkl

Load the saved model

In [13]:
loaded_reg = joblib.load('linear_regression.pkl')
loaded_reg
Out[13]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

Explain the prediction for ImageNet using SHAP

Goal

This post aims to introduce how to explain the prediction for ImageNet using SHAP.

image

Reference

Libraries

In [82]:
import keras
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import requests
from skimage.segmentation import slic
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shap
import warnings

%matplotlib inline

Configuration

In [77]:
# make a color map
from matplotlib.colors import LinearSegmentedColormap
colors = []
for l in np.linspace(1, 0, 100):
    colors.append((245 / 255, 39 / 255, 87 / 255, l))
for l in np.linspace(0, 1, 100):
    colors.append((24 / 255, 196 / 255, 93 / 255, l))
cm = LinearSegmentedColormap.from_list("shap", colors)

Load pre-trained VGG16 model

In [2]:
# load model data
r = requests.get('https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json')
feature_names = r.json()
model = VGG16()
WARNING:tensorflow:From /Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467904/553467096 [==============================] - 62s 0us/step

Load an image

In [16]:
# load an image
file = "../images/apple-banana.jpg"
img = image.load_img(file, target_size=(224, 224))
img_orig = image.img_to_array(img)
plt.imshow(img);
plt.axis('off');

Segmentation

In [18]:
# Create segmentation to explain by segment, not every pixel
segments_slic = slic(img, n_segments=30, compactness=30, sigma=3)

plt.imshow(segments_slic);
plt.axis('off');
In [37]:
segments_slic
Out[37]:
array([[ 0,  0,  0, ...,  4,  4,  4],
       [ 0,  0,  0, ...,  4,  4,  4],
       [ 0,  0,  0, ...,  4,  4,  4],
       ...,
       [22, 22, 22, ..., 21, 21, 21],
       [22, 22, 22, ..., 21, 21, 21],
       [22, 22, 22, ..., 21, 21, 21]])

Utility Functions for masking and preprocessing

In [19]:
# define a function that depends on a binary mask representing if an image region is hidden
def mask_image(zs, segmentation, image, background=None):
    
    if background is None:
        background = image.mean((0, 1))
        
    # Create an empty 4D array
    out = np.zeros((zs.shape[0], 
                    image.shape[0], 
                    image.shape[1], 
                    image.shape[2]))
    
    for i in range(zs.shape[0]):
        out[i, :, :, :] = image
        for j in range(zs.shape[1]):
            if zs[i, j] == 0:
                out[i][segmentation == j, :] = background
    return out


def f(z):
    return model.predict(
        preprocess_input(mask_image(z, segments_slic, img_orig, 255)))

def fill_segmentation(values, segmentation):
    out = np.zeros(segmentation.shape)
    for i in range(len(values)):
        out[segmentation == i] = values[i]
    return out
In [39]:
masked_images = mask_image(np.zeros((1,50)), segments_slic, img_orig, 255)

plt.imshow(masked_images[0][:,:, 0]);
plt.axis('off');

Create an explainer and shap values

In [83]:
# use Kernel SHAP to explain the network's predictions
explainer = shap.KernelExplainer(f, np.zeros((1,50)))

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    shap_values = explainer.shap_values(np.ones((1,50)), nsamples=100) # evaluates VGG16 on 100 masked variants of the image

Obtain the prediction with the highest probability

In [85]:
predictions = model.predict(preprocess_input(np.expand_dims(img_orig.copy(), axis=0)))
top_preds = np.argsort(-predictions)
In [86]:
inds = top_preds[0]
pd.Series(data={feature_names[str(inds[i])][1]: predictions[0, inds[i]] for i in range(10)}).plot(kind='bar', title='Top 10 Predictions');

Explain the prediction by visualization

In the visualization below, the prediction for banana is explained well.

In [87]:
# Visualize the explanations
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(12,4))
inds = top_preds[0]
axes[0].imshow(img)
axes[0].axis('off')

max_val = np.max([np.max(np.abs(shap_values[i][:,:-1])) for i in range(len(shap_values))])
for i in range(3):
    m = fill_segmentation(shap_values[inds[i]][0], segments_slic)
    axes[i+1].set_title(feature_names[str(inds[i])][1])
    axes[i+1].imshow(np.array(img.convert('LA'))[:, :, 0], alpha=0.15)
    im = axes[i+1].imshow(m, cmap=cm, vmin=-max_val, vmax=max_val)
    axes[i+1].axis('off')
cb = fig.colorbar(im, ax=axes.ravel().tolist(), label="SHAP value", orientation="horizontal", aspect=60)
cb.outline.set_visible(False)
plt.show()

Explain Iris classification by SHAP

Goal

This post aims to introduce how to explain Iris classification by SHAP.

Reference

Libraries

In [8]:
import sklearn
from sklearn.model_selection import train_test_split
import numpy as np
import shap
import time
shap.initjs()

Load Iris Data

In [2]:
X_train, X_test, Y_train, Y_test = train_test_split(
    *shap.datasets.iris(), test_size=0.2, random_state=0)
In [3]:
# Predictor 
X_train.head()
Out[3]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
137 6.4 3.1 5.5 1.8
84 5.4 3.0 4.5 1.5
27 5.2 3.5 1.5 0.2
127 6.1 3.0 4.9 1.8
132 6.4 2.8 5.6 2.2
In [19]:
# Label 
Y_train[:5]
Out[19]:
array([2, 1, 0, 2, 2])

Train K-nearest neighbors

In [4]:
clf = sklearn.neighbors.KNeighborsClassifier()
clf.fit(X_train, Y_train)
Out[4]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')

Create an explainer

In [10]:
explainer = shap.KernelExplainer(clf.predict_proba, X_train)
Using 120 background data samples could cause slower run times. Consider using shap.kmeans(data, K) to summarize the background as K weighted samples.

Use X summarized by k-means

In [21]:
X_train_summary = shap.kmeans(X_train, 50)
explainer = shap.KernelExplainer(clf.predict_proba, X_train_summary)

Explain one test prediction

In [22]:
shap_values = explainer.shap_values(X_test.iloc[0, :])
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0, :])
/Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/shap/explainers/kernel.py:545: UserWarning: l1_reg="auto" is deprecated and in the next version (v0.29) the behavior will change from a conditional use of AIC to simply "num_features(10)"!
  "l1_reg=\"auto\" is deprecated and in the next version (v0.29) the behavior will change from a " \
Out[22]:
(The interactive force plot is rendered in the notebook by the SHAP Javascript library loaded via shap.initjs().)
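
Multiple test predictions can be explained in one call, and shap.summary_plot then aggregates the per-feature effects (a minimal sketch):

shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)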

Explain Image Classification by SHAP Deep Explainer

Goal

This post aims to introduce how to explain Image Classification (trained by PyTorch) via SHAP Deep Explainer.

SHAP is a module for making a black-box model interpretable. For example, an image classification task can be explained by a score on each pixel of the predicted image, which indicates how much that pixel contributes to the predicted probability, positively or negatively.
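
As a minimal sketch of the Deep Explainer API (here model, background, and test_images are placeholders for a trained PyTorch model and batches of image tensors):

import shap

# the background batch defines the reference distribution for the explanation
explainer = shap.DeepExplainer(model, background)

# one array of per-pixel contribution scores per output class
shap_values = explainer.shap_values(test_images)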

image

Reference