Posts about Machine Learning

Introduction to Bayesian Optimization

Goal

This notebook aims to introduce how Bayesian Optimization works using the bayesian-optimization module.

Bayesian Optimization is a method for estimating an unknown function when we can choose an arbitrary input $x$ and obtain the response of the function at that input. At each step, Bayesian Optimization yields the mean and a confidence interval of the function we are looking for, so you can stop early or decide to keep iterating.

This post covers a first toy example of Bayesian Optimization: we define a "black-box" function and show, step by step, how Bayesian Optimization estimates it.

Reference

Libraries

In [41]:
from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction
import numpy as np
import warnings
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline

Unknown Function

We can estimate any function here. As an example, we will use a 1-D function defined by the following equation:

$$f(x) = 3e^{-(x-3)^{2}} - e^{-(x-2)^2} + 2 e^{-(x+3)^2}$$
In [2]:
def unknown_func(x):
    return 3 * np.exp(-(x-3) **2) - np.exp(-(x-2) **2) + 2 * np.exp(-(x + 3) **2)

If we visualize the unknown function (as a reference), we get the plot below. Note that we are not supposed to know this plot, since the function is a "black box".

In [4]:
x = np.linspace(-6, 6, 10000).reshape(-1, 1)
y = unknown_func(x)

plt.plot(x, y);
plt.title('1-D Unknown Function to be estimated');
plt.xlabel('X');
plt.ylabel('Response from the function');

Bayesian Optimization

First of all, we need to create a BayesianOptimization object by passing the function f we want to estimate, together with its input boundary pbounds.

In [5]:
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': (-6, 6)}, verbose=0)
optimizer
Out[5]:
<bayes_opt.bayesian_optimization.BayesianOptimization at 0x11ab55dd8>

Then, we can start to explore this function by trying different inputs.

  • init_points is the number of random initial points to start with.
  • n_iter is the number of iterations. optimizer.maximize holds its state, so whenever you execute it again, it continues from the last iteration (a minimal call is sketched below).
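
As a minimal sketch (the counts below are arbitrary), one round of exploration looks like this; optimizer.max holds the best observation found so far:

optimizer.maximize(init_points=2, n_iter=5)  # 2 random probes, then 5 guided steps
print(optimizer.max)  # best target value and the input that produced it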

Helper functions

In [26]:
def posterior(optimizer, x_obs, y_obs, grid):
    optimizer._gp.fit(x_obs, y_obs)

    mu, sigma = optimizer._gp.predict(grid, return_std=True)
    return mu, sigma

def plot_gp(optimizer, x, y, fig=None, xlim=None):
    if fig is None:
        fig = plt.figure(figsize=(16, 10))
    steps = len(optimizer.space)
    fig.suptitle(
        'Gaussian Process and Utility Function After {} Steps'.format(steps),
        fontsize=30
    )
    
    gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1]) 
    axis = plt.subplot(gs[0])
    acq = plt.subplot(gs[1])
    
    x_obs = np.array([[res["params"]["x"]] for res in optimizer.res])
    y_obs = np.array([res["target"] for res in optimizer.res])
    
    mu, sigma = posterior(optimizer, x_obs, y_obs, x)
    axis.plot(x, y, linewidth=3, label='Target')
    axis.plot(x_obs.flatten(), y_obs, 'D', markersize=8, label=u'Observations', color='r')
    axis.plot(x, mu, '--', color='k', label='Prediction')

    axis.fill(np.concatenate([x, x[::-1]]), 
              np.concatenate([mu - 1.9600 * sigma, (mu + 1.9600 * sigma)[::-1]]),
        alpha=.3, fc='C0', ec='None', label='95% confidence interval')
    if xlim is not None:
        axis.set_xlim(xlim)
    axis.set_ylim((None, None))
    axis.set_ylabel('f(x)', fontdict={'size':20})
    axis.set_xlabel('x', fontdict={'size':20})
    
    utility_function = UtilityFunction(kind="ucb", kappa=5, xi=0)
    utility = utility_function.utility(x, optimizer._gp, 0)
    acq.plot(x, utility, label='Utility Function', color='C3')
    acq.plot(x[np.argmax(utility)], np.max(utility), 'o', markersize=15, 
             label=u'Next Best Guess', markerfacecolor='gold', markeredgecolor='k', markeredgewidth=1)
    if xlim is not None:
        acq.set_xlim(xlim)
    acq.set_ylim((np.min(utility) , np.max(utility) + 0.5))
    acq.set_ylabel('Utility', fontdict={'size':20})
    acq.set_xlabel('x', fontdict={'size':20})
    
    axis.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    acq.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    return fig
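
In plot_gp, the lower panel shows the upper confidence bound (UCB) utility used to pick the next point. With kind="ucb", kappa=5, and xi=0 as above, it reduces to

$$a_{\mathrm{UCB}}(x) = \mu(x) + \kappa \, \sigma(x)$$

where $\mu$ and $\sigma$ are the posterior mean and standard deviation of the Gaussian process; the "Next Best Guess" is the $x$ that maximizes this utility.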

Visualize the iterative step

In [54]:
fig = plt.figure(figsize=(16, 10))
xlim = (-6, 6)
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': xlim}, verbose=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i in range(15):
        # probe one new point per iteration, then re-draw the GP and utility
        optimizer.maximize(init_points=0, n_iter=1, kappa=5)
        fig = plot_gp(optimizer, x, y, fig=fig, xlim=xlim)
        display(plt.gcf())
        clear_output(wait=True)

(Animated output: the Gaussian process posterior and the utility function updating over the 15 steps.)

Time Series Forecasting for Daily Births Dataset by Prophet

Goal

This post aims to introduce how to forecast the time series data for daily births dataset using Prophet.

Prophet Forecasting Model

$$ y(t) = g(t) + s(t) + h(t) + \epsilon_t $$
  • $g(t)$: a growth term, which is a trend curve
  • $s(t)$: a seasonality term, which periodically changes
  • $h(t)$: a holiday term, which indicates irregular events (given by users)
  • $\epsilon_t$: an error term
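
Since we fit the model with seasonality_mode='multiplicative' below, the seasonality and holiday terms scale the trend rather than being added to it, roughly:

$$ y(t) = g(t)\left(1 + s(t) + h(t)\right) + \epsilon_t $$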

Reference

Libraries

In [55]:
import pandas as pd
import numpy as np
import fbprophet
from fbprophet.plot import add_changepoints_to_plot
import warnings
import matplotlib.pyplot as plt
%matplotlib inline

Load the daily female births dataset

In [30]:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv"
df = pd.read_csv(url, parse_dates=['Date'], date_parser=pd.to_datetime)
df.columns = ['ds', 'y']
df.head()
Out[30]:
ds y
0 1959-01-01 35
1 1959-01-02 32
2 1959-01-03 30
3 1959-01-04 31
4 1959-01-05 44

Visualize the raw data

In [40]:
plt.plot(df['ds'], df['y']);
plt.title('Daily Female Births in 1959');

Create a Prophet instance

In [76]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    m = fbprophet.Prophet(yearly_seasonality=True, daily_seasonality=False, 
                          changepoint_range=0.9, 
                          changepoint_prior_scale=0.5,
                          seasonality_mode='multiplicative')
    m.fit(df);
In [77]:
future = m.make_future_dataframe(periods=50, freq='d')
forecast = m.predict(future)
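
The resulting forecast is a DataFrame whose columns include the point forecast yhat and its uncertainty interval:

forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()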

Visualize

By Component

In [78]:
m.plot_components(forecast);

Plot raw data with forecast

In [79]:
m.plot(forecast);

Change point detection

No change points are detected.

In [80]:
fig = m.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), m, forecast);

Gradient descent implementation from scratch

Goal

This post aims to introduce how to implement Gradient Descent from scratch.

image

Reference

Libraries

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Define the function to be explored

In [2]:
def surface_function(x):
    return np.sum(x**2)
In [63]:
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-3.0, 3.0, delta)
X, Y = np.meshgrid(x, y)
Z = np.add(X**2, Y**2)
fig, ax = plt.subplots()
CS = ax.contour(X, Y, Z)
ax.clabel(CS, inline=1, fontsize=10);

Compute gradient given a function
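
The implementation below uses the central-difference approximation: for each coordinate $x_i$,

$$\frac{\partial f}{\partial x_i} \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h}$$

where $e_i$ is the $i$-th unit vector and $h$ is a small step size (here $h = 10^{-5}$).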

In [36]:
def numerical_gradient(f, x0):
    """Central-difference approximation of the gradient of f at x0."""
    h = 1e-5
    grad = np.zeros_like(x0)

    for i in range(x0.size):
        tmp = x0[i]

        # compute f(x0 + h) along the i-th coordinate
        x0[i] = tmp + h
        fx0_p = f(x0)

        # compute f(x0 - h) along the i-th coordinate
        x0[i] = tmp - h
        fx0_m = f(x0)

        grad[i] = (fx0_p - fx0_m) / (2*h)
        x0[i] = tmp  # restore the original value

    return grad
In [37]:
numerical_gradient(surface_function, np.array([0., 1.]))
Out[37]:
array([0., 2.])

Define gradient descent function
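
Gradient descent starts from init_x and repeatedly steps against the gradient, scaled by the learning rate $\eta$ (lr below):

$$x \leftarrow x - \eta \, \nabla f(x)$$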

In [84]:
class GradientDescent:
    def __init__(self):
        self.trace = []
    
    def gradient_descent(self, f, init_x, lr=0.01, steps=200):
        x = init_x
        for i in range(steps):
            self.trace.append(list(x))
            gradient = numerical_gradient(f, x)
            x -= lr * gradient

        return x
In [85]:
init = np.array([3., 3.])
gd = GradientDescent()
gd.gradient_descent(surface_function, init)
Out[85]:
array([0.05276384, 0.05276384])

Plot the trace on the surface function

In [89]:
df_trace = pd.DataFrame(gd.trace, columns=['x', 'y']).reset_index()
df_trace.head()
Out[89]:
index x y
0 0 3.000000 3.000000
1 1 2.940000 2.940000
2 2 2.881200 2.881200
3 3 2.823576 2.823576
4 4 2.767104 2.767104
In [106]:
fig, ax = plt.subplots(figsize=(8, 5))
CS = ax.contour(X, Y, Z)
ax.clabel(CS, inline=1, fontsize=10);
df_trace.iloc[::15, :].plot(kind='scatter', x='x', y='y', c='index', cmap='Reds', marker='x', s=200, 
              ax=ax, title='Trace of Gradient Descent');

Saving Machine Learning Models by joblib

Goal

This post aims to introduce how to save the machine learning model using joblib.

Libraries

In [4]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
import joblib

Create the data

In [5]:
boston = load_boston()
X, y = boston.data, boston.target

Train a model

In [6]:
reg = LinearRegression()
reg.fit(X, y)
Out[6]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

Save the model as a pickle file

In [7]:
joblib.dump(reg, 'linear_regression.pkl')
Out[7]:
['linear_regression.pkl']
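
As a side note, joblib.dump also accepts a compress argument that trades CPU time for a smaller file; a minimal sketch (the file name here is arbitrary):

joblib.dump(reg, 'linear_regression_compressed.pkl', compress=3)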

Check the saved model

In [11]:
!ls | grep linear*pkl
linear_regression.pkl

Load the saved model

In [13]:
loaded_reg = joblib.load('linear_regression.pkl')
loaded_reg
Out[13]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

Explain the prediction for ImageNet using SHAP

Goal

This post aims to introduce how to explain the prediction for ImageNet using SHAP.

image

Reference

Libraries

In [82]:
import keras
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import requests
from skimage.segmentation import slic
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shap
import warnings

%matplotlib inline

Configuration

In [77]:
# make a color map
from matplotlib.colors import LinearSegmentedColormap
colors = []
for l in np.linspace(1, 0, 100):
    colors.append((245 / 255, 39 / 255, 87 / 255, l))
for l in np.linspace(0, 1, 100):
    colors.append((24 / 255, 196 / 255, 93 / 255, l))
cm = LinearSegmentedColormap.from_list("shap", colors)

Load pre-trained VGG16 model

In [2]:
# load model data
r = requests.get('https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json')
feature_names = r.json()
model = VGG16()
WARNING:tensorflow:From /Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467904/553467096 [==============================] - 62s 0us/step

Load an image

In [16]:
# load an image
file = "../images/apple-banana.jpg"
img = image.load_img(file, target_size=(224, 224))
img_orig = image.img_to_array(img)
plt.imshow(img);
plt.axis('off');

Segmentation

In [18]:
# Create segmentation to explain by segment, not every pixel
segments_slic = slic(img, n_segments=30, compactness=30, sigma=3)

plt.imshow(segments_slic);
plt.axis('off');
In [37]:
segments_slic
Out[37]:
array([[ 0,  0,  0, ...,  4,  4,  4],
       [ 0,  0,  0, ...,  4,  4,  4],
       [ 0,  0,  0, ...,  4,  4,  4],
       ...,
       [22, 22, 22, ..., 21, 21, 21],
       [22, 22, 22, ..., 21, 21, 21],
       [22, 22, 22, ..., 21, 21, 21]])

Utility Functions for masking and preprocessing

In [19]:
# define a function that depends on a binary mask representing if an image region is hidden
def mask_image(zs, segmentation, image, background=None):
    
    if background is None:
        background = image.mean((0, 1))
        
    # Create an empty 4D array
    out = np.zeros((zs.shape[0], 
                    image.shape[0], 
                    image.shape[1], 
                    image.shape[2]))
    
    for i in range(zs.shape[0]):
        out[i, :, :, :] = image
        for j in range(zs.shape[1]):
            if zs[i, j] == 0:
                out[i][segmentation == j, :] = background
    return out


def f(z):
    return model.predict(
        preprocess_input(mask_image(z, segments_slic, img_orig, 255)))

def fill_segmentation(values, segmentation):
    out = np.zeros(segmentation.shape)
    for i in range(len(values)):
        out[segmentation == i] = values[i]
    return out
In [39]:
masked_images = mask_image(np.zeros((1,50)), segments_slic, img_orig, 255)

plt.imshow(masked_images[0][:,:, 0]);
plt.axis('off');

Create an explainer and shap values

In [83]:
# use Kernel SHAP to explain the network's predictions
explainer = shap.KernelExplainer(f, np.zeros((1,50)))

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    shap_values = explainer.shap_values(np.ones((1,50)), nsamples=100) # evaluates VGG16 on 100 masked variants of the image

Obtain the prediction with the highest probability

In [85]:
predictions = model.predict(preprocess_input(np.expand_dims(img_orig.copy(), axis=0)))
top_preds = np.argsort(-predictions)
In [86]:
inds = top_preds[0]
pd.Series(data={feature_names[str(inds[i])][1]: predictions[0, inds[i]] for i in range(10)}).plot(kind='bar', title='Top 10 Predictions');

Explain the prediction by visualization

In the visualization below, the prediction for banana is explained well.

In [87]:
# Visualize the explanations
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(12,4))
inds = top_preds[0]
axes[0].imshow(img)
axes[0].axis('off')

max_val = np.max([np.max(np.abs(shap_values[i][:,:-1])) for i in range(len(shap_values))])
for i in range(3):
    m = fill_segmentation(shap_values[inds[i]][0], segments_slic)
    axes[i+1].set_title(feature_names[str(inds[i])][1])
    axes[i+1].imshow(np.array(img.convert('LA'))[:, :, 0], alpha=0.15)
    im = axes[i+1].imshow(m, cmap=cm, vmin=-max_val, vmax=max_val)
    axes[i+1].axis('off')
cb = fig.colorbar(im, ax=axes.ravel().tolist(), label="SHAP value", orientation="horizontal", aspect=60)
cb.outline.set_visible(False)
plt.show()

Explain Iris classification by SHAP

Goal

This post aims to introduce how to explain Iris classification by SHAP.

Reference

Libraries

In [8]:
import sklearn
from sklearn.model_selection import train_test_split
import numpy as np
import shap
import time
shap.initjs()

Load Iris Data

In [2]:
X_train, X_test, Y_train, Y_test = train_test_split(
    *shap.datasets.iris(), test_size=0.2, random_state=0)
In [3]:
# Predictor 
X_train.head()
Out[3]:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
137 6.4 3.1 5.5 1.8
84 5.4 3.0 4.5 1.5
27 5.2 3.5 1.5 0.2
127 6.1 3.0 4.9 1.8
132 6.4 2.8 5.6 2.2
In [19]:
# Label 
Y_train[:5]
Out[19]:
array([2, 1, 0, 2, 2])

Train K-nearest neighbors

In [4]:
clf = sklearn.neighbors.KNeighborsClassifier()
clf.fit(X_train, Y_train)
Out[4]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=5, p=2,
           weights='uniform')

Create an explainer

In [10]:
explainer = shap.KernelExplainer(clf.predict_proba, X_train)
Using 120 background data samples could cause slower run times. Consider using shap.kmeans(data, K) to summarize the background as K weighted samples.

Use X summarized by k-means

In [21]:
X_train_summary = shap.kmeans(X_train, 50)
explainer = shap.KernelExplainer(clf.predict_proba, X_train_summary)

Explain one test prediction

In [22]:
shap_values = explainer.shap_values(X_test.iloc[0, :])
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0, :])
/Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/shap/explainers/kernel.py:545: UserWarning: l1_reg="auto" is deprecated and in the next version (v0.29) the behavior will change from a conditional use of AIC to simply "num_features(10)"!
  "l1_reg=\"auto\" is deprecated and in the next version (v0.29) the behavior will change from a " \
Out[22]:
(The interactive force plot is rendered in the notebook by the SHAP Javascript library loaded via shap.initjs().)
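
Multiple test predictions can be explained in one call, and shap.summary_plot then aggregates the per-feature effects (a minimal sketch):

shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)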

Explain Image Classification by SHAP Deep Explainer

Goal

This post aims to introduce how to explain Image Classification (trained by PyTorch) via SHAP Deep Explainer.

SHAP is a module for making a black-box model interpretable. For example, an image classification task can be explained by a score on each pixel of the predicted image, which indicates how much that pixel contributes to the predicted probability, positively or negatively.
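
As a minimal sketch of the Deep Explainer API (here model, background, and test_images are placeholders for a trained PyTorch model and batches of image tensors):

import shap

# the background batch defines the reference distribution for the explanation
explainer = shap.DeepExplainer(model, background)

# one array of per-pixel contribution scores per output class
shap_values = explainer.shap_values(test_images)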

image

Reference