Posts about Machine Learning

Introduction to Bayesian Optimization


This notebook aims to introduce how Bayesian Optimization works using bayesian-optimization module.

Bayesian Optimization is the way of estimating the unknown function where we can choose the arbitrary input $x$ and obtain the response from that function. The outcome of Bayesian Optimization is to obtain the mean and confidence interval of the function we look for by step. You could also stop earlier or decide go further iteratively.

This will cover the very first toy example of Bayesian Optimization by defining "black-box" function and show how interactively or step-by-step Bayesian Optimization will figure and estimate this "black-box" function.



In [41]:
from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction
import numpy as np
import warnings
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline

Unknown Function

We can have any function to estimate here. As an example, we will have 1-D function defined by the following equation:

$$f(x) = 3e^{-(x-3)^{2}} - e^{-(x-2)^2} + 2 e^{-(x+3)^2}$$
In [2]:
def unknown_func(x):
    return 3 * np.exp(-(x-3) **2) - np.exp(-(x-2) **2) + 2 * np.exp(-(x + 3) **2)

If we visualize the unknown function (as a reference), we can plot like below. Note that we are not supposed to know this plot since this function is "black-box"

In [4]:
x = np.linspace(-6, 6, 10000).reshape(-1, 1)
y = unknown_func(x)

plt.plot(x, y);
plt.title('1-D Unknown Function to be estimated');
plt.ylabel('Response from the function');

Bayesian Optimization

First of all, we need to create BayesianOptimization object by passing the function f you want to estimate with its input boundary as pbounds.

In [5]:
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': (-6, 6)}, verbose=0)
<bayes_opt.bayesian_optimization.BayesianOptimization at 0x11ab55dd8>

Then, we can start to explore this function by trying different inputs.

  • init_points is the number of initial points to start with.
  • n_iter is the number of iteration. This optimizer.maximize hold the state so whenever you execute it, it will continue from the last iteration.

Helper functions

In [26]:
def posterior(optimizer, x_obs, y_obs, grid):, y_obs)

    mu, sigma = optimizer._gp.predict(grid, return_std=True)
    return mu, sigma

def plot_gp(optimizer, x, y, fig=None, xlim=None):
    if fig is None:
        fig = plt.figure(figsize=(16, 10))
    steps = len(
        'Gaussian Process and Utility Function After {} Steps'.format(steps),
    gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1]) 
    axis = plt.subplot(gs[0])
    acq = plt.subplot(gs[1])
    x_obs = np.array([[res["params"]["x"]] for res in optimizer.res])
    y_obs = np.array([res["target"] for res in optimizer.res])
    mu, sigma = posterior(optimizer, x_obs, y_obs, x)
    axis.plot(x, y, linewidth=3, label='Target')
    axis.plot(x_obs.flatten(), y_obs, 'D', markersize=8, label=u'Observations', color='r')
    axis.plot(x, mu, '--', color='k', label='Prediction')

    axis.fill(np.concatenate([x, x[::-1]]), 
              np.concatenate([mu - 1.9600 * sigma, (mu + 1.9600 * sigma)[::-1]]),
        alpha=.3, fc='C0', ec='None', label='95% confidence interval')
    if xlim is not None:
    axis.set_ylim((None, None))
    axis.set_ylabel('f(x)', fontdict={'size':20})
    axis.set_xlabel('x', fontdict={'size':20})
    utility_function = UtilityFunction(kind="ucb", kappa=5, xi=0)
    utility = utility_function.utility(x, optimizer._gp, 0)
    acq.plot(x, utility, label='Utility Function', color='C3')
    acq.plot(x[np.argmax(utility)], np.max(utility), 'o', markersize=15, 
             label=u'Next Best Guess', markerfacecolor='gold', markeredgecolor='k', markeredgewidth=1)
    if xlim is not None:
    acq.set_ylim((np.min(utility) , np.max(utility) + 0.5))
    acq.set_ylabel('Utility', fontdict={'size':20})
    acq.set_xlabel('x', fontdict={'size':20})
    axis.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    acq.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    return fig

Visualize the iterative step

In [54]:
# fig = plt.figure(figsize=(16, 10))
xlim = (-6, 6)
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': xlim}, verbose=0)

with warnings.catch_warnings():
    for i in range(15):
#         optimizer.maximize(init_points=0, n_iter=1, kappa=5)
#         fig = plot_gp(optimizer, x, y, fig=fig, xlim=xlim)
#         display(plt.gcf())
#         clear_output(wait=True)

2019-10-31 16-11-38 2019-10-31 16_15_50_bayesian_optimization

Time Series Forecasting for Daily Births Dataset by Prophet


This post aims to introduce how to forecast the time series data for daily births dataset using Prophet.

Prohpeht Forecasting Model

$$ y (y) = g(t) + s(t) + h(t) + \epsilon_t $$
  • $g(t)$: a growth term, which is a trend curve
  • $s(t)$: a seasonality term, which periodically changes
  • $h(t)$: a holiday term, which indicates irregular events (given by users)
  • $\epsilon_t$: an error term



In [55]:
import pandas as pd
import numpy as np
import fbprophet
from fbprophet.plot import add_changepoints_to_plot
import warnings
import matplotlib.pyplot as plt
%matplotlib inline

Load a female daily births dataset

In [30]:
url = ""
df = pd.read_csv(url, parse_dates=['Date'], date_parser=pd.to_datetime)
df.columns = ['ds', 'y']
ds y
0 1959-01-01 35
1 1959-01-02 32
2 1959-01-03 30
3 1959-01-04 31
4 1959-01-05 44

Visualize the raw data

In [40]:
plt.plot(df['ds'], df['y']);
plt.title('Daily Female Births in 1959');

Create a Prophet instance

In [76]:
with warnings.catch_warnings():
    m = fbprophet.Prophet(yearly_seasonality=True, daily_seasonality=False, 
In [77]:
future = m.make_future_dataframe(periods=50, freq='d')
forecast = m.predict(future)


By Component

In [78]:

Plot raw data with forecast

In [79]:

Change point detection

No change is detected.

In [80]:
fig = m.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), m, forecast);

Gradient descent implementation from scratch


This post aims to introduce how to implement Gradient Descent from scratch.




In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Define the function to be explored

In [2]:
def surface_function(x):
    return np.sum(x**2)
In [63]:
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-3.0, 3.0, delta)
X, Y = np.meshgrid(x, y)
Z = np.add(X**2, Y**2)
fig, ax = plt.subplots()
CS = ax.contour(X, Y, Z)
ax.clabel(CS, inline=1, fontsize=10);

Compute gradient given a function

In [36]:
def numerical_gradient(f, x0):
    h = 1e-5
    grad = np.zeros_like(x0)

    for i in range(x0.size):
        tmp = x0[i]
        # compute f(x0 + h)
        x0[i] = tmp + h
        fx0_p = f(x0)

        # compute f(x0 - h)
        x0[i] = tmp - h
        fx0_m = f(x0)
        grad[i] = (fx0_p - fx0_m) / (2*h)
        x0[i] = tmp
    return grad
In [37]:
numerical_gradient(surface_function, np.array([0., 1.]))
array([0., 2.])

Define gradient descent function

In [84]:
class GradientDescent:
    def __init__(self):
        self.trace = []
    def gradient_descent(self, f, init_x, lr=0.01, steps=200):
        x = init_x
        for i in range(steps):
            gradient = numerical_gradient(f, x)
            x -= lr * gradient

        return x
In [85]:
init = np.array([3., 3.])
gd = GradientDescent()
gd.gradient_descent(surface_function, init)
array([0.05276384, 0.05276384])

Plot the trace on the surface function

In [89]:
df_trace = pd.DataFrame(gd.trace, columns=['x', 'y']).reset_index()
index x y
0 0 3.000000 3.000000
1 1 2.940000 2.940000
2 2 2.881200 2.881200
3 3 2.823576 2.823576
4 4 2.767104 2.767104
In [106]:
fig, ax = plt.subplots(figsize=(8, 5))
CS = ax.contour(X, Y, Z)
ax.clabel(CS, inline=1, fontsize=10);
df_trace.iloc[::15, :].plot(kind='scatter', x='x', y='y', c='index', cmap='Reds', marker='x', s=200, 
              ax=ax, title='Trace of Gradient Descent');

Saving Machine Learning Models by joblib


This post aims to introduce how to save the machine learning model using joblib.


In [4]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
import joblib

Create a data

In [5]:
boston = load_boston()
X, y =,

Train a model

In [6]:
reg = LinearRegression(), y)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,

Save the model as pickle

In [7]:
joblib.dump(reg, 'linear_regression.pkl')

check the saved model

In [11]:
!ls | grep linear*pkl

Load the saved model

In [13]:
loaded_reg = joblib.load('linear_regression.pkl')
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,

Explain the prediction for ImageNet using SHAP


This post aims to introduce how to explain the prediction for ImageNet using SHAP.




In [82]:
import keras
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import requests
from skimage.segmentation import slic
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shap
import warnings

%matplotlib inline


In [77]:
# make a color map
from matplotlib.colors import LinearSegmentedColormap
colors = []
for l in np.linspace(1, 0, 100):
    colors.append((245 / 255, 39 / 255, 87 / 255, l))
for l in np.linspace(0, 1, 100):
    colors.append((24 / 255, 196 / 255, 93 / 255, l))
cm = LinearSegmentedColormap.from_list("shap", colors)

Load pre-trained VGG16 model

In [2]:
# load model data
r = requests.get('')
feature_names = r.json()
model = VGG16()
WARNING:tensorflow:From /Users/hiro/anaconda3/envs/py367/lib/python3.6/site-packages/tensorflow/python/framework/ colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Downloading data from
553467904/553467096 [==============================] - 62s 0us/step

Load an image data

In [16]:
# load an image
file = "../images/apple-banana.jpg"
img = image.load_img(file, target_size=(224, 224))
img_orig = image.img_to_array(img)


In [18]:
# Create segmentation to explain by segment, not every pixel
segments_slic = slic(img, n_segments=30, compactness=30, sigma=3)

In [37]:
array([[ 0,  0,  0, ...,  4,  4,  4],
       [ 0,  0,  0, ...,  4,  4,  4],
       [ 0,  0,  0, ...,  4,  4,  4],
       [22, 22, 22, ..., 21, 21, 21],
       [22, 22, 22, ..., 21, 21, 21],
       [22, 22, 22, ..., 21, 21, 21]])

Utility Functions for masking and preprocessing

In [19]:
# define a function that depends on a binary mask representing if an image region is hidden
def mask_image(zs, segmentation, image, background=None):
    if background is None:
        background = image.mean((0, 1))
    # Create an empty 4D array
    out = np.zeros((zs.shape[0], 
    for i in range(zs.shape[0]):
        out[i, :, :, :] = image
        for j in range(zs.shape[1]):
            if zs[i, j] == 0:
                out[i][segmentation == j, :] = background
    return out

def f(z):
    return model.predict(
        preprocess_input(mask_image(z, segments_slic, img_orig, 255)))

def fill_segmentation(values, segmentation):
    out = np.zeros(segmentation.shape)
    for i in range(len(values)):
        out[segmentation == i] = values[i]
    return out
In [39]:
masked_images = mask_image(np.zeros((1,50)), segments_slic, img_orig, 255)

plt.imshow(masked_images[0][:,:, 0]);

Create an explainer and shap values

In [83]:
# use Kernel SHAP to explain the network's predictions
explainer = shap.KernelExplainer(f, np.zeros((1,50)))

with warnings.catch_warnings():
    shap_values = explainer.shap_values(np.ones((1,50)), nsamples=100) # runs VGG16 1000 times

Obtain the prediction with the highest probability

In [85]:
predictions = model.predict(preprocess_input(np.expand_dims(img_orig.copy(), axis=0)))
top_preds = np.argsort(-predictions)
In [86]:
pd.Series(data={feature_names[str(inds[i])][1]:predictions[0, inds[i]]  for i in range(10)}).plot(kind='bar', title='Top 10 Predictions');

Explain the prediction by visualization

The following image is explained well for banana.

In [87]:
# Visualize the explanations
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(12,4))
inds = top_preds[0]

max_val = np.max([np.max(np.abs(shap_values[i][:,:-1])) for i in range(len(shap_values))])
for i in range(3):
    m = fill_segmentation(shap_values[inds[i]][0], segments_slic)
    axes[i+1].imshow(np.array(img.convert('LA'))[:, :, 0], alpha=0.15)
    im = axes[i+1].imshow(m, cmap=cm, vmin=-max_val, vmax=max_val)
cb = fig.colorbar(im, ax=axes.ravel().tolist(), label="SHAP value", orientation="horizontal", aspect=60)

Explain Iris classification by SHAP


This post aims to introduce how to explain Iris classification by SHAP.



In [8]:
import sklearn
from sklearn.model_selection import train_test_split
import numpy as np
import shap
import time