Bayesian Regression using PyMC3
Goal¶
This post introduces how to use PyMC3
for Bayesian regression through the simplest single-variable example.
Libraries¶
In [63]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pymc3 as pm
%matplotlib inline
Create Data for Bayesian Regression¶
To allow comparison with non-Bayesian linear regression, the data is generated the same way as in this post: Linear Regression
\begin{equation*} \mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b} + \mathbf{e} \end{equation*}
Here $\mathbf{x}$ is a one-dimensional vector, $\mathbf{A}$ is the coefficient, $\mathbf{b}$ is a constant (the intercept), and $\mathbf{e}$ is white noise.
In [58]:
a = 10       # true slope A
b = 4        # true intercept b
n = 100      # number of observations
sigma = 3    # standard deviation of the noise
e = sigma * np.random.randn(n)   # white noise e
x = np.linspace(-1, 1, num=n)
y = a * x + b + e
plt.plot(x, y, '.', label='observed y');
plt.plot(x, a * x + b, 'r', label='true y');
plt.legend();
Modeling using PyMC3¶
In the Bayesian world, the above formula is reformulated as follows:
\begin{equation*} \mathbf{y} \sim \mathcal{N} (\mathbf{A}\mathbf{x} + \mathbf{b}, \sigma^2) \end{equation*}
In this case, we regard $\mathbf{y}$ as a random variable following a Normal distribution with mean $\mathbf{A}\mathbf{x} + \mathbf{b}$ and variance $\sigma^2$. We place weakly informative Normal priors on the slope and intercept, and a HalfCauchy prior on the noise scale $\sigma$.
In [59]:
model = pm.Model()
with model:
    # Weakly informative priors for the slope and intercept
    a_0 = pm.Normal('a_0', mu=1, sigma=10)
    b_0 = pm.Normal('b_0', mu=1, sigma=10)
    # Treat x as observed data
    x_0 = pm.Normal('x_0', mu=0, sigma=1, observed=x)
    mu_0 = a_0 * x_0 + b_0
    # HalfCauchy prior for the noise standard deviation
    sigma_0 = pm.HalfCauchy('sigma_0', beta=10)
    # Likelihood of the observed y
    y_0 = pm.Normal('y_0', mu=mu_0, sigma=sigma_0, observed=y)
    trace = pm.sample(500)
In [60]:
pm.traceplot(trace);
In [61]:
pm.summary(trace)
Out[61]: (posterior summary table for a_0, b_0, and sigma_0: means, standard deviations, and credible intervals)
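Besides pm.summary, the raw posterior samples can be pulled directly out of the trace. Below is a minimal sketch, assuming pm.sample returned PyMC3's default MultiTrace object, that computes posterior means and equal-tailed 94% credible intervals with plain NumPy:
In [ ]:
# Indexing the trace by variable name returns the pooled samples as a NumPy array
for name in ['a_0', 'b_0', 'sigma_0']:
    samples = trace[name]
    low, high = np.percentile(samples, [3, 97])  # equal-tailed 94% interval
    print(f'{name}: mean={samples.mean():.3f}, 94% CI=[{low:.3f}, {high:.3f}]')
The posterior mean of a_0 should land close to the true slope a = 10, and sigma_0 close to the true noise level sigma = 3.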
Linear Regression¶
In [65]:
reg = LinearRegression()
reg.fit(x.reshape(-1, 1), y);
print(f'Coefficient A: {reg.coef_[0]:.3}, Intercept b: {reg.intercept_:.2}')
Plot Deterministic and Bayesian Regression Lines¶
In [83]:
plt.plot(x, y, '.', label='observed y', c='C0')
plt.plot(x, a * x + b, label='true y', lw=3., c='C3')
pm.plot_posterior_predictive_glm(trace, samples=30,
                                 eval=x,
                                 lm=lambda x, sample: sample['b_0'] + sample['a_0'] * x,
                                 label='posterior predictive regression', c='C2')
plt.plot(x, reg.coef_[0] * x + reg.intercept_, label='deterministic linear regression', ls='dotted', c='C1', lw=3)
plt.legend(loc=0);
plt.title('Posterior Predictive Regression Lines');
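The regression lines above are drawn from posterior samples of the parameters; the fitted model can also generate full posterior predictive samples of y itself. Here is a minimal sketch, assuming PyMC3 3.8+ (where pm.sample_posterior_predictive is available) and the model and trace objects defined earlier:
In [ ]:
# Draw replicated y datasets from the posterior predictive distribution
ppc = pm.sample_posterior_predictive(trace, samples=200, model=model)
y_rep = ppc['y_0']  # shape: (200, n)
plt.plot(x, y, '.', label='observed y')
plt.plot(x, y_rep.mean(axis=0), c='C2', label='posterior predictive mean')
plt.fill_between(x, np.percentile(y_rep, 3, axis=0),
                 np.percentile(y_rep, 97, axis=0),
                 color='C2', alpha=0.2, label='94% predictive interval')
plt.legend();
If the model fits well, roughly 94% of the observed points should fall inside the shaded predictive band.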