Bayesian Regression using pymc3
Goal¶
This post introduces how to use pymc3 for Bayesian regression by walking through the simplest single-variable example.
Libraries¶
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pymc3 as pm
%matplotlib inline
Create data for Bayesian regression¶
To allow a comparison with non-Bayesian linear regression, the data is generated the same way as in the post Linear Regression:
\begin{equation*} \mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b} + \mathbf{e} \end{equation*}
Here $\mathbf{x}$ is a one-dimensional vector, $\mathbf{b}$ is a constant (the intercept), and $\mathbf{e}$ is white noise.
a = 10       # true slope
b = 4        # true intercept
n = 100      # number of samples
sigma = 3    # noise standard deviation
e = sigma * np.random.randn(n)  # white noise
x = np.linspace(-1, 1, num=n)
y = a * x + b + e
plt.plot(x, y, '.', label='observed y');
plt.plot(x, a * x + b, 'r', label='true y');
plt.legend();
Modeling using pymc3¶
In the Bayesian setting, the above formula is reformulated as follows:
\begin{equation*} \mathbf{y} \sim \mathcal{N} (\mathbf{A}\mathbf{x} + \mathbf{b}, \sigma^2) \end{equation*}
Here we regard $\mathbf{y}$ as a random variable following a Normal distribution with mean $\mathbf{A}\mathbf{x} + \mathbf{b}$ and variance $\sigma^2$. Since the slope, intercept, and noise scale are unknown, they get priors as well: Normal priors for a_0 and b_0, and a Half-Cauchy prior for sigma_0 in the model below.
model = pm.Model()
with model:
    # Priors for the slope and intercept
    a_0 = pm.Normal('a_0', mu=1, sigma=10)
    b_0 = pm.Normal('b_0', mu=1, sigma=10)
    # x is treated as observed data
    x_0 = pm.Normal('x_0', mu=0, sigma=1, observed=x)
    # Expected value of y
    mu_0 = a_0 * x_0 + b_0
    # Prior for the noise scale
    sigma_0 = pm.HalfCauchy('sigma_0', beta=10)
    # Likelihood of the observations
    y_0 = pm.Normal('y_0', mu=mu_0, sigma=sigma_0, observed=y)
    # Draw 500 posterior samples per chain with NUTS
    trace = pm.sample(500)
pm.traceplot(trace);
pm.summary(trace)
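As a quick sanity check (an addition, not in the original post), the posterior means pulled from the trace should land near the true values $a = 10$, $b = 4$, and $\sigma = 3$:
# pymc3's MultiTrace supports indexing by variable name
print(f"a_0 mean: {trace['a_0'].mean():.3}")
print(f"b_0 mean: {trace['b_0'].mean():.3}")
print(f"sigma_0 mean: {trace['sigma_0'].mean():.3}")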
Linear Regression¶
reg = LinearRegression()
reg.fit(x.reshape(-1, 1), y);  # sklearn expects a 2-D feature matrix
print(f'Coefficients A: {reg.coef_[0]:.3}, Intercept b: {reg.intercept_:.2}')
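As a cross-check (again an addition to the original post), numpy's polyfit should reproduce the same least-squares estimates:
# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f'polyfit slope: {slope:.3}, intercept: {intercept:.2}')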
Plot Deterministic and Bayesian Regression Lines¶
plt.plot(x, y, '.', label='observed y', c='C0')
plt.plot(x, a * x + b, label='true y', lw=3., c='C3')
pm.plot_posterior_predictive_glm(trace, samples=30,
eval=x,
lm=lambda x, sample: sample['b_0'] + sample['a_0'] * x,
label='posterior predictive regression', c='C2')
plt.plot(x, reg.coef_[0] * x + reg.intercept_, label='deterministic linear regression', ls='dotted', c='C1', lw=3)
plt.legend(loc=0);
plt.title('Posterior Predictive Regression Lines');
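The lines above visualize uncertainty in the regression mean only. As a possible extension (a sketch, not part of the original post), pm.sample_posterior_predictive draws simulated observations of $\mathbf{y}$, which also fold in the noise sigma_0; the var_names argument assumes pymc3 >= 3.8:
# Draw posterior predictive samples of y_0, reusing the model and trace above
with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=['y_0'])
y_sim = ppc['y_0']  # shape: (n_draws, n)
lower, upper = np.percentile(y_sim, [5, 95], axis=0)
plt.plot(x, y, '.', label='observed y')
plt.fill_between(x, lower, upper, alpha=0.3, label='90% posterior predictive interval')
plt.legend();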