# Normalizing Observations

## Goal

This post shows how to normalize observations using the following methods:

• Min-Max scaling
• Standard scaling

## Libraries

In [53]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import minmax_scale, StandardScaler
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.simplefilter('ignore')


## Create the data

In [63]:
df = pd.DataFrame(data=60*np.random.randn(100)+20)
df.describe()

Out[63]:
0
count 100.000000
mean 22.664418
std 54.655875
min -123.482962
25% -13.641444
50% 27.253111
75% 56.597637
max 179.529729
In [64]:
df.hist();
plt.title('Original Data');


## Normalizing

### Min-Max Scaling

$$x_{\text{min-max normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$

In [65]:
data_minmax = minmax_scale(df, feature_range=(0, 1))
pd.DataFrame(pd.Series(data_minmax.ravel()).describe())

Out[65]:
0
count 100.000000
mean 0.482314
std 0.180375
min 0.000000
25% 0.362498
50% 0.497458
75% 0.594301
max 1.000000
In [72]:
plt.hist(data_minmax);
plt.title('Min-Max Scaled Data');
plt.axvline(x=np.min(data_minmax), ls=':', c='C0', label='Min');
plt.axvline(x=np.max(data_minmax), ls=':', c='C1', label='Max');
plt.legend();
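
As a quick check of the formula above, the sketch below applies it by hand with NumPy and compares the result against `minmax_scale`. The seeded generator `rng` and the array `x` are stand-ins introduced here for reproducibility; they are not the random draw used earlier in the post.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# Assumption: a seeded stand-in for the post's data (same shape and scale, different draw).
rng = np.random.default_rng(0)
x = 60 * rng.standard_normal(100) + 20

# Apply the formula directly: shift by the minimum, then divide by the range.
x_manual = (x - x.min()) / (x.max() - x.min())

# minmax_scale expects a column here; feature_range=(0, 1) is its default.
x_sklearn = minmax_scale(x.reshape(-1, 1), feature_range=(0, 1)).ravel()

print(np.allclose(x_manual, x_sklearn))  # do the two approaches agree?
print(x_manual.min(), x_manual.max())    # endpoints of the scaled data
```

Because the smallest value maps to 0 and the largest to 1, a single extreme outlier can squeeze the rest of the data into a narrow band, which is worth keeping in mind when reading the histogram.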


### Standard Scaling

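Standard scaling subtracts the mean and divides by the standard deviation, so the result has zero mean and unit variance. It does not strictly require Normally distributed data, but the scaled values are easiest to interpret (for example, as z-scores within $\pm 3\sigma$ of the mean) when the data is roughly Normal.

$$x_{\text{standard normalized}} = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)}$$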

In [67]:
ss = StandardScaler()
ss.fit(df)
data_standard_scaled = ss.transform(df)
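
Splitting `fit` and `transform` into two steps is useful because the fitted scaler keeps the mean and standard deviation it learned and can apply them to other data later. The sketch below illustrates this; the arrays `train` and `new` are hypothetical values made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumption: seeded stand-in data, plus two made-up new observations.
rng = np.random.default_rng(0)
train = (60 * rng.standard_normal(100) + 20).reshape(-1, 1)
new = np.array([[20.0], [80.0]])

# fit() stores the per-column mean and standard deviation of the training data.
ss = StandardScaler().fit(train)
print(ss.mean_, ss.scale_)

# transform() reuses those stored statistics on any data with the same columns.
new_scaled = ss.transform(new)

# Equivalent manual computation using the fitted attributes.
manual = (new - ss.mean_) / ss.scale_
print(np.allclose(new_scaled, manual))
```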

In [68]:
pd.DataFrame(pd.Series(data_standard_scaled.ravel()).describe())

Out[68]:
0
count 1.000000e+02
mean 3.552714e-17
std 1.005038e+00
min -2.687426e+00
25% -6.676092e-01
50% 8.437905e-02
75% 6.239799e-01
max 2.884513e+00
In [88]:
mean = np.mean(data_standard_scaled)
std = np.std(data_standard_scaled)
plt.axvspan(xmin=mean - 3*std, xmax=mean + 3*std, color='red', alpha=0.05, label=r'$Mean \pm 3\sigma$');
plt.hist(data_standard_scaled);
plt.title('Standard Scaled Data');
plt.axvline(x=np.mean(data_standard_scaled), ls='-.', c='red', label='Mean');
plt.legend();
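
The `describe()` output for the standard-scaled data reports a std of 1.005038 rather than exactly 1. This is likely because `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas reports the sample standard deviation (`ddof=1`); for 100 points the two differ by a factor of $\sqrt{100/99} \approx 1.005038$. The sketch below reproduces the effect on seeded stand-in data (the names `rng` and `x` are introduced here, not taken from the post).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: a seeded stand-in for the post's data (100 points).
rng = np.random.default_rng(0)
x = 60 * rng.standard_normal(100) + 20

# Apply the formula by hand with the population standard deviation (ddof=0).
x_manual = (x - x.mean()) / x.std(ddof=0)

# StandardScaler expects a 2D column, so reshape before scaling.
x_sklearn = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()

print(np.allclose(x_manual, x_sklearn))  # do the two approaches agree?
print(np.std(x_sklearn, ddof=0))         # population std of the scaled data
print(pd.Series(x_sklearn).std())        # pandas default is the sample std (ddof=1)
print(np.sqrt(100 / 99))                 # ratio between the two for n = 100
```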