One-Hot Encode Nominal Categorical Features
Goal¶
This post aims to introduce how to create one-hot-encoded features for categorical variables. In this post, two ways of creating one hot encoded features: OneHotEncoder in scikit-learn and get_dummies in pandas.
Peronally, I like get_dummies in pandas since pandas takes care of columns names, type of data and therefore, it looks cleaner and simpler with less code.

Reference
Libraries¶
In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
Create a data for one hot encoding¶
In [4]:
df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'banana', 'orange', 'banana', 'apple'],
'size': ['large', 'medium', 'small','large', 'medium', 'small']})
df
Out[4]:
Create one-hot encoded columns¶
Using OneHotEncoder in sklearn¶
In [17]:
encoder = OneHotEncoder()
df_fruit_encoded = pd.DataFrame(encoder.fit_transform(df[['fruit']]).todense(),
columns=encoder.get_feature_names())
df_fruit_encoded
Out[17]:
Using get_dummies method in pandas¶
In [18]:
pd.get_dummies(df['size'])
Out[18]:
Comments
Comments powered by Disqus