Ordinal Encoding using Scikit-learn

Goal

This post shows how to convert a categorical column into numbers with scikit-learn's OrdinalEncoder, so that the column can be used in further processing.

Libraries

In [1]:
import pandas as pd
import sklearn.preprocessing

Create categorical data

In [2]:
df = pd.DataFrame(data={'type': ['cat', 'dog', 'sheep'], 
                       'weight': [10, 15, 50]})
df
Out[2]:
    type  weight
0    cat      10
1    dog      15
2  sheep      50

Ordinal Encoding

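Ordinal encoding replaces each category with an integer code (0, 1, 2, ...). The encoder first learns the mapping from categories to codes on the training data (fit), then applies that mapping to any data passed to transform.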
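To make the idea concrete, here is a minimal sketch of ordinal encoding in plain pandas, without scikit-learn. It assumes the categories are ordered alphabetically, which matches how OrdinalEncoder orders the categories it learns with its default categories='auto'. The names types, mapping and codes are illustrative only.

```python
import pandas as pd

# Toy column similar to the 'type' column used in this post
types = pd.Series(['dog', 'cat', 'sheep', 'cat'])

# Assumption: categories are sorted alphabetically, as OrdinalEncoder
# does when it learns them from the data with categories='auto'
categories = sorted(types.unique())
mapping = {cat: code for code, cat in enumerate(categories)}

# Replace every category with its integer code
codes = types.map(mapping)
print(mapping)         # {'cat': 0, 'dog': 1, 'sheep': 2}
print(codes.tolist())  # [1, 0, 2, 0]
```

OrdinalEncoder does the same bookkeeping for you, stores the learned categories on the fitted object, and can reuse them on new data.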

In [3]:
# Instantiate the ordinal encoder class
oe = sklearn.preprocessing.OrdinalEncoder()

# Learn the mapping from categories to numbers
oe.fit(df.loc[:, ['type']])
Out[3]:
OrdinalEncoder(categories='auto', dtype=<class 'numpy.float64'>)
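After fitting, the learned mapping can be inspected through the encoder's categories_ attribute, and inverse_transform maps codes back to labels. The sketch below refits on the same toy data so it runs on its own; the variable name labels is only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Same toy data as in this post
df = pd.DataFrame(data={'type': ['cat', 'dog', 'sheep'],
                        'weight': [10, 15, 50]})

oe = OrdinalEncoder()
oe.fit(df[['type']])

# categories_ holds one array per encoded column; a category's position
# in that array is the number it is encoded as
print(oe.categories_[0].tolist())  # ['cat', 'dog', 'sheep']

# inverse_transform turns codes back into the original labels
labels = oe.inverse_transform([[2.0], [0.0]])
print(labels.ravel().tolist())     # ['sheep', 'cat']
```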
In [4]:
# Apply the fitted encoder to new data, using the same column name ('type') as during fit
oe.transform(pd.DataFrame({'type': ['cat'] * 3 +
                                   ['dog'] * 2 +
                                   ['sheep'] * 5}))
Out[4]:
array([[0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [2.],
       [2.],
       [2.],
       [2.],
       [2.]])
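The new data above only contains categories seen during fit. With the default handle_unknown='error', transform raises a ValueError for a category it has never seen. A minimal sketch of one way to handle this, assuming scikit-learn 0.24 or later (where handle_unknown='use_encoded_value' and unknown_value are available); the value -1 and the category 'horse' are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train = pd.DataFrame({'type': ['cat', 'dog', 'sheep']})
# 'horse' was never seen during fit
new = pd.DataFrame({'type': ['dog', 'horse', 'cat']})

# Map unseen categories to -1 instead of raising an error
oe = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
oe.fit(train)
encoded = oe.transform(new)
print(encoded.ravel().tolist())  # [1.0, -1.0, 0.0]
```

The unknown value must not collide with a real code, which is why a negative number is a common choice here.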
