Converting A Dictionary Into A Matrix using DictVectorizer

Goal

This post shows how to convert a list of dictionaries into a matrix using DictVectorizer from scikit-learn. This is useful when your data is stored as a list of sparse dictionaries (records that may omit some keys) and you want to convert it into a feature matrix that scikit-learn estimators can consume.

Libraries

In [6]:
from sklearn.feature_extraction import DictVectorizer
import pandas as pd

Create a list of dictionaries as input

In [20]:
d_house = [{'area': 300.0, 'price': 1000, 'location': 'NY'},
           {'area': 600.0, 'price': 2000, 'location': 'CA'},
           {'price': 1500, 'location': 'CH'}
          ]
d_house
Out[20]:
[{'area': 300.0, 'price': 1000, 'location': 'NY'},
 {'area': 600.0, 'price': 2000, 'location': 'CA'},
 {'price': 1500, 'location': 'CH'}]

Convert the list of dictionaries into a feature matrix

In [18]:
dv = DictVectorizer()
dv.fit(d_house)
Out[18]:
DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sort=True,
        sparse=True)
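
After fitting, the vectorizer exposes the learned column layout, which helps when checking how each key was encoded. The sketch below repeats the input from this post so that it runs on its own; the variable names `feature_names` and `column_index` are only illustrative. Numeric values keep their key as the column name, while string values are one-hot encoded as `key=value` using the `separator` shown above.

```python
from sklearn.feature_extraction import DictVectorizer

d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},
]

dv = DictVectorizer()
dv.fit(d_house)

# Column names in output order (sorted, because sort=True by default).
feature_names = dv.feature_names_
print(feature_names)

# Mapping from column name to column index in the transformed matrix.
column_index = dv.vocabulary_
print(column_index)
```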
In [19]:
pd.DataFrame(dv.fit_transform(d_house).todense(), columns=dv.feature_names_)
Out[19]:
    area  location=CA  location=CH  location=NY   price
0  300.0          0.0          0.0          1.0  1000.0
1  600.0          1.0          0.0          0.0  2000.0
2    0.0          0.0          1.0          0.0  1500.0
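
Two behaviours in this output are easy to miss, and the following sketch tries to make them explicit. First, the third record has no `area` key, so its `area` column is filled with 0.0, which cannot be told apart from a genuine area of zero. Second, keys or categories that were not seen during `fit` are dropped at transform time. The record `new_house` and its `rooms` key are made up for illustration, and `sparse=False` is used here only to get a plain NumPy array.

```python
from sklearn.feature_extraction import DictVectorizer

d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix.
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(d_house)

# The third record omits 'area', so that cell is encoded as 0.0.
missing_area = X[2, dv.vocabulary_['area']]
print(missing_area)

# Hypothetical new record: 'location=TX' and 'rooms' were not seen during fit,
# so transform() ignores them and only 'area' and 'price' are populated.
new_house = [{'area': 450.0, 'price': 1200, 'location': 'TX', 'rooms': 3}]
X_new = dv.transform(new_house)
print(X_new)
```

If a missing value should not be treated as zero, one option is to fill it in the dictionaries (or in the resulting DataFrame) before modelling.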