Converting A Dictionary Into A Matrix using DictVectorizer

h1ros

Jun 7, 2019, 6:08:08 AM

Goal

This post shows how to convert a list of dictionaries into a matrix using DictVectorizer from scikit-learn. This is useful when your data is stored as a list of dictionaries, where each record may contain only some of the keys, and you want to turn it into a numeric feature matrix that scikit-learn estimators can consume.

Reference

Scikit-learn DictVectorizer

Libraries

In [6]:

from sklearn.feature_extraction import DictVectorizer
import pandas as pd

Create a list of dictionaries as input

In [20]:

# Each dictionary is one record; the third one has no 'area' key
d_house = [{'area': 300.0, 'price': 1000, 'location': 'NY'},
           {'area': 600.0, 'price': 2000, 'location': 'CA'},
           {'price': 1500, 'location': 'CH'}
          ]
d_house

Out[20]:

[{'area': 300.0, 'price': 1000, 'location': 'NY'},
 {'area': 600.0, 'price': 2000, 'location': 'CA'},
 {'price': 1500, 'location': 'CH'}]

Convert a list of dictionaries into a feature matrix

In [18]:

dv = DictVectorizer()
dv.fit(d_house)

Out[18]:

DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sort=True,
        sparse=True)
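To see what the fitted vectorizer has learned, it can help to look at its `feature_names_` and `vocabulary_` attributes. The snippet below is a minimal sketch that rebuilds the same `d_house` list so it runs on its own; the variable names `feature_names` and `vocabulary` are only illustrative.

```python
from sklearn.feature_extraction import DictVectorizer

# Same input as in this post, repeated so the snippet is self-contained
d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},
]

dv = DictVectorizer()
dv.fit(d_house)

# Column names in output order: numeric keys keep their name,
# string values become one "key=value" column per distinct value
feature_names = dv.feature_names_
print(feature_names)

# Mapping from each feature name to its column index
vocabulary = dv.vocabulary_
print(vocabulary)
```

With the default `sort=True`, the columns are ordered alphabetically by feature name, which is why `area` comes first and `price` last.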

In [19]:

# dv is already fitted above, so transform is enough; toarray() gives a plain NumPy array
pd.DataFrame(dv.transform(d_house).toarray(), columns=dv.feature_names_)

Out[19]:

	area	location=CA	location=CH	location=NY	price
0	300.0	0.0	0.0	1.0	1000.0
1	600.0	1.0	0.0	0.0	2000.0
2	0.0	0.0	1.0	0.0	1500.0
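Two details of the result above are easy to miss: the third record has no `area` key, and it ends up as `0.0` rather than a missing value; and keys or categories that were not present during `fit` are dropped by `transform`. The sketch below illustrates both, and also maps the matrix back to dictionaries with `inverse_transform`. The record `new_house` (including the `rooms` key and the `TX` location) is a made-up example, not data from this post.

```python
from sklearn.feature_extraction import DictVectorizer

# Same input as in this post, repeated so the snippet is self-contained
d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},  # no 'area' key
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
dv_dense = DictVectorizer(sparse=False)
X = dv_dense.fit_transform(d_house)

# The missing 'area' in the third record is encoded as 0.0, not NaN
area_col = dv_dense.vocabulary_['area']
print(X[2, area_col])

# Hypothetical new record: 'TX' and 'rooms' were not seen during fit,
# so they do not map to any column and are ignored by transform
new_house = [{'area': 450.0, 'price': 1800, 'location': 'TX', 'rooms': 3}]
X_new = dv_dense.transform(new_house)
print(X_new)

# inverse_transform rebuilds dictionaries from the non-zero entries only
restored = dv_dense.inverse_transform(X)
print(restored[2])
```

Because a missing numeric key and a genuine zero look the same in the matrix, it may be worth imputing or flagging missing values before vectorizing if that distinction matters for your model.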