Converting a Dictionary into a Matrix Using DictVectorizer
Goal
This post shows how to convert a dictionary into a matrix using DictVectorizer from scikit-learn. This is useful when your data is stored as a list of sparse dictionaries (records that do not all share the same keys) and you want to convert it into a feature matrix that scikit-learn estimators can consume.
Libraries
from sklearn.feature_extraction import DictVectorizer
import pandas as pd
Create a list of dictionaries as input
d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},  # this record has no 'area' key
]
d_house
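To make the mapping concrete, here is a hand-rolled sketch of what DictVectorizer does with records like d_house. It is an illustration only, not scikit-learn's implementation: the helper dicts_to_matrix is hypothetical, and it assumes flat records whose values are either numbers or strings. Numeric values keep their key as the column name, string values become one-hot columns named key=value, and a key missing from a record (like area in the third house) becomes 0.

```python
# Hypothetical hand-rolled illustration of the DictVectorizer idea.
# This is NOT scikit-learn's implementation; it assumes flat dicts of numbers/strings.
def dicts_to_matrix(records, separator="="):
    """Map a list of dicts to (feature_names, dense rows)."""

    def column_and_value(key, value):
        # String values are one-hot encoded as "key=value" with value 1.0.
        if isinstance(value, str):
            return f"{key}{separator}{value}", 1.0
        # Numeric values keep the key as the column name.
        return key, float(value)

    # Collect every column name seen in any record; sort them alphabetically.
    names = sorted({column_and_value(k, v)[0] for r in records for k, v in r.items()})
    index = {name: i for i, name in enumerate(names)}

    rows = []
    for r in records:
        row = [0.0] * len(names)  # keys missing from this record stay 0.0
        for k, v in r.items():
            name, val = column_and_value(k, v)
            row[index[name]] = val
        rows.append(row)
    return names, rows


d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},
]
names, rows = dicts_to_matrix(d_house)
print(names)  # ['area', 'location=CA', 'location=CH', 'location=NY', 'price']
for row in rows:
    print(row)
```

The alphabetical column order mirrors DictVectorizer's default sort=True, so these names should match dv.feature_names_, the attribute used to label the DataFrame columns in this post.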
Convert the list of dictionaries into a feature matrix
dv = DictVectorizer()  # defaults: sparse=True, sort=True, separator="="
dv.fit(d_house)  # learn the column vocabulary from the records
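# transform() reuses the vocabulary learned by fit(); .toarray() turns the sparse result into a NumPy array
pd.DataFrame(dv.transform(d_house).toarray(), columns=dv.feature_names_)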
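A few variations on the last step, sketched with the same data. Passing sparse=False makes DictVectorizer return a dense NumPy array, so no .toarray() call is needed; transform() on new records silently ignores features that were not seen during fit; and inverse_transform() maps rows back to dicts that contain only the non-zero entries. The extra record with location 'TX' is a hypothetical example, not part of the original data.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

d_house = [
    {'area': 300.0, 'price': 1000, 'location': 'NY'},
    {'area': 600.0, 'price': 2000, 'location': 'CA'},
    {'price': 1500, 'location': 'CH'},
]

# sparse=False: fit_transform()/transform() return a dense NumPy array directly.
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(d_house)
print(pd.DataFrame(X, columns=dv.feature_names_))

# The third record has no 'area' key, so its 'area' cell is 0.
assert X[2, dv.vocabulary_['area']] == 0.0

# Hypothetical new record: 'location=TX' was never seen during fit,
# so transform() drops it and every location column stays 0.
new = [{'area': 450.0, 'price': 1200, 'location': 'TX'}]
print(dv.transform(new))

# inverse_transform() rebuilds dicts from rows, keeping only non-zero entries.
print(dv.inverse_transform(X[:1]))
```

Because an unseen location simply produces an all-zero location block, a new category only gets its own column after refitting the vectorizer.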