Random Forest Classifier
Goal
This post shows how to train a random forest classifier, one of the most popular machine learning models.
Libraries
In [12]:
import pandas as pd
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
%matplotlib inline
Load Data
In [6]:
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)
df_X = pd.DataFrame(X)
df_X.head()
Out[6]:
In [8]:
df_y = pd.DataFrame(y, columns=['y'])
df_y.head()
Out[8]:
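Before training, it can help to confirm what make_blobs produced. The following is a minimal sketch, not part of the original notebook, that repeats the same call and checks the array shape and the number of samples per class; the name class_counts is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the Load Data cell
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)

# Shape is (n_samples, n_features)
print(X.shape)

# class_counts (hypothetical name): number of samples for each class label
class_counts = np.bincount(y)
print(len(class_counts), class_counts.min(), class_counts.max())
```

With 10,000 samples spread over 100 centers, this is a 100-class problem with only about 100 samples per class, which is worth keeping in mind when reading the accuracy scores.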
Train a Model Using Cross-Validation
In [19]:
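# Ten fully grown trees (max_depth=None); random_state fixes the seed for reproducibility
clf = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
# 5-fold cross-validation; for a classifier the default score is accuracy
scores = cross_val_score(clf, X, y, cv=5, verbose=1)
scores.mean()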
Out[19]:
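When cv is an integer and the estimator is a classifier, scikit-learn uses stratified folds without shuffling. As a hedged variation on the cell above, the sketch below passes an explicit StratifiedKFold with shuffling (an assumption for illustration, not the original setup) and reports the spread of the fold scores alongside the mean.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)
clf = RandomForestClassifier(n_estimators=10, random_state=0)

# Explicit splitter: 5 stratified folds, shuffled with a fixed seed
# (shuffle=True is an illustrative choice, not what cv=5 does by default)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)

# Report the mean together with the standard deviation across folds
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation next to the mean gives a rough sense of how much the score depends on the particular split.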
In [15]:
pd.DataFrame(scores, columns=['CV Scores']).plot();
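cross_val_score fits clones of the estimator, so clf itself is not fitted afterwards. If a fitted model is needed for prediction, one option is a simple hold-out split. The sketch below assumes a 80/20 split and the same hyperparameters as above; the names test_accuracy and importances are introduced only for this example.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)

# Hold out 20% of the data, keeping class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out samples
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)

# Impurity-based importance of each feature, normalized to sum to 1
importances = clf.feature_importances_
print(importances)
```

After fit, clf.predict(X_new) returns class labels for new samples with the same ten features.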