Random Forest Classifier
Goal
This post shows how to train a random forest classifier, one of the most popular machine learning models, and how to evaluate it with cross-validation.
Libraries
import pandas as pd
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
%matplotlib inline
Load Data
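# Generate a synthetic dataset: 10,000 samples, 10 features, 100 cluster centers (one class per center)
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)
# Preview the first rows of the feature matrix
df_X = pd.DataFrame(X)
df_X.head()
# Preview the first rows of the class labels
df_y = pd.DataFrame(y, columns=['y'])
df_y.head()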
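As a quick sanity check on the generated data, the sketch below confirms the array shapes and counts the samples per class. It assumes the same make_blobs settings as above; class_counts is a name introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same settings as the post: 10,000 samples, 10 features, 100 centers
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)

# Shapes of the feature matrix and the label vector
print(X.shape, y.shape)

# Number of samples per class label (labels are integers 0..99)
class_counts = np.bincount(y)
print(len(class_counts), class_counts.min(), class_counts.max())
```

Because n_samples is divisible by centers, make_blobs splits the samples evenly, so the 100 classes are balanced and random guessing would score about 1% accuracy. That is a useful baseline when reading the cross-validation scores.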
Train a Model Using Cross-Validation
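# Random forest with 10 trees; max_depth=None lets each tree grow until its leaves are pure
# or hold fewer than min_samples_split samples
clf = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
# 5-fold cross-validation; for a classifier the default score is accuracy
scores = cross_val_score(clf, X, y, cv=5, verbose=1)
# Mean accuracy across the 5 folds
scores.mean()
# Plot the per-fold scores
pd.DataFrame(scores, columns=['CV Scores']).plot();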
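To make explicit what cross_val_score does with cv=5, here is a minimal sketch of the same evaluation written as a loop. It assumes that, for a classifier, an integer cv is treated as an unshuffled StratifiedKFold and that the default score is accuracy, as scikit-learn documents; manual_scores and library_scores are illustrative names, not part of the original post.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Recreate the data and the classifier with the settings used in the post
X, y = make_blobs(n_samples=10000, n_features=10, centers=100, random_state=0)
clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)

# Split into 5 folds that preserve the class proportions
skf = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh, unfitted copy of the classifier on the training folds
    model = clone(clf)
    model.fit(X[train_idx], y[train_idx])
    # Accuracy on the held-out fold
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The library helper, for comparison with the manual loop
library_scores = cross_val_score(clf, X, y, cv=5)

print(manual_scores)
print(np.allclose(manual_scores, library_scores))
```

With a fixed random_state, the manual loop and cross_val_score should agree fold by fold. Writing the loop out is mainly useful when you need something the helper does not return, such as the fitted model for each fold.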