Posts about Machine Learning (old posts, page 7)

Draw Perceptron graph by graphviz

Goal

This post aims to introduce how to draw a perceptron diagram with graphviz.

Reference

Libraries

In [1]:
from graphviz import Digraph

Create a node list and dictionary for the edges

In [12]:
# List of nodes
l_nodes = ['1', 'x0', 'x1', 'y']

# Dictionary mapping from label name to the edge between two nodes
d_edges = {'b': ('1', 'y'), 
           'w0': ('x0', 'y'), 
           'w1': ('x1', 'y')}

Visualize a graph for perceptron

In [13]:
# Create Digraph object
dot = Digraph()
dot.attr(rankdir='LR')

# Add nodes
for n in l_nodes:
    dot.node(n)        

# Add edges
for label, edges in d_edges.items(): 
    dot.edge(edges[0], edges[1], label=label)

# Fill node 1 by gray
dot.node('1', style='filled')
    
# Visualize the graph
dot
Out[13]:
[Rendered graph: nodes 1, x0, and x1 each connect to y, with edges labeled b, w0, and w1; node 1 is filled gray]
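
As a side note, the graph can also be written to disk; a minimal sketch using graphviz's render method (the filename 'perceptron' is arbitrary):

# Save the diagram as an SVG file and remove the intermediate DOT source
dot.render('perceptron', format='svg', cleanup=True)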

Implement Perceptron

Goal

This post aims to introduce how to implement a perceptron, which is the foundation of neural networks and a simple gate function returning 0 (no signal) or 1 (signal) for a given input.

In this post, the following gate functions are implemented:

  • AND
  • NAND
  • OR
  • XOR
$$ y = f(\mathbf{x}) = \begin{cases} 0 & (b + \mathbf{w} \cdot \mathbf{x} \le 0)\\ 1 & (b + \mathbf{w} \cdot \mathbf{x} > 0) \end{cases}$$
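
The gates below hard-code their weights. As a reference point, here is a minimal generic helper implementing the step function above (my own sketch, not part of the original notebook cells):

import numpy as np

def perceptron(x, w, b):
    # Step function from the formula: fire iff b + w.x > 0
    return int(b + np.dot(w, x) > 0)

# AND expressed in bias form: w = (0.5, 0.5), b = -0.6
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, (0.5, 0.5), -0.6))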

Implement AND gate

In [10]:
def AND(x0, x1, w0=0.5, w1=0.5, b=0.6):
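    # Note: b acts as a threshold here (fire iff x.w > b), which matches the
    # formula above with the bias moved to the right-hand side (bias = -b)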
    return ((x0 * w0 + x1 * w1) > b) * 1.0
In [11]:
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"AND(x0={x0}, x1={x1}) = {AND(x0, x1)}")
AND(x0=0, x1=0) = 0.0
AND(x0=0, x1=1) = 0.0
AND(x0=1, x1=0) = 0.0
AND(x0=1, x1=1) = 1.0

Implement NAND gate

In [24]:
def NAND(x0, x1, w0=-0.5, w1=-0.5, b=-0.6):
    return ((x0 * w0 + x1 * w1) > b) * 1.0
In [25]:
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"NAND(x0={x0}, x1={x1}) = {NAND(x0, x1)}")
NAND(x0=0, x1=0) = 1.0
NAND(x0=0, x1=1) = 1.0
NAND(x0=1, x1=0) = 1.0
NAND(x0=1, x1=1) = 0.0

Implement OR gate

In [34]:
def OR(x0, x1, w0=0.5, w1=0.5, b=0.2):
    return ((x0 * w0 + x1 * w1) > b) * 1.0
In [35]:
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"OR(x0={x0}, x1={x1}) = {OR(x0, x1)}")
OR(x0=0, x1=0) = 0.0
OR(x0=0, x1=1) = 1.0
OR(x0=1, x1=0) = 1.0
OR(x0=1, x1=1) = 1.0

Implement XOR gate

XOR is not linearly separable, so a single perceptron cannot represent it; instead it is composed from the NAND, OR, and AND gates defined above (a two-layer perceptron).

In [36]:
def XOR(x0, x1):
    n0 = NAND(x0, x1)
    n1 = OR(x0, x1)
    return AND(n0, n1)
In [37]:
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"XOR(x0={x0}, x1={x1}) = {XOR(x0, x1)}")
    
XOR(x0=0, x1=0) = 0.0
XOR(x0=0, x1=1) = 1.0
XOR(x0=1, x1=0) = 1.0
XOR(x0=1, x1=1) = 0.0

Parallel Plot for Categorical and Continuous Variables by Plotly Express

Goal

This post aims to introduce how to draw a parallel plot for categorical and continuous variables with Plotly Express.

Reference

Libraries

In [19]:
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "png"

Create continuous data

In [4]:
df = px.data.election()
df.head()
Out[4]:
district Coderre Bergeron Joly total winner result
0 101-Bois-de-Liesse 2481 1829 3024 7334 Joly plurality
1 102-Cap-Saint-Jacques 2525 1163 2675 6363 Joly plurality
2 11-Sault-au-Récollet 3348 2770 2532 8650 Coderre plurality
3 111-Mile-End 1734 4782 2514 9030 Bergeron majority
4 112-DeLorimier 1770 5933 3044 10747 Bergeron majority

Visualize Parallel Plot for continuous data

In [20]:
fig = px.parallel_coordinates(df, color='total', color_continuous_scale=px.colors.sequential.Inferno)
fig
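
By default every numeric column becomes an axis. The dimensions argument (a standard plotly.express parameter) restricts the plot to selected columns; for example, using the election columns shown above:

fig = px.parallel_coordinates(df, dimensions=['Coderre', 'Bergeron', 'Joly', 'total'],
                              color='total',
                              color_continuous_scale=px.colors.sequential.Inferno)
fig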

Create categorical data

In [9]:
df = px.data.election()
df.head()
Out[9]:
district Coderre Bergeron Joly total winner result
0 101-Bois-de-Liesse 2481 1829 3024 7334 Joly plurality
1 102-Cap-Saint-Jacques 2525 1163 2675 6363 Joly plurality
2 11-Sault-au-Récollet 3348 2770 2532 8650 Coderre plurality
3 111-Mile-End 1734 4782 2514 9030 Bergeron majority
4 112-DeLorimier 1770 5933 3044 10747 Bergeron majority

Visualize Parallel Plot for categorical data

In [21]:
fig = px.parallel_categories(df, color="total", color_continuous_scale=px.colors.sequential.Inferno)
fig
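
parallel_categories also accepts a dimensions argument (a standard plotly.express parameter) to limit the axes to the categorical columns of interest:

fig = px.parallel_categories(df, dimensions=['winner', 'result'],
                             color='total',
                             color_continuous_scale=px.colors.sequential.Inferno)
fig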

Split Up: dtreeviz (Part 5)

Goal

This post aims to break down the dtreeviz module step by step to fully understand what is implemented. After fully understanding it, I would like to contribute to the module and submit a pull request.

I really like this module and would like to see it work with other tree-based libraries like XGBoost or LightGBM. I found the exact same request (issue 15) on GitHub, so I hope I can contribute to it.

This post is the 5th part:

  • ctreeviz_univar

Reference

trees.ctreeviz_univar

  • L267: the beginning of the definition for ctreeviz_univar
  • L272-275: treatment for pandas input
  • L277: load color property
  • L280-288: load the decision tree classifier object as shadow_tree along with other relevant attributes, e.g., the number of classes and the target values
  • L290-302: setting labels and spines visibility
  • L304-319: plotting a stacked bar-chart histogram when gtype=='barstacked'
  • L320-330: plotting a scatter plot with jitter
  • L332: setting tick parameters
  • L352-353: setting legend
  • L355-358: setting a title
  • L360-362: drawing the vertical split lines between categories
In [53]:
from pathlib import Path
from graphviz.backend import run, view
import matplotlib.pyplot as plt
from dtreeviz.shadow import *
from numbers import Number
import matplotlib.patches as patches
import tempfile
import os
from sys import platform as PLATFORM
from colour import Color, rgb2hex
from typing import Mapping, List
from dtreeviz.utils import inline_svg_images, myround
from dtreeviz.shadow import ShadowDecTree, ShadowDecTreeNode
from dtreeviz.colors import adjust_colors
from sklearn import tree
import graphviz

from dtreeviz.trees import *

# How many bins should we have based upon number of classes
NUM_BINS = [0, 0, 10, 9, 8, 6, 6, 6, 5, 5, 5]
          # 0, 1, 2,  3, 4, 5, 6, 7, 8, 9, 10

def ctreeviz_univar(ax, x_train, y_train, max_depth, feature_name, class_names,
                    target_name,
                    fontsize=14, fontname="Arial", nbins=25, gtype='strip',
                    show={'title','legend','splits'},
                    colors=None):
    if isinstance(x_train, pd.Series):
        x_train = x_train.values
    if isinstance(y_train, pd.Series):
        y_train = y_train.values

    colors = adjust_colors(colors)

    #    ax.set_facecolor('#F9F9F9')
    ct = tree.DecisionTreeClassifier(max_depth=max_depth)
    ct.fit(x_train.reshape(-1, 1), y_train)

    shadow_tree = ShadowDecTree(ct, x_train.reshape(-1, 1), y_train,
                                feature_names=[feature_name], class_names=class_names)

    n_classes = shadow_tree.nclasses()
    overall_feature_range = (np.min(x_train), np.max(x_train))
    class_values = shadow_tree.unique_target_values
    color_values = colors['classes'][n_classes]
    color_map = {v: color_values[i] for i, v in enumerate(class_values)}
    X_colors = [color_map[cl] for cl in class_values]

    ax.set_xlabel(f"{feature_name}", fontsize=fontsize, fontname=fontname,
                  color=colors['axis_label'])
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.yaxis.set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_linewidth(.3)

    X_hist = [x_train[y_train == cl] for cl in class_values]

    if gtype == 'barstacked':
        bins = np.linspace(start=overall_feature_range[0], stop=overall_feature_range[1], num=nbins, endpoint=True)
        hist, bins, barcontainers = ax.hist(X_hist,
                                            color=X_colors,
                                            align='mid',
                                            histtype='barstacked',
                                            bins=bins,
                                            label=class_names)

        for patch in barcontainers:
            for rect in patch.patches:
                rect.set_linewidth(.5)
                rect.set_edgecolor(colors['edge'])
        ax.set_xlim(*overall_feature_range)
        ax.set_xticks(overall_feature_range)
        ax.set_yticks([0, max([max(h) for h in hist])])
    elif gtype == 'strip':
        # user should pass in short and wide fig
        sigma = .013
        mu = .08
        class_step = .08
        dot_w = 20
        ax.set_ylim(0, mu + n_classes*class_step)
        for i, bucket in enumerate(X_hist):
            y_noise = np.random.normal(mu+i*class_step, sigma, size=len(bucket))
            ax.scatter(bucket, y_noise, alpha=.7, marker='o', s=dot_w, c=color_map[i],
                       edgecolors=colors['scatter_edge'], lw=.3)

    ax.tick_params(axis='both', which='major', width=.3, labelcolor=colors['tick_label'],
                   labelsize=fontsize)

    splits = [node.split() for node in shadow_tree.internal]
    splits = sorted(splits)
    bins = [ax.get_xlim()[0]] + splits + [ax.get_xlim()[1]]

    pred_box_height = .07 * ax.get_ylim()[1]
    preds = []
    for i in range(len(bins) - 1):
        left = bins[i]
        right = bins[i + 1]
        inrange = y_train[(x_train >= left) & (x_train <= right)]
        values, counts = np.unique(inrange, return_counts=True)
        pred = values[np.argmax(counts)]
        rect = patches.Rectangle((left, 0), (right - left), pred_box_height, linewidth=.3,
                                 edgecolor=colors['edge'], facecolor=color_map[pred])
        ax.add_patch(rect)
        preds.append(pred)

    if 'legend' in show:
        add_classifier_legend(ax, class_names, class_values, color_map, target_name, colors)

    if 'title' in show:
        accur = ct.score(x_train.reshape(-1, 1), y_train)
        title = f"Classifier tree depth {max_depth}, training accuracy={accur*100:.2f}%"
        plt.title(title, fontsize=fontsize, color=colors['title'])

    if 'splits' in show:
        for split in splits:
            plt.plot([split, split], [*ax.get_ylim()], '--', color=colors['split_line'], linewidth=1)

Create a toy classification example

In [48]:
import numpy as np
import graphviz 
from sklearn import tree

X = np.array([0, 1, 0.5, 10, 11, 12, 20, 21, 22, 30, 30, 32]).reshape(-1, 1)
Y = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd']).reshape(-1, 1)
clf = tree.DecisionTreeClassifier(max_depth=3)
clf = clf.fit(X, Y)

df = pd.DataFrame(data={'X':X.ravel(), 'Y': Y.ravel()}, index=range(len(X)))
df.plot(kind='bar');
plt.title('Sample Data for Univariate Classification');

Visualize classification tree for univariate case

In [54]:
fig, ax = plt.subplots(1)
ctreeviz_univar(ax, pd.Series(X.ravel()), pd.Series(Y.ravel()), 
                feature_name='X', 
                target_name='Y',
                max_depth=4, 
                class_names=['a', 'b', 'c', 'd'], 
                gtype = 'barstacked',
                show={'title', 'splits'}
               )

Note: when I applied show={'legend'}, I obtained the error below and have not fully figured out what is wrong yet. Judging from the traceback, the failure seems to occur in the 'strip' branch, where color_map is keyed by the class values ('a' through 'd') but is looked up with the integer index i, which would explain the KeyError: 0.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-42-c31e8b14db34> in <module>
      4                 target_name='Y',
      5                 max_depth=4,
----> 6                 class_names=['a', 'b', 'c', 'd']
      7                )

<ipython-input-41-b466a69d927c> in ctreeviz_univar(ax, x_train, y_train, max_depth, feature_name, class_names, target_name, fontsize, fontname, nbins, gtype, show, colors)
     85         for i, bucket in enumerate(X_hist):
     86             y_noise = np.random.normal(mu+i*class_step, sigma, size=len(bucket))
---> 87             ax.scatter(bucket, y_noise, alpha=.7, marker='o', s=dot_w, c=color_map[i],
     88                        edgecolors=colors['scatter_edge'], lw=.3)
     89 

KeyError: 0
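
A minimal standalone reproduction of what I suspect is happening (my own sketch; in the code above, color_map is keyed by the class labels rather than by position):

# color_map in ctreeviz_univar maps class values (here strings) to colors
color_map = {'a': 'red', 'b': 'blue'}
class_values = ['a', 'b']

i = 0
# color_map[i] raises KeyError: 0, matching the traceback above
print(color_map[class_values[i]])  # 'red' -- looking up by class value works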

Split Up: dtreeviz (Part 4)

Goal

This post aims to break down the dtreeviz module step by step to fully understand what is implemented. After fully understanding it, I would like to contribute to the module and submit a pull request.

I really like this module and would like to see it work with other tree-based libraries like XGBoost or LightGBM. I found the exact same request (issue 15) on GitHub, so I hope I can contribute to it.

This post is the 4th part: breaking down the DTreeViz class and the rtreeviz_univar method.

Reference

DTreeViz class

  • L23: the beginning of DTreeViz class
  • L24-25: the __init__ method, which takes a dot object as input
  • L26-78: methods to save and view the visualization as an SVG file

rtreeviz_univar

  • L81: the beginning of rtreeviz_univar method
  • L94-102: initial settings for the ranges of the X and y data and conversion into NumPy arrays
  • L104-105: create a scikit-learn decision tree
  • L121-122: plot the original X and y data points
  • L125-126: plot the vertical line for decision boundary (gray line)
  • L128-134: plot the horizontal line for mean line (red line by default)
  • L136: Change the appearance of ticks
  • L138-140: setting title
  • L142-143: setting x and y label based on feature_name and target_name
In [4]:
from pathlib import Path
from graphviz.backend import run, view
import matplotlib.pyplot as plt
from dtreeviz.shadow import *
from numbers import Number
import matplotlib.patches as patches
import tempfile
import os
from sys import platform as PLATFORM
from colour import Color, rgb2hex
from typing import Mapping, List
from dtreeviz.utils import inline_svg_images, myround
from dtreeviz.shadow import ShadowDecTree, ShadowDecTreeNode
from dtreeviz.colors import adjust_colors
from sklearn import tree
import graphviz

# How many bins should we have based upon number of classes
NUM_BINS = [0, 0, 10, 9, 8, 6, 6, 6, 5, 5, 5]
          # 0, 1, 2,  3, 4, 5, 6, 7, 8, 9, 10

def rtreeviz_univar(ax,
                    x_train: (pd.Series, np.ndarray),  # 1 vector of X data
                    y_train: (pd.Series, np.ndarray),
                    max_depth = 10,
                    feature_name: str = None,
                    target_name: str = None,
                    min_samples_leaf = 1,
                    fontsize: int = 14,
                    show={'title','splits'},
                    split_linewidth=.5,
                    mean_linewidth = 2,
                    markersize=None,
                    colors=None):
    if isinstance(x_train, pd.Series):
        x_train = x_train.values
    if isinstance(y_train, pd.Series):
        y_train = y_train.values

    colors = adjust_colors(colors)

    y_range = (min(y_train), max(y_train))  # same y axis for all
    overall_feature_range = (np.min(x_train), np.max(x_train))

    t = tree.DecisionTreeRegressor(max_depth=max_depth, min_samples_leaf=min_samples_leaf)
    t.fit(x_train.reshape(-1,1), y_train)

    shadow_tree = ShadowDecTree(t, x_train.reshape(-1,1), y_train, feature_names=[feature_name])
    splits = []
    for node in shadow_tree.internal:
        splits.append(node.split())
    splits = sorted(splits)
    bins = [overall_feature_range[0]] + splits + [overall_feature_range[1]]

    means = []
    for i in range(len(bins) - 1):
        left = bins[i]
        right = bins[i + 1]
        inrange = y_train[(x_train >= left) & (x_train <= right)]
        means.append(np.mean(inrange))

    ax.scatter(x_train, y_train, marker='o', alpha=.4, c=colors['scatter_marker'], s=markersize,
               edgecolor=colors['scatter_edge'], lw=.3)

    if 'splits' in show:
        for split in splits:
            ax.plot([split, split], [*y_range], '--', color=colors['split_line'], linewidth=split_linewidth)

        prevX = overall_feature_range[0]
        for i, m in enumerate(means):
            split = overall_feature_range[1]
            if i < len(splits):
                split = splits[i]
            ax.plot([prevX, split], [m, m], '-', color=colors['mean_line'], linewidth=mean_linewidth)
            prevX = split

    ax.tick_params(axis='both', which='major', width=.3, labelcolor=colors['tick_label'], labelsize=fontsize)

    if 'title' in show:
        title = f"Regression tree depth {max_depth}, samples per leaf {min_samples_leaf},\nTraining $R^2$={t.score(x_train.reshape(-1,1),y_train):.3f}"
        plt.title(title, fontsize=fontsize, color=colors['title'])

    plt.xlabel(feature_name, fontsize=fontsize, color=colors['axis_label'])
    plt.ylabel(target_name, fontsize=fontsize, color=colors['axis_label'])

Create a toy sample

In [42]:
import numpy as np
import graphviz 
from sklearn import tree

X = np.array([0, 1, 0.5, 10, 11, 12, 20, 21, 22, 30, 30, 32]).reshape(-1, 1)
Y = np.array([0., 0, 0, 50, 49, 50, 20, 21, 19, 90, 89, 91]).reshape(-1, 1)
clf = tree.DecisionTreeRegressor(max_depth=3)
clf = clf.fit(X, Y)

plt.scatter(x=X, y=Y, s=5);
plt.title('Sample Data for Univariate Regression');

Visualize a tree using rtreeviz_univar

In [51]:
fig, ax = plt.subplots(1)
rtreeviz_univar(ax, pd.Series(X.ravel()), pd.Series(Y.ravel()), 
                feature_name='X', 
                target_name='Y',
                markersize=15)
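
The show argument controls which decorations are drawn; for example, passing a set without 'title' suppresses the heading (a variation on the call above):

rtreeviz_univar(ax, pd.Series(X.ravel()), pd.Series(Y.ravel()),
                feature_name='X',
                target_name='Y',
                markersize=15,
                show={'splits'})  # draw split/mean lines but no title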