Posts about Visualization

Introduction to Bayesian Optimization

h1ros

Oct 29, 2019, 8:21:37 PM

Goal

This notebook introduces how Bayesian Optimization works, using the bayesian-optimization module (`bayes_opt`).

Bayesian Optimization is a way to explore, and ultimately maximize, an unknown ("black-box") function: we can choose any input $x$ and observe the function's response, but we do not know its formula. At each step it fits a surrogate model (here a Gaussian process) to the observations made so far. The surrogate gives a mean estimate and a confidence interval for the function, and these guide the choice of the next input to try. You can stop early or keep iterating.

This post covers a first toy example: we define a "black-box" function and show, step by step, how Bayesian Optimization estimates it.
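To make "mean and confidence interval" concrete, here is a minimal NumPy sketch of Gaussian-process regression, the kind of surrogate model Bayesian Optimization relies on. It assumes a zero-mean prior, a squared-exponential (RBF) kernel with a fixed length scale of 1, and a tiny noise term. The `bayes_opt` library fits its own scikit-learn Gaussian process with a different kernel and fitted hyperparameters, so its curves will not match this exactly. `rbf_kernel` and `gp_posterior` are illustrative names, not library functions.

```python
import numpy as np


def unknown_func(x):
    # Same "black-box" function as in this post
    return 3 * np.exp(-(x - 3) ** 2) - np.exp(-(x - 2) ** 2) + 2 * np.exp(-(x + 3) ** 2)


def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between two 1-D arrays of inputs
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, length_scale=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid, length_scale)  # shape (n_obs, n_grid)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed through the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # The prior variance k(x, x) is 1 for this kernel
    var = 1.0 - np.sum(v ** 2, axis=0)
    sigma = np.sqrt(np.clip(var, 0.0, None))
    return mu, sigma


x_obs = np.array([-2.0, 0.0, 3.0])
y_obs = unknown_func(x_obs)
x_grid = np.linspace(-6, 6, 121)
mu, sigma = gp_posterior(x_obs, y_obs, x_grid)
lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma  # 95% confidence band
```

The band collapses at the observed points and widens away from them. That uncertainty is what the utility (acquisition) function uses to decide where to look next.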


Libraries

In [41]:

from bayes_opt import BayesianOptimization
from bayes_opt import UtilityFunction
import numpy as np
import warnings
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline

Unknown Function

Any function could serve as the target here. As an example, we use the 1-D function defined by the following equation:

$$f(x) = 3e^{-(x-3)^{2}} - e^{-(x-2)^2} + 2 e^{-(x+3)^2}$$

In [2]:

def unknown_func(x):
    return 3 * np.exp(-(x-3) **2) - np.exp(-(x-2) **2) + 2 * np.exp(-(x + 3) **2)

For reference, we can plot the unknown function as below. In practice we would not have this plot, since the function is a "black box".

In [4]:

x = np.linspace(-6, 6, 10000).reshape(-1, 1)
y = unknown_func(x)

plt.plot(x, y);
plt.title('1-D Unknown Function to be estimated');
plt.xlabel('X');
plt.ylabel('Response from the function');

Bayesian Optimization

First, we create a BayesianOptimization object, passing the black-box function as f and the bounds of its input as pbounds.

In [5]:

optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': (-6, 6)}, verbose=0)
optimizer

Out[5]:

<bayes_opt.bayesian_optimization.BayesianOptimization at 0x11ab55dd8>

Then we can start exploring the function by trying different inputs with optimizer.maximize:

- init_points: the number of random points to evaluate before the Gaussian process starts guiding the search.
- n_iter: the number of Bayesian optimization steps to run.

The optimizer keeps its state, so each call to optimizer.maximize continues from the last iteration.
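The roles of init_points and n_iter can be seen in a from-scratch sketch of the loop. This is a simplified illustration, not the library's implementation. It assumes a zero-mean Gaussian process with a fixed RBF kernel, maximizes the UCB utility (mean + kappa * std) over a dense grid instead of running a numerical optimizer, and uses hypothetical names (`fit_predict`, `bayes_opt_sketch`).

```python
import numpy as np


def unknown_func(x):
    return 3 * np.exp(-(x - 3) ** 2) - np.exp(-(x - 2) ** 2) + 2 * np.exp(-(x + 3) ** 2)


def fit_predict(x_obs, y_obs, x_grid, length_scale=1.0, noise=1e-6):
    """Zero-mean GP with a fixed RBF kernel: posterior mean and std on x_grid."""
    def k(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

    K = k(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = k(x_obs, x_grid)              # shape (n_obs, n_grid)
    K_inv_Ks = np.linalg.solve(K, K_s)  # K^{-1} K_s
    mu = K_inv_Ks.T @ y_obs
    var = 1.0 - np.sum(K_s * K_inv_Ks, axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))


def bayes_opt_sketch(f, bounds, init_points=2, n_iter=10, kappa=5.0, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(bounds[0], bounds[1], 2001)
    # init_points: random exploration; at least one point is needed to fit the GP
    x_obs = rng.uniform(bounds[0], bounds[1], size=max(init_points, 1))
    y_obs = f(x_obs)
    # n_iter: each step refits the GP and evaluates f where UCB is largest
    for _ in range(n_iter):
        mu, sigma = fit_predict(x_obs, y_obs, grid)
        x_next = grid[np.argmax(mu + kappa * sigma)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best], x_obs, y_obs


best_x, best_y, xs, ys = bayes_opt_sketch(unknown_func, (-6, 6), init_points=2, n_iter=15)
print(f"best x = {best_x:.3f}, f(x) = {best_y:.3f}")
```

A large kappa such as 5 favors exploring uncertain regions, which is why early steps spread out across the whole range before concentrating near the peaks.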

Helper functions

In [26]:

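def posterior(optimizer, x_obs, y_obs, grid):
    # Refit the optimizer's internal Gaussian process (private attribute _gp) on all observations so far
    optimizer._gp.fit(x_obs, y_obs)

    # Posterior mean and standard deviation over the plotting grid
    mu, sigma = optimizer._gp.predict(grid, return_std=True)
    return mu, sigma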

def plot_gp(optimizer, x, y, fig=None, xlim=None):
    if fig is None:
        fig = plt.figure(figsize=(16, 10))
    steps = len(optimizer.space)
    fig.suptitle(
        'Gaussian Process and Utility Function After {} Steps'.format(steps),
        fontsize=30
    )
    
    gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1]) 
    axis = plt.subplot(gs[0])
    acq = plt.subplot(gs[1])
    
    x_obs = np.array([[res["params"]["x"]] for res in optimizer.res])
    y_obs = np.array([res["target"] for res in optimizer.res])
    
    mu, sigma = posterior(optimizer, x_obs, y_obs, x)
    axis.plot(x, y, linewidth=3, label='Target')
    axis.plot(x_obs.flatten(), y_obs, 'D', markersize=8, label=u'Observations', color='r')
    axis.plot(x, mu, '--', color='k', label='Prediction')

    axis.fill(np.concatenate([x, x[::-1]]), 
              np.concatenate([mu - 1.9600 * sigma, (mu + 1.9600 * sigma)[::-1]]),
        alpha=.3, fc='C0', ec='None', label='95% confidence interval')
    if xlim is not None:
        axis.set_xlim(xlim)
    axis.set_ylim((None, None))
    axis.set_ylabel('f(x)', fontdict={'size':20})
    axis.set_xlabel('x', fontdict={'size':20})
    
    utility_function = UtilityFunction(kind="ucb", kappa=5, xi=0)
    utility = utility_function.utility(x, optimizer._gp, 0)
    acq.plot(x, utility, label='Utility Function', color='C3')
    acq.plot(x[np.argmax(utility)], np.max(utility), 'o', markersize=15, 
             label=u'Next Best Guess', markerfacecolor='gold', markeredgecolor='k', markeredgewidth=1)
    if xlim is not None:
        acq.set_xlim(xlim)
    acq.set_ylim((np.min(utility) , np.max(utility) + 0.5))
    acq.set_ylabel('Utility', fontdict={'size':20})
    acq.set_xlabel('x', fontdict={'size':20})
    
    axis.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    acq.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    return fig
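For reference, plot_gp draws two quantities. The shaded band is the posterior mean plus or minus 1.96 posterior standard deviations, i.e. a 95% interval under the Gaussian posterior. With kind="ucb", the utility is the upper confidence bound, where $\kappa$ (5 here) sets how strongly uncertain regions are favored over regions with a high predicted mean. The "Next Best Guess" marker is the grid point where this utility is largest:

$$\mu(x) \pm 1.96\,\sigma(x), \qquad u(x) = \mu(x) + \kappa\,\sigma(x), \qquad x_{\text{next}} = \arg\max_x u(x)$$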

Visualize the iterative steps

In [54]:

fig = plt.figure(figsize=(16, 10))
xlim = (-6, 6)
optimizer = BayesianOptimization(f=unknown_func, pbounds={'x': xlim}, verbose=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for i in range(15):
        # One more step per loop; the optimizer keeps its state between calls
        optimizer.maximize(init_points=0, n_iter=1, kappa=5)
        fig.clf()  # clear the previous frame before redrawing
        fig = plot_gp(optimizer, x, y, fig=fig, xlim=xlim)
        display(fig)
        clear_output(wait=True)

[Animation: Bayesian optimization steps]

Draw Perceptron graph by graphviz

h1ros

Jul 26, 2019, 3:50:34 AM


Goal

This post shows how to draw a perceptron diagram with graphviz.

Reference

Step-by-step Data Science - Implement Perceptron

Libraries

In [1]:

from graphviz import Digraph

Create a node list and dictionary for the edges

In [12]:

# List of nodes
l_nodes = ['1', 'x0', 'x1', 'y']

# Dictionary mapping from label name to the edge between two nodes
d_edges = {'b': ('1', 'y'), 
           'w0': ('x0', 'y'), 
           'w1': ('x1', 'y')}

Visualize a graph for perceptron

In [13]:

# Create Digraph object
dot = Digraph()
dot.attr(rankdir='LR')

# Add nodes
for n in l_nodes:
    dot.node(n)        

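# Add edges: each label names the weight on the edge from tail to head
for label, (tail, head) in d_edges.items():
    dot.edge(tail, head, label=label)

# Fill node '1' (the constant bias input) with gray, graphviz's default fill color
dot.node('1', style='filled')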
    
# Visualize the graph
dot

Out[13]:
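The diagram encodes the perceptron's computation: the constant input 1 is weighted by the bias b, the inputs x0 and x1 by w0 and w1, and y fires when the weighted sum is positive. A minimal sketch of that forward pass, assuming a step activation that outputs 1 for a positive sum, with hypothetical weights (w0 = w1 = 1, b = -1.5) chosen so the perceptron computes logical AND:

```python
def perceptron(x0, x1, w0, w1, b):
    # Weighted sum over the three incoming edges of node y: x0*w0, x1*w1 and 1*b
    z = w0 * x0 + w1 * x1 + b * 1
    # Step activation: output 1 if the sum is positive, else 0
    return int(z > 0)


# Hypothetical weights chosen so the perceptron implements AND
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x0, x1, perceptron(x0, x1, w0=1.0, w1=1.0, b=-1.5))
```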

Parallel Plots for Categorical and Continuous Variables with Plotly Express

h1ros

Jul 22, 2019, 8:37:07 AM


Goal

This post shows how to draw parallel plots for categorical and continuous variables with Plotly Express.


Libraries

In [19]:

import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "png"

Load the election data

In [4]:

df = px.data.election()
df.head()

Out[4]:

	district	Coderre	Bergeron	Joly	total	winner	result
0	101-Bois-de-Liesse	2481	1829	3024	7334	Joly	plurality
1	102-Cap-Saint-Jacques	2525	1163	2675	6363	Joly	plurality
2	11-Sault-au-Récollet	3348	2770	2532	8650	Coderre	plurality
3	111-Mile-End	1734	4782	2514	9030	Bergeron	majority
4	112-DeLorimier	1770	5933	3044	10747	Bergeron	majority

Visualize a parallel coordinates plot for continuous data

In [20]:

fig = px.parallel_coordinates(df,color='total', color_continuous_scale=px.colors.sequential.Inferno)
fig
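In a parallel coordinates plot each numeric column gets its own vertical axis, and each row becomes a polyline crossing those axes. Because the axes are scaled independently, a row's height on an axis reflects where its value sits within that column's range. The sketch below computes those relative positions with min-max scaling for the five rows shown above, assuming each axis spans its column's minimum to maximum. It illustrates the idea only; it is not Plotly's rendering code, and the real plot uses the whole dataset.

```python
import numpy as np

columns = ["Coderre", "Bergeron", "Joly", "total"]
# The five rows of px.data.election() printed above
values = np.array([
    [2481, 1829, 3024, 7334],
    [2525, 1163, 2675, 6363],
    [3348, 2770, 2532, 8650],
    [1734, 4782, 2514, 9030],
    [1770, 5933, 3044, 10747],
], dtype=float)

# Min-max scale each column independently: 0 = bottom of its axis, 1 = top
col_min = values.min(axis=0)
col_max = values.max(axis=0)
positions = (values - col_min) / (col_max - col_min)

for row in positions:
    # One polyline: its relative height on each axis, left to right
    print(" -> ".join(f"{name}={p:.2f}" for name, p in zip(columns, row)))
```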

Visualize a parallel categories plot for categorical data

In [21]:

fig = px.parallel_categories(df, color="total", color_continuous_scale=px.colors.sequential.Inferno)
fig
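In a parallel categories plot, each categorical column becomes an axis of stacked boxes, and the ribbons between axes are sized by how many rows share each combination of categories. A minimal sketch of that counting, using only the winner and result columns from the five rows shown above (the plot itself uses the whole dataset and more columns):

```python
from collections import Counter

# (winner, result) pairs from the five rows of px.data.election() printed above
rows = [
    ("Joly", "plurality"),
    ("Joly", "plurality"),
    ("Coderre", "plurality"),
    ("Bergeron", "majority"),
    ("Bergeron", "majority"),
]

# Box heights on each axis: how many rows fall into each category
winner_counts = Counter(w for w, _ in rows)
result_counts = Counter(r for _, r in rows)
# Ribbon widths between the two axes: how many rows share each path
ribbon_counts = Counter(rows)

print(winner_counts)
print(result_counts)
print(ribbon_counts)
```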

Split Up: dtreeviz (Part 5)

h1ros

Jul 21, 2019, 1:40:22 AM


Goal

This post breaks down the dtreeviz module step by step to understand what it implements. Once I fully understand it, I would like to contribute to the module and submit a pull request.

I really like this module and would like to see it work with other tree-based libraries such as XGBoost or LightGBM. I found exactly this request as issue 15 on GitHub, so I hope I can contribute to it.

This post is the 5th part: ctreeviz_univar


`trees.ctreeviz_univar`

L267: the beginning of the definition for ctreeviz_univar
L272-275: treatment for pandas input
L277: load color property
L280-288: fit a decision tree classifier, wrap it as shadow_tree, and load related attributes, e.g., the number of classes and the target values
L290-302: setting labels and spines visibility
L304-319: plotting stacked bar chart with histogram when gtype=='barstacked'
L320-330: plotting a strip (scatter) plot with jitter when gtype=='strip'
L332: setting tick parameters
L352-353: setting legend
L355-358: setting a title
L360-362: drawing vertical lines at the splits between categories

In [53]:

import numpy as np
import pandas as pd
from pathlib import Path
from graphviz.backend import run, view
import matplotlib.pyplot as plt
from dtreeviz.shadow import *
from numbers import Number
import matplotlib.patches as patches
import tempfile
import os
from sys import platform as PLATFORM
from colour import Color, rgb2hex
from typing import Mapping, List
from dtreeviz.utils import inline_svg_images, myround
from dtreeviz.shadow import ShadowDecTree, ShadowDecTreeNode
from dtreeviz.colors import adjust_colors
from sklearn import tree
import graphviz

from dtreeviz.trees import *

# How many bins should we have based upon number of classes
NUM_BINS = [0, 0, 10, 9, 8, 6, 6, 6, 5, 5, 5]
          # 0, 1, 2,  3, 4, 5, 6, 7, 8, 9, 10

def ctreeviz_univar(ax, x_train, y_train, max_depth, feature_name, class_names,
                    target_name,
                    fontsize=14, fontname="Arial", nbins=25, gtype='strip',
                    show={'title','legend','splits'},
                    colors=None):
    if isinstance(x_train, pd.Series):
        x_train = x_train.values
    if isinstance(y_train, pd.Series):
        y_train = y_train.values

    colors = adjust_colors(colors)

    #    ax.set_facecolor('#F9F9F9')
    ct = tree.DecisionTreeClassifier(max_depth=max_depth)
    ct.fit(x_train.reshape(-1, 1), y_train)

    shadow_tree = ShadowDecTree(ct, x_train.reshape(-1, 1), y_train,
                                feature_names=[feature_name], class_names=class_names)

    n_classes = shadow_tree.nclasses()
    overall_feature_range = (np.min(x_train), np.max(x_train))
    class_values = shadow_tree.unique_target_values
    color_values = colors['classes'][n_classes]
    color_map = {v: color_values[i] for i, v in enumerate(class_values)}
    X_colors = [color_map[cl] for cl in class_values]

    ax.set_xlabel(f"{feature_name}", fontsize=fontsize, fontname=fontname,
                  color=colors['axis_label'])
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.yaxis.set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_linewidth(.3)

    X_hist = [x_train[y_train == cl] for cl in class_values]

    if gtype == 'barstacked':
        bins = np.linspace(start=overall_feature_range[0], stop=overall_feature_range[1], num=nbins, endpoint=True)
        hist, bins, barcontainers = ax.hist(X_hist,
                                            color=X_colors,
                                            align='mid',
                                            histtype='barstacked',
                                            bins=bins,
                                            label=class_names)

        for patch in barcontainers:
            for rect in patch.patches:
                rect.set_linewidth(.5)
                rect.set_edgecolor(colors['edge'])
        ax.set_xlim(*overall_feature_range)
        ax.set_xticks(overall_feature_range)
        ax.set_yticks([0, max([max(h) for h in hist])])
    elif gtype == 'strip':
        # user should pass in short and wide fig
        sigma = .013
        mu = .08
        class_step = .08
        dot_w = 20
        ax.set_ylim(0, mu + n_classes*class_step)
        for i, bucket in enumerate(X_hist):
            y_noise = np.random.normal(mu+i*class_step, sigma, size=len(bucket))
            # color_map is keyed by class value, not by position, so look up the class first
            ax.scatter(bucket, y_noise, alpha=.7, marker='o', s=dot_w, c=color_map[class_values[i]],
                       edgecolors=colors['scatter_edge'], lw=.3)

    ax.tick_params(axis='both', which='major', width=.3, labelcolor=colors['tick_label'],
                   labelsize=fontsize)

    splits = [node.split() for node in shadow_tree.internal]
    splits = sorted(splits)
    bins = [ax.get_xlim()[0]] + splits + [ax.get_xlim()[1]]

    pred_box_height = .07 * ax.get_ylim()[1]
    preds = []
    for i in range(len(bins) - 1):
        left = bins[i]
        right = bins[i + 1]
        inrange = y_train[(x_train >= left) & (x_train <= right)]
        values, counts = np.unique(inrange, return_counts=True)
        pred = values[np.argmax(counts)]
        rect = patches.Rectangle((left, 0), (right - left), pred_box_height, linewidth=.3,
                                 edgecolor=colors['edge'], facecolor=color_map[pred])
        ax.add_patch(rect)
        preds.append(pred)

    if 'legend' in show:
        add_classifier_legend(ax, class_names, class_values, color_map, target_name, colors)

    if 'title' in show:
        accur = ct.score(x_train.reshape(-1, 1), y_train)
        title = f"Classifier tree depth {max_depth}, training accuracy={accur*100:.2f}%"
        ax.set_title(title, fontsize=fontsize, color=colors['title'])

    if 'splits' in show:
        for split in splits:
            ax.plot([split, split], [*ax.get_ylim()], '--', color=colors['split_line'], linewidth=1)
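The colored boxes drawn along the bottom of the plot (the preds loop) label each interval between consecutive split points with its most frequent class. That step can be sketched on its own with NumPy. The split thresholds below are assumed values for the toy data in the next section (midpoints between the class groups), not output captured from the fitted tree, and `majority_class_per_bin` is a hypothetical helper. Unlike the loop above, which counts a sample sitting exactly on a split in both neighboring intervals, this sketch follows scikit-learn's convention that x <= threshold goes left.

```python
import numpy as np


def majority_class_per_bin(x, y, splits):
    """Most frequent class of y in each interval delimited by the split thresholds.

    Assumes every interval contains at least one sample.
    """
    splits = np.sort(np.asarray(splits, dtype=float))
    # x <= threshold goes left (scikit-learn's rule), so the interval index
    # is the number of thresholds strictly below x
    bin_idx = np.searchsorted(splits, x, side="left")
    preds = []
    for b in range(len(splits) + 1):
        values, counts = np.unique(y[bin_idx == b], return_counts=True)
        preds.append(values[np.argmax(counts)])
    return preds


X = np.array([0, 1, 0.5, 10, 11, 12, 20, 21, 22, 30, 30, 32])
Y = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'])
splits = [16.0, 5.5, 26.0]  # assumed thresholds, deliberately unsorted
print([str(p) for p in majority_class_per_bin(X, Y, splits)])
```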

Create a toy classification example

In [48]:

import numpy as np
import graphviz 
from sklearn import tree

X = np.array([0, 1, 0.5, 10, 11, 12, 20, 21, 22, 30, 30, 32]).reshape(-1, 1)
Y = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd']).reshape(-1, 1)
clf = tree.DecisionTreeClassifier(max_depth=3)
clf = clf.fit(X, Y)

df = pd.DataFrame(data={'X':X.ravel(), 'Y': Y.ravel()}, index=range(len(X)))
df.plot(kind='bar');
plt.title('Sample Data for Univariate Classification');

Visualize a classification tree for the univariate case

In [54]:

fig, ax = plt.subplots(1)
ctreeviz_univar(ax, pd.Series(X.ravel()), pd.Series(Y.ravel()), 
                feature_name='X', 
                target_name='Y',
                max_depth=4, 
                class_names=['a', 'b', 'c', 'd'], 
                gtype = 'barstacked',
                show={'title', 'splits'}
               )

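Note: An earlier call that used the defaults (gtype='strip', and show including 'legend') raised the error below. The traceback points at the strip branch: color_map is keyed by class value ('a', 'b', 'c', 'd'), but the loop looks it up by the integer position i, hence KeyError: 0. Looking up color_map[class_values[i]] instead should avoid this error.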

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-42-c31e8b14db34> in <module>
      4                 target_name='Y',
      5                 max_depth=4,
----> 6                 class_names=['a', 'b', 'c', 'd']
      7                )

<ipython-input-41-b466a69d927c> in ctreeviz_univar(ax, x_train, y_train, max_depth, feature_name, class_names, target_name, fontsize, fontname, nbins, gtype, show, colors)
     85         for i, bucket in enumerate(X_hist):
     86             y_noise = np.random.normal(mu+i*class_step, sigma, size=len(bucket))
---> 87             ax.scatter(bucket, y_noise, alpha=.7, marker='o', s=dot_w, c=color_map[i],
     88                        edgecolors=colors['scatter_edge'], lw=.3)
     89 

KeyError: 0

Split Up: dtreeviz (Part 4)

h1ros

Jul 20, 2019, 1:51:36 PM


Goal

This post is the 4th part: breaking down the DTreeViz class and the rtreeviz_univar function.


`DTreeViz` class

L23: the beginning of the DTreeViz class
L24-25: the __init__ method, which takes a dot object as input
L26-78: methods for saving and viewing the visualization as an SVG file

rtreeviz_univar

L81: the beginning of the rtreeviz_univar function
L94-102: convert the X and y data to NumPy arrays and compute their ranges
L104-105: create and fit a scikit-learn decision tree regressor
L121-122: plot the original X and y data points
L125-126: plot vertical lines at the decision boundaries (gray lines)
L128-134: plot a horizontal mean line for each interval (red lines by default)
L136: change the appearance of the ticks
L138-140: set the title
L142-143: set the x and y labels from feature_name and target_name

In [4]:

import numpy as np
import pandas as pd
from pathlib import Path
from graphviz.backend import run, view
import matplotlib.pyplot as plt
from dtreeviz.shadow import *
from numbers import Number
import matplotlib.patches as patches
import tempfile
import os
from sys import platform as PLATFORM
from colour import Color, rgb2hex
from typing import Mapping, List
from dtreeviz.utils import inline_svg_images, myround
from dtreeviz.shadow import ShadowDecTree, ShadowDecTreeNode
from dtreeviz.colors import adjust_colors
from sklearn import tree
import graphviz

# How many bins should we have based upon number of classes
NUM_BINS = [0, 0, 10, 9, 8, 6, 6, 6, 5, 5, 5]
          # 0, 1, 2,  3, 4, 5, 6, 7, 8, 9, 10

def rtreeviz_univar(ax,
                    x_train: (pd.Series, np.ndarray),  # 1 vector of X data
                    y_train: (pd.Series, np.ndarray),
                    max_depth = 10,
                    feature_name: str = None,
                    target_name: str = None,
                    min_samples_leaf = 1,
                    fontsize: int = 14,
                    show={'title','splits'},
                    split_linewidth=.5,
                    mean_linewidth = 2,
                    markersize=None,
                    colors=None):
    if isinstance(x_train, pd.Series):
        x_train = x_train.values
    if isinstance(y_train, pd.Series):
        y_train = y_train.values

    colors = adjust_colors(colors)

    y_range = (min(y_train), max(y_train))  # same y axis for all
    overall_feature_range = (np.min(x_train), np.max(x_train))

    t = tree.DecisionTreeRegressor(max_depth=max_depth, min_samples_leaf=min_samples_leaf)
    t.fit(x_train.reshape(-1,1), y_train)

    shadow_tree = ShadowDecTree(t, x_train.reshape(-1,1), y_train, feature_names=[feature_name])
    splits = []
    for node in shadow_tree.internal:
        splits.append(node.split())
    splits = sorted(splits)
    bins = [overall_feature_range[0]] + splits + [overall_feature_range[1]]

    means = []
    for i in range(len(bins) - 1):
        left = bins[i]
        right = bins[i + 1]
        inrange = y_train[(x_train >= left) & (x_train <= right)]
        means.append(np.mean(inrange))

    ax.scatter(x_train, y_train, marker='o', alpha=.4, c=colors['scatter_marker'], s=markersize,
               edgecolor=colors['scatter_edge'], lw=.3)

    if 'splits' in show:
        for split in splits:
            ax.plot([split, split], [*y_range], '--', color=colors['split_line'], linewidth=split_linewidth)

        prevX = overall_feature_range[0]
        for i, m in enumerate(means):
            split = overall_feature_range[1]
            if i < len(splits):
                split = splits[i]
            ax.plot([prevX, split], [m, m], '-', color=colors['mean_line'], linewidth=mean_linewidth)
            prevX = split

    ax.tick_params(axis='both', which='major', width=.3, labelcolor=colors['tick_label'], labelsize=fontsize)

    if 'title' in show:
        title = f"Regression tree depth {max_depth}, samples per leaf {min_samples_leaf},\nTraining $R^2$={t.score(x_train.reshape(-1,1),y_train):.3f}"
        ax.set_title(title, fontsize=fontsize, color=colors['title'])

    ax.set_xlabel(feature_name, fontsize=fontsize, color=colors['axis_label'])
    ax.set_ylabel(target_name, fontsize=fontsize, color=colors['axis_label'])

Create a toy sample

In [42]:

import numpy as np
import graphviz 
from sklearn import tree

X = np.array([0, 1, 0.5, 10, 11, 12, 20, 21, 22, 30, 30, 32]).reshape(-1, 1)
Y = np.array([0., 0, 0, 50, 49, 50, 20, 21, 19, 90, 89, 91]).reshape(-1, 1)
clf = tree.DecisionTreeRegressor(max_depth=3)
clf = clf.fit(X, Y)

plt.scatter(x=X, y=Y, s=5);
plt.title('Sample Data for Univariate Regression');

Visualize a tree using `rtreeviz_univar`

In [51]:

fig, ax = plt.subplots(1)
rtreeviz_univar(ax, pd.Series(X.ravel()), pd.Series(Y.ravel()), 
                feature_name='X', 
                target_name='Y',
                markersize=15)
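rtreeviz_univar draws each red segment at the mean target value of the samples between two consecutive splits, and the title reports the training R^2 from t.score. A minimal NumPy sketch of both quantities for the toy data above, assuming split thresholds of 5.5, 16.0 and 26.0. These are illustrative midpoints between the groups; the default max_depth=10 tree may split further, so its reported R^2 can differ.

```python
import numpy as np

X = np.array([0, 1, 0.5, 10, 11, 12, 20, 21, 22, 30, 30, 32])
Y = np.array([0., 0, 0, 50, 49, 50, 20, 21, 19, 90, 89, 91])
splits = np.array([5.5, 16.0, 26.0])  # assumed thresholds, already sorted

# Interval index per sample; x <= threshold goes to the left, as in scikit-learn
bin_idx = np.searchsorted(splits, X, side="left")
means = np.array([Y[bin_idx == b].mean() for b in range(len(splits) + 1)])

# Piecewise-constant prediction: every sample gets its interval's mean
pred = means[bin_idx]
ss_res = np.sum((Y - pred) ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print("interval means:", means.round(2))
print(f"training R^2 = {r2:.4f}")
```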

Split Up: dtreeviz (Part 3)

h1ros

Jul 18, 2019, 8:17:01 AM


Goal

This post is the 3rd part: breaking down ShadowDecTreeNode.


`ShadowDecTreeNode` class

Source: GitHub

In [2]:

import numpy as np
import pandas as pd
from collections import defaultdict
from collections.abc import Sequence  # collections.Sequence was removed in Python 3.10
from typing import Mapping, List, Tuple
from numbers import Number
from sklearn.utils import compute_class_weight

from dtreeviz.shadow import ShadowDecTree 
# skip ShadowDecTree Class
#

class ShadowDecTreeNode:
    """
    A node in a shadow tree.  Each node has left and right
    pointers to child nodes, if any.  As part of tree construction process, the
    samples examined at each decision node or at each leaf node are
    saved into field node_samples.
    """
    def __init__(self, shadow_tree, id, left=None, right=None):
        self.shadow_tree = shadow_tree
        self.id = id
        self.left = left
        self.right = right

    def split(self) -> (int,float):
        return self.shadow_tree.tree_model.tree_.threshold[self.id]

    def feature(self) -> int:
        return self.shadow_tree.tree_model.tree_.feature[self.id]

    def feature_name(self) -> (str,None):
        if self.shadow_tree.feature_names is not None:
            return self.shadow_tree.feature_names[ self.feature()]
        return None

    def samples(self) -> List[int]:
        """
        Return a list of sample indexes associated with this node. If this is a
        leaf node, it indicates the samples used to compute the predicted value
        or class.  If this is an internal node, it is the number of samples used
        to compute the split point.
        """
        return self.shadow_tree.node_to_samples[self.id]

    def nsamples(self) -> int:
        """
        Return the number of samples associated with this node. If this is a
        leaf node, it indicates the samples used to compute the predicted value
        or class. If this is an internal node, it is the number of samples used
        to compute the split point.
        """
        return self.shadow_tree.tree_model.tree_.n_node_samples[self.id] # same as len(self.node_samples)

    def split_samples(self) -> Tuple[np.ndarray, np.ndarray]:
        """
        Return the list of indexes to the left and the right of the split value.
        """
        samples = np.array(self.samples())
        node_X_data = self.shadow_tree.X_train[samples, self.feature()]
        split = self.split()
        left = np.nonzero(node_X_data < split)[0]
        right = np.nonzero(node_X_data >= split)[0]
        return left, right

    def isleaf(self) -> bool:
        return self.left is None and self.right is None

    def isclassifier(self):
        return self.shadow_tree.tree_model.tree_.n_classes > 1

    def prediction(self) -> (Number,None):
        """
        If this is a leaf node, return the predicted continuous value, if this is a
        regressor, or the class number, if this is a classifier.
        """
        if not self.isleaf(): return None
        if self.isclassifier():
            counts = np.array(self.shadow_tree.tree_model.tree_.value[self.id][0])
            predicted_class = np.argmax(counts)
            return predicted_class
        else:
            return self.shadow_tree.tree_model.tree_.value[self.id][0][0]

    def prediction_name(self) -> (str,None):
        """
        If the tree model is a classifier and we know the class names,
        return the class name associated with the prediction for this leaf node.
        Return prediction class or value otherwise.
        """
        if self.isclassifier():
            if self.shadow_tree.class_names is not None:
                return self.shadow_tree.class_names[self.prediction()]
        return self.prediction()

    def class_counts(self) -> (List[int],None):
        """
        If this tree model is a classifier, return a list with the count
        associated with each class.
        """
        if self.isclassifier():
            if self.shadow_tree.class_weight is None:
                return np.array(np.round(self.shadow_tree.tree_model.tree_.value[self.id][0]), dtype=int)
            else:
                return np.round(self.shadow_tree.tree_model.tree_.value[self.id][0]/self.shadow_tree.class_weights).astype(int)
        return None

    def __str__(self):
        if self.left is None and self.right is None:
            return "<pred={value},n={n}>".format(value=round(self.prediction(),1), n=self.nsamples())
        else:
            return "({f}@{s} {left} {right})".format(f=self.feature_name(),
                                                     s=round(self.split(),1),
                                                     left=self.left if self.left is not None else '',
                                                     right=self.right if self.right is not None else '')

Instantiate class objects

Create a tree model by scikit learn

In [3]:

import numpy as np
import graphviz 
from sklearn import tree

X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
# Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
dot_data = tree.export_graphviz(clf, out_file=None, 
                     feature_names=[0, 1],  
                     class_names=['0', '1'],  
                     filled=True, rounded=True,  
                     special_characters=True)  
graph = graphviz.Source(dot_data)  
graph

Out[3]:

Create a `ShadowDecTreeNode`

ShadowDecTreeNode __init__

L222-226: store the input arguments as instance attributes
L228-308: define methods that mirror the underlying scikit-learn tree (split, feature, etc.) plus utility functions

In [4]:

# instantiate ShadowDecTree
shadow_tree = ShadowDecTree(tree_model=clf, X_train=X, y_train=Y, feature_names=[0, 1], class_names=[0, 1])

In [5]:

# instantiate ShadowDecTreeNode
shadow_tree_node0 = ShadowDecTreeNode(shadow_tree=shadow_tree, id=0)
shadow_tree_node0

Out[5]:

<__main__.ShadowDecTreeNode at 0x120eda908>

Methods of `ShadowDecTreeNode`

In [6]:

# L228 split
shadow_tree_node0.split()

Out[6]:

0.5

In [7]:

# L231 feature
shadow_tree_node0.feature()

Out[7]:

In [8]:

# L239 samples
shadow_tree_node0.samples()

Out[8]:

[0, 1]

In [9]:

# L248 nsamples
shadow_tree_node0.nsamples()

Out[9]:

In [10]:

# L257 split_samples
shadow_tree_node0.split_samples()

Out[10]:

(array([0]), array([1]))
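split_samples() partitions the node's samples by comparing the split feature with the threshold. The same partition for this toy data can be reproduced with plain NumPy. The sketch assumes the root threshold of 0.5 shown in Out[6]; since both columns of X are identical, the result is the same whichever feature the tree picked. `split_indexes` is a hypothetical helper. ShadowDecTreeNode puts values strictly below the threshold on the left, whereas scikit-learn's own rule is x <= threshold. The two agree here because no sample lies exactly on 0.5.

```python
import numpy as np


def split_indexes(X, samples, feature, threshold):
    """Positions (within samples) that go to the left and right child."""
    node_X_data = X[samples, feature]
    # scikit-learn routes x <= threshold to the left child
    left = np.nonzero(node_X_data <= threshold)[0]
    right = np.nonzero(node_X_data > threshold)[0]
    return left, right


X = np.array([[0, 0], [1, 1]])
samples = np.array([0, 1])  # sample indexes at the root, as in Out[8]
for feature in (0, 1):
    print(feature, split_indexes(X, samples, feature, threshold=0.5))
```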

In [11]:

# L268 isleaf
shadow_tree_node0.isleaf()

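Out[11]:

True

isleaf() returns True here because this node object was created without left and right children, even though node 0 is the root split of the fitted tree (split() returned 0.5 above).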

In [12]:

# L271 isclassifier
shadow_tree_node0.isclassifier()

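Out[12]:

array([ True])

The result is a one-element array rather than a plain bool because tree_.n_classes is an array with one entry per output.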

In [13]:

# L287 prediction_name
shadow_tree_node0.prediction_name()

Out[13]:

In [14]:

# L298 class_counts
shadow_tree_node0.class_counts()

Out[14]:

array([1, 1])

Visualization Samples by Plotly Express

h1ros

Jul 17, 2019, 5:21:37 PM


Goal

This post introduces examples of visualization with Plotly Express.

The following are covered:

- Prepared example data
- Scatter plot
  - basic
  - basic + size
  - basic + size + color
  - basic + size + color + time
- Density heatmap + histogram

[Animation: Gapminder scatter plot with Plotly Express]


Libraries

In [8]:

import pandas as pd
import numpy as np
import plotly.express as px  # the standalone plotly_express package was merged into plotly
import plotly.io as pio
pio.renderers.default = "png"

Load the prepared sample data

Car share

In [2]:

df = px.data.carshare()
df.head()

Out[2]:

	centroid_lat	centroid_lon	car_hours	peak_hour
0	45.471549	-73.588684	1772.750000	2
1	45.543865	-73.562456	986.333333	23
2	45.487640	-73.642767	354.750000	20
3	45.522870	-73.595677	560.166667	23
4	45.453971	-73.738946	2836.666667	19

Tips

In [3]:

df = px.data.tips()
df.head()

Out[3]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

Election

In [4]:

df = px.data.election()
df.head()

Out[4]:

	district	Coderre	Bergeron	Joly	total	winner	result
0	101-Bois-de-Liesse	2481	1829	3024	7334	Joly	plurality
1	102-Cap-Saint-Jacques	2525	1163	2675	6363	Joly	plurality
2	11-Sault-au-Récollet	3348	2770	2532	8650	Coderre	plurality
3	111-Mile-End	1734	4782	2514	9030	Bergeron	majority
4	112-DeLorimier	1770	5933	3044	10747	Bergeron	majority

Wind

In [5]:

df = px.data.wind()
df.head()

Out[5]:

	direction	strength	frequency
0	N	0-1	0.5
1	NNE	0-1	0.6
2	NE	0-1	0.5
3	ENE	0-1	0.4
4	E	0-1	0.4

Gapminder

In [6]:

df = px.data.gapminder()
df.head()

Out[6]:

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
0	Afghanistan	Asia	1952	28.801	8425333	779.445314	AFG	4
1	Afghanistan	Asia	1957	30.332	9240934	820.853030	AFG	4
2	Afghanistan	Asia	1962	31.997	10267083	853.100710	AFG	4
3	Afghanistan	Asia	1967	34.020	11537966	836.197138	AFG	4
4	Afghanistan	Asia	1972	36.088	13079460	739.981106	AFG	4

Scatter Plot

Basic scatter plot

In [10]:

px.scatter(df, x='gdpPercap', y='lifeExp', width=900, height=400)

Scatter plot + size

In [11]:

px.scatter(df, x='gdpPercap', y='lifeExp', size='pop', width=900, height=400)

Scatter plot + size + color

In [12]:

px.scatter(df, x='gdpPercap', y='lifeExp', size='pop', color='country', width=900, height=400)

Scatter plot + size + color + time

In [13]:

px.scatter(df, x='gdpPercap', y='lifeExp', size='pop', color='country', animation_frame='year', width=900, height=400)

Density heatmap + histogram

In [14]:

px.density_heatmap(df, x="gdpPercap", y="lifeExp", marginal_y="histogram", marginal_x="histogram")
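px.density_heatmap bins the (x, y) pairs into a 2-D grid and, by default, colors each cell by the number of points in it. The marginal histograms are the same counts summed along one axis. A small NumPy sketch of that binning with made-up points (Plotly chooses its own bin edges, so the cell boundaries will differ):

```python
import numpy as np

# Made-up points standing in for (gdpPercap, lifeExp) pairs
x = np.array([0.1, 0.2, 0.8, 0.9, 0.85])
y = np.array([0.1, 0.3, 0.9, 0.7, 0.95])

# 2 x 2 grid over the unit square; H[i, j] counts points in x-bin i and y-bin j
H, x_edges, y_edges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

# Marginal histograms are the grid counts summed over the other axis
x_marginal = H.sum(axis=1)
y_marginal = H.sum(axis=0)
print(H)
print(x_marginal, y_marginal)
```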

Split Up: dtreeviz (Part 2)

h1ros

Jul 17, 2019, 5:51:04 AM


Goal

This post is the 2nd part: breaking down ShadowDecTree.


`ShadowDecTree` class

Source: GitHub
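scikit-learn stores a fitted tree as parallel arrays: tree_.children_left[i] and tree_.children_right[i] give the child node ids of node i, with -1 marking a leaf. ShadowDecTree.__init__ walks these arrays recursively from the root (node 0) and sorts the nodes into leaves and internal. A minimal sketch of that walk on hand-written arrays for a hypothetical depth-2 tree, where plain tuples stand in for ShadowDecTreeNode objects and `walk_tree` is a hypothetical name:

```python
# Hypothetical tree: node 0 splits into 1 and 4; node 1 splits into 2 and 3
children_left = [1, 2, -1, -1, -1]
children_right = [4, 3, -1, -1, -1]


def walk_tree(children_left, children_right, root_id=0):
    leaves, internal = [], []

    def walk(node_id):
        if children_left[node_id] == -1 and children_right[node_id] == -1:  # leaf
            leaves.append(node_id)
            return (node_id,)
        # Decision node: build both subtrees first, then record this node
        left = walk(children_left[node_id])
        right = walk(children_right[node_id])
        internal.append(node_id)
        return (node_id, left, right)

    root = walk(root_id)
    return root, leaves, internal


root, leaves, internal = walk_tree(children_left, children_right)
print(root)      # nested (id, left, right) tuples
print(leaves)    # leaf ids in left-to-right order
print(internal)  # internal ids in post-order
```

As in the class below, internal nodes are appended only after both of their subtrees, so the root ends up last in internal.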

In [109]:

import numpy as np
import pandas as pd
from collections import defaultdict
from collections.abc import Sequence  # collections.Sequence was removed in Python 3.10
from typing import Mapping, List, Tuple
from numbers import Number
from sklearn.utils import compute_class_weight
from dtreeviz.shadow import ShadowDecTreeNode 


class ShadowDecTree:
    """
    The decision trees for classifiers and regressors from scikit-learn
    are built for efficiency, not ease of tree walking. This class
    is intended as a way to wrap all of that information in an easy to use
    package.
    This tree shadows a decision tree as constructed by scikit-learn's
    DecisionTree(Regressor|Classifier).  As part of build process, the
    samples considered at each decision node or at each leaf node are
    saved as a big dictionary for use by the nodes.
    Field leaves is list of shadow leaf nodes. Field internal is list of
    shadow non-leaf nodes.
    Field root is the shadow tree root.
    Parameters
    ----------
    class_names : (List[str],Mapping[int,str]). A mapping from target value
                  to target class name. If you pass in a list of strings,
                  target value i must be associated with class name[i]. You
                  can also pass in a dict that maps value to name.
    """
    def __init__(self, tree_model,
                 X_train,
                 y_train,
                 feature_names : List[str],
                 class_names : (List[str],Mapping[int,str])=None):
        self.tree_model = tree_model
        self.feature_names = feature_names
        self.class_names = class_names
        self.class_weight = tree_model.class_weight

        if getattr(tree_model, 'tree_', None) is None:  # make sure model is fit (default avoids AttributeError)
            tree_model.fit(X_train, y_train)

        if tree_model.tree_.n_classes > 1:
            if isinstance(self.class_names, dict):
                self.class_names = self.class_names
            elif isinstance(self.class_names, Sequence):
                self.class_names = {i:n for i, n in enumerate(self.class_names)}
            else:
                raise Exception(f"class_names must be dict or sequence, not {self.class_names.__class__.__name__}")

        if isinstance(X_train, pd.DataFrame):
            X_train = X_train.values
        self.X_train = X_train
        if isinstance(y_train, pd.Series):
            y_train = y_train.values
        self.y_train = y_train
        self.node_to_samples = ShadowDecTree.node_samples(tree_model, X_train)
        if self.isclassifier():
            self.unique_target_values = np.unique(y_train)
            self.class_weights = compute_class_weight(tree_model.class_weight, classes=self.unique_target_values, y=self.y_train)

        tree = tree_model.tree_
        children_left = tree.children_left
        children_right = tree.children_right

        # use locals not args to walk() for recursion speed in python
        leaves = []
        internal = [] # non-leaf nodes

        def walk(node_id):
            if (children_left[node_id] == -1 and children_right[node_id] == -1):  # leaf
                t = ShadowDecTreeNode(self, node_id)
                leaves.append(t)
                return t
            else:  # decision node
                left = walk(children_left[node_id])
                right = walk(children_right[node_id])
                t = ShadowDecTreeNode(self, node_id, left, right)
                internal.append(t)
                return t

        root_node_id = 0
        # record root to actual shadow nodes
        self.root = walk(root_node_id)
        self.leaves = leaves
        self.internal = internal

    def nclasses(self):
        return self.tree_model.tree_.n_classes[0]

    def nnodes(self) -> int:
        "Return total nodes in the tree"
        return self.tree_model.tree_.node_count

    def leaf_sample_counts(self) -> List[int]:
        return [self.tree_model.tree_.n_node_samples[leaf.id] for leaf in self.leaves]

    def isclassifier(self):
        return self.tree_model.tree_.n_classes > 1

    def get_split_node_heights(self, X_train, y_train, nbins) -> Mapping[int,int]:
        class_values = self.unique_target_values
        node_heights = {}
        # print(f"Goal {nbins} bins")
        for node in self.internal:
            # print(node.feature_name(), node.id)
            X_feature = X_train[:, node.feature()]
            overall_feature_range = (np.min(X_feature), np.max(X_feature))
            # print(f"range {overall_feature_range}")
            r = overall_feature_range[1] - overall_feature_range[0]

            bins = np.linspace(overall_feature_range[0],
                               overall_feature_range[1], nbins+1)
            # bins = np.arange(overall_feature_range[0],
            #                  overall_feature_range[1] + binwidth, binwidth)
            # print(f"\tlen(bins)={len(bins):2d} bins={bins}")
            X, y = X_feature[node.samples()], y_train[node.samples()]
            X_hist = [X[y == cl] for cl in class_values]
            height_of_bins = np.zeros(nbins)
            for i, cl in enumerate(class_values):  # index by position, so labels need not be 0..k-1
                hist, _ = np.histogram(X_hist[i], bins=bins, range=overall_feature_range)
                # print(f"class {cl}: goal_n={len(bins):2d} n={len(hist):2d} {hist}")
                height_of_bins += hist
            node_heights[node.id] = np.max(height_of_bins)

            # print(f"\tmax={np.max(height_of_bins):2.0f}, heights={list(height_of_bins)}, {len(height_of_bins)} bins")
        return node_heights

    def predict(self, x : np.ndarray) -> Tuple[Number,List]:
        """
        Given an x-vector of features, return the predicted class or value based
        upon this tree, plus the path from root to leaf as the 2nd value of the
        returned tuple. Recursively walk down the tree from the root to the
        appropriate leaf by comparing the feature in x to each node's split value.
        :param x: Feature vector to run down the tree to a leaf.
        :type x: np.ndarray
        :return: Predicted class or value, and the list of nodes on the path
        :rtype: Tuple[Number, List]
        """
        def walk(t, x, path):
            if t is None:
                return None
            path.append(t)
            if t.isleaf():
                return t
            if x[t.feature()] < t.split():
                return walk(t.left, x, path)
            return walk(t.right, x, path)

        path = []
        leaf = walk(self.root, x, path)
        return leaf.prediction(), path

    def tesselation(self):
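        """
        Walk the tree and return a list of tuples, each pairing a leaf node with
        its bounding box (x1, y1, x2, y2) in feature space. Assumes two features.
        :return: List of (leaf node, (x1, y1, x2, y2)) tuples
        :rtype: List[Tuple[ShadowDecTreeNode, Tuple]]
        """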
        bboxes = []

        def walk(t, bbox):
            if t is None:
                return None
            # print(f"Node {t.id} bbox {bbox} {'   LEAF' if t.isleaf() else ''}")
            if t.isleaf():
                bboxes.append((t, bbox))
                return t
            # shrink bbox for left, right and recurse
            s = t.split()
            if t.feature()==0:
                walk(t.left,  (bbox[0],bbox[1],s,bbox[3]))
                walk(t.right, (s,bbox[1],bbox[2],bbox[3]))
            else:
                walk(t.left,  (bbox[0],bbox[1],bbox[2],s))
                walk(t.right, (bbox[0],s,bbox[2],bbox[3]))

        # create bounding box in feature space (not zeroed)
        f1_values = self.X_train[:, 0]
        f2_values = self.X_train[:, 1]
        overall_bbox = (np.min(f1_values), np.min(f2_values), # x,y of lower left edge
                        np.max(f1_values), np.max(f2_values)) # x,y of upper right edge
        walk(self.root, overall_bbox)

        return bboxes

    @staticmethod
    def node_samples(tree_model, data) -> Mapping[int, list]:
        """
        Return dictionary mapping node id to list of sample indexes considered by
        the feature/split decision.
        """
        # Docs say: "Return a node indicator matrix where non zero elements
        #           indicates that the samples goes through the nodes."
        dec_paths = tree_model.decision_path(data)

        # each sample has path taken down tree
        node_to_samples = defaultdict(list)
        for sample_i, dec in enumerate(dec_paths):
            _, nz_nodes = dec.nonzero()
            for node_id in nz_nodes:
                node_to_samples[node_id].append(sample_i)

        return node_to_samples

    def __str__(self):
        return str(self.root)

Instantiate class objects¶

Create a tree model with scikit-learn¶

In [93]:

import numpy as np
import graphviz
from sklearn import tree

# two training samples with two features each, one sample per class
X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

# render the fitted tree with graphviz as a reference picture
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=['0', '1'],
                                class_names=['0', '1'],
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph

Out[93]:

Create a `ShadowDecTree`¶

ShadowDecTree __init__

L33-41: define __init__ with 5 input arguments (besides self).
L38-41: store the input arguments as instance attributes.
L43-44: check whether tree_model has already been fit; if not, fit it on X_train and y_train.
L46-52: for a classifier, normalize class_names into a dict mapping class index to class name (a dict is kept as-is, a list is enumerated).
L54-59: if X_train is a pandas.DataFrame or y_train is a pandas.Series, convert it into a NumPy array.
L60: call the static method node_samples to build a map from each node id in tree_model to the list of indices of the training samples that pass through that node.
L61-63: for a classifier, store the unique target values and compute the class weights.
L65-71: prepare to re-organize the scikit-learn tree into the shadow tree used by dtreeviz: get the children_left and children_right arrays and create empty leaves and internal lists.
L73-83: define a recursive walk function that traverses the nodes depth-first in post-order (both children before their parent). A node whose children_left and children_right entries are both -1 is a leaf.
L85-89: run walk from the root node (id 0), then store the list of leaf nodes as leaves and the list of non-leaf nodes as internal.
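The traversal in L73-83 only needs scikit-learn's children_left and children_right arrays. Below is a minimal, self-contained sketch of the same post-order depth-first walk that returns plain node ids instead of ShadowDecTreeNode objects; the function name postorder_walk is hypothetical and not part of dtreeviz.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def postorder_walk(tree_model):
    """Return (leaves, internal, order) node ids of a fitted scikit-learn tree,
    visiting children before their parent (post-order depth-first search)."""
    t = tree_model.tree_
    leaves, internal, order = [], [], []

    def walk(node_id):
        left = int(t.children_left[node_id])
        right = int(t.children_right[node_id])
        if left == -1 and right == -1:  # -1 in both child arrays marks a leaf
            leaves.append(node_id)
        else:
            walk(left)                    # left subtree first
            walk(right)                   # then right subtree
            internal.append(node_id)      # parent is recorded after its children
        order.append(node_id)

    walk(0)  # node 0 is the root of a scikit-learn tree
    return leaves, internal, order


X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
leaves, internal, order = postorder_walk(clf)
print(leaves, internal, order)  # → [1, 2] [0] [1, 2, 0]
```

As in ShadowDecTree, the leaves come out left to right and the root is the last node finished, which is why it ends up alone in internal for this two-sample tree.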

In [94]:

# instantiate ShadowDecTree
shadow_tree = ShadowDecTree(tree_model=clf, X_train=X, y_train=Y, feature_names=[0, 1], class_names=[0, 1])

In [95]:

# A root node
shadow_tree.root

Out[95]:

<__main__.ShadowDecTreeNode at 0x1216dd4a8>

In [96]:

# A list of end nodes 
shadow_tree.leaves

Out[96]:

[<__main__.ShadowDecTreeNode at 0x121696470>,
 <__main__.ShadowDecTreeNode at 0x1216dd940>]

In [97]:

# A list of internal nodes
shadow_tree.internal

Out[97]:

[<__main__.ShadowDecTreeNode at 0x1216dd4a8>]

In [98]:

# A mapping from node id to the indices of the samples that reach it
shadow_tree.node_to_samples

Out[98]:

defaultdict(list, {0: [0, 1], 1: [0], 2: [1]})
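node_samples builds this mapping by looping over the rows of decision_path, one sample at a time. Because decision_path returns a sparse (n_samples, n_nodes) indicator matrix, the same mapping can also be read column by column after converting it to CSC format. This is a sketch of that alternative; node_to_samples_csc is a hypothetical name, and unlike the defaultdict version it also lists nodes that no sample reaches, with an empty list.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def node_to_samples_csc(tree_model, X):
    """Map each node id to the sorted indices of the samples in X that pass through it."""
    # in CSC format the nonzero row indices of each column (node) are stored contiguously
    indicator = tree_model.decision_path(X).tocsc()
    mapping = {}
    for node_id in range(indicator.shape[1]):
        start, end = indicator.indptr[node_id], indicator.indptr[node_id + 1]
        mapping[node_id] = sorted(int(i) for i in indicator.indices[start:end])
    return mapping


X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
print(node_to_samples_csc(clf, X))  # → {0: [0, 1], 1: [0], 2: [1]}
```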

Other methods for `ShadowDecTree`¶

In [99]:

# L91 nclasses
shadow_tree.nclasses()

Out[99]:

In [100]:

# L94 nnodes
shadow_tree.nnodes()

Out[100]:

In [101]:

# L98 leaf_sample_counts
shadow_tree.leaf_sample_counts()

Out[101]:

[1, 1]

In [102]:

# L101 isclassifier
shadow_tree.isclassifier()

Out[102]:

array([ True])
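isclassifier returns array([ True]) rather than True because tree_.n_classes holds one entry per output, so comparing it with 1 yields an array. For a single-output tree it still behaves correctly inside an if. If a plain bool is wanted, two options are sketched below: index the first output, or use sklearn.base.is_classifier, which checks the estimator type rather than the fitted tree.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
reg = DecisionTreeRegressor(random_state=0).fit(X, Y.astype(float))

# tree_.n_classes has one entry per output; regressors store 1
print(clf.tree_.n_classes)  # → [2]
print(reg.tree_.n_classes)  # → [1]

# two ways to get a plain Python bool instead of array([ True])
print(bool(clf.tree_.n_classes[0] > 1))        # → True
print(is_classifier(clf), is_classifier(reg))  # → True False
```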

In [103]:

# L104 get_split_node_heights
nbins = 2
shadow_tree.get_split_node_heights(X_train=X, y_train=Y, nbins=nbins)

Out[103]:

{0: 1.0}

In [104]:

print(f"shadow_tree.internal[0].feature(): {shadow_tree.internal[0].feature()}")
X[:, shadow_tree.internal[0].feature()]

shadow_tree.internal[0].feature(): 1

Out[104]:

array([0, 1])
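For each internal node, get_split_node_heights bins the node's samples on its split feature into nbins equal-width bins spanning that feature's full training range, stacks the per-class histograms, and keeps the tallest stacked bin. With nbins=2, the feature values [0, 1] above land in different bins, one sample each, hence {0: 1.0}. The sketch below reproduces the computation for the root node; stacked_max_height is a hypothetical helper, and it loops over class values directly, so labels such as 3 and 7 work as well as 0 and 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def stacked_max_height(x_feature, y, sample_idx, nbins):
    """Height of the tallest bin when per-class histograms of one feature are stacked."""
    # bins span the feature's range over all training samples, not just this node's
    bins = np.linspace(np.min(x_feature), np.max(x_feature), nbins + 1)
    x_node, y_node = x_feature[sample_idx], y[sample_idx]
    heights = np.zeros(nbins)
    for cl in np.unique(y):
        hist, _ = np.histogram(x_node[y_node == cl], bins=bins)
        heights += hist  # stack this class on top of the previous ones
    return float(np.max(heights))


X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
root_feature = clf.tree_.feature[0]
root_samples = np.arange(len(X))  # every training sample passes through the root
print(stacked_max_height(X[:, root_feature], Y, root_samples, nbins=2))  # → 1.0
```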

In [105]:

# L132 predict
shadow_tree.predict(np.array([0, 0.5]))

Out[105]:

(1,
 [<__main__.ShadowDecTreeNode at 0x1216dd4a8>,
  <__main__.ShadowDecTreeNode at 0x1216dd940>])
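One caveat about this output: scikit-learn routes a sample to the left child when its feature value is <= the node's threshold, while ShadowDecTree.predict goes left only when x[t.feature()] < t.split(). The two agree everywhere except on a threshold, and [0, 0.5] sits exactly on the root split (feature 1 at 0.5). So predict above goes right and returns 1, whereas clf.predict should route the same point to the left leaf. The following sketch walks the tree with scikit-learn's <= rule and checks the reached leaf against clf.apply; leaf_for is a hypothetical name.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def leaf_for(tree_model, x):
    """Return the leaf id reached by x using scikit-learn's rule: left if x[f] <= threshold."""
    t = tree_model.tree_
    node = 0
    while t.children_left[node] != -1:  # -1 marks a leaf
        if x[t.feature[node]] <= t.threshold[node]:
            node = t.children_left[node]
        else:
            node = t.children_right[node]
    return int(node)


X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

# points on the 0.5 threshold are where "<" and "<=" disagree
for point in [[0, 0.5], [0.5, 0], [0.2, 0.9]]:
    x = np.array(point, dtype=np.float32)  # scikit-learn compares in float32
    print(point, leaf_for(clf, x), int(clf.apply(x.reshape(1, -1))[0]))
```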

In [106]:

# L158 tesselation
shadow_tree.tesselation()

Out[106]:

[(<__main__.ShadowDecTreeNode at 0x121696470>, (0, 0, 1, 0.5)),
 (<__main__.ShadowDecTreeNode at 0x1216dd940>, (0, 0.5, 1, 1))]
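tesselation starts from the bounding box of the training data and, at every internal node, cuts the current box at the split value: vertically for feature 0, horizontally for feature 1. Here the root splits feature 1 at 0.5, which gives the two horizontal strips above. A sketch that computes the same boxes directly from scikit-learn's arrays, for a tree fit on exactly two features; leaf_boxes is a hypothetical name.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def leaf_boxes(tree_model, X):
    """Return (leaf_id, (x1, y1, x2, y2)) for every leaf of a tree fit on two features."""
    t = tree_model.tree_
    boxes = []

    def walk(node, box):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf
            boxes.append((int(node), box))
            return
        x1, y1, x2, y2 = box
        s = float(t.threshold[node])
        if t.feature[node] == 0:  # vertical cut at x = s
            walk(left, (x1, y1, s, y2))
            walk(right, (s, y1, x2, y2))
        else:                     # horizontal cut at y = s
            walk(left, (x1, y1, x2, s))
            walk(right, (x1, s, x2, y2))

    # start from the training data's bounding box: lower-left and upper-right corners
    x_min, y_min = X.min(axis=0)
    x_max, y_max = X.max(axis=0)
    walk(0, (float(x_min), float(y_min), float(x_max), float(y_max)))
    return boxes


X = np.array([[0, 0], [1, 1]])
Y = np.array([0, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)
for leaf_id, box in leaf_boxes(clf, X):
    print(leaf_id, box)
```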

Opportunity to contribute¶

Through line-by-line execution, I found the following opportunities to contribute:

Add documentation for each method.
Validate X_train and y_train, or convert them to np.array: when I passed plain Python lists as X_train and y_train, get_split_node_heights and tesselation failed, because both index the training data with NumPy-style slicing such as X_train[:, node.feature()], which lists do not support.
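A minimal sketch of that validation, assuming the fix is to coerce the inputs rather than reject them. as_training_arrays is a hypothetical helper that could run at the top of __init__.

```python
import numpy as np


def as_training_arrays(X_train, y_train):
    """Convert list/DataFrame/ndarray inputs to NumPy arrays and check their shapes."""
    # np.asarray also accepts pandas objects, returning their underlying values
    X = np.asarray(X_train)
    y = np.asarray(y_train)
    if X.ndim != 2:
        raise ValueError(f"X_train must be 2-D (n_samples, n_features), got shape {X.shape}")
    if y.ndim != 1:
        raise ValueError(f"y_train must be 1-D (n_samples,), got shape {y.shape}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X_train has {X.shape[0]} rows but y_train has {y.shape[0]} values")
    return X, y


X, y = as_training_arrays([[0, 0], [1, 1]], [0, 1])
print(X[:, 1], y)  # column slicing now works → [0 1] [0 1]
```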

Feature Importance

h1ros

Jun 25, 2019, 5:56:54 AM


Goal¶

This post aims to introduce how to obtain feature importance from a random forest and visualize it in several different formats.

Reference

Libraries¶

In [29]:

import pandas as pd
import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this notebook needs an older version
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Configuration¶

In [69]:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (16, 6)

Load data¶

In [3]:

boston = load_boston()

df_boston = pd.DataFrame(data=boston.data, columns=boston.feature_names)
df_boston.head()

Out[3]:

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.00632	18.0	2.31	0.0	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98
1	0.02731	0.0	7.07	0.0	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14
2	0.02729	0.0	7.07	0.0	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03
3	0.03237	0.0	2.18	0.0	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94
4	0.06905	0.0	2.18	0.0	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33

Train a Random Forest Regressor¶

In [56]:

reg = RandomForestRegressor(n_estimators=50)
reg.fit(df_boston, boston.target)

Out[56]:

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

Obtain feature importance¶

average feature importance¶

In [70]:

df_feature_importance = pd.DataFrame(reg.feature_importances_, index=boston.feature_names, columns=['feature importance']).sort_values('feature importance', ascending=False)
df_feature_importance

Out[70]:

	feature importance
RM	0.434691
LSTAT	0.362675
DIS	0.065282
CRIM	0.048311
NOX	0.024685
PTRATIO	0.018163
TAX	0.012388
AGE	0.011825
B	0.010220
INDUS	0.006348
RAD	0.002961
ZN	0.001503
CHAS	0.000950

all feature importances of each tree¶

In [58]:

df_feature_all = pd.DataFrame([tree.feature_importances_ for tree in reg.estimators_], columns=boston.feature_names)
df_feature_all.head()

Out[58]:

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.014397	0.000270	0.000067	0.001098	0.030470	0.160704	0.005805	0.040896	0.000915	0.009357	0.006712	0.008223	0.721085
1	0.027748	0.000151	0.004632	0.000844	0.079595	0.290730	0.020392	0.055907	0.012544	0.011589	0.018765	0.006700	0.470404
2	0.082172	0.000353	0.003930	0.002729	0.009873	0.182772	0.009487	0.053868	0.002023	0.014475	0.025605	0.004799	0.607914
3	0.020085	0.000592	0.006886	0.001462	0.016882	0.290993	0.007097	0.074538	0.001960	0.003679	0.012879	0.011265	0.551682
4	0.012873	0.001554	0.003002	0.000521	0.013372	0.251145	0.010757	0.110498	0.002889	0.007838	0.009357	0.027501	0.548694
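The averaged values in In [70] can be cross-checked against these per-tree rows. In recent scikit-learn versions, a forest's feature_importances_ is the mean of its trees' impurity-based feature_importances_ (trees that never split are skipped), normalized to sum to 1. The sketch below checks this on synthetic make_regression data, used only as a stand-in because load_boston is no longer shipped with scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# synthetic stand-in for the Boston data: 200 samples, 5 features
X, y = make_regression(n_samples=200, n_features=5, n_informative=3, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# one row of importances per tree; each row sums to 1
per_tree = np.array([est.feature_importances_ for est in forest.estimators_])
mean_importance = per_tree.mean(axis=0)

print(np.allclose(mean_importance, forest.feature_importances_))  # → True
```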

In [97]:

# Melted data i.e., long format
df_feature_long = pd.melt(df_feature_all,var_name='feature name', value_name='values')
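pd.melt turns the wide table (one row per tree, one column per feature) into long format with one row per (tree, feature) pair. This is the shape seaborn's categorical plots expect for their x and y arguments, so with 50 trees and 13 features df_feature_long has 650 rows. A toy illustration; the numbers are made up and only show the reshaping.

```python
import pandas as pd

# toy wide table: 3 "trees" (rows) x 2 "features" (columns); values are illustrative only
wide = pd.DataFrame({"RM": [0.5, 0.4, 0.6], "LSTAT": [0.3, 0.5, 0.2]})
df_long = pd.melt(wide, var_name="feature name", value_name="values")

print(df_long.shape)                     # → (6, 2)
print(df_long["feature name"].tolist())  # → ['RM', 'RM', 'RM', 'LSTAT', 'LSTAT', 'LSTAT']
```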

Visualize feature importance¶

The feature importances are visualized in the following formats:

Bar chart
Box Plot
Strip Plot
Swarm Plot
~~Factor plot~~

Bar chart¶

In [71]:

df_feature_importance.plot(kind='bar');
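The bar chart above shows only the averaged importance. One possible variant, not part of the original post, adds the spread across trees (the same information the box, strip and swarm plots below show) as error bars of plus/minus one standard deviation. importance_bar_with_std is a hypothetical helper that expects a per-tree table shaped like df_feature_all; the Dirichlet-sampled toy values stand in for real importances.

```python
import numpy as np
import pandas as pd
from matplotlib.figure import Figure


def importance_bar_with_std(df_all, ax):
    """Bar chart of mean importance per feature with one std across trees as error bars."""
    summary = pd.DataFrame({"mean": df_all.mean(), "std": df_all.std()})
    summary = summary.sort_values("mean", ascending=False)
    ax.bar(summary.index.tolist(), summary["mean"].to_numpy(),
           yerr=summary["std"].to_numpy(), capsize=4)
    ax.set_ylabel("feature importance")
    return summary


# toy per-tree table (rows = trees, columns = features); values are illustrative only
rng = np.random.default_rng(0)
df_all = pd.DataFrame(rng.dirichlet([5, 3, 1], size=20), columns=["RM", "LSTAT", "DIS"])

fig = Figure(figsize=(8, 3))
ax = fig.add_subplot()
summary = importance_bar_with_std(df_all, ax)
print(summary)
```

In the notebook it would be called as `fig, ax = plt.subplots(); importance_bar_with_std(df_feature_all, ax)`.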

Box plot¶

In [98]:

sns.boxplot(x="feature name", y="values", data=df_feature_long, order=df_feature_importance.index);

Strip Plot¶

In [99]:

sns.stripplot(x="feature name", y="values", data=df_feature_long, order=df_feature_importance.index);

Swarm plot¶

In [78]:

sns.swarmplot(x="feature name", y="values", data=df_feature_long, order=df_feature_importance.index);

All¶

In [108]:

fig, axes = plt.subplots(4, 1, figsize=(16, 8))
df_feature_importance.plot(kind='bar', ax=axes[0], title='Plots Comparison for Feature Importance');
sns.boxplot(ax=axes[1], x="feature name", y="values", data=df_feature_long, order=df_feature_importance.index);
sns.stripplot(ax=axes[2], x="feature name", y="values", data=df_feature_long, order=df_feature_importance.index);
sns.swarmplot(ax=axes[3], x="feature name", y="values", data=df_feature_long, order=df_feature_importance.index);
plt.tight_layout()

Getting real-time stock market data and visualization

h1ros

Jun 19, 2019, 6:34:00 AM


Goal¶

This post aims to introduce how to get real-time stock market data using yahoo_fin, a Python package for Yahoo Finance data, and visualize it as a candlestick chart using cufflinks.

Reference
