Data Exploration Tool - Lantern Part 1

Overview

Lantern is a Python module that bundles a collection of tools for data exploration, from generating a variety of datasets to visualizing them.

In this post, I will walk through the following:

  • How to set up Lantern
  • What Lantern can do
    • dataset
    • plot (visualization)
    • grid (interactive table view)
    • widget

How to set up Lantern

In [90]:
# !pip install pylantern 
# !jupyter labextension install pylantern # for jupyter lab
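
After installing, it can help to confirm that the package is importable before running the rest of the notebook. The sketch below assumes the import name is `lantern` (as in the `import lantern as l` cell later in this post); `is_installed` is a hypothetical helper written for this check, not part of Lantern.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the Python path."""
    return importlib.util.find_spec(module_name) is not None


# `lantern` is the import name used later in this post (assumption: the
# pylantern package installs under that name).
print("lantern available:", is_installed("lantern"))
```

If this prints `False`, uncomment and run the install commands above, then restart the kernel.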

Dataset

The available datasets (as of March 2019) are as follows:

Dummy data from Mimesis

  • person
  • people (multiple records of person)
  • company
  • companies (multiple records of company)
  • ticker
  • currency
  • trade
  • superstore

Simple test data for visualization

  • line
  • bar
  • scatter

In [30]:
import lantern as l
import matplotlib.pyplot as plt
%matplotlib inline

import cufflinks as cf
from plotly.offline import plot, download_plotlyjs, init_notebook_mode

cf.go_offline()
init_notebook_mode()

Person

In [4]:
# a single person from Mimesis, a fake data generator
l.person()
Out[4]:
{'first_name': 'Sanora',
 'last_name': 'Hogan',
 'name': 'Sanora Hogan',
 'age': 45,
 'gender': 'Female',
 'id': '16-29/08',
 'occupation': 'Fireman',
 'telephone': '570-172-2629',
 'title': 'Mrs.',
 'username': 'centrarchidae_1975',
 'university': 'Framingham State University'}
In [5]:
# multiple records of person with locale
l.people(count=5, locale='en')
Out[5]:
   age  first_name  gender  id        last_name  name             occupation          telephone       title   university                                         username
0  47   Lucio       Male    40-95/92  Ellis      Lucio Ellis      Barrister           1-595-658-7527  Master  University of Louisville (Louisville, U of L, ...  brockman.1839
1  22   Rickie      Female  67-08/77  Frazier    Rickie Frazier   Property Dealer     083-799-7676    M.D.    University of North Georgia (UNG)                  Artaba.1906
2  18   Oneida      Female  77-63/62  Rojas      Oneida Rojas     Furniture Restorer  (119) 801-1945  Mrs.    University of Kentucky (Kentucky or UK)            ImproveCuttlefish.2048
3  32   Benton      Male    11-73/43  Caldwell   Benton Caldwell  Shipyard Worker     710-859-8365    B.Sc    Clayton State University                           armado-1969
4  24   Melia       Female  25-05/28  Winters    Melia Winters    Airport Controller  221.989.3939    Madam   University of California, Berkeley (UC Berkeley)   PintaPuma_1945
In [6]:
# Visualize people
people = l.people(count=50, locale='en')
people['gender'].value_counts().plot(kind='bar');
In [7]:
people['age'].hist();
In [8]:
people['occupation'].value_counts().plot(kind='bar');
In [9]:
people['university'].value_counts().plot(kind='bar');
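
The bar charts above all follow the same pattern: `value_counts()` tallies a column, and `.plot(kind='bar')` draws one bar per distinct value. A minimal sketch of the tallying step, using a hand-made frame with the gender and age values from the five-row output shown earlier (the frame is a stand-in, not a call to Lantern):

```python
import pandas as pd

# Stand-in for a few rows of l.people(); values copied from the sample output.
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female"],
    "age": [47, 22, 18, 32, 24],
})

# value_counts() returns a Series indexed by the distinct values,
# sorted from most to least frequent.
gender_counts = sample["gender"].value_counts()
print(gender_counts)

# Calling gender_counts.plot(kind="bar") would draw one bar per index label.
```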

Company

In [10]:
# company
l.company()
Out[10]:
{'name': 'Thomas, Jones and Leon',
 'address': '8786 Paul Route\nEast Jamesbury, IA 65487',
 'ticker': 'OTAZ',
 'last_price': 22.960503127702392,
 'market_cap': 73875560603,
 'exchange': 'AR',
 'ceo': 'Shelia Wilkins DDS',
 'sector': 'Consumer Staples',
 'industry': 'Household Products'}
In [11]:
# Multiple companies
l.companies(count=5)
Out[11]:
   address                                            ceo              exchange  industry                       last_price  market_cap   name                          sector                  ticker
0  14040 Padilla Summit Suite 594\nSouth Erikaber...  Alexander Myers  C         Diversified Consumer Services  34.442286   21873402760  Green, Green and Sosa         Consumer Discretionary  ZTF
1  250 William Unions Suite 704\nJamesside, GA 28548  Doris Lopez      Y         Containers & Packaging         71.158993   96542209146  Murillo, Clark and Allen      Materials               VIU
2  07471 Mcneil Parkways\nLake Kevin, MT 61033        Julie Rangel     O         Household Products             34.334247   29289469664  Lam-Johnson                   Consumer Staples        AJM
3  2352 Debra Green\nLake Yvetteton, AZ 29457         Carolyn Davis    C         Automobiles                    22.537401   18986325709  Castillo, Roberts and Lloyd   Consumer Discretionary  TNA
4  04793 Amanda Squares\nDoyleburgh, RI 14976         James Mayer      BV        Software                       54.470216   90189492463  Torres, Washington and Clark  Information Technology  WQVN
In [12]:
# Visualize companies
companies = l.companies(count=50)
companies.columns.values
Out[12]:
array(['address', 'ceo', 'exchange', 'industry', 'last_price',
       'market_cap', 'name', 'sector', 'ticker'], dtype=object)
In [13]:
companies['exchange'].value_counts().plot(kind='bar');
In [14]:
companies['industry'].value_counts().plot(kind='bar');
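
Beyond counting categories, the numeric columns can be aggregated per category. The following is a sketch only: it rebuilds the `sector` and `market_cap` columns from the five-row output above by hand, then sums market cap per sector with a plain pandas `groupby`.

```python
import pandas as pd

# Stand-in for l.companies(count=5); values copied from the sample output.
companies_sample = pd.DataFrame({
    "sector": [
        "Consumer Discretionary",
        "Materials",
        "Consumer Staples",
        "Consumer Discretionary",
        "Information Technology",
    ],
    "market_cap": [
        21873402760,
        96542209146,
        29289469664,
        18986325709,
        90189492463,
    ],
})

# Total market cap per sector, largest first.
cap_by_sector = (
    companies_sample.groupby("sector")["market_cap"]
    .sum()
    .sort_values(ascending=False)
)
print(cap_by_sector)
```

The resulting Series can be passed to `.plot(kind='bar')` in the same way as the `value_counts()` results.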

Financial

In [15]:
[l.ticker(country='us') for i in range(10)]
Out[15]:
['579131.A',
 'CGAP.X',
 '287928.I',
 'ACN.R',
 '8472.WD',
 'ZTH.R',
 '2342.PR',
 'HDCV.N',
 '112469.HA',
 'NAJY.U']
In [16]:
[l.currency() for i in range(10)]
Out[16]:
['PEN', 'BHD', 'RWF', 'SEK', 'SLL', 'MXN', 'MDL', 'CUP', 'JOD', 'UGX']
In [17]:
l.trades(count=5)
Out[17]:
   exchange  industry                              last_price  market_cap   name                           price      sector                  ticker  volume
0  X         Real Estate Management & Development  23.800173   30254310593  Richardson, White and Krueger  18.879295  Real Estate             GCE     510
1  YG        Communications Equipment              77.161741   78877582334  Schultz, Reid and Henderson    75.680939  Information Technology  NEHW    360
2  HX        Real Estate Management & Development  97.271968   16819057032  Kim, Mitchell and Friedman     96.126543  Real Estate             ACEP    700
3  WI        Internet Software & Services          94.945142   30202743224  Todd LLC                       95.246048  Information Technology  PFR     520
4  HX        Energy Equipment & Services           88.546604   41040783132  Miller-Hebert                  88.201215  Energy                  YZN     690
In [18]:
# Visualization
trades = l.trades(count=50)
trades['price'].hist(bins=50);
In [19]:
trades['sector'].value_counts().plot(kind='bar');
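
The `price` and `volume` columns can also be combined into a derived column before plotting. As a sketch, the code below rebuilds three columns of the five-row trades output by hand and computes a notional value per trade (`price * volume`); the column name `notional` is introduced here for illustration and is not something Lantern returns.

```python
import pandas as pd

# Stand-in for l.trades(count=5); values copied from the sample output.
trades_sample = pd.DataFrame({
    "ticker": ["GCE", "NEHW", "ACEP", "PFR", "YZN"],
    "price": [18.879295, 75.680939, 96.126543, 95.246048, 88.201215],
    "volume": [510, 360, 700, 520, 690],
})

# Hypothetical derived column: traded value per row.
trades_sample["notional"] = trades_sample["price"] * trades_sample["volume"]

# Ticker of the row with the largest notional value.
largest = trades_sample.loc[trades_sample["notional"].idxmax(), "ticker"]
print(trades_sample)
print("largest trade:", largest)
```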
In [20]:
# General purpose: superstore sample data
l.superstore(count=5)
Out[20]:
Category City Country Customer ID Discount Order Date Order ID Postal Code Product ID Profit Quantity Region Row ID Sales Segment Ship Date Ship Mode State Sub-Category
0 Consumer Discretionary South Maria US 068-ZSN 99.14 2019-01-02 93-0980218 50853 OQVY9448657859274 759.01 730 Region 1 0 9900 B 2019-03-02 Second Class Minnesota Automobiles
1 Industrials East Eric US GYX-1915 14.03 2019-02-19 94-2470382 29854 HIJP3794715324490 544.01 930 Region 3 1 5300 B 2019-03-23 First Class Colorado Air Freight & Logistics
2 Telecommunication Services Katiefort US IZ2 D0F 85.85 2019-02-01 24-9296271 84287 LRBM2291723076094 365.88 110 Region 3 2 6000 C 2019-03-26 Second Class Montana Wireless Telecommunication Services
3 Utilities Mariamouth US LDB 419 79.65 2019-01-12 45-6590128 30013 XADA8655597087870 758.99 580 Region 2 3 1300 C 2019-01-29 Second Class North Carolina Electric Utilities
4 Consumer Staples South Jesse US LNY 818 5.10 2019-01-25 48-4858932 26721 UIED2770994152152 18.59 60 Region 1 4 2800 C 2019-02-12 First Class New Jersey Food Products
In [21]:
# Visualization
superstore = l.superstore(count=50)
superstore['Country'].value_counts().plot(kind='bar');
In [22]:
superstore['Profit'].plot(kind='hist');
In [23]:
superstore['Sales'].plot(kind='hist');
In [24]:
superstore['State'].value_counts().plot(kind='bar');
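
The superstore data also carries two date columns, which makes it handy for practicing date arithmetic. The sketch below copies the `Order Date` and `Ship Date` values from the five-row output above and computes the days between them. It is an assumption that the real columns arrive as strings or datetimes; `pd.to_datetime` accepts either.

```python
import pandas as pd

# Stand-in for l.superstore(count=5); dates copied from the sample output.
orders = pd.DataFrame({
    "Order Date": ["2019-01-02", "2019-02-19", "2019-02-01", "2019-01-12", "2019-01-25"],
    "Ship Date": ["2019-03-02", "2019-03-23", "2019-03-26", "2019-01-29", "2019-02-12"],
})

# Convert to datetimes, subtract, and keep the whole-day component.
shipping_days = (
    pd.to_datetime(orders["Ship Date"]) - pd.to_datetime(orders["Order Date"])
).dt.days

print(shipping_days.tolist())
print("mean days to ship:", shipping_days.mean())
```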

Area

In [25]:
l.area().head()
Out[25]:
XBS.UA SYJ.CF CPA.LT GNF.FI MSN.DN
2015-01-01 0.198289 -0.619617 -0.511978 1.399578 0.348256
2015-01-02 0.182105 -0.038003 -0.656218 1.683401 2.302187
2015-01-03 -0.851996 1.841785 -0.781817 -0.315559 1.292873
2015-01-04 -1.319194 1.734769 -1.619817 -0.830086 1.026853
2015-01-05 -0.748739 3.883546 -2.147773 0.435858 2.307991
In [39]:
fig = l.area().iplot(kind='area', fill=True, asFigure=True)
# plot(fig, include_plotlyjs=False, output_type='div')
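
The sample output above drifts from row to row, which looks like a cumulative random walk over daily dates. How Lantern actually generates it is not shown in this post, so the following is only a guess at an equivalent: `random_walk_frame` and the `series_N` column names are hypothetical, and the result can be passed to `iplot` in the same way.

```python
import numpy as np
import pandas as pd


def random_walk_frame(n_rows=100, n_cols=5, seed=0):
    """Build a date-indexed frame of cumulative normal steps (illustrative)."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2015-01-01", periods=n_rows, freq="D")
    columns = [f"series_{i}" for i in range(n_cols)]
    # Each column is the running sum of independent standard normal steps.
    steps = rng.standard_normal((n_rows, n_cols))
    return pd.DataFrame(steps.cumsum(axis=0), index=index, columns=columns)


walk = random_walk_frame()
print(walk.head())
```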

Bar

In [27]:
l.bar().head()
Out[27]:
YTJ.DR PBY.EK ZUQ.YR YSL.FA DRY.TS
2015-01-01 0.958018 0.444328 0.101012 0.164150 -0.889406
2015-01-02 -0.352005 0.872672 -0.793665 -0.419978 -0.057499
2015-01-03 -0.562561 1.547919 -2.575546 0.694849 -0.803420
2015-01-04 -0.903195 2.627425 -3.231857 0.792253 0.353367
2015-01-05 -0.801367 3.111390 -4.454500 0.248623 1.110684
In [33]:
fig = l.bar().iplot(kind='bar', asFigure=True)

Box

In [98]:
l.box().head()
Out[98]:
QOR.YP YJN.UC JAI.RR EBC.QC HFH.UI
0 7.615681 10.496136 4.615632 5.841398 6.940979
1 21.126128 7.007890 6.240759 5.392560 5.556378
2 1.002351 0.633254 1.654453 0.385170 3.620671
3 6.404937 6.213862 2.171438 9.474788 5.510254
4 1.615638 0.895569 1.616745 0.816806 2.987322
In [43]:
fig = l.box().iplot(kind='box', asFigure=True)
# plot(fig, include_plotlyjs=False, output_type='div')
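
A box plot summarizes each column by its quartiles and flags points far outside them. To make that concrete, here is a rough sketch that computes the same kind of summary by hand for the five `QOR.YP` values shown above, using pandas' default linear interpolation and the common 1.5 × IQR rule. Plotly's own quartile method may differ slightly, so treat the numbers as illustrative.

```python
import pandas as pd

# The five QOR.YP values from the l.box().head() output above.
values = pd.Series(
    [7.615681, 21.126128, 1.002351, 6.404937, 1.615638], name="QOR.YP"
)

# Quartiles: the box spans q1..q3, with a line at the median.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are conventionally drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)].tolist()

print("q1, median, q3:", q1, median, q3)
print("outliers:", outliers)
```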
