Replace Characters

h1ros

May 9, 2019, 11:02:21 PM

Goal¶

This post aims to introduce how to replace characters in a Python string.

Create strings¶

In [2]:

# Create strings
strings = 'String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects'
strings

Out[2]:

'String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects'

Replace characters¶

`.replace('{old}', '{new}')`¶

In [16]:

# .replace('{old}', '{new}')
strings.replace('S', 'a')

Out[16]:

'atring (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects'

`.replace` can be chained¶

In [5]:

# .replace can be chained
strings.replace('(', '').replace(' ', '_')

Out[5]:

'String_structure),_a_long_flexible_structure_made_from_threads_twisted_together,_which_is_used_to_tie,_bind,_or_hang_other_objects'

Replace multiple characters using a dictionary¶

In [15]:

d_replace = {'(': '',
             ')': '',
             ' ': '',
             ',': ''}

for old, new in d_replace.items():
    print(f'replace {old} with {new} ')
    strings = strings.replace(old, new)
strings

replace ( with  
replace ) with  
replace   with  
replace , with

Out[15]:

'Stringstructurealongflexiblestructuremadefromthreadstwistedtogetherwhichisusedtotiebindorhangotherobjects'
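As an alternative to looping over the dictionary, the same kind of clean-up can be done in a single pass with `str.translate`. The sketch below is only an illustration: the names `text`, `table`, and `cleaned` are made up for this example, and the input is a shortened version of the string above.

```python
# Shortened sample text (hypothetical, based on the string used above)
text = 'String (structure), a long flexible structure'

# Build a translation table that maps each unwanted character to ''
table = str.maketrans({'(': '', ')': '', ' ': '', ',': ''})

# Apply all replacements in one pass, without rebinding the original
cleaned = text.translate(table)
print(cleaned)  # Stringstructurealongflexiblestructure
```

Because the result goes into a new variable, the original string stays available for later cells.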

Add Padding Around String

h1ros

May 8, 2019, 11:43:55 PM

Goal¶

This post aims to introduce how to add padding around a string.

Create a string and number¶

In [6]:

string = 'abc_def'
num = 10

Add padding by " "(space) or other character¶

Strings have a method called ljust, which left-justifies the text and pads it on the right up to the given width:

S.ljust(width[, fillchar]) -> str

In [3]:

string.ljust(10)

Out[3]:

'abc_def   '

In [5]:

string.ljust(10, 'a')

Out[5]:

'abc_defaaa'
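`ljust` only pads on the right. For padding on the left or on both sides, the sibling methods `rjust` and `center` take the same arguments. A minimal sketch, where the fill character `'*'` and the variable names are chosen just for this example:

```python
string = 'abc_def'  # 7 characters

# Pad on the right, on the left, and on both sides
padded_right = string.ljust(10, '*')
padded_left = string.rjust(10, '*')
padded_both = string.center(11, '*')

print(padded_right)  # abc_def***
print(padded_left)   # ***abc_def
print(padded_both)   # **abc_def**
```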

Add zero padding to numbers¶

In [10]:

# the "0" after ":" requests zero padding and "10" is the total width
'{:010}'.format(num)

Out[10]:

'0000000010'

In [13]:

# python >= 3.6 
# the "0" after ":" requests zero padding and "10" is the total width
f'{num:010}'

Out[13]:

'0000000010'
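Two other ways to get the same kind of result are `str.zfill` and an explicit fill character with an alignment symbol in the format spec. This is a small sketch; the variable names and the `'*'` fill character are picked for illustration only.

```python
num = 10

# zfill works on strings, so convert the number first
padded_zfill = str(num).zfill(10)

# '*' is the fill character, '>' right-aligns, 10 is the total width
padded_star = f'{num:*>10}'

print(padded_zfill)  # 0000000010
print(padded_star)   # ********10
```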

Create a word cloud

h1ros

May 7, 2019, 12:08:10 AM

Goal¶

This post aims to introduce how to create a word cloud using wordcloud.

As the source of words, I use one of my posts on 200Wordsaday, a.k.a. 200WaD, which is a community for those who want to build a writing habit.

Reference

Datacamp - Generating WordClouds in Python

Invert A Matrix

h1ros

May 6, 2019, 11:26:46 PM

Goal¶

This post aims to show how to invert a matrix using numpy, i.e., how to calculate the inverse matrix $A^{-1}$ from $A$.

For example, if we have

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} $$

Then $A^{-1}$ must satisfy

$$A A^{-1} = I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $$

Library¶

In [1]:

import numpy as np
from numpy.linalg import inv

Create a matrix¶

In [2]:

arr = np.array([[1, 2], [3, 4]])
arr

Out[2]:

array([[1, 2],
       [3, 4]])

Invert a matrix¶

In [3]:

arr_inv = inv(arr)
arr_inv

Out[3]:

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

Check $A A^{-1} = I$¶

In [4]:

np.dot(arr_inv, arr)

Out[4]:

array([[1.00000000e+00, 0.00000000e+00],
       [2.22044605e-16, 1.00000000e+00]])

In [5]:

np.dot(arr, arr_inv)

Out[5]:

array([[1.0000000e+00, 0.0000000e+00],
       [8.8817842e-16, 1.0000000e+00]])
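The off-diagonal entries above are around 1e-16 rather than exactly 0 because of floating-point rounding, so an exact equality test against the identity would fail. One way to check the result numerically is sketched below with `np.allclose` and `np.eye`; the names `product` and `is_identity` are chosen for this example.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
arr_inv = np.linalg.inv(arr)

# Multiply A by its inverse; @ is matrix multiplication
product = arr @ arr_inv

# Compare with the 2x2 identity up to a small numerical tolerance
is_identity = np.allclose(product, np.eye(2))
print(is_identity)  # True
```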

Reshape An Array

h1ros

May 5, 2019, 11:45:59 PM

Goal¶

This post aims to describe how to reshape an array from 1D to 2D or 2D to 1D using numpy.

Reference:

numpy.reshape

Library¶

In [1]:

import numpy as np

Create a 1D and 2D array¶

In [6]:

# 1D array
arr_1d = np.array(np.arange(0, 10))
arr_1d

Out[6]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:

arr_1d.shape

Out[7]:

(10,)

In [12]:

# 2D array
arr_2d = np.array([np.arange(1, 20, 2), np.arange(100, 80, -2)]).T
arr_2d

Out[12]:

array([[  1, 100],
       [  3,  98],
       [  5,  96],
       [  7,  94],
       [  9,  92],
       [ 11,  90],
       [ 13,  88],
       [ 15,  86],
       [ 17,  84],
       [ 19,  82]])

In [13]:

arr_2d.shape

Out[13]:

(10, 2)

Reshape¶

reshape from 1D to 2D¶

In [9]:

# 1D with shape (10, ) to 2D with shape (2, 5)
np.reshape(arr_1d, [2, 5])

Out[9]:

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [18]:

np.reshape(arr_1d, [2, 5]).shape

Out[18]:

(2, 5)

reshape from 2D to 1D¶

In [16]:

# 2D with shape (10, 2) to 1D with shape (20, )
np.reshape(arr_2d, arr_2d.size)

Out[16]:

array([  1, 100,   3,  98,   5,  96,   7,  94,   9,  92,  11,  90,  13,
        88,  15,  86,  17,  84,  19,  82])

In [17]:

np.reshape(arr_2d, arr_2d.size).shape

Out[17]:

(20,)
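When one dimension can be worked out from the others, `-1` can stand in for it and numpy infers the size. The following is a small sketch using the array method `reshape`; the variable names are chosen for this example.

```python
import numpy as np

arr_1d = np.arange(10)

# 2 rows; -1 lets numpy infer the 5 columns
arr_2x5 = arr_1d.reshape(2, -1)

# A single -1 flattens back to 1D
arr_flat = arr_2x5.reshape(-1)

print(arr_2x5.shape)   # (2, 5)
print(arr_flat.shape)  # (10,)
```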

Stemming Words and Sentences

h1ros

May 3, 2019, 8:23:39 AM

Goal¶

This post aims to introduce stemming words and sentences using nltk (Natural Language Toolkit).

Reference:

Stemming and Lemmatization in Python
Chris Albon's blog (I looked at his post's title and wrote my own content to deepen my understanding of the topic.)

Library¶

In [1]:

import nltk
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
nltk.download('punkt')
porter = PorterStemmer()
lancaster=LancasterStemmer()

[nltk_data] Downloading package punkt to /Users/hiro/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

Words & Sentences to be stemmed¶

In [2]:

l_words1 = ['cats', 'trouble', 'troubling', 'troubled']
l_words2 = ['dogs', 'programming', 'programs', 'programmed', 'cakes', 'indices', 'matrices']

print(l_words1)
print(l_words2)

['cats', 'trouble', 'troubling', 'troubled']
['dogs', 'programming', 'programs', 'programmed', 'cakes', 'indices', 'matrices']

The example sentences are taken from the Examples section of the Wikipedia article on stemming.

In [3]:

sentence = 'A stemmer for English operating on the stem cat should identify such strings as cats, catlike, and catty. A stemming algorithm might also reduce the words fishing, fished, and fisher to the stem fish. The stem need not be a word, for example the Porter algorithm reduces, argue, argued, argues, arguing, and argus to the stem argu.'
sentence

Out[3]:

'A stemmer for English operating on the stem cat should identify such strings as cats, catlike, and catty. A stemming algorithm might also reduce the words fishing, fished, and fisher to the stem fish. The stem need not be a word, for example the Porter algorithm reduces, argue, argued, argues, arguing, and argus to the stem argu.'

Stemming words¶

Porter Stemming¶

Porter stemming strips common suffixes from each word, so the result can be a non-English stem such as troubl. Such stems can be awkward to read in further analysis, but the algorithm is simple and efficient.

In [4]:

for word in l_words1:
    print(f'{word} \t -> {porter.stem(word)}'.expandtabs(15))

cats            -> cat
trouble         -> troubl
troubling       -> troubl
troubled        -> troubl

In [5]:

for word in l_words2:
    print(f'{word} \t -> {porter.stem(word)}'.expandtabs(15))

dogs            -> dog
programming     -> program
programs        -> program
programmed      -> program
cakes           -> cake
indices         -> indic
matrices        -> matric

Lancaster Stemming¶

Lancaster stemming is a rule-based stemmer whose rules are selected by the last letter of the word. It is heavier, i.e. more aggressive, than Porter stemming, as the shorter stems below show (cakes -> cak, indices -> ind).

In [6]:

for word in l_words1:
    print(f'{word} \t -> {lancaster.stem(word)}'.expandtabs(15))

cats            -> cat
trouble         -> troubl
troubling       -> troubl
troubled        -> troubl

In [7]:

for word in l_words2:
    print(f'{word} \t -> {lancaster.stem(word)}'.expandtabs(15))

dogs            -> dog
programming     -> program
programs        -> program
programmed      -> program
cakes           -> cak
indices         -> ind
matrices        -> mat

Stemming sentences¶

Tokenize¶

In [8]:

tokenized_words=word_tokenize(sentence)
print(tokenized_words)

['A', 'stemmer', 'for', 'English', 'operating', 'on', 'the', 'stem', 'cat', 'should', 'identify', 'such', 'strings', 'as', 'cats', ',', 'catlike', ',', 'and', 'catty', '.', 'A', 'stemming', 'algorithm', 'might', 'also', 'reduce', 'the', 'words', 'fishing', ',', 'fished', ',', 'and', 'fisher', 'to', 'the', 'stem', 'fish', '.', 'The', 'stem', 'need', 'not', 'be', 'a', 'word', ',', 'for', 'example', 'the', 'Porter', 'algorithm', 'reduces', ',', 'argue', ',', 'argued', ',', 'argues', ',', 'arguing', ',', 'and', 'argus', 'to', 'the', 'stem', 'argu', '.']

Stemming by Porter stemming¶

In [9]:

tokenized_sentence = []
for word in tokenized_words:
    tokenized_sentence.append(porter.stem(word))
tokenized_sentence = " ".join(tokenized_sentence)
tokenized_sentence

Out[9]:

'A stemmer for english oper on the stem cat should identifi such string as cat , catlik , and catti . A stem algorithm might also reduc the word fish , fish , and fisher to the stem fish . the stem need not be a word , for exampl the porter algorithm reduc , argu , argu , argu , argu , and argu to the stem argu .'

Stemming by Lancaster stemming¶

In [10]:

tokenized_sentence = []
for word in tokenized_words:
    tokenized_sentence.append(lancaster.stem(word))
tokenized_sentence = " ".join(tokenized_sentence)
tokenized_sentence

Out[10]:

'a stem for engl op on the stem cat should ident such strings as cat , catlik , and catty . a stem algorithm might also reduc the word fish , fish , and fish to the stem fish . the stem nee not be a word , for exampl the port algorithm reduc , argu , argu , argu , argu , and arg to the stem argu .'

Adding Or Subtracting Time

h1ros

May 2, 2019, 11:10:13 PM

Goal¶

This post aims to add or subtract time from a date column using pandas:

Pandas

Reference:

Chris Albon's blog (I looked at his post's title and wrote my own content to deepen my understanding of the topic.)

Library¶

In [1]:

import pandas as pd

Create date columns using date_range¶

In [2]:

date_rng = pd.date_range(start='20160101', end='20190101', freq='m', closed='left')
date_rng

Out[2]:

DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
               '2016-05-31', '2016-06-30', '2016-07-31', '2016-08-31',
               '2016-09-30', '2016-10-31', '2016-11-30', '2016-12-31',
               '2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31',
               '2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31', '2018-06-30', '2018-07-31', '2018-08-31',
               '2018-09-30', '2018-10-31', '2018-11-30', '2018-12-31'],
              dtype='datetime64[ns]', freq='M')
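With dates like these in a column, time can be added or subtracted using `pd.Timedelta` for fixed durations and `pd.DateOffset` for calendar-aware steps such as months. This is a minimal sketch, not the original notebook's code: the DataFrame `df` and its column names are hypothetical, and the three dates are the first three month ends from the range above.

```python
import pandas as pd

# Hypothetical date column built from the first three month ends above
df = pd.DataFrame({'date': pd.to_datetime(['2016-01-31', '2016-02-29', '2016-03-31'])})

# Fixed durations: add one day, subtract one week
df['plus_1_day'] = df['date'] + pd.Timedelta(days=1)
df['minus_1_week'] = df['date'] - pd.Timedelta(weeks=1)

# Calendar-aware step: add one month (clipped to the last valid day)
df['plus_1_month'] = df['date'] + pd.DateOffset(months=1)

print(df)
```

`pd.Timedelta` always moves by an exact length of time, while `pd.DateOffset(months=1)` follows the calendar, so 2016-01-31 plus one month lands on 2016-02-29.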

1036. Escape a Large Maze

h1ros

May 1, 2019, 11:01:08 PM

Problem Setting¶

In a 1 million by 1 million grid, the coordinates of each grid square are (x, y) with 0 <= x, y < 10^6.

We start at the source square and want to reach the target square. Each move, we can walk to a 4-directionally adjacent square in the grid that isn't in the given list of blocked squares.

Return true if and only if it is possible to reach the target square through a sequence of moves.

Link for Problem: leetcode

Example 1:¶

Input: blocked = [[0,1],[1,0]], source = [0,0], target = [0,2]

Output: false

Explanation: The target square is inaccessible starting from the source square, because we can't walk outside the grid.

Example 2:¶

Input: blocked = [], source = [0,0], target = [999999,999999]

Output: true

Explanation:

Because there are no blocked cells, it's possible to reach the target square.
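One possible approach, sketched here as an illustration rather than a reference solution, is a bounded breadth-first search. It assumes the blocked list is short compared with the grid (the statement above does not give the limit). Under that assumption, n blocked squares can fence off at most n * (n - 1) / 2 squares, so a search that grows beyond that size cannot be trapped. The function and variable names are made up for this example.

```python
from collections import deque

GRID_SIZE = 10 ** 6  # side length of the grid, from the problem statement


def is_escape_possible(blocked, source, target):
    """Return True if target can be reached from source (bounded BFS sketch)."""
    blocked_set = {tuple(cell) for cell in blocked}
    n = len(blocked_set)
    # Largest area that n blocked squares can fence off (using a grid corner)
    max_enclosed = n * (n - 1) // 2

    def escapes(start, goal):
        """BFS from start; True if goal is found or the search outgrows any fence."""
        start, goal = tuple(start), tuple(goal)
        seen = {start}
        queue = deque([start])
        while queue:
            if len(seen) > max_enclosed:
                return True  # more squares reached than a fence could contain
            x, y = queue.popleft()
            if (x, y) == goal:
                return True
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                inside = 0 <= nx < GRID_SIZE and 0 <= ny < GRID_SIZE
                if inside and (nx, ny) not in blocked_set and (nx, ny) not in seen:
                    seen.add((nx, ny))
                    queue.append((nx, ny))
        return False  # search exhausted: start is fenced in without reaching goal

    # Both endpoints must be free of an enclosure (or directly connected)
    return escapes(source, target) and escapes(target, source)


print(is_escape_possible([[0, 1], [1, 0]], [0, 0], [0, 2]))   # False
print(is_escape_possible([], [0, 0], [999999, 999999]))       # True
```

Running the search from both ends matters: the source may be free while the target is the one that is fenced in.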

Ordinal Encoding using Scikit-learn

h1ros

Apr 30, 2019, 8:17:07 PM

Goal¶

This post aims to convert a categorical column into numbers for further processing using scikit-learn.

Library¶

In [1]:

import pandas as pd
import sklearn.preprocessing

Create categorical data¶

In [2]:

df = pd.DataFrame(data={'type': ['cat', 'dog', 'sheep'], 
                       'weight': [10, 15, 50]})
df

Out[2]:

	type	weight
0	cat	10
1	dog	15
2	sheep	50

Ordinal Encoding¶

Ordinal encoding replaces each category with a number.

In [3]:

# Instantiate the ordinal encoder class
oe = sklearn.preprocessing.OrdinalEncoder()

# Learn the mapping from categories to the numbers
oe.fit(df.loc[:, ['type']])

Out[3]:

OrdinalEncoder(categories='auto', dtype=<class 'numpy.float64'>)

In [4]:

# Apply this ordinal encoder to new data 
# Use the same column name as in fit so that feature names match
oe.transform(pd.DataFrame({'type': ['cat'] * 3 + 
                                   ['dog'] * 2 + 
                                   ['sheep'] * 5}))

Out[4]:

array([[0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [2.],
       [2.],
       [2.],
       [2.],
       [2.]])
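To make the mapping itself visible, here is a plain-Python sketch of the same idea. It assumes categories are numbered in sorted order, which matches the output above (cat -> 0, dog -> 1, sheep -> 2); the names `mapping` and `encoded` are chosen for this example and are not part of scikit-learn.

```python
# Categories seen during "fit"
types = ['cat', 'dog', 'sheep']

# Assumption: number the categories in sorted order
mapping = {category: float(index) for index, category in enumerate(sorted(set(types)))}

# "Transform" new data by looking up each value, one row per value
new_data = ['cat'] * 3 + ['dog'] * 2 + ['sheep'] * 5
encoded = [[mapping[value]] for value in new_data]

print(mapping)  # {'cat': 0.0, 'dog': 1.0, 'sheep': 2.0}
```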

Create A Sparse Matrix

h1ros

Apr 29, 2019, 11:29:15 PM

Goal¶

This post aims to create a sparse matrix in Python using the following modules:

Numpy
Scipy

Reference:

Scipy Document
Chris Albon's blog (I looked at his post's title and wrote my own content to deepen my understanding of the topic.)

Library¶

In [8]:

import numpy as np
import scipy.sparse

Create a sparse matrix using csr_matrix¶

CSR stands for "Compressed Sparse Row" matrix

In [9]:

nrow = 10000
ncol = 10000

# CSR stands for "Compressed Sparse Row" matrix
arr_sparse = scipy.sparse.csr_matrix((nrow, ncol))
arr_sparse

Out[9]:

<10000x10000 sparse matrix of type '<class 'numpy.float64'>'
	with 0 stored elements in Compressed Sparse Row format>
