Create a word cloud

Goal

This post shows how to create a word cloud in Python using the wordcloud library.

As the source of words, I use one of my posts on 200Wordsaday, a.k.a. 200WaD, a community for those who want to build a writing habit.

Library

In [1]:
import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

import matplotlib.pyplot as plt
%matplotlib inline


# For fetching data using REST call
import requests
from IPython.display import HTML

# For cleaning html tags & non-ASCII characters
from bs4 import BeautifulSoup
import unidecode

Configuration

In [2]:
private_key = '{your private key}' # your private key
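
If you would rather not hard-code the key in the notebook, one option is to read it from an environment variable. This is only a sketch: the variable name `WAD_API_KEY` and the helper `load_private_key` are hypothetical names chosen for illustration, not part of the 200WaD API.

```python
import os

def load_private_key(env_var: str = "WAD_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

After exporting the variable in your shell, `private_key = load_private_key()` would replace the assignment above.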

Load your words

In [3]:
# Get the latest post data
r = requests.get(f'https://200wordsaday.com/api/texts?api_key={private_key}')
r.raise_for_status()  # stop early if the request failed
r_json = r.json()
print(f'# of posts: {len(r_json)}')
# of posts: 128
In [4]:
# Each post is a dict with the following keys
r_json[0].keys()
Out[4]:
dict_keys(['uuid', 'status', 'access_rights', 'slug', 'canonical_url', 'datetime', 'published_datetime', 'title', 'content', 'word_count', 'categories', 'collections', 'timezone', 'user'])
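
To choose which post to use, it may help to list the index, title, and word count of each one. This is a minimal sketch, assuming each entry carries the `title` and `word_count` keys shown above. `summarize_posts` is a hypothetical helper, and `sample_posts` is made-up stand-in data rather than real API output.

```python
def summarize_posts(posts):
    """Return one 'index: title (N words)' line per post."""
    lines = []
    for index, post in enumerate(posts):
        title = post.get("title", "(untitled)")
        count = post.get("word_count", "?")
        lines.append(f"{index}: {title} ({count} words)")
    return lines

# Made-up stand-in for r_json, so the sketch runs without an API key
sample_posts = [
    {"title": "First post", "word_count": 210},
    {"title": "Second post", "word_count": 305},
]

for line in summarize_posts(sample_posts):
    print(line)
```

In the notebook, calling `summarize_posts(r_json)` would show which index to use in the next step.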

Create a word cloud

Load one post

In [5]:
# The raw content still contains HTML tags and entities; it is cleaned in the next step
words = r_json[2]['content']
words[:300]
Out[5]:
'\r\n        \r\n        <p>Today, I read one article about&nbsp;</p><blockquote>let\'s not advise "Let\'s find what you love"&nbsp;</blockquote><p>Steve Jobs said <i>"<b>You\'ve got to find what you love</b>" </i>at the commencement speech at Stanford in 2005. It has been years quite a few people also advi'

Clean words

In [6]:
# Strip the HTML tags with BeautifulSoup (naming the parser avoids a warning)
soup = BeautifulSoup(words, 'html.parser')
all_text = ''.join(soup.find_all(string=True))

# Convert non-ascii characters into ASCII equivalent
all_text = unidecode.unidecode(all_text)

# # Remove backslash
# all_text = all_text.replace("\'", "'")

all_text
Out[6]:
'Today, I read one article about let\'s not advise "Let\'s find what you love" Steve Jobs said "You\'ve got to find what you love" at the commencement speech at Stanford in 2005. It has been years quite a few people also advise similarly. It has been important to find what you can have passion for and follow your heart.   However, what if you cannot find "the one"? How would you advise if someone asks how to find it when they cannot. "The Passion Paradox" written by Brad Stulberg and Steve Magness would be one of the helpful pieces of advice. They talk about how to nurture your baby of passion. Find passion vs. nurture passionThere is one analogy for those who cannot find a passionate job or hobby. It would be similar to finding "your lover". In a typical romantic movie, characters meet in a miracle situation and dramatically fall in love. This does not happen to everybody. According to Fromm, the problem is not difficult to find those who you can love but difficult to learn how to actively love someone. In the book "The Passion Paradox", they introduced two mindsets:Fit mindset Developing mindsetThe one with "fit mindset" believes they can be happy when they find a passion for something in a job or hobby. The one with "developing mindset" believes they can cultivate their passion in the given job or context. 78% of people have the "fit mindset" according to this book. In the beginning, the one with "fit mindset" has higher "fit" feeling but later the one with "developing mindset" will catch up with them. The important point here is there is a way to cultivate your passion. Apply to a writing Let me also apply this "developing mindset" to writing. One might think a miracle or tragetic events to write would fit you one day. They wait or seek for it. However, it might not happen to everyone. On the other hand, if you have a developing mindset, you could nurture great writing out of the usual experience or trivial things. 
Personally, this mindset will help us to be more open-minded and make the writing activity more realistically sustainable.Word of the day: trivial \n'
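
In the output above, some neighbouring blocks run together (for example "passionThere"), because the text nodes are joined with an empty string. One way around this is to join them with a space and then collapse repeated whitespace. The sketch below shows the idea using only the standard-library `html.parser` instead of BeautifulSoup, so it runs without extra packages. `TextCollector` and `html_to_text` are hypothetical names, and the sample HTML is made up.

```python
import re
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp;
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Join with spaces so adjacent blocks stay separate, then collapse
    # runs of whitespace (including non-breaking spaces) into one space.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()

sample_html = "<h2>Find passion vs. nurture passion</h2><p>There is one&nbsp;analogy.</p>"
print(html_to_text(sample_html))
```

With BeautifulSoup, `soup.get_text(separator=' ')` should give a similar result. Inline tags also gain surrounding spaces this way, which is harmless when the goal is counting words.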

Generate a word cloud with default parameters

In [7]:
# Create and generate a word cloud image:
wordcloud = WordCloud().generate(all_text)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
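
With default settings, the size of each word in the cloud mainly reflects how often it appears in the text, after common stop words are dropped. The sketch below approximates that counting step with the standard library only. It is not wordcloud's own tokenizer, and `word_frequencies`, `SMALL_STOPWORDS`, and the sample sentence are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; wordcloud ships a much longer one
SMALL_STOPWORDS = {"the", "a", "to", "you", "what", "and", "of", "in"}

def word_frequencies(text, stopwords=SMALL_STOPWORDS):
    """Count lowercase words, skipping stop words and single characters."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords and len(w) > 1)

sample_text = "Find what you love. Nurture your passion, and the passion will grow."
freqs = word_frequencies(sample_text)
print(freqs.most_common(3))
```

A dictionary like this can be passed to `WordCloud().generate_from_frequencies(freqs)` in place of `generate(all_text)`, which is handy when you want full control over the counting.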

Generate a word cloud with specific parameters

In [8]:
# Create and generate a word cloud image:

param_wordcloud = {'max_font_size':30, 
                   'max_words':80, 
                   'background_color':"white"}
wordcloud = WordCloud(**param_wordcloud).generate(all_text)

# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
In [9]:
# 200WaD Logo to overlay
image_filename = '200WaD_400x400.jpg'
Image.open(f"../images/{image_filename}")
Out[9]:
In [13]:
# Create a mask from image
mask = np.array(Image.open(f"../images/{image_filename}"))


# Set parameters
param_wordcloud = {'max_font_size':80, 
                   'max_words':200, 
                   'background_color':"white", 
                   'mask': mask}

# Create and generate a word cloud
wordcloud = WordCloud(**param_wordcloud).generate(all_text)

# Display the generated image
plt.figure(figsize=[6,6])
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
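
As I understand the wordcloud documentation, only pure white pixels (value 255) in the mask are treated as "masked out"; every other pixel is free to draw on. A JPEG logo often has near-white pixels from compression, so words can leak into the background. The sketch below snaps near-white pixels to pure white first. `binarize_mask` and the threshold of 240 are assumptions for illustration, and the sample array is made up.

```python
import numpy as np

def binarize_mask(pixels: np.ndarray, threshold: int = 240) -> np.ndarray:
    """Return a uint8 mask: 255 where the image is near-white, 0 elsewhere."""
    # Average the color channels into one brightness value per pixel;
    # a grayscale (2-D) image is used as-is.
    gray = pixels[..., :3].mean(axis=-1) if pixels.ndim == 3 else pixels
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# Made-up 2x2 RGB image: white, near-white, dark, mid-gray
sample = np.array(
    [[[255, 255, 255], [250, 252, 249]],
     [[10, 20, 30], [200, 200, 200]]],
    dtype=np.uint8,
)
clean = binarize_mask(sample)
print(clean)
```

In the notebook, `binarize_mask(mask)` could be passed as the `'mask'` parameter. Keep the original color array for `ImageColorGenerator`, since it samples its colors from the image.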
In [14]:
# Create coloring from image
image_colors = ImageColorGenerator(mask)
plt.figure(figsize=[10,10])
plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation="bilinear");
plt.axis("off");
