Tokenize Text
Libraries
In [5]:
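# Both tokenizers rely on NLTK's Punkt sentence models. If they are missing, fetch them
# once with nltk.download("punkt") (recent NLTK releases use "punkt_tab" instead).
from nltk.tokenize import sent_tokenize, word_tokenize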
Create a paragraph
In [8]:
paragraph = "Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects."
paragraph
Out[8]:
Tokenize a paragraph into sentences
In [9]:
sent_tokenize(paragraph)
Out[9]:
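To get a feel for what sentence tokenization has to handle, here is a minimal sketch that splits on sentence-ending punctuation with a regular expression. This is a naive stand-in, not the Punkt algorithm that sent_tokenize uses; the function name naive_sent_split and the sample strings are made up for illustration.

```python
import re

def naive_sent_split(text):
    # Hypothetical helper: split wherever '.', '!' or '?' is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = ("Python is an interpreted language. "
          "It was first released in 1991. "
          "Its design emphasizes readability")
sentences = naive_sent_split(sample)
print(sentences)

# The naive rule cannot tell an abbreviation from a sentence end,
# so "Dr." is cut off as if it were a sentence of its own.
tricky = naive_sent_split("Dr. Smith arrived. He sat down.")
print(tricky)
```

The second call shows why a trained tokenizer is usually preferable: a plain punctuation rule breaks on abbreviations such as "Dr.".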
Tokenize a paragraph into words
In [10]:
word_tokenize(paragraph)
Out[10]:
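For comparison, a rough regex-based word splitter can be sketched in a few lines. This is only an approximation under simple assumptions (runs of word characters are tokens, every other non-space character is its own token); it is not how word_tokenize works internally, and the name naive_word_split is hypothetical.

```python
import re

def naive_word_split(text):
    # Hypothetical helper: runs of word characters become tokens,
    # and each remaining non-space character is a token of its own.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_word_split("Python's design, first released in 1991.")
print(tokens)
```

This pattern splits more aggressively than NLTK: "Python's" becomes three tokens here, whereas word_tokenize yields "Python" and "'s".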