Split text by punctuation
Using nltk
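Run NLTK's one-time setup to download its models, including the punkt
model used for sentence parsing (also called sentence tokenizing).
Calling nltk.download() with no arguments opens the interactive downloader
shown here; nltk.download('punkt') fetches just that model: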
import nltk
nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> q
True
Then load the punkt tokenizer and use it:
import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
s = "hello, world! It's me, X! Testing this tool."
tokenizer.tokenize(s)
['hello, world!', "It's me, X!", 'Testing this tool.']
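
If installing NLTK is not an option, a rough approximation is possible with the standard library alone. The sketch below is not the punkt model: it is a plain regular-expression split after '.', '!' or '?' followed by whitespace, and the function name split_sentences is made up for illustration. It should reproduce the result above for this simple input, but unlike punkt it has no notion of abbreviations.

```python
import re

def split_sentences(text):
    """Naive sentence splitter (hypothetical helper, not part of NLTK)."""
    # Split on whitespace that follows '.', '!' or '?'. The lookbehind
    # keeps the punctuation attached to the sentence it ends.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop empty strings, which only occur for empty input.
    return [p for p in parts if p]

s = "hello, world! It's me, X! Testing this tool."
print(split_sentences(s))
```

Because the rule is purely punctuation-based, text such as "Dr. Smith left." is split after "Dr.", which is the kind of case the trained punkt model is meant to handle better.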