This article covers two simple ways to parse text in Python. Given a string such as
>>> line = 'aaa bbb ccc'
we want to split it into substrings and build new strings from its parts.
Slice string
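The first method is slicing: extracting the text piece by piece. Work out the offsets of the field you want, then extract it with the slice notation line[start:end]. The start offset is included, the end offset is excluded, and leaving out the end means "up to the end of the string". Example: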
>>> line = 'aaa bbb ccc'
>>> col1 = line[0:3]
>>> col3 = line[8:]
>>> col1
'aaa'
>>> col3
'ccc'
>>>
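When a line has several fixed-width fields, the offsets can be given names instead of being repeated inline. The following is a minimal sketch using Python's built-in slice objects. The column names, the offsets, and the helper parse_fixed_width are hypothetical and chosen only to match the sample line.

```python
# Hypothetical column layout for the sample line 'aaa bbb ccc'.
# The names and offsets are assumptions made for this illustration.
COLUMNS = {
    "col1": slice(0, 3),
    "col2": slice(4, 7),
    "col3": slice(8, None),  # None means "up to the end of the string"
}


def parse_fixed_width(line, columns=COLUMNS):
    """Return a dict mapping each column name to its slice of the line."""
    return {name: line[span] for name, span in columns.items()}


record = parse_fixed_width("aaa bbb ccc")
print(record)  # {'col1': 'aaa', 'col2': 'bbb', 'col3': 'ccc'}
```

Keeping the layout in one place means a changed column width only has to be corrected once.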
However, this is impractical for long strings or for fields whose positions vary, because every offset has to be worked out by hand. Many developers use the .split() method instead.
Split function
The split() method turns a string into a list of strings. Called without arguments, it splits on runs of whitespace, so every word in a sentence becomes a list item.
>>> line = 'aaa bbb ccc'
>>> a = line.split()
>>> a
['aaa', 'bbb', 'ccc']
>>> a[0]
'aaa'
>>> a[1]
'bbb'
>>> a[2]
'ccc'
>>>
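The separator argument changes the result in ways that are easy to miss. The sketch below compares split() with no argument, split(' ') with a single space, and split('') with an empty string. The sample line with irregular whitespace is made up for illustration.

```python
line = "aaa  bbb\tccc\n"  # two spaces, a tab, and a trailing newline

# No argument: split on runs of any whitespace and drop empty strings.
default_parts = line.split()

# Explicit single space: every space is a separator, so the doubled
# space yields an empty string, and the tab and newline stay in the data.
space_parts = line.split(" ")

# An empty separator is not allowed and raises ValueError.
try:
    line.split("")
    empty_separator_rejected = False
except ValueError:
    empty_separator_rejected = True

print(default_parts)             # ['aaa', 'bbb', 'ccc']
print(space_parts)               # ['aaa', '', 'bbb\tccc\n']
print(empty_separator_rejected)  # True
```

For whitespace-separated words, calling split() with no argument is usually the safer choice.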
You can also split on a specific character by passing it to split(). This can be a comma, a dash, a semicolon, or even a period (to break text into sentences).
>>> line = 'aaa,bbb,ccc'
>>> a = line.split(',')
>>> a
['aaa', 'bbb', 'ccc']
>>>
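If the fields are separated by a comma followed by a space, splitting on ',' alone leaves the space attached to the following field. The sketch below shows one common way to handle this with str.strip(), and also the optional maxsplit argument of split(). The variable names are illustrative only.

```python
line = "aaa, bbb, ccc"

# Splitting on the comma alone keeps the space that follows each comma.
raw = line.split(",")

# Stripping each field removes the surrounding whitespace.
cleaned = [field.strip() for field in line.split(",")]

# maxsplit=1 stops after the first separator: one field plus the remainder.
first, rest = line.split(",", 1)

print(raw)      # ['aaa', ' bbb', ' ccc']
print(cleaned)  # ['aaa', 'bbb', 'ccc']
print(first)    # aaa
print(rest)     # ' bbb, ccc' without the quotes
```

For data with quoted fields or embedded commas, the standard library csv module is probably a better fit than split().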
Top comments (1)
Just don't use split to break text into words. It will not work well with punctuation or Asian languages like Chinese or Japanese. In JavaScript there is a special object for this use case: dev.to/kamonwan/the-right-way-to-b...
Perhaps, in Python there is something similar?