DEV Community

Discussion on: Need help python text matching

Collapse
 
siawayforward profile image
Sia • Edited

Gotcha! Then you might be able to combine the two. NLTK has a collocations module where you are able to see if a word or phrase is in a block of text and get that word along with n closest terms to it on either side. If you have both full words and abbreviations e.g. NYC and New York City, you could use a method to transform and then use collocations to check for both forms of something from table 1 in table 2. If I remember correctly, it is part of nltk.collocations. There are methods for bi, tri, quad grams. Still new here so not entirely sure how to share a code snippet, but I hope this jolts some ideas.
P.S. for conjoined words that you may want to split (if there is cleaning involved), there is a wordninja module that does this: e.g. iamtiredtoday -> I am tired today.
Good luck!