What we have done so far:
- Imported the corpus.
- Tokenized the text into words.
- Made a list of the tokenized words.
- Performed operations on the text.
From that corpus we built a dictionary of words, in other words a list of known words we can use to check whether a word exists, and therefore whether its spelling is correct.
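The corpus and dictionary steps above can be sketched as follows. This is a minimal sketch, not the code used earlier in the series. It assumes the corpus is a plain string, tokenizes it with a simple lowercase regex, and stores each word's relative frequency in a dict named `word_probability` (the name this post uses later). `tokenize` and `build_word_probability` are hypothetical helper names, and the sample corpus is made up.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase the text and keep runs of letters as words.
    # The earlier steps may have used a different tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def build_word_probability(text):
    # Hypothetical helper: count each word, then divide by the total number of
    # tokens so the values form a probability distribution over the vocabulary.
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Made-up sample corpus for illustration only.
corpus = "I love books. I love reading books, and I live to read."
word_probability = build_word_probability(corpus)

print("love" in word_probability)  # True: correctly spelled words are in the dictionary
print("lve" in word_probability)   # False: a misspelling is not
```

Membership in `word_probability` is the "does this word exist?" check; the stored probabilities become useful later when choosing between several candidate fixes.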
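```python
def edit(word):
    # Every unique candidate produced by one insertion, deletion, adjacent swap or replacement
    return set(insert(word) + delete(word) + swap(word) + replace(word))
```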
- Calling your four functions:
  - `insert(word)`
  - `delete(word)`
  - `swap(word)`
  - `replace(word)`

  Each one returns a list of new words created by, respectively:
  - inserting one character
  - deleting one character
  - swapping two adjacent characters
  - replacing one character
- Combining them:
  - `insert(...) + delete(...) + ...` concatenates them into one big list of all the variations.
- Converting to a set:
  * removes duplicates
  * leaves only the unique words that are one change away
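The four helpers are not shown in this section, so here is one possible sketch of them. It assumes each helper returns a list and that the alphabet is lowercase a-z (`LETTERS` is an assumed name). The definitions from earlier in the series may differ in details such as ordering, but with these versions the list sizes for `"lve"` come out as 104, 3, 2 and 78.

```python
import string

LETTERS = string.ascii_lowercase  # assumption: lowercase a-z only


def insert(word):
    # Insert every letter at every position, including both ends.
    return [word[:i] + c + word[i:] for i in range(len(word) + 1) for c in LETTERS]


def delete(word):
    # Remove the character at each position.
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def swap(word):
    # Swap each pair of adjacent characters.
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:] for i in range(len(word) - 1)]


def replace(word):
    # Replace each character with every letter. This includes the letter itself,
    # so the original word also appears in the result.
    return [word[:i] + c + word[i + 1:] for i in range(len(word)) for c in LETTERS]


def edit(word):
    # Every unique candidate produced by one edit (the set also contains `word`
    # itself, via replace).
    return set(insert(word) + delete(word) + swap(word) + replace(word))


print(len(insert("lve")), len(delete("lve")), len(swap("lve")), len(replace("lve")))  # 104 3 2 78
print("love" in edit("lve"))  # True
```

Building plain lists first and converting to a set only at the end matches the description above: duplicates such as `'llve'` (inserting `'l'` before or after the existing `'l'`) survive in the lists and are dropped by `set(...)`.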
Example: `edit("lve")`
- `insert("lve")` might return `'alve'`, `'blve'`, ..., `'lave'`, `'love'`, ..., `'lvez'` (104 in total).
- `delete("lve")` might return `'ve'`, `'le'`, `'lv'`.
- `swap("lve")` might return `'vle'`, `'lev'`.
- `replace("lve")` might return `'ave'`, `'bve'`, ..., `'lve'`, `'lze'`, ... (78 in total: 3 positions × 26 letters).

The combined `set(...)` then removes the overlaps.
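Assuming a 26-letter alphabet, each list size follows directly from how many positions the operation can act on. For a word $w$ of length $n$ (here `"lve"`, so $n = 3$):

$$
\begin{aligned}
|\text{insert}(w)| &= 26\,(n+1) = 26 \cdot 4 = 104 \\
|\text{delete}(w)| &= n = 3 \\
|\text{swap}(w)| &= n - 1 = 2 \\
|\text{replace}(w)| &= 26\,n = 26 \cdot 3 = 78
\end{aligned}
$$

That is 187 entries before deduplication. For `"lve"` the set keeps 182: three insertions repeat (`'llve'`, `'lvve'` and `'lvee'` each appear twice) and `replace` produces `'lve'` itself three times.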
What the function does:
- Simulates every typo a person could make with one small mistake.
- Generates all possible "fixes" for a misspelled word.
- Gives you candidates to compare against your real dictionary (`word_probability`) so you can pick the best suggestion.
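A minimal sketch of that final comparison step, assuming `word_probability` is a dict from word to probability and that "best" means the most probable candidate that is a real word. `best_suggestion` is a hypothetical name; it takes the candidate set as an argument (normally the output of `edit(word)`) so the snippet runs on its own, and the probabilities and candidates below are made up.

```python
def best_suggestion(word, candidates, word_probability):
    # Hypothetical helper. A correctly spelled word needs no fix.
    if word in word_probability:
        return word
    # Keep only the "fixes" that are real dictionary words.
    known = [w for w in candidates if w in word_probability]
    if not known:
        return word  # no one-edit fix found; leave the word unchanged
    # Pick the real word with the highest probability.
    return max(known, key=word_probability.get)


# Made-up probabilities and a hand-picked subset of edit("lve") for illustration.
word_probability = {"love": 0.02, "live": 0.01, "lave": 0.0001}
candidates = {"love", "live", "lave", "lvez", "lev"}

print(best_suggestion("lve", candidates, word_probability))  # love
```

Filtering by dictionary membership discards non-words such as `'lvez'`, and `max(..., key=word_probability.get)` breaks the tie between several valid fixes (`'love'`, `'live'`, `'lave'`) in favour of the most common one in the corpus.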