Spell Checker-Predicting Correct Word-NLP-Part 2

#nlp #machinelearning #devto #python

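import pandas as pd

# edit() and word_probability are defined earlier
def spell_checker(word,count=5):
    output=[]
    suggested_words=edit(word)  # every string one edit away from word
    for wrd in suggested_words:
        if wrd in word_probability.keys():  # keep only words found in the corpus
            output.append([wrd,word_probability[wrd]])
    # rank by probability and return the top `count` words
    return list(pd.DataFrame(output,columns=['word','prob']).sort_values(by='prob',ascending=False).head(count)['word'].values)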

Let's break it down step by step.

def spell_checker(word,count=5):

Defines a function called spell_checker.
word is the misspelled word you want to correct.
count is the number of top suggestions to return (default = 5).

output=[]

Initializes an empty list to store valid suggested words with their probabilities.

suggested_words=edit(word)

Calls the edit() function which is defined earlier.

def edit(word):
    return set(insert(word) + delete(word) + swap(word) + replace(word))

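This returns a set of all strings that are one edit (one insert, delete, swap, or replace) away from the input word. Most of them are not real words; the next step filters those out.
Example: For "lve" the set includes 'love', 'live', 'lave', ... along with non-words such as 'lvee'.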
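The four helpers are not shown in this part, so here is a minimal sketch of how they might look. The split() helper and the lowercase a-z alphabet are assumptions made for illustration, not necessarily what the earlier code uses.

```python
import string

LETTERS = string.ascii_lowercase  # assumed alphabet: lowercase a-z only


def split(word):
    # Hypothetical helper: every way to cut word into a (left, right) pair
    return [(word[:i], word[i:]) for i in range(len(word) + 1)]


def delete(word):
    # Drop one character
    return [l + r[1:] for l, r in split(word) if r]


def swap(word):
    # Exchange two adjacent characters
    return [l + r[1] + r[0] + r[2:] for l, r in split(word) if len(r) > 1]


def replace(word):
    # Change one character to any letter
    return [l + c + r[1:] for l, r in split(word) if r for c in LETTERS]


def insert(word):
    # Add one letter at any position
    return [l + c + r for l, r in split(word) for c in LETTERS]


def edit(word):
    # Same shape as the edit() shown above: union of all four edit types
    return set(insert(word) + delete(word) + swap(word) + replace(word))


candidates = edit("lve")
print({"love", "live", "lave"} <= candidates)  # True
```

Each helper returns a list, which is why edit() can join them with + before de-duplicating with set().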

    for wrd in suggested_words:
        if wrd in word_probability.keys():
            output.append([wrd, word_probability[wrd]])

What happens here:

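Loops through each wrd in the set of suggested words.
Checks: is wrd a real word?
- If yes (i.e., it is a key in word_probability, which comes from your big.txt dictionary), it appends the pair [wrd, probability] to the output list.
- If no, the candidate is skipped.
Note that .keys() is optional here: wrd in word_probability gives the same result, because a membership test on a dict checks its keys.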

Example:

If 'love' is in the corpus and has probability 0.0042:

Output: 
[['love', 0.0042], ['live', 0.0021], ...]
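To make the filtering step concrete, here is a small sketch that builds a word_probability dict and filters a few candidates against it. The one-sentence corpus and the relative-frequency formula (count divided by total words) are assumptions for illustration; the real dictionary comes from big.txt.

```python
import re
from collections import Counter

# Toy stand-in for the corpus text (assumption: the real one is big.txt)
corpus = "I love to live and I love to learn"

# Assumed definition: probability = word count / total number of words
words = re.findall(r"[a-z]+", corpus.lower())
counts = Counter(words)
total = sum(counts.values())
word_probability = {w: c / total for w, c in counts.items()}

# Hand-picked sample of what edit("lve") could contain
suggested_words = {"love", "live", "lave", "lvee"}

# Keep only candidates that appear in the corpus, paired with their probability
output = [[w, word_probability[w]] for w in sorted(suggested_words) if w in word_probability]
print(output)
```

Here 'lave' and 'lvee' are dropped because they never occur in the toy corpus, and 'love' ends up with twice the probability of 'live'.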

    return list(pd.DataFrame(output, columns=['word', 'prob']).sort_values(by='prob', ascending=False).head(count)['word'].values)

pd.DataFrame(output, columns=['word', 'prob'])

Converts the list of [word, prob] pairs into a pandas DataFrame:

   word   prob
0  love  0.0042
1  live  0.0021

.sort_values(by='prob', ascending=False)

Sorts the DataFrame so the most frequent (most likely correct) words come first.

.head(count)

Selects the top count rows (default = 5).

['word'].values and list(...)

Extracts just the "word" column as an array and converts it to a Python list.
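The sort, head, and column-extraction steps amount to "rank by probability, keep the first count words". As a rough sketch of the same logic without pandas (the top_words name and the probability for 'lave' are made up for illustration):

```python
# Sample [word, prob] pairs; the value for 'lave' is invented for this example
output = [['live', 0.0021], ['love', 0.0042], ['lave', 0.0001]]


def top_words(pairs, count=5):
    # Sort by probability, highest first (like sort_values(ascending=False))
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    # Keep the first `count` entries (like head) and only the word (like ['word'])
    return [word for word, _ in ranked[:count]]


print(top_words(output, count=2))  # ['love', 'live']
```

For a handful of candidates either approach works; the plain-Python version simply avoids building a DataFrame on every call.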

spell_checker('famili')

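Because edit() only generates candidates that are one edit away from the input, only corpus words at that distance can be returned. family qualifies (replace the final i with y), while words such as familiar, fail, facility, and famine are two or more edits away and cannot appear. If family exists in the corpus, you would likely get:

['family']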
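Since edit() only reaches strings one edit away, it can help to check how far a given word actually is from the input. The sketch below uses an edit-distance function (the optimal string alignment variant, which counts insert, delete, replace, and adjacent swap). It is not part of the spell checker itself, and the edit_distance name is hypothetical.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert, delete, replace, adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # replace (or match)
            )
            # Adjacent swap
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


distances = {w: edit_distance('famili', w)
             for w in ['family', 'familiar', 'fail', 'facility', 'famine']}
print(distances)
# {'family': 1, 'familiar': 2, 'fail': 2, 'facility': 3, 'famine': 2}
```

Only words at distance 1 can come out of spell_checker as written; reaching the others would require applying edit() a second time to each candidate.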
