datatoinfinity

Spell Checker: Predicting the Correct Word with Two Edits (NLP)

We have already built a spell checker that works with one edit: if a word has one incorrect character, it returns the correct word. Now we will extend it to two edits, so that it can also handle words with two incorrect characters.

One-Edit Way:

spell_checker('famili')

Only one character is wrong, so the checker finds the closest known word:

['family']

spell_checker('familea')

Now two characters are wrong, so it returns an empty list:

[]
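The one-edit checker itself is not shown in this post, so here is a minimal sketch of how it might behave. Everything in it is an assumption for illustration: `edit_sketch`, `spell_checker_sketch`, and the toy vocabulary with made-up probabilities are hypothetical stand-ins for the real `edit()`, `spell_checker()`, and `word_probability`.

```python
import string

# Hypothetical toy vocabulary with made-up probabilities (not the article's data).
toy_word_probability = {"family": 0.5, "failed": 0.3, "famine": 0.2}

def edit_sketch(word):
    """Strings reachable from `word` by one delete, swap, replace, or insert.

    Assumed stand-in for the article's edit(); the real one may differ.
    """
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def spell_checker_sketch(word):
    """Keep only the one-edit candidates that are known words, most probable first."""
    found = [w for w in edit_sketch(word) if w in toy_word_probability]
    return sorted(found, key=toy_word_probability.get, reverse=True)

print(spell_checker_sketch("famili"))   # one wrong character
print(spell_checker_sketch("familea"))  # two wrong characters
```

With this toy vocabulary the first call finds 'family', while the second finds nothing, because every known word is at least two edits away from 'familea'.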

Two-Edit Way:

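import pandas as pd

# Assumes edit() and word_probability are already defined by the one-edit spell checker.
def spell_check_edit_2(word, count=5):
    output = []
    suggested_words = set(edit(word))  # Level-1 edits (as a set)

    for e1 in edit(word):
        suggested_words.update(edit(e1))  # Level-2 edits added

    # Keep only the candidates that are known words, paired with their probability
    for wrd in suggested_words:
        if wrd in word_probability:
            output.append([wrd, word_probability[wrd]])

    # Sort by probability and return the `count` most probable words
    return list(
        pd.DataFrame(output, columns=['word', 'prob'])
        .sort_values(by='prob', ascending=False)
        .head(count)['word'].values
    )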

The rest of the code works the same as the one-edit version. The new parts are:

suggested_words = set(edit(word))

  • Calls the edit() function to get all 1-edit-away words (insert/delete/replace/swap).
  • Converts the result into a set to:
    • Remove duplicates
    • Allow fast lookups and union operations
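To make the set behaviour concrete, here is a tiny illustration. The candidate strings are made up for the example; a real `edit()` call would return far more of them.

```python
# Made-up candidates, not real output of edit().
level_1 = ["famile", "familea", "famile", "familya"]

suggested = set(level_1)             # the duplicate "famile" collapses
print(len(level_1), len(suggested))  # 4 3

# update() accepts any iterable and merges it in place,
# skipping items that are already in the set.
suggested.update(["family", "famile"])
print("family" in suggested)         # True
print(len(suggested))                # 4
```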

for e1 in edit(word):

  • Loops through each word that is 1 edit away.
  • Each e1 is an intermediate string, usually not a real word, that may be just one more edit away from the correct word.

suggested_words.update(edit(e1))

  • Calls edit() again to get all words that are 2 edits away (edit of an edit).
  • Adds them into suggested_words using .update() (which merges sets).
  • After this, suggested_words contains both:
    • All 1-edit-away words
    • All 2-edit-away words

Now try the word that returned an empty list with one edit:

spell_check_edit_2("familea")

Output:
['family', 'familiar', 'failed', 'families', 'famine']
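To try the idea without the full dataset, here is a self-contained sketch of the two-edit lookup. It is not the article's exact code: `edit_sketch` is an assumed stand-in for `edit()`, the vocabulary and its probabilities are made up, and plain `sort()` is swapped in for the pandas DataFrame so that nothing beyond the standard library is needed.

```python
import string

# Hypothetical toy vocabulary with made-up probabilities (not the article's data).
toy_word_probability = {"family": 0.5, "farmer": 0.4, "failed": 0.3, "famine": 0.2}

def edit_sketch(word):
    """Strings reachable from `word` by one delete, swap, replace, or insert (assumed edit())."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def spell_check_edit_2_sketch(word, count=5):
    level_1 = edit_sketch(word)             # one edit away
    candidates = set(level_1)
    for e1 in level_1:
        candidates.update(edit_sketch(e1))  # edit of an edit: two edits away
    # Keep known words only, most probable first.
    known = [w for w in candidates if w in toy_word_probability]
    known.sort(key=toy_word_probability.get, reverse=True)
    return known[:count]

print(spell_check_edit_2_sketch("familea"))
```

In this toy vocabulary 'farmer' has a high probability but is more than two edits away from 'familea', so it never enters the candidate set. Probability only ranks the words that are within two edits.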
