DEV Community

datatoinfinity

What is NER in NLP? Real-World Examples and Use Cases Using Python and spaCy

Ever wondered how Google or Siri understands names, places, and brands from a sentence? That's Named Entity Recognition – the secret behind smart machines understanding real-world references!

Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP). It is the process of identifying entities in a text and classifying them into predefined categories such as person, organisation and location.

It is widely used for information extraction, because it lets programs automatically pull structured data out of unstructured text. By recognising named entities, systems can better understand how the different pieces of information in a text relate to each other.

Example

Steve Jobs was a founder of Apple; he started the company on April 1, 1976. The company's headquarters is now in Cupertino, California, United States.

Person: Steve Jobs
Organisation: Apple
Date: April 1, 1976
Location: Cupertino, California, United States
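To make "structured data from unstructured text" concrete, here is a minimal sketch that collects the entities from the example above into a Python dictionary keyed by category. The (text, category) pairs are typed in by hand; in a real program they would come from an NER model such as spaCy's. The helper name group_by_category is hypothetical, chosen just for illustration.

```python
from collections import defaultdict

# Entities from the example above, entered by hand for illustration;
# normally an NER model would produce these (text, category) pairs.
entities = [
    ("Steve Jobs", "Person"),
    ("Apple", "Organisation"),
    ("April 1, 1976", "Date"),
    ("Cupertino, California, United States", "Location"),
]


def group_by_category(pairs):
    """Group entity strings under their category (hypothetical helper)."""
    record = defaultdict(list)
    for text, category in pairs:
        record[category].append(text)
    return dict(record)


record = group_by_category(entities)
print(record)
```

The result is a plain dictionary, so it can be stored as JSON or loaded into a table like any other structured record.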

Of course, you can't read an entire text or corpus by hand when all you want are its key facts. Python libraries such as spaCy ship with pretrained NER models that do this task for you, which makes it easier to build better and more efficient models.

Where Is It Used?

  • Chatbots: assistants such as ChatGPT, Meta AI and Gemini need to identify the relevant entities (people, places, organisations, dates) mentioned in a conversation, which is exactly the task NER addresses.
  • Search engines: NER helps search engines identify and categorise the subjects mentioned on web pages and in search queries.

Code

  • Print the entities of a text
import spacy

nlp=spacy.load('en_core_web_sm')  # small English pipeline; install it with: python -m spacy download en_core_web_sm
text=nlp("Steve Jobs was a founder of Apple, he created his company April 1, 1976. Now company headquarter located in Cupertino,California,United State")

print(text.ents)

text.ents holds the entities spaCy found in the document, as a tuple of Span objects.

Output:

(Steve Jobs, Apple, April 1, 1976, Cupertino, California, United State)
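Each item in text.ents is a spaCy Span, so besides printing it you can read its text, label and character offsets. The sketch below assumes spaCy v3. So that it runs without downloading en_core_web_sm, it uses a blank English pipeline (spacy.blank("en")) and sets the entities by hand instead of predicting them; with the trained model you would loop over the predicted text.ents in the same way. The names blank_nlp and rows are illustrative only.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no trained NER component.
blank_nlp = spacy.blank("en")
doc = blank_nlp("Steve Jobs was a founder of Apple")

# Entities set by hand (token indices: start inclusive, end exclusive).
doc.ents = [Span(doc, 0, 2, label="PERSON"), Span(doc, 6, 7, label="ORG")]

rows = []
for ent in doc.ents:
    # start_char/end_char are character offsets into doc.text
    rows.append((ent.text, ent.label_, ent.start_char, ent.end_char))
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
# Steve Jobs PERSON 0 10
# Apple ORG 28 33
```

The character offsets are what you would store if you need to highlight entities in the original text later.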
  • Print each entity, its label and the label's description: iterate through text.ents and print the entity, its label (ent.label_) and the description returned by spacy.explain().
def show_entities(text):
    if text.ents:
        for ent in text.ents:
            print(ent, '|' , ent.label_, '|' , spacy.explain(ent.label_))
    else:
        print('No Entities Found')

show_entities(text)
Output:
Steve Jobs | PERSON | People, including fictional
Apple | ORG | Companies, agencies, institutions, etc.
April 1, 1976 | DATE | Absolute or relative dates or periods
Cupertino | GPE | Countries, cities, states
California | GPE | Countries, cities, states
United State | GPE | Countries, cities, states
  • Add a new entity with an existing label

Sometimes .ents misses an entity because the model was trained on data that never contained that word, so it cannot recognise it.

d=nlp("Foodles is earning money at an extensive rate")
show_entities(d)  # reuse the show_entities() function defined above
Output:
No Entities Found

Foodles is not identified as an organisation, most likely because this made-up name never appeared in the data the model was trained on.

from spacy.tokens import Span as ss  # ss is an alias for spacy.tokens.Span

d=nlp("Foodles is earning money at an extensive rate")

ORG=d.vocab.strings["ORG"]  # numeric ID of the built-in "ORG" label
new_entity=ss(d,0,1,label=ORG)  # span over token 0 ("Foodles"); end index 1 is exclusive
d.ents=list(d.ents)+[new_entity]  # append to the entities the model already found

show_entities(d)
Output:

Foodles | ORG | Companies, agencies, institutions, etc.
  • ORG = d.vocab.strings["ORG"] gets the numeric ID of the label "ORG" (organisation) from spaCy's string store.
  • ss is an alias for spacy.tokens.Span, so ss(d, 0, 1, label=ORG) means Span(doc, start, end, label=label). The arguments 0 and 1 are token positions: 0 is the start index in the Doc and 1 is the end index, which is exclusive, so the span covers only the first token, "Foodles".
  • d.ents holds the entities the model has already found (such as "Google" as ORG or "India" as GPE); d.ents = list(d.ents) + [new_entity] adds your new entity to that list.
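Two details from the list above are easier to see in code. First, because the end index is exclusive, Span(doc, 3, 4) covers the same token as the slice doc[3:4]. Second, d.ents cannot contain overlapping spans: if a new span overlaps an entity that is already there, assigning d.ents raises a ValueError. One way to handle this is spacy.util.filter_spans, which keeps the longest span wherever spans overlap. This is a sketch assuming spaCy v3; it uses a blank pipeline, and the hand-set PERSON span stands in for a model prediction. Variable names such as person, jobs and rejected are illustrative.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

blank_nlp = spacy.blank("en")  # tokenizer only, no trained NER
doc = blank_nlp("Steve Jobs founded Foodles")

# The end index is exclusive: Span(doc, 3, 4) covers the same token as doc[3:4].
foodles = Span(doc, 3, 4, label="ORG")
print(foodles.text, doc[3:4].text)  # Foodles Foodles

# Stand-in for the model's prediction: "Steve Jobs" (tokens 0 and 1) as PERSON.
person = Span(doc, 0, 2, label="PERSON")
doc.ents = [person]

# "Jobs" (token 1) overlaps the existing PERSON entity.
jobs = Span(doc, 1, 2, label="ORG")

rejected = False
try:
    doc.ents = [person, jobs]
except ValueError:
    rejected = True  # spaCy refuses overlapping entities
print("overlap rejected:", rejected)

# filter_spans keeps the longest span wherever spans overlap.
doc.ents = filter_spans([person, jobs, foodles])
result = [(ent.text, ent.label_) for ent in doc.ents]
print(result)  # [('Steve Jobs', 'PERSON'), ('Foodles', 'ORG')]
```

Preferring the longest span is only one policy; if your own labels should win, you would need to remove the conflicting model entities yourself before adding yours.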

Make a new label for an entity

d=nlp("Playing Cricket and Football are both good for health")

show_entities(d)  # reuse the show_entities() function defined above
Output:

No Entities Found

As you can see, Cricket and Football get no label, because sports are not among the entity types the model was trained to recognise.

from spacy.matcher import PhraseMatcher
from spacy.tokens import Span as ss  # ss is an alias for spacy.tokens.Span

d=nlp("Playing Cricket and Football are both good for health")

m=PhraseMatcher(nlp.vocab)

phrase=['Football','Cricket']
patterns=[nlp(word) for word in phrase]
m.add('sports',patterns)  # spaCy v3 signature: add(key, list_of_pattern_docs)
found=m(d)
sport=d.vocab.strings.add("Sports")  # register the custom label and get its numeric ID
new_ents=[ss(d,match[1],match[2],label=sport) for match in found]
d.ents=list(d.ents)+new_ents

show_entities(d)
Output:
Cricket | Sports | None
Football | Sports | None
  1. from spacy.matcher import PhraseMatcher imports a tool that finds exact phrases such as "Cricket" or "Football" in a text. The second import makes ss a short alias for spacy.tokens.Span.
  2. d = nlp("Playing Cricket and Football are both good for health") runs the pipeline on the sentence, which among other things tokenises it into words.
  3. m = PhraseMatcher(nlp.vocab) creates a PhraseMatcher that shares the pipeline's vocabulary.
  4. phrase = ['Football', 'Cricket'] lists the words you want to tag as entities.
  5. patterns = [nlp(word) for word in phrase] converts each word into a spaCy Doc object, which is the pattern format the matcher expects.
  6. m.add('sports', patterns) adds your patterns to the matcher under the key "sports". Older spaCy v2 code wrote this as m.add('sports', None, *patterns); in spaCy v3 the second argument must be a list of patterns.
  7. found = m(d) runs the matcher on the sentence d.

It returns a list of (match_id, start, end) tuples, for example:

[(match_id, 1, 2), (match_id, 3, 4)]

Here, tokens 1 to 2 (end exclusive) are "Cricket", and tokens 3 to 4 are "Football".

  8. sport = d.vocab.strings.add("Sports") adds your custom label "Sports" to spaCy's string store and returns its numeric ID.
  9. new_ents = [ss(d, match[1], match[2], label=sport) for match in found] creates a new entity Span for each match: match[1] is the start token, match[2] is the end token, and label=sport labels the span "Sports".
  10. d.ents = list(d.ents) + new_ents adds the new entities to the ones already found in the sentence.
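Matching phrases and patching d.ents by hand works for one sentence, but it has to be repeated for every new Doc. A common alternative in spaCy v3 is the rule-based entity_ruler pipeline component, which adds entities from patterns every time the pipeline runs. This is a sketch assuming spaCy v3; it uses a blank pipeline so it runs without a downloaded model (with en_core_web_sm you would typically call add_pipe("entity_ruler", before="ner") so the rules take priority over the statistical model). The token patterns use LOWER, so unlike the exact-text PhraseMatcher above they match "cricket" regardless of case. The names nlp_rules and found are illustrative.

```python
import spacy

nlp_rules = spacy.blank("en")  # tokenizer only; the ruler supplies the entities
ruler = nlp_rules.add_pipe("entity_ruler")

ruler.add_patterns([
    # Phrase pattern: matches the exact text "Foodles".
    {"label": "ORG", "pattern": "Foodles"},
    # Token patterns: match on the lowercase form of the token.
    {"label": "Sports", "pattern": [{"LOWER": "cricket"}]},
    {"label": "Sports", "pattern": [{"LOWER": "football"}]},
])

doc = nlp_rules("Foodles sponsors cricket and Football")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)  # [('Foodles', 'ORG'), ('cricket', 'Sports'), ('Football', 'Sports')]
```

Because the rules live inside the pipeline, every text passed to nlp_rules gets these entities without any extra code.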

Want to learn about Part-of-Speech (POS) tagging?
Learn About Word Cloud in NLP
