DEV Community

datatoinfinity

spaCy Library for NLP

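Think of spaCy as a smarter alternative to NLTK: it ships pre-trained pipelines that handle tokenization, tagging, and parsing in a single call. Let's start by installing it.

You can use Google Colab to avoid the hassle of a local installation.

Otherwise, run this in your code editor's terminal:

pip install spacy

Then import it in Python:

import spacy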

en_core_web_lg is a large pre-trained English pipeline for spaCy. It provides tokenization, part-of-speech tagging, dependency parsing, and named entity recognition. Download it once from the terminal:

python -m spacy download en_core_web_lg

Then load it in Python with spacy.load, which returns the pipeline object used in the rest of this post:

import spacy
nlp = spacy.load('en_core_web_lg')
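
If the model has not been downloaded yet, spacy.load fails with an OSError. The sketch below is one possible way to handle that, assuming spaCy v3; load_pipeline is a hypothetical helper name, not part of spaCy, and the fallback to a blank English pipeline is only an illustration (a blank pipeline can tokenize but has no tagger, parser, or NER).

```python
import spacy


def load_pipeline(name="en_core_web_lg"):
    """Hypothetical helper: return the named pipeline, or a blank English one."""
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises OSError when the model package cannot be found.
        print(f"Model '{name}' not found; run: python -m spacy download {name}")
        return spacy.blank("en")


nlp = load_pipeline()
# Lists the components in the pipeline; empty for the blank fallback.
print(nlp.pipe_names)
```

Checking nlp.pipe_names is a quick way to confirm whether you got the full pipeline or the tokenizer-only fallback.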

Tokenisation

nltk.tokenize

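import nltk
from nltk.tokenize import word_tokenize
# One-time setup: nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab')
txt = "Hello How it going U.S.A."
print(word_tokenize(txt))
Output:
['Hello', 'How', 'it', 'going', 'U.S.A', '.']

NLTK's word_tokenize splits the final full stop off "U.S.A." and emits it as a separate token.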

spaCy tokenizer

import spacy
nlp = spacy.load('en_core_web_lg')
text = nlp("Hello How it going U.S.A.")
for token in text:
    print(token.text)
Output:
Hello
How
it
going
U.S.A.

spaCy does not split the trailing full stop: "U.S.A." stays a single token.
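
spaCy's tokenizer is rule-based, so this behaviour can also be checked without the large model. The sketch below assumes spaCy v3 and uses a blank English pipeline, which carries the English tokenizer rules but no trained components. The is_punct check is an extra illustration that no stand-alone full stop token is produced.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because it still
# carries the rule-based English tokenizer (but no tagger, parser, or NER).
nlp = spacy.blank("en")
doc = nlp("Hello How it going U.S.A.")

tokens = [token.text for token in doc]
print(tokens)

# is_punct is a lexical attribute, so it works without a trained model.
punct_tokens = [token.text for token in doc if token.is_punct]
print(punct_tokens)
```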

Here is a question for you.
txt = nlp("I can't came there")
for token in txt:
    print(token.text)
Output:
I
ca
n't
came
there
Why does spaCy split "can't" into the two tokens "ca" and "n't", and how would you handle it?
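
One possible answer, sketched under the assumption of spaCy v3: the English tokenizer has built-in exceptions that split contractions so that the verb and the negation can be tagged separately, and each piece carries a normalized form in token.norm_. If you would rather keep "can't" whole, the tokenizer's add_special_case method can override the rule for that exact string. A blank English pipeline is used here only to keep the example light.

```python
import spacy
from spacy.symbols import ORTH

# Option 1: keep the split and read the normalized form of each piece.
nlp = spacy.blank("en")
doc = nlp("I can't came there")
pairs = [(token.text, token.norm_) for token in doc]
print(pairs)

# Option 2: register a special case so this exact string stays one token.
# A separate pipeline is used so the default rules above stay untouched.
nlp_custom = spacy.blank("en")
nlp_custom.tokenizer.add_special_case("can't", [{ORTH: "can't"}])
merged = [token.text for token in nlp_custom("I can't came there")]
print(merged)
```

The special case only covers the literal string "can't"; other contractions such as "don't" would still be split unless they are registered too.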

Part of Speech (POS)

import spacy
nlp = spacy.load('en_core_web_lg')
text = nlp("Hello How it going U.S.A. we are 83 block")
for token in text:
    print(token.text, token.pos)
Output:
Hello 91
How 98
it 95
going 100
U.S.A. 96
we 95
are 87
83 93
block 92

These numbers are the integer IDs that spaCy assigns to each part-of-speech tag; token.pos returns the ID rather than the readable label.

Use token.pos_ (with a trailing underscore) to get the readable label:

import spacy
nlp = spacy.load('en_core_web_lg')
text = nlp("Hello How it going U.S.A. we are 83 block")
for token in text:
    print(token.text, token.pos_)
Output:
Hello INTJ
How SCONJ
it PRON
going VERB
U.S.A. PROPN
we PRON
are AUX
83 NUM
block NOUN

Now you can see that "Hello" is an interjection, "it" is a pronoun, and so on.
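
To connect the integer IDs with the labels, here is a small sketch, assuming spaCy v3. It looks the IDs from the first output up in the pipeline's string store and asks spacy.explain for a plain-English description of each label. The pos_ids list is copied from the output above rather than computed, so no trained model is needed.

```python
import spacy

# A blank pipeline is used only for its string store (vocab.strings).
nlp = spacy.blank("en")

# Integer POS IDs copied from the token.pos output above.
pos_ids = [91, 98, 95, 100, 96, 87, 93, 92]

# The string store maps each built-in symbol ID back to its label.
labels = [nlp.vocab.strings[pos_id] for pos_id in pos_ids]

# spacy.explain returns a short description for a known label.
descriptions = {label: spacy.explain(label) for label in labels}
for pos_id, label in zip(pos_ids, labels):
    print(pos_id, label, descriptions[label])
```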

Sentence Tokenisation

s = nlp("This is the first sentence. I gave given fullstop please check. Let's study now")
for sentence in s.sents:
    print(sentence)
Output:
This is the first sentence.
I gave given fullstop please check.
Let's study now
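
With the full model, the sentence boundaries come from the trained pipeline. If you only need simple punctuation-based splitting, a lighter alternative is spaCy's rule-based sentencizer component. The sketch below assumes spaCy v3, where built-in components are added by name with add_pipe, and uses a blank English pipeline instead of en_core_web_lg.

```python
import spacy

# Blank pipeline plus the rule-based sentencizer: no trained model required.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is the first sentence. I gave given fullstop please check. Let's study now")

# doc.sents yields one Span per detected sentence.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

The sentencizer only looks at sentence-final punctuation, so it is faster but less robust than the parser-based boundaries on text with unusual punctuation.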
