Ananya S


From Words to Meaning: Core NLP Concepts Every Beginner Must Know

In the previous post, we covered the basics of NLP such as tokenization, stemming, lemmatization, and stop words.

In this continuation, we will look at how machines extract meaning from text and represent language numerically.

1. Named Entity Recognition (NER)

Named Entity Recognition (NER) is an NLP technique used to identify and classify real-world entities in text.

Common entity types:

  1. Person
  2. Organization
  3. Location
  4. Date
  5. Time
  6. Money
  7. Percentage

Example sentence:

Elon Musk is the CEO of Tesla and lives in the USA.


NER output:

Elon Musk → PERSON

Tesla → ORGANIZATION

USA → LOCATION
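
A quick way to try this yourself is spaCy; a minimal sketch, assuming the small English model en_core_web_sm has been downloaded (note that spaCy's label names, such as ORG and GPE, differ slightly from the generic names above):

# Minimal NER sketch with spaCy (assumes: python -m spacy download en_core_web_sm)
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon Musk is the CEO of Tesla and lives in the USA.")

for ent in doc.ents:
    print(ent.text, "→", ent.label_)
# Typical output: Elon Musk → PERSON, Tesla → ORG, USA → GPE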

Why NER is important:

  • Helps extract structured information from unstructured text
  • Used in resume parsing and document processing
  • Widely applied in medical and legal NLP systems
  • Improves search engines and chatbots

2. Bag of Words (BoW)

Bag of Words is one of the simplest techniques to convert text into numbers.

Core idea:

  • Word order is ignored
  • Grammar is ignored
  • Only word frequency matters

Example:

Sentence 1: I love NLP  
Sentence 2: I love AI

Vocabulary:

[I, love, NLP, AI]

Vector representation:

Sentence 1 → [1, 1, 1, 0]

Sentence 2 → [1, 1, 0, 1]
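
As a rough sketch, the same count vectors can be produced with scikit-learn's CountVectorizer (the custom token_pattern keeps one-letter words such as "I", which the default pattern drops, and columns come out in alphabetical order rather than the order shown above):

# Bag of Words with scikit-learn's CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer

sentences = ["I love NLP", "I love AI"]

# token_pattern keeps one-letter tokens like "I"; the default would drop them
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # ['ai' 'i' 'love' 'nlp']
print(X.toarray())
# [[0 1 1 1]
#  [1 1 1 0]]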

Advantages:

  • Very easy to implement
  • Works well for small datasets
  • Useful as a baseline model

Limitations:

  • No understanding of context
  • No semantic meaning
  • Treats all words as equally important

3. TF-IDF (Term Frequency – Inverse Document Frequency)

TF-IDF improves Bag of Words by assigning importance scores to words.

TF-IDF(t, d) = TF(t, d) × IDF(t)

TF(t, d) = (Number of times term t appears in document d) / (Total number of terms in document d)

IDF(t) = log( (Total number of documents) / (Number of documents containing term t) )

Intuition:

  • Words that occur frequently in a document are important.
  • Words that occur in many documents are less important.

Components:

Term Frequency (TF): frequency of a word in a document

Inverse Document Frequency (IDF): rarity of the word across documents
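
A quick sketch using scikit-learn's TfidfVectorizer (scikit-learn uses a smoothed variant of the IDF formula and normalises each document vector, so the exact numbers differ from the plain formula above, but the intuition is the same):

# TF-IDF with scikit-learn (smoothed IDF + L2 normalisation under the hood)
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat",
    "the dog barked",
    "the bird sat",
]

vectorizer = TfidfVectorizer()
scores = vectorizer.fit_transform(docs).toarray()

# Scores for the first document: "the" (in every document) gets the lowest weight,
# "cat" (unique to this document) gets the highest
for word, score in zip(vectorizer.get_feature_names_out(), scores[0]):
    if score > 0:
        print(f"{word}: {score:.3f}")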

Why TF-IDF is better than BoW:

  • Reduces the importance of common words like "the" and "is"
  • Highlights meaningful words

Performs well in:

  1. Search engines
  2. Spam detection
  3. Document similarity tasks

Limitations:

  • Does not capture semantic meaning
  • Synonyms are treated as different words

4. Word2Vec

Word2Vec represents words as dense numerical vectors that capture meaning and context.

Key idea:

Words that appear in similar contexts tend to have similar meanings. Because of this, simple arithmetic on the vectors for King, Queen, Man and Woman produces results like the ones below.

Famous examples:

King − Man + Woman ≈ Queen

Paris − France + Italy ≈ Rome
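
These analogies can be reproduced with the pretrained Google News vectors shipped through gensim's downloader (a rough sketch; the model is a large one-time download of roughly 1.6 GB, and the exact neighbours depend on the pretrained vectors):

# Analogy with pretrained Word2Vec vectors via gensim's downloader
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")   # large one-time download

# King − Man + Woman ≈ ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically returns something like [('queen', 0.71)]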

Word2Vec has two training architectures:

1. CBOW (Continuous Bag of Words)

Predicts a word using surrounding context.

Sentence: "Raj went to school yesterday"
Window size: 1

Input: [Raj, to] → Output: went
Input: [went, school] → Output: to
Input: [to, yesterday] → Output: school

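A tiny plain-Python sketch of how such (context → target) pairs could be generated with a window size of 1 (the helper function name is made up for illustration):

# Generate (context, target) pairs for CBOW with a window of 1
def cbow_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        if len(context) == 2 * window:      # keep only full windows, as in the example
            pairs.append((context, target))
    return pairs

print(cbow_pairs("Raj went to school yesterday".split()))
# [(['Raj', 'to'], 'went'), (['went', 'school'], 'to'), (['to', 'yesterday'], 'school')]
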

Working:

  • The context words are converted to one-hot vectors
  • These vectors are summed or averaged
  • The result is passed through the hidden layer
  • The model predicts the target word
  • The prediction error is calculated
  • Weights are updated using backpropagation
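
To make these steps concrete, here is a minimal NumPy sketch of a single CBOW training step on a toy vocabulary (looking up rows of W_in is equivalent to multiplying one-hot vectors by the input weight matrix):

# One CBOW training step in NumPy (toy sizes, purely illustrative)
import numpy as np

vocab = ["raj", "went", "to", "school", "yesterday"]
V, N = len(vocab), 3                          # vocabulary size, embedding size
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, N))     # input -> hidden (these become the embeddings)
W_out = rng.normal(scale=0.1, size=(N, V))    # hidden -> output

context_ids = [vocab.index("raj"), vocab.index("to")]   # context of "went"
target_id = vocab.index("went")

# Forward pass: average the context embeddings, score every vocabulary word
h = W_in[context_ids].mean(axis=0)            # hidden layer, shape (N,)
scores = h @ W_out                            # shape (V,)
probs = np.exp(scores) / np.exp(scores).sum() # softmax prediction

# Error: cross-entropy gradient at the output layer
grad_scores = probs.copy()
grad_scores[target_id] -= 1.0

# Backpropagation: update both weight matrices
lr = 0.1
grad_h = W_out @ grad_scores                  # gradient flowing back to the hidden layer
W_out -= lr * np.outer(h, grad_scores)
for i in context_ids:
    W_in[i] -= lr * grad_h / len(context_ids)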

2. Skip-Gram

Predicts surrounding words using a target word.
For the same sentence:

Target word: went
Context words: Raj, to

Training pairs:
Input: went → Output: Raj
Input: went → Output: to

Target = to
Context words: went, school

Training pairs:
Input: to → Output: went
Input: to → Output: school

Target = school
Context words: to, yesterday

Training pairs:
Input: school → Output: to
Input: school → Output: yesterday

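A matching plain-Python sketch that emits one (target → context) pair at a time (unlike the hand-worked example above, it also produces pairs for the edge words Raj and yesterday):

# Generate (target, context) pairs for Skip-Gram with a window of 1
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs("Raj went to school yesterday".split()))
# [('Raj', 'went'), ('went', 'Raj'), ('went', 'to'), ('to', 'went'),
#  ('to', 'school'), ('school', 'to'), ('school', 'yesterday'), ('yesterday', 'school')]
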

Working:

  • The target word is converted to a one-hot vector
  • Passed through the hidden layer
  • The model predicts each context word
  • Error is calculated
  • Weights are updated using backpropagation

👉 The hidden layer weights become the word embeddings
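
In practice, both variants are usually trained with a library such as gensim, whose word vectors are exactly those hidden-layer weights. A rough sketch on a toy corpus (real training needs far more text):

# Training Word2Vec with gensim on a toy corpus (real corpora need millions of tokens)
from gensim.models import Word2Vec

corpus = [
    ["raj", "went", "to", "school", "yesterday"],
    ["raj", "loves", "nlp"],
    ["students", "went", "to", "school"],
]

# sg=0 -> CBOW, sg=1 -> Skip-Gram
model = Word2Vec(corpus, vector_size=50, window=1, min_count=1, sg=1, epochs=50)

print(model.wv["school"][:5])             # the dense embedding (hidden-layer weights)
print(model.wv.most_similar("school"))    # nearest words by cosine similarity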

Advantages:

  • Captures semantic relationships
  • Produces dense and meaningful embeddings
  • Useful for clustering and similarity tasks

Limitation:

The same word gets the same vector in every context.

Example:
bank (river) and bank (money)

This limitation is addressed by contextual models like BERT.

When to Use Each Technique?

Use Bag of Words when:

  • Building simple text classifiers
  • Creating baseline NLP models

Use TF-IDF when:

  • Working on search systems
  • Performing document similarity
  • Detecting spam

Use Word2Vec when:

  • Semantic similarity is important
  • Building recommendation systems
  • Clustering text data

Final Thoughts

These techniques show the evolution of NLP from counting words to weighting word importance to understanding semantic meaning.
They form the foundation for modern NLP and Generative AI systems.
