Ananya S


From Words to Meaning: Core NLP Concepts Every Beginner Must Know

In the previous post, we covered the basics of NLP such as tokenization, stemming, lemmatization, and stop words.

In this continuation, we will look at how machines extract meaning from text and represent language numerically.

1. Named Entity Recognition (NER)

Named Entity Recognition (NER) is an NLP technique used to identify and classify real-world entities in text.

Common entity types:

  1. Person
  2. Organization
  3. Location
  4. Date
  5. Time
  6. Money
  7. Percentage

Example sentence:

Elon Musk is the CEO of Tesla and lives in the USA.


NER output:

Elon Musk → PERSON

Tesla → ORGANIZATION

USA → LOCATION
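
A quick way to try this yourself is spaCy; a minimal sketch, assuming the small English model en_core_web_sm has been downloaded (note that spaCy's label names, such as ORG and GPE, differ slightly from the generic names above):

# Minimal NER sketch with spaCy (assumes: python -m spacy download en_core_web_sm)
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon Musk is the CEO of Tesla and lives in the USA.")

for ent in doc.ents:
    print(ent.text, "→", ent.label_)
# Typical output: Elon Musk → PERSON, Tesla → ORG, USA → GPE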

Why NER is important:

  • Helps extract structured information from unstructured text
  • Used in resume parsing and document processing
  • Widely applied in medical and legal NLP systems
  • Improves search engines and chatbots

2. Bag of Words (BoW)

Bag of Words is one of the simplest techniques to convert text into numbers.

Core idea:

  • Word order is ignored
  • Grammar is ignored
  • Only word frequency matters

Example:

Sentence 1: I love NLP  
Sentence 2: I love AI

Vocabulary:

[I, love, NLP, AI]

Vector representation:

Sentence 1 → [1, 1, 1, 0]

Sentence 2 → [1, 1, 0, 1]
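
As a rough sketch, the same count vectors can be produced with scikit-learn's CountVectorizer (the custom token_pattern keeps one-letter words such as "I", which the default pattern drops, and columns come out in alphabetical order rather than the order shown above):

# Bag of Words with scikit-learn's CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer

sentences = ["I love NLP", "I love AI"]

# token_pattern keeps one-letter tokens like "I"; the default would drop them
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # ['ai' 'i' 'love' 'nlp']
print(X.toarray())
# [[0 1 1 1]
#  [1 1 1 0]]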

Advantages:

  • Very easy to implement
  • Works well for small datasets
  • Useful as a baseline model

Limitations:

  • No understanding of context
  • No semantic meaning
  • Treats all words as equally important

3. TF-IDF (Term Frequency – Inverse Document Frequency)

TF-IDF improves Bag of Words by assigning importance scores to words.

TF-IDF(t, d) = TF(t, d) × IDF(t)

TF(t, d) = (Number of times term t appears in document d) / (Total number of terms in document d)

IDF(t) = log( (Total number of documents) / (Number of documents containing term t) )

Intuition:

  • Words that occur frequently in a document are important.
  • Words that occur in many documents are less important.

Components:

Term Frequency (TF): frequency of a word in a document

Inverse Document Frequency (IDF): rarity of the word across documents
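
A quick sketch using scikit-learn's TfidfVectorizer (scikit-learn uses a smoothed variant of the IDF formula and normalises each document vector, so the exact numbers differ from the plain formula above, but the intuition is the same):

# TF-IDF with scikit-learn (smoothed IDF + L2 normalisation under the hood)
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat",
    "the dog barked",
    "the bird sat",
]

vectorizer = TfidfVectorizer()
scores = vectorizer.fit_transform(docs).toarray()

# Scores for the first document: "the" (in every document) gets the lowest weight,
# "cat" (unique to this document) gets the highest
for word, score in zip(vectorizer.get_feature_names_out(), scores[0]):
    if score > 0:
        print(f"{word}: {score:.3f}")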

Why TF-IDF is better than BoW:

  • Reduces the importance of common words like "the" and "is"
  • Highlights meaningful words

Performs well in:

  1. Search engines
  2. Spam detection
  3. Document similarity tasks

Limitations:

  • Does not capture semantic meaning
  • Synonyms are treated as different words

4. Word2Vec

Word2Vec represents words as dense numerical vectors that capture meaning and context.

Key idea:

Words that appear in similar contexts tend to have similar meanings. Because of this, simple arithmetic on the vectors for King, Queen, Man and Woman produces results like the ones below.

Famous examples:

King − Man + Woman ≈ Queen

Paris − France + Italy ≈ Rome
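
These analogies can be reproduced with the pretrained Google News vectors shipped through gensim's downloader (a rough sketch; the model is a large one-time download of roughly 1.6 GB, and the exact neighbours depend on the pretrained vectors):

# Analogy with pretrained Word2Vec vectors via gensim's downloader
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")   # large one-time download

# King − Man + Woman ≈ ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically returns something like [('queen', 0.71)]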

Word2Vec has two training architectures:

1. CBOW (Continuous Bag of Words)

Predicts a word using surrounding context.

Sentence: "Raj went to school yesterday"
Window size: 1

Input: [Raj, to] → Output: went
Input: [went, school] → Output: to
Input: [to, yesterday] → Output: school

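A tiny plain-Python sketch of how such (context → target) pairs could be generated with a window size of 1 (the helper function name is made up for illustration):

# Generate (context, target) pairs for CBOW with a window of 1
def cbow_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        if len(context) == 2 * window:      # keep only full windows, as in the example
            pairs.append((context, target))
    return pairs

print(cbow_pairs("Raj went to school yesterday".split()))
# [(['Raj', 'to'], 'went'), (['went', 'school'], 'to'), (['to', 'yesterday'], 'school')]
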

Working:

  • The context words are converted to one-hot vectors
  • These vectors are summed or averaged
  • The result is passed through the hidden layer
  • The model predicts the target word
  • The prediction error is calculated
  • Weights are updated using backpropagation
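
To make these steps concrete, here is a minimal NumPy sketch of a single CBOW training step on a toy vocabulary (looking up rows of W_in is equivalent to multiplying one-hot vectors by the input weight matrix):

# One CBOW training step in NumPy (toy sizes, purely illustrative)
import numpy as np

vocab = ["raj", "went", "to", "school", "yesterday"]
V, N = len(vocab), 3                          # vocabulary size, embedding size
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, N))     # input -> hidden (these become the embeddings)
W_out = rng.normal(scale=0.1, size=(N, V))    # hidden -> output

context_ids = [vocab.index("raj"), vocab.index("to")]   # context of "went"
target_id = vocab.index("went")

# Forward pass: average the context embeddings, score every vocabulary word
h = W_in[context_ids].mean(axis=0)            # hidden layer, shape (N,)
scores = h @ W_out                            # shape (V,)
probs = np.exp(scores) / np.exp(scores).sum() # softmax prediction

# Error: cross-entropy gradient at the output layer
grad_scores = probs.copy()
grad_scores[target_id] -= 1.0

# Backpropagation: update both weight matrices
lr = 0.1
grad_h = W_out @ grad_scores                  # gradient flowing back to the hidden layer
W_out -= lr * np.outer(h, grad_scores)
for i in context_ids:
    W_in[i] -= lr * grad_h / len(context_ids)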

2. Skip-Gram

Predicts surrounding words using a target word.
For the same sentence:

Target word: went
Context words: Raj, to

Training pairs:
Input: went → Output: Raj
Input: went → Output: to

Target = to
Context words: went, school

Training pairs:
Input: to → Output: went
Input: to → Output: school

Target = school
Context words: to, yesterday

Training pairs:
Input: school → Output: to
Input: school → Output: yesterday

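A matching plain-Python sketch that emits one (target → context) pair at a time (unlike the hand-worked example above, it also produces pairs for the edge words Raj and yesterday):

# Generate (target, context) pairs for Skip-Gram with a window of 1
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs("Raj went to school yesterday".split()))
# [('Raj', 'went'), ('went', 'Raj'), ('went', 'to'), ('to', 'went'),
#  ('to', 'school'), ('school', 'to'), ('school', 'yesterday'), ('yesterday', 'school')]
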

Working:

  • The target word is converted to a one-hot vector
  • Passed through the hidden layer
  • The model predicts each context word
  • Error is calculated
  • Weights are updated using backpropagation

👉 The hidden layer weights become the word embeddings
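
In practice, both variants are usually trained with a library such as gensim, whose word vectors are exactly those hidden-layer weights. A rough sketch on a toy corpus (real training needs far more text):

# Training Word2Vec with gensim on a toy corpus (real corpora need millions of tokens)
from gensim.models import Word2Vec

corpus = [
    ["raj", "went", "to", "school", "yesterday"],
    ["raj", "loves", "nlp"],
    ["students", "went", "to", "school"],
]

# sg=0 -> CBOW, sg=1 -> Skip-Gram
model = Word2Vec(corpus, vector_size=50, window=1, min_count=1, sg=1, epochs=50)

print(model.wv["school"][:5])             # the dense embedding (hidden-layer weights)
print(model.wv.most_similar("school"))    # nearest words by cosine similarity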

Advantages:

  • Captures semantic relationships
  • Produces dense and meaningful embeddings
  • Useful for clustering and similarity tasks

Limitation:

The same word gets the same vector in every context.

Example:
bank (river) and bank (money)

This limitation is addressed by contextual models like BERT.

When to Use Each Technique?

Use Bag of Words when:

  • Building simple text classifiers
  • Creating baseline NLP models

Use TF-IDF when:

  • Working on search systems
  • Performing document similarity
  • Detecting spam

Use Word2Vec when:

  • Semantic similarity is important
  • Building recommendation systems
  • Clustering text data

Final Thoughts

These techniques show the evolution of NLP from counting words to weighting word importance to understanding semantic meaning.
They form the foundation for modern NLP and Generative AI systems.
