<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: datatoinfinity</title>
    <description>The latest articles on DEV Community by datatoinfinity (@datatoinfinity).</description>
    <link>https://dev.to/datatoinfinity</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3119114%2Ffdc71707-4d05-4281-b15d-cbcc15387b19.png</url>
      <title>DEV Community: datatoinfinity</title>
      <link>https://dev.to/datatoinfinity</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/datatoinfinity"/>
    <language>en</language>
    <item>
      <title>Optimized PDF Q&amp;A Assistant with Streamlit, LangChain, Hugging Face, and Supabase</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Wed, 20 Aug 2025 10:05:54 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-an-ai-pdf-summarizer-with-python-langchain-supabase-and-streamlit-5hcf</link>
      <guid>https://dev.to/datatoinfinity/build-an-ai-pdf-summarizer-with-python-langchain-supabase-and-streamlit-5hcf</guid>
      <description>&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://khushboogup-pdffolder-app1-f9ibs2.streamlit.app/" rel="noopener noreferrer"&gt;PDFSUMMARIZATION Site&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Github&lt;/strong&gt; &lt;a href="https://github.com/khushboogup/Pdffolder" rel="noopener noreferrer"&gt;CODE&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimized PDF Q&amp;amp;A Assistant with Streamlit, LangChain, Hugging Face, and Supabase
&lt;/h2&gt;

&lt;p&gt;When working on AI projects, you might notice that code runs fast on Google Colab but slows down on a local machine. The fix is to make the pipeline &lt;strong&gt;optimized and efficient&lt;/strong&gt;: skip work you have already done, batch expensive operations, and cache what you can.&lt;/p&gt;

&lt;p&gt;In this blog, I’ll walk you through building a PDF Q&amp;amp;A Assistant with the following pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upload a PDF → hash &amp;amp; check if already stored → extract, embed, and save chunks in Supabase → take user’s question → retrieve relevant chunks → refine with LLM → display answer.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack Used
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Streamlit → Front-end UI and deployment&lt;/li&gt;
&lt;li&gt;LangChain → Orchestrates the LLM calls, connecting the AI “brain”&lt;/li&gt;
&lt;li&gt;Hugging Face → Provides powerful pre-trained models&lt;/li&gt;
&lt;li&gt;Supabase → Vector database for storing and retrieving PDF data&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer
from supabase import create_client
from huggingface_hub import InferenceClient

SUPABASE_URL = st.secrets["SUPABASE_URL"]
SUPABASE_KEY = st.secrets["SUPABASE_KEY"]
HF_TOKEN = st.secrets["HF_TOKEN"]  # Hugging Face token

supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
model = SentenceTransformer('all-MiniLM-L6-v2')
hf_client = InferenceClient(api_key=HF_TOKEN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, Supabase is used for storage, a SentenceTransformer model handles embeddings, and Hugging Face provides an LLM client for inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hash and Extract PDF Data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import fitz  # PyMuPDF (faster alternative to pdfplumber)
import hashlib

def hash_pdf(pdf_path):
    with open(pdf_path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def extract_and_chunk(pdf_path, chunk_size=500):
    doc = fitz.open(pdf_path)
    text = " ".join([page.get_text() for page in doc])
    words = text.split()
    chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]
    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;hashlib&lt;/code&gt; → creates a unique fingerprint (hash) of the PDF, preventing duplicate processing.&lt;br&gt;
&lt;code&gt;fitz&lt;/code&gt; (PyMuPDF) → efficiently extracts the text, which &lt;code&gt;extract_and_chunk&lt;/code&gt; then splits into manageable word-based chunks.&lt;/p&gt;
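
&lt;p&gt;To see why the hash-based dedupe works, here is a tiny standalone sketch (the byte strings are made up): identical bytes always produce the same MD5 digest, so a re-uploaded PDF maps to the same &lt;code&gt;pdf_id&lt;/code&gt; and can be skipped.&lt;/p&gt;

```python
import hashlib

def hash_bytes(data):
    # identical bytes always yield the identical fingerprint
    return hashlib.md5(data).hexdigest()

# Hypothetical file contents, just for illustration
pdf_a = b"%PDF-1.4 example content"
pdf_b = b"%PDF-1.4 example content"    # a re-upload of the same file
pdf_c = b"%PDF-1.4 different content"

print(hash_bytes(pdf_a) == hash_bytes(pdf_b))   # True: already stored, skip
print(hash_bytes(pdf_a) == hash_bytes(pdf_c))   # False: new document, process
```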

&lt;h2&gt;
  
  
  Embed, Store, and Retrieve
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def embed_chunks(chunks):
    return model.encode(chunks, batch_size=16, show_progress_bar=True).tolist()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def store_to_supabase(chunks, embeddings, pdf_id):
    data = [{
        "id": f"{pdf_id}_chunk{i+1}",   # prefix with pdf_id so ids from different PDFs never collide
        "pdf_id": pdf_id,
        "text": chunk,
        "embedding": embedding
    } for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))]
    supabase.table("documents1").upsert(data).execute()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def retrieve_chunks(query, pdf_id, top_k=10):
    query_embedding = model.encode(query).tolist()
    response = supabase.rpc("match_documents", {
        "query_embedding": query_embedding,
        "match_count": top_k,
        "pdf_id_filter": pdf_id
    }).execute()
    relevant_chunks = [row["text"] for row in response.data] if response.data else []
    return relevant_chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Embed Chunks&lt;/code&gt; → Convert text chunks into embeddings (vectors).&lt;br&gt;
&lt;code&gt;Store in Supabase&lt;/code&gt; → Save text + embeddings for future queries.&lt;br&gt;
&lt;code&gt;Retrieve Chunks&lt;/code&gt; → Find the most relevant text chunks with semantic similarity search.&lt;/p&gt;
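
&lt;p&gt;Under the hood, "semantic similarity search" is just nearest-neighbour ranking in vector space (Supabase's pgvector distance operator does it in SQL). A pure-Python sketch with made-up 3-dimensional vectors (real MiniLM embeddings have 384 dimensions) shows the idea:&lt;/p&gt;

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity: 1.0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical chunk embeddings, invented for this sketch
chunks = {
    "The invoice total is 240 dollars": [0.9, 0.1, 0.0],
    "Python supports list comprehensions": [0.1, 0.9, 0.2],
    "Payment is due within 30 days": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "How much is the bill?"

# Rank chunks by similarity and keep the top 2, like match_documents does
ranked = sorted(chunks, key=lambda text: cosine(query, chunks[text]), reverse=True)
top_k = ranked[:2]
print(top_k)  # the two payment-related chunks rank first
```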

&lt;h2&gt;
  
  
  Refine with Hugging Face LLM
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def refine_with_llm(relevant_chunk, question):
    refinement_input = "\n\n---\n\n".join(relevant_chunk)
    prompt = f"""
    Refine the following extracted text chunks for clarity, conciseness, and improved readability.
    Keep the technical meaning accurate and explain any complex terms simply if needed.
    Text to refine:
    {refinement_input}
    Question:
    {question}"""

    response = hf_client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        messages=[
            {"role": "system", "content": "You are an expert technical editor and writer."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )
    refined_text = response.choices[0].message.content
    return refined_text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step ensures that even if retrieved chunks are messy or incomplete, the AI agent refines them into clear, concise, and context-aware answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streamlit Front-End
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import uuid
import os
import streamlit as st

st.set_page_config(page_title="PDF Q&amp;amp;A Assistant")
st.title("📄 Ask Questions About Your PDF")

uploaded_file = st.file_uploader("Upload a PDF", type="pdf")

if uploaded_file:
    with st.spinner("Processing PDF..."):
        pdf_path = f"temp_{uuid.uuid4().hex}.pdf"
        with open(pdf_path, "wb") as f:
            f.write(uploaded_file.read())
        pdf_id = hash_pdf(pdf_path)

        existing = supabase.table("documents1").select("id").eq("pdf_id", pdf_id).execute()
        if existing.data:
            st.warning("⚠️ This PDF has already been processed. You can still ask questions.")
        else:
            chunks = extract_and_chunk(pdf_path)
            embeddings = embed_chunks(chunks)
            store_to_supabase(chunks, embeddings, pdf_id)
        os.remove(pdf_path)
    st.success("PDF ready for Q&amp;amp;A.")

    question = st.text_input("Ask a question about the uploaded PDF:")
    if question:
        with st.spinner("Generating answer..."):
            results = retrieve_chunks(question, pdf_id)
            if not results:
                st.error("No relevant chunks found.")
            else:
                answer = refine_with_llm(results, question)
                st.markdown("### Answer:")
                st.write(answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;UI Setup&lt;/strong&gt; → Streamlit sets page config, title, and PDF uploader.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporary Save&lt;/strong&gt; → Uploaded PDF is saved locally with a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hashing&lt;/strong&gt; → Generate an MD5 hash to uniquely identify the PDF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Supabase&lt;/strong&gt; → Skip processing if the PDF was already stored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract &amp;amp; Chunk&lt;/strong&gt; → Pull text from the PDF and split it into word chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed Chunks&lt;/strong&gt; → Convert chunks into vector embeddings for semantic search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store in Supabase&lt;/strong&gt; → Save chunks, embeddings, and PDF ID in the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean Up&lt;/strong&gt; → Remove the temporary PDF file after processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask Question&lt;/strong&gt; → User inputs a question about the uploaded PDF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve Chunks&lt;/strong&gt; → Fetch most relevant chunks from Supabase via similarity search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refine Answer&lt;/strong&gt; → LLM polishes the retrieved text into a clear, concise response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Display Result&lt;/strong&gt; → Show the AI-generated answer in the Streamlit app.&lt;/li&gt;
&lt;/ol&gt;
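
&lt;p&gt;Stripped of the Streamlit UI, the twelve steps boil down to a short control flow. The sketch below uses stubbed-in helpers (a plain dict stands in for Supabase, and substring matching stands in for embeddings) purely to make the ordering explicit:&lt;/p&gt;

```python
import hashlib

def answer_pdf_question(pdf_bytes, question, store):
    # store: a plain dict standing in for the Supabase table in this sketch
    pdf_id = hashlib.md5(pdf_bytes).hexdigest()       # hash the upload
    if pdf_id not in store:                           # skip work if already processed
        chunks = pdf_bytes.decode().split(". ")       # "extract and chunk" (stub)
        store[pdf_id] = chunks                        # "embed and store" (stub)
    relevant = [c for c in store[pdf_id] if question in c]   # "retrieve" (stub)
    if relevant:
        return " ".join(relevant)                     # the LLM refinement step would go here
    return "No relevant chunks found."

store = {}
doc = b"Supabase stores vectors. Streamlit renders the UI. LangChain wires the steps"
print(answer_pdf_question(doc, "vectors", store))     # first call processes and stores
print(answer_pdf_question(doc, "UI", store))          # second call is a cache hit
```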

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/from-pdf-to-summary-building-an-ai-agent-with-python-vector-databases-basic-b2f"&gt;From PDF to Summary: Building an AI Agent with Python &amp;amp; Vector Databases - Basic&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From PDF to Summary: Building an AI Agent with Python &amp; Vector Databases - Basic</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Mon, 11 Aug 2025 10:27:14 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/from-pdf-to-summary-building-an-ai-agent-with-python-vector-databases-basic-b2f</link>
      <guid>https://dev.to/datatoinfinity/from-pdf-to-summary-building-an-ai-agent-with-python-vector-databases-basic-b2f</guid>
      <description>&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://khushboogup-pdffolder-app1-f9ibs2.streamlit.app/" rel="noopener noreferrer"&gt;PDFSUMMARIZATION Site&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sample PDF:&lt;/strong&gt; &lt;a href="https://drive.google.com/drive/folders/1YUpHTfnXBK7hzQQszIuLCpylHRwrcLNo?usp=share_link" rel="noopener noreferrer"&gt;Download Here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github&lt;/strong&gt; &lt;a href="https://github.com/khushboogup/From-PDF-to-Summary-Building-an-AI-Agent-with-Python-Vector-Databases---Basic?tab=readme-ov-file" rel="noopener noreferrer"&gt;CODE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The PDF Summarization AI Agent is an AI-powered tool that summarizes lengthy PDFs and answers questions based only on their content. It’s useful when you need a quick overview without reading the entire document.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarizes large PDF files into concise overviews.&lt;/li&gt;
&lt;li&gt;Answers user questions only from the uploaded PDF.&lt;/li&gt;
&lt;li&gt;Formats responses clearly and preserves technical accuracy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Used By
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Researchers&lt;/strong&gt; → Extract key findings from academic papers.&lt;br&gt;
&lt;strong&gt;Lawyers&lt;/strong&gt; → Summarize contracts &amp;amp; compliance documents.&lt;br&gt;
&lt;strong&gt;Business Analysts&lt;/strong&gt; → Turn meeting transcripts into quick insights.&lt;br&gt;
&lt;strong&gt;Finance Teams&lt;/strong&gt; → Condense invoices &amp;amp; financial statements.&lt;br&gt;
&lt;strong&gt;Students&lt;/strong&gt; → Create study notes from textbooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Used
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://streamlit.io" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; → Front-end &amp;amp; deployment.&lt;br&gt;
&lt;a href="https://www.langchain.com" rel="noopener noreferrer"&gt;LangChain &lt;/a&gt;→ LLM integration &amp;amp; chaining workflows.&lt;br&gt;
&lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; → Pre-trained AI models (e.g., Mixtral-8x7B).&lt;br&gt;
&lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; → Vector database for storing PDF embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract text&lt;/strong&gt; from the PDF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk the text&lt;/strong&gt; into smaller segments (for large PDFs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed each chunk&lt;/strong&gt; into vector form using a transformer model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store embeddings&lt;/strong&gt; in Supabase Vector DB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perform similarity search&lt;/strong&gt; to find the most relevant chunks for a query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a Hugging Face model&lt;/strong&gt; to refine and format the answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Chaining&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A method of breaking a complex task into sequential steps, where the output of one step feeds into the next.&lt;/p&gt;
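
&lt;p&gt;In plain Python, chaining is just function composition; LangChain's &lt;code&gt;|&lt;/code&gt; operator builds the same kind of pipeline. A minimal sketch with toy steps:&lt;/p&gt;

```python
def extract(raw):
    # step 1: pretend PDF extraction, just trim whitespace
    return raw.strip()

def chunk(text):
    # step 2: naive sentence chunking
    return text.split(". ")

def summarize(chunks):
    # step 3: stand-in for the LLM call, keep the first chunk
    return chunks[0]

def chain(*steps):
    # each step consumes the previous step's output
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

pipeline = chain(extract, chunk, summarize)
print(pipeline("  Embeddings map text to vectors. Similar texts sit close together.  "))
```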

&lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A representation of text, images, or audio as points in a semantic vector space.&lt;br&gt;
Similar items (e.g., mobile, smartphone, cell phone) are stored close together in this space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pdfplumber sentence-transformers supabase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pdfplumber&lt;/code&gt; → Extract text from PDF.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sentence-transformers&lt;/code&gt; → Convert text into embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;supabase&lt;/code&gt; → Store and search embeddings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Supabase Setup
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; account.&lt;/li&gt;
&lt;li&gt;Start a new project and copy:

&lt;ul&gt;
&lt;li&gt;Project URL&lt;/li&gt;
&lt;li&gt;API Key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Enable vector extension:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS vector SCHEMA extensions;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Create &lt;strong&gt;documents1&lt;/strong&gt; table:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE documents1 (
    id TEXT PRIMARY KEY,
    text TEXT,
    pdf_id TEXT,
    embedding VECTOR(384)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Create &lt;strong&gt;similarity search&lt;/strong&gt; function:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE FUNCTION match_documents(
    query_embedding VECTOR(384),
    match_count INT
) RETURNS TABLE (
    id TEXT,
    text TEXT
) LANGUAGE plpgsql STABLE AS $$
BEGIN
    RETURN QUERY
    SELECT documents1.id, documents1.text
    FROM documents1
    ORDER BY documents1.embedding &amp;lt;-&amp;gt; query_embedding
    LIMIT match_count;
END;
$$;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  PDF Processing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Upload PDF (Google Colab)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.colab import files
uploaded = files.upload()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Extract &amp;amp; Chunk
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pdfplumber
def extract_and_chunk(pdf_path, chunk_size=500):
    with pdfplumber.open(pdf_path) as pdf:
        text = "".join(page.extract_text() or "" for page in pdf.pages)
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
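
&lt;p&gt;Note that this basic version chunks by &lt;em&gt;characters&lt;/em&gt;, so a boundary can cut a word in half; a word-based splitter avoids that. A toy comparison (with a tiny chunk size chosen only for illustration):&lt;/p&gt;

```python
def chunk_by_chars(text, chunk_size):
    # slices the raw string, so a boundary can split a word
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunk_by_words(text, chunk_size):
    # groups whole words, so boundaries always fall between words
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

text = "vector databases enable semantic search"
print(chunk_by_chars(text, 10))  # first chunk ends mid-word: "vector dat"
print(chunk_by_words(text, 3))   # chunks respect word boundaries
```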



&lt;h2&gt;
  
  
  Store in Supabase
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from supabase import create_client
from sentence_transformers import SentenceTransformer

supabase_url = "YOUR_SUPABASE_URL"
supabase_key = "YOUR_API_KEY"
supabase = create_client(supabase_url, supabase_key)

model = SentenceTransformer('all-MiniLM-L6-v2')

pdf_path = "Sample.pdf"
chunks = extract_and_chunk(pdf_path)
embeddings = model.encode(chunks).tolist()

data = [
    {"id": f"chunk_{i}", "text": chunk, "embedding": embedding, "pdf_id": "doc1"}
    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
]

supabase.table("documents1").insert(data).execute()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Query Search
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = "What is the topic?"
query_embedding = model.encode(query).tolist()

response = supabase.rpc(
    "match_documents",
    {"query_embedding": query_embedding, "match_count": 3}
).execute()

relevant_chunks = [row["text"] for row in response.data]
print("\n---\n".join(relevant_chunks))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hugging Face Integration
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; account.&lt;/li&gt;
&lt;li&gt;Generate a READ API token.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from huggingface_hub import InferenceClient
import os

client = InferenceClient(
    api_key=os.getenv("HUGGINGFACEHUB_API_TOKEN", "YOUR_HF_API_KEY")
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Refinement with Mixtral-8x7B
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt = f"""
Refine the following extracted text chunks for clarity, conciseness, and improved readability.
Keep the technical meaning accurate.

Text to refine:
{ "\n\n---\n\n".join(relevant_chunks) }
"""

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are an expert technical editor."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.7,
    max_tokens=500
)

print("\n Refined Output:\n")
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Delete old data&lt;/strong&gt; before inserting chunks from a new PDF to avoid duplicate ID errors.&lt;/li&gt;
&lt;li&gt;Hugging Face request cost &amp;amp; speed depend on the chosen model.&lt;/li&gt;
&lt;li&gt;Supabase vector size (384) must match your embedding model output.&lt;/li&gt;
&lt;/ul&gt;
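
&lt;p&gt;The last note is worth enforcing in code: if the embedding dimension ever differs from the column's declared &lt;code&gt;VECTOR(384)&lt;/code&gt;, the insert fails at the database. A cheap pre-insert guard (sketched here with dummy vectors) surfaces the mismatch earlier:&lt;/p&gt;

```python
TABLE_DIM = 384  # must match VECTOR(384) in the documents1 schema

def check_dims(embeddings, expected=TABLE_DIM):
    # collect the indices of any vectors with the wrong length
    bad = [i for i, e in enumerate(embeddings) if len(e) != expected]
    if bad:
        raise ValueError(f"embeddings {bad} have the wrong dimension")
    return True

# all-MiniLM-L6-v2 produces 384-dimensional vectors; these dummies mimic that
dummy = [[0.0] * 384, [0.1] * 384]
print(check_dims(dummy))  # True
```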

&lt;p&gt;&lt;strong&gt;PDF upload → chunking → storing → querying → refining&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
      <category>supabase</category>
    </item>
    <item>
      <title>Automated PDF Summarization Using AI Agents: LangChain + Hugging Face + Supabase + Streamlit - Basic</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 08 Aug 2025 20:21:47 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/automated-pdf-summarization-using-ai-agents-langchain-hugging-face-supabase-streamlit-basic-385e</link>
      <guid>https://dev.to/datatoinfinity/automated-pdf-summarization-using-ai-agents-langchain-hugging-face-supabase-streamlit-basic-385e</guid>
      <description>&lt;h2&gt;
  
  
  Build an AI Agent to Auto-Summarize PDFs with LangChain, Hugging Face, and Supabase
&lt;/h2&gt;

&lt;p&gt;This idea came to me while working on a project with &lt;strong&gt;extensive documentation&lt;/strong&gt;. It was time-consuming and overwhelming to extract only the important details.&lt;/p&gt;

&lt;p&gt;I thought — &lt;em&gt;what if I had an assistant that could read the entire document, summarize it, and even answer my questions?&lt;/em&gt; That’s when &lt;strong&gt;PDF Summarization AI Agent&lt;/strong&gt; was born.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://khushboogup-pdffolder-app1-f9ibs2.streamlit.app/" rel="noopener noreferrer"&gt;PDFSUMMARIZATION Site&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sample PDF:&lt;/strong&gt; &lt;a href="https://drive.google.com/drive/folders/1YUpHTfnXBK7hzQQszIuLCpylHRwrcLNo?usp=share_link" rel="noopener noreferrer"&gt;Download Here&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;PDFs are everywhere — academic papers, contracts, reports, manuals — but &lt;strong&gt;manually skimming hundreds of pages isn’t scalable&lt;/strong&gt;. This is especially painful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Researchers:&lt;/strong&gt; Extract key findings from long papers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lawyers:&lt;/strong&gt; Summarize contracts &amp;amp; compliance docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Analysts:&lt;/strong&gt; Turn meeting transcripts into quick insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finance Teams:&lt;/strong&gt; Condense invoices &amp;amp; statements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Students:&lt;/strong&gt; Turn textbooks into study notes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Streamlit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy Python web app frontend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles LLM workflows &amp;amp; chaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides pre-trained AI models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supabase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vector DB for semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PyPDF2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extracts text from PDFs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How It Works (High-Level Flow)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Upload PDF(s)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract Text&lt;/strong&gt; → using PyPDF2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk &amp;amp; Embed&lt;/strong&gt; → LangChain breaks text into smaller parts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store in Supabase&lt;/strong&gt; → for semantic search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query AI&lt;/strong&gt; → Hugging Face / Gemini answers based on context&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Return Summary or Q&amp;amp;A Answer&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Setup Instructions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Get a Google AI Studio API Key&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;Google AI Studio API Key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create API Key&lt;/strong&gt; (new project)
&lt;/li&gt;
&lt;li&gt;Copy your key.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Required Libraries&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;
pip install langchain langchain-core langchain-google-genai PyPDF2
&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Let’s Start With the Basics
&lt;/h2&gt;

&lt;p&gt;We start with a simple AI agent built on the Gemini API.&lt;/p&gt;

&lt;pre&gt;
import warnings
warnings.filterwarnings("ignore")
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableSequence
from langchain_google_genai import ChatGoogleGenerativeAI
import PyPDF2
import os

# Set your Gemini API key
os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"  # paste the key from Google AI Studio

# Extract text from multiple PDFs
def extract_text_from_pdf(pdf_paths):
    text = ""
    for pdf_path in pdf_paths:  # Iterate over the list of PDF paths
        try:
            with open(pdf_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                for page in reader.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"  # Add newline to separate text from different PDFs
        except FileNotFoundError:
            text += f"Error: The file '{pdf_path}' was not found.\n"
    return text

# Define prompt template
template = """
You are an expert AI assistant. Use only the information provided to answer the question.
Context: {context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=template)

# Initialize Gemini LLM and chain
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=os.environ["GOOGLE_API_KEY"])
qa_chain = prompt | llm  # the | operator builds a RunnableSequence

# Function to answer questions
def answer_question(pdf_text, question):
    if not pdf_text:
        return "Error: No text extracted from the PDFs."
    answer = qa_chain.invoke({"context": pdf_text, "question": question})  # Updated to use invoke
    return answer.content if hasattr(answer, 'content') else answer  # Handle response content

# Example usage
if __name__ == "__main__":
    pdf_paths = ["sample3.pdf"]  # Replace with your list of PDF file paths
    pdf_text = extract_text_from_pdf(pdf_paths)  # Pass the list of PDF paths
    question = input("Enter text: ")
    answer = answer_question(pdf_text, question)
    print(f"Question: {question}\nAnswer: {answer}")
    # print(len(pdf_text))  # Uncomment to print the length of extracted text
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Full Code Walkthrough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s a detailed explanation of every part of the code for those who want the deep dive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;extract_text_from_pdf&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loops through PDF file paths&lt;/li&gt;
&lt;li&gt;Uses PyPDF2 to read &amp;amp; extract text page-by-page&lt;/li&gt;
&lt;li&gt;Adds newlines to separate pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prompt Template&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;{context} = extracted PDF text&lt;/li&gt;
&lt;li&gt;{question} = user’s query&lt;/li&gt;
&lt;li&gt;AI responds only based on the provided context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LLM Initialization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Gemini 1.5 Flash (fast, cost-effective)&lt;/li&gt;
&lt;li&gt;RunnableSequence pipes the prompt output into the AI model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This code is deliberately basic: it only hints at how we extract data from a PDF. Try it yourself with larger files and you will quickly see its drawbacks (the whole document is pushed into a single prompt); we will cover those drawbacks next.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>devto</category>
      <category>langchain</category>
      <category>huggingface</category>
      <category>supabase</category>
    </item>
    <item>
      <title>Pattern Printing Series: Alphabet Patterns Explained with Logic</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 01 Aug 2025 18:09:21 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/pattern-printing-series-alphabet-patterns-explained-with-logic-5enl</link>
      <guid>https://dev.to/datatoinfinity/pattern-printing-series-alphabet-patterns-explained-with-logic-5enl</guid>
      <description>&lt;p&gt;Alphabet patterns are a classic way to improve your understanding of nested loops, ASCII values, and pattern logic in programming. Whether you're preparing for an interview or just sharpening your logic, these patterns offer a fun and educational challenge. Let’s dive into some fascinating examples and decode the logic behind them!&lt;/p&gt;

&lt;h2&gt;
  
  
  Alphabet Pattern 1:
&lt;/h2&gt;

&lt;pre&gt;
A 
A B 
A B C 
A B C D 
A B C D E 
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(i):
        print(chr(j+65),end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; defines how many rows and columns are needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(i)&lt;/code&gt; controls the number of columns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chr(j+65)&lt;/code&gt; converts &lt;code&gt;j&lt;/code&gt; to the corresponding uppercase letter via its ASCII code:
chr(65) = 'A',
chr(66) = 'B', and so on.&lt;/li&gt;
&lt;/ol&gt;
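
&lt;p&gt;The &lt;code&gt;chr&lt;/code&gt;/&lt;code&gt;ord&lt;/code&gt; pair is the whole trick, and it is easy to verify interactively:&lt;/p&gt;

```python
# ord maps a character to its ASCII code; chr is the inverse
print(ord("A"))           # 65
print(chr(65), chr(66))   # A B

# the inner loop's chr(j + 65) walks A, B, C, ...
letters = [chr(65 + j) for j in range(5)]
print(" ".join(letters))  # A B C D E
```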

&lt;h2&gt;
  
  
  Alphabet Pattern 2:
&lt;/h2&gt;

&lt;pre&gt;
A B C D E 
A B C D 
A B C 
A B 
A 
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(row,0,-1):
    for j in range(i):
        print(chr(j+65),end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; defines how many rows and columns are needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(row,0,-1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(i)&lt;/code&gt; controls the number of columns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chr(j+65)&lt;/code&gt; converts &lt;code&gt;j&lt;/code&gt; to the corresponding uppercase letter via its ASCII code:
chr(65) = 'A',
chr(66) = 'B', and so on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/mater-logic-with-number-pattern-in-python-1-3iga"&gt;Mater Logic With Number Pattern in Python - 1&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>coding</category>
      <category>programming</category>
    </item>
    <item>
      <title>Mater Logic With Number Pattern in Python - 1</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 01 Aug 2025 15:32:14 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/mater-logic-with-number-pattern-in-python-1-3iga</link>
      <guid>https://dev.to/datatoinfinity/mater-logic-with-number-pattern-in-python-1-3iga</guid>
      <description>&lt;p&gt;Imagine looking at a simple series of numbers on a screen—and suddenly realizing you can predict the next move, decipher hidden logic, and even create your own mesmerizing patterns with just a few lines of code. That’s the secret magic behind number patterns in Python, and today, we’ll not only unravel how they work, but give you the tools to build your own “number masterpieces” using the power of simple logic!&lt;/p&gt;

&lt;h2&gt;
  
  
  Number Pattern 1.
&lt;/h2&gt;

&lt;pre&gt;
1 
1 2 
1 2 3 
1 2 3 4 
1 2 3 4 5 
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(1,i+1):
        print(j,end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The idea is simple: in each row we just print the column number &lt;code&gt;j&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(1,i+1)&lt;/code&gt; controls the number of columns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(1, i+1))&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1, 2&lt;/td&gt;
&lt;td&gt;1 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1, 2, 3&lt;/td&gt;
&lt;td&gt;1 2 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1, 2, 3, 4&lt;/td&gt;
&lt;td&gt;1 2 3 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;1, 2, 3, 4, 5&lt;/td&gt;
&lt;td&gt;1 2 3 4 5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
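
&lt;p&gt;The same triangle can be produced without a nested loop by joining the column numbers with &lt;code&gt;str.join&lt;/code&gt;; a minimal sketch:&lt;/p&gt;

```python
row = 5
# join() builds each line "1 2 ... i" in one step
lines = [" ".join(str(j) for j in range(1, i + 1)) for i in range(1, row + 1)]
print("\n".join(lines))
```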

&lt;p&gt;The other way around:&lt;/p&gt;

&lt;pre&gt;
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(1,i+1):
        print(i,end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Here we simply print the row number instead: &lt;code&gt;print(i, end=" ")&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Number Pattern 2.
&lt;/h2&gt;

&lt;pre&gt;
1 
2 3 
4 5 6 
7 8 9 10 
11 12 13 14 15 
&lt;/pre&gt;

&lt;pre&gt;
row=5
num=0
for i in range(1,row+1):
    for j in range(i):
        num=num+1
        print(num,end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; sets the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;num=0&lt;/code&gt; will be incremented for every printed number.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(i)&lt;/code&gt; controls the number of columns (row i holds i numbers).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;num=num+1&lt;/code&gt; advances the sequence and prints each number, spacing them on the same line.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(i))&lt;/th&gt;
&lt;th&gt;Output (num)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0, 1&lt;/td&gt;
&lt;td&gt;2 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0, 1, 2&lt;/td&gt;
&lt;td&gt;4 5 6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3&lt;/td&gt;
&lt;td&gt;7 8 9 10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3, 4&lt;/td&gt;
&lt;td&gt;11 12 13 14 15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
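
&lt;p&gt;Instead of carrying a running &lt;code&gt;num&lt;/code&gt;, the first number of row &lt;code&gt;i&lt;/code&gt; can be computed directly: rows 1 to i-1 hold 1 + 2 + ... + (i-1) = i*(i-1)/2 numbers, so row &lt;code&gt;i&lt;/code&gt; starts at &lt;code&gt;i*(i-1)//2 + 1&lt;/code&gt;. A sketch:&lt;/p&gt;

```python
def row_start(i):
    # numbers printed before row i: 1 + 2 + ... + (i-1) = i*(i-1)/2
    return i * (i - 1) // 2 + 1

row = 5
for i in range(1, row + 1):
    print(" ".join(str(row_start(i) + j) for j in range(i)))
```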

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Find the Most Frequent Word in Text using Python | NLP Basics Explained</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Mon, 28 Jul 2025 13:23:20 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/find-the-most-frequent-word-in-text-using-python-nlp-basics-explained-410j</link>
      <guid>https://dev.to/datatoinfinity/find-the-most-frequent-word-in-text-using-python-nlp-basics-explained-410j</guid>
      <description>&lt;p&gt;Ever wondered which word appears the most in a text? Whether you’re analyzing customer feedback, blog posts, or any text data, finding the most frequent word is a common Natural Language Processing (NLP) task. In this post, we’ll explore how to do it in Python, why it matters, and some real-world applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Find the Most Frequent Word?
&lt;/h2&gt;

&lt;p&gt;Word frequency analysis helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keyword extraction&lt;/strong&gt; for SEO and blogs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentiment analysis&lt;/strong&gt; in customer feedback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topic modeling&lt;/strong&gt; in large text datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chatbots &amp;amp; AI models&lt;/strong&gt; for training data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Steps to Find the Most Frequent Word
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Prepare Text&lt;/li&gt;
&lt;li&gt;Tokenize and Stop Word Removal&lt;/li&gt;
&lt;li&gt;Count Word Frequency &lt;/li&gt;
&lt;li&gt;DataFrame for word and frequency &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import pandas as pd
import nltk

corpora = [
    'Artificial Intelligence is transforming the world. AI is used in healthcare, finance, and education. Machine Learning, a branch of AI, powers recommendation systems and predictive analytics.',
    'Success is not the key to happiness. Happiness is the key to success. If you love what you do, you will be successful.',
    'The product is great. The quality is great and the price is reasonable. I will recommend this product to my friends because the product is worth the price.',
    'The football match was intense. The players gave their best. The match ended with a thrilling victory. Fans celebrated the match with great excitement.'
]

stop_words = set(stopwords.words('english'))

all_data = []  

for i, corpus in enumerate(corpora, start=1):
    words = [word.lower() for word in word_tokenize(corpus) 
             if word.lower() not in stop_words and word.isalpha()]
    
    
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    
   
    for word, freq in word_freq.items():
        all_data.append({"Corpus": f"Corpus_{i}", "Word": word, "Frequency": freq})

df = pd.DataFrame(all_data)

# Sort by frequency for better readability
df = df.sort_values(by=["Corpus", "Frequency"], ascending=[True, False])

print(df)

&lt;/pre&gt;

&lt;h4&gt;
  
  
  Importing Library and Module
&lt;/h4&gt;

&lt;pre&gt;
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import pandas as pd
import nltk
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;import nltk&lt;/code&gt; NLTK, or the Natural Language Toolkit, is a prominent open-source library in Python designed for Natural Language Processing (NLP). &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;from nltk.corpus import stopwords&lt;/code&gt; imports the stopwords corpus.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;from nltk.tokenize import word_tokenize&lt;/code&gt; imports the tokenizer that splits a sentence into individual words.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Prepare the Text
&lt;/h4&gt;

&lt;pre&gt;
corpora = [
    'Artificial Intelligence is transforming the world. AI is used in healthcare, finance, and education. Machine Learning, a branch of AI, powers recommendation systems and predictive analytics.',
    'Success is not the key to happiness. Happiness is the key to success. If you love what you do, you will be successful.',
    'The product is great. The quality is great and the price is reasonable. I will recommend this product to my friends because the product is worth the price.',
    'The football match was intense. The players gave their best. The match ended with a thrilling victory. Fans celebrated the match with great excitement.'
]
&lt;/pre&gt;

&lt;h4&gt;
  
  
  Process each Corpus
&lt;/h4&gt;

&lt;pre&gt;
stop_words = set(stopwords.words('english'))
all_data = [] 
for i, corpus in enumerate(corpora, start=1):
    words = [word.lower() for word in word_tokenize(corpus) 
             if word.lower() not in stop_words and word.isalpha()]
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;stop_words=set(stopwords.words('english'))&lt;/code&gt; loads the set of English stopwords.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;all_data=[]&lt;/code&gt; to store all word-frequency data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i,corpus in enumerate(corpora,start=1)&lt;/code&gt; 

&lt;ul&gt;
&lt;li&gt;Loop over the corpora list.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enumerate&lt;/code&gt; gives both:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;i&lt;/code&gt; index of the corpus&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;corpus&lt;/code&gt; actual text string&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;words = [word.lower() for word in word_tokenize(corpus) 
         if word.lower() not in stop_words and word.isalpha()]&lt;/code&gt; 

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;word.lower()&lt;/code&gt; converts each token to lowercase.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for word in word_tokenize(corpus)&lt;/code&gt; iterates over the tokens produced by the &lt;code&gt;word_tokenize()&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if word.lower() not in stop_words and word.isalpha()&lt;/code&gt; keeps a token only if its lowercase form is not a stopword and it is purely alphabetic.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
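
&lt;p&gt;The same filtering idea can be approximated with plain Python (a rough sketch using &lt;code&gt;str.split&lt;/code&gt; and a tiny hand-picked stop-word set, so it will not match NLTK's tokenizer exactly):&lt;/p&gt;

```python
# Illustrative subset only -- NLTK's real stopword list is much longer.
stop_words = {"is", "not", "the", "to", "a", "if", "you", "what", "do", "will", "be"}

corpus = "Success is not the key to happiness. Happiness is the key to success."
words = [w.strip(".,").lower() for w in corpus.split()
         if w.strip(".,").lower() not in stop_words and w.strip(".,").isalpha()]
print(words)
```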

&lt;h4&gt;
  
  
  Count Frequency
&lt;/h4&gt;

&lt;pre&gt;
word_freq = {}
for word in words:
    word_freq[word] = word_freq.get(word, 0) + 1
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;word_freq={}&lt;/code&gt; creates an empty dictionary to store each word's frequency.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for word in words&lt;/code&gt; iterates over the &lt;code&gt;words&lt;/code&gt; list.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;word_freq[word]=word_freq.get(word, 0) + 1&lt;/code&gt; word_freq.get(word, 0) checks if the word exists in the dictionary. If yes, returns its current count. If no, returns 0 (default value). Then + 1 increments the count by 1.&lt;/li&gt;
&lt;/ol&gt;
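
&lt;p&gt;The standard library's &lt;code&gt;collections.Counter&lt;/code&gt; performs the same counting in one call:&lt;/p&gt;

```python
from collections import Counter

words = ["ai", "used", "ai", "world"]
word_freq = Counter(words)          # equivalent to the dict + .get() loop above
print(word_freq.most_common(1))     # [('ai', 2)]
```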

&lt;h4&gt;
  
  
  Convert to list of dicts for DataFrame
&lt;/h4&gt;

&lt;pre&gt;
for word, freq in word_freq.items():
    all_data.append({"Corpus": f"Corpus_{i}", "Word": word, "Frequency": freq})
&lt;/pre&gt;

&lt;h4&gt;
  
  
  Create Pandas DataFrame
&lt;/h4&gt;

&lt;pre&gt;
df = pd.DataFrame(all_data)
df = df.sort_values(by=["Corpus", "Frequency"], ascending=[True, False])
print(df)
&lt;/pre&gt;
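
&lt;p&gt;Since the goal of the post is the most frequent word, the finished DataFrame can be reduced to one row per corpus with &lt;code&gt;groupby&lt;/code&gt; and &lt;code&gt;idxmax&lt;/code&gt;; a sketch with dummy data standing in for &lt;code&gt;all_data&lt;/code&gt;:&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame([
    {"Corpus": "Corpus_1", "Word": "ai", "Frequency": 2},
    {"Corpus": "Corpus_1", "Word": "world", "Frequency": 1},
    {"Corpus": "Corpus_2", "Word": "match", "Frequency": 4},
])
# idxmax() returns the index of the highest-frequency row within each corpus
top = df.loc[df.groupby("Corpus")["Frequency"].idxmax()]
print(top)
```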

&lt;h2&gt;
  
  
  Whole Code
&lt;/h2&gt;

&lt;pre&gt;
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import pandas as pd
import nltk

# Download NLTK resources (run once)
# nltk.download('punkt')
# nltk.download('stopwords')

# Define corpora
corpora = [
    'Artificial Intelligence is transforming the world. AI is used in healthcare, finance, and education. Machine Learning, a branch of AI, powers recommendation systems and predictive analytics.',
    'Success is not the key to happiness. Happiness is the key to success. If you love what you do, you will be successful.',
    'The product is great. The quality is great and the price is reasonable. I will recommend this product to my friends because the product is worth the price.',
    'The football match was intense. The players gave their best. The match ended with a thrilling victory. Fans celebrated the match with great excitement.'
]

stop_words = set(stopwords.words('english'))

all_data = []  # to store all word-frequency data

# Process each corpus
for i, corpus in enumerate(corpora, start=1):
    words = [word.lower() for word in word_tokenize(corpus) 
             if word.lower() not in stop_words and word.isalpha()]
    
    # Count frequency
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    
    # Convert to list of dicts for DataFrame
    for word, freq in word_freq.items():
        all_data.append({"Corpus": f"Corpus_{i}", "Word": word, "Frequency": freq})

# Create Pandas DataFrame
df = pd.DataFrame(all_data)

# Sort by frequency for better readability
df = df.sort_values(by=["Corpus", "Frequency"], ascending=[True, False])

print(df)

&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
      Corpus          Word  Frequency
4   Corpus_1            ai          2
0   Corpus_1    artificial          1
1   Corpus_1  intelligence          1
2   Corpus_1  transforming          1
3   Corpus_1         world          1
5   Corpus_1          used          1
6   Corpus_1    healthcare          1
7   Corpus_1       finance          1
8   Corpus_1     education          1
9   Corpus_1       machine          1
&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/basic-natural-language-processing-2gp7"&gt;Download nltk important library&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/stop-words-removal-1gp9"&gt;Install Stopwords module&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Information Extraction in NLP
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/information-extraction-in-nlp-techniques-tools-real-world-examples-j75"&gt;Information Extraction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef"&gt;Part of Speech (POS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Name Entity Recognation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l"&gt;Word Cloud in NLP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
    <item>
      <title>Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 18 Jul 2025 13:26:56 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7</link>
      <guid>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7</guid>
      <description>&lt;p&gt;If you're a visual learner, pattern problems in Python are the perfect playground.&lt;br&gt;
From simple triangles to pyramids and diamonds — every pattern teaches you how loops and logic work together. Ready to visualize code like never before?&lt;/p&gt;

&lt;h1&gt;
  
  
  Inverted Pyramid Using Nested Loop in Python.
&lt;/h1&gt;

&lt;pre&gt;
* * * * * * * * * 
  * * * * * * * 
    * * * * * 
      * * * 
        * 
&lt;/pre&gt;

&lt;blockquote&gt;
&lt;p&gt;Before diving in, check out the previous patterns:&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Reverse Right-Angled Triangle Pattern&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Pyramid Using Nested Loop in Python&lt;/a&gt;&lt;br&gt;
Once you understand both the pyramid and the reverse right-angled triangle, the logic behind the inverted pyramid becomes intuitive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
row=5
for i in range(row,0,-1):
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;range(row, 0, -1)&lt;/code&gt; counts the rows down from 5 to 1, which inverts the pyramid.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;row - i&lt;/code&gt; controls the leading spaces to shift the stars rightward.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;2*i - 1&lt;/code&gt; ensures that each row has the correct number of stars to form a centered triangle.&lt;/li&gt;
&lt;/ul&gt;
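
&lt;p&gt;Building each row as a string makes the inverted pyramid easy to reuse and test; a minimal sketch (note that &lt;code&gt;print(" ", end=" ")&lt;/code&gt; emits two characters per space unit):&lt;/p&gt;

```python
def inverted_pyramid(row=5):
    lines = []
    for i in range(row, 0, -1):
        # "  " * (row - i) leading spaces, then 2*i - 1 stars
        lines.append("  " * (row - i) + "* " * (2 * i - 1))
    return lines

print("\n".join(inverted_pyramid()))
```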

&lt;h1&gt;
  
  
  Diamond Pyramid Pattern.
&lt;/h1&gt;

&lt;pre&gt;
        * 
      * * * 
    * * * * * 
  * * * * * * * 
* * * * * * * * * 
  * * * * * * * 
    * * * * * 
      * * * 
        * 
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, you build the pyramid (top half).&lt;/li&gt;
&lt;li&gt;Then, you mirror it by adding the inverted pyramid (bottom half).&lt;/li&gt;
&lt;li&gt;Both parts share a common middle (the row with 9 stars).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):       # Upper Pyramid
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
for i in range(row-1,0,-1):    # Inverted Pyramid (start one row smaller)
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;range(1, row+1)&lt;/code&gt; builds the upper pyramid.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;range(row-1, 0, -1)&lt;/code&gt; builds the inverted lower half; starting at &lt;code&gt;row-1&lt;/code&gt; keeps the shared middle row (9 stars) from printing twice.&lt;/li&gt;
&lt;li&gt;Since both pyramids use the same logic (2*i - 1 stars and shifting spaces), the transition is seamless.&lt;/li&gt;
&lt;/ul&gt;
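
&lt;p&gt;The two loops can also be merged into one by noting that the star count rises to the middle row and then falls symmetrically; a sketch using &lt;code&gt;abs()&lt;/code&gt;:&lt;/p&gt;

```python
row = 5
lines = []
for i in range(1, 2 * row):
    k = row - abs(row - i)                # 1, 2, ..., row, ..., 2, 1
    lines.append("  " * (row - k) + "* " * (2 * k - 1))
print("\n".join(lines))
```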

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Explained the Inverted Right Angle Triangle Pattern&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Explained the Pyramid Pattern&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Thu, 17 Jul 2025 15:15:08 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51</link>
      <guid>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51</guid>
      <description>&lt;p&gt;Ever looked at a pattern problem and thought, "Why can't I get this simple star pyramid to align properly?"&lt;br&gt;
You're not alone. Pattern problems might look easy, but they’re the secret weapon for building powerful logic using loops — and today, we’re going to decode them step by step.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern: Mirrored Right Angle Triangle Using Nested Loop in Python.
&lt;/h1&gt;

&lt;p&gt;We’re going to build this simple star pattern and understand the logic behind nested loops:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logic Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're using two for loops:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The outer loop controls the rows&lt;/li&gt;
&lt;li&gt;The inner loop controls the columns (stars)&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
        * 
      * * 
    * * * 
  * * * * 
* * * * * 
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Logic Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To form this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need spaces before the stars to right-align them.&lt;/li&gt;
&lt;li&gt;The number of stars increases with each row.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;1&lt;/th&gt;
&lt;th&gt;2&lt;/th&gt;
&lt;th&gt;3&lt;/th&gt;
&lt;th&gt;4&lt;/th&gt;
&lt;th&gt;5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;(1,1)&lt;/td&gt;
&lt;td&gt;(1,2)&lt;/td&gt;
&lt;td&gt;(1,3)&lt;/td&gt;
&lt;td&gt;(1,4)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;(2,1)&lt;/td&gt;
&lt;td&gt;(2,2)&lt;/td&gt;
&lt;td&gt;(2,3)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;(3,1)&lt;/td&gt;
&lt;td&gt;(3,2)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;(4,1)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;As the row number increases, the number of leading spaces decreases: (1,4), (2,3), (3,2), (4,1).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the spaces decrease while the stars increase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(1,row-i+1):
        print(" ",end=" ")
    for k in range(1,i+1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; this is for how many rows you want.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1):&lt;/code&gt; This is the outer loop, which runs 5 times (from 1 to 5). Each loop iteration represents one row of the output.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(1,row-i+1):&lt;/code&gt; This inner loop is responsible for printing spaces before the stars. Why?
To make the pattern right-aligned, each row needs less space than the previous one.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for k in range(1,i+1):&lt;/code&gt; This loop prints the stars * in each row.&lt;/li&gt;
&lt;/ol&gt;
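
&lt;p&gt;Because &lt;code&gt;print(" ", end=" ")&lt;/code&gt; emits two characters per space unit, every line here is exactly &lt;code&gt;2*row&lt;/code&gt; characters wide, so the same pattern can be produced with &lt;code&gt;str.rjust&lt;/code&gt;; a minimal sketch:&lt;/p&gt;

```python
row = 5
# "* " * i is 2*i characters; right-justify it to the full width of 2*row.
lines = [("* " * i).rjust(2 * row) for i in range(1, row + 1)]
print("\n".join(lines))
```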

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;i&lt;/th&gt;
&lt;th&gt;j&lt;/th&gt;
&lt;th&gt;k&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Row(i)&lt;/td&gt;
&lt;td&gt;Spaces (row-i)&lt;/td&gt;
&lt;td&gt;Stars (i)&lt;/td&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;* *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;* * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;* * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;* * * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt;: controls the number of rows, increasing each time&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt;: controls the leading spaces.&lt;br&gt;
&lt;code&gt;Inner loop (k)&lt;/code&gt;: prints the star. &lt;br&gt;
&lt;code&gt;end=" "&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern: Pyramid Using Nested Loop in Python
&lt;/h1&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
        * 
      * * * 
    * * * * * 
  * * * * * * * 
* * * * * * * * * 
&lt;/pre&gt;

&lt;p&gt;Step by Step Explanation&lt;/p&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;1&lt;/th&gt;
&lt;th&gt;2&lt;/th&gt;
&lt;th&gt;3&lt;/th&gt;
&lt;th&gt;4&lt;/th&gt;
&lt;th&gt;5&lt;/th&gt;
&lt;th&gt;6&lt;/th&gt;
&lt;th&gt;7&lt;/th&gt;
&lt;th&gt;8&lt;/th&gt;
&lt;th&gt;9&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;i&lt;/code&gt; runs over &lt;code&gt;range(1, row+1)&lt;/code&gt;: the number of rows you want.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;j&lt;/code&gt; runs over &lt;code&gt;range(row-i)&lt;/code&gt;: the leading spaces, which decrease each row.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;k&lt;/code&gt; runs over &lt;code&gt;range(2*i-1)&lt;/code&gt;: the stars printed in row &lt;code&gt;i&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;i&lt;/th&gt;
&lt;th&gt;j&lt;/th&gt;
&lt;th&gt;k&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt;: controls the number of rows, increasing each time&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt;: controls the leading spaces, which decrease each row.&lt;br&gt;
&lt;code&gt;Inner loop (k)&lt;/code&gt;: prints the stars.&lt;br&gt;
&lt;code&gt;end=" "&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How to make inverted pyramid?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Mon, 14 Jul 2025 19:15:58 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh</link>
      <guid>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh</guid>
      <description>&lt;p&gt;Pattern problems are the gym for your brain. They strengthen your looping logic, thinking in steps, and algorithmic intuition — all while keeping it fun.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern: Right-Angled Triangle Using Nested Loops in Python
&lt;/h1&gt;

&lt;p&gt;We’re going to build this simple star pattern and understand the logic behind nested loops:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logic Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're using two for loops:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The outer loop controls the rows&lt;/li&gt;
&lt;li&gt;The inner loop controls the columns (stars)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
* 
* * 
* * * 
* * * * 
* * * * *
&lt;/pre&gt; 

&lt;p&gt;Step-by-Step Explanation&lt;/p&gt;

&lt;pre&gt;
row = 5
for i in range(1, row + 1):       # Outer loop: for each row (1 to 5)
    for j in range(i):            # Inner loop: print i stars
        print("*", end=" ")
    print()                       # Move to the next line after each row
&lt;/pre&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Why &lt;code&gt;range(1, row + 1)&lt;/code&gt;?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;range(n)&lt;/code&gt; goes from &lt;code&gt;0 to n-1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;So &lt;code&gt;range(1, row + 1)&lt;/code&gt; gives: &lt;code&gt;1, 2, 3, 4, 5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;That means we’ll have 5 rows, as required.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why &lt;code&gt;range(i)&lt;/code&gt; for inner loop?&lt;br&gt;
In each row, the number of stars is equal to the row number:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Row 1 → 1 star&lt;br&gt;
Row 2 → 2 stars&lt;br&gt;
...&lt;br&gt;
Row 5 → 5 stars&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(i))&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0, 1&lt;/td&gt;
&lt;td&gt;* *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0, 1, 2&lt;/td&gt;
&lt;td&gt;* * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3&lt;/td&gt;
&lt;td&gt;* * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3, 4&lt;/td&gt;
&lt;td&gt;* * * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice how the number of * matches the current row number &lt;code&gt;i&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt; = rows&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt; = columns (stars)&lt;br&gt;
&lt;code&gt;print("*", end=" ")&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;
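&lt;p&gt;The same logic can be wrapped in a small reusable function. This is only a minimal sketch; the name &lt;code&gt;right_triangle&lt;/code&gt; is our own:&lt;/p&gt;

```python
def right_triangle(rows):
    # Row i gets i stars (each followed by a space); rows are joined by newlines.
    return "\n".join("* " * i for i in range(1, rows + 1))

print(right_triangle(5))
```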

&lt;h1&gt;
  
  
  Reverse Right-Angled Triangle Pattern in Python
&lt;/h1&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
* * * * * 
* * * * 
* * * 
* * 
* 
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Step-by-Step Explanation&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;
row = 5
for i in range(row, 0, -1):       # Outer loop: from 5 to 1
    for j in range(i):            # Inner loop: print i stars
        print("*", end=" ")
    print()                       # Move to the next line after each row
&lt;/pre&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;row = 5&lt;/code&gt;&lt;br&gt;
We want 5 rows, so we initialize the &lt;code&gt;row&lt;/code&gt; variable to 5. You can increase this number for a larger pattern.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;for i in range(row, 0, -1)&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This loop goes in reverse from row to 1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;range(5, 0, -1)&lt;/code&gt; outputs: &lt;code&gt;5, 4, 3, 2, 1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; The third parameter &lt;code&gt;-1&lt;/code&gt; is the step; without it, &lt;code&gt;range(5, 0)&lt;/code&gt; is empty and the loop would not run at all.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;for j in range(i)&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This inner loop controls how many stars are printed in each row: the current value of &lt;code&gt;i&lt;/code&gt;. So:&lt;br&gt;
Row 1 → 5 stars&lt;br&gt;
Row 2 → 4 stars&lt;br&gt;
...&lt;br&gt;
Row 5 → 1 star&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(i))&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0 1 2 3 4&lt;/td&gt;
&lt;td&gt;* * * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0 1 2 3&lt;/td&gt;
&lt;td&gt;* * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0 1 2&lt;/td&gt;
&lt;td&gt;* * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0 1&lt;/td&gt;
&lt;td&gt;* *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt;: controls the number of rows, decreasing each time&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt;: prints stars equal to the current row number&lt;br&gt;
&lt;code&gt;end=" "&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if we made a mirror image of the right-angled triangle using nested loops in Python?&lt;/p&gt;
&lt;/blockquote&gt;
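&lt;p&gt;Before opening the follow-up post, here is one possible sketch: pad each row with spaces on the left so the stars align to the right. The helper name &lt;code&gt;mirrored_triangle&lt;/code&gt; is ours:&lt;/p&gt;

```python
def mirrored_triangle(rows):
    # Row i gets (rows - i) double-space pads, then i stars.
    return "\n".join("  " * (rows - i) + "* " * i for i in range(1, rows + 1))

print(mirrored_triangle(5))
```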

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2&lt;/a&gt;&lt;br&gt;
&lt;a href="//Build%20Your%20Logic%20from%20Scratch:%20Python%20Pattern%20Problems%20Explained.%20Star%20Pattern-1"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Information Extraction in NLP: Techniques, Tools &amp; Real-World Examples</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Sat, 12 Jul 2025 12:38:27 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/information-extraction-in-nlp-techniques-tools-real-world-examples-j75</link>
      <guid>https://dev.to/datatoinfinity/information-extraction-in-nlp-techniques-tools-real-world-examples-j75</guid>
      <description>&lt;p&gt;Ever wondered how search engines pull facts from millions of documents or how chatbots recognize names, dates, and numbers in your messages? That’s the magic of information extraction in NLP: the process of transforming unstructured text into structured, actionable data, a core part of modern AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task
&lt;/h3&gt;

&lt;p&gt;Let's try a quick task: ask ChatGPT about yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Information Extraction?
&lt;/h2&gt;

&lt;p&gt;As the name suggests, it extracts information from unstructured or semi-structured text. Many techniques are used to identify entities, entity names, actions, and events. The result is a standardized format that can be stored in rows and columns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Information Extraction Techniques
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Named Entity Recognition: an information extraction task that identifies names, organisations, and dates in unstructured text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relation Extraction: identifies the relationships between entities in the text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Event Extraction: recognises actions or events that need to happen, such as an appointment or a meeting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sentiment Analysis: as the name suggests, it identifies the feeling behind a sentence. Feelings are abstract; we feel them rather than see them, but the model infers them from the words you have written.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  "My order of a Samsung Galaxy S23 from your Seattle warehouse hasn’t arrived yet, and it was supposed to be delivered by July 10, 2025."
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Example Text Extraction Process:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Input Text: The customer’s message.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Named Entity Recognition (NER): The NLP system identifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product: Samsung Galaxy S23&lt;/li&gt;
&lt;li&gt;Location: Seattle&lt;/li&gt;
&lt;li&gt;Date: July 10, 2025&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keyword Extraction: Identifies key terms like “order,” “delivered,” and “warehouse” to understand the context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relation Extraction: Detects the relationship between “order” and “hasn’t arrived” to flag a delivery issue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output: Structured data:&lt;br&gt;
{Product: "Samsung Galaxy S23", Location: "Seattle", Delivery Date: "July 10, 2025", Issue: "Non-delivered"}&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Application: The chatbot uses this data to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query the order database for the specific product and delivery status.&lt;/li&gt;
&lt;li&gt;Respond with: “I’m sorry, it seems your Samsung Galaxy S23 order from our Seattle warehouse is delayed. Let me check the status and provide an update.”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sentiment Analysis: the sentiment here is negative.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
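&lt;p&gt;The pipeline above can be sketched in plain Python. The toy entity lists and the date regex below stand in for a trained NER model and are purely illustrative:&lt;/p&gt;

```python
import re

message = (
    "My order of a Samsung Galaxy S23 from your Seattle warehouse hasn't "
    "arrived yet, and it was supposed to be delivered by July 10, 2025."
)

# Tiny hand-made entity lists; a real system would use a trained NER model.
products = ["Samsung Galaxy S23"]
locations = ["Seattle"]
date_match = re.search(r"[A-Z][a-z]+ \d{1,2}, \d{4}", message)

# Assemble the structured record shown in step 5.
record = {
    "Product": next((p for p in products if p in message), None),
    "Location": next((l for l in locations if l in message), None),
    "Delivery Date": date_match.group() if date_match else None,
    "Issue": "Non-delivered" if "hasn't arrived" in message else None,
}
print(record)
```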

&lt;p&gt;Real-World Impact: This extraction enables the chatbot to quickly understand and address the customer’s issue, improving response time and user satisfaction. It’s used in customer support, logistics tracking, and automated ticketing systems.&lt;/p&gt;

&lt;p&gt;Learn the basics of text extraction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef"&gt;Learn Part Of Speech (POS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Learn Name Entity Recognition (NER)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l"&gt;Learn Word Cloud In NLP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
    <item>
      <title>Word Cloud in NLP: A Complete Guide to Visualizing Text with Python</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 11 Jul 2025 19:18:28 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l</link>
      <guid>https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l</guid>
      <description>&lt;p&gt;Ever stared at a mountain of text and thought, “Where do I even begin?” Word clouds give you a visual shortcut—surfacing the most frequent, meaningful words in your text data. In this guide, we’ll show how to build beautiful word clouds from scratch using Python, and how they can help uncover patterns in your NLP projects you might otherwise miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Word Cloud?
&lt;/h2&gt;

&lt;p&gt;A word cloud is a visual representation of text data where the size of each word indicates its frequency or importance within a given text or corpus. The more frequently a word appears, the larger and often bolder it is displayed in the cloud.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;In customer reviews, big words like "price", "quality", or "service" indicate common discussion points.&lt;/p&gt;
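&lt;p&gt;Those sizes come straight from word frequencies. A minimal sketch using only the standard library (the sample reviews are made up):&lt;/p&gt;

```python
from collections import Counter

reviews = [
    "great price and quality",
    "price was fair, service slow",
    "quality product, good price",
]

# A word cloud scales each word by counts like these.
counts = Counter(word.strip(",.") for review in reviews for word in review.split())
print(counts.most_common(3))
```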

&lt;h2&gt;
  
  
  Note: Word clouds are not analytical models; they are visual aids that complement, not replace, deeper NLP tasks like classification, sentiment analysis, or topic modelling.
&lt;/h2&gt;

&lt;p&gt;Install the Python library for word clouds:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install wordcloud&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Basic&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text="India, officially the Republic of India Hindi: Bhārat Gaṇarājya, is a country in South Asia. It is the seventh-largest country by area, the second-most populous country, and the most populous democracy in the world. Bounded by the Indian Ocean on the south, the Arabian Sea on the southwest, and the Bay of Bengal on the southeast, it shares land borders with Pakistan to the west; China, Nepal, and Bhutan to the north; and Bangladesh and Myanmar to the east. In the Indian Ocean, India is in the vicinity of Sri Lanka and the Maldives; its Andaman and Nicobar Islands share a maritime border with Thailand, Myanmar, and Indonesia."

wc=WordCloud().generate(text)
plt.imshow(wc)
plt.axis('off')
plt.show()
&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i5tngpf8ytesn5kkz45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i5tngpf8ytesn5kkz45.png" alt="WordCloud" width="764" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;WordCloud().generate(text)&lt;/code&gt; builds the word cloud from the text passed to it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plt.imshow(wc)&lt;/code&gt; displays the word cloud as a 2D image; &lt;code&gt;plt&lt;/code&gt; is the pyplot module from matplotlib.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plt.axis('off')&lt;/code&gt; hides the x-axis and y-axis.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plt.show()&lt;/code&gt; displays all currently active figures.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Word Cloud without Stop Words&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
import nltk
nltk.download('stopwords')        # fetch the stop-word lists once
from nltk.corpus import stopwords

stopword = stopwords.words('english')
wc = WordCloud(width=1000, height=720, margin=2, max_words=100,
               background_color='white', stopwords=stopword)
plt.imshow(wc.generate(text))
plt.axis('off')
plt.show()
&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltoouhs7pzxud27q6w52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltoouhs7pzxud27q6w52.png" alt="WordCloud" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;from nltk.corpus import stopwords&lt;/code&gt; imports NLTK's stop-word lists (run &lt;code&gt;nltk.download('stopwords')&lt;/code&gt; once beforehand).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stopword=stopwords.words('english')&lt;/code&gt; loads the list of English stop words.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;wc=WordCloud(width=1000,height=720,margin=2,max_words=100,background_color='white',stopwords=stopword)&lt;/code&gt; configures the &lt;code&gt;WordCloud&lt;/code&gt; object:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;width=1000&lt;/code&gt; the width of the generated image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;height=720&lt;/code&gt; the height of the generated image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;margin=2&lt;/code&gt; the margin around the word cloud inside the frame.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max_words=100&lt;/code&gt; keep at most 100 words from the text or corpus.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stopwords=stopword&lt;/code&gt; removes stop words from the cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef"&gt;Learn about Part of Speech (POS)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Learn about Name Entity Recognation&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
    <item>
      <title>What is POS tagging in NLP with Python using Spacy</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Thu, 10 Jul 2025 19:16:14 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef</link>
      <guid>https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef</guid>
      <description>&lt;p&gt;How does an AI know that ‘run’ is a verb and ‘quick’ is an adjective? That’s the magic of Part Of Speech Tagging – teaching machines grammar!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"A woman without her man is nothing."&lt;/code&gt;&lt;br&gt;
&lt;code&gt;"A woman, without her, man is nothing."&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One comma changes the whole meaning of the sentence. The same goes for an AI model: if it doesn't understand such basics, it can misinterpret a sentence or text. Part-of-Speech tagging, an NLP task, helps models interpret text accurately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Part Of Speech?
&lt;/h2&gt;

&lt;p&gt;It is an NLP task where each word in a text is assigned a grammatical tag (like noun, verb, adjective, etc.). This process helps computers understand the syntactic structure of a sentence and the role of each word, which is crucial for various NLP tasks.&lt;/p&gt;

&lt;p&gt;Many words can have multiple meanings depending on their context. For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"Book a fight"&lt;/code&gt;&lt;br&gt;
    * Book -&amp;gt; verb (an action)&lt;br&gt;
&lt;code&gt;"Read the book"&lt;/code&gt;&lt;br&gt;
    * Book -&amp;gt; Noun (an object)&lt;/p&gt;

&lt;p&gt;Without POS tagging, an NLP system might treat both occurrences of &lt;code&gt;"book"&lt;/code&gt; the same and get confused. POS tagging helps resolve these ambiguities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Import necessary library and Initialise the text&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
import spacy
nlp = spacy.load('en_core_web_sm')

text = u"Steve Jobs was a founder of Apple; he created the company on April 1, 1976. The company headquarters is now located in Cupertino, California, United States."
d = nlp(text)
&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Part of Speech&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
print(d[0].text,d[0].pos_,d[0].tag_)
&lt;/pre&gt;

&lt;p&gt;Output&lt;/p&gt;

&lt;pre&gt;
Steve PROPN NNP
&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;d[0].text&lt;/code&gt; is the first word of the text, &lt;code&gt;Steve&lt;/code&gt;. &lt;code&gt;d[0].pos_&lt;/code&gt; gives the coarse grammatical category, &lt;code&gt;PROPN&lt;/code&gt; (proper noun). &lt;code&gt;d[0].tag_&lt;/code&gt; gives the fine-grained tag, &lt;code&gt;NNP&lt;/code&gt; (proper noun, singular).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Print for every word&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
text=u"I like to play cricket"
d=nlp(text)
for token in d:
     print(f"{token.text:{15}}{token.pos_:{15}}{token.tag_:{15}}{spacy.explain(token.tag_)}")
&lt;/pre&gt;
 

&lt;p&gt;Output &lt;/p&gt;

&lt;pre&gt;
I              PRON           PRP            pronoun, personal
like           VERB           VBP            verb, non-3rd person singular present
to             PART           TO             infinitival "to"
play           VERB           VB             verb, base form
cricket        NOUN           NN             noun, singular or mass
&lt;/pre&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;token&lt;/code&gt; iterates through the tokens of the text.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;token.text:{15}&lt;/code&gt; prints the word (&lt;code&gt;token.text&lt;/code&gt;) padded to a width of 15 characters.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;token.pos_:{15}&lt;/code&gt; prints the coarse grammatical category, also padded to 15 characters.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;token.tag_&lt;/code&gt; gives the fine-grained grammatical tag.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;spacy.explain(token.tag_)&lt;/code&gt; returns a human-readable description of the tag.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You might be wondering where the sentence gets tokenized. The &lt;code&gt;nlp()&lt;/code&gt; call tokenizes the sentence into words (tokens).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Want to Learn Name Entity Recognition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l"&gt;Learn About Word Cloud in NLP&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
  </channel>
</rss>
