<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: datatoinfinity</title>
    <description>The latest articles on DEV Community by datatoinfinity (@datatoinfinity).</description>
    <link>https://dev.to/datatoinfinity</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3119114%2Ffdc71707-4d05-4281-b15d-cbcc15387b19.png</url>
      <title>DEV Community: datatoinfinity</title>
      <link>https://dev.to/datatoinfinity</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/datatoinfinity"/>
    <language>en</language>
    <item>
      <title>Optimized PDF Q&amp;A Assistant with Streamlit, LangChain, Hugging Face, and Supabase</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Wed, 20 Aug 2025 10:05:54 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-an-ai-pdf-summarizer-with-python-langchain-supabase-and-streamlit-5hcf</link>
      <guid>https://dev.to/datatoinfinity/build-an-ai-pdf-summarizer-with-python-langchain-supabase-and-streamlit-5hcf</guid>
      <description>&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://khushboogup-pdffolder-app1-f9ibs2.streamlit.app/" rel="noopener noreferrer"&gt;PDFSUMMARIZATION Site&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Github&lt;/strong&gt; &lt;a href="https://github.com/khushboogup/Pdffolder" rel="noopener noreferrer"&gt;CODE&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimized PDF Q&amp;amp;A Assistant with Streamlit, LangChain, Hugging Face, and Supabase
&lt;/h2&gt;

&lt;p&gt;When working on AI projects, you might notice that code runs fast on Google Colab but slows down on a local machine. The fix is to make the pipeline &lt;strong&gt;optimized and efficient&lt;/strong&gt;: skip work you have already done, batch expensive operations, and cache what you can.&lt;/p&gt;

&lt;p&gt;In this blog, I’ll walk you through building a PDF Q&amp;amp;A Assistant with the following pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upload a PDF → hash &amp;amp; check if already stored → extract, embed, and save chunks in Supabase → take user’s question → retrieve relevant chunks → refine with LLM → display answer.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack Used
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Streamlit → Front-end UI and deployment&lt;/li&gt;
&lt;li&gt;LangChain → Orchestrates the LLM calls, connecting the AI “brain”&lt;/li&gt;
&lt;li&gt;Hugging Face → Provides powerful pre-trained models&lt;/li&gt;
&lt;li&gt;Supabase → Vector database for storing and retrieving PDF data&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sentence_transformers import SentenceTransformer
from supabase import create_client
from huggingface_hub import InferenceClient

SUPABASE_URL = st.secrets["SUPABASE_URL"]
SUPABASE_KEY = st.secrets["SUPABASE_KEY"]
HF_TOKEN = st.secrets["HF_TOKEN"]  # Hugging Face token

supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
model = SentenceTransformer('all-MiniLM-L6-v2')
hf_client = InferenceClient(api_key=HF_TOKEN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, Supabase is used for storage, a SentenceTransformer model handles embeddings, and Hugging Face provides an LLM client for inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hash and Extract PDF Data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import fitz  # PyMuPDF (faster alternative to pdfplumber)
import hashlib

def hash_pdf(pdf_path):
    with open(pdf_path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def extract_and_chunk(pdf_path, chunk_size=500):
    doc = fitz.open(pdf_path)
    text = " ".join([page.get_text() for page in doc])
    words = text.split()
    chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]
    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;hashlib&lt;/code&gt; → creates a unique fingerprint (hash) of the PDF, preventing duplicate processing.&lt;br&gt;
&lt;code&gt;fitz&lt;/code&gt; (PyMuPDF) → efficiently extracts the text, which &lt;code&gt;extract_and_chunk&lt;/code&gt; then splits into manageable word-based chunks.&lt;/p&gt;
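
&lt;p&gt;To see why the hash-based dedupe works, here is a tiny standalone sketch (the byte strings are made up): identical bytes always produce the same MD5 digest, so a re-uploaded PDF maps to the same &lt;code&gt;pdf_id&lt;/code&gt; and can be skipped.&lt;/p&gt;

```python
import hashlib

def hash_bytes(data):
    # identical bytes always yield the identical fingerprint
    return hashlib.md5(data).hexdigest()

# Hypothetical file contents, just for illustration
pdf_a = b"%PDF-1.4 example content"
pdf_b = b"%PDF-1.4 example content"    # a re-upload of the same file
pdf_c = b"%PDF-1.4 different content"

print(hash_bytes(pdf_a) == hash_bytes(pdf_b))   # True: already stored, skip
print(hash_bytes(pdf_a) == hash_bytes(pdf_c))   # False: new document, process
```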

&lt;h2&gt;
  
  
  Embed, Store, and Retrieve
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def embed_chunks(chunks):
    return model.encode(chunks, batch_size=16, show_progress_bar=True).tolist()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def store_to_supabase(chunks, embeddings, pdf_id):
    data = [{
        "id": f"{pdf_id}_chunk{i+1}",   # prefix with pdf_id so ids from different PDFs never collide
        "pdf_id": pdf_id,
        "text": chunk,
        "embedding": embedding
    } for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))]
    supabase.table("documents1").upsert(data).execute()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def retrieve_chunks(query, pdf_id, top_k=10):
    query_embedding = model.encode(query).tolist()
    response = supabase.rpc("match_documents", {
        "query_embedding": query_embedding,
        "match_count": top_k,
        "pdf_id_filter": pdf_id
    }).execute()
    relevant_chunks = [row["text"] for row in response.data] if response.data else []
    return relevant_chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Embed Chunks&lt;/code&gt; → Convert text chunks into embeddings (vectors).&lt;br&gt;
&lt;code&gt;Store in Supabase&lt;/code&gt; → Save text + embeddings for future queries.&lt;br&gt;
&lt;code&gt;Retrieve Chunks&lt;/code&gt; → Find the most relevant text chunks with semantic similarity search.&lt;/p&gt;
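
&lt;p&gt;Under the hood, "semantic similarity search" is just nearest-neighbour ranking in vector space (Supabase's pgvector distance operator does it in SQL). A pure-Python sketch with made-up 3-dimensional vectors (real MiniLM embeddings have 384 dimensions) shows the idea:&lt;/p&gt;

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity: 1.0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical chunk embeddings, invented for this sketch
chunks = {
    "The invoice total is 240 dollars": [0.9, 0.1, 0.0],
    "Python supports list comprehensions": [0.1, 0.9, 0.2],
    "Payment is due within 30 days": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "How much is the bill?"

# Rank chunks by similarity and keep the top 2, like match_documents does
ranked = sorted(chunks, key=lambda text: cosine(query, chunks[text]), reverse=True)
top_k = ranked[:2]
print(top_k)  # the two payment-related chunks rank first
```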

&lt;h2&gt;
  
  
  Refine with Hugging Face LLM
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def refine_with_llm(relevant_chunk, question):
    refinement_input = "\n\n---\n\n".join(relevant_chunk)
    prompt = f"""
    Refine the following extracted text chunks for clarity, conciseness, and improved readability.
    Keep the technical meaning accurate and explain any complex terms simply if needed.
    Text to refine:
    {refinement_input}
    Question:
    {question}"""

    response = hf_client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        messages=[
            {"role": "system", "content": "You are an expert technical editor and writer."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )
    refined_text = response.choices[0].message.content
    return refined_text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step ensures that even if retrieved chunks are messy or incomplete, the AI agent refines them into clear, concise, and context-aware answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streamlit Front-End
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import uuid
import os
import streamlit as st

st.set_page_config(page_title="PDF Q&amp;amp;A Assistant")
st.title("📄 Ask Questions About Your PDF")

uploaded_file = st.file_uploader("Upload a PDF", type="pdf")

if uploaded_file:
    with st.spinner("Processing PDF..."):
        pdf_path = f"temp_{uuid.uuid4().hex}.pdf"
        with open(pdf_path, "wb") as f:
            f.write(uploaded_file.read())
        pdf_id = hash_pdf(pdf_path)

        existing = supabase.table("documents1").select("id").eq("pdf_id", pdf_id).execute()
        if existing.data:
            st.warning("⚠️ This PDF has already been processed. You can still ask questions.")
        else:
            chunks = extract_and_chunk(pdf_path)
            embeddings = embed_chunks(chunks)
            store_to_supabase(chunks, embeddings, pdf_id)
        os.remove(pdf_path)
    st.success("PDF ready for Q&amp;amp;A.")

    question = st.text_input("Ask a question about the uploaded PDF:")
    if question:
        with st.spinner("Generating answer..."):
            results = retrieve_chunks(question, pdf_id)
            if not results:
                st.error("No relevant chunks found.")
            else:
                answer = refine_with_llm(results, question)
                st.markdown("### Answer:")
                st.write(answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;UI Setup&lt;/strong&gt; → Streamlit sets page config, title, and PDF uploader.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporary Save&lt;/strong&gt; → Uploaded PDF is saved locally with a unique name.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hashing&lt;/strong&gt; → Generate an MD5 hash to uniquely identify the PDF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check Supabase&lt;/strong&gt; → Skip processing if the PDF was already stored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract &amp;amp; Chunk&lt;/strong&gt; → Pull text from the PDF and split it into word chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed Chunks&lt;/strong&gt; → Convert chunks into vector embeddings for semantic search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store in Supabase&lt;/strong&gt; → Save chunks, embeddings, and PDF ID in the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean Up&lt;/strong&gt; → Remove the temporary PDF file after processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask Question&lt;/strong&gt; → User inputs a question about the uploaded PDF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve Chunks&lt;/strong&gt; → Fetch most relevant chunks from Supabase via similarity search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refine Answer&lt;/strong&gt; → LLM polishes the retrieved text into a clear, concise response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Display Result&lt;/strong&gt; → Show the AI-generated answer in the Streamlit app.&lt;/li&gt;
&lt;/ol&gt;
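
&lt;p&gt;Stripped of the Streamlit UI, the twelve steps boil down to a short control flow. The sketch below uses stubbed-in helpers (a plain dict stands in for Supabase, and substring matching stands in for embeddings) purely to make the ordering explicit:&lt;/p&gt;

```python
import hashlib

def answer_pdf_question(pdf_bytes, question, store):
    # store: a plain dict standing in for the Supabase table in this sketch
    pdf_id = hashlib.md5(pdf_bytes).hexdigest()       # hash the upload
    if pdf_id not in store:                           # skip work if already processed
        chunks = pdf_bytes.decode().split(". ")       # "extract and chunk" (stub)
        store[pdf_id] = chunks                        # "embed and store" (stub)
    relevant = [c for c in store[pdf_id] if question in c]   # "retrieve" (stub)
    if relevant:
        return " ".join(relevant)                     # the LLM refinement step would go here
    return "No relevant chunks found."

store = {}
doc = b"Supabase stores vectors. Streamlit renders the UI. LangChain wires the steps"
print(answer_pdf_question(doc, "vectors", store))     # first call processes and stores
print(answer_pdf_question(doc, "UI", store))          # second call is a cache hit
```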

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/from-pdf-to-summary-building-an-ai-agent-with-python-vector-databases-basic-b2f"&gt;From PDF to Summary: Building an AI Agent with Python &amp;amp; Vector Databases - Basic&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From PDF to Summary: Building an AI Agent with Python &amp; Vector Databases - Basic</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Mon, 11 Aug 2025 10:27:14 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/from-pdf-to-summary-building-an-ai-agent-with-python-vector-databases-basic-b2f</link>
      <guid>https://dev.to/datatoinfinity/from-pdf-to-summary-building-an-ai-agent-with-python-vector-databases-basic-b2f</guid>
      <description>&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://khushboogup-pdffolder-app1-f9ibs2.streamlit.app/" rel="noopener noreferrer"&gt;PDFSUMMARIZATION Site&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sample PDF:&lt;/strong&gt; &lt;a href="https://drive.google.com/drive/folders/1YUpHTfnXBK7hzQQszIuLCpylHRwrcLNo?usp=share_link" rel="noopener noreferrer"&gt;Download Here&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Github&lt;/strong&gt; &lt;a href="https://github.com/khushboogup/From-PDF-to-Summary-Building-an-AI-Agent-with-Python-Vector-Databases---Basic?tab=readme-ov-file" rel="noopener noreferrer"&gt;CODE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The PDF Summarization AI Agent is an AI-powered tool that summarizes lengthy PDFs and answers questions based only on their content. It’s useful when you need a quick overview without reading the entire document.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarizes large PDF files into concise overviews.&lt;/li&gt;
&lt;li&gt;Answers user questions only from the uploaded PDF.&lt;/li&gt;
&lt;li&gt;Formats responses clearly and preserves technical accuracy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Used By
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Researchers&lt;/strong&gt; → Extract key findings from academic papers.&lt;br&gt;
&lt;strong&gt;Lawyers&lt;/strong&gt; → Summarize contracts &amp;amp; compliance documents.&lt;br&gt;
&lt;strong&gt;Business Analysts&lt;/strong&gt; → Turn meeting transcripts into quick insights.&lt;br&gt;
&lt;strong&gt;Finance Teams&lt;/strong&gt; → Condense invoices &amp;amp; financial statements.&lt;br&gt;
&lt;strong&gt;Students&lt;/strong&gt; → Create study notes from textbooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Used
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://streamlit.io" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; → Front-end &amp;amp; deployment.&lt;br&gt;
&lt;a href="https://www.langchain.com" rel="noopener noreferrer"&gt;LangChain &lt;/a&gt;→ LLM integration &amp;amp; chaining workflows.&lt;br&gt;
&lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; → Pre-trained AI models (e.g., Mixtral-8x7B).&lt;br&gt;
&lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; → Vector database for storing PDF embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract text&lt;/strong&gt; from the PDF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk the text&lt;/strong&gt; into smaller segments (for large PDFs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed each chunk&lt;/strong&gt; into vector form using a transformer model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store embeddings&lt;/strong&gt; in Supabase Vector DB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perform similarity search&lt;/strong&gt; to find the most relevant chunks for a query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a Hugging Face model&lt;/strong&gt; to refine and format the answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Concepts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Chaining&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A method of breaking a complex task into sequential steps, where the output of one step feeds into the next.&lt;/p&gt;
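
&lt;p&gt;In plain Python, chaining is just function composition; LangChain's &lt;code&gt;|&lt;/code&gt; operator builds the same kind of pipeline. A minimal sketch with toy steps:&lt;/p&gt;

```python
def extract(raw):
    # step 1: pretend PDF extraction, just trim whitespace
    return raw.strip()

def chunk(text):
    # step 2: naive sentence chunking
    return text.split(". ")

def summarize(chunks):
    # step 3: stand-in for the LLM call, keep the first chunk
    return chunks[0]

def chain(*steps):
    # each step consumes the previous step's output
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

pipeline = chain(extract, chunk, summarize)
print(pipeline("  Embeddings map text to vectors. Similar texts sit close together.  "))
```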

&lt;p&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A representation of text, images, or audio as points in a semantic vector space.&lt;br&gt;
Similar items (e.g., mobile, smartphone, cell phone) are stored close together in this space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pdfplumber sentence-transformers supabase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pdfplumber&lt;/code&gt; → Extract text from PDF.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sentence-transformers&lt;/code&gt; → Convert text into embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;supabase&lt;/code&gt; → Store and search embeddings.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Supabase Setup
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; account.&lt;/li&gt;
&lt;li&gt;Start a new project and copy:

&lt;ul&gt;
&lt;li&gt;Project URL&lt;/li&gt;
&lt;li&gt;API Key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Enable vector extension:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS vector SCHEMA extensions;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Create &lt;strong&gt;documents1&lt;/strong&gt; table:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE documents1 (
    id TEXT PRIMARY KEY,
    text TEXT,
    pdf_id TEXT,
    embedding VECTOR(384)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Create &lt;strong&gt;similarity search&lt;/strong&gt; function:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE FUNCTION match_documents(
    query_embedding VECTOR(384),
    match_count INT
) RETURNS TABLE (
    id TEXT,
    text TEXT
) LANGUAGE plpgsql STABLE AS $$
BEGIN
    RETURN QUERY
    SELECT documents1.id, documents1.text
    FROM documents1
    ORDER BY documents1.embedding &amp;lt;-&amp;gt; query_embedding
    LIMIT match_count;
END;
$$;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  PDF Processing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Upload PDF (Google Colab)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.colab import files
uploaded = files.upload()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Extract &amp;amp; Chunk
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pdfplumber
def extract_and_chunk(pdf_path, chunk_size=500):
    with pdfplumber.open(pdf_path) as pdf:
        text = "".join(page.extract_text() or "" for page in pdf.pages)
    chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
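
&lt;p&gt;Note that this basic version chunks by &lt;em&gt;characters&lt;/em&gt;, so a boundary can cut a word in half; a word-based splitter avoids that. A toy comparison (with a tiny chunk size chosen only for illustration):&lt;/p&gt;

```python
def chunk_by_chars(text, chunk_size):
    # slices the raw string, so a boundary can split a word
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunk_by_words(text, chunk_size):
    # groups whole words, so boundaries always fall between words
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

text = "vector databases enable semantic search"
print(chunk_by_chars(text, 10))  # first chunk ends mid-word: "vector dat"
print(chunk_by_words(text, 3))   # chunks respect word boundaries
```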



&lt;h2&gt;
  
  
  Store in Supabase
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from supabase import create_client
from sentence_transformers import SentenceTransformer

supabase_url = "YOUR_SUPABASE_URL"
supabase_key = "YOUR_API_KEY"
supabase = create_client(supabase_url, supabase_key)

model = SentenceTransformer('all-MiniLM-L6-v2')

pdf_path = "Sample.pdf"
chunks = extract_and_chunk(pdf_path)
embeddings = model.encode(chunks).tolist()

data = [
    {"id": f"chunk_{i}", "text": chunk, "embedding": embedding, "pdf_id": "doc1"}
    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
]

supabase.table("documents1").insert(data).execute()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Query Search
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = "What is the topic?"
query_embedding = model.encode(query).tolist()

response = supabase.rpc(
    "match_documents",
    {"query_embedding": query_embedding, "match_count": 3}
).execute()

relevant_chunks = [row["text"] for row in response.data]
print("\n---\n".join(relevant_chunks))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hugging Face Integration
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; account.&lt;/li&gt;
&lt;li&gt;Generate a READ API token.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from huggingface_hub import InferenceClient
import os

client = InferenceClient(
    api_key=os.getenv("HUGGINGFACEHUB_API_TOKEN", "YOUR_HF_API_KEY")
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Refinement with Mixtral-8x7B
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt = f"""
Refine the following extracted text chunks for clarity, conciseness, and improved readability.
Keep the technical meaning accurate.

Text to refine:
{ "\n\n---\n\n".join(relevant_chunks) }
"""

response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are an expert technical editor."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.7,
    max_tokens=500
)

print("\n Refined Output:\n")
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Delete old data&lt;/strong&gt; before inserting chunks from a new PDF to avoid duplicate ID errors.&lt;/li&gt;
&lt;li&gt;Hugging Face request cost &amp;amp; speed depend on the chosen model.&lt;/li&gt;
&lt;li&gt;Supabase vector size (384) must match your embedding model output.&lt;/li&gt;
&lt;/ul&gt;
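
&lt;p&gt;The last note is worth enforcing in code: if the embedding dimension ever differs from the column's declared &lt;code&gt;VECTOR(384)&lt;/code&gt;, the insert fails at the database. A cheap pre-insert guard (sketched here with dummy vectors) surfaces the mismatch earlier:&lt;/p&gt;

```python
TABLE_DIM = 384  # must match VECTOR(384) in the documents1 schema

def check_dims(embeddings, expected=TABLE_DIM):
    # collect the indices of any vectors with the wrong length
    bad = [i for i, e in enumerate(embeddings) if len(e) != expected]
    if bad:
        raise ValueError(f"embeddings {bad} have the wrong dimension")
    return True

# all-MiniLM-L6-v2 produces 384-dimensional vectors; these dummies mimic that
dummy = [[0.0] * 384, [0.1] * 384]
print(check_dims(dummy))  # True
```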

&lt;p&gt;&lt;strong&gt;PDF upload → chunking → storing → querying → refining&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>langchain</category>
      <category>supabase</category>
    </item>
    <item>
      <title>Automated PDF Summarization Using AI Agents: LangChain + Hugging Face + Supabase + Streamlit - Basic</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 08 Aug 2025 20:21:47 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/automated-pdf-summarization-using-ai-agents-langchain-hugging-face-supabase-streamlit-basic-385e</link>
      <guid>https://dev.to/datatoinfinity/automated-pdf-summarization-using-ai-agents-langchain-hugging-face-supabase-streamlit-basic-385e</guid>
      <description>&lt;h2&gt;
  
  
  Build an AI Agent to Auto-Summarize PDFs with LangChain, Hugging Face, and Supabase
&lt;/h2&gt;

&lt;p&gt;This idea came to me while working on a project with &lt;strong&gt;extensive documentation&lt;/strong&gt;. It was time-consuming and overwhelming to extract only the important details.&lt;/p&gt;

&lt;p&gt;I thought — &lt;em&gt;what if I had an assistant that could read the entire document, summarize it, and even answer my questions?&lt;/em&gt; That’s when &lt;strong&gt;PDF Summarization AI Agent&lt;/strong&gt; was born.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Demo:&lt;/strong&gt; &lt;a href="https://khushboogup-pdffolder-app1-f9ibs2.streamlit.app/" rel="noopener noreferrer"&gt;PDFSUMMARIZATION Site&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sample PDF:&lt;/strong&gt; &lt;a href="https://drive.google.com/drive/folders/1YUpHTfnXBK7hzQQszIuLCpylHRwrcLNo?usp=share_link" rel="noopener noreferrer"&gt;Download Here&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;PDFs are everywhere — academic papers, contracts, reports, manuals — but &lt;strong&gt;manually skimming hundreds of pages isn’t scalable&lt;/strong&gt;. This is especially painful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Researchers:&lt;/strong&gt; Extract key findings from long papers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lawyers:&lt;/strong&gt; Summarize contracts &amp;amp; compliance docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Analysts:&lt;/strong&gt; Turn meeting transcripts into quick insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finance Teams:&lt;/strong&gt; Condense invoices &amp;amp; statements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Students:&lt;/strong&gt; Turn textbooks into study notes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Streamlit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy Python web app frontend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles LLM workflows &amp;amp; chaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hugging Face&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides pre-trained AI models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Supabase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vector DB for semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PyPDF2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extracts text from PDFs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How It Works (High-Level Flow)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Upload PDF(s)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract Text&lt;/strong&gt; → using PyPDF2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk &amp;amp; Embed&lt;/strong&gt; → LangChain breaks text into smaller parts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store in Supabase&lt;/strong&gt; → for semantic search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query AI&lt;/strong&gt; → Hugging Face / Gemini answers based on context&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Return Summary or Q&amp;amp;A Answer&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Setup Instructions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Get a Google AI Studio API Key&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;Google AI Studio API Key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create API Key&lt;/strong&gt; (new project)
&lt;/li&gt;
&lt;li&gt;Copy your key.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Required Libraries&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;
pip install langchain langchain-core langchain-google-genai PyPDF2
&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Let’s Start With the Basics
&lt;/h2&gt;

&lt;p&gt;We start with a simple AI agent built on the Gemini API.&lt;/p&gt;

&lt;pre&gt;
import warnings
warnings.filterwarnings("ignore")
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableSequence
from langchain_google_genai import ChatGoogleGenerativeAI
import PyPDF2
import os

# Set your Gemini API key
os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"  # paste the key from Google AI Studio

# Extract text from multiple PDFs
def extract_text_from_pdf(pdf_paths):
    text = ""
    for pdf_path in pdf_paths:  # Iterate over the list of PDF paths
        try:
            with open(pdf_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                for page in reader.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"  # Add newline to separate text from different PDFs
        except FileNotFoundError:
            text += f"Error: The file '{pdf_path}' was not found.\n"
    return text

# Define prompt template
template = """
You are an expert AI assistant. Use only the information provided to answer the question.
Context: {context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=template)

# Initialize Gemini LLM and chain
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=os.environ["GOOGLE_API_KEY"])
qa_chain = prompt | llm  # the | operator builds a RunnableSequence

# Function to answer questions
def answer_question(pdf_text, question):
    if not pdf_text:
        return "Error: No text extracted from the PDFs."
    answer = qa_chain.invoke({"context": pdf_text, "question": question})  # Updated to use invoke
    return answer.content if hasattr(answer, 'content') else answer  # Handle response content

# Example usage
if __name__ == "__main__":
    pdf_paths = ["sample3.pdf"]  # Replace with your list of PDF file paths
    pdf_text = extract_text_from_pdf(pdf_paths)  # Pass the list of PDF paths
    question = input("Enter text: ")
    answer = answer_question(pdf_text, question)
    print(f"Question: {question}\nAnswer: {answer}")
    # print(len(pdf_text))  # Uncomment to print the length of extracted text
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Full Code Walkthrough&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s a detailed explanation of every part of the code for those who want the deep dive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;extract_text_from_pdf&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loops through PDF file paths&lt;/li&gt;
&lt;li&gt;Uses PyPDF2 to read &amp;amp; extract text page-by-page&lt;/li&gt;
&lt;li&gt;Adds newlines to separate pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prompt Template&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;{context} = extracted PDF text&lt;/li&gt;
&lt;li&gt;{question} = user’s query&lt;/li&gt;
&lt;li&gt;AI responds only based on the provided context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LLM Initialization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Gemini 1.5 Flash (fast, cost-effective)&lt;/li&gt;
&lt;li&gt;RunnableSequence pipes the prompt output into the AI model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This code is deliberately basic: it only hints at how we extract data from a PDF. Try it yourself with larger files and you will quickly see its drawbacks (the whole document is pushed into a single prompt); we will cover those drawbacks next.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>devto</category>
      <category>langchain</category>
      <category>huggingface</category>
      <category>supabase</category>
    </item>
    <item>
      <title>Pattern Printing Series: Alphabet Patterns Explained with Logic</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 01 Aug 2025 18:09:21 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/pattern-printing-series-alphabet-patterns-explained-with-logic-5enl</link>
      <guid>https://dev.to/datatoinfinity/pattern-printing-series-alphabet-patterns-explained-with-logic-5enl</guid>
      <description>&lt;p&gt;Alphabet patterns are a classic way to improve your understanding of nested loops, ASCII values, and pattern logic in programming. Whether you're preparing for an interview or just sharpening your logic, these patterns offer a fun and educational challenge. Let’s dive into some fascinating examples and decode the logic behind them!&lt;/p&gt;

&lt;h2&gt;
  
  
  Alphabet Pattern 1:
&lt;/h2&gt;

&lt;pre&gt;
A 
A B 
A B C 
A B C D 
A B C D E 
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(i):
        print(chr(j+65),end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; defines how many rows and columns are needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(i)&lt;/code&gt; controls the number of columns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chr(j+65)&lt;/code&gt; converts &lt;code&gt;j&lt;/code&gt; to the corresponding uppercase letter via its ASCII code:
chr(65) = 'A',
chr(66) = 'B', and so on.&lt;/li&gt;
&lt;/ol&gt;
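
&lt;p&gt;The &lt;code&gt;chr&lt;/code&gt;/&lt;code&gt;ord&lt;/code&gt; pair is the whole trick, and it is easy to verify interactively:&lt;/p&gt;

```python
# ord maps a character to its ASCII code; chr is the inverse
print(ord("A"))           # 65
print(chr(65), chr(66))   # A B

# the inner loop's chr(j + 65) walks A, B, C, ...
letters = [chr(65 + j) for j in range(5)]
print(" ".join(letters))  # A B C D E
```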

&lt;h2&gt;
  
  
  Alphabet Pattern 2:
&lt;/h2&gt;

&lt;pre&gt;
A B C D E 
A B C D 
A B C 
A B 
A 
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(row,0,-1):
    for j in range(i):
        print(chr(j+65),end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; defines how many rows and columns are needed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(row,0,-1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(i)&lt;/code&gt; controls the number of columns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chr(j+65)&lt;/code&gt; converts &lt;code&gt;j&lt;/code&gt; to the corresponding uppercase letter via its ASCII code:
chr(65) = 'A',
chr(66) = 'B', and so on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/mater-logic-with-number-pattern-in-python-1-3iga"&gt;Mater Logic With Number Pattern in Python - 1&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>coding</category>
      <category>programming</category>
    </item>
    <item>
      <title>Mater Logic With Number Pattern in Python - 1</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 01 Aug 2025 15:32:14 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/mater-logic-with-number-pattern-in-python-1-3iga</link>
      <guid>https://dev.to/datatoinfinity/mater-logic-with-number-pattern-in-python-1-3iga</guid>
      <description>&lt;p&gt;Imagine looking at a simple series of numbers on a screen—and suddenly realizing you can predict the next move, decipher hidden logic, and even create your own mesmerizing patterns with just a few lines of code. That’s the secret magic behind number patterns in Python, and today, we’ll not only unravel how they work, but give you the tools to build your own “number masterpieces” using the power of simple logic!&lt;/p&gt;

&lt;h2&gt;
  
  
  Number Pattern 1.
&lt;/h2&gt;

&lt;pre&gt;
1 
1 2 
1 2 3 
1 2 3 4 
1 2 3 4 5 
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(1,i+1):
        print(j,end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The idea is simple: in each row we just print the column number &lt;code&gt;j&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(1,i+1)&lt;/code&gt; controls the number of columns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(1, i+1))&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1, 2&lt;/td&gt;
&lt;td&gt;1 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1, 2, 3&lt;/td&gt;
&lt;td&gt;1 2 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1, 2, 3, 4&lt;/td&gt;
&lt;td&gt;1 2 3 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;1, 2, 3, 4, 5&lt;/td&gt;
&lt;td&gt;1 2 3 4 5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
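
&lt;p&gt;The same triangle can be produced without a nested loop by joining the column numbers with &lt;code&gt;str.join&lt;/code&gt;; a minimal sketch:&lt;/p&gt;

```python
row = 5
# join() builds each line "1 2 ... i" in one step
lines = [" ".join(str(j) for j in range(1, i + 1)) for i in range(1, row + 1)]
print("\n".join(lines))
```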

&lt;p&gt;The other way around:&lt;/p&gt;

&lt;pre&gt;
1
2 2
3 3 3
4 4 4 4
5 5 5 5 5
&lt;/pre&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(1,i+1):
        print(i,end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Here we simply print the row number instead: &lt;code&gt;print(i, end=" ")&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Number Pattern 2.
&lt;/h2&gt;

&lt;pre&gt;
1 
2 3 
4 5 6 
7 8 9 10 
11 12 13 14 15 
&lt;/pre&gt;

&lt;pre&gt;
row=5
num=0
for i in range(1,row+1):
    for j in range(i):
        num=num+1
        print(num,end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; sets the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;num=0&lt;/code&gt; will be incremented for every printed number.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1)&lt;/code&gt; controls the number of rows.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(i)&lt;/code&gt; controls the number of columns (row i holds i numbers).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;num=num+1&lt;/code&gt; advances the sequence and prints each number, spacing them on the same line.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(i))&lt;/th&gt;
&lt;th&gt;Output (num)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0, 1&lt;/td&gt;
&lt;td&gt;2 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0, 1, 2&lt;/td&gt;
&lt;td&gt;4 5 6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3&lt;/td&gt;
&lt;td&gt;7 8 9 10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3, 4&lt;/td&gt;
&lt;td&gt;11 12 13 14 15&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
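
&lt;p&gt;Instead of carrying a running &lt;code&gt;num&lt;/code&gt;, the first number of row &lt;code&gt;i&lt;/code&gt; can be computed directly: rows 1 to i-1 hold 1 + 2 + ... + (i-1) = i*(i-1)/2 numbers, so row &lt;code&gt;i&lt;/code&gt; starts at &lt;code&gt;i*(i-1)//2 + 1&lt;/code&gt;. A sketch:&lt;/p&gt;

```python
def row_start(i):
    # numbers printed before row i: 1 + 2 + ... + (i-1) = i*(i-1)/2
    return i * (i - 1) // 2 + 1

row = 5
for i in range(1, row + 1):
    print(" ".join(str(row_start(i) + j) for j in range(i)))
```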

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Find the Most Frequent Word in Text using Python | NLP Basics Explained</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Mon, 28 Jul 2025 13:23:20 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/find-the-most-frequent-word-in-text-using-python-nlp-basics-explained-410j</link>
      <guid>https://dev.to/datatoinfinity/find-the-most-frequent-word-in-text-using-python-nlp-basics-explained-410j</guid>
      <description>&lt;p&gt;Ever wondered which word appears the most in a text? Whether you’re analyzing customer feedback, blog posts, or any text data, finding the most frequent word is a common Natural Language Processing (NLP) task. In this post, we’ll explore how to do it in Python, why it matters, and some real-world applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Find the Most Frequent Word?
&lt;/h2&gt;

&lt;p&gt;Word frequency analysis helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keyword extraction&lt;/strong&gt; for SEO and blogs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentiment analysis&lt;/strong&gt; in customer feedback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topic modeling&lt;/strong&gt; in large text datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chatbots &amp;amp; AI models&lt;/strong&gt; for training data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Steps to Find the Most Frequent Word
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Prepare Text&lt;/li&gt;
&lt;li&gt;Tokenize and Stop Word Removal&lt;/li&gt;
&lt;li&gt;Count Word Frequency &lt;/li&gt;
&lt;li&gt;DataFrame for word and frequency &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import pandas as pd
import nltk

corpora = [
    'Artificial Intelligence is transforming the world. AI is used in healthcare, finance, and education. Machine Learning, a branch of AI, powers recommendation systems and predictive analytics.',
    'Success is not the key to happiness. Happiness is the key to success. If you love what you do, you will be successful.',
    'The product is great. The quality is great and the price is reasonable. I will recommend this product to my friends because the product is worth the price.',
    'The football match was intense. The players gave their best. The match ended with a thrilling victory. Fans celebrated the match with great excitement.'
]

stop_words = set(stopwords.words('english'))

all_data = []  

for i, corpus in enumerate(corpora, start=1):
    words = [word.lower() for word in word_tokenize(corpus) 
             if word.lower() not in stop_words and word.isalpha()]
    
    
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    
   
    for word, freq in word_freq.items():
        all_data.append({"Corpus": f"Corpus_{i}", "Word": word, "Frequency": freq})

df = pd.DataFrame(all_data)

# Sort by frequency for better readability
df = df.sort_values(by=["Corpus", "Frequency"], ascending=[True, False])

print(df)

&lt;/pre&gt;

&lt;h4&gt;
  
  
  Importing Library and Module
&lt;/h4&gt;

&lt;pre&gt;
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import pandas as pd
import nltk
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;import nltk&lt;/code&gt; NLTK, or the Natural Language Toolkit, is a prominent open-source library in Python designed for Natural Language Processing (NLP). &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;from nltk.corpus import stopwords&lt;/code&gt; imports the stopwords corpus.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;from nltk.tokenize import word_tokenize&lt;/code&gt; imports the tokenizer that splits a sentence into individual words.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Prepare the Text
&lt;/h4&gt;

&lt;pre&gt;
corpora = [
    'Artificial Intelligence is transforming the world. AI is used in healthcare, finance, and education. Machine Learning, a branch of AI, powers recommendation systems and predictive analytics.',
    'Success is not the key to happiness. Happiness is the key to success. If you love what you do, you will be successful.',
    'The product is great. The quality is great and the price is reasonable. I will recommend this product to my friends because the product is worth the price.',
    'The football match was intense. The players gave their best. The match ended with a thrilling victory. Fans celebrated the match with great excitement.'
]
&lt;/pre&gt;

&lt;h4&gt;
  
  
  Process each Corpus
&lt;/h4&gt;

&lt;pre&gt;
stop_words = set(stopwords.words('english'))
all_data = [] 
for i, corpus in enumerate(corpora, start=1):
    words = [word.lower() for word in word_tokenize(corpus) 
             if word.lower() not in stop_words and word.isalpha()]
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;stop_words=set(stopwords.words('english'))&lt;/code&gt; loads the set of English stopwords.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;all_data=[]&lt;/code&gt; to store all word-frequency data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i,corpus in enumerate(corpora,start=1)&lt;/code&gt; 

&lt;ul&gt;
&lt;li&gt;Loop over the corpora list.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enumerate&lt;/code&gt; gives both:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;i&lt;/code&gt; index of the corpus&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;corpus&lt;/code&gt; actual text string&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;words = [word.lower() for word in word_tokenize(corpus) 
         if word.lower() not in stop_words and word.isalpha()]&lt;/code&gt; 

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;word.lower()&lt;/code&gt; converts each token to lowercase.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for word in word_tokenize(corpus)&lt;/code&gt; iterates over the tokens produced by the &lt;code&gt;word_tokenize()&lt;/code&gt; function.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if word.lower() not in stop_words and word.isalpha()&lt;/code&gt; keeps a token only if its lowercase form is not a stopword and it is purely alphabetic.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
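
&lt;p&gt;The same filtering idea can be approximated with plain Python (a rough sketch using &lt;code&gt;str.split&lt;/code&gt; and a tiny hand-picked stop-word set, so it will not match NLTK's tokenizer exactly):&lt;/p&gt;

```python
# Illustrative subset only -- NLTK's real stopword list is much longer.
stop_words = {"is", "not", "the", "to", "a", "if", "you", "what", "do", "will", "be"}

corpus = "Success is not the key to happiness. Happiness is the key to success."
words = [w.strip(".,").lower() for w in corpus.split()
         if w.strip(".,").lower() not in stop_words and w.strip(".,").isalpha()]
print(words)
```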

&lt;h4&gt;
  
  
  Count Frequency
&lt;/h4&gt;

&lt;pre&gt;
word_freq = {}
for word in words:
    word_freq[word] = word_freq.get(word, 0) + 1
&lt;/pre&gt;

&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;word_freq={}&lt;/code&gt; creates an empty dictionary to store each word's frequency.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for word in words&lt;/code&gt; iterates over the &lt;code&gt;words&lt;/code&gt; list.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;word_freq[word]=word_freq.get(word, 0) + 1&lt;/code&gt; word_freq.get(word, 0) checks if the word exists in the dictionary. If yes, returns its current count. If no, returns 0 (default value). Then + 1 increments the count by 1.&lt;/li&gt;
&lt;/ol&gt;
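
&lt;p&gt;The standard library's &lt;code&gt;collections.Counter&lt;/code&gt; performs the same counting in one call:&lt;/p&gt;

```python
from collections import Counter

words = ["ai", "used", "ai", "world"]
word_freq = Counter(words)          # equivalent to the dict + .get() loop above
print(word_freq.most_common(1))     # [('ai', 2)]
```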

&lt;h4&gt;
  
  
  Convert to list of dicts for DataFrame
&lt;/h4&gt;

&lt;pre&gt;
for word, freq in word_freq.items():
    all_data.append({"Corpus": f"Corpus_{i}", "Word": word, "Frequency": freq})
&lt;/pre&gt;

&lt;h4&gt;
  
  
  Create Pandas DataFrame
&lt;/h4&gt;

&lt;pre&gt;
df = pd.DataFrame(all_data)
df = df.sort_values(by=["Corpus", "Frequency"], ascending=[True, False])
print(df)
&lt;/pre&gt;
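
&lt;p&gt;Since the goal of the post is the most frequent word, the finished DataFrame can be reduced to one row per corpus with &lt;code&gt;groupby&lt;/code&gt; and &lt;code&gt;idxmax&lt;/code&gt;; a sketch with dummy data standing in for &lt;code&gt;all_data&lt;/code&gt;:&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame([
    {"Corpus": "Corpus_1", "Word": "ai", "Frequency": 2},
    {"Corpus": "Corpus_1", "Word": "world", "Frequency": 1},
    {"Corpus": "Corpus_2", "Word": "match", "Frequency": 4},
])
# idxmax() returns the index of the highest-frequency row within each corpus
top = df.loc[df.groupby("Corpus")["Frequency"].idxmax()]
print(top)
```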

&lt;h2&gt;
  
  
  Whole Code
&lt;/h2&gt;

&lt;pre&gt;
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import pandas as pd
import nltk

# Download NLTK resources (run once)
# nltk.download('punkt')
# nltk.download('stopwords')

# Define corpora
corpora = [
    'Artificial Intelligence is transforming the world. AI is used in healthcare, finance, and education. Machine Learning, a branch of AI, powers recommendation systems and predictive analytics.',
    'Success is not the key to happiness. Happiness is the key to success. If you love what you do, you will be successful.',
    'The product is great. The quality is great and the price is reasonable. I will recommend this product to my friends because the product is worth the price.',
    'The football match was intense. The players gave their best. The match ended with a thrilling victory. Fans celebrated the match with great excitement.'
]

stop_words = set(stopwords.words('english'))

all_data = []  # to store all word-frequency data

# Process each corpus
for i, corpus in enumerate(corpora, start=1):
    words = [word.lower() for word in word_tokenize(corpus) 
             if word.lower() not in stop_words and word.isalpha()]
    
    # Count frequency
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    
    # Convert to list of dicts for DataFrame
    for word, freq in word_freq.items():
        all_data.append({"Corpus": f"Corpus_{i}", "Word": word, "Frequency": freq})

# Create Pandas DataFrame
df = pd.DataFrame(all_data)

# Sort by frequency for better readability
df = df.sort_values(by=["Corpus", "Frequency"], ascending=[True, False])

print(df)

&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
      Corpus          Word  Frequency
4   Corpus_1            ai          2
0   Corpus_1    artificial          1
1   Corpus_1  intelligence          1
2   Corpus_1  transforming          1
3   Corpus_1         world          1
5   Corpus_1          used          1
6   Corpus_1    healthcare          1
7   Corpus_1       finance          1
8   Corpus_1     education          1
9   Corpus_1       machine          1
&lt;/pre&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/basic-natural-language-processing-2gp7"&gt;Download nltk important library&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/stop-words-removal-1gp9"&gt;Install Stopwords module&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Information Extraction in NLP
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/information-extraction-in-nlp-techniques-tools-real-world-examples-j75"&gt;Information Extraction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef"&gt;Part of Speech (POS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Name Entity Recognation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l"&gt;Word Cloud in NLP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
    <item>
      <title>Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 18 Jul 2025 13:26:56 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7</link>
      <guid>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7</guid>
      <description>&lt;p&gt;If you're a visual learner, pattern problems in Python are the perfect playground.&lt;br&gt;
From simple triangles to pyramids and diamonds — every pattern teaches you how loops and logic work together. Ready to visualize code like never before?&lt;/p&gt;

&lt;h1&gt;
  
  
  Inverted Pyramid Using Nested Loop in Python.
&lt;/h1&gt;

&lt;pre&gt;
* * * * * * * * * 
  * * * * * * * 
    * * * * * 
      * * * 
        * 
&lt;/pre&gt;

&lt;blockquote&gt;
&lt;p&gt;Before diving in, check out the previous patterns:&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Reverse Right-Angled Triangle Pattern&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Pyramid Using Nested Loop in Python&lt;/a&gt;&lt;br&gt;
Once you understand both the pyramid and the reverse right-angled triangle, the logic behind the inverted pyramid becomes intuitive.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
row=5
for i in range(row,0,-1):
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;range(row, 0, -1)&lt;/code&gt; counts the rows down from 5 to 1, which inverts the pyramid.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;row - i&lt;/code&gt; controls the leading spaces to shift the stars rightward.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;2*i - 1&lt;/code&gt; ensures that each row has the correct number of stars to form a centered triangle.&lt;/li&gt;
&lt;/ul&gt;
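
&lt;p&gt;Building each row as a string makes the inverted pyramid easy to reuse and test; a minimal sketch (note that &lt;code&gt;print(" ", end=" ")&lt;/code&gt; emits two characters per space unit):&lt;/p&gt;

```python
def inverted_pyramid(row=5):
    lines = []
    for i in range(row, 0, -1):
        # "  " * (row - i) leading spaces, then 2*i - 1 stars
        lines.append("  " * (row - i) + "* " * (2 * i - 1))
    return lines

print("\n".join(inverted_pyramid()))
```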

&lt;h1&gt;
  
  
  Diamond Pyramid Pattern.
&lt;/h1&gt;

&lt;pre&gt;
        * 
      * * * 
    * * * * * 
  * * * * * * * 
* * * * * * * * * 
  * * * * * * * 
    * * * * * 
      * * * 
        * 
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, you build the pyramid (top half).&lt;/li&gt;
&lt;li&gt;Then, you mirror it by adding the inverted pyramid (bottom half).&lt;/li&gt;
&lt;li&gt;Both parts share a common middle (the row with 9 stars).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):       # Upper Pyramid
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
for i in range(row-1,0,-1):    # Inverted Pyramid (start one row smaller)
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;range(1, row+1)&lt;/code&gt; builds the upper pyramid.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;range(row-1, 0, -1)&lt;/code&gt; builds the inverted lower half; starting at &lt;code&gt;row-1&lt;/code&gt; keeps the shared middle row (9 stars) from printing twice.&lt;/li&gt;
&lt;li&gt;Since both pyramids use the same logic (2*i - 1 stars and shifting spaces), the transition is seamless.&lt;/li&gt;
&lt;/ul&gt;
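
&lt;p&gt;The two loops can also be merged into one by noting that the star count rises to the middle row and then falls symmetrically; a sketch using &lt;code&gt;abs()&lt;/code&gt;:&lt;/p&gt;

```python
row = 5
lines = []
for i in range(1, 2 * row):
    k = row - abs(row - i)                # 1, 2, ..., row, ..., 2, 1
    lines.append("  " * (row - k) + "* " * (2 * k - 1))
print("\n".join(lines))
```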

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Explained the Inverted Right Angle Triangle Pattern&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Explained the Pyramid Pattern&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Thu, 17 Jul 2025 15:15:08 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51</link>
      <guid>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51</guid>
      <description>&lt;p&gt;Ever looked at a pattern problem and thought, "Why can't I get this simple star pyramid to align properly?"&lt;br&gt;
You're not alone. Pattern problems might look easy, but they’re the secret weapon for building powerful logic using loops — and today, we’re going to decode them step by step.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern: Mirrored Right Angle Triangle Using Nested Loop in Python.
&lt;/h1&gt;

&lt;p&gt;We’re going to build this simple star pattern and understand the logic behind nested loops:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logic Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're using two for loops:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The outer loop controls the rows&lt;/li&gt;
&lt;li&gt;The inner loop controls the columns (stars)&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
        * 
      * * 
    * * * 
  * * * * 
* * * * * 
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Logic Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To form this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need spaces before the stars to right-align them.&lt;/li&gt;
&lt;li&gt;The number of stars increases with each row.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;1&lt;/th&gt;
&lt;th&gt;2&lt;/th&gt;
&lt;th&gt;3&lt;/th&gt;
&lt;th&gt;4&lt;/th&gt;
&lt;th&gt;5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;(1,1)&lt;/td&gt;
&lt;td&gt;(1,2)&lt;/td&gt;
&lt;td&gt;(1,3)&lt;/td&gt;
&lt;td&gt;(1,4)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;(2,1)&lt;/td&gt;
&lt;td&gt;(2,2)&lt;/td&gt;
&lt;td&gt;(2,3)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;(3,1)&lt;/td&gt;
&lt;td&gt;(3,2)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;(4,1)&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;As the row number increases, the number of leading spaces decreases: (1,4), (2,3), (3,2), (4,1).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the spaces decrease while the stars increase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(1,row-i+1):
        print(" ",end=" ")
    for k in range(1,i+1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;row=5&lt;/code&gt; this is for how many rows you want.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for i in range(1,row+1):&lt;/code&gt; This is the outer loop, which runs 5 times (from 1 to 5). Each loop iteration represents one row of the output.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for j in range(1,row-i+1):&lt;/code&gt; This inner loop is responsible for printing spaces before the stars. Why?
To make the pattern right-aligned, each row needs less space than the previous one.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;for k in range(1,i+1):&lt;/code&gt; This loop prints the stars * in each row.&lt;/li&gt;
&lt;/ol&gt;
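
&lt;p&gt;Because &lt;code&gt;print(" ", end=" ")&lt;/code&gt; emits two characters per space unit, every line here is exactly &lt;code&gt;2*row&lt;/code&gt; characters wide, so the same pattern can be produced with &lt;code&gt;str.rjust&lt;/code&gt;; a minimal sketch:&lt;/p&gt;

```python
row = 5
# "* " * i is 2*i characters; right-justify it to the full width of 2*row.
lines = [("* " * i).rjust(2 * row) for i in range(1, row + 1)]
print("\n".join(lines))
```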

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;i&lt;/th&gt;
&lt;th&gt;j&lt;/th&gt;
&lt;th&gt;k&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Row(i)&lt;/td&gt;
&lt;td&gt;Spaces (row-i)&lt;/td&gt;
&lt;td&gt;Stars (i)&lt;/td&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;* *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;* * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;* * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;* * * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt;: controls the number of rows, increasing each time&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt;: controls the leading spaces.&lt;br&gt;
&lt;code&gt;Inner loop (k)&lt;/code&gt;: prints the star. &lt;br&gt;
&lt;code&gt;end=" "&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern: Pyramid Using Nested Loop in Python
&lt;/h1&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
        * 
      * * * 
    * * * * * 
  * * * * * * * 
* * * * * * * * * 
&lt;/pre&gt;

&lt;p&gt;Step by Step Explanation&lt;/p&gt;

&lt;pre&gt;
row=5
for i in range(1,row+1):
    for j in range(row-i):
        print(" ",end=" ")
    for k in range(2*i-1):
        print("*",end=" ")
    print()
&lt;/pre&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;1&lt;/th&gt;
&lt;th&gt;2&lt;/th&gt;
&lt;th&gt;3&lt;/th&gt;
&lt;th&gt;4&lt;/th&gt;
&lt;th&gt;5&lt;/th&gt;
&lt;th&gt;6&lt;/th&gt;
&lt;th&gt;7&lt;/th&gt;
&lt;th&gt;8&lt;/th&gt;
&lt;th&gt;9&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;i&lt;/code&gt; runs over &lt;code&gt;range(1, row+1)&lt;/code&gt;: the number of rows you want.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;j&lt;/code&gt; runs over &lt;code&gt;range(row-i)&lt;/code&gt;: the leading spaces, which decrease each row.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;k&lt;/code&gt; runs over &lt;code&gt;range(2*i-1)&lt;/code&gt;: the stars printed in row &lt;code&gt;i&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;i&lt;/th&gt;
&lt;th&gt;j&lt;/th&gt;
&lt;th&gt;k&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt;: controls the number of rows, increasing each time&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt;: controls the leading spaces, which decrease each row.&lt;br&gt;
&lt;code&gt;Inner loop (k)&lt;/code&gt;: prints the stars.&lt;br&gt;
&lt;code&gt;end=" "&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How to make inverted pyramid?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-3-31n7"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-1</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Mon, 14 Jul 2025 19:15:58 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh</link>
      <guid>https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-1-12bh</guid>
      <description>&lt;p&gt;Pattern problems are the gym for your brain. They strengthen your looping logic, thinking in steps, and algorithmic intuition — all while keeping it fun.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pattern: Right-Angled Triangle Using Nested Loops in Python
&lt;/h1&gt;

&lt;p&gt;We’re going to build this simple star pattern and understand the logic behind nested loops:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logic Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We're using two for loops:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The outer loop controls the rows&lt;/li&gt;
&lt;li&gt;The inner loop controls the columns (stars)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
* 
* * 
* * * 
* * * * 
* * * * *
&lt;/pre&gt; 

&lt;p&gt;Step-by-Step Explanation&lt;/p&gt;

&lt;pre&gt;
row = 5
for i in range(1, row + 1):       # Outer loop: for each row (1 to 5)
    for j in range(i):            # Inner loop: print i stars
        print("*", end=" ")
    print()                       # Move to the next line after each row
&lt;/pre&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Why &lt;code&gt;range(1, row + 1)&lt;/code&gt;?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;range(n)&lt;/code&gt; goes from &lt;code&gt;0 to n-1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;So &lt;code&gt;range(1, row + 1)&lt;/code&gt; gives: &lt;code&gt;1, 2, 3, 4, 5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;That means we’ll have 5 rows, as required.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Why &lt;code&gt;range(i)&lt;/code&gt; for inner loop?&lt;br&gt;
In each row, the number of stars is equal to the row number:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Row 1 → 1 star&lt;br&gt;
Row 2 → 2 stars&lt;br&gt;
...&lt;br&gt;
Row 5 → 5 stars&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(i))&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0, 1&lt;/td&gt;
&lt;td&gt;* *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0, 1, 2&lt;/td&gt;
&lt;td&gt;* * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3&lt;/td&gt;
&lt;td&gt;* * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0, 1, 2, 3, 4&lt;/td&gt;
&lt;td&gt;* * * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice how the number of * matches the current row number &lt;code&gt;i&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt; = rows&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt; = columns (stars)&lt;br&gt;
&lt;code&gt;print("*", end=" ")&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;
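&lt;p&gt;The same logic can be wrapped in a small reusable function. This is only a minimal sketch; the name &lt;code&gt;right_triangle&lt;/code&gt; is our own:&lt;/p&gt;

```python
def right_triangle(rows):
    # Row i gets i stars (each followed by a space); rows are joined by newlines.
    return "\n".join("* " * i for i in range(1, rows + 1))

print(right_triangle(5))
```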

&lt;h1&gt;
  
  
  Reverse Right-Angled Triangle Pattern in Python
&lt;/h1&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;
* * * * * 
* * * * 
* * * 
* * 
* 
&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Step-by-Step Explanation&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;
row = 5
for i in range(row, 0, -1):       # Outer loop: from 5 to 1
    for j in range(i):            # Inner loop: print i stars
        print("*", end=" ")
    print()                       # Move to the next line after each row
&lt;/pre&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;row = 5&lt;/code&gt;&lt;br&gt;
We want 5 rows, so we initialize the &lt;code&gt;row&lt;/code&gt; variable to 5. You can increase this number for a larger pattern.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;for i in range(row, 0, -1)&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This loop goes in reverse from row to 1.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;range(5, 0, -1)&lt;/code&gt; outputs: &lt;code&gt;5, 4, 3, 2, 1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; The third parameter &lt;code&gt;-1&lt;/code&gt; is the step; without it, &lt;code&gt;range(5, 0)&lt;/code&gt; is empty and the loop would not run at all.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;for j in range(i)&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This inner loop controls how many stars are printed in each row: the current value of &lt;code&gt;i&lt;/code&gt;. So:&lt;br&gt;
Row 1 → 5 stars&lt;br&gt;
Row 2 → 4 stars&lt;br&gt;
...&lt;br&gt;
Row 5 → 1 star&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Dry Run Table&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Row (i)&lt;/th&gt;
&lt;th&gt;Inner Loop (j in range(i))&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0 1 2 3 4&lt;/td&gt;
&lt;td&gt;* * * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0 1 2 3&lt;/td&gt;
&lt;td&gt;* * * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;0 1 2&lt;/td&gt;
&lt;td&gt;* * *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0 1&lt;/td&gt;
&lt;td&gt;* *&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Outer loop (i)&lt;/code&gt;: controls the number of rows, decreasing each time&lt;br&gt;
&lt;code&gt;Inner loop (j)&lt;/code&gt;: prints stars equal to the current row number&lt;br&gt;
&lt;code&gt;end=" "&lt;/code&gt; prints stars on the same line&lt;br&gt;
&lt;code&gt;print()&lt;/code&gt; moves to the next line after each row&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if we made a mirror image of the right-angled triangle using nested loops in Python?&lt;/p&gt;
&lt;/blockquote&gt;
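&lt;p&gt;Before opening the follow-up post, here is one possible sketch: pad each row with spaces on the left so the stars align to the right. The helper name &lt;code&gt;mirrored_triangle&lt;/code&gt; is ours:&lt;/p&gt;

```python
def mirrored_triangle(rows):
    # Row i gets (rows - i) double-space pads, then i stars.
    return "\n".join("  " * (rows - i) + "* " * i for i in range(1, rows + 1))

print(mirrored_triangle(5))
```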

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/build-your-logic-from-scratch-python-pattern-problems-explained-star-pattern-2-4p51"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-2&lt;/a&gt;&lt;br&gt;
&lt;a href="//Build%20Your%20Logic%20from%20Scratch:%20Python%20Pattern%20Problems%20Explained.%20Star%20Pattern-1"&gt;Build Your Logic from Scratch: Python Pattern Problems Explained. Star Pattern-3&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>programming</category>
      <category>coding</category>
    </item>
    <item>
      <title>Information Extraction in NLP: Techniques, Tools &amp; Real-World Examples</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Sat, 12 Jul 2025 12:38:27 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/information-extraction-in-nlp-techniques-tools-real-world-examples-j75</link>
      <guid>https://dev.to/datatoinfinity/information-extraction-in-nlp-techniques-tools-real-world-examples-j75</guid>
      <description>&lt;p&gt;Ever wondered how search engines pull facts from millions of documents or how chatbots recognize names, dates, and numbers in your messages? That’s the magic of information extraction in NLP: the process of transforming unstructured text into structured, actionable data, a core part of modern AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task
&lt;/h3&gt;

&lt;p&gt;Let's try a quick task: ask ChatGPT about yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Information Extraction?
&lt;/h2&gt;

&lt;p&gt;As the name suggests, it extracts information from unstructured or semi-structured text. Many techniques are used to identify entities, entity names, actions, and events. The result is a standardized format that can be stored in rows and columns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Information Extraction Techniques
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Named Entity Recognition: an information extraction task that identifies names, organisations, and dates in unstructured text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relation Extraction: identifies the relationships between entities in the text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Event Extraction: recognises actions or events that need to happen, such as an appointment or a meeting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sentiment Analysis: as the name suggests, it identifies the feeling behind a sentence. Feelings are abstract; we feel them rather than see them, but the model infers them from the words you have written.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  "My order of a Samsung Galaxy S23 from your Seattle warehouse hasn’t arrived yet, and it was supposed to be delivered by July 10, 2025."
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Example Text Extraction Process:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Input Text: The customer’s message.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Named Entity Recognition (NER): The NLP system identifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product: Samsung Galaxy S23&lt;/li&gt;
&lt;li&gt;Location: Seattle&lt;/li&gt;
&lt;li&gt;Date: July 10, 2025&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keyword Extraction: Identifies key terms like “order,” “delivered,” and “warehouse” to understand the context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Relation Extraction: Detects the relationship between “order” and “hasn’t arrived” to flag a delivery issue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output: Structured data:&lt;br&gt;
{Product: "Samsung Galaxy S23", Location: "Seattle", Delivery Date: "July 10, 2025", Issue: "Non-delivered"}&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Application: The chatbot uses this data to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query the order database for the specific product and delivery status.&lt;/li&gt;
&lt;li&gt;Respond with: “I’m sorry, it seems your Samsung Galaxy S23 order from our Seattle warehouse is delayed. Let me check the status and provide an update.”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sentiment Analysis: the sentiment here is negative.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
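&lt;p&gt;The pipeline above can be sketched in plain Python. The toy entity lists and the date regex below stand in for a trained NER model and are purely illustrative:&lt;/p&gt;

```python
import re

message = (
    "My order of a Samsung Galaxy S23 from your Seattle warehouse hasn't "
    "arrived yet, and it was supposed to be delivered by July 10, 2025."
)

# Tiny hand-made entity lists; a real system would use a trained NER model.
products = ["Samsung Galaxy S23"]
locations = ["Seattle"]
date_match = re.search(r"[A-Z][a-z]+ \d{1,2}, \d{4}", message)

# Assemble the structured record shown in step 5.
record = {
    "Product": next((p for p in products if p in message), None),
    "Location": next((l for l in locations if l in message), None),
    "Delivery Date": date_match.group() if date_match else None,
    "Issue": "Non-delivered" if "hasn't arrived" in message else None,
}
print(record)
```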

&lt;p&gt;Real-World Impact: This extraction enables the chatbot to quickly understand and address the customer’s issue, improving response time and user satisfaction. It’s used in customer support, logistics tracking, and automated ticketing systems.&lt;/p&gt;

&lt;p&gt;Learn the basics of text extraction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef"&gt;Learn Part Of Speech (POS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Learn Name Entity Recognition (NER)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l"&gt;Learn Word Cloud In NLP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
    <item>
      <title>Word Cloud in NLP: A Complete Guide to Visualizing Text with Python</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Fri, 11 Jul 2025 19:18:28 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l</link>
      <guid>https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l</guid>
      <description>&lt;p&gt;Ever stared at a mountain of text and thought, “Where do I even begin?” Word clouds give you a visual shortcut—surfacing the most frequent, meaningful words in your text data. In this guide, we’ll show how to build beautiful word clouds from scratch using Python, and how they can help uncover patterns in your NLP projects you might otherwise miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Word Cloud?
&lt;/h2&gt;

&lt;p&gt;A word cloud is a visual representation of text data where the size of each word indicates its frequency or importance within a given text or corpus. The more frequently a word appears, the larger and often bolder it is displayed in the cloud.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;In customer reviews, big words like "price", "quality", or "service" indicate common discussion points.&lt;/p&gt;
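&lt;p&gt;Those sizes come straight from word frequencies. A minimal sketch using only the standard library (the sample reviews are made up):&lt;/p&gt;

```python
from collections import Counter

reviews = [
    "great price and quality",
    "price was fair, service slow",
    "quality product, good price",
]

# A word cloud scales each word by counts like these.
counts = Counter(word.strip(",.") for review in reviews for word in review.split())
print(counts.most_common(3))
```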

&lt;h2&gt;
  
  
  Note: Word clouds are not analytical models; they are visual aids that complement, not replace, deeper NLP tasks like classification, sentiment analysis, or topic modelling.
&lt;/h2&gt;

&lt;p&gt;Install the Python library for word clouds:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install wordcloud&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Basic&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text="India, officially the Republic of India Hindi: Bhārat Gaṇarājya, is a country in South Asia. It is the seventh-largest country by area, the second-most populous country, and the most populous democracy in the world. Bounded by the Indian Ocean on the south, the Arabian Sea on the southwest, and the Bay of Bengal on the southeast, it shares land borders with Pakistan to the west; China, Nepal, and Bhutan to the north; and Bangladesh and Myanmar to the east. In the Indian Ocean, India is in the vicinity of Sri Lanka and the Maldives; its Andaman and Nicobar Islands share a maritime border with Thailand, Myanmar, and Indonesia."

wc=WordCloud().generate(text)
plt.imshow(wc)
plt.axis('off')
plt.show()
&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i5tngpf8ytesn5kkz45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1i5tngpf8ytesn5kkz45.png" alt="WordCloud" width="764" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;WordCloud().generate(text)&lt;/code&gt; builds the word cloud from the text passed to it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plt.imshow(wc)&lt;/code&gt; displays the word cloud as a 2D image; &lt;code&gt;plt&lt;/code&gt; is the pyplot module from matplotlib.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plt.axis('off')&lt;/code&gt; hides the x-axis and y-axis.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plt.show()&lt;/code&gt; displays all currently active figures.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Word Cloud without Stop Words&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
import nltk
nltk.download('stopwords')        # fetch the stop-word lists once
from nltk.corpus import stopwords

stopword = stopwords.words('english')
wc = WordCloud(width=1000, height=720, margin=2, max_words=100,
               background_color='white', stopwords=stopword)
plt.imshow(wc.generate(text))
plt.axis('off')
plt.show()
&lt;/pre&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltoouhs7pzxud27q6w52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltoouhs7pzxud27q6w52.png" alt="WordCloud" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;from nltk.corpus import stopwords&lt;/code&gt; imports NLTK's stop-word lists (run &lt;code&gt;nltk.download('stopwords')&lt;/code&gt; once beforehand).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stopword=stopwords.words('english')&lt;/code&gt; loads the list of English stop words.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;wc=WordCloud(width=1000,height=720,margin=2,max_words=100,background_color='white',stopwords=stopword)&lt;/code&gt; configures the &lt;code&gt;WordCloud&lt;/code&gt; object:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;width=1000&lt;/code&gt; the width of the generated image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;height=720&lt;/code&gt; the height of the generated image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;margin=2&lt;/code&gt; the margin around the word cloud inside the frame.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max_words=100&lt;/code&gt; keep at most 100 words from the text or corpus.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;stopwords=stopword&lt;/code&gt; removes stop words from the cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef"&gt;Learn about Part of Speech (POS)&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Learn about Name Entity Recognation&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
    <item>
      <title>What is POS tagging in NLP with Python using Spacy</title>
      <dc:creator>datatoinfinity</dc:creator>
      <pubDate>Thu, 10 Jul 2025 19:16:14 +0000</pubDate>
      <link>https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef</link>
      <guid>https://dev.to/datatoinfinity/what-is-pos-tagging-in-nlp-real-world-example-and-use-cases-with-python-using-spacy-18ef</guid>
      <description>&lt;p&gt;How does an AI know that ‘run’ is a verb and ‘quick’ is an adjective? That’s the magic of Part Of Speech Tagging – teaching machines grammar!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"A woman without her man is nothing."&lt;/code&gt;&lt;br&gt;
&lt;code&gt;"A woman, without her, man is nothing."&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One comma changes the whole meaning of the sentence. The same goes for an AI model: if it doesn't understand such basics, it can misinterpret a sentence or text. Part-of-Speech tagging, an NLP task, helps models interpret text accurately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Part Of Speech?
&lt;/h2&gt;

&lt;p&gt;It is an NLP task where each word in a text is assigned a grammatical tag (like noun, verb, adjective, etc.). This process helps computers understand the syntactic structure of a sentence and the role of each word, which is crucial for various NLP tasks.&lt;/p&gt;

&lt;p&gt;Many words can have multiple meanings depending on their context. For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;"Book a fight"&lt;/code&gt;&lt;br&gt;
    * Book -&amp;gt; verb (an action)&lt;br&gt;
&lt;code&gt;"Read the book"&lt;/code&gt;&lt;br&gt;
    * Book -&amp;gt; Noun (an object)&lt;/p&gt;

&lt;p&gt;Without POS tagging, an NLP system might treat both occurrences of &lt;code&gt;"book"&lt;/code&gt; the same and get confused. POS tagging helps resolve these ambiguities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Import necessary library and Initialise the text&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
import spacy
nlp = spacy.load('en_core_web_sm')

text = u"Steve Jobs was a founder of Apple; he created the company on April 1, 1976. The company headquarters is now located in Cupertino, California, United States."
d = nlp(text)
&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Part of Speech&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
print(d[0].text,d[0].pos_,d[0].tag_)
&lt;/pre&gt;

&lt;p&gt;Output&lt;/p&gt;

&lt;pre&gt;
Steve PROPN NNP
&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;d[0].text&lt;/code&gt; is the first word of the text, &lt;code&gt;Steve&lt;/code&gt;. &lt;code&gt;d[0].pos_&lt;/code&gt; gives the coarse grammatical category, &lt;code&gt;PROPN&lt;/code&gt; (proper noun). &lt;code&gt;d[0].tag_&lt;/code&gt; gives the fine-grained tag, &lt;code&gt;NNP&lt;/code&gt; (proper noun, singular).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Print for every word&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;
text=u"I like to play cricket"
d=nlp(text)
for token in d:
     print(f"{token.text:{15}}{token.pos_:{15}}{token.tag_:{15}}{spacy.explain(token.tag_)}")
&lt;/pre&gt;
 

&lt;p&gt;Output &lt;/p&gt;

&lt;pre&gt;
I              PRON           PRP            pronoun, personal
like           VERB           VBP            verb, non-3rd person singular present
to             PART           TO             infinitival "to"
play           VERB           VB             verb, base form
cricket        NOUN           NN             noun, singular or mass
&lt;/pre&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;token&lt;/code&gt; iterates through the tokens of the text.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;token.text:{15}&lt;/code&gt; prints the word (&lt;code&gt;token.text&lt;/code&gt;) padded to a width of 15 characters.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;token.pos_:{15}&lt;/code&gt; prints the coarse grammatical category, also padded to 15 characters.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;token.tag_&lt;/code&gt; gives the fine-grained grammatical tag.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;spacy.explain(token.tag_)&lt;/code&gt; returns a human-readable description of the tag.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You might be wondering where the sentence gets tokenized. The &lt;code&gt;nlp()&lt;/code&gt; call tokenizes the sentence into words (tokens).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/datatoinfinity/what-is-ner-in-nlp-real-world-examples-and-use-cases-using-python-and-spacy-7ik"&gt;Want to Learn Name Entity Recognition&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/datatoinfinity/word-cloud-in-nlp-a-complete-guide-to-visualizing-text-with-python-1m7l"&gt;Learn About Word Cloud in NLP&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>machinelearning</category>
      <category>devto</category>
      <category>python</category>
    </item>
  </channel>
</rss>
