In my previous blog, I covered:
From LLMs to Agents: Build Smart AI Systems with Tools in LangChain
We learned how to:
- build custom tools
- create AI agents
- fetch real-world data
What's Next?
Now let's take it further.
Instead of just querying tools, we will make the AI work with real data sources.
In this blog, we will learn:
- Load and analyze text files
- Process CSV data
- Fetch and analyze web URLs (web scraping)
- Optimize using semantic search (vector DB)
1. Load a Text File Using TextLoader
We can directly load a *.txt file into LangChain:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("tata_motors.txt")
docs = loader.load()
docs
Output
This converts your text file into a list of structured Document objects.

Add a query to fetch results from the .txt file
from langchain_community.document_loaders import TextLoader
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Initialize the chat model (model name is an example; requires OPENAI_API_KEY)
llm = ChatOpenAI(model="gpt-4o-mini")

loader = TextLoader("tata_motors.txt", encoding="utf-8")
docs = loader.load()
# Combine all texts into one single string
context = "\n\n".join(doc.page_content for doc in docs)
# Ask Questions
query = """
How much worth Tata Motors has provided on behalf of its Singapore holding company?
"""
prompt = ChatPromptTemplate.from_template("""
You are a stock research assistant.
Use only the context below.
Do not invent missing values.
User query:
{query}
Context:
{context}
""")
chain = prompt | llm
response = chain.invoke({
"query": query,
"context": context
})
print(response.content)
Output
2. Load CSV Data Using CSVLoader
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("cars.csv")
data = loader.load()
data
Output
cars.csv file contents
You can also use Pandas for better control:
pip install -U pandas
This allows the LLM to behave like a data analyst on your CSV.
import pandas as pd
from langchain_openai import ChatOpenAI

# Initialize the chat model (requires OPENAI_API_KEY)
llm = ChatOpenAI(model="gpt-4o-mini")
df = pd.read_csv("cars.csv")
question = "List the cars within 10 Lakhs budget?"
csv_text = df.to_string(index=False)
prompt = f"""
You are answering questions from this CSV data.
CSV data:
{csv_text}
Question:
{question}
Answer clearly using only the CSV data.
"""
response = llm.invoke(prompt)
print(response.content)
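Dumping the entire CSV into the prompt works for small files, but for larger ones you can pre-filter with pandas and send only the matching rows to the LLM. A minimal sketch, using toy data and assumed column names (`model`, `price_lakhs`) in place of the real cars.csv:

```python
import pandas as pd

# Toy rows standing in for cars.csv (column names are assumptions)
df = pd.DataFrame({
    "model": ["Alto", "Nexon", "Safari"],
    "price_lakhs": [4.5, 9.8, 16.2],
})

# Filter locally first, then send only the matching rows to the LLM
budget = df[df["price_lakhs"] <= 10]
csv_text = budget.to_string(index=False)
print(csv_text)
```

This keeps the prompt small, which lowers both cost and latency as the CSV grows.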
Output
3. Load URLs & Perform Web Scraping
Now comes the powerful part.
pip install -U unstructured
The LLM will read the web content and generate a structured analysis.
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_openai import ChatOpenAI

# Initialize the chat model (requires OPENAI_API_KEY)
llm = ChatOpenAI(model="gpt-4o-mini")
urls = [
"https://www.tickertape.in/stocks/tata-motors-TMC",
"https://groww.in/stocks/tata-motors-ltd",
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()

# Combine the scraped pages into one context string for the prompt
context = "\n\n".join(doc.page_content for doc in documents)

query = """
Analyze valuation, profitability, entry point, red flags,
and overall whether Tata Motors stock looks attractive.
"""
prompt = f"""
You are a stock research assistant.
Use only the context below. Do not invent missing values.

Context:
{context}

User query:
{query}
Return the answer in this exact format:
# Tata Motors Stock Analysis
## 1. Quick View
- Overall view:
- Reason:
## 2. Key Metrics Found
| Metric | Value | Interpretation |
|---|---:|---|
| Market Cap | | |
| PE Ratio | | |
| PB Ratio | | |
| Dividend Yield | | |
| Risk / Volatility | | |
| Red Flags | | |
## 3. Valuation
## 4. Profitability / Quality
## 5. Entry Point
## 6. Red Flags / Risks
## 7. Final Tentative View
"""
response = llm.invoke(prompt)
print(response.content)
Output
Problem: Slow Performance
If you load many URLs:
- Processing becomes slow
- The context becomes too large
- Cost increases

Solution: Semantic Search (Vector DB)

Instead of passing all the data, we:
- Split content into chunks
- Convert the chunks into embeddings
- Store them in a vector DB
- Retrieve only the relevant chunks
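To see why retrieval helps, here is the idea in miniature: a toy bag-of-words cosine similarity standing in for real learned embeddings, ranking hypothetical chunks against a query and keeping only the best match:

```python
import math
from collections import Counter

def vectorize(text):
    # Toy "embedding": bag-of-words counts (real systems use learned embeddings)
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical chunks, as a text splitter might produce
chunks = [
    "Tata Motors PE ratio and valuation metrics",
    "Recipe for masala chai with ginger",
    "Tata Motors dividend yield and market cap",
]
query = "valuation PE ratio"

# Rank chunks by similarity to the query and keep only the best one (k=1)
best = max(chunks, key=lambda c: cosine(vectorize(query), vectorize(c)))
print(best)
```

Only the most relevant chunk reaches the LLM, which is exactly what the vector DB below does at scale with proper embeddings.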
4. Optimize Using Semantic Search (Vector DB)
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_chroma import Chroma

# Initialize the chat model (requires OPENAI_API_KEY)
llm = ChatOpenAI(model="gpt-4o-mini")
urls = [
"https://www.tickertape.in/stocks/tata-motors-TMC",
"https://groww.in/stocks/tata-motors-ltd",
]
loader = UnstructuredURLLoader(urls=urls)
documents = loader.load()
# Step 1: Split into Chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)
# Step 2: Create Embeddings + Store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_db = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_url_db"
)
# Step 3: Retrieve Relevant Data
query = """
Analyze valuation, profitability, entry point, red flags,
and overall whether Tata Motors stock looks attractive.
"""
retriever = vector_db.as_retriever(search_kwargs={"k": 4})
retrieved_docs = retriever.invoke(query)
context = "\n\n".join(
    doc.page_content for doc in retrieved_docs
)
prompt = ChatPromptTemplate.from_template("""
You are a stock research assistant.
Use only the context below. Do not invent missing values.
User query:
{query}
Context:
{context}
Return the answer in this exact format:
# Tata Motors Stock Analysis
## 1. Quick View
- Overall view:
- Reason:
## 2. Key Metrics Found
| Metric | Value | Interpretation |
|---|---:|---|
| Market Cap | | |
| PE Ratio | | |
| PB Ratio | | |
| Dividend Yield | | |
| Risk / Volatility | | |
| Red Flags | | |
## 3. Valuation
## 4. Profitability / Quality
## 5. Entry Point
## 6. Red Flags / Risks
## 7. Final Tentative View
""")
# Step 4: Final Analysis
chain = prompt | llm
response = chain.invoke({
"query": query,
"context": context,
})
print(response.content)
Output
What You Learned
In this blog, we moved from:
AI Agents → AI + Data Intelligence
You learned how to:
- Load text and CSV data
- Scrape and analyze web content
- Handle large data efficiently
- Use vector databases for semantic search





