Learn how to build a production-ready sentiment analysis application using LangChain's Expression Language (LCEL) and OpenAI GPT models
🔗 Repository: github.com/JaimeLucena/google-reviews-sentiment
🎯 Introduction
Have you ever wondered how businesses automatically understand customer feedback? Or how AI systems can analyze thousands of reviews and extract meaningful insights?
In this tutorial, we'll build a complete sentiment analysis application that:
- Fetches real Google Business reviews
- Analyzes sentiment using OpenAI's GPT models
- Extracts key aspects (service, food, price, etc.)
- Provides actionable insights
This is a perfect project for learning LangChain LCEL (LangChain Expression Language), one of the most powerful patterns for building LLM applications. By the end, you'll understand how to compose chains, handle structured outputs, and build real-world AI applications.
🤔 What is Sentiment Analysis with LLMs?
Traditional sentiment analysis uses rule-based systems or machine learning models trained on specific datasets. Modern LLM-based approaches are more flexible because:
- No training required: LLMs already understand language nuances
- Multi-language support: Works across languages without retraining
- Context understanding: Can understand sarcasm, context, and subtle meanings
- Structured extraction: Can extract specific aspects (service, food, price) automatically
Instead of just saying "positive" or "negative", we can get:
- A sentiment score from -1 to +1
- Specific aspects mentioned (service, food quality, price)
- A rationale explaining why
- Language detection
🧠 What is LangChain LCEL?
LCEL (LangChain Expression Language) is a declarative way to compose chains using Python's | operator. Think of it like Unix pipes, but for AI workflows.
Why LCEL?
Before LCEL (traditional approach):
```python
def analyze_sentiment(text):
    # Step 1: Preprocess
    cleaned = clean_text(text)
    # Step 2: Build prompt
    prompt = build_prompt(cleaned)
    # Step 3: Call LLM
    response = llm.invoke(prompt)
    # Step 4: Parse response
    result = parse_response(response)
    return result
```
With LCEL (composable chains):
```python
chain = preprocess | build_prompt | llm | parse_response
result = await chain.ainvoke({"text": text})
```
Benefits:
- ✅ Composable: Mix and match components easily
- ✅ Async by default: Built-in support for async/await
- ✅ Batch processing: Process multiple items efficiently
- ✅ Type-safe: Works seamlessly with Pydantic models
- ✅ Streaming: Built-in support for streaming responses
🏗️ Project Architecture
Let's understand how our application works:
```
User Query: "Analyze reviews for Joe's Pizza"
                   │
                   ▼
┌─────────────────────────────────────┐
│          Google Places API          │
│   - Find place by query             │
│   - Fetch business reviews          │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│       LangChain LCEL Pipeline       │
│  ┌───────────────────────────────┐  │
│  │ 1. Preprocess text            │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 2. Build prompt with          │  │
│  │    format instructions        │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 3. Call OpenAI GPT            │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 4. Parse structured output    │  │
│  │    (Pydantic validation)      │  │
│  └──────────────┬────────────────┘  │
└─────────────────┼───────────────────┘
                  │
                  ▼
          Sentiment Results
```
💻 Code Walkthrough
Let's dive into the key components of our application.
1. Building the Sentiment Analysis Chain
The heart of our application is in app/chains.py. Here's how we build our LCEL chain:
````python
def build_sentiment_chain(model_name: str | None = None):
    """
    LCEL chain: (preprocess) -> prompt -> llm -> Pydantic parser
    """
    # Step 1: Create a Pydantic output parser
    parser = PydanticOutputParser(pydantic_object=ReviewSentiment)

    # Step 2: Define the system prompt
    system = (
        "You are a precise sentiment analyst. "
        "Return a single JSON object strictly matching the given schema.\n"
        "{format_instructions}\n"
        "Scoring: negative=-1..-0.05, neutral≈-0.05..0.05, positive=0.05..1.0.\n"
        "Keep rationale concise. Extract aspects if present; otherwise return an empty list."
    )

    # Step 3: Create the prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", system),
        ("user", "Analyze the sentiment of the following review.\n"
                 "Language hint: {language_hint}\n"
                 "Text: ```\n{text}\n```"),
    ])

    # Step 4: Initialize the LLM
    llm = build_llm(model_name)

    # Step 5: Preprocessing step
    pre = RunnableLambda(
        lambda x: {
            "text": normalize_text(x["text"]),
            "language_hint": x.get("language_hint"),
            "format_instructions": parser.get_format_instructions(),
        }
    )

    # Step 6: Compose the chain using the pipe operator
    chain = pre | prompt | llm | parser
    return chain
````
Key Concepts:
PydanticOutputParser: Automatically generates format instructions from your Pydantic model and parses the LLM response into a typed object.
ChatPromptTemplate: Creates structured prompts with system and user messages. The {format_instructions} placeholder gets filled with JSON schema instructions.
RunnableLambda: Wraps a Python function to make it part of the chain. Here, we use it for text preprocessing.
Pipe Operator (|): Composes the chain. Data flows from left to right through each component.
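Once composed, the chain is itself a Runnable, so you can call it directly. A quick illustrative invocation (the input keys match the preprocessing step above; the review text is made up):

```python
chain = build_sentiment_chain()

# Analyze a single review (inside an async function / running event loop)
sentiment = await chain.ainvoke({
    "text": "Great pizza, but the service was painfully slow.",
    "language_hint": "en",
})

print(sentiment.label, sentiment.score)       # actual values depend on the model
print([a.aspect for a in sentiment.aspects])  # e.g. ["food", "service"]
```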
Understanding Each Component in Detail
Now let's dive deeper into each component we just used:
RunnableLambda: Custom Processing
```python
pre = RunnableLambda(
    lambda x: {
        "text": normalize_text(x["text"]),
        "language_hint": x.get("language_hint"),
        "format_instructions": parser.get_format_instructions(),
    }
)
```
What it does:
- Wraps any Python function to make it part of the chain
- Transforms input data before it reaches the prompt
- Can be synchronous or asynchronous
Why we use it:
- Text normalization (clean whitespace, encoding)
- Injecting dynamic values (format_instructions)
- Data transformation that doesn't need an LLM
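The repository's normalize_text isn't shown here; a minimal sketch of what such a helper might do (an illustrative assumption, not the actual implementation) looks like this:

```python
import unicodedata

def normalize_text(text: str) -> str:
    """Normalize Unicode and collapse whitespace before sending text to the LLM."""
    text = unicodedata.normalize("NFKC", text)  # smooth out odd encodings/ligatures
    return " ".join(text.split())               # collapse runs of spaces and newlines
```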
ChatPromptTemplate: Structured Prompts
```python
prompt = ChatPromptTemplate.from_messages([
    ("system", system),
    ("user", "Analyze the sentiment..."),
])
```
Benefits:
- Separates system instructions from user input
- Supports conversation history (multi-turn)
- Handles variable substitution automatically
- Works with any chat model (OpenAI, Anthropic, etc.)
Message Types:
- system: Instructions and context (sent once)
- user: Actual input from the user
- assistant: Previous responses (for conversations)
- human: Alias for user messages
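Here's a small standalone example (not taken from the repository) showing how the template substitutes variables and produces typed chat messages:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a precise sentiment analyst."),
    ("user", "Analyze the sentiment of: {text}"),
])

# Variable substitution happens here; the result is a list of message objects
messages = prompt.format_messages(text="The staff was incredibly friendly!")
print(messages)  # [SystemMessage(...), HumanMessage(...)]
```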
LLM Invocation
```python
llm = ChatOpenAI(model="gpt-4o-mini")
```
What happens:
- Prompt template is rendered with actual values
- Messages are formatted according to the model's expected format
- API call is made to OpenAI
- Response is returned as an AIMessage object
Model Selection:
- gpt-4o-mini: Fast, cost-effective, good for sentiment analysis
- gpt-4o: More capable, better for complex reasoning
- gpt-3.5-turbo: Older, cheaper, less accurate
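The chain above calls a small build_llm helper. Its exact implementation lives in the repository; a plausible sketch (an assumption for illustration) would be:

```python
from langchain_openai import ChatOpenAI

def build_llm(model_name: str | None = None) -> ChatOpenAI:
    # temperature=0 keeps scoring consistent across runs (a design choice, not confirmed from the repo)
    return ChatOpenAI(model=model_name or "gpt-4o-mini", temperature=0)
```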
PydanticOutputParser: Structured Parsing
```python
parser = PydanticOutputParser(pydantic_object=ReviewSentiment)
```
What it does:
- Extracts text from the LLM's response
- Parses JSON from the text
- Validates against the Pydantic model
- Returns a typed Python object
Error Handling:
- If the JSON is malformed, the parser raises a parsing error
- If validation fails, it raises a validation error
- Both surface as an OutputParserException, which you can catch to retry or handle gracefully (see the example below)
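For example, you might wrap the chain call and catch the parser's exception (review_text here is a placeholder):

```python
from langchain_core.exceptions import OutputParserException

try:
    result = await chain.ainvoke({"text": review_text, "language_hint": "en"})
except OutputParserException as exc:
    # Malformed JSON or failed Pydantic validation ends up here
    print(f"Could not parse model output: {exc}")
    result = None
```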
2. Structured Output with Pydantic
We use Pydantic models to ensure type safety and structured outputs:
```python
class SentimentLabel(str, Enum):
    positive = "positive"
    neutral = "neutral"
    negative = "negative"


class AspectSentiment(BaseModel):
    aspect: str = Field(..., description="The aspect/topic mentioned")
    label: SentimentLabel
    score: float = Field(..., ge=-1.0, le=1.0)


class ReviewSentiment(BaseModel):
    text: str
    language: Optional[str] = None
    label: SentimentLabel
    score: float = Field(..., ge=-1.0, le=1.0)
    rationale: str = Field(..., description="Short explanation")
    aspects: List[AspectSentiment] = Field(default_factory=list)
```
Why Pydantic?
- ✅ Automatic validation
- ✅ Type hints for better IDE support
- ✅ Generates JSON schema for LLM instructions
- ✅ Ensures the LLM returns data in the expected format
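To see what this buys us, here's what a parsed result looks like as a typed object (the values are invented for illustration):

```python
result = ReviewSentiment(
    text="The pizza was amazing but the wait was long.",
    language="en",
    label=SentimentLabel.positive,
    score=0.6,
    rationale="Praises the food; mildly criticizes the wait time.",
    aspects=[
        AspectSentiment(aspect="food", label=SentimentLabel.positive, score=0.9),
        AspectSentiment(aspect="service", label=SentimentLabel.negative, score=-0.4),
    ],
)

print(result.label.value)        # "positive"
print(result.model_dump_json())  # serialize to JSON (Pydantic v2)
```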
3. Batch Processing
One of LCEL's superpowers is easy batch processing:
```python
async def analyze_texts(payload: AnalyzeTextRequest) -> List[ReviewSentiment]:
    chain = build_sentiment_chain(payload.model_name)
    inputs = [
        {"text": t, "language_hint": payload.language_hint}
        for t in payload.texts
    ]
    # Process all reviews in parallel!
    results: List[ReviewSentiment] = await chain.abatch(inputs)
    return results
```
The abatch() method gives you:
- Parallel API calls out of the box
- Concurrency control via max_concurrency in the run config (handy for staying under rate limits)
- Per-item error handling with return_exceptions=True
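For example, you can cap how many requests run at once with a per-call config (the value 5 is arbitrary):

```python
results = await chain.abatch(
    inputs,
    config={"max_concurrency": 5},  # at most 5 requests in flight at a time
)
```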
4. Google Places Integration
We fetch real reviews from Google Places API:
```python
class GooglePlacesClient:
    async def find_place_id(self, query: str) -> Optional[str]:
        """Find a place by text query"""
        url = f"{self._base}/findplacefromtext/json"
        params = {
            "input": query,
            "inputtype": "textquery",
            "fields": "place_id",
            "key": self.api_key,
        }
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            r = await client.get(url, params=params)
            data = r.json()
        candidates = data.get("candidates") or []
        return candidates[0]["place_id"] if candidates else None

    async def fetch_reviews(self, place_id: str, *, limit: int = 10) -> List[BusinessReview]:
        """Fetch reviews for a place"""
        url = f"{self._base}/details/json"
        params = {
            "place_id": place_id,
            "fields": "reviews",
            "key": self.api_key,
        }
        # ... fetch and parse reviews
```
5. FastAPI Endpoints
We expose our functionality through REST API endpoints:
```python
@app.post("/google/analyze-by-query", response_model=AnalyzeByQueryResponse)
async def google_analyze_by_query(body: AnalyzeByQueryRequest):
    # Step 1: Find the place
    client = GooglePlacesClient(settings.google_maps_api_key)
    place_id = await client.find_place_id(body.query)

    # Step 2: Fetch reviews
    reviews = await client.fetch_reviews(place_id, limit=body.limit)
    texts = [r.text for r in reviews if r.text and r.text.strip()]

    # Step 3: Analyze sentiment
    analysis = await analyze_texts(
        AnalyzeTextRequest(texts=texts, language_hint=body.language)
    )

    return AnalyzeByQueryResponse(
        query=body.query,
        place_id=place_id,
        review_count=len(texts),
        results=analysis,
    )
```
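Once the service is running, you can hit the endpoint from any HTTP client. A minimal example with httpx (the base URL and request fields mirror the handler above, but treat the exact request schema as an assumption):

```python
import asyncio
import httpx

async def main() -> None:
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(
            "http://localhost:8000/google/analyze-by-query",
            json={"query": "Joe's Pizza New York", "limit": 5, "language": "en"},
        )
        resp.raise_for_status()
        print(resp.json())

asyncio.run(main())
```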
🔄 Understanding the Data Flow
Let's trace through what happens when a user requests sentiment analysis:
Step-by-Step Execution Flow
1. User Input
"Analyze reviews for Joe's Pizza in New York"
2. Place Discovery
The application uses Google Places API to find the business:
- Converts the query to a place_id (a unique identifier)
- This place_id is stable and doesn't change over time
3. Review Retrieval
Once we have the place_id, we fetch reviews:
- Google Places API returns up to 5 reviews per request
- Each review contains: text, author, rating, timestamp
- We extract just the text for analysis
4. Text Preprocessing
Before sending to the LLM, we normalize the text:
- Remove extra whitespace
- Handle encoding issues
- Prepare for batch processing
5. LCEL Chain Execution
This is where the magic happens:
```
Input: {"text": "Amazing pizza!", "language_hint": "en"}
   ↓
Preprocess: Normalize and add format instructions
   ↓
Prompt: Build system + user message with instructions
   ↓
LLM: OpenAI GPT processes and generates JSON
   ↓
Parser: Pydantic validates and structures the output
   ↓
Output: ReviewSentiment object with label, score, aspects
```
6. Result Aggregation
For multiple reviews:
- Each review is processed in parallel (thanks to abatch())
- Results are collected and returned as a list
- The API response includes metadata (place_id, review_count)
Why This Architecture Works
Separation of Concerns:
- Google Places client handles external API calls
- LCEL chain handles AI processing
- FastAPI handles HTTP requests/responses
- Pydantic ensures data validation at every step
Async/Await Benefits:
- Multiple reviews processed concurrently
- Non-blocking I/O operations
- Better resource utilization
- Faster response times
🎨 Deep Dive: Prompt Engineering
The quality of your LLM output depends heavily on prompt design. Let's break down our prompt strategy:
System Message Design
```python
system = (
    "You are a precise sentiment analyst. "                                                # Role definition
    "Return a single JSON object strictly matching the given schema.\n"                    # Format requirement
    "{format_instructions}\n"                                                              # Auto-generated from Pydantic
    "Scoring: negative=-1..-0.05, neutral≈-0.05..0.05, positive=0.05..1.0.\n"               # Clear scoring rules
    "Keep rationale concise. Extract aspects if present; otherwise return an empty list."  # Output guidelines
)
```
Key Elements:
- Role Definition: "You are a precise sentiment analyst"
  - Sets context for the LLM
  - Helps it adopt the right "persona"
- Format Instructions: Automatically generated from the Pydantic schema
  - Ensures the LLM knows the exact JSON structure
  - Includes field descriptions and constraints
- Scoring Guidelines: Explicit ranges for each sentiment
  - Prevents ambiguous scores
  - Ensures consistency across analyses
- Output Guidelines: Instructions for optional fields
  - Tells the LLM when to extract aspects
  - Guides the rationale length
User Message Template
"Analyze the sentiment of the following review.\n"
"Language hint: {language_hint}\n"
"Text: ```
{text}
```"
Why This Works:
- Clear task instruction
- Language hint helps with multilingual reviews
- Triple backticks (```) help the LLM identify the review text clearly
- Simple, direct, no ambiguity
Format Instructions Magic
The PydanticOutputParser automatically generates instructions like:
```
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "type": "integer"}, "bar": {"title": "Bar", "type": "string"}}, "required": ["foo", "bar"]}
the object {"foo": 5, "bar": "baz"} is a well-formatted instance of the schema.

Schema:
{
  "properties": {
    "text": {"title": "Text", "type": "string"},
    "label": {"title": "Label", "enum": ["positive", "neutral", "negative"]},
    "score": {"title": "Score", "type": "number", "minimum": -1.0, "maximum": 1.0},
    ...
  },
  "required": ["text", "label", "score", "rationale", "aspects"]
}
```
This ensures the LLM returns valid JSON that matches our Pydantic model exactly!
🚀 Performance Considerations
Batch Processing Benefits
When analyzing multiple reviews, abatch() provides:
Parallel Execution:
```python
# Sequential (slow):
for text in texts:
    result = await chain.ainvoke({"text": text})  # Wait for each

# Parallel (fast):
results = await chain.abatch(inputs)  # All at once
```
Rate Limiting:
- LangChain doesn't throttle for you by default, but you can cap in-flight requests with max_concurrency in the run config
- Extra requests are queued instead of fired all at once
- This helps you stay under the API's rate limits and avoid errors from too many simultaneous requests
Error Resilience:
- With return_exceptions=True, individual failures don't stop the batch
- You can handle errors per item (see the sketch below)
- Partial results are still useful
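A sketch of per-item error handling with return_exceptions (inputs is the same list of dicts used earlier):

```python
results = await chain.abatch(inputs, return_exceptions=True)

sentiments = [r for r in results if not isinstance(r, Exception)]
failures = [r for r in results if isinstance(r, Exception)]
print(f"Analyzed {len(sentiments)} reviews; {len(failures)} failed")
```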
Cost Optimization
Token Usage:
- System prompt: ~100 tokens (sent once per request)
- User prompt: ~50-200 tokens per review
- Response: ~100-300 tokens per review
Strategies:
- Batch processing: Reduces overhead
- Model selection: gpt-4o-mini is roughly 10x cheaper than gpt-4o
- Prompt optimization: Shorter prompts = lower costs
- Caching: Cache results for identical reviews (see the sketch below)
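For caching, one simple option is LangChain's global LLM cache, which answers repeated identical prompts from memory (assuming a recent langchain_core; swap in a persistent cache for production):

```python
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

# Identical prompts are served from the cache instead of hitting the API again
set_llm_cache(InMemoryCache())
```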
Async/Await Patterns
Why async matters:
```python
# Synchronous (blocking):
result1 = analyze(text1)  # Wait 2 seconds
result2 = analyze(text2)  # Wait 2 seconds
# Total: 4 seconds

# Asynchronous (non-blocking):
results = await asyncio.gather(
    analyze(text1),  # Start immediately
    analyze(text2),  # Start immediately
)
# Total: ~2 seconds (parallel)
```
Best Practices:
- Use async def for all I/O operations
- Use await for LLM calls, API calls, database queries
- Use asyncio.gather() for independent operations
- Use abatch() for LangChain chains
🎓 Key Learnings
1. LCEL Chain Composition
The pipe operator (|) makes it easy to compose complex workflows:
```python
chain = preprocess | prompt | llm | parser
```
Each component transforms the data and passes it to the next. This is much cleaner than nested function calls!
2. Structured Outputs
Using Pydantic models with PydanticOutputParser ensures:
- The LLM knows exactly what format to return
- Automatic validation of responses
- Type safety throughout your application
3. Async/Await Patterns
LCEL chains are async by default. Use ainvoke() for single items and abatch() for multiple items:
```python
# Single item
result = await chain.ainvoke({"text": "Great service!"})

# Multiple items (parallel processing)
results = await chain.abatch([
    {"text": "Great!"},
    {"text": "Terrible!"},
    {"text": "Okay"},
])
```
4. Prompt Engineering
Good prompts are crucial. Notice how we:
- Provide clear instructions in the system message
- Include format instructions automatically
- Give examples of scoring ranges
- Ask for specific outputs (aspects, rationale)
5. Error Handling
Always handle API failures gracefully:
```python
try:
    place_id = await client.find_place_id(query)
    if not place_id:
        raise HTTPException(status_code=404, detail="Place not found")
except HTTPException:
    raise  # Don't turn our own 404 into a 500
except Exception as e:
    raise HTTPException(status_code=500, detail=str(e))
```
🎯 Conclusion
You've learned how to:
- ✅ Build composable chains with LangChain LCEL
- ✅ Use structured outputs with Pydantic
- ✅ Process data in batches efficiently
- ✅ Integrate external APIs (Google Places)
- ✅ Build a complete AI application
LangChain LCEL is a powerful pattern that makes building LLM applications much more maintainable and composable. The pipe operator (|) creates a clean, readable flow that's easy to debug and extend.
This project demonstrates real-world patterns you'll use in production AI applications. The combination of LCEL, Pydantic, and async/await creates a robust foundation for any LLM-powered service.
Next Steps:
- Experiment with different prompts
- Try different LLM models
- Add your own features
- Deploy to production
Happy building! 🚀
For installation instructions, API examples, and more details, check out the repository. If you found this helpful, star it and share your own projects!