Learn how to build a production-ready sentiment analysis application using LangChain's Expression Language (LCEL) and OpenAI GPT models
🔗 Repository: github.com/JaimeLucena/google-reviews-sentiment
🎯 Introduction
Have you ever wondered how businesses automatically understand customer feedback? Or how AI systems can analyze thousands of reviews and extract meaningful insights?
In this tutorial, we'll build a complete sentiment analysis application that:
- Fetches real Google Business reviews
- Analyzes sentiment using OpenAI's GPT models
- Extracts key aspects (service, food, price, etc.)
- Provides actionable insights
This is a perfect project for learning LangChain LCEL (LangChain Expression Language), one of the most powerful patterns for building LLM applications. By the end, you'll understand how to compose chains, handle structured outputs, and build real-world AI applications.
🤔 What is Sentiment Analysis with LLMs?
Traditional sentiment analysis uses rule-based systems or machine learning models trained on specific datasets. Modern LLM-based approaches are more flexible because:
- No training required: LLMs already understand language nuances
- Multi-language support: Works across languages without retraining
- Context understanding: Can understand sarcasm, context, and subtle meanings
- Structured extraction: Can extract specific aspects (service, food, price) automatically
Instead of just saying "positive" or "negative", we can get:
- A sentiment score from -1 to +1
- Specific aspects mentioned (service, food quality, price)
- A rationale explaining why
- Language detection
🧠 What is LangChain LCEL?
LCEL (LangChain Expression Language) is a declarative way to compose chains using Python's | operator. Think of it like Unix pipes, but for AI workflows.
Why LCEL?
Before LCEL (traditional approach):
```python
def analyze_sentiment(text):
    # Step 1: Preprocess
    cleaned = clean_text(text)
    # Step 2: Build prompt
    prompt = build_prompt(cleaned)
    # Step 3: Call LLM
    response = llm.invoke(prompt)
    # Step 4: Parse response
    result = parse_response(response)
    return result
```
With LCEL (composable chains):
```python
chain = preprocess | build_prompt | llm | parse_response
result = await chain.ainvoke({"text": text})
```
Benefits:
- ✅ Composable: Mix and match components easily
- ✅ Async by default: Built-in support for async/await
- ✅ Batch processing: Process multiple items efficiently
- ✅ Type-safe: Works seamlessly with Pydantic models
- ✅ Streaming: Built-in support for streaming responses
🏗️ Project Architecture
Let's understand how our application works:
```
User Query: "Analyze reviews for Joe's Pizza"
                   │
                   ▼
┌─────────────────────────────────────┐
│          Google Places API          │
│   - Find place by query             │
│   - Fetch business reviews          │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│       LangChain LCEL Pipeline       │
│  ┌───────────────────────────────┐  │
│  │ 1. Preprocess text            │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 2. Build prompt with          │  │
│  │    format instructions        │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 3. Call OpenAI GPT            │  │
│  └──────────────┬────────────────┘  │
│                 │                   │
│  ┌──────────────▼────────────────┐  │
│  │ 4. Parse structured output    │  │
│  │    (Pydantic validation)      │  │
│  └──────────────┬────────────────┘  │
└─────────────────┼───────────────────┘
                  │
                  ▼
          Sentiment Results
```
💻 Code Walkthrough
Let's dive into the key components of our application.
1. Building the Sentiment Analysis Chain
The heart of our application is in app/chains.py. Here's how we build our LCEL chain:
````python
def build_sentiment_chain(model_name: str | None = None):
    """
    LCEL chain: (preprocess) -> prompt -> llm -> Pydantic parser
    """
    # Step 1: Create a Pydantic output parser
    parser = PydanticOutputParser(pydantic_object=ReviewSentiment)

    # Step 2: Define the system prompt
    system = (
        "You are a precise sentiment analyst. "
        "Return a single JSON object strictly matching the given schema.\n"
        "{format_instructions}\n"
        "Scoring: negative=-1..-0.05, neutral≈-0.05..0.05, positive=0.05..1.0.\n"
        "Keep rationale concise. Extract aspects if present; otherwise return an empty list."
    )

    # Step 3: Create the prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", system),
        ("user", "Analyze the sentiment of the following review.\n"
                 "Language hint: {language_hint}\n"
                 "Text: ```\n{text}\n```"),
    ])

    # Step 4: Initialize the LLM
    llm = build_llm(model_name)

    # Step 5: Preprocessing step
    pre = RunnableLambda(
        lambda x: {
            "text": normalize_text(x["text"]),
            "language_hint": x.get("language_hint"),
            "format_instructions": parser.get_format_instructions(),
        }
    )

    # Step 6: Compose the chain using the pipe operator
    chain = pre | prompt | llm | parser
    return chain
````
Key Concepts:
PydanticOutputParser: Automatically generates format instructions from your Pydantic model and parses the LLM response into a typed object.
ChatPromptTemplate: Creates structured prompts with system and user messages. The {format_instructions} placeholder gets filled with JSON schema instructions.
RunnableLambda: Wraps a Python function to make it part of the chain. Here, we use it for text preprocessing.
Pipe Operator (|): Composes the chain. Data flows from left to right through each component.
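Once composed, the chain is itself a Runnable, so you can call it directly. A quick illustrative invocation (the input keys match the preprocessing step above; the review text is made up):

```python
chain = build_sentiment_chain()

# Analyze a single review (inside an async function / running event loop)
sentiment = await chain.ainvoke({
    "text": "Great pizza, but the service was painfully slow.",
    "language_hint": "en",
})

print(sentiment.label, sentiment.score)       # actual values depend on the model
print([a.aspect for a in sentiment.aspects])  # e.g. ["food", "service"]
```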
Understanding Each Component in Detail
Now let's dive deeper into each component we just used:
RunnableLambda: Custom Processing
```python
pre = RunnableLambda(
    lambda x: {
        "text": normalize_text(x["text"]),
        "language_hint": x.get("language_hint"),
        "format_instructions": parser.get_format_instructions(),
    }
)
```
What it does:
- Wraps any Python function to make it part of the chain
- Transforms input data before it reaches the prompt
- Can be synchronous or asynchronous
Why we use it:
- Text normalization (clean whitespace, encoding)
- Injecting dynamic values (format_instructions)
- Data transformation that doesn't need an LLM
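The repository's normalize_text isn't shown here; a minimal sketch of what such a helper might do (an illustrative assumption, not the actual implementation) looks like this:

```python
import unicodedata

def normalize_text(text: str) -> str:
    """Normalize Unicode and collapse whitespace before sending text to the LLM."""
    text = unicodedata.normalize("NFKC", text)  # smooth out odd encodings/ligatures
    return " ".join(text.split())               # collapse runs of spaces and newlines
```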
ChatPromptTemplate: Structured Prompts
```python
prompt = ChatPromptTemplate.from_messages([
    ("system", system),
    ("user", "Analyze the sentiment..."),
])
```
Benefits:
- Separates system instructions from user input
- Supports conversation history (multi-turn)
- Handles variable substitution automatically
- Works with any chat model (OpenAI, Anthropic, etc.)
Message Types:
- system: Instructions and context (sent once)
- user: Actual input from the user
- assistant: Previous responses (for conversations)
- human: Alias for user messages
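Here's a small standalone example (not taken from the repository) showing how the template substitutes variables and produces typed chat messages:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a precise sentiment analyst."),
    ("user", "Analyze the sentiment of: {text}"),
])

# Variable substitution happens here; the result is a list of message objects
messages = prompt.format_messages(text="The staff was incredibly friendly!")
print(messages)  # [SystemMessage(...), HumanMessage(...)]
```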
LLM Invocation
```python
llm = ChatOpenAI(model="gpt-4o-mini")
```
What happens:
- Prompt template is rendered with actual values
- Messages are formatted according to the model's expected format
- API call is made to OpenAI
- Response is returned as an AIMessage object
Model Selection:
- gpt-4o-mini: Fast, cost-effective, good for sentiment analysis
- gpt-4o: More capable, better for complex reasoning
- gpt-3.5-turbo: Older, cheaper, less accurate
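The chain above calls a small build_llm helper. Its exact implementation lives in the repository; a plausible sketch (an assumption for illustration) would be:

```python
from langchain_openai import ChatOpenAI

def build_llm(model_name: str | None = None) -> ChatOpenAI:
    # temperature=0 keeps scoring consistent across runs (a design choice, not confirmed from the repo)
    return ChatOpenAI(model=model_name or "gpt-4o-mini", temperature=0)
```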
PydanticOutputParser: Structured Parsing
```python
parser = PydanticOutputParser(pydantic_object=ReviewSentiment)
```
What it does:
- Extracts text from the LLM's response
- Parses JSON from the text
- Validates against the Pydantic model
- Returns a typed Python object
Error Handling:
- If the JSON is malformed, the parser raises a parsing error
- If validation fails, it raises a validation error
- Both surface as an OutputParserException, which you can catch to retry or handle gracefully (see the example below)
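For example, you might wrap the chain call and catch the parser's exception (review_text here is a placeholder):

```python
from langchain_core.exceptions import OutputParserException

try:
    result = await chain.ainvoke({"text": review_text, "language_hint": "en"})
except OutputParserException as exc:
    # Malformed JSON or failed Pydantic validation ends up here
    print(f"Could not parse model output: {exc}")
    result = None
```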
2. Structured Output with Pydantic
We use Pydantic models to ensure type safety and structured outputs:
```python
class SentimentLabel(str, Enum):
    positive = "positive"
    neutral = "neutral"
    negative = "negative"


class AspectSentiment(BaseModel):
    aspect: str = Field(..., description="The aspect/topic mentioned")
    label: SentimentLabel
    score: float = Field(..., ge=-1.0, le=1.0)


class ReviewSentiment(BaseModel):
    text: str
    language: Optional[str] = None
    label: SentimentLabel
    score: float = Field(..., ge=-1.0, le=1.0)
    rationale: str = Field(..., description="Short explanation")
    aspects: List[AspectSentiment] = Field(default_factory=list)
```
Why Pydantic?
- ✅ Automatic validation
- ✅ Type hints for better IDE support
- ✅ Generates JSON schema for LLM instructions
- ✅ Ensures the LLM returns data in the expected format
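To see what this buys us, here's what a parsed result looks like as a typed object (the values are invented for illustration):

```python
result = ReviewSentiment(
    text="The pizza was amazing but the wait was long.",
    language="en",
    label=SentimentLabel.positive,
    score=0.6,
    rationale="Praises the food; mildly criticizes the wait time.",
    aspects=[
        AspectSentiment(aspect="food", label=SentimentLabel.positive, score=0.9),
        AspectSentiment(aspect="service", label=SentimentLabel.negative, score=-0.4),
    ],
)

print(result.label.value)        # "positive"
print(result.model_dump_json())  # serialize to JSON (Pydantic v2)
```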
3. Batch Processing
One of LCEL's superpowers is easy batch processing:
```python
async def analyze_texts(payload: AnalyzeTextRequest) -> List[ReviewSentiment]:
    chain = build_sentiment_chain(payload.model_name)
    inputs = [
        {"text": t, "language_hint": payload.language_hint}
        for t in payload.texts
    ]
    # Process all reviews in parallel!
    results: List[ReviewSentiment] = await chain.abatch(inputs)
    return results
```
The abatch() method gives you:
- Parallel API calls out of the box
- Concurrency control via max_concurrency in the run config (handy for staying under rate limits)
- Per-item error handling with return_exceptions=True
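For example, you can cap how many requests run at once with a per-call config (the value 5 is arbitrary):

```python
results = await chain.abatch(
    inputs,
    config={"max_concurrency": 5},  # at most 5 requests in flight at a time
)
```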
4. Google Places Integration
We fetch real reviews from Google Places API:
```python
class GooglePlacesClient:
    async def find_place_id(self, query: str) -> Optional[str]:
        """Find a place by text query"""
        url = f"{self._base}/findplacefromtext/json"
        params = {
            "input": query,
            "inputtype": "textquery",
            "fields": "place_id",
            "key": self.api_key,
        }
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            r = await client.get(url, params=params)
            data = r.json()
        candidates = data.get("candidates") or []
        return candidates[0]["place_id"] if candidates else None

    async def fetch_reviews(self, place_id: str, *, limit: int = 10) -> List[BusinessReview]:
        """Fetch reviews for a place"""
        url = f"{self._base}/details/json"
        params = {
            "place_id": place_id,
            "fields": "reviews",
            "key": self.api_key,
        }
        # ... fetch and parse reviews
```
5. FastAPI Endpoints
We expose our functionality through REST API endpoints:
```python
@app.post("/google/analyze-by-query", response_model=AnalyzeByQueryResponse)
async def google_analyze_by_query(body: AnalyzeByQueryRequest):
    # Step 1: Find the place
    client = GooglePlacesClient(settings.google_maps_api_key)
    place_id = await client.find_place_id(body.query)

    # Step 2: Fetch reviews
    reviews = await client.fetch_reviews(place_id, limit=body.limit)
    texts = [r.text for r in reviews if r.text and r.text.strip()]

    # Step 3: Analyze sentiment
    analysis = await analyze_texts(
        AnalyzeTextRequest(texts=texts, language_hint=body.language)
    )

    return AnalyzeByQueryResponse(
        query=body.query,
        place_id=place_id,
        review_count=len(texts),
        results=analysis,
    )
```
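Once the service is running, you can hit the endpoint from any HTTP client. A minimal example with httpx (the base URL and request fields mirror the handler above, but treat the exact request schema as an assumption):

```python
import asyncio
import httpx

async def main() -> None:
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(
            "http://localhost:8000/google/analyze-by-query",
            json={"query": "Joe's Pizza New York", "limit": 5, "language": "en"},
        )
        resp.raise_for_status()
        print(resp.json())

asyncio.run(main())
```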
🔄 Understanding the Data Flow
Let's trace through what happens when a user requests sentiment analysis:
Step-by-Step Execution Flow
1. User Input
"Analyze reviews for Joe's Pizza in New York"
2. Place Discovery
The application uses Google Places API to find the business:
- Converts the query to a place_id (a unique identifier)
- This place_id is stable and doesn't change over time
3. Review Retrieval
Once we have the place_id, we fetch reviews:
- Google Places API returns up to 5 reviews per request
- Each review contains: text, author, rating, timestamp
- We extract just the text for analysis
4. Text Preprocessing
Before sending to the LLM, we normalize the text:
- Remove extra whitespace
- Handle encoding issues
- Prepare for batch processing
5. LCEL Chain Execution
This is where the magic happens:
```
Input: {"text": "Amazing pizza!", "language_hint": "en"}
   ↓
Preprocess: Normalize and add format instructions
   ↓
Prompt: Build system + user message with instructions
   ↓
LLM: OpenAI GPT processes and generates JSON
   ↓
Parser: Pydantic validates and structures the output
   ↓
Output: ReviewSentiment object with label, score, aspects
```
6. Result Aggregation
For multiple reviews:
- Each review is processed in parallel (thanks to abatch())
- Results are collected and returned as a list
- The API response includes metadata (place_id, review_count)
Why This Architecture Works
Separation of Concerns:
- Google Places client handles external API calls
- LCEL chain handles AI processing
- FastAPI handles HTTP requests/responses
- Pydantic ensures data validation at every step
Async/Await Benefits:
- Multiple reviews processed concurrently
- Non-blocking I/O operations
- Better resource utilization
- Faster response times
🎨 Deep Dive: Prompt Engineering
The quality of your LLM output depends heavily on prompt design. Let's break down our prompt strategy:
System Message Design
```python
system = (
    "You are a precise sentiment analyst. "                                                # Role definition
    "Return a single JSON object strictly matching the given schema.\n"                    # Format requirement
    "{format_instructions}\n"                                                              # Auto-generated from Pydantic
    "Scoring: negative=-1..-0.05, neutral≈-0.05..0.05, positive=0.05..1.0.\n"               # Clear scoring rules
    "Keep rationale concise. Extract aspects if present; otherwise return an empty list."  # Output guidelines
)
```
Key Elements:
- Role Definition: "You are a precise sentiment analyst"
  - Sets context for the LLM
  - Helps it adopt the right "persona"
- Format Instructions: Automatically generated from the Pydantic schema
  - Ensures the LLM knows the exact JSON structure
  - Includes field descriptions and constraints
- Scoring Guidelines: Explicit ranges for each sentiment
  - Prevents ambiguous scores
  - Ensures consistency across analyses
- Output Guidelines: Instructions for optional fields
  - Tells the LLM when to extract aspects
  - Guides the rationale length
User Message Template
"Analyze the sentiment of the following review.\n"
"Language hint: {language_hint}\n"
"Text: ```
{text}
```"
Why This Works:
- Clear task instruction
- Language hint helps with multilingual reviews
- Triple backticks (```) help the LLM identify the review text clearly
- Simple, direct, no ambiguity
Format Instructions Magic
The PydanticOutputParser automatically generates instructions like:
```
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "type": "integer"}, "bar": {"title": "Bar", "type": "string"}}, "required": ["foo", "bar"]}
the object {"foo": 5, "bar": "baz"} is a well-formatted instance of the schema.

Schema:
{
  "properties": {
    "text": {"title": "Text", "type": "string"},
    "label": {"title": "Label", "enum": ["positive", "neutral", "negative"]},
    "score": {"title": "Score", "type": "number", "minimum": -1.0, "maximum": 1.0},
    ...
  },
  "required": ["text", "label", "score", "rationale", "aspects"]
}
```
This ensures the LLM returns valid JSON that matches our Pydantic model exactly!
🚀 Performance Considerations
Batch Processing Benefits
When analyzing multiple reviews, abatch() provides:
Parallel Execution:
```python
# Sequential (slow):
for text in texts:
    result = await chain.ainvoke({"text": text})  # Wait for each

# Parallel (fast):
results = await chain.abatch(inputs)  # All at once
```
Rate Limiting:
- LangChain doesn't throttle for you by default, but you can cap in-flight requests with max_concurrency in the run config
- Extra requests are queued instead of fired all at once
- This helps you stay under the API's rate limits and avoid errors from too many simultaneous requests
Error Resilience:
- With return_exceptions=True, individual failures don't stop the batch
- You can handle errors per item (see the sketch below)
- Partial results are still useful
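A sketch of per-item error handling with return_exceptions (inputs is the same list of dicts used earlier):

```python
results = await chain.abatch(inputs, return_exceptions=True)

sentiments = [r for r in results if not isinstance(r, Exception)]
failures = [r for r in results if isinstance(r, Exception)]
print(f"Analyzed {len(sentiments)} reviews; {len(failures)} failed")
```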
Cost Optimization
Token Usage:
- System prompt: ~100 tokens (sent once per request)
- User prompt: ~50-200 tokens per review
- Response: ~100-300 tokens per review
Strategies:
- Batch processing: Reduces overhead
- Model selection: gpt-4o-mini is roughly 10x cheaper than gpt-4o
- Prompt optimization: Shorter prompts = lower costs
- Caching: Cache results for identical reviews (see the sketch below)
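For caching, one simple option is LangChain's global LLM cache, which answers repeated identical prompts from memory (assuming a recent langchain_core; swap in a persistent cache for production):

```python
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

# Identical prompts are served from the cache instead of hitting the API again
set_llm_cache(InMemoryCache())
```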
Async/Await Patterns
Why async matters:
```python
# Synchronous (blocking):
result1 = analyze(text1)  # Wait 2 seconds
result2 = analyze(text2)  # Wait 2 seconds
# Total: 4 seconds

# Asynchronous (non-blocking):
results = await asyncio.gather(
    analyze(text1),  # Start immediately
    analyze(text2),  # Start immediately
)
# Total: ~2 seconds (parallel)
```
Best Practices:
- Use async def for all I/O operations
- Use await for LLM calls, API calls, database queries
- Use asyncio.gather() for independent operations
- Use abatch() for LangChain chains
🎓 Key Learnings
1. LCEL Chain Composition
The pipe operator (|) makes it easy to compose complex workflows:
```python
chain = preprocess | prompt | llm | parser
```
Each component transforms the data and passes it to the next. This is much cleaner than nested function calls!
2. Structured Outputs
Using Pydantic models with PydanticOutputParser ensures:
- The LLM knows exactly what format to return
- Automatic validation of responses
- Type safety throughout your application
3. Async/Await Patterns
LCEL chains are async by default. Use ainvoke() for single items and abatch() for multiple items:
```python
# Single item
result = await chain.ainvoke({"text": "Great service!"})

# Multiple items (parallel processing)
results = await chain.abatch([
    {"text": "Great!"},
    {"text": "Terrible!"},
    {"text": "Okay"},
])
```
4. Prompt Engineering
Good prompts are crucial. Notice how we:
- Provide clear instructions in the system message
- Include format instructions automatically
- Give examples of scoring ranges
- Ask for specific outputs (aspects, rationale)
5. Error Handling
Always handle API failures gracefully:
```python
try:
    place_id = await client.find_place_id(query)
    if not place_id:
        raise HTTPException(status_code=404, detail="Place not found")
except HTTPException:
    raise  # Don't turn our own 404 into a 500
except Exception as e:
    raise HTTPException(status_code=500, detail=str(e))
```
🎯 Conclusion
You've learned how to:
- ✅ Build composable chains with LangChain LCEL
- ✅ Use structured outputs with Pydantic
- ✅ Process data in batches efficiently
- ✅ Integrate external APIs (Google Places)
- ✅ Build a complete AI application
LangChain LCEL is a powerful pattern that makes building LLM applications much more maintainable and composable. The pipe operator (|) creates a clean, readable flow that's easy to debug and extend.
This project demonstrates real-world patterns you'll use in production AI applications. The combination of LCEL, Pydantic, and async/await creates a robust foundation for any LLM-powered service.
Next Steps:
- Experiment with different prompts
- Try different LLM models
- Add your own features
- Deploy to production
Happy building! 🚀
For installation instructions, API examples, and more details, check out the repository. If you found this helpful, star it and share your own projects!