Building Your First Sentiment Analysis App with LangChain LCEL and OpenAI

Jaime Lucena Pérez

Learn how to build a production-ready sentiment analysis application using LangChain's Expression Language (LCEL) and OpenAI GPT models

🔗 Repository: github.com/JaimeLucena/google-reviews-sentiment


🎯 Introduction

Have you ever wondered how businesses automatically understand customer feedback? Or how AI systems can analyze thousands of reviews and extract meaningful insights?

In this tutorial, we'll build a complete sentiment analysis application that:

  • Fetches real Google Business reviews
  • Analyzes sentiment using OpenAI's GPT models
  • Extracts key aspects (service, food, price, etc.)
  • Provides actionable insights

This is a perfect project for learning LangChain LCEL (LangChain Expression Language), one of the most powerful patterns for building LLM applications. By the end, you'll understand how to compose chains, handle structured outputs, and build real-world AI applications.


🤔 What is Sentiment Analysis with LLMs?

Traditional sentiment analysis uses rule-based systems or machine learning models trained on specific datasets. Modern LLM-based approaches are more flexible because:

  1. No training required: LLMs already understand language nuances
  2. Multi-language support: Works across languages without retraining
  3. Context understanding: Can understand sarcasm, context, and subtle meanings
  4. Structured extraction: Can extract specific aspects (service, food, price) automatically

Instead of just saying "positive" or "negative", we can get:

  • A sentiment score from -1 to +1
  • Specific aspects mentioned (service, food quality, price)
  • A rationale explaining why
  • Language detection

🧠 What is LangChain LCEL?

LCEL (LangChain Expression Language) is a declarative way to compose chains using Python's | operator. Think of it like Unix pipes, but for AI workflows.

Why LCEL?

Before LCEL (traditional approach):

def analyze_sentiment(text):
    # Step 1: Preprocess
    cleaned = clean_text(text)

    # Step 2: Build prompt
    prompt = build_prompt(cleaned)

    # Step 3: Call LLM
    response = llm.invoke(prompt)

    # Step 4: Parse response
    result = parse_response(response)

    return result

With LCEL (composable chains):

chain = preprocess | build_prompt | llm | parse_response
result = await chain.ainvoke({"text": text})

Benefits:

  • Composable: Mix and match components easily
  • Async by default: Built-in support for async/await
  • Batch processing: Process multiple items efficiently
  • Type-safe: Works seamlessly with Pydantic models
  • Streaming: Built-in support for streaming responses
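
Because every LCEL component implements the same Runnable interface, one chain definition gives you all of these call styles for free. A toy sketch with no LLM involved (just two wrapped functions):

from langchain_core.runnables import RunnableLambda

# Two plain functions wrapped as Runnables and piped together.
chain = RunnableLambda(lambda x: x.strip()) | RunnableLambda(str.upper)

print(chain.invoke("  hello  "))             # "HELLO" -- one input
print(chain.batch(["good", "bad", "meh"]))   # a list of inputs, run in parallel
for chunk in chain.stream("streamed"):       # streaming interface (one chunk here)
    print(chunk)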

🏗️ Project Architecture

Let's understand how our application works:

User Query: "Analyze reviews for Joe's Pizza"
    │
    ▼
┌─────────────────────────────────────┐
│   Google Places API                 │
│   - Find place by query             │
│   - Fetch business reviews          │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│   LangChain LCEL Pipeline           │
│   ┌───────────────────────────────┐ │
│   │ 1. Preprocess text            │ │
│   └──────────────┬────────────────┘ │
│                  │                   │
│   ┌──────────────▼────────────────┐ │
│   │ 2. Build prompt with          │ │
│   │    format instructions        │ │
│   └──────────────┬────────────────┘ │
│                  │                   │
│   ┌──────────────▼────────────────┐ │
│   │ 3. Call OpenAI GPT            │ │
│   └──────────────┬────────────────┘ │
│                  │                   │
│   ┌──────────────▼────────────────┐ │
│   │ 4. Parse structured output    │ │
│   │    (Pydantic validation)      │ │
│   └──────────────┬────────────────┘ │
└──────────────────┼───────────────────┘
                   │
                   ▼
         Sentiment Results

💻 Code Walkthrough

Let's dive into the key components of our application.

1. Building the Sentiment Analysis Chain

The heart of our application is in app/chains.py. Here's how we build our LCEL chain:

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

# build_llm and normalize_text are app-local helpers;
# ReviewSentiment is the Pydantic model shown in section 2 below.

def build_sentiment_chain(model_name: str | None = None):
    """
    LCEL chain: (preprocess) -> prompt -> llm -> Pydantic parser
    """
    # Step 1: Create a Pydantic output parser
    parser = PydanticOutputParser(pydantic_object=ReviewSentiment)

    # Step 2: Define the system prompt
    system = (
        "You are a precise sentiment analyst. "
        "Return a single JSON object strictly matching the given schema.\n"
        "{format_instructions}\n"
        "Scoring: negative=-1..-0.05, neutral≈-0.05..0.05, positive=0.05..1.0.\n"
        "Keep rationale concise. Extract aspects if present; otherwise return an empty list."
    )

    # Step 3: Create the prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", system),
        ("user", "Analyze the sentiment of the following review.\n"
                 "Language hint: {language_hint}\n"
                 "Text: ```

{text}

```"),
    ])

    # Step 4: Initialize the LLM
    llm = build_llm(model_name)

    # Step 5: Preprocessing step
    pre = RunnableLambda(
        lambda x: {
            "text": normalize_text(x["text"]),
            "language_hint": x.get("language_hint"),
            "format_instructions": parser.get_format_instructions(),
        }
    )

    # Step 6: Compose the chain using the pipe operator
    chain = pre | prompt | llm | parser
    return chain

Key Concepts:

  1. PydanticOutputParser: Automatically generates format instructions from your Pydantic model and parses the LLM response into a typed object.

  2. ChatPromptTemplate: Creates structured prompts with system and user messages. The {format_instructions} placeholder gets filled with JSON schema instructions.

  3. RunnableLambda: Wraps a Python function to make it part of the chain. Here, we use it for text preprocessing.

  4. Pipe Operator (|): Composes the chain. Data flows from left to right through each component.

Understanding Each Component in Detail

Now let's dive deeper into each component we just used:

RunnableLambda: Custom Processing

pre = RunnableLambda(
    lambda x: {
        "text": normalize_text(x["text"]),
        "language_hint": x.get("language_hint"),
        "format_instructions": parser.get_format_instructions(),
    }
)

What it does:

  • Wraps any Python function to make it part of the chain
  • Transforms input data before it reaches the prompt
  • Can be synchronous or asynchronous

Why we use it:

  • Text normalization (clean whitespace, encoding)
  • Injecting dynamic values (format_instructions)
  • Data transformation that doesn't need an LLM
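
Because the chain is usually driven with ainvoke/abatch, the wrapped function can itself be a coroutine; RunnableLambda accepts async functions directly. A minimal sketch (normalize_whitespace is a stand-in for the repo's normalize_text):

import asyncio

from langchain_core.runnables import RunnableLambda

async def normalize_whitespace(x: dict) -> dict:
    # Real code could await async I/O here (e.g., a language-detection call).
    return {**x, "text": " ".join(x["text"].split())}

pre = RunnableLambda(normalize_whitespace)
print(asyncio.run(pre.ainvoke({"text": "  lots   of   space  "})))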

ChatPromptTemplate: Structured Prompts

prompt = ChatPromptTemplate.from_messages([
    ("system", system),
    ("user", "Analyze the sentiment..."),
])

Benefits:

  • Separates system instructions from user input
  • Supports conversation history (multi-turn)
  • Handles variable substitution automatically
  • Works with any chat model (OpenAI, Anthropic, etc.)

Message Types:

  • system: Instructions and context (usually the first message)
  • user: Actual input from the user
  • assistant (or ai): Previous model responses (for conversations)
  • human: Interchangeable with user in from_messages
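
This project only needs single-turn prompts, but for multi-turn use MessagesPlaceholder splices prior turns into the prompt. A quick sketch of the mechanism:

from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),   # prior conversation turns go here
    ("human", "{question}"),
])

messages = prompt.format_messages(
    history=[HumanMessage("Hi!"), AIMessage("Hello! How can I help?")],
    question="What is LCEL?",
)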

LLM Invocation

llm = ChatOpenAI(model="gpt-4o-mini")

What happens:

  1. Prompt template is rendered with actual values
  2. Messages are formatted according to the model's expected format
  3. API call is made to OpenAI
  4. Response is returned as an AIMessage object

Model Selection:

  • gpt-4o-mini: Fast, cost-effective, good for sentiment analysis
  • gpt-4o: More capable, better for complex reasoning
  • gpt-3.5-turbo: Older and less accurate; gpt-4o-mini is now both cheaper and more capable
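
The chain factory earlier calls build_llm(model_name) without showing it. A minimal equivalent might look like this (the default model and temperature are assumptions, not the repo's exact code):

from langchain_openai import ChatOpenAI

def build_llm(model_name: str | None = None) -> ChatOpenAI:
    # Fall back to a fast, cheap default when no model is requested (assumed default).
    return ChatOpenAI(model=model_name or "gpt-4o-mini", temperature=0)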

PydanticOutputParser: Structured Parsing

parser = PydanticOutputParser(pydantic_object=ReviewSentiment)

What it does:

  1. Extracts text from the LLM's response
  2. Parses JSON from the text
  3. Validates against the Pydantic model
  4. Returns a typed Python object

Error Handling:

  • If JSON is malformed, raises a parsing error
  • If validation fails, raises a validation error
  • You can catch these and retry or handle gracefully
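
A common pattern is to catch the parser's exception and retry once, since a second sample often produces valid JSON. A sketch (chain and review come from the surrounding walkthrough):

from langchain_core.exceptions import OutputParserException

payload = {"text": review, "language_hint": None}
try:
    result = await chain.ainvoke(payload)
except OutputParserException:
    # Malformed or schema-violating JSON -- retry once, then give up.
    result = await chain.ainvoke(payload)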

2. Structured Output with Pydantic

We use Pydantic models to ensure type safety and structured outputs:

from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field

class SentimentLabel(str, Enum):
    positive = "positive"
    neutral = "neutral"
    negative = "negative"

class AspectSentiment(BaseModel):
    aspect: str = Field(..., description="The aspect/topic mentioned")
    label: SentimentLabel
    score: float = Field(..., ge=-1.0, le=1.0)

class ReviewSentiment(BaseModel):
    text: str
    language: Optional[str] = None
    label: SentimentLabel
    score: float = Field(..., ge=-1.0, le=1.0)
    rationale: str = Field(..., description="Short explanation")
    aspects: List[AspectSentiment] = Field(default_factory=list)

Why Pydantic?

  • ✅ Automatic validation
  • ✅ Type hints for better IDE support
  • ✅ Generates JSON schema for LLM instructions
  • ✅ Ensures the LLM returns data in the expected format
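
The validation is real, not advisory: constructing a ReviewSentiment with an out-of-range score fails immediately.

from pydantic import ValidationError

try:
    ReviewSentiment(
        text="Great!",
        label="positive",
        score=1.5,  # violates the le=1.0 constraint
        rationale="Too enthusiastic",
    )
except ValidationError as e:
    print(e)  # reports that score must be <= 1.0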

3. Batch Processing

One of LCEL's superpowers is easy batch processing:

async def analyze_texts(payload: AnalyzeTextRequest) -> List[ReviewSentiment]:
    chain = build_sentiment_chain(payload.model_name)
    inputs = [
        {"text": t, "language_hint": payload.language_hint} 
        for t in payload.texts
    ]
    # Process all reviews in parallel!
    results: List[ReviewSentiment] = await chain.abatch(inputs)
    return results

The abatch() method runs every input through the chain concurrently. Two options are worth knowing (both appear in the sketch below):

  • config={"max_concurrency": N}: caps how many requests run in parallel, useful for staying under API rate limits
  • return_exceptions=True: lets individual failures come back as exception objects instead of aborting the whole batch
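
A sketch continuing analyze_texts with both options set (the concurrency value is arbitrary):

results = await chain.abatch(
    inputs,
    config={"max_concurrency": 5},  # at most 5 requests in flight
    return_exceptions=True,         # failed items come back as exceptions
)
ok = [r for r in results if not isinstance(r, Exception)]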

4. Google Places Integration

We fetch real reviews from Google Places API:

class GooglePlacesClient:
    async def find_place_id(self, query: str) -> Optional[str]:
        """Find a place by text query"""
        url = f"{self._base}/findplacefromtext/json"
        params = {
            "input": query,
            "inputtype": "textquery",
            "fields": "place_id",
            "key": self.api_key,
        }
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            r = await client.get(url, params=params)
            data = r.json()
        candidates = data.get("candidates") or []
        return candidates[0]["place_id"] if candidates else None

    async def fetch_reviews(self, place_id: str, *, limit: int = 10) -> List[BusinessReview]:
        """Fetch reviews for a place"""
        url = f"{self._base}/details/json"
        params = {
            "place_id": place_id,
            "fields": "reviews",
            "key": self.api_key,
        }
        # ... fetch and parse reviews
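
Putting the client together (the constructor takes the API key, as the FastAPI endpoint below shows; the key string is a placeholder):

import asyncio

async def main():
    client = GooglePlacesClient("YOUR_GOOGLE_MAPS_API_KEY")
    place_id = await client.find_place_id("Joe's Pizza New York")
    if place_id:
        reviews = await client.fetch_reviews(place_id, limit=5)
        print(f"Fetched {len(reviews)} reviews")

asyncio.run(main())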

5. FastAPI Endpoints

We expose our functionality through REST API endpoints:

@app.post("/google/analyze-by-query", response_model=AnalyzeByQueryResponse)
async def google_analyze_by_query(body: AnalyzeByQueryRequest):
    # Step 1: Find the place
    client = GooglePlacesClient(settings.google_maps_api_key)
    place_id = await client.find_place_id(body.query)
    if not place_id:
        raise HTTPException(status_code=404, detail="Place not found")

    # Step 2: Fetch reviews
    reviews = await client.fetch_reviews(place_id, limit=body.limit)
    texts = [r.text for r in reviews if r.text and r.text.strip()]

    # Step 3: Analyze sentiment
    analysis = await analyze_texts(
        AnalyzeTextRequest(texts=texts, language_hint=body.language)
    )

    return AnalyzeByQueryResponse(
        query=body.query,
        place_id=place_id,
        review_count=len(texts),
        results=analysis,
    )
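
With the server running (e.g., uvicorn on its default port, an assumption here), you can exercise the endpoint in a few lines of httpx; the JSON fields mirror how body is used above:

import httpx

resp = httpx.post(
    "http://localhost:8000/google/analyze-by-query",
    json={"query": "Joe's Pizza New York", "limit": 5, "language": "en"},
    timeout=60.0,
)
print(resp.json()["review_count"])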

🔄 Understanding the Data Flow

Let's trace through what happens when a user requests sentiment analysis:

Step-by-Step Execution Flow

1. User Input

"Analyze reviews for Joe's Pizza in New York"

2. Place Discovery
The application uses Google Places API to find the business:

  • Converts the query to a place_id (a unique identifier)
  • This place_id is stable and doesn't change over time

3. Review Retrieval
Once we have the place_id, we fetch reviews:

  • Google Places API returns up to 5 reviews per request
  • Each review contains: text, author, rating, timestamp
  • We extract just the text for analysis

4. Text Preprocessing
Before sending to the LLM, we normalize the text:

  • Remove extra whitespace
  • Handle encoding issues
  • Prepare for batch processing

5. LCEL Chain Execution
This is where the magic happens:

Input: {"text": "Amazing pizza!", "language_hint": "en"}
  ↓
Preprocess: Normalize and add format instructions
  ↓
Prompt: Build system + user message with instructions
  ↓
LLM: OpenAI GPT processes and generates JSON
  ↓
Parser: Pydantic validates and structures the output
  ↓
Output: ReviewSentiment object with label, score, aspects

6. Result Aggregation
For multiple reviews:

  • Each review is processed in parallel (thanks to abatch())
  • Results are collected and returned as a list
  • The API response includes metadata (place_id, review_count)

Why This Architecture Works

Separation of Concerns:

  • Google Places client handles external API calls
  • LCEL chain handles AI processing
  • FastAPI handles HTTP requests/responses
  • Pydantic ensures data validation at every step

Async/Await Benefits:

  • Multiple reviews processed concurrently
  • Non-blocking I/O operations
  • Better resource utilization
  • Faster response times

🎨 Deep Dive: Prompt Engineering

The quality of your LLM output depends heavily on prompt design. Let's break down our prompt strategy:

System Message Design

system = (
    "You are a precise sentiment analyst. "  # Role definition
    "Return a single JSON object strictly matching the given schema.\n"  # Format requirement
    "{format_instructions}\n"  # Auto-generated from Pydantic
    "Scoring: negative=-1..-0.05, neutral≈-0.05..0.05, positive=0.05..1.0.\n"  # Clear scoring rules
    "Keep rationale concise. Extract aspects if present; otherwise return an empty list."  # Output guidelines
)

Key Elements:

  1. Role Definition: "You are a precise sentiment analyst"

    • Sets context for the LLM
    • Helps it adopt the right "persona"
  2. Format Instructions: Automatically generated from Pydantic schema

    • Ensures the LLM knows the exact JSON structure
    • Includes field descriptions and constraints
  3. Scoring Guidelines: Explicit ranges for each sentiment

    • Prevents ambiguous scores
    • Ensures consistency across analyses
  4. Output Guidelines: Instructions for optional fields

    • Tells the LLM when to extract aspects
    • Guides the rationale length

User Message Template

"Analyze the sentiment of the following review.\n"
"Language hint: {language_hint}\n"
"Text: ```

{text}

```"
Enter fullscreen mode Exit fullscreen mode

Why This Works:

  • Clear task instruction
  • Language hint helps with multilingual reviews
  • Triple backticks (```) help the LLM identify the text clearly
  • Simple, direct, no ambiguity

Format Instructions Magic

The PydanticOutputParser automatically generates instructions like:



The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "type": "integer"}, "bar": {"title": "Bar", "type": "string"}}, "required": ["foo", "bar"]}
the object {"foo": 5, "bar": "baz"} is a well-formatted instance of the schema.

Schema:
{
  "properties": {
    "text": {"title": "Text", "type": "string"},
    "label": {"title": "Label", "enum": ["positive", "neutral", "negative"]},
    "score": {"title": "Score", "type": "number", "minimum": -1.0, "maximum": 1.0},
    ...
  },
  "required": ["text", "label", "score", "rationale", "aspects"]
}



This ensures the LLM returns valid JSON that matches our Pydantic model exactly!


🚀 Performance Considerations

Batch Processing Benefits

When analyzing multiple reviews, abatch() provides:

Parallel Execution:


# Sequential (slow):
for text in texts:
    result = await chain.ainvoke({"text": text})  # Wait for each

# Parallel (fast):
results = await chain.abatch(inputs)  # All at once



Rate Limiting:

  • LangChain does not throttle requests for you automatically
  • Cap parallelism with config={"max_concurrency": N} to stay under API limits
  • Or attach a client-side rate limiter to the model (see the sketch below)

Error Resilience:

  • With return_exceptions=True, individual failures don't stop the batch
  • You can inspect and handle errors per item
  • Partial results are still useful
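
If you need true throttling, langchain_core ships a simple client-side limiter you can attach to the chat model (the rate below is an arbitrary example):

from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

limiter = InMemoryRateLimiter(requests_per_second=2)  # ~120 requests/minute
llm = ChatOpenAI(model="gpt-4o-mini", rate_limiter=limiter)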

Cost Optimization

Token Usage:

  • System prompt: ~100 tokens (sent once per request)
  • User prompt: ~50-200 tokens per review
  • Response: ~100-300 tokens per review

Strategies:

  1. Batch processing: Reduces overhead
  2. Model selection: gpt-4o-mini is roughly an order of magnitude cheaper than gpt-4o
  3. Prompt optimization: Shorter prompts = lower costs
  4. Caching: Cache results for identical reviews (see the snippet below)
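
Strategy 4 can be a one-liner with LangChain's global LLM cache: repeated identical prompt/model pairs are answered from memory instead of a second API call.

from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

# Process-local cache; swap in a persistent backend for production use.
set_llm_cache(InMemoryCache())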

Async/Await Patterns

Why async matters:


# Synchronous (blocking):
result1 = analyze(text1)  # Wait 2 seconds
result2 = analyze(text2)  # Wait 2 seconds
# Total: 4 seconds

# Asynchronous (non-blocking):
results = await asyncio.gather(
    analyze(text1),  # Start immediately
    analyze(text2)   # Start immediately
)
# Total: ~2 seconds (parallel)



Best Practices:

  • Use async def for all I/O operations
  • Use await for LLM calls, API calls, database queries
  • Use asyncio.gather() for independent operations
  • Use abatch() for LangChain chains

🎓 Key Learnings

1. LCEL Chain Composition

The pipe operator (|) makes it easy to compose complex workflows:


chain = preprocess | prompt | llm | parser



Each component transforms the data and passes it to the next. This is much cleaner than nested function calls!

2. Structured Outputs

Using Pydantic models with PydanticOutputParser ensures:

  • The LLM knows exactly what format to return
  • Automatic validation of responses
  • Type safety throughout your application

3. Async/Await Patterns

LCEL chains are async by default. Use ainvoke() for single items and abatch() for multiple items:


# Single item
result = await chain.ainvoke({"text": "Great service!"})

# Multiple items (parallel processing)
results = await chain.abatch([
    {"text": "Great!"},
    {"text": "Terrible!"},
    {"text": "Okay"}
])



4. Prompt Engineering

Good prompts are crucial. Notice how we:

  • Provide clear instructions in the system message
  • Include format instructions automatically
  • Give examples of scoring ranges
  • Ask for specific outputs (aspects, rationale)

5. Error Handling

Always handle API failures gracefully:


try:
    place_id = await client.find_place_id(query)
    if not place_id:
        raise HTTPException(status_code=404, detail="Place not found")
except HTTPException:
    raise  # keep the 404; don't let the generic handler turn it into a 500
except Exception as e:
    raise HTTPException(status_code=500, detail=str(e))



🎯 Conclusion

You've learned how to:

  • ✅ Build composable chains with LangChain LCEL
  • ✅ Use structured outputs with Pydantic
  • ✅ Process data in batches efficiently
  • ✅ Integrate external APIs (Google Places)
  • ✅ Build a complete AI application

LangChain LCEL is a powerful pattern that makes building LLM applications much more maintainable and composable. The pipe operator (|) creates a clean, readable flow that's easy to debug and extend.

This project demonstrates real-world patterns you'll use in production AI applications. The combination of LCEL, Pydantic, and async/await creates a robust foundation for any LLM-powered service.

Next Steps:

  • Experiment with different prompts
  • Try different LLM models
  • Add your own features
  • Deploy to production

Happy building! 🚀


For installation instructions, API examples, and more details, check out the repository. If you found this helpful, star it and share your own projects!
