Sangeet Agarwal

Streamlining Routine ML Tasks with LangChain: A Hacker News Comment Analysis Example

Introduction

LangChain and its companion framework LangGraph are synonymous with building autonomous "agents" capable of complex interactions, such as retrieving external data and chaining multiple tools together. While that's certainly true, I've noticed they also streamline more mundane, day-to-day ML tasks.

One example is a Hacker News comment-analysis notebook I put together. There’s nothing earth-shattering about the underlying code, but I appreciate how LangChain abstracts away the complexity of interacting with large language models (LLMs). The complete langchain_hn_comment_analysis Jupyter notebook is available here in my GitHub repo.

Below, I’ll walk you through some of the key pieces—starting with how we define our LLM with structured output, then how we handle prompt templates to direct the model’s responses.

Setting up the LLM

from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="qwen2.5",
    temperature="0.0",
    verbose=False,
    keep_alive=-1,
)

llm = llm.with_structured_output(PostCommentEvaluation, method="json_schema")

Here, ChatOllama integrates with the Ollama chat model. The parameters control various aspects:

  • model: Which LLM to use (e.g., “qwen2.5”).
  • temperature: How “creative” the model is (0.0 means very deterministic).
  • verbose: Toggles detailed logging.
  • keep_alive: How long the model stays loaded in memory after a call (-1 keeps it loaded indefinitely).

By calling llm.with_structured_output(PostCommentEvaluation, method="json_schema"), we tell LangChain that any response we get from the LLM should conform to the structure defined by our PostCommentEvaluation class. This eliminates the need to manually parse unstructured text or JSON—LangChain handles that for us. And if you ever switch to another chat model, LangChain’s chat model integrations let you do it with minimal code changes.
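As an illustration of how small that switch can be, here’s a sketch of swapping in OpenAI instead of Ollama. This assumes the langchain-openai package is installed and an OPENAI_API_KEY is set; the model name is just an example:

from langchain_openai import ChatOpenAI

# Hypothetical swap: the rest of the pipeline stays unchanged because
# both integrations expose the same chat-model interface.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
llm = llm.with_structured_output(PostCommentEvaluation, method="json_schema")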

Defining the Output Schema

The PostCommentEvaluation class looks like this:

from typing import List, Literal
from pydantic import BaseModel, Field

class PostCommentEvaluation(BaseModel):
    """
    A class to represent the evaluation of comments.

    Attributes:
    summary (str): Summary of the comments in a few sentences.
    key_points (List[str]): Key points from the comments.
    topics (List[str]): Main topics discussed in the comments.
    controversies (List[str]): Controversial takeaways from the comments.
    sentiment (Literal): Overall emotional sentiment of the comments.
    """
    summary: str = Field(..., description="Summary of the comments in a few sentences")
    key_points: List[str] = Field([], description="Key points from the comments")
    topics: List[str] = Field([], description="Main topics discussed in the comments")
    controversies: List[str] = Field([], description="Controversial takeaways from the comments")
    sentiment: Literal[
        "happiness", "anger", "sadness", "fear", "surprise", "disgust", "trust", "anticipation"
    ] = Field(..., description="Overall emotional sentiment of the comments")

PostCommentEvaluation is a Pydantic model, so it provides data validation out of the box. With it, the LLM's response comes back as a validated instance of PostCommentEvaluation rather than as raw text you have to parse.
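To see that validation in action, here’s a minimal sketch using Pydantic v2’s model_validate, independent of LangChain. A payload with an unknown sentiment label raises a ValidationError:

from pydantic import ValidationError

raw = {
    "summary": "Commenters debated the trade-offs of open sourcing the project.",
    "key_points": ["Open-source concerns"],
    "topics": ["Open source"],
    "controversies": [],
    "sentiment": "trust",
}

evaluation = PostCommentEvaluation.model_validate(raw)  # parses and validates
print(evaluation.sentiment)  # trust

try:
    # "meh" is not one of the allowed Literal sentiment values
    PostCommentEvaluation.model_validate({**raw, "sentiment": "meh"})
except ValidationError as err:
    print(err)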

Crafting the Prompt

The next bit is the prompt template—let’s call it EVALUATE_POST_COMMENTS_PROMPT. This is where LangChain really clicks for me. Here’s what happens:

  • I define a string template with placeholders like {post}, {comments}, and {user_question}.
  • LangChain’s PromptTemplate fills those placeholders with real data (the HN post details, a chunk of comments, and my specific query).
  • Because PromptTemplate uses f-string formatting by default (unlike Jinja2 templates, which can execute arbitrary code), I don’t have to worry about template injection from untrusted input.

A snippet might look something like:

from langchain_core.prompts import PromptTemplate

EVALUATE_POST_COMMENTS_PROMPT = PromptTemplate.from_template(
    """
    Analyze the comments:

    {post}

    <comments>
    {comments}
    </comments>

    Please summarize, highlight key points, identify controversies, and provide an overall sentiment. 
    Respond in JSON format with the following fields:
    summary, key_points, topics, controversies, sentiment.

    <user_question>
    {user_question}
    </user_question>
    """
)

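To make the flow concrete, here’s a sketch of filling the template and invoking the model. The post and comment strings are hypothetical stand-ins for the notebook’s formatted data:

prompt_text = EVALUATE_POST_COMMENTS_PROMPT.format(
    post="Show HN: An example post title",  # hypothetical post details
    comments="First comment text...\nSecond comment text...",  # formatted thread
    user_question="What's the overall sentiment here?",
)

# Because llm was wrapped with with_structured_output, this returns
# a PostCommentEvaluation instance rather than raw text.
evaluation = llm.invoke(prompt_text)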

If you’re curious about more advanced prompt patterns or want to see prompts that other folks are using, check out the LangChain Hub, a public collection where the community shares custom prompt templates.
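Pulling one down is a one-liner. For example, a sketch assuming the langchainhub package is installed, using a well-known public prompt as a stand-in:

from langchain import hub

# Fetch a community-published prompt by its "owner/name" handle
rag_prompt = hub.pull("rlm/rag-prompt")
print(rag_prompt.input_variables)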

Putting It All Together

So how does this fit into the Hacker News pipeline?

  1. Fetch Data: I grab a Hacker News post’s metadata plus its comments (including nested replies) via the Hacker News Firebase API.
  2. Format Data: I have small utility functions to convert the post info and comments into a neat, template-friendly format.
  3. Generate Prompt: Using those details, I fill in {post}, {comments}, and whatever user question I have (like “What’s the overall sentiment here?”).
  4. Invoke the Model: Call llm.invoke(...) with the final prompt (a condensed end-to-end sketch follows below).
  5. Validate & Use Results: LangChain returns a structured Python object that matches my PostCommentEvaluation. For instance:
{
  "summary": "Several commenters discussed the pros and cons of open sourcing…",
  "key_points": ["Open-source concerns", "Security vulnerabilities…"],
  "topics": ["Open source", "Security", "Community engagement"],
  "controversies": ["Whether it should remain private or not"],
  "sentiment": "anticipation"
}
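Here’s the condensed end-to-end sketch promised above. The item ID is hypothetical, and fetch_comments is a simplified stand-in for the notebook’s formatting utilities:

import requests

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"

def fetch_item(item_id: int) -> dict:
    """Fetch a single post or comment from the Hacker News Firebase API."""
    return requests.get(HN_ITEM_URL.format(item_id), timeout=10).json()

def fetch_comments(item: dict) -> list[str]:
    """Recursively collect comment text, including nested replies."""
    texts = []
    for kid_id in item.get("kids", []):
        kid = fetch_item(kid_id)
        if kid and not kid.get("deleted") and kid.get("text"):
            texts.append(kid["text"])
            texts.extend(fetch_comments(kid))
    return texts

post = fetch_item(12345678)  # hypothetical HN item ID
comments = "\n\n".join(fetch_comments(post))

prompt_text = EVALUATE_POST_COMMENTS_PROMPT.format(
    post=f"{post['title']} ({post.get('url', '')})",
    comments=comments,
    user_question="What's the overall sentiment here?",
)

evaluation = llm.invoke(prompt_text)  # a PostCommentEvaluation instance
print(evaluation.summary)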

Summary

LangChain may be best known for its advanced agent-based workflows, but this Hacker News project demonstrates its usefulness for simpler, day-to-day ML tasks. Not only did Pydantic schemas make it straightforward to enforce a well-defined output format, but working through the LangChain documentation also highlighted important best practices—like employing f-string templates for secure prompt construction. These lessons transfer directly to real-world ML pipelines, saving you both time and trouble when working with LLMs.
