Chandrani Mukherjee

Smarter Chatbots: Fixing Missing Data and Hallucinations with LangChain

Building robust conversational AI systems often requires solving two foundational issues:

Gracefully handling missing or incomplete user input (e.g., “Add 3” instead of “Add 3 and 4”).
Mitigating hallucination: Ensuring the model doesn’t generate plausible-sounding, but factually incorrect, responses.
Let’s see how you can use LangChain and OpenAI’s API (with “structured tools”) to address these pain points, using a simple example: an agent that adds two numbers.

🤔 The Problem

Suppose you’ve deployed an OpenAI-based assistant that can perform arithmetic via a tool function. A naive implementation might fail when a user omits arguments (“Add 3”), or even worse, the model may hallucinate — making up numbers or answers.
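
To make the failure mode concrete, here is a minimal sketch of that naive setup (the function below is hypothetical, purely for illustration):

# Naive tool: no argument schema, no clarification step.
def naive_add(a: int, b: int) -> str:
    return str(a + b)

# A request like "Add 3" leaves b unfilled, so the call either fails with a
# missing-argument error or the model quietly invents a value for b.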

We want:

Missing value detection: the system should prompt for any arguments it still needs.
Hallucination mitigation: if clarification is needed, get it interactively from the user instead of letting the model guess.

🧱 The Solution: LangChain, Structured Tools, and Similarity Checking

Using LangChain’s agent system, you can wrap business logic (say, a Python function) as a first-class tool. Structured input models allow you to enforce argument schemas (with Pydantic), making it easy to catch missing data.

To further reduce hallucination, you can compare the agent’s responses against expected clarification prompts using spaCy’s semantic similarity. If the agent’s response looks like it’s asking for clarification, prompt the user for the missing information.

Here’s what it looks like (annotated):

from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType, Tool
from langchain.tools import StructuredTool
from langchain.memory import ConversationBufferMemory
from pydantic.v1 import BaseModel, Field
import spacy

Define a schema for your tool’s arguments.

class MathInput(BaseModel):
    a: int = Field(..., description="First number")
    b: int = Field(None, description="Second number")

Your actual addition business logic.

def add_numbers(a: int, b: int = None) -> str:
    if b is None:
        return "I need the second number to add."
    return str(a + b)

# Wrap the function in a LangChain StructuredTool, providing the schema.
add_tool = StructuredTool.from_function(
    add_numbers,
    name="AddTwoNumbers",
    description="Add two numbers (even if not all arguments are passed).",
    args_schema=MathInput
)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

import os
os.environ["OPENAI_API_KEY"] = "sk-…"  # Set your API key
llm = ChatOpenAI(temperature=0)

Initialize the agent with your tool and memory.

agent = initialize_agent(
    tools=[add_tool],
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    memory=memory,
    verbose=True
)

Use spaCy for response similarity checks (clarification detection).

def similarity_check(input1, input2):
    # Requires the medium English model: python -m spacy download en_core_web_md
    nlp = spacy.load("en_core_web_md")
    doc1 = nlp(input1)
    doc2 = nlp(input2)
    score = doc1.similarity(doc2)
    print("spaCy similarity:", score)
    return score

Dialogue Loop: Ask, handle missing args, avoid hallucination.

input1 = "Add 3"
result1 = agent.run(input1)
print(result1)

input2 = ""
if similarity_check("needs another argument or parameter", result1) > 0.50:
    # The agent is asking for clarification; in a real app the answer would
    # come from the user. It is hard-coded here to keep the demo simple.
    print(agent.run({"input": "What was the first number?"}))
    input2 = "Second number is 4"

To reduce hallucination, re-pass the original question, now with the missing info clarified.

result2 = agent.run(input1 + ". " + input2)
print(result2)

⚡️ Key Points

Structured tools and schemas: These force the model (and LangChain agent) to respect required arguments, catching missing values at the interface boundary.
Clarification via similarity: Use semantic similarity (spaCy, or OpenAI embeddings; a sketch of the embedding variant follows this list) to detect if the model is seeking clarification rather than hallucinating or making up a number.
Context-aware history: If you simply tell the model “Second number is 4,” it might hallucinate the question. By always resending the original question with the clarification, you help the LLM stay grounded.
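
If you prefer OpenAI embeddings over spaCy, here is a minimal sketch, assuming the openai>=1.0 Python client and the text-embedding-3-small model (swap in any embedding model you like):

# Alternative clarification check using OpenAI embeddings instead of spaCy.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embedding_similarity(text1: str, text2: str) -> float:
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=[text1, text2],
    )
    v1 = np.array(resp.data[0].embedding)
    v2 = np.array(resp.data[1].embedding)
    # Cosine similarity between the two texts.
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

Embedding scores are not on the same scale as spaCy's, so re-tune the threshold before dropping this in for similarity_check.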

🛠️ Best Practices

Never concatenate ambiguous clarifications (“Second number is 4”) without context. Always pair with the original user intent.
Tune your threshold in the similarity check to balance coverage and false positives (try 0.4 to 0.6; a small threshold sweep is sketched after this list).
Structured tool descriptions are your friend — describe explicitly how tools should behave if fields are missing.
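
One cheap way to pick that threshold is to run similarity_check over a handful of hand-labeled responses and see where clarifications separate from direct answers. The sample strings below are made up for illustration:

# Rough threshold sweep using the similarity_check defined earlier.
probe = "needs another argument or parameter"

clarifications = [
    "Could you tell me the second number you want to add?",
    "I need one more value to complete the addition.",
]
direct_answers = [
    "The sum of 3 and 4 is 7.",
]

for threshold in (0.4, 0.5, 0.6):
    flagged = [t for t in clarifications + direct_answers
               if similarity_check(probe, t) > threshold]
    print(threshold, "->", flagged)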

⚡️ Wrapping Up

Using a combination of structured argument validation (LangChain + Pydantic) and response similarity checks, you can build conversational OpenAI agents that handle missing data gracefully and minimize hallucinated content. These tools are essential for production-grade conversational AI!

Want to go further?

Experiment with more tools, robust paraphrase detection, or other LLMs (Anthropic, Azure, etc.)
LangChain documentation
spaCy similarity docs
If you like my post, please follow me on LinkedIn: https://www.linkedin.com/in/chandrani-mukherjee-usa-nj/

Top comments (1)

Lucas Henry

Using LangChain memory like this is super helpful for continuity in chat UX.