In this Story, I have a super quick tutorial showing you how to use LangGraph, Context Engineering, and Kimi K2 to build a powerful multi-agent chatbot for your business or personal use.
The world of AI is at a historic turning point. Moonshot has launched a trillion-parameter AI agent that doesn’t just talk, it actually completes tasks and acts. Meanwhile, the AI community has discovered Context Engineering, a systematic approach to “making things clear” to the model. Together, they turn the agent from a conversation tool that merely responds to prompts into an intelligent entity with understanding, memory, and the ability to act.
Open-source giant models, agents with memory, real-world automation, and super-fast long-context models: all of this is happening at the same time.
Kimi K2 has the potential to fundamentally change the face of AI. It’s a gargantuan model with 1 trillion parameters, but thanks to a clever Mixture of Experts (MoE) technique, it activates only 32 billion parameters per generated token. This gives you the power of a huge model without the huge costs.
K2 was trained on a massive, diverse, multilingual dataset of 15.5 trillion tokens, and uses a custom optimiser called “MuonClip” to keep training stable.
Building powerful and reliable AI agents is becoming less and less dependent on finding a magical prompt or waiting for the next model update. The real key lies in context engineering — delivering the right information and tools, in the right format, at the right time.
So, let me give you a quick demo of a live chatbot to show you what I mean.
Check out the video
I will ask the chatbot a question: “Write about the latest AI healthcare.” If you watch how the chatbot generates its output, you’ll see that the plan_research node formats the context-engineering prompt template with the current timestamp and the user query, and generates two complementary research subtasks with specific fields: ID, query, source type, time period, domain focus, and priority.
It then calculates precise start_date and end_date timestamps from periods like “recent” or “today”. The execute_searches node enhances each subtask query with temporal keywords and runs DuckDuckGo searches, storing the results mapped to subtask IDs.
Next, the build_rag node combines the search results into documents, splits them into 1,000-character chunks with 200-character overlap, generates vector embeddings, and builds a FAISS vector store that enables semantic similarity search across all the research findings.
Finally, the generate_report node aggregates all of the search context and prompts Kimi K2 to synthesise a comprehensive research report.
So, by the end of this Story, you will understand what makes Kimi K2 special, why context engineering is important, and how to use LangGraph, Context Engineering, and Kimi K2 to create a powerful agentic chatbot.
What Makes Kimi K2 Special
What makes Kimi K2 truly special is that it is designed not just to talk, but to act. While traditional AI models excel at explaining and chatting, they leave the actual work to the user. K2 is different.
K2 was trained in simulated conversations to solve real-world problems, learning to choose and integrate tools, write and modify code, analyse data, and complete complex tasks on its own. Once you give it a job, K2 breaks it down, figures out what needs to be done, and handles it from start to finish, without you having to give it step-by-step instructions.
Kimi K2 also has an exceptional memory capacity, able to handle context of up to 128,000 tokens, allowing it to remember very long conversations, documents, and entire workflows.
Pricing is also very competitive: while top-tier models like Claude and Gemini 2.5 Pro charge over $3 per million input tokens and up to $15 per million output tokens, Kimi K2 is priced at $0.60 for input and $2.50 for output. Additionally, it supports local deployment.
Kimi K2's shift from passive chatbots to action-focused agents is a big step forward. The discussion is no longer about which model gives the most accurate response, but about which model can be built, tested, and deployed as a working product.
It’s important to note that K2 is open source, which means that not only the biggest companies in Silicon Valley, but also research teams, startups, and developers around the world can access it, improve it, and run it on their devices.
Why is Context Engineering important?
An LLM is like a genius locked in a room: it cannot actively learn from or manipulate the world, access real-time information, or use a computer. The “raw intelligence” of a large model is not the same as an intelligent software system; that intelligence is only the foundation. To transform it into a truly effective intelligent system, two keys are needed:
🔧 Tool integration: Allow the model to actually operate, search for information, remember and respond
📦 Correct context: Dynamically provide the information the model needs to know based on the task
Context management is where we AI engineers need to put the most effort when building a powerful agent. Context is not free: every token has a cost and affects the model's behavior.
This is Context Engineering: dynamically assembling “all the information and tools that the LLM needs for the current task.” Building a powerful and reliable AI system is increasingly not about finding a magic prompt or switching to a more powerful model, but about providing the right context and tools at the right time.
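To make that concrete, here is a tiny, purely illustrative sketch of the idea (the function and its inputs are hypothetical): instead of a fixed prompt, the context is assembled per request from the current time, retrieved documents, and tool output, with clear delimiters so the model knows what is instruction, what is evidence, and what is the question.
from datetime import datetime

def build_context(user_query: str, retrieved_docs: list, tool_results: str) -> str:
    # Assemble everything the model needs for *this* task, at call time
    return (
        f"The current date and time is: {datetime.now().isoformat()}\n\n"
        "Relevant documents:\n" + "\n---\n".join(retrieved_docs) + "\n\n"
        f"Tool results:\n{tool_results}\n\n"
        f"<user_query>{user_query}</user_query>"
    )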
Let’s Start Coding
Let us now go through the build step by step. First, we install the libraries the project depends on; for this, we run a pip install from the requirements file.
pip install -r requirements.txt
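If you don't have a requirements.txt handy, these are the packages that the imports below actually pull in (pin versions to taste):
pip install streamlit openai langgraph langchain-openai langchain-core langchain-community langchain-text-splitters faiss-cpu duckduckgo-search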
Once installed, we import the important dependencies, such as langgraph, langchain_openai, and langchain_community:
import streamlit as st
import json
import re
from datetime import datetime, timedelta
from typing import List, Dict, Any, Optional, TypedDict
from dataclasses import dataclass, asdict
import openai
# LangGraph imports
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
# LangChain imports for RAG
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import HumanMessage, AIMessage
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.tools import DuckDuckGoSearchRun
import os
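The listing never shows the two state types the rest of the code depends on, so here is a minimal sketch reconstructed from how they are used below: ResearchSubtask as a dataclass (so asdict works on it) and ResearchState as the TypedDict the LangGraph nodes read and write. The field names follow the prompt and the node code; the defaults are my own assumption.
@dataclass
class ResearchSubtask:
    id: str
    query: str
    source_type: str                    # "web", "news", "academic", "specialized"
    time_period: Optional[str] = None   # "today", "last week", "recent", "past_year", "all_time"
    domain_focus: Optional[str] = None  # "technology", "science", "health", ...
    priority: int = 3                   # 1 (highest) to 5 (lowest)
    start_date: Optional[str] = None    # ISO 8601 string, filled in after planning
    end_date: Optional[str] = None

class ResearchState(TypedDict):
    user_query: str
    subtasks: List[ResearchSubtask]
    search_results: Dict[str, Any]
    vector_store: Any                   # FAISS store, or None before build_rag runs
    final_report: str
    messages: List[Any]                 # HumanMessage / AIMessage history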
Let’s build a planning agent that takes a user query and breaks it down into structured search subtasks. It does this by creating a detailed prompt with clear delimiters separating the instructions from the user input, spelling out the exact fields each subtask must contain (ID, query, source type, time period, domain focus, and priority 1–5), and parsing the JSON the model returns so we get consistent, structured output instead of free-form text.
The key is extensive context engineering — not just saying “break this down”, but providing specific constraints, field definitions, an example date format, and the current timestamp, so the LLM reliably produces structured data that can be handed to the other workflow components.
class ResearchPlanner:
def __init__(self, moonshot_api_key: str, openai_api_key: str):
self.client = openai.OpenAI(
api_key=moonshot_api_key,
base_url="https://api.moonshot.ai/v1"
)
self.llm = ChatOpenAI(api_key=moonshot_api_key, base_url="https://api.moonshot.ai/v1", model="kimi-k2-0711-preview", temperature=0.1)
self.embeddings = OpenAIEmbeddings(api_key=openai_api_key)  # embeddings still go through OpenAI
self.search_tool = DuckDuckGoSearchRun()
# Initialize LangGraph
self.graph = self._create_graph()
# EXACT prompt as provided - no changes
self.prompt_template = """You are an expert research planner. Your task is to break down a complex research query (delimited by <user_query></user_query>) into specific search subtasks, each focusing on a different aspect or source type.
The current date and time is: {current_time}
For each subtask, provide:
1. A unique string ID for the subtask (e.g., 'subtask_1', 'news_update')
2. A specific search query that focuses on one aspect of the main query
3. The source type to search (web, news, academic, specialized)
4. Time period relevance (today, last week, recent, past_year, all_time)
5. Domain focus if applicable (technology, science, health, etc.)
6. Priority level (1-highest to 5-lowest)
All fields (id, query, source_type, time_period, domain_focus, priority) are required for each subtask, except time_period and domain_focus which can be null if not applicable.
Create 2 subtasks that together will provide comprehensive coverage of the topic. Focus on different aspects, perspectives, or sources of information.
Each subtask will include the following information:
id: str
query: str
source_type: str # e.g., "web", "news", "academic", "specialized"
time_period: Optional[str] = None # e.g., "today", "last week", "recent", "past_year", "all_time"
domain_focus: Optional[str] = None # e.g., "technology", "science", "health"
priority: int # 1 (highest) to 5 (lowest)
After obtaining the above subtasks information, you will add two extra fields. Those correspond to start_date and end_date. Infer this information given the current date and the time_period selected. start_date and end_date should use the format as in the example below:
"start_date": "2024-06-03T06:00:00.000Z",
"end_date": "2024-06-11T05:59:59.999Z",
<user_query>{user_query}</user_query>"""
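For the demo query, the model is expected to answer with a JSON list along these lines (the values here are illustrative, not real model output):
[
  {
    "id": "news_update",
    "query": "latest AI healthcare breakthroughs",
    "source_type": "news",
    "time_period": "recent",
    "domain_focus": "health",
    "priority": 1,
    "start_date": "2025-06-15T06:00:00.000Z",
    "end_date": "2025-07-15T05:59:59.999Z"
  },
  {
    "id": "academic_overview",
    "query": "AI diagnostics clinical applications research",
    "source_type": "academic",
    "time_period": "past_year",
    "domain_focus": "health",
    "priority": 2,
    "start_date": "2024-07-15T06:00:00.000Z",
    "end_date": "2025-07-15T05:59:59.999Z"
  }
]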
After that, I create calculate_date_range, which takes a time-period string and returns a tuple of (start_date, end_date) strings in a specific ISO 8601 format, or (None, None) if no date filtering is needed. It first checks whether the period is missing or set to “all_time”, in which case it returns no dates. Otherwise, it gets the current datetime and uses conditional logic to calculate the start and end of the desired range; all end dates default to the current day.
def calculate_date_range(self, time_period: str) -> tuple[Optional[str], Optional[str]]:
"""Calculate start and end dates based on time period"""
if not time_period or time_period == "all_time":
return None, None
now = datetime.now()
if time_period == "today":
start = now.replace(hour=6, minute=0, second=0, microsecond=0)
end = now.replace(hour=23, minute=59, second=59, microsecond=999000)
elif time_period == "last week":
start = now - timedelta(days=7)
start = start.replace(hour=6, minute=0, second=0, microsecond=0)
end = now.replace(hour=5, minute=59, second=59, microsecond=999000)
elif time_period == "recent":
start = now - timedelta(days=30)
start = start.replace(hour=6, minute=0, second=0, microsecond=0)
end = now.replace(hour=5, minute=59, second=59, microsecond=999000)
elif time_period == "past_year":
start = now - timedelta(days=365)
start = start.replace(hour=6, minute=0, second=0, microsecond=0)
end = now.replace(hour=5, minute=59, second=59, microsecond=999000)
else:
return None, None
return start.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z", end.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"  # trim microseconds to milliseconds
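A quick sanity check of the helper, assuming planner is a ResearchPlanner instance and the script is run on 2025-07-15:
print(planner.calculate_date_range("recent"))
# ('2025-06-15T06:00:00.000Z', '2025-07-15T05:59:59.999Z')  -> 30-day window
print(planner.calculate_date_range("all_time"))
# (None, None)  -> no date filtering applied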
Then I create the graph that organises the research process into four sequential stages and compiles it for execution. It begins by creating a new StateGraph object based on ResearchState, the shared state that is passed between steps.
It adds the four nodes, sets plan_research as the entry point, and connects the nodes with directed edges so the process flows in order. The last node, generate_report, links to the special END state to mark completion.
def _create_graph(self) -> StateGraph:
"""Create LangGraph workflow"""
workflow = StateGraph(ResearchState)
workflow.add_node("plan_research", self._plan_research_node)
workflow.add_node("execute_searches", self._execute_searches_node)
workflow.add_node("build_rag", self._build_rag_node)
workflow.add_node("generate_report", self._generate_report_node)
workflow.set_entry_point("plan_research")
workflow.add_edge("plan_research", "execute_searches")
workflow.add_edge("execute_searches", "build_rag")
workflow.add_edge("build_rag", "generate_report")
workflow.add_edge("generate_report", END)
memory = MemorySaver()
return workflow.compile(checkpointer=memory)
The plan_research node takes the user's query from the shared state, breaks it down into focused subtasks by generating a research plan, stores those subtasks, and adds the original query as a HumanMessage to state["messages"] for tracking.
The execute_searches node loops through each subtask, tweaks the query slightly based on its time period, and uses the search_tool to perform the search. The results are collected into a search_results dictionary, with each subtask ID as the key and both the original subtask and its results as the value. If a search fails, it records an error message instead.
def _plan_research_node(self, state: ResearchState) -> ResearchState:
"""LangGraph node for planning research"""
subtasks = self.generate_research_plan(state["user_query"])
state["subtasks"] = subtasks
state["messages"] = [HumanMessage(content=state["user_query"])]
return state
def _execute_searches_node(self, state: ResearchState) -> ResearchState:
"""LangGraph node for executing searches"""
search_results = {}
for subtask in state["subtasks"]:
try:
search_query = subtask.query
if subtask.time_period == "today":
search_query += " today"
elif subtask.time_period == "recent":
search_query += " 2024 2025"
results = self.search_tool.run(search_query)
search_results[subtask.id] = {
"subtask": asdict(subtask),
"results": results
}
except Exception as e:
search_results[subtask.id] = {
"subtask": asdict(subtask),
"results": f"Search failed: {str(e)}"
}
state["search_results"] = search_results
return state
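After this node runs, state["search_results"] is keyed by subtask ID and looks roughly like this (values invented, subtask fields trimmed for readability):
{
    "news_update": {
        "subtask": {"id": "news_update", "query": "latest AI healthcare breakthroughs", "priority": 1},
        "results": "DuckDuckGo snippets concatenated into one string ..."
    },
    "academic_overview": {
        "subtask": {"id": "academic_overview", "query": "AI diagnostics research", "priority": 2},
        "results": "Search failed: <error message>"   # stored instead of raising
    }
}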
The build_rag node processes the search results stored in the state by extracting the relevant text from each subtask (only if the results are plain strings) and formatting it, together with the subtask ID and query, into a list of documents. It then splits these documents into chunks using RecursiveCharacterTextSplitter and creates a FAISS vector store, which is stored in state["vector_store"] for retrieval-augmented generation.
Next, I built the generate_report node to construct the full research context by iterating through each subtask's results and formatting them with headers and queries. It builds a report_prompt asking the LLM to write a structured report based on this context, summarising the key findings and suggesting actionable insights.
def _build_rag_node(self, state: ResearchState) -> ResearchState:
"""LangGraph node for building RAG vector store"""
documents = []
for subtask_id, result_data in state["search_results"].items():
if isinstance(result_data["results"], str):
doc_content = f"Subtask: {subtask_id}\n"
doc_content += f"Query: {result_data['subtask']['query']}\n"
doc_content += f"Results: {result_data['results']}\n"
documents.append(doc_content)
if documents:
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
splits = text_splitter.create_documents(documents)
vector_store = FAISS.from_documents(splits, self.embeddings)
state["vector_store"] = vector_store
return state
def _generate_report_node(self, state: ResearchState) -> ResearchState:
"""LangGraph node for generating final report"""
context = ""
for subtask_id, result_data in state["search_results"].items():
context += f"\n=== {subtask_id} ===\n"
context += f"Query: {result_data['subtask']['query']}\n"
context += f"Results: {result_data['results']}\n"
report_prompt = f"""Based on the research conducted, create a comprehensive report for the query: "{state['user_query']}"
Research Context:
{context}
Create a well-structured report that summarizes key findings from each research subtask and provides actionable insights."""
response = self.llm.invoke([HumanMessage(content=report_prompt)])
state["final_report"] = response.content
state["messages"].append(AIMessage(content=response.content))
return state
Then I create run_langgraph_research, which sets up an initial state with the user's question and empty slots for subtasks, search results, the vector store, the final report, and messages.
It runs the query through all the steps: planning, searching, building the vector store, and writing the report. When it's done, it returns a dictionary with the original question, the subtasks that were made, the search results, the final report, and the vector store if one was built.
The query_rag method lets you ask follow-up questions against the saved vector store. It finds and returns the top-k document chunks most similar to your query; if there's no vector store, it just returns an empty list.
def run_langgraph_research(self, user_query: str) -> Dict[str, Any]:
"""Run complete LangGraph + RAG research pipeline"""
initial_state = {
"user_query": user_query,
"subtasks": [],
"search_results": {},
"vector_store": None,
"final_report": "",
"messages": []
}
config = {"configurable": {"thread_id": "research_session"}}
final_state = self.graph.invoke(initial_state, config)
return {
"query": user_query,
"subtasks": [asdict(task) for task in final_state["subtasks"]],
"search_results": final_state["search_results"],
"report": final_state["final_report"],
"vector_store": final_state.get("vector_store")
}
def query_rag(self, query: str, vector_store, k: int = 3) -> List[str]:
"""Query the RAG vector store"""
if vector_store:
docs = vector_store.similarity_search(query, k=k)
return [doc.page_content for doc in docs]
return []
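Before looking at the remaining helpers, here is a minimal way you might run the whole pipeline from a script. The environment variable names are just an assumption; use whatever key management you prefer:
planner = ResearchPlanner(
    moonshot_api_key=os.environ["MOONSHOT_API_KEY"],
    openai_api_key=os.environ["OPENAI_API_KEY"],  # only used for the embeddings
)

result = planner.run_langgraph_research("Write about the latest AI healthcare")
print(result["report"])

# Ask a follow-up question against the same research via the RAG store
for chunk in planner.query_rag("Which use cases came up most often?", result["vector_store"]):
    print(chunk[:200])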
Finally, I developed extract_json_from_response to pull JSON data out of the LLM's text response. It first looks for anything inside square brackets, which usually means a list of subtasks. If it finds a match, it tries to load it as JSON. If that fails, it tries to load the full response text as JSON. If both fail, it returns an empty list.
Then generate_research_plan builds the custom prompt using the current date and the user's query and sends it to Kimi K2 to get the subtasks. It reads the response, pulls out the JSON using extract_json_from_response, and loops through each subtask.
For each one, it calculates the date range from the time period and creates a ResearchSubtask object with fields like ID, query, source type, domain focus, priority, and time range.
def extract_json_from_response(self, response_text: str) -> List[Dict]:
"""Extract JSON from LLM response"""
json_pattern = r'\[.*?\]'
matches = re.findall(json_pattern, response_text, re.DOTALL)
for match in matches:
try:
return json.loads(match)
except json.JSONDecodeError:
continue
try:
return json.loads(response_text)
except json.JSONDecodeError:
return []
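As a quick illustration (the response text below is made up), the helper still recovers the list even when the model wraps it in prose, assuming planner is a ResearchPlanner instance:
raw = 'Here is the plan:\n[{"id": "subtask_1", "query": "AI healthcare news", "source_type": "news", "time_period": "recent", "domain_focus": "health", "priority": 1}]\nLet me know if you need more detail.'
print(planner.extract_json_from_response(raw))
# -> [{'id': 'subtask_1', 'query': 'AI healthcare news', ...}]  (truncated for readability)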
def generate_research_plan(self, user_query: str) -> List[ResearchSubtask]:
"""Generate research subtasks using the LLM"""
prompt = self.prompt_template.format(
current_time=datetime.now().isoformat(),
user_query=user_query
)
response = self.client.chat.completions.create(
model="kimi-k2-0711-preview",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
response_text = response.choices[0].message.content
subtasks_data = self.extract_json_from_response(response_text)
subtasks = []
for task_data in subtasks_data:
start_date, end_date = self.calculate_date_range(task_data.get("time_period"))
subtask = ResearchSubtask(
id=task_data.get("id", f"subtask_{len(subtasks) + 1}"),
query=task_data.get("query", ""),
source_type=task_data.get("source_type", "web"),
time_period=task_data.get("time_period"),
domain_focus=task_data.get("domain_focus"),
priority=task_data.get("priority", 3),
start_date=start_date,
end_date=end_date
)
subtasks.append(subtask)
return subtasks
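The code imports streamlit, but the UI itself isn't shown in the listing, so here is one minimal way you could wire the planner into a Streamlit page. This is my own sketch rather than the author's original app, and the widget labels are arbitrary:
st.title("Kimi K2 Research Agent")
moonshot_key = st.sidebar.text_input("Moonshot API key", type="password")
openai_key = st.sidebar.text_input("OpenAI API key (embeddings)", type="password")

query = st.text_input("What should I research?")
if st.button("Run research") and moonshot_key and openai_key and query:
    planner = ResearchPlanner(moonshot_key, openai_key)
    with st.spinner("Planning, searching, and writing the report..."):
        result = planner.run_langgraph_research(query)
    st.subheader("Research plan")
    st.json(result["subtasks"])
    st.subheader("Report")
    st.markdown(result["report"])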
Conclusion
So, after this round of testing, I feel that Kimi K2 plus context engineering has obvious strengths and weaknesses. Its strategic bet on agentic AI is a far-sighted move, but there is a real gap between its strong potential and its current ability to execute reliably.
Kimi K2’s most valuable asset is its smart brain, but the value of that core asset is being eroded by unstable tool calls and ecosystem friction.
In general, Kimi K2 and context engineering feel like a rough jade that has not yet been finely polished. The core material, the intelligence of the model itself, is excellent, and it lets us see the great possibilities of agentic AI in the future.
I would highly appreciate it if you could:
❣ Join my Patreon: https://www.patreon.com/GaoDalie_AI
Book an Appointment with me: https://topmate.io/gaodalie_ai
Support the content (every dollar goes back into the video): https://buymeacoffee.com/gaodalie98d
Subscribe to the Newsletter for free: https://substack.com/@gaodalie