We shipped the AI layer for Kobin today — an agency operating system that replaces Slack, Notion, HubSpot, Linear, and Buffer. This is the technical story of how we built it and the specific decisions that made it work.
The problem with every AI feature shipped in the last two years
Every productivity tool adds AI. The result is always the same: a model that knows about the host tool and nothing else. Slack AI knows about messages. Asana AI knows about tasks. HubSpot AI knows about contacts.
None of them can answer the question "which of my clients has gone quiet, and what's blocking their project?" — because the answer requires tasks, messages, and CRM data to exist in the same context.
We decided to solve this before writing AI code. We spent the first year building the data model: tasks, projects, clients, CRM pipeline, vault files, calendar events, and real-time inbox — all in Supabase, all linked by foreign keys. Only then did we build the AI layer.
Architecture: MCP-style tools, not context dumps
The naive approach is to dump your entire workspace into a system prompt and let the model figure it out. This fails for three reasons:
- Token bloat: A workspace with 50 tasks, 10 projects, 30 CRM contacts, and 100 recent messages easily exceeds 8,000 tokens before you ask a single question.
- Hallucination risk: The more unstructured data you put in a prompt, the more the model interpolates and invents.
- Staleness: Workspace data changes constantly. A context dump taken at request time is already stale for any response that takes more than a few seconds.
Instead, we built a tool-calling architecture inspired by the Model Context Protocol (MCP). The model starts with a minimal workspace summary (~100 tokens) and calls specific read tools to fetch exactly what it needs.
// Mini context — ~100 tokens, always fresh
export async function buildMiniContext(founderId: string): Promise<string> {
  const [profileRes, tasksRes /* , projectsRes, ... */] = await Promise.all([
    supabaseAdmin.from("profiles").select("full_name").eq("id", founderId).single(),
    supabaseAdmin.from("tasks").select("id, status, due_date").eq("user_id", founderId).eq("is_completed", false),
    // ...
  ])
  const profile = profileRes.data
  const tasks = tasksRes.data ?? []
  const overdueCount = tasks.filter(t => t.due_date && new Date(t.due_date) < new Date()).length
  // activeProjects and eventsCount come from the elided queries above
  return [
    `Founder: ${profile?.full_name}`,
    `Tasks: ${tasks.length} active, ${overdueCount} overdue`,
    `Projects: ${activeProjects} active`,
    `Calendar: ${eventsCount} events this week`,
  ].join('\n')
}
Then the model calls tools like get_tasks, get_team_workload, or search_contacts to drill into exactly what it needs.
The 8 read tools
export const READ_TOOLS = [
  'get_workspace_overview', // High-level stats across all modules
  'get_tasks',              // Tasks with filter presets (overdue, blocked, due today...)
  'get_projects',           // Projects with task completion counts
  'get_team_workload',      // Team members with active task counts + workload labels
  'get_crm_pipeline',       // Pipeline by stage with deal values and stale flags
  'get_calendar',           // Events with flexible range presets
  'get_vault_files',        // Vault documents filterable by project name
  'search_contacts',        // Full contact profile by name (fuzzy match)
]
Each tool returns compact, structured text rather than full JSON objects. This keeps tool results under 500 tokens while still providing all the detail the model needs.
// Example: get_tasks returns compact text, not full objects
lines.push(`- [${PRI[t.priority]}] ${t.title} | ${STAT[t.status]} | ${shortDate(t.due_date)}${overdue}${assignee}${project}`)
// → "- [U] Finish homepage redesign | ip | 3/28 [OD] | →Ahmed | →Reelix"
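In that spirit, here is a minimal self-contained sketch of such a formatter. The PRI/STAT codes, field names, and the Task shape are illustrative assumptions, not the production tables:

```typescript
// Sketch of a compact task formatter — codes and shapes are illustrative
type Task = {
  title: string
  priority: "urgent" | "high" | "medium" | "low"
  status: "todo" | "in_progress" | "done"
  due_date: string | null // ISO date, e.g. "2025-03-28"
}

const PRI = { urgent: "U", high: "H", medium: "M", low: "L" } as const
const STAT = { todo: "td", in_progress: "ip", done: "dn" } as const

// "2025-03-28" → "3/28"; string slicing avoids timezone surprises
function shortDate(iso: string | null): string {
  if (!iso) return "no due"
  const [, m, d] = iso.slice(0, 10).split("-")
  return `${Number(m)}/${Number(d)}`
}

function formatTaskLine(t: Task, now = new Date()): string {
  const overdue =
    t.due_date && new Date(t.due_date) < now && t.status !== "done" ? " [OD]" : ""
  return `- [${PRI[t.priority]}] ${t.title} | ${STAT[t.status]} | ${shortDate(t.due_date)}${overdue}`
}
```

A line like this costs a handful of tokens per task instead of the dozens a full JSON object would.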
The 5 action tools
export const ACTION_TOOLS = [
  'create_task',    // Full task creation with name resolution
  'update_task',    // Partial updates by fuzzy title match
  'delete_task',    // Returns confirmation request (never auto-deletes)
  'create_project', // Project creation
  'update_project', // Partial updates by fuzzy name match
]
Name resolution
The hardest part of action tools is resolving human names to database IDs. The user says "assign to Sarah" — the model needs user_id: "uuid-here". We built a four-tier fuzzy matcher:
function fuzzyMatch(query: string, candidates: string[]): { match: string; index: number } | null {
  const q = query.toLowerCase().trim()
  // 1. Exact match
  const exactIdx = candidates.findIndex(c => c.toLowerCase() === q)
  if (exactIdx !== -1) return { match: candidates[exactIdx], index: exactIdx }
  // 2. Starts with
  const startsIdx = candidates.findIndex(c => c.toLowerCase().startsWith(q))
  if (startsIdx !== -1) return { match: candidates[startsIdx], index: startsIdx }
  // 3. Contains
  const containsIdx = candidates.findIndex(c => c.toLowerCase().includes(q))
  if (containsIdx !== -1) return { match: candidates[containsIdx], index: containsIdx }
  // 4. First name match
  const firstNameIdx = candidates.findIndex(c => c.split(' ')[0]?.toLowerCase() === q)
  if (firstNameIdx !== -1) return { match: candidates[firstNameIdx], index: firstNameIdx }
  return null
}
If the match fails, the action returns an error message with the available names — so the model can correct itself on the next attempt rather than hallucinating an ID.
Auto-fetching missing context
The model gets richer context when read tools are called first — but sometimes it jumps straight to an action without calling the relevant read tool. Rather than fail, action tools auto-fetch what they need:
async function resolveTeamMember(name: string, team: TeamMemberContext[], founderId: string) {
  // If team context wasn't pre-loaded by a read tool, fetch it now
  if (team.length === 0) {
    const { data: members } = await supabaseAdmin
      .from('team_members')
      .select('user_id, position, profile:profiles!...(full_name)')
      .eq('founder_id', founderId)
      .eq('is_active', true)
    // ... populate team array
  }
  return fuzzyMatch(name, team.map(m => m.full_name))
}
The multi-step reasoning loop
Some requests require multiple tool calls in sequence. "Assign the most urgent overdue task to whoever has the lightest workload" requires:
- Call get_tasks with filter: "overdue"
- Call get_team_workload
- Resolve the assignee's name from the workload results
- Call update_task with the resolved assignee
We run a loop of up to 4 steps. Each iteration appends the tool call and its result to the conversation before calling the model again:
for (let step = 0; step < 4; step++) {
  const response = await groq.chat.completions.create({
    model: GROQ_MODEL,
    messages,
    tools: ALL_TOOLS,
    tool_choice: 'auto',
    max_tokens: 1024,
  })
  const assistantMessage = response.choices[0]?.message
  const toolCalls = assistantMessage?.tool_calls
  if (!assistantMessage || !toolCalls || toolCalls.length === 0) {
    // No more tool calls — stream the final response
    return streamFinalResponse(response, actionEvents)
  }
  // Keep the assistant turn (with its tool calls) in the transcript,
  // then execute each tool and append its result before the next call
  messages.push(assistantMessage)
  for (const toolCall of toolCalls) {
    const args = JSON.parse(toolCall.function.arguments)
    const result = isReadTool(toolCall.function.name)
      ? await executeReadTool(toolCall.function.name, args, founderId)
      : await executeAction(toolCall.function.name, args, ctx)
    messages.push(toolCallResult(toolCall.id, result))
  }
}
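The toolCallResult helper used in the loop can be as small as this (a sketch; the shape follows the OpenAI-compatible chat format that Groq accepts, and the exact helper in our codebase may differ):

```typescript
// Sketch: wrap a tool's output as a `tool` role message for the next model call
type ToolMessage = { role: "tool"; tool_call_id: string; content: string }

function toolCallResult(toolCallId: string, result: string): ToolMessage {
  return { role: "tool", tool_call_id: toolCallId, content: result }
}
```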
The mixed batch problem and how we solved it
Groq's Llama models sometimes try to call read tools and action tools in the same step — before the read tool results are available to inform the action. This causes the action to run with incomplete context.
We detect and defer:
const hasReadCalls = toolCalls.some(tc => READ_TOOL_NAMES.has(tc.function.name))
const hasActionCalls = toolCalls.some(tc => !READ_TOOL_NAMES.has(tc.function.name))
const isMixedBatch = hasReadCalls && hasActionCalls

if (isMixedBatch) {
  // Execute read tools, skip action tools with a "try again" message
  // The model re-calls action tools in the next step with actual data
}
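A sketch of the full deferral under those assumptions (the tool-call shapes follow the OpenAI-compatible format; the "try again" wording and the execute callback are illustrative):

```typescript
// Sketch: execute reads in a mixed batch, defer actions with a corrective
// tool result so the model re-issues them next step with real data.
type ToolCall = { id: string; function: { name: string; arguments: string } }
type ToolResult = { tool_call_id: string; content: string }

const READ_TOOL_NAMES = new Set(["get_tasks", "get_team_workload" /* ... */])

async function handleBatch(
  toolCalls: ToolCall[],
  execute: (tc: ToolCall) => Promise<string>, // runs a read (or action) tool
): Promise<ToolResult[]> {
  const hasReads = toolCalls.some(tc => READ_TOOL_NAMES.has(tc.function.name))
  const hasActions = toolCalls.some(tc => !READ_TOOL_NAMES.has(tc.function.name))
  const mixed = hasReads && hasActions

  const results: ToolResult[] = []
  for (const tc of toolCalls) {
    if (mixed && !READ_TOOL_NAMES.has(tc.function.name)) {
      // Deferred: the model sees this result and retries the action next step
      results.push({
        tool_call_id: tc.id,
        content: "Deferred: read results were not yet available. Call this action again now.",
      })
    } else {
      results.push({ tool_call_id: tc.id, content: await execute(tc) })
    }
  }
  return results
}
```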
Deduplication guard for create actions
One reliability issue with AI action tools: the model sometimes tries to call create_task twice for the same request. We use a Set to track which create actions have fired:
const createActionsExecuted = new Set<string>()

if (toolName === 'create_task' || toolName === 'create_project') {
  if (createActionsExecuted.has(toolName)) {
    return { success: false, message: `${toolName} already executed for this request; skipping duplicate.` }
  }
  createActionsExecuted.add(toolName)
}
Streaming SSE responses
The command bar streams responses using Server-Sent Events from a Next.js API route. Action events (task created, etc.) are emitted before the text stream so the UI can show success cards immediately:
function createStreamSSEResponse(stream: AsyncIterable<any>, actionEvents: Array<Record<string, any>>): Response {
  const encoder = new TextEncoder()
  const readable = new ReadableStream({
    async start(controller) {
      // Emit action events first (for immediate UI feedback)
      for (const event of actionEvents) {
        controller.enqueue(encoder.encode(
          `data: ${JSON.stringify({ type: 'action_executed', ...event })}\n\n`
        ))
      }
      // Then stream text response
      for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta?.content || ''
        if (delta) {
          controller.enqueue(encoder.encode(
            `data: ${JSON.stringify({ type: 'delta', content: delta })}\n\n`
          ))
        }
      }
      controller.close()
    },
  })
  return new Response(readable, { headers: SSE_HEADERS })
}
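On the client, EventSource can't be used because the command bar sends a POST, so the fetch body has to be parsed by hand. A sketch of the frame parser (the function name and the incremental-buffer approach are illustrative; the event shapes match the server code above):

```typescript
// Sketch: split an SSE buffer into complete `data:` frames. The last,
// possibly incomplete frame is returned as `rest` for the next chunk.
type SSEEvent = { type: string; [k: string]: unknown }

function parseSSEChunk(buffer: string): { events: SSEEvent[]; rest: string } {
  const events: SSEEvent[] = []
  const frames = buffer.split("\n\n")
  const rest = frames.pop() ?? "" // may be a partial frame
  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (line.startsWith("data: ")) events.push(JSON.parse(line.slice(6)))
    }
  }
  return { events, rest }
}
```

The caller reads chunks from `response.body.getReader()`, decodes them, prepends the previous `rest`, and dispatches on `event.type` ('action_executed' renders a success card, 'delta' appends text).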
Why Groq + Llama 4 Scout
We evaluated several options:
- OpenAI GPT-4o: Great quality, but 3-5 second response times for tool-heavy requests killed the UX.
- Anthropic Claude Haiku: Fast, but tool-calling reliability was inconsistent on multi-step sequences.
- Groq + Llama 4 Scout (17B): Sub-2-second responses even for 3-step tool chains. Reliable function calling. Free for users at our current scale.
Groq's inference speed is genuinely a product differentiator here. When the AI responds in under 2 seconds, it feels like a feature. When it takes 6 seconds, it feels like a bug.
The delete confirmation pattern
Destructive actions need a human in the loop. When the model calls delete_task, it never deletes immediately. Instead it returns a needs_confirmation: true flag with a confirmation_action object. The frontend renders a "Confirm Delete" button:
// action-executor.ts
return {
  success: true,
  needs_confirmation: true,
  confirmation_action: {
    tool: 'delete_task_confirmed',
    resolved_id: task.id,
    description: `Delete task "${task.title}"`,
  },
}
// Frontend: shows "Confirm Delete" button
// On click → DELETE /api/ai/command with { task_id }
// Server executes the actual deletion
What we learned
1. Start with the data model. The AI is only as good as the structure underneath it. We could not have built this if tasks, projects, clients, and vault files were in separate siloed tables without foreign key relationships.
2. Read tools beat context dumps. Structured tool calls with compact return values outperform pasting raw JSON into the system prompt. The model makes better decisions with less noise.
3. Auto-fetch gracefully. If the model skips a read tool and goes straight to an action, the action should auto-fetch what it needs rather than return an error. Failing gracefully is better than a strict execution order.
4. Name resolution is 30% of the work. Users say "assign to Sarah." The model needs a UUID. Building reliable fuzzy matching (with fallback error messages that list available options) took more work than the actual tool implementations.
What's next
- Daily morning brief (push notification, 8am, 6 sections)
- Pre-meeting brief (10 minutes before every calendar event)
- Client silence detection (background scan every 6 hours)
- Weekly client report auto-draft
The architecture supports all of these as new read tools + a background cron calling the AI with specific prompts.
If you're building something similar or have questions about the tool-calling architecture, I'm happy to discuss in the comments.
— Arham
Founder, Kobin — Agency Operating System
Full documentation: kobin.team/docs