
Arham Mirkar

Originally published at kobin.team

How We Built an AI Layer That Understands an Entire Agency Workspace (Not Just One Module)

We shipped the AI layer for Kobin today — an agency operating system that replaces Slack, Notion, HubSpot, Linear, and Buffer. This is the technical story of how we built it and the specific decisions that made it work.


The problem with every AI feature shipped in the last two years

Every productivity tool adds AI. The result is always the same: a model that knows about the host tool and nothing else. Slack AI knows about messages. Asana AI knows about tasks. HubSpot AI knows about contacts.

None of them can answer the question "which of my clients has gone quiet, and what's blocking their project?" — because the answer requires tasks, messages, and CRM data to exist in the same context.

We decided to solve this before writing AI code. We spent the first year building the data model: tasks, projects, clients, CRM pipeline, vault files, calendar events, and real-time inbox — all in Supabase, all linked by foreign keys. Only then did we build the AI layer.


Architecture: MCP-style tools, not context dumps

The naive approach is to dump your entire workspace into a system prompt and let the model figure it out. This fails for three reasons:

  1. Token bloat: A workspace with 50 tasks, 10 projects, 30 CRM contacts, and 100 recent messages easily exceeds 8,000 tokens before you ask a single question.
  2. Hallucination risk: The more unstructured data you put in a prompt, the more the model interpolates and invents.
  3. Staleness: Workspace data changes constantly. A context dump taken at request time is already stale for any response that takes more than a few seconds.

Instead, we built a tool-calling architecture inspired by the Model Context Protocol (MCP). The model starts with a minimal workspace summary (~100 tokens) and calls specific read tools to fetch exactly what it needs.

```typescript
// Mini context — ~100 tokens, always fresh
export async function buildMiniContext(founderId: string): Promise<string> {
  const [profileRes, tasksRes /* , projectsRes, ... */] = await Promise.all([
    supabaseAdmin.from("profiles").select("full_name").eq("id", founderId).single(),
    supabaseAdmin.from("tasks").select("id, status, due_date").eq("user_id", founderId).eq("is_completed", false),
    // ...
  ])

  const profile = profileRes.data
  const tasks = tasksRes.data ?? []
  const overdueCount = tasks.filter(t => t.due_date && new Date(t.due_date) < new Date()).length
  // activeProjects and eventsCount are derived from the remaining queries (elided)

  return [
    `Founder: ${profile?.full_name}`,
    `Tasks: ${tasks.length} active, ${overdueCount} overdue`,
    `Projects: ${activeProjects} active`,
    `Calendar: ${eventsCount} events this week`,
  ].join('\n')
}
```

Then the model calls tools like get_tasks, get_team_workload, or search_contacts to drill into exactly what it needs.


The 8 read tools

```typescript
export const READ_TOOLS = [
  'get_workspace_overview',  // High-level stats across all modules
  'get_tasks',               // Tasks with filter presets (overdue, blocked, due today...)
  'get_projects',            // Projects with task completion counts
  'get_team_workload',       // Team members with active task counts + workload labels
  'get_crm_pipeline',        // Pipeline by stage with deal values and stale flags
  'get_calendar',            // Events with flexible range presets
  'get_vault_files',         // Vault documents filterable by project name
  'search_contacts',         // Full contact profile by name (fuzzy match)
]
```
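Each entry in that list maps to a JSON-schema definition in the OpenAI-compatible `tools` format that Groq's chat API accepts. Here is a sketch of what `get_tasks` might look like; the filter preset names and parameters are assumptions, not Kobin's actual schema:

```typescript
// Hypothetical tool definition for get_tasks, in the OpenAI-compatible
// `tools` format. Preset names below are illustrative assumptions.
export const GET_TASKS_TOOL = {
  type: "function" as const,
  function: {
    name: "get_tasks",
    description: "List the founder's active tasks, optionally narrowed by a filter preset.",
    parameters: {
      type: "object",
      properties: {
        filter: {
          type: "string",
          enum: ["overdue", "blocked", "due_today", "all"],
          description: "Preset filter to apply.",
        },
        limit: { type: "number", description: "Maximum number of tasks to return." },
      },
      required: [],
    },
  },
}
```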

Each tool returns compact, structured text rather than full JSON objects. This keeps tool results under 500 tokens while still providing all the detail the model needs.

```typescript
// Example: get_tasks returns compact text, not full objects
lines.push(`- [${PRI[t.priority]}] ${t.title} | ${STAT[t.status]} | ${shortDate(t.due_date)}${overdue}${assignee}${project}`)
// → "- [U] Finish homepage redesign | ip | 3/28 [OD] | →Ahmed | →Reelix"
```
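To make that concrete, here is a minimal self-contained sketch of such a formatter. The `PRI`/`STAT` codes and field names are assumptions inferred from the example output, not Kobin's actual maps:

```typescript
// Sketch of a compact task-line formatter (assumed codes and fields).
type TaskRow = {
  title: string
  priority: "urgent" | "high" | "medium" | "low"
  status: "todo" | "in_progress" | "done"
  due_date: string | null // ISO date, e.g. "2025-03-28"
  assignee?: string
  project?: string
}

const PRI = { urgent: "U", high: "H", medium: "M", low: "L" }
const STAT = { todo: "td", in_progress: "ip", done: "dn" }

// Parse the date string directly to avoid timezone shifts
const shortDate = (iso: string | null): string => {
  if (!iso) return "no date"
  const [, m, d] = iso.slice(0, 10).split("-")
  return `${Number(m)}/${Number(d)}`
}

export function formatTaskLine(t: TaskRow, now = new Date()): string {
  const overdue =
    t.due_date && new Date(t.due_date) < now && t.status !== "done" ? " [OD]" : ""
  const assignee = t.assignee ? ` | →${t.assignee}` : ""
  const project = t.project ? ` | →${t.project}` : ""
  return `- [${PRI[t.priority]}] ${t.title} | ${STAT[t.status]} | ${shortDate(t.due_date)}${overdue}${assignee}${project}`
}
```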

The 5 action tools

```typescript
export const ACTION_TOOLS = [
  'create_task',    // Full task creation with name resolution
  'update_task',    // Partial updates by fuzzy title match
  'delete_task',    // Returns confirmation request (never auto-deletes)
  'create_project', // Project creation
  'update_project', // Partial updates by fuzzy name match
]
```

Name resolution

The hardest part of action tools is resolving human names to database IDs. The user says "assign to Sarah" — the model needs user_id: "uuid-here". We built a four-tier fuzzy matcher:

```typescript
function fuzzyMatch(query: string, candidates: string[]): { match: string; index: number } | null {
  const q = query.toLowerCase().trim()

  // 1. Exact match
  const exactIdx = candidates.findIndex(c => c.toLowerCase() === q)
  if (exactIdx !== -1) return { match: candidates[exactIdx], index: exactIdx }

  // 2. Starts with
  const startsIdx = candidates.findIndex(c => c.toLowerCase().startsWith(q))
  if (startsIdx !== -1) return { match: candidates[startsIdx], index: startsIdx }

  // 3. Contains
  const containsIdx = candidates.findIndex(c => c.toLowerCase().includes(q))
  if (containsIdx !== -1) return { match: candidates[containsIdx], index: containsIdx }

  // 4. First name match
  const firstNameIdx = candidates.findIndex(c => c.split(' ')[0]?.toLowerCase() === q)
  if (firstNameIdx !== -1) return { match: candidates[firstNameIdx], index: firstNameIdx }

  return null
}
```

If the match fails, the action returns an error message with the available names — so the model can correct itself on the next attempt rather than hallucinating an ID.

Auto-fetching missing context

Read tools give the model richer context — but sometimes it jumps straight to an action without calling the relevant read tool. Rather than fail, action tools auto-fetch what they need:

```typescript
async function resolveTeamMember(name: string, team: TeamMemberContext[], founderId: string) {
  // If team context wasn't pre-loaded by a read tool, fetch it now
  if (team.length === 0) {
    const { data: members } = await supabaseAdmin
      .from('team_members')
      .select('user_id, position, profile:profiles!...(full_name)')
      .eq('founder_id', founderId)
      .eq('is_active', true)
    // ... populate team array
  }
  return fuzzyMatch(name, team.map(m => m.full_name))
}
```

The multi-step reasoning loop

Some requests require multiple tool calls in sequence. "Assign the most urgent overdue task to whoever has the lightest workload" requires:

  1. Call get_tasks with filter: "overdue"
  2. Call get_team_workload
  3. Resolve name from workload results
  4. Call update_task with the resolved assignee

We run a loop of up to 4 steps. Each iteration appends the tool call and its result to the conversation before calling the model again:

```typescript
for (let step = 0; step < 4; step++) {
  const response = await groq.chat.completions.create({
    model: GROQ_MODEL,
    messages,
    tools: ALL_TOOLS,
    tool_choice: 'auto',
    max_tokens: 1024,
  })

  const toolCalls = response.choices[0]?.message?.tool_calls
  if (!toolCalls || toolCalls.length === 0) {
    // No more tool calls — stream the final response
    return streamFinalResponse(response, actionEvents)
  }

  // Append the assistant turn (with its tool calls), then each result
  messages.push(response.choices[0].message)
  for (const toolCall of toolCalls) {
    const args = JSON.parse(toolCall.function.arguments)
    const result = isReadTool(toolCall.function.name)
      ? await executeReadTool(toolCall.function.name, args, founderId)
      : await executeAction(toolCall.function.name, args, ctx)

    messages.push(toolCallResult(toolCall.id, result))
  }
}
```

The mixed batch problem and how we solved it

Groq's Llama models sometimes try to call read tools and action tools in the same step — before the read tool results are available to inform the action. This causes the action to run with incomplete context.

We detect and defer:

```typescript
const hasReadCalls = toolCalls.some(tc => READ_TOOL_NAMES.has(tc.function.name))
const hasActionCalls = toolCalls.some(tc => !READ_TOOL_NAMES.has(tc.function.name))
const isMixedBatch = hasReadCalls && hasActionCalls

if (isMixedBatch) {
  // Execute read tools, skip action tools with a "try again" message
  // The model re-calls action tools in the next step with actual data
}
```
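One way to fill in that stub, assuming the tool-call shape of OpenAI-compatible responses: partition the batch, execute only the reads, and answer each action call with a retry message so the model re-issues it on the next step with real data in context:

```typescript
// Sketch of the defer logic. The retry message is returned as the
// deferred call's tool result, so the model sees it next turn.
type ToolCall = { id: string; function: { name: string; arguments: string } }

export function partitionMixedBatch(
  toolCalls: ToolCall[],
  readToolNames: Set<string>
): { execute: ToolCall[]; deferred: Array<{ id: string; result: string }> } {
  const reads = toolCalls.filter(tc => readToolNames.has(tc.function.name))
  const actions = toolCalls.filter(tc => !readToolNames.has(tc.function.name))

  // Not a mixed batch: run everything as-is
  if (reads.length === 0 || actions.length === 0) {
    return { execute: toolCalls, deferred: [] }
  }

  // Mixed batch: run reads now, defer actions with a "try again" result
  return {
    execute: reads,
    deferred: actions.map(tc => ({
      id: tc.id,
      result: `Deferred: call ${tc.function.name} again after reviewing the read results.`,
    })),
  }
}
```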

Deduplication guard for create actions

One reliability issue with AI action tools: the model sometimes tries to call create_task twice for the same request. We use a Set to track which create actions have fired:

```typescript
const createActionsExecuted = new Set<string>()

if (toolName === 'create_task' || toolName === 'create_project') {
  if (createActionsExecuted.has(toolName)) {
    return { success: false, message: `${toolName} already executed — task already exists.` }
  }
  createActionsExecuted.add(toolName)
}
```
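Note that keying the Set on tool name alone blocks any second create in a single request, even for a genuinely different task. A variant worth considering keys the guard on tool name plus normalized arguments, blocking exact repeats while still allowing two distinct creates. A sketch, not Kobin's actual implementation:

```typescript
// Variant guard keyed on tool name + sorted, serialized arguments.
const executed = new Set<string>()

export function isDuplicateCreate(
  toolName: string,
  args: Record<string, unknown>
): boolean {
  // Sort keys so {a, b} and {b, a} produce the same key
  const normalized = Object.fromEntries(Object.entries(args).sort())
  const key = `${toolName}:${JSON.stringify(normalized)}`
  if (executed.has(key)) return true
  executed.add(key)
  return false
}
```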

Streaming SSE responses

The command bar streams responses using Server-Sent Events from a Next.js API route. Action events (task created, etc.) are emitted before the text stream so the UI can show success cards immediately:

```typescript
function createStreamSSEResponse(stream: AsyncIterable<any>, actionEvents: Array<Record<string, any>>): Response {
  const encoder = new TextEncoder()
  const readable = new ReadableStream({
    async start(controller) {
      // Emit action events first (for immediate UI feedback)
      for (const event of actionEvents) {
        controller.enqueue(encoder.encode(
          `data: ${JSON.stringify({ type: 'action_executed', ...event })}\n\n`
        ))
      }
      // Then stream text response
      for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta?.content || ''
        if (delta) {
          controller.enqueue(encoder.encode(
            `data: ${JSON.stringify({ type: 'delta', content: delta })}\n\n`
          ))
        }
      }
      controller.close()
    },
  })
  return new Response(readable, { headers: SSE_HEADERS })
}
```
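On the client, those events have to be split back out of the byte stream. A minimal parser sketch (a production client would also buffer partial events, since a `read()` boundary is not guaranteed to align with an event boundary):

```typescript
// Sketch: parse a decoded SSE text chunk back into typed events.
type SSEEvent =
  | { type: "action_executed"; [k: string]: unknown }
  | { type: "delta"; content: string }

export function parseSSEChunk(text: string): SSEEvent[] {
  return text
    .split("\n\n")                               // events are separated by blank lines
    .filter(block => block.startsWith("data: ")) // keep only data frames
    .map(block => JSON.parse(block.slice("data: ".length)) as SSEEvent)
}

// Browser usage (illustrative):
// const reader = res.body!.getReader()
// const { value } = await reader.read()
// const events = parseSSEChunk(new TextDecoder().decode(value))
```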

Why Groq + Llama 4 Scout

We evaluated several options:

  • OpenAI GPT-4o: Great quality, but 3-5 second response times for tool-heavy requests killed the UX.
  • Anthropic Claude Haiku: Fast, but tool-calling reliability was inconsistent on multi-step sequences.
  • Groq + Llama 4 Scout (17B): Sub-2-second responses even for 3-step tool chains. Reliable function calling. Free for users at our current scale.

Groq's inference speed is genuinely a product differentiator here. When the AI responds in under 2 seconds, it feels like a feature. When it takes 6 seconds, it feels like a bug.


The delete confirmation pattern

Destructive actions need a human in the loop. When the model calls delete_task, it never deletes immediately. Instead it returns a needs_confirmation: true flag with a confirmation_action object. The frontend renders a "Confirm Delete" button:

```typescript
// action-executor.ts
return {
  success: true,
  needs_confirmation: true,
  confirmation_action: {
    tool: 'delete_task_confirmed',
    resolved_id: task.id,
    description: `Delete task "${task.title}"`,
  },
}

// Frontend: shows "Confirm Delete" button
// On click → DELETE /api/ai/command with { task_id }
// Server executes the actual deletion
```

What we learned

1. Start with the data model. The AI is only as good as the structure underneath it. We could not have built this if tasks, projects, clients, and vault files were in separate siloed tables without foreign key relationships.

2. Read tools beat context dumps. Structured tool calls with compact return values outperform pasting raw JSON into the system prompt. The model makes better decisions with less noise.

3. Auto-fetch gracefully. If the model skips a read tool and goes straight to an action, the action should auto-fetch what it needs rather than return an error. Failing gracefully is better than a strict execution order.

4. Name resolution is 30% of the work. Users say "assign to Sarah." The model needs a UUID. Building reliable fuzzy matching (with fallback error messages that list available options) took more work than the actual tool implementations.


What's next

  • Daily morning brief (push notification, 8am, 6 sections)
  • Pre-meeting brief (10 minutes before every calendar event)
  • Client silence detection (background scan every 6 hours)
  • Weekly client report auto-draft

The architecture supports all of these as new read tools + a background cron calling the AI with specific prompts.


If you're building something similar or have questions about the tool-calling architecture, I'm happy to discuss in the comments.

— Arham

Founder, Kobin — Agency Operating System

Full documentation: kobin.team/docs
