DEV Community

LyricalString

Building an AI Coworker That Asks Questions Instead of Guessing

You tell your AI coworker: "create a task for the new feature."

It creates the task. Assigns it to nobody. Sets priority to medium. Picks a random project.

Nothing is technically wrong. But everything is useless.

The AI didn't have context. And instead of asking, it guessed.

This is the default behavior of every LLM tool system I've seen. Missing parameter? Use a default. Ambiguous input? Pick the most likely interpretation. The AI never stops and says "hey, who should I assign this to?"

So I built a system that does exactly that.

The Design: AskUserQuestion as a First-Class Tool

The idea is simple: give the LLM a tool called ask_user_question that it can call like any other tool. Instead of creating a task, sending a message, or querying a database — it asks the human a question.

Here's the tool definition the LLM sees:

{
  name: "ask_user_question",
  description: "Ask the user a clarifying question with a rich interactive UI.
    Use when you need user input before proceeding. Supports free-text,
    single/multi-choice, and yes/no questions.",
  parameters: {
    question: "The question to ask",
    question_type: "free_text | single_choice | multi_choice | yes_no",
    options: [{ label: "Option A", description: "..." }],
    // Or for sequences:
    questions: [
      { question: "Who should own this?", question_type: "single_choice", options: [...] },
      { question: "What priority?", question_type: "single_choice", options: [...] },
    ]
  }
}

The LLM decides when to use it. Not the user. Not the system. The AI recognizes it's missing information and proactively asks before proceeding.

The AI isn't a chatbot waiting for input. It's an agent executing a task that chooses to pause because it needs clarification.

The Hard Part: Blocking Execution

When the LLM calls ask_user_question, the tool needs to:

  1. Show the question to the user
  2. Wait for their answer
  3. Return the answer as the tool result
  4. Let the LLM continue in the same execution context

Steps 1 and 4 are easy. Steps 2 and 3 are the interesting engineering problem.

The LLM is running inside a tool execution pipeline. When it calls a tool, the pipeline awaits the result before the LLM can continue. But our "result" depends on a human doing something in a browser — which could take seconds or minutes.
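To see why this works at all, here's a stripped-down sketch of what a tool execution loop looks like (names here are illustrative, not the actual pipeline): each tool handler returns a Promise, and the loop awaits it before feeding the result back to the LLM. A human-in-the-loop tool just makes that await take minutes instead of milliseconds.

```typescript
// Hypothetical tool-execution loop. The key point: the `await` on each
// handler is where ask_user_question can "park" the whole turn.

type ToolCall = { name: string; args: Record<string, unknown> };
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

async function runToolLoop(
  calls: ToolCall[],
  handlers: Map<string, ToolHandler>,
): Promise<string[]> {
  const results: string[] = [];
  for (const call of calls) {
    const handler = handlers.get(call.name);
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    // For most tools this resolves in milliseconds; for
    // ask_user_question it resolves whenever the human answers.
    results.push(await handler(call.args));
  }
  return results;
}
```

Nothing about the loop changes to support a slow tool — the only requirement is that the handler eventually resolves (or times out).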

The Redis Pub/Sub Parking Pattern

Here's how we solved it:

class AskUserQuestionService {
  async parkAndWaitForAnswer(
    questionId: string,
    questionData: StoredQuestion,
  ): Promise<QuestionAnswer | null> {
    // 1. Store the question in Redis with a 5-minute TTL
    await redisService.setState(
      `ask-question:${questionId}`,
      questionData,
      300, // 5 minutes
    );

    // 2. Create a dedicated Redis subscriber for this question
    const subscriber = createClient({ url: redisUrl });
    await subscriber.connect();

    const channel = `ask-question-answer:${questionId}`;

    try {
      // 3. Block until we receive the answer (or timeout)
      const answer = await new Promise<QuestionAnswer | null>((resolve) => {
        const timeout = setTimeout(() => {
          resolve(null); // Timed out
        }, 300_000); // 5 minutes

        subscriber
          .subscribe(channel, (message) => {
            clearTimeout(timeout);
            resolve(JSON.parse(message));
          })
          .catch(() => {
            // Subscription failed — don't leave the tool hanging
            clearTimeout(timeout);
            resolve(null);
          });
      });

      return answer;
    } finally {
      await subscriber.unsubscribe(channel);
      await subscriber.quit();
      await redisService.deleteState(`ask-question:${questionId}`);
    }
  }
}

The tool execution blocks on a Promise that resolves when the user answers. Redis pub/sub acts as the bridge between the user's browser and the waiting tool.

When the user submits their answer, the API endpoint publishes to that specific channel:

async submitAnswer(questionId: string, answer: string | string[]) {
  // Question may have expired (5-minute TTL) or never existed
  const stored = await redisService.getState(`ask-question:${questionId}`);
  if (!stored) {
    return { success: false, error: "Question expired or not found" };
  }

  // Validate caller matches the intended recipient
  if (stored.workspaceId !== callerWorkspaceId || stored.memberId !== callerMemberId) {
    return { success: false, error: "Unauthorized" };
  }

  // Publish -> the waiting subscriber resolves -> tool returns -> LLM continues
  await publisher.publish(
    `ask-question-answer:${questionId}`,
    JSON.stringify({ answer, answeredAt: new Date().toISOString() }),
  );
}

The dedicated subscriber per question is important. You can't use a shared connection because Redis subscriptions are per-connection. Each pending question gets its own subscriber, its own channel, and its own cleanup.
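The parking control flow itself doesn't depend on Redis — here's the same pattern as an in-memory stand-in (illustrative only; swap the Map for Redis pub/sub and you get the real service's behavior). It makes the timeout and the "resolver is consumed exactly once" invariant easy to see:

```typescript
// In-memory stand-in for the Redis bridge: parkAndWait blocks on a
// Promise; publishAnswer resolves it. One resolver per question id.

type Resolver = (answer: string | null) => void;
const pending = new Map<string, Resolver>();

function parkAndWait(questionId: string, timeoutMs: number): Promise<string | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      pending.delete(questionId);
      resolve(null); // timed out, like the real 5-minute path
    }, timeoutMs);
    pending.set(questionId, (answer) => {
      clearTimeout(timer);
      pending.delete(questionId);
      resolve(answer);
    });
  });
}

function publishAnswer(questionId: string, answer: string): boolean {
  const resolver = pending.get(questionId);
  if (!resolver) return false; // nothing waiting (expired or unknown id)
  resolver(answer);
  return true;
}
```

The Redis version exists because the waiting tool and the HTTP endpoint that receives the answer may live in different processes — an in-process Map only works on a single server.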

Delivering Questions via WebSocket

The question needs to appear in the user's chat in real time. We use our existing WebSocket infrastructure to push an ask-user-question event that the frontend listens for.

On the frontend, when a question arrives:

  1. A Zustand store maps conversationId to pendingQuestion
  2. The chat input component is replaced with the question card
  3. The user interacts with the card (selects options, types text)
  4. On submit, a POST to /ai-ask-question/answer sends the answer back

The input replacement is key UX. The question doesn't appear as a message in the chat — it takes over the input area. This makes it clear that the AI is waiting for you, and you can't do anything else in that conversation until you answer (or dismiss).
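The store side of this is small. Here's a plain-TypeScript sketch of the shape (the real app uses Zustand, and these field names are assumptions): one pending question per conversation, and the chat input component checks the map to decide whether to render the question card or the normal input.

```typescript
// Illustrative pending-question store. The listeners set mimics what
// Zustand does internally: notify subscribers so components re-render.

interface PendingQuestion {
  questionId: string;
  question: string;
  questionType: "free_text" | "single_choice" | "multi_choice" | "yes_no";
  options?: { label: string; description?: string }[];
}

const pendingByConversation = new Map<string, PendingQuestion>();
const listeners = new Set<() => void>();

function setPendingQuestion(conversationId: string, q: PendingQuestion) {
  pendingByConversation.set(conversationId, q);
  listeners.forEach((fn) => fn());
}

function clearPendingQuestion(conversationId: string) {
  pendingByConversation.delete(conversationId);
  listeners.forEach((fn) => fn());
}

// The input component renders the question card iff this returns a value.
function getPendingQuestion(conversationId: string): PendingQuestion | undefined {
  return pendingByConversation.get(conversationId);
}
```

Keying by conversationId means questions in other conversations are unaffected — the lockout is per-conversation, not global.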

Multi-Question Sequences

Sometimes the AI needs to ask multiple related things. Instead of calling the tool three times (which would show three separate cards), it can send a sequence:

ask_user_question({
  questions: [
    { question: "Which project?", question_type: "single_choice", options: [...] },
    { question: "Who should own it?", question_type: "single_choice", options: [...] },
    { question: "Any additional context?", question_type: "free_text" },
  ]
})

The user sees a paginated card with arrow navigation. Answers are collected locally and submitted all at once. The LLM receives all answers in a single tool result.

This is better than multiple tool calls because:

  • One round-trip instead of three
  • The user sees all questions upfront (progress indicator: "2 of 3")
  • They can skip questions they don't want to answer
  • The LLM gets all context at once, not incrementally
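The paginated card's local state can be sketched like this (field and function names are illustrative, not the real component): answers accumulate per index, skips are recorded as null, and nothing is sent to the server until the whole sequence is submitted.

```typescript
// Local state for the multi-question card. Immutable updates, as you'd
// write them for a React state hook.

interface SequenceState {
  index: number; // which question the card is showing ("2 of 3")
  answers: (string | string[] | null)[]; // null = skipped
}

function initSequence(questionCount: number): SequenceState {
  return { index: 0, answers: Array(questionCount).fill(null) };
}

// Record an answer (or null to skip) and advance to the next question.
function answerCurrent(
  state: SequenceState,
  answer: string | string[] | null,
): SequenceState {
  const answers = [...state.answers];
  answers[state.index] = answer;
  return { answers, index: Math.min(state.index + 1, answers.length - 1) };
}

function isLastQuestion(state: SequenceState): boolean {
  return state.index === state.answers.length - 1;
}
```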

Mobile: Same Feature, Different Challenges

We built the same feature in React Native. Same WebSocket delivery, same Zustand store pattern, same question types. But mobile has its own quirks:

  • Keyboard management: the input replacement needs to handle the software keyboard showing/hiding
  • Haptic feedback: option selection triggers Haptics.impactAsync() for tactile confirmation
  • Scroll behavior: the question card needs to stay visible above the keyboard
  • Offline: if the user is offline when the question arrives, WebSocket reconnect needs to re-deliver

What the LLM Actually Receives

When the user answers, the tool returns a plain string. For single questions:

"Project Alpha"

For sequences, it's formatted as:

Q1 (Which project?): Project Alpha
Q2 (Who should own it?): Maria
Q3 (Additional context?): This is for the Q2 release

Simple text. No JSON. The LLM reads it naturally and continues its task with full context.
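Producing that text is a small formatting step on the server. A sketch of what it might look like (the helper name and the "(skipped)" placeholder are assumptions, but the output format matches the examples above):

```typescript
// Turn collected answers into the plain-text tool result the LLM reads.

interface AnsweredQuestion {
  question: string;
  answer: string | string[] | null; // null = user skipped the question
}

function formatToolResult(answers: AnsweredQuestion[]): string {
  const render = (a: string | string[] | null): string =>
    Array.isArray(a) ? a.join(", ") : a ?? "(skipped)";

  // Single question: just the answer, no Q-prefix.
  if (answers.length === 1) return render(answers[0].answer);

  // Sequence: one "Qn (question): answer" line per entry.
  return answers
    .map((a, i) => `Q${i + 1} (${a.question}): ${render(a.answer)}`)
    .join("\n");
}
```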

If the user times out (5 minutes), the tool returns:

{ "userAnswer": null, "timedOut": true, "message": "The user did not respond within the time limit" }

The LLM then decides what to do — usually it falls back to reasonable defaults and mentions it assumed values.

When Should AI Ask vs. Infer?

This is the real design question. You don't want an AI that asks about everything — that's worse than one that guesses.

Our heuristic: ask when the wrong guess has meaningful consequences.

  • Assigning a task to the wrong person? Ask.
  • Picking the wrong project? Ask.
  • Choosing between high and medium priority? Infer (low stakes).
  • Formatting a message slightly differently? Infer.

The tool description tells the LLM: "Use when you need user input before proceeding." The emphasis on "before proceeding" signals that this is for blockers, not preferences.

In practice, the LLM uses it about once every 10-15 tool calls. Just enough to be helpful without being annoying.

Security Considerations

Every question is scoped:

  • Stored in Redis with workspaceId + memberId
  • Answer submission validates the caller matches the stored recipient
  • Questions auto-expire after 5 minutes (Redis TTL)
  • All transport is via authenticated WebSocket
  • No question data persists after the flow completes

What I'd Do Differently

If I were building this again:

  1. Batch questions more aggressively. The LLM sometimes asks one question, gets the answer, then realizes it needs to ask another. I'd add a system prompt nudge to gather all unknowns before asking.

  2. Persistent questions. If the user closes the app and reopens, the pending question is gone. The Redis TTL is 5 minutes. For async workflows, this should be longer and stored in the database, not just Redis.

  3. Question templates. The LLM generates the question text every time. Pre-defined templates for common patterns (assignee selection, project picker) would be faster and more consistent.

Wrapping Up

An AI tool system is incomplete if the AI can't ask questions. Every other tool (create task, send message, query data) assumes the AI has enough context. This one covers the case where it doesn't.

Small addition to the tool set. Big difference in how the AI actually works with you.


I'm building Trilo, a workspace that unifies tasks, chat, and notes for solopreneurs — with an AI coworker that actually understands your work. If you're interested in AI-powered productivity tools, let's connect.

Find me on LinkedIn or GitHub.
