
Ziyad Meribaa


Real-Time Streaming in Microsoft Teams: Simulating LLM Response Streaming with a Workaround

Meta Description:

Learn how to simulate real-time streaming of LLM responses in Microsoft Teams using a clever workaround. Work around Teams' limitations with incremental message updates and rate-limit management to keep users engaged.


Real-Time Streaming in Microsoft Teams: A Workaround for Simulating LLM Response Streaming

Microsoft Teams, a leading collaboration platform, poses unique challenges for implementing real-time streaming due to its constraints on message updates and rate limits. Unlike platforms designed for continuous updates, Teams does not natively support true real-time streaming of responses. To address this, a workaround was developed that simulates streaming by dynamically editing a message as chunks arrive from the LLM (Large Language Model).

This article explores how to achieve real-time streaming in Microsoft Teams, the challenges faced, and the technical methods used to simulate a seamless user experience.


The Workaround: Simulating Streaming with Incremental Message Updates

The workaround leverages Teams' ability to edit existing messages. Here’s how the simulation is achieved (a consolidated code sketch follows the steps):

🔹 Chunked Responses from the LLM

  • When a user sends a message, the bot forwards the input to the LLM through a streaming-enabled API.

  • The LLM generates and streams the response incrementally as smaller chunks.

🔹 Initial Placeholder Message

  • Upon receiving the first chunk, the bot sends a placeholder message to Teams, such as "Generating response… 🕒" or an empty message.

  • This establishes the interaction thread and reassures the user that the bot is processing their request.

🔹 Incremental Updates

  • As each chunk arrives, the bot updates the placeholder message by appending the new content.

  • This simulates a typing or streaming effect ✍️, where the message appears to grow in real time.

🔹 Rate Limit Management

  • To avoid hitting Teams’ API rate limits 🚦, updates are throttled and sent at controlled intervals (e.g., once every few hundred milliseconds).

  • Chunks arriving faster than the update interval are buffered to ensure no data is lost.

🔹 Finalization

  • When the LLM completes its response, the bot sends a final update to the message, ensuring the entire content is visible to the user.

  • Any necessary formatting (e.g., Markdown for readability 📄) is applied during this step.
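
Putting the steps together, here is a minimal sketch of the whole loop. It assumes a Bot Framework TurnContext (context) and an async-iterable llmStream of text chunks; streamToTeams, llmStream, and the 500 ms interval are illustrative names and values, not details from the original bot.

const UPDATE_INTERVAL_MS = 500; // Throttle edits to stay under Teams rate limits

async function streamToTeams(context, llmStream) {
    // 1. Send the placeholder; its activity id is needed for every later edit
    const placeholder = await context.sendActivity("Generating response… 🕒");
    let answer = "";
    let lastUpdate = Date.now();

    // 2. Append each chunk; edit the placeholder at most once per interval
    for await (const chunk of llmStream) {
        answer += chunk;
        if (Date.now() - lastUpdate >= UPDATE_INTERVAL_MS) {
            await context.updateActivity({ id: placeholder.id, type: "message", text: answer });
            lastUpdate = Date.now();
        }
    }

    // 3. A final edit guarantees the complete response is visible
    await context.updateActivity({ id: placeholder.id, type: "message", text: answer });
}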


Challenges and Solutions

❌ Teams’ Rate Limits

  • Challenge: Teams enforces strict limits on how frequently messages can be sent or updated.

  • Solution: The bot implements an update scheduler ⏲️ to space edits at regular intervals while buffering chunks in the background (a minimal sketch follows).
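
A minimal sketch of such a scheduler. UpdateScheduler and postUpdate are hypothetical names; postUpdate stands in for whatever callback performs the Teams edit.

class UpdateScheduler {
    constructor(postUpdate, intervalMs = 500) {
        this.postUpdate = postUpdate;
        this.pending = "";   // Buffered text not yet sent to Teams
        this.sent = "";      // Text already reflected in the message
        this.timer = setInterval(() => this.flush(), intervalMs);
    }

    push(chunk) {
        this.pending += chunk; // Buffering: chunks are never dropped
    }

    async flush() {
        if (!this.pending) return;
        this.sent += this.pending;
        this.pending = "";
        await this.postUpdate(this.sent); // At most one edit per interval
    }

    async stop() {
        clearInterval(this.timer);
        await this.flush(); // Final flush so nothing is lost
    }
}

Because each flush appends the buffered text before editing, the same scheduler also addresses the partial-message consistency issue described next.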

❌ Partial Message Consistency

  • Challenge: Users may experience broken or incomplete messages if chunks are sent too quickly or overlap.

  • Solution: Each update carefully appends new data to the existing content, ensuring no chunk is lost or duplicated.

❌ Latency Perception

  • Challenge: Streaming delays might make the bot feel unresponsive.

  • Solution: Sending an initial placeholder message reassures users that the bot is processing their input.


Advantages of the Workaround

  • Simulates Real-Time Typing: Users experience the message unfolding progressively ✨, closely resembling real-time streaming.

  • Enhanced Engagement: Incremental updates keep users engaged, reducing perceived wait times ⏳.

  • Platform Compatibility: This approach works entirely within the constraints of Teams’ messaging API, requiring no custom client modifications.


Technical Methods for Streaming Responses

The bot’s streaming implementation rests on two pieces: the LLM API’s streaming capability, and dynamic message updates in Microsoft Teams that simulate a real-time effect. Here’s how it works:

🔹 Handling Streamed Data

  • The LLM API sends data in incremental chunks, which are processed in real time by listening to the data event of the response stream.

  • Each chunk is parsed and analyzed to determine its purpose (e.g., a partial response or the end of the message).


stream.on('data', async (chunk) => {
    // Each SSE frame arrives as a newline-delimited line prefixed with "data: "
    buffer += chunk.toString();
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Retain the partial line for the next chunk
    for (const line of lines) {
        if (line.trim() && line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            // Process the data based on the event type
        }
    }
});
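
What “process the data based on the event type” looks like depends on the LLM provider. Here is a sketch assuming an OpenAI-style SSE payload, where [DONE] marks the end of the stream and deltas carry partial text; handleSseLine and the choices/delta field names are assumptions about the API shape.

let answer = "";
let chunkCount = 0;

function handleSseLine(line) {
    if (!line.trim() || !line.startsWith("data: ")) return;
    const payload = line.slice(6);
    if (payload === "[DONE]") return; // End-of-stream sentinel in OpenAI-style APIs
    const data = JSON.parse(payload);
    const delta = data.choices?.[0]?.delta?.content; // Partial text, if present
    if (delta) {
        answer += delta;  // Accumulate the growing response
        chunkCount += 1;  // Drives the throttled Teams updates below
    }
}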


🔹 Updating Teams Messages Dynamically

  • The initial response creates a placeholder message in Teams.

  • As chunks accumulate, the bot updates this message using updateActivity, progressively appending new data to create a "typing" effect ✍️.

  • To prevent excessive updates and comply with Teams' rate limits 🚦, updates are batched and sent only once every CHUNK_UPDATE_INTERVAL chunks.


// Edit the placeholder only once every CHUNK_UPDATE_INTERVAL chunks
if (chunkCount % CHUNK_UPDATE_INTERVAL === 0) {
    await context.updateActivity({
        id: initialResponse.id,
        type: "message",
        text: answer
    });
}


🔹 Error Handling During Streaming

  • Errors can occur while parsing chunks, updating messages, or processing the stream. Each step includes robust error handling ⚠️ to ensure the bot remains functional.

  • Errors during updates are logged but do not interrupt the overall streaming process.


stream.on('error', (error) => {
    logger.error("Stream error:", error);
    reject(new Error("Error processing the response stream"));
});
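
The same principle applies to the update path: a failed edit is logged and skipped rather than aborting the stream. A sketch, reusing the context, initialResponse, answer, and logger names from the snippets above:

try {
    await context.updateActivity({
        id: initialResponse.id,
        type: "message",
        text: answer
    });
} catch (updateError) {
    // Log and keep streaming: one failed edit should not kill the response
    logger.error("Failed to update Teams message:", updateError);
}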


🔹 End of Stream Processing

  • When the end event fires, the bot finalizes the response. If chunks have arrived since the last throttled update, a final edit is sent to ensure all content is displayed.

  • Conversation history is updated to include both the user’s input and the assistant’s full response.


stream.on('end', async () => {
    if (answer.trim()) {
        // Flush any chunks the throttle skipped since the last update
        if (chunkCount % CHUNK_UPDATE_INTERVAL !== 0) {
            await context.updateActivity({
                id: initialResponse.id,
                type: "message",
                text: answer
            });
        }
        // Update conversation history and resolve the promise
    }
});
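
The final comment might expand to something like the following sketch. conversationHistory, userMessage, and resolve are assumed to be in scope, and the history shape shown is illustrative rather than taken from the original bot.

conversationHistory.push(
    { role: "user", content: userMessage },
    { role: "assistant", content: answer }
);
resolve(answer); // Settle the promise wrapping the stream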


Key Takeaway

By leveraging Teams' ability to edit messages ✏️, a realistic streaming effect can be achieved even within the platform's limitations. This solution combines technical ingenuity with user-focused design, offering a seamless and engaging interaction with LLMs in enterprise environments.
