Med Marrouchi

Posted on Jun 10

How I Built a Discord AI Assistant That Talks to Gmail

#webdev #programming #javascript #ai

What if you could talk to your inbox directly from Discord?

Not just ask an AI to summarize emails, but actually trigger Gmail actions from a chat conversation:

Read unread emails
Reply to an existing email
Send a brand-new email
Handle Google authentication
Format the result back into a clean Discord message

That is exactly what I built in this video tutorial: https://www.youtube.com/watch?v=FKrzVK1fqK4

In this tutorial, I use Hexabot, a self-hosted AI chatbot and workflow automation platform, to build a Discord AI assistant that communicates with Gmail.

The goal is not just to connect an LLM to Gmail.

The goal is to show a more controlled way to build AI automation workflows: conversation, intent detection, conditional logic, tool execution, authentication handling, and response formatting.

Why not just use an MCP Gmail server?

Technically, we could have built this faster.

For example, we could configure an AI agent with an MCP connection to a Gmail MCP server and let the agent decide when to call Gmail tools.

That approach is useful, especially for quick prototypes.

But this tutorial takes a different route on purpose.

Instead of giving the agent full freedom over the execution flow, we build the workflow visually step by step. This gives us more control over what happens, when it happens, and how each decision is handled.

In other words, the AI does not directly “do everything.”

The workflow separates responsibilities:

The AI detects the user intent.
The workflow validates the intent.
The Gmail action performs the actual operation.
A conditional branch handles authentication.
Another AI step formats the final response for Discord.

This makes the assistant easier to debug, safer to extend, and more understandable for developers who want to build production-ready AI automations.

What the assistant can do

The Discord Gmail assistant supports three main use cases.

1. Read emails

Example prompt:

Show me the top two unread emails.

The assistant understands that the user wants to read emails, extracts the limit, calls Gmail, and returns a clean Discord-friendly summary.

2. Reply to an email

Example prompt:

Reply to this email saying: Thank you, well noted.

The assistant extracts the action as reply, identifies the target email, captures the reply text, and sends the response through Gmail.

3. Send a new email

Example prompt:

Send an email to someone@example.com about AI evolution during the past year.

The assistant detects that this is a new email, extracts the recipient, prepares the subject and content, sends the email, and confirms the result in Discord.

The workflow architecture

The full workflow follows this pattern:

Discord message
   ↓
AI Infer Intent
   ↓
Valid action?
   ├── No → Ask the user what they want to do
   └── Yes → Gmail Action
              ↓
              Gmail status?
              ├── 401 → Send Google sign-in button
              └── 200 → AI Formatter → Send Discord message

This is the most important part of the tutorial.

We are not building a black-box AI agent that decides on its own what to do and when. We are building a controlled execution flow where every important step is visible and every branch is explicit.
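To make the diagram concrete, here is a minimal TypeScript sketch of the same routing. It is not Hexabot code: the `Steps` interface, the `Reply` type, and `handleDiscordMessage` are hypothetical names, and each workflow block is passed in as a plain function so that only the branching logic is shown. The catch-all for statuses other than 200 and 401 is my own assumption and does not appear in the diagram.

```typescript
// Hypothetical types: illustrative only, not Hexabot's API.
type Action = "read" | "reply" | "new" | "empty";

interface Intent {
  action: Action;
  limit?: string;
  targetMail?: string;
  mailText?: string;
  subject?: string;
}

interface GmailResult {
  status: number; // 200 on success, 401 when Google sign-in is required
  data?: unknown;
}

// Each workflow block is injected as a function so the routing stays testable.
interface Steps {
  inferIntent(message: string): Promise<Intent>;
  gmailAction(intent: Intent): Promise<GmailResult>;
  formatForDiscord(data: unknown): Promise<string>;
}

type Reply =
  | { kind: "text"; text: string }
  | { kind: "signInButton"; label: string };

async function handleDiscordMessage(message: string, steps: Steps): Promise<Reply> {
  const intent = await steps.inferIntent(message);

  // Branch 1: no valid action, so ask the user instead of guessing.
  if (intent.action === "empty") {
    return { kind: "text", text: "Hello, how can I help you today?" };
  }

  const result = await steps.gmailAction(intent);

  // Branch 2: route on the status returned by the Gmail action.
  if (result.status === 401) {
    return { kind: "signInButton", label: "Sign in with Google" };
  }
  if (result.status === 200) {
    return { kind: "text", text: await steps.formatForDiscord(result.data) };
  }

  // Assumed catch-all (not in the diagram) for any other status.
  return { kind: "text", text: `Gmail request failed with status ${result.status}.` };
}
```

Because the steps are injected, each branch can be exercised with stub functions before any real Gmail or Discord connection exists.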

Step 1: Detect the user intent

The first step is an AI Infer Object action.

Instead of asking the LLM to return a free-form text answer, we ask it to return a structured object.

The schema includes fields like:

action
limit
targetMail
mailText
subject

The action field can be:

read
reply
new
empty

This acts as a contract between the AI and the workflow.

For example:

Show me the top two unread emails.

Becomes something like:

{
  "action": "read",
  "limit": "2"
}

And:

Reply to the last email saying thank you.

Becomes:

{
  "action": "reply",
  "mailText": "Thank you"
}

This is a powerful pattern because the AI is not responsible for executing Gmail operations directly. It only extracts the intent and the required fields.

The workflow decides what happens next.
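The contract can also be written down as a type plus a small validator. The sketch below is an assumption about how you might enforce it in your own code, not how the AI Infer Object action works internally: the field names and action values come from the schema above, while `parseIntent` and its fall-back-to-`empty` behavior are hypothetical.

```typescript
// Action values and field names come from the schema above.
// The validation rules and parseIntent itself are illustrative assumptions.
const ACTIONS = ["read", "reply", "new", "empty"] as const;
type Action = (typeof ACTIONS)[number];

interface Intent {
  action: Action;
  limit?: string;
  targetMail?: string;
  mailText?: string;
  subject?: string;
}

function isAction(value: unknown): value is Action {
  return typeof value === "string" && (ACTIONS as readonly string[]).indexOf(value) !== -1;
}

// Parse raw model output; anything unexpected collapses to the "empty" action.
function parseIntent(raw: string): Intent {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { action: "empty" };
  }
  if (typeof data !== "object" || data === null) {
    return { action: "empty" };
  }
  const record = data as Record<string, unknown>;
  const action = record.action;
  if (!isAction(action)) {
    return { action: "empty" };
  }
  const intent: Intent = { action };
  // Copy only the optional fields that are actually strings.
  for (const key of ["limit", "targetMail", "mailText", "subject"] as const) {
    const value = record[key];
    if (typeof value === "string") {
      intent[key] = value;
    }
  }
  return intent;
}

const readIntent = parseIntent('{"action": "read", "limit": "2"}');
const unknownIntent = parseIntent('{"action": "delete"}'); // not in the contract
```

Treating unknown actions and malformed JSON as `empty` means a bad model response ends in a clarifying question rather than a Gmail call.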

Step 2: Add a conditional branch

After intent detection, the workflow checks whether the AI detected a valid action.

If the action is read, reply, or new, the workflow continues to the Gmail step.

If the action is empty, the assistant sends a fallback message:

Hello, how can I help you today?

This prevents the assistant from guessing when the user request is unclear.

That matters a lot when the assistant has access to real actions like sending emails.
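In the visual editor this is a single conditional block. Expressed in code, the same check could look like the sketch below, where `routeIntent` and the `Branch` type are hypothetical names. It uses an allowlist, so any value other than the three known actions takes the fallback branch, not only the literal `empty`.

```typescript
type ExecutableAction = "read" | "reply" | "new";

type Branch =
  | { next: "gmail"; action: ExecutableAction }
  | { next: "fallback"; message: string };

const FALLBACK_MESSAGE = "Hello, how can I help you today?";

// Allowlist check: only known actions may reach the Gmail step.
function routeIntent(action: string | undefined): Branch {
  if (action === "read" || action === "reply" || action === "new") {
    return { next: "gmail", action };
  }
  return { next: "fallback", message: FALLBACK_MESSAGE };
}

const clear = routeIntent("reply");   // continues to the Gmail step
const unclear = routeIntent("empty"); // sends the fallback message
```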

Step 3: Execute the Gmail action

The next step is the Gmail action.

This is where the workflow maps the structured AI output into the Gmail operation:

mailText   → email body
subject    → email subject
targetMail → recipient (new email) or email to reply to (reply)
action     → read, reply, or new
limit      → number of emails to fetch

The key idea is this:

The AI detects the intent.
The Gmail action executes the operation.

That separation is what makes the workflow more predictable.

You can inspect the values, debug the action, add conditions, add validation, or insert an approval step before sending emails.

For a production setup, I would strongly recommend adding a confirmation step before sending or replying to emails, especially if the assistant is used by a team.
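Here is a rough sketch of that mapping with validation and a confirmation flag added. Everything in it is hypothetical: `GmailRequest`, `toGmailRequest`, the default and maximum limits, and the `needsConfirmation` flag are my own illustrative names, and the real Gmail action may take different inputs. I also assume that a reply without `targetMail` means "the most recent email", as in the "Reply to the last email" example from step 1.

```typescript
interface Intent {
  action: "read" | "reply" | "new";
  limit?: string;
  targetMail?: string;
  mailText?: string;
  subject?: string;
}

// Hypothetical request shapes; the real Gmail action's inputs may differ.
type GmailRequest =
  | { op: "read"; limit: number }
  | { op: "reply"; target?: string; body: string } // no target: assumed latest email
  | { op: "new"; to: string; subject: string; body: string };

type MapResult =
  | { ok: true; request: GmailRequest; needsConfirmation: boolean }
  | { ok: false; missing: string[] };

const DEFAULT_LIMIT = 5; // assumed default when the user gives no number
const MAX_LIMIT = 20;    // assumed cap to keep Discord messages short

// List the names of fields that are undefined or empty.
function missingFields(fields: Record<string, string | undefined>): string[] {
  return Object.keys(fields).filter((name) => !fields[name]);
}

function toGmailRequest(intent: Intent): MapResult {
  const { targetMail, mailText, subject } = intent;

  if (intent.action === "read") {
    // The schema carries limit as a string, so parse and clamp it.
    const parsed = parseInt(intent.limit ?? "", 10);
    const limit = isNaN(parsed) || parsed < 1 ? DEFAULT_LIMIT : Math.min(parsed, MAX_LIMIT);
    return { ok: true, request: { op: "read", limit }, needsConfirmation: false };
  }

  if (intent.action === "reply") {
    if (!mailText) {
      return { ok: false, missing: missingFields({ mailText }) };
    }
    return {
      ok: true,
      request: { op: "reply", target: targetMail, body: mailText },
      needsConfirmation: true, // sending is irreversible, so ask first
    };
  }

  if (!targetMail || !subject || !mailText) {
    return { ok: false, missing: missingFields({ targetMail, subject, mailText }) };
  }
  return {
    ok: true,
    request: { op: "new", to: targetMail, subject, body: mailText },
    needsConfirmation: true,
  };
}
```

Reads go straight through, while replies and new emails come back flagged, so the workflow can show the draft and wait for a "yes" before calling Gmail.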

Step 4: Handle Google authentication

The Gmail action can return different statuses.

If the status is 200, the Gmail operation worked.

If the status is 401, the user still needs to authenticate with Google.

So the workflow handles the authentication case separately.

When authentication is required, the assistant sends a Discord button:

Sign in with Google

The button links to the Google OAuth flow.

This makes the user experience much smoother. The user can start from Discord, receive the authentication button, sign in with Google, and continue the conversation.
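For readers curious about what sits behind that button, here is a sketch of the two pieces involved: the Google authorization URL and a Discord message carrying a link button. The endpoint and query parameters follow Google's OAuth 2.0 web server flow, and the component numbers follow Discord's message component format (type 1 is an action row, type 2 a button, style 5 a link button). The function names, the message text, and the placeholder values are hypothetical, and the Hexabot Gmail action and Discord channel may build both differently.

```typescript
// Google's OAuth 2.0 authorization endpoint.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

interface OAuthConfig {
  clientId: string;
  redirectUri: string;
  scopes: string[];
}

// Build the URL the sign-in button points to.
// `state` should be an unguessable value tied to the Discord user.
function buildGoogleAuthUrl(config: OAuthConfig, state: string): string {
  const params: Record<string, string> = {
    client_id: config.clientId,
    redirect_uri: config.redirectUri,
    response_type: "code",
    scope: config.scopes.join(" "),
    access_type: "offline", // request a refresh token for later calls
    state,
  };
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
  return `${GOOGLE_AUTH_ENDPOINT}?${query}`;
}

// Discord message payload with a single link button.
function signInMessage(authUrl: string) {
  return {
    content: "Please sign in with Google to continue.",
    components: [
      {
        type: 1, // action row
        components: [
          { type: 2, style: 5, label: "Sign in with Google", url: authUrl },
        ],
      },
    ],
  };
}

// Placeholder values for illustration only.
const authUrl = buildGoogleAuthUrl(
  {
    clientId: "YOUR_CLIENT_ID",
    redirectUri: "https://example.com/oauth/callback",
    scopes: ["https://www.googleapis.com/auth/gmail.readonly"],
  },
  "opaque-state-value",
);
const payload = signInMessage(authUrl);
```

The `state` parameter is what lets the callback match the Google account to the Discord user who clicked the button, so it should never be a guessable value such as the raw user ID.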

Step 5: Format the response for Discord

Once the Gmail action succeeds, the raw Gmail result still needs to be formatted.

For that, the workflow uses an AI Agent step called AI Formatter.

Its job is simple:

Take the Gmail result
Keep the useful information
Format it as a Discord-friendly message
Avoid unnecessary explanations

This is another good design choice.

The Gmail action handles the tool call.
The AI formatter handles presentation.
The final message is sent back to Discord.

Again, each step has a clear responsibility.
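In the tutorial this step is an AI call, so there is no formatting code to show. As a rough illustration of the target output, though, here is a deterministic stand-in that could also serve as a fallback if the model call fails. The `EmailSummary` shape and `formatEmails` are hypothetical, and the raw result returned by the Gmail action may look different. The 2000-character cap reflects Discord's limit on message content.

```typescript
// Hypothetical, trimmed-down view of one email from the Gmail result.
interface EmailSummary {
  from: string;
  subject: string;
  snippet: string;
}

const DISCORD_MAX_LENGTH = 2000; // Discord's limit for message content

// Keep the useful fields, use Discord markdown, and stay under the limit.
function formatEmails(emails: EmailSummary[]): string {
  if (emails.length === 0) {
    return "No unread emails found.";
  }
  const blocks = emails.map(
    (email, index) =>
      `**${index + 1}. ${email.subject}**\nFrom: ${email.from}\n> ${email.snippet}`,
  );
  const message = blocks.join("\n\n");
  if (message.length <= DISCORD_MAX_LENGTH) {
    return message;
  }
  // Truncate and mark the cut with an ellipsis.
  return message.slice(0, DISCORD_MAX_LENGTH - 1) + "…";
}

const preview = formatEmails([
  { from: "alice@example.com", subject: "Quarterly report", snippet: "Please find attached" },
]);
```

Whichever way the text is produced, a length guard like this is worth keeping after the AI step, because a model has no built-in awareness of Discord's limit.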

Step 6: Connect the workflow to Discord

After the workflow is ready, we connect it to the Discord channel.

In Hexabot, the Discord channel is configured with:

The target workflow
The Discord bot token
The application ID

Once connected, messages sent to the Discord bot trigger the Gmail assistant workflow.

At that point, the assistant is ready to test directly inside Discord.

Demo: speaking with your inbox

In the demo, I test three real use cases.

First, I ask the assistant to access Gmail. Since the user is not authenticated yet, the workflow sends the Google sign-in button.

Then I test reading unread emails:

Show me the top two unread emails.

The assistant fetches the emails and returns a clean summary inside Discord.

Next, I test replying to an email:

Reply to this email saying: Thank you, well noted.

The assistant sends the reply and confirms the action.

Finally, I test sending a brand-new email.

The assistant prepares the email, sends it through Gmail, and returns the result in Discord.

At this point, the assistant is no longer just answering questions. It is taking real actions through Gmail.

Why this pattern matters

The interesting part of this tutorial is not only Gmail.

The real value is the workflow pattern:

Conversation → Intent Detection → Tool Calling → Authentication → Formatting → Response

You can reuse the same pattern for many other automations:

CRM updates
Support inbox triage
Calendar scheduling
Internal admin tools
Lead qualification
Notification workflows
Reporting assistants

This is where AI automation becomes more practical.

Instead of building a fully autonomous agent and hoping it behaves correctly, you can build a workflow where the AI is used where it makes sense, while the business logic stays explicit and visible.

A note about custom actions

This tutorial focuses on building the workflow visually from the Hexabot editor.

It does not cover how to develop the Gmail custom action itself.

The source code for the project is linked in the YouTube description, and custom action development will be covered in a separate video.

So if your goal is to understand how to build the workflow, this tutorial is for you.

If your goal is to create new custom actions from scratch, stay tuned for the follow-up.

Security notes

Because this workflow connects to Gmail, security matters.

Before publishing or deploying something similar, make sure to:

Never commit API keys or OAuth client secrets
Use environment variables or secure credential storage
Limit OAuth scopes to what your assistant actually needs
Add a confirmation step before sending emails in production
Log important actions for auditability
Handle errors and expired sessions properly
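A few of these points are easy to sketch in code. The example below is a set of assumptions, not the project's actual setup: the environment variable names, `loadSecrets`, `requiredScopes`, and `auditEntry` are all hypothetical. The two scope URLs are real Gmail OAuth scopes, but which ones the Gmail action needs depends on how it is implemented.

```typescript
type Action = "read" | "reply" | "new";

// Assumed mapping: replying needs to read the original message as well as send.
const SCOPES: Record<Action, string[]> = {
  read: ["https://www.googleapis.com/auth/gmail.readonly"],
  reply: [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.send",
  ],
  new: ["https://www.googleapis.com/auth/gmail.send"],
};

// Request only the scopes needed by the actions you actually enable.
function requiredScopes(enabled: Action[]): string[] {
  const scopes: string[] = [];
  for (const action of enabled) {
    for (const scope of SCOPES[action]) {
      if (scopes.indexOf(scope) === -1) {
        scopes.push(scope);
      }
    }
  }
  return scopes;
}

// Read secrets from an environment-like object (for example process.env in
// Node.js) and fail fast if any are missing. Variable names are hypothetical.
function loadSecrets(env: Record<string, string | undefined>) {
  const names = ["GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "DISCORD_BOT_TOKEN"] as const;
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    googleClientId: env.GOOGLE_CLIENT_ID as string,
    googleClientSecret: env.GOOGLE_CLIENT_SECRET as string,
    discordBotToken: env.DISCORD_BOT_TOKEN as string,
  };
}

// One JSON line per action: who did what and when. No email content is
// logged, so the audit trail does not become a second copy of the inbox.
function auditEntry(user: string, action: Action, at: Date = new Date()): string {
  return JSON.stringify({ at: at.toISOString(), user, action });
}
```

A read-only assistant built this way would request a single scope, which keeps the Google consent screen short and limits the damage if a token ever leaks.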

AI agents are powerful, but production workflows need guardrails.

Watch the full tutorial

You can watch the full step-by-step tutorial here:

https://www.youtube.com/watch?v=FKrzVK1fqK4

Useful links:

Hexabot website: https://hexabot.ai
Hexabot documentation: https://docs.hexabot.ai
Hexabot GitHub: https://github.com/hexabot-ai/hexabot
Video tutorial: https://www.youtube.com/watch?v=FKrzVK1fqK4

Final thoughts

Building an AI assistant is easy when everything is a prompt.

Building a useful AI assistant requires more structure.

This Discord Gmail assistant is a simple example, but it shows an important idea: AI agents become much more reliable when they are combined with workflows, conditions, typed outputs, authentication handling, and controlled tool execution.

That is the difference between a cool demo and something you can actually build on top of.

If you are building AI automations, try thinking less in terms of “one big agent” and more in terms of “controlled execution flow.”

That is where things get interesting.
