DEV Community

Med Marrouchi

How I Built a Discord AI Assistant That Talks to Gmail

What if you could talk to your inbox directly from Discord?

Not just ask an AI to summarize emails, but actually trigger Gmail actions from a chat conversation:

  • Read unread emails
  • Reply to an existing email
  • Send a brand-new email
  • Handle Google authentication
  • Format the result back into a clean Discord message

That is exactly what I built in this video tutorial.

In this tutorial, I use Hexabot, a self-hosted AI chatbot and workflow automation platform, to build a Discord AI assistant that communicates with Gmail.

The goal is not just to connect an LLM to Gmail.

The goal is to show a more controlled way to build AI automation workflows: conversation, intent detection, conditional logic, tool execution, authentication handling, and response formatting.

Why not just use an MCP Gmail server?

Technically, we could have built this faster.

For example, we could configure an AI agent with an MCP connection to a Gmail MCP server and let the agent decide when to call Gmail tools.

That approach is useful, especially for quick prototypes.

But this tutorial takes a different route on purpose.

Instead of giving the agent full freedom over the execution flow, we build the workflow visually step by step. This gives us more control over what happens, when it happens, and how each decision is handled.

In other words, the AI does not directly “do everything.”

The workflow separates responsibilities:

  • The AI detects the user intent.
  • The workflow validates the intent.
  • The Gmail action performs the actual operation.
  • A conditional branch handles authentication.
  • Another AI step formats the final response for Discord.

This makes the assistant easier to debug, safer to extend, and more understandable for developers who want to build production-ready AI automations.

What the assistant can do

The Discord Gmail assistant supports three main use cases.

1. Read emails

Example prompt:

Show me the top two unread emails.

The assistant understands that the user wants to read emails, extracts the limit, calls Gmail, and returns a clean Discord-friendly summary.

2. Reply to an email

Example prompt:

Reply to this email saying: Thank you, well noted.

The assistant extracts the action as reply, identifies the target email, captures the reply text, and sends the response through Gmail.

3. Send a new email

Example prompt:

Send an email to someone@example.com about AI evolution during the past year.

The assistant detects that this is a new email, extracts the recipient, prepares the subject and content, sends the email, and confirms the result in Discord.

The workflow architecture

The full workflow follows this pattern:

Discord message
   ↓
AI Infer Intent
   ↓
Valid action?
   ├── No → Ask the user what they want to do
   └── Yes → Gmail Action
              ↓
              Gmail status?
              ├── 401 → Send Google sign-in button
              └── 200 → AI Formatter → Send Discord message

This is the most important part of the tutorial.

We are not building a black-box AI agent that randomly decides what to do. We are building a controlled execution flow where every important step is visible.
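
The diagram above can be read as ordinary control flow. Here is a minimal Python sketch of it, not the Hexabot implementation: `handle_message`, `GmailResult`, and the three callables are hypothetical names used for illustration, and the final fallback for other status codes is an assumption, since the diagram only covers 200 and 401.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GmailResult:
    status: int                  # 200 on success, 401 when Google sign-in is needed
    data: Optional[dict] = None  # raw Gmail payload, if any


def handle_message(
    text: str,
    infer_intent: Callable[[str], dict],
    gmail_action: Callable[[dict], GmailResult],
    format_reply: Callable[[dict], str],
) -> str:
    """Route one Discord message through the workflow branches."""
    intent = infer_intent(text)

    # Branch 1: no valid action, so ask the user instead of guessing.
    if intent.get("action") not in {"read", "reply", "new"}:
        return "Hello, how can I help you today?"

    result = gmail_action(intent)

    # Branch 2: the user has not authenticated with Google yet.
    if result.status == 401:
        return "[button] Sign in with Google"

    # Branch 3: success, hand the raw result to the formatter.
    if result.status == 200:
        return format_reply(result.data or {})

    # Assumption: any other status is reported rather than silently dropped.
    return f"Gmail request failed with status {result.status}."
```

Because every dependency is passed in, each branch can be exercised with simple stubs before any real Gmail or Discord call is involved.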

Step 1: Detect the user intent

The first step is an AI Infer Object action.

Instead of asking the LLM to return a free-form text answer, we ask it to return a structured object.

The schema includes fields like:

action
limit
targetMail
mailText
subject

The action field can be:

read
reply
new
empty

This acts as a contract between the AI and the workflow.

For example:

Show me the top two unread emails.

Becomes something like:

{
  "action": "read",
  "limit": "2"
}

And:

Reply to the last email saying thank you.

Becomes:

{
  "action": "reply",
  "mailText": "Thank you"
}

This is a powerful pattern because the AI is not responsible for executing Gmail operations directly. It only extracts the intent and the required fields.

The workflow decides what happens next.
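
To make the "contract" idea concrete, here is a small Python sketch of how a workflow could validate the structured object before trusting it. The field names come from the schema above; the `Intent` class, the `parse_intent` function, and the choice to coerce `limit` to an integer are assumptions for illustration, not Hexabot's own API.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_ACTIONS = {"read", "reply", "new", "empty"}


@dataclass
class Intent:
    action: str = "empty"
    limit: Optional[int] = None
    targetMail: Optional[str] = None
    mailText: Optional[str] = None
    subject: Optional[str] = None


def parse_intent(raw: str) -> Intent:
    """Turn the LLM's JSON output into an Intent, falling back to 'empty'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Intent()
    if not isinstance(data, dict):
        return Intent()

    # Anything outside the agreed action list is treated as "no intent".
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        action = "empty"

    # The model may return the limit as "2" or 2; normalize to an int.
    limit = data.get("limit")
    try:
        limit = int(limit) if limit is not None else None
    except (TypeError, ValueError):
        limit = None

    return Intent(
        action=action,
        limit=limit,
        targetMail=data.get("targetMail"),
        mailText=data.get("mailText"),
        subject=data.get("subject"),
    )
```

Malformed or unexpected output degrades to the `empty` action, so the rest of the workflow never sees an action it does not know how to handle.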

Step 2: Add a conditional branch

After intent detection, the workflow checks whether the AI detected a valid action.

If the action exists, the workflow continues to the Gmail step.

If the action is empty, the assistant sends a fallback message:

Hello, how can I help you today?

This prevents the assistant from guessing when the user request is unclear.

That matters a lot when the assistant has access to real actions like sending emails.
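
As a rough illustration, the branch can be expressed as a small Python function. The fallback text is the one used above; the `REQUIRED_FIELDS` table and the "missing field" message go one step further than the tutorial and are assumptions about what a stricter check could look like.

```python
from typing import Optional, Tuple

FALLBACK_MESSAGE = "Hello, how can I help you today?"

# Assumption: the fields each action needs before Gmail is called.
REQUIRED_FIELDS = {
    "read": [],
    "reply": ["mailText"],
    "new": ["targetMail", "mailText"],
}


def next_step(intent: dict) -> Tuple[str, Optional[str]]:
    """Return ("gmail", None) to continue, or ("fallback", message) to stop."""
    action = intent.get("action")
    if not isinstance(action, str) or action not in REQUIRED_FIELDS:
        return ("fallback", FALLBACK_MESSAGE)

    # Stop early if the model did not extract everything the action needs.
    missing = [name for name in REQUIRED_FIELDS[action] if not intent.get(name)]
    if missing:
        return ("fallback", "I need a bit more information: " + ", ".join(missing) + ".")

    return ("gmail", None)
```

For example, `next_step({"action": "new", "mailText": "hi"})` stops at the fallback branch because no recipient was extracted.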

Step 3: Execute the Gmail action

The next step is the Gmail action.

This is where the workflow maps the structured AI output into the Gmail operation:

mailText   → email body
subject    → email subject
targetMail → recipient or target email
action     → read, reply, or new
limit      → number of emails to fetch

The key idea is this:

The AI detects the intent.
The Gmail action executes the operation.

That separation is what makes the workflow more predictable.

You can inspect the values, debug the action, add conditions, add validation, or insert an approval step before sending emails.

For a production setup, I would strongly recommend adding a confirmation step before sending or replying to emails, especially if the assistant is used by a team.
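
The internals of the Gmail custom action are not covered here, so the following Python sketch only illustrates how the mapping could translate into Gmail REST API requests. `build_gmail_request` and `needs_confirmation` are hypothetical helpers, and the default limit of 5 is an assumption. The `reply` case is left out because it depends on how the target message is identified, which the tutorial does not specify.

```python
import base64
from email.message import EmailMessage


def needs_confirmation(intent: dict) -> bool:
    """Actions that send mail should pass through an approval step first."""
    return intent.get("action") in ("reply", "new")


def build_gmail_request(intent: dict) -> dict:
    """Describe the Gmail API call for an intent without executing it."""
    action = intent["action"]

    if action == "read":
        # messages.list accepts a search query and a result cap.
        limit = int(intent.get("limit") or 5)  # assumed default
        return {"operation": "list", "params": {"q": "is:unread", "maxResults": limit}}

    if action == "new":
        msg = EmailMessage()
        msg["To"] = intent["targetMail"]
        msg["Subject"] = intent.get("subject") or "(no subject)"
        msg.set_content(intent["mailText"])
        # messages.send expects the full message, base64url-encoded, in "raw".
        raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
        return {"operation": "send", "body": {"raw": raw}}

    raise ValueError(f"Unsupported action in this sketch: {action}")
```

Keeping the request description separate from its execution is what makes it easy to inspect the values or insert an approval step before anything is sent.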

Step 4: Handle Google authentication

The Gmail action can return different statuses.

If the status is 200, the Gmail operation worked.

If the status is 401, the user still needs to authenticate with Google.

So the workflow handles the authentication case separately.

When authentication is required, the assistant sends a Discord button:

Sign in with Google

The button links to the Google OAuth flow.

This makes the user experience much smoother. The user can start from Discord, receive the authentication button, sign in with Google, and continue the conversation.
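
For readers curious about what sits behind such a button, here is a hedged Python sketch. It builds a Google OAuth 2.0 authorization URL and wraps it in a Discord link button (an action row containing a button with the link style). How Hexabot generates this internally is not shown in the tutorial; the function names, the message text, and the chosen scopes are assumptions.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# Assumed scopes: read the inbox and send mail, nothing broader.
GMAIL_SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.send",
]


def build_auth_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL that starts the Google authorization code flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(GMAIL_SCOPES),
        "access_type": "offline",  # ask for a refresh token
        "state": state,            # ties the callback to this Discord user
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


def sign_in_message(auth_url: str) -> dict:
    """Discord message payload with a single link button."""
    return {
        "content": "Please connect your Google account to continue.",
        "components": [
            {
                "type": 1,  # action row
                "components": [
                    {"type": 2, "style": 5, "label": "Sign in with Google", "url": auth_url}
                ],
            }
        ],
    }
```

The `state` value is what lets the OAuth callback associate the returned token with the Discord user who clicked the button.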

Step 5: Format the response for Discord

Once the Gmail action succeeds, the raw Gmail result still needs to be formatted.

For that, the workflow uses an AI Agent step called AI Formatter.

Its job is simple:

  • Take the Gmail result
  • Keep the useful information
  • Format it as a Discord-friendly message
  • Avoid unnecessary explanations

This is another good design choice.

The Gmail action handles the tool call.
The AI formatter handles presentation.
The final message is sent back to Discord.

Again, each step has a clear responsibility.
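
A minimal Python sketch of the formatting step might look like this. The prompt wording is an assumption, not the prompt used in the video, and both function names are hypothetical. The length guard reflects Discord's 2000-character limit on message content.

```python
import json

DISCORD_MAX_CHARS = 2000  # Discord rejects longer message content


def build_formatter_prompt(gmail_result: dict) -> str:
    """Assemble the instruction given to the formatting model."""
    return (
        "Format the Gmail result below as a short Discord message.\n"
        "Keep only the sender, the subject, and a one-line summary per email.\n"
        "Do not add explanations.\n\n"
        + json.dumps(gmail_result, indent=2)
    )


def clamp_for_discord(text: str, limit: int = DISCORD_MAX_CHARS) -> str:
    """Trim the model's answer so Discord will accept it."""
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis marker.
    return text[: limit - 1].rstrip() + "…"
```

The clamp is deterministic on purpose: the model handles presentation, but the hard platform limit is enforced in plain code.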

Step 6: Connect the workflow to Discord

After the workflow is ready, we connect it to the Discord channel.

In Hexabot, the Discord channel is configured with:

  • The target workflow
  • The Discord bot token
  • The application ID

Once connected, messages sent to the Discord bot trigger the Gmail assistant workflow.

At that point, the assistant is ready to test directly inside Discord.

Demo: speaking with your inbox

In the demo, I test three real use cases.

First, I ask the assistant to access Gmail. Since the user is not authenticated yet, the workflow sends the Google sign-in button.

Then I test reading unread emails:

Show me the top two unread emails.

The assistant fetches the emails and returns a clean summary inside Discord.

Next, I test replying to an email:

Reply to this email saying: Thank you, well noted.

The assistant sends the reply and confirms the action.

Finally, I test sending a brand-new email.

The assistant prepares the email, sends it through Gmail, and returns the result in Discord.

At this point, the assistant is no longer just answering questions. It is taking real actions through Gmail.

Why this pattern matters

The interesting part of this tutorial is not only Gmail.

The real value is the workflow pattern:

Conversation → Intent Detection → Tool Calling → Authentication → Formatting → Response

You can reuse the same pattern for many other automations:

  • CRM updates
  • Support inbox triage
  • Calendar scheduling
  • Internal admin tools
  • Lead qualification
  • Notification workflows
  • Reporting assistants

This is where AI automation becomes more practical.

Instead of building a fully autonomous agent and hoping it behaves correctly, you can build a workflow where the AI is used where it makes sense, while the business logic stays explicit and visible.

A note about custom actions

This tutorial focuses on building the workflow visually from the Hexabot editor.

It does not cover how to develop the Gmail custom action itself.

The source code for the project is linked in the YouTube description, and custom action development will be covered in a separate video.

So if your goal is to understand how to build the workflow, this tutorial is for you.

If your goal is to create new custom actions from scratch, stay tuned for the follow-up.

Security notes

Because this workflow connects to Gmail, security matters.

Before publishing or deploying something similar, make sure to:

  • Never commit API keys or OAuth client secrets
  • Use environment variables or secure credential storage
  • Limit OAuth scopes to what your assistant actually needs
  • Add a confirmation step before sending emails in production
  • Log important actions for auditability
  • Handle errors and expired sessions properly
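
Two of the points above, keeping secrets out of the code and logging actions, can be sketched in a few lines of Python. The environment variable names and both helpers are hypothetical; adapt them to however your deployment stores credentials.

```python
import logging
import os
from typing import Mapping

logger = logging.getLogger("gmail_assistant.audit")

# Assumed variable names, not ones defined by Hexabot.
REQUIRED_SECRETS = ["DISCORD_BOT_TOKEN", "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET"]


def load_secrets(env: Mapping[str, str] = os.environ) -> dict:
    """Read credentials from the environment and fail fast if any is missing."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}


def audit(user_id: str, action: str, status: int) -> str:
    """Log who did what and how it ended, without the email body."""
    line = f"user={user_id} action={action} status={status}"
    logger.info(line)
    return line
```

Failing at startup when a secret is missing is usually easier to diagnose than a 401 that only appears once a user sends their first message.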

AI agents are powerful, but production workflows need guardrails.

Watch the full tutorial

You can watch the full step-by-step tutorial here:

https://www.youtube.com/watch?v=FKrzVK1fqK4

Final thoughts

Building an AI assistant is easy when everything is a prompt.

Building a useful AI assistant requires more structure.

This Discord Gmail assistant is a simple example, but it shows an important idea: AI agents become much more reliable when they are combined with workflows, conditions, typed outputs, authentication handling, and controlled tool execution.

That is the difference between a cool demo and something you can actually build on top of.

If you are building AI automations, try thinking less in terms of “one big agent” and more in terms of “controlled execution flow.”

That is where things get interesting.
