I Built My First Discord AI Bot and Here's What Blew My Mind
Three months ago I graduated from a coding bootcamp. Last week I shipped my first Discord bot that talks to an AI. I genuinely did not think I would be writing about something like this so soon, and I'm still a little stunned at how cheap it ended up being.
Let me back up. My friends run a small Discord server for our book club. We have about 40 people in there. Someone asked if we could get a bot that would summarize articles people pasted into the channel, because apparently nobody in our group has the attention span to read a 2000-word blog post anymore. Including me, honestly.
So I thought, how hard could this be? I've been using OpenAI's API for a few small side projects. I know how to make HTTP requests. I know how to set up a Discord bot. This should take me a weekend.
Reader, it took me a weekend. But only because I stumbled onto something called Global API that completely changed what I thought this project was going to cost.
The Pricing Thing I Was Not Prepared For
Here's where I need to slow down, because this is the part that genuinely shocked me.
When I was in bootcamp, my instructors drilled one thing into our heads. AI is expensive. Don't put it in anything you're not sure will make money. Use the free tiers. Cache aggressively. Only call the model when you absolutely have to.
And honestly? They were right. If you go to OpenAI directly and use GPT-4o, you're paying $2.50 per million input tokens and $10.00 per million output tokens. I plugged my expected usage into a calculator at like 2am and almost closed my laptop. For a book club bot? That's insane money to spend.
So I went hunting. I had no idea there were this many options out there. I kept finding blog posts that said things like "use DeepSeek" or "try Qwen" and I kept clicking through and getting confused by all these different APIs with different auth schemes and different SDKs.
Then I found Global API. And I was shocked.
They have 184 different AI models all accessible through one endpoint. That's not a typo. One hundred and eighty-four. And the prices range from $0.01 all the way up to $3.50 per million tokens. Some of these models cost literally pennies compared to what I was budgeting for.
I sat there for a solid ten minutes just refreshing their model list. It blew my mind that this was something I had never heard mentioned once during my entire bootcamp.
The Models I Actually Considered
Here's the breakdown of what I looked at. I'm going to give you the actual prices because these numbers are what convinced me this whole thing was doable.
GPT-4o, the one I was originally going to use, sits at $2.50 input and $10.00 output per million tokens. It has a 128K context window which is plenty for summarizing articles. But that output price. Yikes.
Then I looked at the alternatives:
DeepSeek V4 Flash comes in at $0.27 input and $1.10 output, with a 128K context window. That's about a ninth of GPT-4o's price on both input and output.
DeepSeek V4 Pro is $0.55 input and $2.20 output, but you get a 200K context window. For a summarization bot this is overkill but I liked knowing it was there.
Qwen3-32B is $0.30 input and $1.20 output. The context is smaller at 32K, but most articles fit comfortably in that.
GLM-4 Plus was another option I hadn't heard of before. $0.20 input and $0.80 output, 128K context. Even cheaper than DeepSeek Flash.
I ended up going with DeepSeek V4 Flash because the quality felt great in my testing and the price was already so low that I didn't need to optimize further. But just knowing I had options like GLM-4 Plus for the really simple stuff felt like a safety net.
The Code Part (Where I Actually Built It)
Okay so here's where I get to feel useful for a second. The actual code is surprisingly short. I was expecting to have to write a ton of stuff to handle different API formats for different providers, but Global API uses the OpenAI-compatible interface, which means I could just use the openai Python library I'm already familiar with.
Here's the core of my bot:
import os

import openai

# Point the standard OpenAI client at Global API's OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ["GLOBAL_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Your prompt"}],
)
That's it. That's the whole integration. I had no idea this would be so clean. The only weird thing was the model name format, which has the company prefix in it, but once I got past that it was smooth sailing.
For the actual Discord part, I used discord.py, which is what my bootcamp instructor recommended for bot projects. The full flow looks roughly like this:
import os

import discord
import openai

intents = discord.Intents.default()
# Needed to read message text; also has to be enabled in the Discord developer portal.
intents.message_content = True
client = discord.Client(intents=intents)

# Async client, so a slow API call doesn't block the bot's event loop.
ai_client = openai.AsyncOpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ["GLOBAL_API_KEY"],
)


@client.event
async def on_message(message):
    # Ignore other bots (and the bot itself) to avoid reply loops.
    if message.author.bot:
        return
    if message.content.startswith("!summarize"):
        # Everything after the command is treated as the article text.
        article = message.content[len("!summarize"):].strip()
        response = await ai_client.chat.completions.create(
            model="deepseek-ai/DeepSeek-V4-Flash",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that summarizes articles concisely."},
                {"role": "user", "content": f"Summarize this: {article}"},
            ],
        )
        await message.channel.send(response.choices[0].message.content)


client.run(os.environ["DISCORD_TOKEN"])
I wrote this, ran it, and it worked. The first message my bot summarized was about the history of paper clips. My friends were weirdly impressed. I was weirdly proud.
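One practical wrinkle with sending model output straight to a channel: Discord caps a regular message at 2,000 characters, so a long summary can be rejected. Here's a minimal sketch of a helper that splits text into sendable chunks, preferring to break at spaces. The name `split_for_discord` is hypothetical (it is not part of discord.py), and the splitting strategy is just one reasonable choice.

```python
DISCORD_LIMIT = 2000  # Discord's character cap for a regular message


def split_for_discord(text: str, limit: int = DISCORD_LIMIT) -> list:
    """Split text into chunks of at most `limit` characters, breaking at spaces when possible."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Find the last space that still keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

In the handler, this would probably look like looping over `split_for_discord(summary)` and calling `await message.channel.send(chunk)` for each piece.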
The Stuff I Wish Someone Told Me In Bootcamp
Once I had the bot working, I started reading more about how to actually run something like this responsibly. Here's the stuff I picked up that I think every beginner should know.
First, caching. If two people paste the same article, you're hitting the API twice for nothing. Setting up even a basic cache with a 40% hit rate saves real money. I used a simple dictionary in memory to start, and I'm planning to move to Redis when this gets bigger.
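A minimal sketch of that in-memory dictionary cache, assuming the article text itself is the thing to key on. The names `cache_key` and `get_or_summarize` are made up for illustration, and `summarize` stands in for whatever function actually calls the API.

```python
import hashlib

# In-memory cache: lost on restart, which is fine for a small bot.
_summary_cache = {}


def cache_key(article: str) -> str:
    """Hash the whitespace-normalized text so identical pastes map to the same key."""
    normalized = " ".join(article.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def get_or_summarize(article: str, summarize) -> str:
    """Return the cached summary if there is one; otherwise call `summarize` once and store it."""
    key = cache_key(article)
    if key not in _summary_cache:
        _summary_cache[key] = summarize(article)
    return _summary_cache[key]
```

Normalizing whitespace before hashing means the same article pasted with a stray trailing newline still counts as a hit. Swapping the dictionary for Redis later should only change where the lookup and store happen.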
Second, streaming. Instead of waiting for the whole response and sending it all at once, you can stream the tokens as they come back. This makes the bot feel way more responsive even though the total time is similar. Average latency I measured was around 1.2 seconds to first token, with throughput around 320 tokens per second. Streaming makes that 1.2 second wait feel like nothing.
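With the openai library, passing `stream=True` to `chat.completions.create` returns chunks whose new text lives in `chunk.choices[0].delta.content`. On the Discord side, editing a message on every single token would likely run into rate limits, so it helps to batch updates. Here's a rough sketch of that batching step on its own; `batched_updates` and the 80-character threshold are my own assumptions, not anything from either library.

```python
from typing import Iterable, Iterator, Optional


def batched_updates(deltas: Iterable[Optional[str]], min_chars: int = 80) -> Iterator[str]:
    """Yield the accumulated text each time at least `min_chars` new characters have arrived."""
    text = ""
    last_sent = 0
    for delta in deltas:
        if not delta:  # streamed chunks can carry empty or None content
            continue
        text += delta
        if len(text) - last_sent >= min_chars:
            last_sent = len(text)
            yield text
    if len(text) > last_sent:
        yield text  # flush whatever arrived after the last update
```

Each yielded string is the full text so far, so the bot could send the first one as a new message and pass the later ones to `await sent_message.edit(content=...)`.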
Third, use cheaper models for simple stuff. Global API has this tier called GA-Economy that gives you about 50% cost reduction for queries that don't need a fancy model. For things like "is this message spam?" you don't need GPT-4o. You need anything that can do basic classification.
Fourth, monitor quality. I added a simple reaction-based feedback system where users can react with 👍 or 👎 to a bot response. This gave me an 84.6% satisfaction rate based on my early numbers, which matched the benchmark scores I'd seen for DeepSeek V4 Flash.
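The satisfaction number is just thumbs up divided by total reactions. A tiny sketch, with a hypothetical function name and illustrative counts (not the real tallies from my server):

```python
from typing import Optional


def satisfaction_rate(thumbs_up: int, thumbs_down: int) -> Optional[float]:
    """Percentage of positive reactions, or None when nobody has reacted yet."""
    total = thumbs_up + thumbs_down
    if total == 0:
        return None
    return 100 * thumbs_up / total


# Illustrative counts only: 11 thumbs up and 2 thumbs down.
print(f"{satisfaction_rate(11, 2):.1f}%")  # 84.6%
```

With only a handful of reactions, a single vote moves the percentage a lot, so it's worth tracking the raw counts alongside it.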
Fifth, have a fallback. Sometimes APIs rate limit you. Sometimes they go down. I added a try/except that falls back to a simpler response if the API fails. The bot still feels useful even when the smart stuff breaks.
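Here's a minimal sketch of that try/except idea. The retry loop with backoff is an addition beyond the plain fallback described above, and `summarize_with_fallback`, `FALLBACK_TEXT`, and the default retry counts are all hypothetical choices. `summarize` again stands in for the function that calls the API.

```python
import time

FALLBACK_TEXT = "Sorry, I couldn't summarize that right now. Please try again in a bit."


def summarize_with_fallback(article, summarize, retries=2, delay=1.0, sleep=time.sleep):
    """Try `summarize` up to `retries + 1` times, then return a canned message."""
    for attempt in range(retries + 1):
        try:
            return summarize(article)
        except Exception:
            # A real bot should log this and probably catch narrower exception
            # types, such as rate-limit or connection errors.
            if attempt < retries:
                sleep(delay * (2 ** attempt))  # wait 1s, 2s, ... between attempts
    return FALLBACK_TEXT
```

This version is synchronous to keep the idea visible. Inside an async Discord handler, the same shape would use `await asyncio.sleep(...)` and an awaited API call so the bot stays responsive while it waits.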
The Numbers That Made Me A Believer
Let me put this in perspective with actual math, because this is what sold me.
My book club bot, in its first week, processed about 200 article summaries. Average input was maybe 4000 tokens per article. Average output was around 500 tokens per summary.
If I had used GPT-4o, my weekly cost would have been:
- Input: 200 × 4000 = 800,000 tokens × $2.50/M = $2.00
- Output: 200 × 500 = 100,000 tokens × $10.00/M = $1.00
- Total: $3.00 per week
With DeepSeek V4 Flash, my weekly cost was:
- Input: 800,000 tokens × $0.27/M = $0.216
- Output: 100,000 tokens × $1.10/M = $0.11
- Total: $0.326 per week
That's about 89% cheaper ($0.326 versus $3.00). The article I read on Global API's blog mentioned a 40-65% cost reduction for typical platform workloads, but my case was even more dramatic because I happened to pick a model that was way more affordable than GPT-4o.
For a book club with 40 people and modest usage, I'm spending basically nothing. If usage exploded ten times over, I'd still be at about $3.26 a week, compared with roughly $30 on GPT-4o. That math just does not work with GPT-4o.
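The arithmetic above fits in a few lines of Python, which makes it easy to plug in your own usage. The prices are the per-million-token figures quoted earlier in this post; the function name and defaults are just my choices for the sketch.

```python
# (input price, output price) in dollars per million tokens, as quoted in this post.
PRICES_PER_MILLION = {
    "GPT-4o": (2.50, 10.00),
    "DeepSeek V4 Flash": (0.27, 1.10),
}


def weekly_cost(model, summaries=200, input_tokens=4000, output_tokens=500):
    """Estimated weekly spend in dollars for a given model and usage level."""
    input_price, output_price = PRICES_PER_MILLION[model]
    input_cost = summaries * input_tokens / 1_000_000 * input_price
    output_cost = summaries * output_tokens / 1_000_000 * output_price
    return input_cost + output_cost


gpt4o = weekly_cost("GPT-4o")
flash = weekly_cost("DeepSeek V4 Flash")
print(f"GPT-4o:            ${gpt4o:.3f}")   # $3.000
print(f"DeepSeek V4 Flash: ${flash:.3f}")   # $0.326
print(f"Savings:           {100 * (1 - flash / gpt4o):.1f}%")  # 89.1%
```

Calling `weekly_cost("DeepSeek V4 Flash", summaries=2000)` gives the ten-times-usage figure of about $3.26.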
Setup Time Was Wildly Underestimated (In A Good Way)
The article I read claimed you can get set up in under 10 minutes using Global API's unified SDK. I thought that was marketing speak. It's basically true.
Signing up took a couple minutes. Getting an API key was instant. The first API call I made, from my terminal, worked on the first try. That never happens to me. Usually I'm debugging for an hour before something works.
The whole thing from "I should build this" to "my friends are using it" was one afternoon. I had no idea a project like this could move that fast.
Where I Am Now
The bot has been live for about a week. People use it. Nobody has complained. A few people have said nice things about it, which is a feeling I am not used to getting from software I wrote.
I'm planning to add a few features:
- A "key points" mode that pulls out bullet points instead of a paragraph
- A question mode where you can ask follow-up questions about the article
- Maybe a weekly digest that summarizes the most-discussed articles
Each of these is going to add API calls. And each one is going to make me grateful that I found an option that doesn't charge like a luxury service.
If you're a bootcamp grad like me, or just someone who's been too scared to put AI in their side projects because of the cost, I genuinely think you should check out Global API. Their site is at global-apis.com (the /v1 path is the API endpoint, not a pricing page), and they have 100 free credits to start, which is more than enough to run a small bot for a while and see if you like it.
I'm not being paid to say any of this. I'm just a person who was genuinely shocked at how much cheaper this was than what I expected, and I figured I'd write it down while the feeling was still fresh.
Anyway. That's my Discord bot story. If you want to build one too, you absolutely can. It's easier and cheaper than I thought it would be. That's the whole post.