DEV Community

agenthustler
agenthustler

Posted on

How to Scrape Discord: Servers, Messages, and User Data

Discord has become a major hub for communities. This guide covers legitimate approaches to collecting Discord data using bots and the official API.

Important: Use the Official API

Discord explicitly prohibits scraping via browser automation or self-bots. The correct approach is to use Discord's Bot API or OAuth2. This guide focuses entirely on legitimate methods.

Setting Up a Discord Bot

Create a bot at discord.com/developers:

  1. Create a new application
  2. Go to "Bot" tab and create a bot
  3. Enable necessary intents (Message Content, Server Members)
  4. Copy the bot token
pip install discord.py aiohttp
Enter fullscreen mode Exit fullscreen mode

Building a Data Collection Bot

import discord
from discord.ext import commands
import json
from datetime import datetime

class DataCollectorBot(commands.Bot):
    def __init__(self):
        intents = discord.Intents.default()
        intents.message_content = True
        intents.members = True
        super().__init__(command_prefix="!", intents=intents)

    async def on_ready(self):
        print(f"Bot connected as {self.user}")
        print(f"In {len(self.guilds)} servers")

bot = DataCollectorBot()
Enter fullscreen mode Exit fullscreen mode

Collecting Server Information

@bot.command()
async def server_info(ctx):
    guild = ctx.guild
    data = {
        "name": guild.name,
        "id": guild.id,
        "member_count": guild.member_count,
        "created_at": guild.created_at.isoformat(),
        "channels": len(guild.channels),
        "roles": len(guild.roles),
        "boost_level": guild.premium_tier,
        "categories": [{
            "name": cat.name,
            "channels": [ch.name for ch in cat.channels]
        } for cat in guild.categories]
    }
    await ctx.send(f"Collected data for {guild.name}")
    return data
Enter fullscreen mode Exit fullscreen mode

Collecting Message History

@bot.command()
async def collect_messages(ctx, channel_name: str, limit: int = 100):
    channel = discord.utils.get(ctx.guild.channels, name=channel_name)
    if not channel:
        await ctx.send(f"Channel '{channel_name}' not found")
        return

    messages = []
    async for message in channel.history(limit=limit):
        messages.append({
            "author": str(message.author),
            "content": message.content,
            "timestamp": message.created_at.isoformat(),
            "reactions": [{
                "emoji": str(r.emoji), "count": r.count
            } for r in message.reactions],
            "attachments": [a.url for a in message.attachments]
        })

    filename = f"messages_{channel_name}_{datetime.now():%Y%m%d}.json"
    with open(filename, "w") as f:
        json.dump(messages, f, indent=2)
    await ctx.send(f"Collected {len(messages)} messages from #{channel_name}")
Enter fullscreen mode Exit fullscreen mode

Tracking Server Activity

class ActivityTracker:
    def __init__(self):
        self.hourly_activity = {}
        self.active_users = {}
        self.popular_channels = {}

    def record_message(self, message):
        hour = message.created_at.hour
        self.hourly_activity[hour] = self.hourly_activity.get(hour, 0) + 1
        author = str(message.author)
        self.active_users[author] = self.active_users.get(author, 0) + 1
        channel = message.channel.name
        self.popular_channels[channel] = self.popular_channels.get(channel, 0) + 1

    def get_report(self):
        return {
            "peak_hour": max(self.hourly_activity, key=self.hourly_activity.get)
                if self.hourly_activity else None,
            "top_users": sorted(self.active_users.items(),
                key=lambda x: x[1], reverse=True)[:10],
            "top_channels": sorted(self.popular_channels.items(),
                key=lambda x: x[1], reverse=True)[:10]
        }

tracker = ActivityTracker()

@bot.event
async def on_message(message):
    if not message.author.bot:
        tracker.record_message(message)
    await bot.process_commands(message)
Enter fullscreen mode Exit fullscreen mode

Exporting Data

import csv

@bot.command()
async def export(ctx, channel_name: str):
    channel = discord.utils.get(ctx.guild.channels, name=channel_name)
    messages = []
    async for message in channel.history(limit=500):
        messages.append({
            "author": str(message.author),
            "content": message.content,
            "timestamp": message.created_at.isoformat()
        })

    filename = f"{channel_name}_export.csv"
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["author", "content", "timestamp"])
        writer.writeheader()
        writer.writerows(messages)
    await ctx.send(file=discord.File(filename))
Enter fullscreen mode Exit fullscreen mode

Running the Bot

import os

if __name__ == "__main__":
    token = os.environ.get("DISCORD_BOT_TOKEN")
    if not token:
        print("Set DISCORD_BOT_TOKEN environment variable")
        exit(1)
    bot.run(token)
Enter fullscreen mode Exit fullscreen mode

Scaling Your Data Collection

ScraperAPI can help when fetching external links shared in Discord. ThorData provides proxy support for processing URLs from messages. Monitor with ScrapeOps.

Rules and Ethics

  1. Always use the official Bot API — never self-bots or browser automation
  2. Get server admin permission before collecting data
  3. Respect Discord's Terms of Service and rate limits
  4. Don't collect DMs without explicit consent
  5. Handle personal data carefully — comply with GDPR/CCPA
  6. Be transparent — let members know about data collection
  7. Never sell personal data from Discord

Conclusion

Discord data collection should always go through the official Bot API. Build your bot with transparency, get proper permissions, and always prioritize user privacy over data quantity.

Top comments (0)