agenthustler

Posted on Mar 26

How to Scrape Discord: Servers, Messages, and User Data

#python #tutorial #webdev #programming

Discord has become a major hub for communities. This guide covers legitimate approaches to collecting Discord data using bots and the official API.

Important: Use the Official API

Discord explicitly prohibits scraping via browser automation or self-bots. The correct approach is to use Discord's Bot API or OAuth2. This guide focuses entirely on legitimate methods.

Setting Up a Discord Bot

Create a bot at discord.com/developers:

Create a new application
Go to "Bot" tab and create a bot
Enable necessary intents (Message Content, Server Members)
Copy the bot token

pip install discord.py aiohttp

Building a Data Collection Bot

import discord
from discord.ext import commands
import json
from datetime import datetime

class DataCollectorBot(commands.Bot):
    def __init__(self):
        intents = discord.Intents.default()
        intents.message_content = True
        intents.members = True
        super().__init__(command_prefix="!", intents=intents)

    async def on_ready(self):
        print(f"Bot connected as {self.user}")
        print(f"In {len(self.guilds)} servers")

bot = DataCollectorBot()

Collecting Server Information

@bot.command()
async def server_info(ctx):
    guild = ctx.guild
    data = {
        "name": guild.name,
        "id": guild.id,
        "member_count": guild.member_count,
        "created_at": guild.created_at.isoformat(),
        "channels": len(guild.channels),
        "roles": len(guild.roles),
        "boost_level": guild.premium_tier,
        "categories": [{
            "name": cat.name,
            "channels": [ch.name for ch in cat.channels]
        } for cat in guild.categories]
    }
    await ctx.send(f"Collected data for {guild.name}")
    return data

Collecting Message History

@bot.command()
async def collect_messages(ctx, channel_name: str, limit: int = 100):
    channel = discord.utils.get(ctx.guild.channels, name=channel_name)
    if not channel:
        await ctx.send(f"Channel '{channel_name}' not found")
        return

    messages = []
    async for message in channel.history(limit=limit):
        messages.append({
            "author": str(message.author),
            "content": message.content,
            "timestamp": message.created_at.isoformat(),
            "reactions": [{
                "emoji": str(r.emoji), "count": r.count
            } for r in message.reactions],
            "attachments": [a.url for a in message.attachments]
        })

    filename = f"messages_{channel_name}_{datetime.now():%Y%m%d}.json"
    with open(filename, "w") as f:
        json.dump(messages, f, indent=2)
    await ctx.send(f"Collected {len(messages)} messages from #{channel_name}")

Tracking Server Activity

class ActivityTracker:
    def __init__(self):
        self.hourly_activity = {}
        self.active_users = {}
        self.popular_channels = {}

    def record_message(self, message):
        hour = message.created_at.hour
        self.hourly_activity[hour] = self.hourly_activity.get(hour, 0) + 1
        author = str(message.author)
        self.active_users[author] = self.active_users.get(author, 0) + 1
        channel = message.channel.name
        self.popular_channels[channel] = self.popular_channels.get(channel, 0) + 1

    def get_report(self):
        return {
            "peak_hour": max(self.hourly_activity, key=self.hourly_activity.get)
                if self.hourly_activity else None,
            "top_users": sorted(self.active_users.items(),
                key=lambda x: x[1], reverse=True)[:10],
            "top_channels": sorted(self.popular_channels.items(),
                key=lambda x: x[1], reverse=True)[:10]
        }

tracker = ActivityTracker()

@bot.event
async def on_message(message):
    if not message.author.bot:
        tracker.record_message(message)
    await bot.process_commands(message)

Exporting Data

import csv

@bot.command()
async def export(ctx, channel_name: str):
    channel = discord.utils.get(ctx.guild.channels, name=channel_name)
    messages = []
    async for message in channel.history(limit=500):
        messages.append({
            "author": str(message.author),
            "content": message.content,
            "timestamp": message.created_at.isoformat()
        })

    filename = f"{channel_name}_export.csv"
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["author", "content", "timestamp"])
        writer.writeheader()
        writer.writerows(messages)
    await ctx.send(file=discord.File(filename))

Running the Bot

import os

if __name__ == "__main__":
    token = os.environ.get("DISCORD_BOT_TOKEN")
    if not token:
        print("Set DISCORD_BOT_TOKEN environment variable")
        exit(1)
    bot.run(token)

Scaling Your Data Collection

ScraperAPI can help when fetching external links shared in Discord. ThorData provides proxy support for processing URLs from messages. Monitor with ScrapeOps.

Rules and Ethics

Always use the official Bot API — never self-bots or browser automation
Get server admin permission before collecting data
Respect Discord's Terms of Service and rate limits
Don't collect DMs without explicit consent
Handle personal data carefully — comply with GDPR/CCPA
Be transparent — let members know about data collection
Never sell personal data from Discord

Conclusion

Discord data collection should always go through the official Bot API. Build your bot with transparency, get proper permissions, and always prioritize user privacy over data quantity.

DEV Community