Olaniyi Olabode

Posted on Nov 28

📝How I Built a Fully Automated Telegram Moderation Bot — An Engineering Case Study

#webdev #programming #development #telegram

I built a Telegram bot that automatically detects and handles duplicate messages (spam), promotional messages, and blacklisted keywords; logs every action; tracks infractions in Redis; and provides a clean admin surface (planned) for whitelist and keyword management. It runs on Node + TypeScript, uses Telegraf for Telegram, PostgreSQL for persistent data, and Redis for fast infra counters.

This article is a detailed, step by step breakdown of the problem, architecture, implementation, testing, monitoring, and deployment. It’s written as a technical case study so you and other engineers can reproduce, audit, or extend the system.

Table of contents

Why I built this
Goals & constraints
Tech stack
High-level architecture (diagram)
Data model (schema.sql)
Core modules — explanation + code snippets
Message flow
ensureChat / ensureUser (DB helpers)
Duplicate detector (Redis + normalization)
Promo detector
Keyword blacklist
Audit logging
Infractions and auto-ban
How I tested each feature (step-by-step)
Monitoring & alerts (production-lite approach)
Admin frontend (concept + API contract)
Deployment notes (free options)
Security & hardening notes
Future improvements & roadmap
Appendix: useful commands and snippets

1. Why I built this

Telegram groups scale quickly and inevitably attract spam, unwelcome promotions, or repeated posts that lower signal to noise ratio. The goal was a resilient, minimal, low cost system that:

Protects the group from repeated posts (spam) and ads,
Logs everything for auditability,
Lets maintainers whitelist trusted users,
Remains free-hosting friendly and easy to operate.

2. Goals & constraints

Functional: detect duplicates, detect promos, block blacklisted keywords, warn/delete/ban, persist logs.
Operational: work on free or cheap infrastructure, be robust, and recoverable.
Design: simple DB schema, Redis for ephemeral counters, code in TypeScript, clean structure for future admin UI.

3. Tech stack

Language: Node.js + TypeScript
Telegram SDK: Telegraf
DB: PostgreSQL (persistent storage)
Cache / counters: Redis (fast infra counters + TTL)
Dev tooling: ts-node-dev for development, tsc for build
Optional admin frontend: React + Vite + Tailwind (planned)
Hosting: Docker locally for dev; free cloud choices: Render / Railway / Fly / Vercel for frontend

Notes:

DB stores persistent state (chats, users, policies, whitelist, keywords, audit logs).

Redis stores ephemeral infra counters with TTL (e.g., 24 hours).

The admin API reads/writes DB and talks to Redis to reset infra keys.

5. Data model (schema.sql)

This is the schema used in my project (run it against your Postgres):

-- chats
CREATE TABLE IF NOT EXISTS chats (
  chat_id BIGINT PRIMARY KEY,
  title TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- users
CREATE TABLE IF NOT EXISTS users (
  user_id BIGINT PRIMARY KEY,
  first_name TEXT,
  last_name TEXT,
  username TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- policies
CREATE TABLE IF NOT EXISTS policies (
  chat_id BIGINT PRIMARY KEY REFERENCES chats(chat_id) ON DELETE CASCADE,
  duplicate_threshold INT DEFAULT 3,
  created_at TIMESTAMP DEFAULT NOW()
);

-- whitelist
CREATE TABLE IF NOT EXISTS whitelist (
  chat_id BIGINT REFERENCES chats(chat_id) ON DELETE CASCADE,
  user_id BIGINT REFERENCES users(user_id) ON DELETE CASCADE,
  PRIMARY KEY (chat_id, user_id)
);

-- audit logs
CREATE TABLE IF NOT EXISTS audit_logs (
  id SERIAL PRIMARY KEY,
  chat_id BIGINT REFERENCES chats(chat_id) ON DELETE CASCADE,
  user_id BIGINT REFERENCES users(user_id) ON DELETE CASCADE,
  message_id BIGINT,
  rule_triggered TEXT,
  action_taken TEXT,
  message_text TEXT,
  meta JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

-- keyword blacklist
CREATE TABLE IF NOT EXISTS keyword_blacklist (
  chat_id BIGINT REFERENCES chats(chat_id) ON DELETE CASCADE,
  keyword TEXT,
  PRIMARY KEY(chat_id, keyword)
);

6. Core modules — explanation + code snippets

I’ll walk through the main parts of the code and explain why they were designed that way. (These are extracts adapted to be readable the full project contains types and helpers.)

6.1 Message flow (handleMessage)

Every chat message passes through this handler:

Persist chat/user
Skip if whitelisted
Duplicate detection
Promo detection
Keyword blacklist
Execute action (delete/warn/ban)
Log action to audit_logs

Key points: always use parameterized SQL queries, protect against missing fields (ctx.chat might be undefined in some update types), and catch errors.

Example (simplified handler):

export async function handleMessage(ctx: Context) {
  const chatId = ctx.chat?.id;
  const msg = ctx.message;
  if (!chatId || !msg || !("text" in msg) || !msg.text) return;

  const text = msg.text;
  const user = msg.from;
  if (!user) return;

  await ensureChat(chatId, msg.chat.title);
  await ensureUser(user);

  // Whitelist check
  const wl = await query('SELECT 1 FROM whitelist WHERE chat_id=$1 AND user_id=$2', [chatId, user.id]);
  if (wl.rowCount > 0) return;

  // Duplicate detection + infra counter
  // Promo detection
  // Keyword blacklist
  // Logging and actions...
}

6.2 ensureChat / ensureUser (db helpers)

These upsert basic metadata so other tables can reference them.

export async function ensureChat(chatId: number, title?: string) {
  await query(
    `INSERT INTO chats (chat_id, title) VALUES ($1, $2)
     ON CONFLICT (chat_id) DO UPDATE SET title = EXCLUDED.title`,
    [chatId, title || null]
  );
}

export async function ensureUser(user: any) {
  if (!user) return;
  await query(
    `INSERT INTO users (user_id, username, first_name, last_name)
     VALUES ($1, $2, $3, $4)
     ON CONFLICT (user_id) DO UPDATE SET
       username = EXCLUDED.username, first_name = EXCLUDED.first_name, last_name = EXCLUDED.last_name`,
    [user.id, user.username || null, user.first_name || null, user.last_name || null]
  );
}

6.3 Duplicate detector (Redis + normalizer)

Design:

Normalize text (lowercase, trim, collapse whitespace/punctuation optionally)
Use Redis lists or hashes per user+chat or a key pattern like:
dup:{chatId}:{userId}:{normalizedText} with TTL
Simpler: push to a Redis list keyed by dup:{chatId}:{userId}, but we used a hash of message text to counts.

Implementation (conceptual):

pushAndCheckDuplicate(chatId, userId, messageId, text) → returns { duplicateCount }
If duplicateCount >= DUPLICATE_THRESHOLD → delete+warn and increment infra counter
Redis usage example:

// pseudo

const key = `dup:${chatId}:${userId}:${hash(normalizedText)}`;
const count = await redis.incr(key);
await redis.expire(key, DUPLICATE_WINDOW); // e.g., 300s
return { duplicateCount: count };

Infra counter:

Key: infra:${chatId}:${userId} → incr on each duplicated incident, TTL 24h

When infra >= 3 → bot attempts ctx.banChatMember(user.id) and logs

6.4 Promo detector

A lightweight regex-based detector is fine to start. Typical checks:

Telegram invites t.me/ or telegram.me/

URLs with suspicious keywords (channel, group, join)

Phone numbers or payment links — policy-dependent

Example:

export function detectPromo(text: string): boolean {
  // very simple
  return /t\.me\/\w+/.test(text) || /joinchat|telegram\.me/.test(text);
}

You can make this more advanced later (ML / heuristics).

6.5 Keyword blacklist

Per chat table keyword_blacklist. On message:

Load keywords for chat
Normalize message text
Check includes(keyword) on normalized text
Delete + warn + log

Reason to store in DB: admins can edit keywords and changes are persistent.

6.6 Audit logging

Every action writes a row to audit_logs. Keep meta as JSON for flexible data (duplicate counts, infra counts, matched keyword). Example call:

await logAudit(chatId, user.id, msg.message_id, "duplicate", "deleted+warn", text, { duplicateCount });

7. How I tested each feature (step-by-step)

Testing is crucial. Here’s the exact process I used while developing locally (Docker):

Setup (dev)

Run Postgres + Redis in Docker Compose.
Start bot in dev (npm run dev).
Invite bot to a test group and send messages.

Tests

Duplicate detection

Ensure not whitelisted.
Send Hi three times.
Expect:
On 3rd message, bot deletes the last message and warns the user.

Redis key infra:: increments after repeated offenses.

Check audit_logs for rule_triggered = 'duplicate'.

Promo detection

Send Check this channel: t.me/something.
Expect deletion, a warning, and audit_logs row rule_triggered = 'promo'. ** Keyword blacklist**

1. Insert keyword into DB:

INSERT INTO keyword_blacklist (chat_id, keyword)
VALUES (<CHAT_ID>, 'spoiler')
ON CONFLICT DO NOTHING;

2. Send a message containing spoiler.

3. Expect deletion + warning + audit log.

Whitelist

Insert whitelist entry:

INSERT INTO whitelist (chat_id, user_id)
VALUES (<CHAT_ID>, <USER_ID>)
ON CONFLICT DO NOTHING;

Send spam/promo/keyword messages — expect bot to ignore.

Audit logging

Trigger each rule
Query audit_logs to verify message text, rule trigger, action, and meta.

Infra / Ban
To test real banning, use a non-admin test account (bot cannot ban owners/admins). Repeat infra-causing actions to trigger ban and confirm user is removed from group.

8. Monitoring & alerts (production-lite)

I wanted something free, reliable, and simple to operate.

8.1 Logging

Use winston or pino to write structured logs to files + console.
Keep logs in JSON for easy parsing.

8.2 Admin alerts

On critical errors (unhandled exceptions, DB connection errors), notify a private Telegram admin chat:

// example on error

await bot.telegram.sendMessage(ADMIN_CHAT_ID, `Bot error: ${err.message}`);

8.3 Health checks

Expose /health endpoint on the admin API that checks:
- DB connectivity
- Redis connectivity
- Bot can call getMe or similar
Use a free uptime monitor (UptimeRobot) to ping /health every 5 minutes and notify you by email if it fails.

8.4 Metrics (basic)

Periodically compute counts (audit logs per day) and send a digest to admin chat (once per day).
Optional: lightweight Prometheus client + Grafana later (overkill for now).

Admin frontend (concept + API contract)

I planned a React admin frontend with the following responsibilities:

CRUD for whitelist and blacklist
View & filter audit logs
Edit policies (duplicate threshold, promo action)
Reset infra counters (calls Redis)
Dashboard metrics & charts

API contract (examples):

`- GET /api/chats?page=1&size=20 → list chats

GET /api/chats/:chatId → chat detail with policies, whitelist, keywords
POST /api/chats/:chatId/whitelist → add whitelist
DELETE /api/chats/:chatId/whitelist/:userId → remove whitelist
POST /api/reset-infra → body { chatId, userId } (backend clears Redis key)`

The frontend starts with a mock API so development is frictionless. When ready, swap VITE_API_BASE_URL.

10. Deployment notes (free options)

Backend (bot + admin API)
Deploy as Docker containers (Render, Railway, Fly). Railway/Render provide free tiers for small projects — good for prototyping.
Ensure PG and Redis are persistent volumes or managed services.

Frontend

Deploy the React app on Vercel or Netlify (both have free tiers and easy integration).

Production considerations

Use environment variables for BOT_TOKEN, DATABASE_URL, REDIS_URL.
Use a process manager or container auto-restart to keep bot up.
Add daily backups for Postgres (use pg_dump or provider snapshots).

11. Security & hardening

Never commit tokens. Use env variables and secrets manager where possible.
SQL safety: use parameterized queries — no string concatenation.
Rate limits: respect Telegram API limits (avoid mass requests).
Permissions: bot should have only the permissions it needs in the group (delete messages, ban members only if necessary).
Admin API: protect with a token. Later add OAuth / Telegram login if needed.
Error reporting: do not leak sensitive info in alerts.

12. Future improvements & roadmap

Short term:

Add admin UI (React) and wire to real admin API
Add more flexible penalty steps (mute → temp ban → perm ban)
Better normalization for duplicates (strip emojis, repeated punctuation)

Mid-term:

ML based spam classifier to reduce false positives
Bulk audit exports, per-chat analytics
Multi-bot management (SaaS style)

Long term:

SaaS: multi-tenant admin console, billing, onboarding flows

Appendix — useful commands & snippets

version: "3.8"
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: telegram_bot
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7
    ports:
      - "6379:6379"

volumes:
  pgdata:

Apply DB schema (copy schema.sql into container, then run)

docker cp schema.sql telegram-bot-postgres-1:/schema.sql
docker exec -i telegram-bot-postgres-1 psql -U postgres -d telegram_bot -f /schema.sql

Reset infra in Redis

docker exec -it telegram-bot-redis-1 redis-cli
DEL infra:<chatId>:<userId>
GET infra:<chatId>:<userId>

Run bot in dev

npm install
npm run dev # uses ts-node-dev

Closing notes

This project shows how much you can achieve with a small, well-designed stack: Telegraf + Postgres + Redis + a bit of Node glue. The important design principles I applied were:

Keep persistent vs ephemeral data separate (Postgres vs Redis)
Keep logic stateless where possible and log all decisions for auditability
Keep admin controls outside the bot (admin API + frontend) so operators can change rules without redeploying the bot

DEV Community

📝How I Built a Fully Automated Telegram Moderation Bot — An Engineering Case Study

Top comments (0)