DEV Community

Mukhtar
Mukhtar

Posted on • Edited on

Turn Your Email Into a SQL Database

Your inbox is data. Query it like data.


The 2-Minute Version:

# Install and ingest your Gmail
brew tap surveilr/tap && brew install surveilr
surveilr admin init -d email.db
surveilr ingest imap \
  -u you@gmail.com \
  -p "your-app-password" \
  -a imap.gmail.com \
  -d email.db

# Query it
surveilr shell -d email.db
Enter fullscreen mode Exit fullscreen mode
-- Find every email from a specific sender
SELECT subject, date, "from"
FROM emails
WHERE "from" LIKE '%@vendor.com%'
ORDER BY date DESC;
Enter fullscreen mode Exit fullscreen mode

Your inbox is now a SQL database. Query it like any other data.


The Problem with Email Search

Gmail and Outlook have search boxes. They work for simple queries.

But try this:

  • "Show me all emails from Q1 2024 where attachments were over 10MB"
  • "Find email threads that mention both 'invoice' and 'overdue'"
  • "List all external recipients I've emailed in the last 6 months"
  • "Which emails have attachments but no reply?"

Most email clients can't answer these questions.

Here's why:

Email search is designed for finding messages, not analyzing communication patterns.

surveilr turns your inbox into a SQLite database where these queries become trivial.


What surveilr Does

surveilr connects to any IMAP server and extracts everything into SQL tables:

  • From, To, CC, BCC
  • Subject lines and dates
  • Message content and headers
  • Attachments (with optional extraction)
  • Thread relationships
  • Folder structure

The output is standard SQLite. Use any SQL tool you want.


Step 1: Get an App Password

For Gmail:

  1. Go to Google Account → App Passwords
  2. Generate an app password
  3. Copy it (you'll use this instead of your regular password)

Step 2: Create a Database

surveilr admin init -d email.db
Enter fullscreen mode Exit fullscreen mode

Standard SQLite file. Nothing fancy.


Step 3: Ingest Your Email

surveilr ingest imap \
  -d email.db \
  -u your.email@gmail.com \
  -p "your-app-password" \
  -a imap.gmail.com \
  -f "INBOX"
Enter fullscreen mode Exit fullscreen mode

surveilr connects via IMAP and inserts messages into ur_ingest_session_imap_acct_folder_message.

Useful flags:

Filter by status:

--status unread    # Only unread messages
--status starred   # Starred/flagged
--status all       # Everything (default)
Enter fullscreen mode Exit fullscreen mode

Extract attachments:

--extract-attachments uniform-resource  # Store in database
Enter fullscreen mode Exit fullscreen mode

Show progress:

--progress  # Display download progress
Enter fullscreen mode Exit fullscreen mode

Limit messages:

-b 1000  # Process up to 1000 messages
Enter fullscreen mode Exit fullscreen mode

Step 4: Query Your Inbox

Open the SQL shell:

surveilr shell -d email.db
Enter fullscreen mode Exit fullscreen mode

Now you can ask questions that Gmail search can't answer.

Find all emails from a specific domain

SELECT
    "from",
    subject,
    date
FROM ur_ingest_session_imap_acct_folder_message
WHERE "from" LIKE '%@vendor.com%'
ORDER BY date DESC;
Enter fullscreen mode Exit fullscreen mode

Note: from is quoted because it's a SQL keyword.

Find emails with large attachments

SELECT
    m.subject,
    m."from",
    m.date,
    a.name AS attachment_name,
    a.size_bytes / 1024 / 1024 AS size_mb
FROM ur_ingest_session_imap_acct_folder_message m
JOIN ur_ingest_session_attachment a
  ON m.message_id = a.message_id
WHERE a.size_bytes > 10485760
ORDER BY a.size_bytes DESC;
Enter fullscreen mode Exit fullscreen mode

Track email volume over time

SELECT
    DATE(date) AS day,
    COUNT(*) AS emails
FROM ur_ingest_session_imap_acct_folder_message
WHERE date > date('now', '-30 days')
GROUP BY DATE(date)
ORDER BY day;
Enter fullscreen mode Exit fullscreen mode

Find emails you sent but never got a reply to

SELECT
    subject,
    "to",
    date
FROM ur_ingest_session_imap_acct_folder_message
WHERE "from" LIKE '%you@company.com%'
  AND message_id NOT IN (
    SELECT in_reply_to
    FROM ur_ingest_session_imap_acct_folder_message
    WHERE in_reply_to IS NOT NULL
  )
ORDER BY date DESC;
Enter fullscreen mode Exit fullscreen mode

These queries are impossible in Gmail.


List Available Folders

Not sure what folders you have?

surveilr ingest imap \
  -u your.email@gmail.com \
  -p "your-app-password" \
  -a imap.gmail.com \
  --list-folders
Enter fullscreen mode Exit fullscreen mode

Then ingest specific folders:

surveilr ingest imap \
  -u your.email@gmail.com \
  -p "your-app-password" \
  -a imap.gmail.com \
  -f "[Gmail]/Sent Mail" \
  -d email.db
Enter fullscreen mode Exit fullscreen mode

Export Results

Need to share findings? Export as JSON:

surveilr shell -d email.db --cmd "
SELECT subject, \"from\", date
FROM ur_ingest_session_imap_acct_folder_message
WHERE subject LIKE '%invoice%'
" --output json > invoices.json
Enter fullscreen mode Exit fullscreen mode

Or CSV for spreadsheet analysis:

surveilr shell -d email.db --cmd "
SELECT * FROM ur_ingest_session_imap_acct_folder_message
" --output csv > all-emails.csv
Enter fullscreen mode Exit fullscreen mode

Why Not Just Export to CSV?

You could export your inbox to CSV or use a backup tool.

But:

Exports are point-in-time snapshots.
You can't incrementally update them.

surveilr supports continuous ingestion.
Run it weekly and accumulate history over time.

CSVs don't preserve relationships.
How do you link attachments to messages?

surveilr uses proper foreign keys.
Attachments reference their parent messages.

CSVs are hard to query.
You're back to scripting with pandas or awk.

surveilr gives you SQL.
Join, filter, aggregate—all standard SQL.


Forensic Email Analysis

Once your inbox is queryable, you can investigate patterns:

Find emails that mention specific keywords across years

SELECT subject, "from", date
FROM emails
WHERE (subject LIKE '%confidential%' OR content LIKE '%confidential%')
  AND date > '2020-01-01'
ORDER BY date DESC;
Enter fullscreen mode Exit fullscreen mode

Track who you email most frequently

SELECT
    "to",
    COUNT(*) AS email_count
FROM emails
WHERE "from" LIKE '%you@company.com%'
GROUP BY "to"
ORDER BY email_count DESC
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

Find emails sent on weekends or late at night

SELECT subject, "to", date
FROM emails
WHERE strftime('%w', date) IN ('0', '6')  -- Sunday or Saturday
   OR CAST(strftime('%H', date) AS INTEGER) > 22
   OR CAST(strftime('%H', date) AS INTEGER) < 6
ORDER BY date DESC;
Enter fullscreen mode Exit fullscreen mode

These insights are buried in your inbox. SQL makes them visible.


Real-World Uses

For Developers

  • Track bug reports sent via email
  • Find API key or credential leaks in sent mail
  • Analyze support ticket volume
  • Search across years of technical discussions

For Operations

  • Audit external communications
  • Track alert emails and response times
  • Find recurring issues mentioned in email
  • Monitor vendor correspondence

For Legal/E-Discovery

  • Respond to discovery requests with SQL queries
  • Prove communication timelines
  • Find all emails related to a specific matter
  • Extract conversations by date range

For Compliance (Oh, By the Way)

  • HIPAA: Track communications containing PHI
  • SOX: Maintain 7-year email records with queryable evidence
  • GDPR: Respond to "right to access" requests with SQL
  • E-Discovery: Legal hold and litigation support

But the real value is permanent, queryable communication history.


Open the Database in Other Tools

Because it's SQLite, you can use any SQLite tool:

  • DB Browser for SQLite — Visual inspection
  • Datasette — Instant web UI
  • Python pandaspd.read_sql("SELECT * FROM emails", conn)
  • DuckDB — Fast analytical queries
  • Metabase — Build dashboards
  • R / tidyverse — Data analysis workflows

You own the database. Query it however you want.


Security Best Practices

Use App Passwords, Not Your Main Password

Never put your main email password in scripts. Use app-specific passwords that you can revoke.

Store Passwords Securely

Use environment variables:

export IMAP_PASSWORD="your-app-password"
surveilr ingest imap -u you@gmail.com -p "$IMAP_PASSWORD" -a imap.gmail.com
Enter fullscreen mode Exit fullscreen mode

Or use a password manager's CLI:

surveilr ingest imap -u you@gmail.com -p "$(pass show gmail/app-password)" -a imap.gmail.com
Enter fullscreen mode Exit fullscreen mode

Encrypt the Database

SQLite databases can be encrypted. Use tools like SQLCipher if you need encryption at rest.


Troubleshooting

"Authentication failed"?

  • Make sure you're using an app password, not your main password
  • Verify IMAP is enabled in your email settings
  • Check that 2FA isn't blocking IMAP access

"Connection timed out"?

  • Verify the IMAP server address
  • Check if your firewall blocks port 993
  • Some providers require enabling "less secure app access"

"No messages found"?

  • Use --list-folders to see available folders
  • Folder names are case-sensitive
  • Gmail uses [Gmail]/All Mail not All Mail

What's Next?

You just turned your inbox into a queryable database.

Now combine it with other data:

Or explore more:


The Bottom Line

Your inbox shouldn't be a black box.

Email clients give you search bars and folders. That's it.

surveilr gives you SQL.

Want to know:

  • What you emailed last quarter?
  • Who sends you the most messages?
  • Which threads have attachments but no replies?
  • How your communication patterns changed over time?

Just write a query.

Your email is now a SQLite database you own forever.

No export limits. No API restrictions. Just standard SQL.


Ready to query your inbox? Install surveilr and ingest your first folder.

Get Started →

Top comments (1)

Collapse
 
foxck016077 profile image
foxck016077

your "inbox is data, query it" framing is exactly the mental model i ended up at too, but with a different output. i was building an Apify Actor for the same problem space and the question i kept hitting was: most people i talked to wanted the answers, not the SQL.

so the productized version of your framework that worked for me ended up being a one-shot "Friday triage list" — i query the inbox (Gmail API + refresh-token OAuth on my end so the user does not even hand over an app password), rank the threads by days-silent past SLA, output a one-page report of which 10 client conversations to revive first, with suggested re-engage angle. the SQL layer is invisible to the buyer; they just get the actionable rows.

surveilr is the better path if you want to own the data and run any query. mine is the right path if you want one specific outcome (lost-revenue thread recovery) and skip the SQL.

i recently added a $99 done-for-you tier where i run the scan + send the report. happy to test it on your own inbox at $49 if you want to see the report format side-by-side with what your surveilr queries surface — DM via dev.to.