Mukhtar

Posted on May 20

Turn Any API Into a SQL Database

#api #database #sql #tooling

GitHub, Jira, Salesforce, Stripe—600+ sources → SQLite

The 2-Minute Version:

# Install surveilr
brew tap surveilr/tap && brew install surveilr

# Create a Singer tap script (see below for example)
# then ingest it
surveilr admin init -d project.db
surveilr ingest files -r ./github.surveilr[singer].py -d project.db
surveilr orchestrate adapt-singer -d project.db --stream-prefix github_

# Query it
surveilr shell -d project.db

-- Query GitHub commits from the last 7 days
SELECT author, message, timestamp
FROM github_commits
WHERE timestamp > datetime('now', '-7 days')
ORDER BY timestamp DESC;

Your API data is now SQL. Query it like any other database.

The Problem with API Data

You've got project data scattered across platforms:

GitHub — Issues, commits, pull requests
Jira — Tickets, sprints, workflows
GitLab — Merge requests, pipelines
Salesforce — CRM data, opportunities
Stripe — Payments, subscriptions

Each has its own API. Each has rate limits. Each has different authentication.

And you need to answer questions like:

"Show me all commits related to high-priority Jira tickets"
"Which GitHub issues have no linked pull requests?"
"Track deployment frequency over time across all repos"

The usual approach:
Write custom scripts for each API, export to JSON/CSV, wrangle in pandas, hope nothing breaks.

The surveilr approach:
Turn everything into standard SQLite tables you can query with SQL—forever.

What surveilr Does

surveilr ingests data from APIs using Singer taps—Python scripts that follow the Singer protocol.

There are 600+ pre-built Singer taps for:

GitHub, GitLab, Bitbucket
Jira, Linear, Asana
Salesforce, HubSpot, Zendesk
Stripe, Shopify, QuickBooks
Postgres, MySQL, MongoDB
Google Analytics, Mixpanel
...and hundreds more

surveilr executes these taps and transforms their output into queryable SQL tables.

How It Works: Singer Taps

Singer taps are Python scripts that output JSONL (JSON Lines):

{"type": "SCHEMA", "stream": "commits", "schema": {...}}
{"type": "RECORD", "stream": "commits", "record": {"sha": "abc123", "author": "alice", ...}}
{"type": "RECORD", "stream": "commits", "record": {"sha": "def456", "author": "bob", ...}}
{"type": "STATE", "value": {"bookmark": "2024-05-12"}}

surveilr reads this output and creates SQL tables automatically.

You don't write SQL schemas. surveilr infers them from the Singer output.

Step 1: Find or Create a Singer Tap

Option 1: Use an Existing Tap

Browse Singer Hub for 600+ taps:

tap-github — GitHub repos, issues, commits, PRs
tap-gitlab — GitLab projects, merge requests, pipelines
tap-jira — Jira issues, sprints, users
tap-postgres — Extract from Postgres databases
tap-stripe — Payments, customers, subscriptions

Install with pip:

pip install tap-github

Option 2: Write Your Own

Singer taps are just Python scripts. Here's a minimal example:

#!/usr/bin/env python3
import json
import sys

# Print schema
print(json.dumps({
    "type": "SCHEMA",
    "stream": "users",
    "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}
}))

# Print records
for user in fetch_users_from_api():
    print(json.dumps({
        "type": "RECORD",
        "stream": "users",
        "record": {"id": user.id, "name": user.name}
    }))

That's it. surveilr handles the rest.

Step 2: Create a Capturable Executable

surveilr runs Singer taps as capturable executables—scripts with a special naming pattern:

<name>.surveilr[singer].<extension>

Examples:

github.surveilr[singer].py
jira.surveilr[singer].sh
stripe.surveilr[singer].js

The [singer] marker tells surveilr: "This script outputs Singer format."

Example: GitHub Tap Script

#!/usr/bin/env python3
import subprocess
import os
import json

# Configuration
config = {
    "access_token": os.getenv("GITHUB_ACCESS_TOKEN"),
    "repository": os.getenv("GITHUB_REPOSITORY"),  # e.g., "owner/repo"
    "start_date": os.getenv("GITHUB_START_DATE", "2024-01-01T00:00:00Z")
}

# Run the Singer tap
subprocess.run([
    "tap-github",
    "--config", "-"
], input=json.dumps(config), text=True)

Save as github.surveilr[singer].py and make it executable:

chmod +x github.surveilr[singer].py

Step 3: Set Environment Variables

export GITHUB_ACCESS_TOKEN="ghp_yourtoken"
export GITHUB_REPOSITORY="microsoft/vscode"
export GITHUB_START_DATE="2024-01-01T00:00:00Z"

Step 4: Initialize Database

surveilr admin init -d project.db

Standard SQLite database.

Step 5: Ingest the Singer Tap

surveilr ingest files -r ./github.surveilr[singer].py -d project.db

surveilr:

Executes the script
Reads the Singer JSONL output
Stores raw data in uniform_resource

Step 6: Transform to SQL Views

surveilr orchestrate adapt-singer -d project.db --stream-prefix github_

This creates SQL views like:

github_commits
github_issues
github_pull_requests
github_users

Now you can query them with standard SQL.

Step 7: Query Your Data

surveilr shell -d project.db

Find all commits from the last 7 days

SELECT
    commit_sha,
    author,
    message,
    timestamp
FROM github_commits
WHERE timestamp > datetime('now', '-7 days')
ORDER BY timestamp DESC;

Track commit activity by author

SELECT
    author,
    COUNT(*) AS commits,
    MIN(timestamp) AS first_commit,
    MAX(timestamp) AS last_commit
FROM github_commits
WHERE timestamp > '2024-01-01'
GROUP BY author
ORDER BY commits DESC;

Find open issues without pull requests

SELECT
    i.number,
    i.title,
    i.created_at
FROM github_issues i
LEFT JOIN github_pull_requests pr
  ON i.number = pr.issue_number
WHERE i.state = 'open'
  AND pr.number IS NULL
ORDER BY i.created_at DESC;

Cross-platform query: Jira issues + GitHub commits

SELECT
    j.key AS jira_ticket,
    j.summary,
    c.commit_sha,
    c.message,
    c.timestamp
FROM jira_issues j
JOIN github_commits c
  ON c.message LIKE '%' || j.key || '%'
WHERE j.status = 'Done'
ORDER BY c.timestamp DESC;

This is impossible with platform APIs alone.

Why Not Just Use API Clients?

You could write Python scripts with requests or use platform SDKs.

But:

API clients don't give you SQL.
You're stuck with JSON responses you have to parse.

surveilr outputs queryable tables.
Standard SQL from day one.

APIs have rate limits.
Hitting them repeatedly during analysis is painful.

surveilr stores data locally.
Query as much as you want, no API calls.

Custom scripts don't compose.
How do you join GitHub data with Jira data in your script?

surveilr stores everything in one database.
Cross-platform joins just work.

Scripts don't track history.
You get a snapshot, then it's gone.

surveilr supports incremental updates.
Singer taps use STATE messages to bookmark progress.

Forensic Project Analysis

Once your project data is queryable, you can investigate patterns:

Find commits made during incident windows

SELECT commit_sha, author, message, timestamp
FROM github_commits
WHERE timestamp BETWEEN '2024-05-10 14:00' AND '2024-05-10 16:00'
ORDER BY timestamp;

Track issue resolution time

SELECT
    key,
    summary,
    julianday(resolved_at) - julianday(created_at) AS days_to_resolve
FROM jira_issues
WHERE status = 'Done'
ORDER BY days_to_resolve DESC;

Find high-priority issues that took too long

SELECT
    key,
    summary,
    priority,
    created_at,
    resolved_at
FROM jira_issues
WHERE priority IN ('High', 'Critical')
  AND julianday(resolved_at) - julianday(created_at) > 30
ORDER BY created_at DESC;

These insights are buried in platform UIs. SQL makes them visible.

Real-World Uses

For Developers

Track your own commit patterns
Find issues assigned to you across platforms
Identify repos with low test coverage
Build custom dashboards with Datasette

For Engineering Managers

Calculate team velocity over time
Track deployment frequency
Analyze code review turnaround
Identify process bottlenecks

For Product Teams

Link customer requests to shipped features
Track feature adoption via API usage
Measure time from idea to deployment
Build product analytics pipelines

For Compliance (Oh, By the Way)

SOC 2: Show complete change management history
Change Control: Link every production change to an approved ticket
Audit Trails: Provide queryable evidence of who changed what when
Separation of Duties: Prove different people committed vs. approved

But the real value is permanent, queryable operational data.

Combine Data Sources

The power comes from joining data across platforms.

GitHub + Jira: Find unlinked work

SELECT
    c.commit_sha,
    c.message,
    c.author
FROM github_commits c
WHERE NOT EXISTS (
    SELECT 1 FROM jira_issues j
    WHERE c.message LIKE '%' || j.key || '%'
)
ORDER BY c.timestamp DESC;

Stripe + Salesforce: Match payments to deals

SELECT
    s.opportunity_name,
    p.amount,
    p.created_at AS payment_date
FROM salesforce_opportunities s
JOIN stripe_payments p
  ON s.stripe_customer_id = p.customer_id
WHERE p.status = 'succeeded'
ORDER BY p.created_at DESC;

This is why SQLite matters—everything is in one database.

Automate Daily Syncs

Create a script to keep your database up to date:

#!/bin/bash
# daily-project-sync.sh

DB="/secure/project-tracking.db"

# Ingest GitHub
surveilr ingest files -r ./github.surveilr[singer].py -d "$DB"

# Ingest Jira
surveilr ingest files -r ./jira.surveilr[singer].py -d "$DB"

# Transform to views
surveilr orchestrate adapt-singer -d "$DB" --stream-prefix github_
surveilr orchestrate adapt-singer -d "$DB" --stream-prefix jira_

echo "✓ Project data sync completed: $(date)"

Run it daily, and your database accumulates history automatically.

Open the Database in Other Tools

Because it's SQLite, you can use any SQLite tool:

Datasette — Instant web UI for your project data
Metabase — Build team dashboards
DuckDB — Fast analytics on SQLite files
Python pandas — pd.read_sql("SELECT * FROM github_commits", conn)
Grafana — Visualize metrics over time
Observable — Create interactive notebooks

You own the database. Analyze it however you want.

Troubleshooting

Tap script not executing?

Make sure it's executable: chmod +x script.surveilr[singer].py
Check that it outputs valid JSONL to stdout

No tables created after adapt-singer?

Verify the stream prefix matches: --stream-prefix github_
Check that the Singer tap output SCHEMA messages

"Module not found" errors?

Install the Singer tap: pip install tap-github
Make sure it's in your PATH

What's Next?

You just turned GitHub, Jira, or other APIs into queryable SQL tables.

Now combine with other data sources:

Query Your File System — Join commits with file changes
Turn Email Into SQL — Correlate issues with email threads

Or explore more:

surveilr.com — Full documentation
Singer Hub — 600+ pre-built taps
Singer Spec — Build custom taps

The Bottom Line

API data shouldn't be trapped in JSON responses.

Platforms give you REST APIs and pagination. That's it.

surveilr gives you SQL.

Want to know:

What changed in your codebase last quarter?
Which issues took longest to resolve?
How deployment frequency correlates with bug reports?
Which team members are most active?

Just write a query.

Your API data is now a SQLite database you own forever.

No API rate limits. No vendor lock-in. Just standard SQL.

Ready to query your project data? Install surveilr and ingest your first API.

Get Started →

Top comments (2)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.