DEV Community

Mukhtar
Mukhtar

Posted on

Turn Any API Into a SQL Database

GitHub, Jira, Salesforce, Stripe—600+ sources → SQLite


The 2-Minute Version:

# Install surveilr
brew tap surveilr/tap && brew install surveilr

# Create a Singer tap script (see below for example)
# then ingest it
surveilr admin init -d project.db
surveilr ingest files -r ./github.surveilr[singer].py -d project.db
surveilr orchestrate adapt-singer -d project.db --stream-prefix github_

# Query it
surveilr shell -d project.db
Enter fullscreen mode Exit fullscreen mode
-- Query GitHub commits from the last 7 days
SELECT author, message, timestamp
FROM github_commits
WHERE timestamp > datetime('now', '-7 days')
ORDER BY timestamp DESC;
Enter fullscreen mode Exit fullscreen mode

Your API data is now SQL. Query it like any other database.


The Problem with API Data

You've got project data scattered across platforms:

  • GitHub — Issues, commits, pull requests
  • Jira — Tickets, sprints, workflows
  • GitLab — Merge requests, pipelines
  • Salesforce — CRM data, opportunities
  • Stripe — Payments, subscriptions

Each has its own API. Each has rate limits. Each has different authentication.

And you need to answer questions like:

  • "Show me all commits related to high-priority Jira tickets"
  • "Which GitHub issues have no linked pull requests?"
  • "Track deployment frequency over time across all repos"

The usual approach:
Write custom scripts for each API, export to JSON/CSV, wrangle in pandas, hope nothing breaks.

The surveilr approach:
Turn everything into standard SQLite tables you can query with SQL—forever.


What surveilr Does

surveilr ingests data from APIs using Singer taps—Python scripts that follow the Singer protocol.

There are 600+ pre-built Singer taps for:

  • GitHub, GitLab, Bitbucket
  • Jira, Linear, Asana
  • Salesforce, HubSpot, Zendesk
  • Stripe, Shopify, QuickBooks
  • Postgres, MySQL, MongoDB
  • Google Analytics, Mixpanel
  • ...and hundreds more

surveilr executes these taps and transforms their output into queryable SQL tables.


How It Works: Singer Taps

Singer taps are Python scripts that output JSONL (JSON Lines):

{"type": "SCHEMA", "stream": "commits", "schema": {...}}
{"type": "RECORD", "stream": "commits", "record": {"sha": "abc123", "author": "alice", ...}}
{"type": "RECORD", "stream": "commits", "record": {"sha": "def456", "author": "bob", ...}}
{"type": "STATE", "value": {"bookmark": "2024-05-12"}}
Enter fullscreen mode Exit fullscreen mode

surveilr reads this output and creates SQL tables automatically.

You don't write SQL schemas. surveilr infers them from the Singer output.


Step 1: Find or Create a Singer Tap

Option 1: Use an Existing Tap

Browse Singer Hub for 600+ taps:

  • tap-github — GitHub repos, issues, commits, PRs
  • tap-gitlab — GitLab projects, merge requests, pipelines
  • tap-jira — Jira issues, sprints, users
  • tap-postgres — Extract from Postgres databases
  • tap-stripe — Payments, customers, subscriptions

Install with pip:

pip install tap-github
Enter fullscreen mode Exit fullscreen mode

Option 2: Write Your Own

Singer taps are just Python scripts. Here's a minimal example:

#!/usr/bin/env python3
import json
import sys

# Print schema
print(json.dumps({
    "type": "SCHEMA",
    "stream": "users",
    "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}
}))

# Print records
for user in fetch_users_from_api():
    print(json.dumps({
        "type": "RECORD",
        "stream": "users",
        "record": {"id": user.id, "name": user.name}
    }))
Enter fullscreen mode Exit fullscreen mode

That's it. surveilr handles the rest.


Step 2: Create a Capturable Executable

surveilr runs Singer taps as capturable executables—scripts with a special naming pattern:

<name>.surveilr[singer].<extension>
Enter fullscreen mode Exit fullscreen mode

Examples:

  • github.surveilr[singer].py
  • jira.surveilr[singer].sh
  • stripe.surveilr[singer].js

The [singer] marker tells surveilr: "This script outputs Singer format."

Example: GitHub Tap Script

#!/usr/bin/env python3
import subprocess
import os
import json

# Configuration
config = {
    "access_token": os.getenv("GITHUB_ACCESS_TOKEN"),
    "repository": os.getenv("GITHUB_REPOSITORY"),  # e.g., "owner/repo"
    "start_date": os.getenv("GITHUB_START_DATE", "2024-01-01T00:00:00Z")
}

# Run the Singer tap
subprocess.run([
    "tap-github",
    "--config", "-"
], input=json.dumps(config), text=True)
Enter fullscreen mode Exit fullscreen mode

Save as github.surveilr[singer].py and make it executable:

chmod +x github.surveilr[singer].py
Enter fullscreen mode Exit fullscreen mode

Step 3: Set Environment Variables

export GITHUB_ACCESS_TOKEN="ghp_yourtoken"
export GITHUB_REPOSITORY="microsoft/vscode"
export GITHUB_START_DATE="2024-01-01T00:00:00Z"
Enter fullscreen mode Exit fullscreen mode

Step 4: Initialize Database

surveilr admin init -d project.db
Enter fullscreen mode Exit fullscreen mode

Standard SQLite database.


Step 5: Ingest the Singer Tap

surveilr ingest files -r ./github.surveilr[singer].py -d project.db
Enter fullscreen mode Exit fullscreen mode

surveilr:

  1. Executes the script
  2. Reads the Singer JSONL output
  3. Stores raw data in uniform_resource

Step 6: Transform to SQL Views

surveilr orchestrate adapt-singer -d project.db --stream-prefix github_
Enter fullscreen mode Exit fullscreen mode

This creates SQL views like:

  • github_commits
  • github_issues
  • github_pull_requests
  • github_users

Now you can query them with standard SQL.


Step 7: Query Your Data

surveilr shell -d project.db
Enter fullscreen mode Exit fullscreen mode

Find all commits from the last 7 days

SELECT
    commit_sha,
    author,
    message,
    timestamp
FROM github_commits
WHERE timestamp > datetime('now', '-7 days')
ORDER BY timestamp DESC;
Enter fullscreen mode Exit fullscreen mode

Track commit activity by author

SELECT
    author,
    COUNT(*) AS commits,
    MIN(timestamp) AS first_commit,
    MAX(timestamp) AS last_commit
FROM github_commits
WHERE timestamp > '2024-01-01'
GROUP BY author
ORDER BY commits DESC;
Enter fullscreen mode Exit fullscreen mode

Find open issues without pull requests

SELECT
    i.number,
    i.title,
    i.created_at
FROM github_issues i
LEFT JOIN github_pull_requests pr
  ON i.number = pr.issue_number
WHERE i.state = 'open'
  AND pr.number IS NULL
ORDER BY i.created_at DESC;
Enter fullscreen mode Exit fullscreen mode

Cross-platform query: Jira issues + GitHub commits

SELECT
    j.key AS jira_ticket,
    j.summary,
    c.commit_sha,
    c.message,
    c.timestamp
FROM jira_issues j
JOIN github_commits c
  ON c.message LIKE '%' || j.key || '%'
WHERE j.status = 'Done'
ORDER BY c.timestamp DESC;
Enter fullscreen mode Exit fullscreen mode

This is impossible with platform APIs alone.


Why Not Just Use API Clients?

You could write Python scripts with requests or use platform SDKs.

But:

API clients don't give you SQL.
You're stuck with JSON responses you have to parse.

surveilr outputs queryable tables.
Standard SQL from day one.

APIs have rate limits.
Hitting them repeatedly during analysis is painful.

surveilr stores data locally.
Query as much as you want, no API calls.

Custom scripts don't compose.
How do you join GitHub data with Jira data in your script?

surveilr stores everything in one database.
Cross-platform joins just work.

Scripts don't track history.
You get a snapshot, then it's gone.

surveilr supports incremental updates.
Singer taps use STATE messages to bookmark progress.


Forensic Project Analysis

Once your project data is queryable, you can investigate patterns:

Find commits made during incident windows

SELECT commit_sha, author, message, timestamp
FROM github_commits
WHERE timestamp BETWEEN '2024-05-10 14:00' AND '2024-05-10 16:00'
ORDER BY timestamp;
Enter fullscreen mode Exit fullscreen mode

Track issue resolution time

SELECT
    key,
    summary,
    julianday(resolved_at) - julianday(created_at) AS days_to_resolve
FROM jira_issues
WHERE status = 'Done'
ORDER BY days_to_resolve DESC;
Enter fullscreen mode Exit fullscreen mode

Find high-priority issues that took too long

SELECT
    key,
    summary,
    priority,
    created_at,
    resolved_at
FROM jira_issues
WHERE priority IN ('High', 'Critical')
  AND julianday(resolved_at) - julianday(created_at) > 30
ORDER BY created_at DESC;
Enter fullscreen mode Exit fullscreen mode

These insights are buried in platform UIs. SQL makes them visible.


Real-World Uses

For Developers

  • Track your own commit patterns
  • Find issues assigned to you across platforms
  • Identify repos with low test coverage
  • Build custom dashboards with Datasette

For Engineering Managers

  • Calculate team velocity over time
  • Track deployment frequency
  • Analyze code review turnaround
  • Identify process bottlenecks

For Product Teams

  • Link customer requests to shipped features
  • Track feature adoption via API usage
  • Measure time from idea to deployment
  • Build product analytics pipelines

For Compliance (Oh, By the Way)

  • SOC 2: Show complete change management history
  • Change Control: Link every production change to an approved ticket
  • Audit Trails: Provide queryable evidence of who changed what when
  • Separation of Duties: Prove different people committed vs. approved

But the real value is permanent, queryable operational data.


Combine Data Sources

The power comes from joining data across platforms.

GitHub + Jira: Find unlinked work

SELECT
    c.commit_sha,
    c.message,
    c.author
FROM github_commits c
WHERE NOT EXISTS (
    SELECT 1 FROM jira_issues j
    WHERE c.message LIKE '%' || j.key || '%'
)
ORDER BY c.timestamp DESC;
Enter fullscreen mode Exit fullscreen mode

Stripe + Salesforce: Match payments to deals

SELECT
    s.opportunity_name,
    p.amount,
    p.created_at AS payment_date
FROM salesforce_opportunities s
JOIN stripe_payments p
  ON s.stripe_customer_id = p.customer_id
WHERE p.status = 'succeeded'
ORDER BY p.created_at DESC;
Enter fullscreen mode Exit fullscreen mode

This is why SQLite matters—everything is in one database.


Automate Daily Syncs

Create a script to keep your database up to date:

#!/bin/bash
# daily-project-sync.sh

DB="/secure/project-tracking.db"

# Ingest GitHub
surveilr ingest files -r ./github.surveilr[singer].py -d "$DB"

# Ingest Jira
surveilr ingest files -r ./jira.surveilr[singer].py -d "$DB"

# Transform to views
surveilr orchestrate adapt-singer -d "$DB" --stream-prefix github_
surveilr orchestrate adapt-singer -d "$DB" --stream-prefix jira_

echo "✓ Project data sync completed: $(date)"
Enter fullscreen mode Exit fullscreen mode

Run it daily, and your database accumulates history automatically.


Open the Database in Other Tools

Because it's SQLite, you can use any SQLite tool:

  • Datasette — Instant web UI for your project data
  • Metabase — Build team dashboards
  • DuckDB — Fast analytics on SQLite files
  • Python pandaspd.read_sql("SELECT * FROM github_commits", conn)
  • Grafana — Visualize metrics over time
  • Observable — Create interactive notebooks

You own the database. Analyze it however you want.


Troubleshooting

Tap script not executing?

  • Make sure it's executable: chmod +x script.surveilr[singer].py
  • Check that it outputs valid JSONL to stdout

No tables created after adapt-singer?

  • Verify the stream prefix matches: --stream-prefix github_
  • Check that the Singer tap output SCHEMA messages

"Module not found" errors?

  • Install the Singer tap: pip install tap-github
  • Make sure it's in your PATH

What's Next?

You just turned GitHub, Jira, or other APIs into queryable SQL tables.

Now combine with other data sources:

Or explore more:


The Bottom Line

API data shouldn't be trapped in JSON responses.

Platforms give you REST APIs and pagination. That's it.

surveilr gives you SQL.

Want to know:

  • What changed in your codebase last quarter?
  • Which issues took longest to resolve?
  • How deployment frequency correlates with bug reports?
  • Which team members are most active?

Just write a query.

Your API data is now a SQLite database you own forever.

No API rate limits. No vendor lock-in. Just standard SQL.


Ready to query your project data? Install surveilr and ingest your first API.

Get Started →

Top comments (0)