GitHub, Jira, Salesforce, Stripe—600+ sources → SQLite
The 2-Minute Version:
# Install surveilr
brew tap surveilr/tap && brew install surveilr
# Create a Singer tap script (see below for example)
# then ingest it
surveilr admin init -d project.db
surveilr ingest files -r ./github.surveilr[singer].py -d project.db
surveilr orchestrate adapt-singer -d project.db --stream-prefix github_
# Query it
surveilr shell -d project.db
-- Query GitHub commits from the last 7 days
SELECT author, message, timestamp
FROM github_commits
WHERE timestamp > datetime('now', '-7 days')
ORDER BY timestamp DESC;
Your API data is now SQL. Query it like any other database.
The Problem with API Data
You've got project data scattered across platforms:
- GitHub — Issues, commits, pull requests
- Jira — Tickets, sprints, workflows
- GitLab — Merge requests, pipelines
- Salesforce — CRM data, opportunities
- Stripe — Payments, subscriptions
Each has its own API. Each has rate limits. Each has different authentication.
And you need to answer questions like:
- "Show me all commits related to high-priority Jira tickets"
- "Which GitHub issues have no linked pull requests?"
- "Track deployment frequency over time across all repos"
The usual approach:
Write custom scripts for each API, export to JSON/CSV, wrangle in pandas, hope nothing breaks.
The surveilr approach:
Turn everything into standard SQLite tables you can query with SQL—forever.
What surveilr Does
surveilr ingests data from APIs using Singer taps—Python scripts that follow the Singer protocol.
There are 600+ pre-built Singer taps for:
- GitHub, GitLab, Bitbucket
- Jira, Linear, Asana
- Salesforce, HubSpot, Zendesk
- Stripe, Shopify, QuickBooks
- Postgres, MySQL, MongoDB
- Google Analytics, Mixpanel
- ...and hundreds more
surveilr executes these taps and transforms their output into queryable SQL tables.
How It Works: Singer Taps
Singer taps are Python scripts that output JSONL (JSON Lines):
{"type": "SCHEMA", "stream": "commits", "schema": {...}}
{"type": "RECORD", "stream": "commits", "record": {"sha": "abc123", "author": "alice", ...}}
{"type": "RECORD", "stream": "commits", "record": {"sha": "def456", "author": "bob", ...}}
{"type": "STATE", "value": {"bookmark": "2024-05-12"}}
surveilr reads this output and creates SQL tables automatically.
You don't write SQL schemas. surveilr infers them from the Singer output.
Step 1: Find or Create a Singer Tap
Option 1: Use an Existing Tap
Browse Singer Hub for 600+ taps:
-
tap-github— GitHub repos, issues, commits, PRs -
tap-gitlab— GitLab projects, merge requests, pipelines -
tap-jira— Jira issues, sprints, users -
tap-postgres— Extract from Postgres databases -
tap-stripe— Payments, customers, subscriptions
Install with pip:
pip install tap-github
Option 2: Write Your Own
Singer taps are just Python scripts. Here's a minimal example:
#!/usr/bin/env python3
import json
import sys
# Print schema
print(json.dumps({
"type": "SCHEMA",
"stream": "users",
"schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}
}))
# Print records
for user in fetch_users_from_api():
print(json.dumps({
"type": "RECORD",
"stream": "users",
"record": {"id": user.id, "name": user.name}
}))
That's it. surveilr handles the rest.
Step 2: Create a Capturable Executable
surveilr runs Singer taps as capturable executables—scripts with a special naming pattern:
<name>.surveilr[singer].<extension>
Examples:
github.surveilr[singer].pyjira.surveilr[singer].shstripe.surveilr[singer].js
The [singer] marker tells surveilr: "This script outputs Singer format."
Example: GitHub Tap Script
#!/usr/bin/env python3
import subprocess
import os
import json
# Configuration
config = {
"access_token": os.getenv("GITHUB_ACCESS_TOKEN"),
"repository": os.getenv("GITHUB_REPOSITORY"), # e.g., "owner/repo"
"start_date": os.getenv("GITHUB_START_DATE", "2024-01-01T00:00:00Z")
}
# Run the Singer tap
subprocess.run([
"tap-github",
"--config", "-"
], input=json.dumps(config), text=True)
Save as github.surveilr[singer].py and make it executable:
chmod +x github.surveilr[singer].py
Step 3: Set Environment Variables
export GITHUB_ACCESS_TOKEN="ghp_yourtoken"
export GITHUB_REPOSITORY="microsoft/vscode"
export GITHUB_START_DATE="2024-01-01T00:00:00Z"
Step 4: Initialize Database
surveilr admin init -d project.db
Standard SQLite database.
Step 5: Ingest the Singer Tap
surveilr ingest files -r ./github.surveilr[singer].py -d project.db
surveilr:
- Executes the script
- Reads the Singer JSONL output
- Stores raw data in
uniform_resource
Step 6: Transform to SQL Views
surveilr orchestrate adapt-singer -d project.db --stream-prefix github_
This creates SQL views like:
github_commitsgithub_issuesgithub_pull_requestsgithub_users
Now you can query them with standard SQL.
Step 7: Query Your Data
surveilr shell -d project.db
Find all commits from the last 7 days
SELECT
commit_sha,
author,
message,
timestamp
FROM github_commits
WHERE timestamp > datetime('now', '-7 days')
ORDER BY timestamp DESC;
Track commit activity by author
SELECT
author,
COUNT(*) AS commits,
MIN(timestamp) AS first_commit,
MAX(timestamp) AS last_commit
FROM github_commits
WHERE timestamp > '2024-01-01'
GROUP BY author
ORDER BY commits DESC;
Find open issues without pull requests
SELECT
i.number,
i.title,
i.created_at
FROM github_issues i
LEFT JOIN github_pull_requests pr
ON i.number = pr.issue_number
WHERE i.state = 'open'
AND pr.number IS NULL
ORDER BY i.created_at DESC;
Cross-platform query: Jira issues + GitHub commits
SELECT
j.key AS jira_ticket,
j.summary,
c.commit_sha,
c.message,
c.timestamp
FROM jira_issues j
JOIN github_commits c
ON c.message LIKE '%' || j.key || '%'
WHERE j.status = 'Done'
ORDER BY c.timestamp DESC;
This is impossible with platform APIs alone.
Why Not Just Use API Clients?
You could write Python scripts with requests or use platform SDKs.
But:
API clients don't give you SQL.
You're stuck with JSON responses you have to parse.
surveilr outputs queryable tables.
Standard SQL from day one.
APIs have rate limits.
Hitting them repeatedly during analysis is painful.
surveilr stores data locally.
Query as much as you want, no API calls.
Custom scripts don't compose.
How do you join GitHub data with Jira data in your script?
surveilr stores everything in one database.
Cross-platform joins just work.
Scripts don't track history.
You get a snapshot, then it's gone.
surveilr supports incremental updates.
Singer taps use STATE messages to bookmark progress.
Forensic Project Analysis
Once your project data is queryable, you can investigate patterns:
Find commits made during incident windows
SELECT commit_sha, author, message, timestamp
FROM github_commits
WHERE timestamp BETWEEN '2024-05-10 14:00' AND '2024-05-10 16:00'
ORDER BY timestamp;
Track issue resolution time
SELECT
key,
summary,
julianday(resolved_at) - julianday(created_at) AS days_to_resolve
FROM jira_issues
WHERE status = 'Done'
ORDER BY days_to_resolve DESC;
Find high-priority issues that took too long
SELECT
key,
summary,
priority,
created_at,
resolved_at
FROM jira_issues
WHERE priority IN ('High', 'Critical')
AND julianday(resolved_at) - julianday(created_at) > 30
ORDER BY created_at DESC;
These insights are buried in platform UIs. SQL makes them visible.
Real-World Uses
For Developers
- Track your own commit patterns
- Find issues assigned to you across platforms
- Identify repos with low test coverage
- Build custom dashboards with Datasette
For Engineering Managers
- Calculate team velocity over time
- Track deployment frequency
- Analyze code review turnaround
- Identify process bottlenecks
For Product Teams
- Link customer requests to shipped features
- Track feature adoption via API usage
- Measure time from idea to deployment
- Build product analytics pipelines
For Compliance (Oh, By the Way)
- SOC 2: Show complete change management history
- Change Control: Link every production change to an approved ticket
- Audit Trails: Provide queryable evidence of who changed what when
- Separation of Duties: Prove different people committed vs. approved
But the real value is permanent, queryable operational data.
Combine Data Sources
The power comes from joining data across platforms.
GitHub + Jira: Find unlinked work
SELECT
c.commit_sha,
c.message,
c.author
FROM github_commits c
WHERE NOT EXISTS (
SELECT 1 FROM jira_issues j
WHERE c.message LIKE '%' || j.key || '%'
)
ORDER BY c.timestamp DESC;
Stripe + Salesforce: Match payments to deals
SELECT
s.opportunity_name,
p.amount,
p.created_at AS payment_date
FROM salesforce_opportunities s
JOIN stripe_payments p
ON s.stripe_customer_id = p.customer_id
WHERE p.status = 'succeeded'
ORDER BY p.created_at DESC;
This is why SQLite matters—everything is in one database.
Automate Daily Syncs
Create a script to keep your database up to date:
#!/bin/bash
# daily-project-sync.sh
DB="/secure/project-tracking.db"
# Ingest GitHub
surveilr ingest files -r ./github.surveilr[singer].py -d "$DB"
# Ingest Jira
surveilr ingest files -r ./jira.surveilr[singer].py -d "$DB"
# Transform to views
surveilr orchestrate adapt-singer -d "$DB" --stream-prefix github_
surveilr orchestrate adapt-singer -d "$DB" --stream-prefix jira_
echo "✓ Project data sync completed: $(date)"
Run it daily, and your database accumulates history automatically.
Open the Database in Other Tools
Because it's SQLite, you can use any SQLite tool:
- Datasette — Instant web UI for your project data
- Metabase — Build team dashboards
- DuckDB — Fast analytics on SQLite files
-
Python pandas —
pd.read_sql("SELECT * FROM github_commits", conn) - Grafana — Visualize metrics over time
- Observable — Create interactive notebooks
You own the database. Analyze it however you want.
Troubleshooting
Tap script not executing?
- Make sure it's executable:
chmod +x script.surveilr[singer].py - Check that it outputs valid JSONL to stdout
No tables created after adapt-singer?
- Verify the stream prefix matches:
--stream-prefix github_ - Check that the Singer tap output SCHEMA messages
"Module not found" errors?
- Install the Singer tap:
pip install tap-github - Make sure it's in your PATH
What's Next?
You just turned GitHub, Jira, or other APIs into queryable SQL tables.
Now combine with other data sources:
- Query Your File System — Join commits with file changes
- Turn Email Into SQL — Correlate issues with email threads
Or explore more:
- surveilr.com — Full documentation
- Singer Hub — 600+ pre-built taps
- Singer Spec — Build custom taps
The Bottom Line
API data shouldn't be trapped in JSON responses.
Platforms give you REST APIs and pagination. That's it.
surveilr gives you SQL.
Want to know:
- What changed in your codebase last quarter?
- Which issues took longest to resolve?
- How deployment frequency correlates with bug reports?
- Which team members are most active?
Just write a query.
Your API data is now a SQLite database you own forever.
No API rate limits. No vendor lock-in. Just standard SQL.
Ready to query your project data? Install surveilr and ingest your first API.
Top comments (0)