Karlis

Posted on Apr 30

Acacia DB for VS Code: Map your database usage in source code (1.0.0 → 1.2.0)

#vscode #database #sql #productivity

If you've ever inherited a legacy system and asked "which tables are actually used, where, and how do they connect?" — that's the question Acacia DB tries to answer.

It's a VS Code extension that scans your workspace for references to database tables and columns, then turns the raw matches into something you can navigate, rank, and diagram. No LLMs, no cloud calls — just a deterministic pipeline over your source tree and a tables_views.json schema file.

The last three releases (1.0.0, 1.1.0, 1.2.0) reshaped the extension from a workspace scanner into a small analysis suite. Here's what landed.

1.0.0 — The pipeline rewrite

1.0.0 is the first stable release. The headline change is that the column-relationship analysis pipeline was rewritten end-to-end for both performance and correctness.

Parallel column-link analysis. Column matching now runs in a worker_threads pool (min(cpus, 8) workers), with an in-process async fallback when the worker bundle isn't available. File I/O and matching overlap across cores instead of stalling on a single thread.
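
The sizing rule is easy to picture as a pure function. This is only a sketch: `workerPoolSize` and `dealOut` are hypothetical names, and both the floor of one worker and the round-robin split are assumptions, not something taken from the extension's source.

```typescript
// Hypothetical helper: pick a worker count as min(cpus, cap), never below 1.
// The floor of 1 is an assumption for environments that report 0 cores.
function workerPoolSize(cpuCount: number, cap: number = 8): number {
  return Math.max(1, Math.min(cpuCount, cap));
}

// Deal items out round-robin so each worker gets a similar share.
// (Assumed strategy; the real scheduler may hand out work differently.)
function dealOut<T>(items: T[], workers: number): T[][] {
  const chunks: T[][] = Array.from({ length: workers }, () => [] as T[]);
  items.forEach((item, i) => chunks[i % workers].push(item));
  return chunks;
}
```

In a real extension the CPU count would come from something like `os.cpus().length`; it is passed in here so the sketch stays free of Node imports.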

Targeted line scanning. Workers no longer run the matcher over every line of every file. For each file they visit only the line numbers already recorded in table_refs.json, so the second pass costs a fraction of the first.
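
A minimal sketch of the idea, assuming the recorded line numbers are 1-based and the file content is already in memory. `pickRecordedLines` is a hypothetical name. The expensive step (column matching) would then run only on the returned lines.

```typescript
// Hypothetical sketch: return only the lines whose 1-based numbers were
// recorded by an earlier pass, keyed by line number.
function pickRecordedLines(content: string, lineNumbers: number[]): Map<number, string> {
  const lines = content.split(/\r?\n/);
  const picked = new Map<number, string>();
  for (const n of lineNumbers) {
    const text = lines[n - 1];
    // Stale line numbers (file shrank since the first pass) are skipped.
    if (text !== undefined) picked.set(n, text);
  }
  return picked;
}
```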

One global matcher instead of many. Previously a fresh column matcher was rebuilt per table and per file. Now a single matcher is built once at the start of analysis and shared across all workers.

Trie-level fast paths.

An ASCII first-character bitset gives O(1) reject for non-candidate characters before any trie walk.
The matcher operates on charCodeAt directly instead of text.split('').
The quote-skip state machine is bypassed on lines with no " or '.
On rejection, the matcher skips the rest of the current word in one pass.
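
Two of these fast paths can be sketched in a few lines. The function names are hypothetical, and a 128-entry byte table stands in for the bitset; both forms give the same constant-time reject.

```typescript
// Mark every ASCII code that can start a column name (both cases, so the
// table also serves case-insensitive matching).
function buildFirstCharTable(names: string[]): Uint8Array {
  const table = new Uint8Array(128);
  for (const name of names) {
    if (name.length === 0) continue;
    const lower = name.toLowerCase().charCodeAt(0);
    const upper = name.toUpperCase().charCodeAt(0);
    if (lower < 128) table[lower] = 1;
    if (upper < 128) table[upper] = 1;
  }
  return table;
}

// O(1) check before any trie walk: can a match start at position i?
// charCodeAt returns NaN past the end of the line, which fails `code < 128`.
function canStartAt(table: Uint8Array, line: string, i: number): boolean {
  const code = line.charCodeAt(i);
  return code < 128 && table[code] === 1;
}

// The quote-skipping state machine is only needed if the line has a quote.
function needsQuoteHandling(line: string): boolean {
  return line.indexOf('"') !== -1 || line.indexOf("'") !== -1;
}
```

Note that this sketch rejects non-ASCII first characters outright; a real matcher would need a slower fallback for them.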

Pre-filter for impossible files. Files whose in-scope tables can't contribute at least two candidate columns are skipped before any I/O.

O(1) link recording. Each LinkedTableInfo now carries a Map<key, ColumnLink> plus per-link/per-info Set<string> file caches, so recordColumnLink is O(1) per call instead of O(k) over the link array and O(n) over the file array.
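
The shape of that change can be sketched as follows. The field names on `ColumnLink` and the `LinkIndex` class are assumptions made for illustration; only the Map-plus-Set idea comes from the release notes.

```typescript
// Assumed shape; the real ColumnLink likely carries more fields.
interface ColumnLink {
  fromColumn: string;
  toColumn: string;
  count: number;
  files: Set<string>;
}

class LinkIndex {
  private links = new Map<string, ColumnLink>();

  // Map lookup and Set insert are O(1) on average, so recording a link no
  // longer scans the existing links or the file list.
  record(fromColumn: string, toColumn: string, file: string): void {
    const key = `${fromColumn}\u0000${toColumn}`; // NUL cannot appear in a column name
    let link = this.links.get(key);
    if (!link) {
      link = { fromColumn, toColumn, count: 0, files: new Set<string>() };
      this.links.set(key, link);
    }
    link.count += 1;
    link.files.add(file);
  }

  all(): ColumnLink[] {
    return Array.from(this.links.values());
  }
}
```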

Correctness fix: case-sensitive matching with shared lowercased prefix. Adding two column names that lowercased to the same string (e.g. "ABC" and "abc") used to clobber each other on the shared trie path — case-sensitive lookup could return the wrong canonical name or no match at all. Each terminal node now stores all case variants, and case-sensitive lookup resolves to the variant that actually appears in the text.
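
Here is a small sketch of the fixed behaviour, assuming a trie keyed on lowercased character codes. The names `TrieNode`, `insert`, and `lookup` are hypothetical, and returning the first-inserted variant for case-insensitive lookup is an assumption.

```typescript
interface TrieNode {
  children: Map<number, TrieNode>;
  variants: string[]; // every original spelling that ends at this node
}

function newNode(): TrieNode {
  return { children: new Map<number, TrieNode>(), variants: [] };
}

function insert(root: TrieNode, name: string): void {
  const lower = name.toLowerCase();
  let node = root;
  for (let i = 0; i < lower.length; i++) {
    const code = lower.charCodeAt(i);
    let next = node.children.get(code);
    if (!next) {
      next = newNode();
      node.children.set(code, next);
    }
    node = next;
  }
  // Keep all case variants instead of overwriting a single canonical name.
  if (node.variants.indexOf(name) === -1) node.variants.push(name);
}

// Walk the shared lowercased path; in case-sensitive mode, only succeed if
// the exact spelling from the text was registered.
function lookup(root: TrieNode, text: string, caseSensitive: boolean): string | undefined {
  const lower = text.toLowerCase();
  let node = root;
  for (let i = 0; i < lower.length; i++) {
    const next = node.children.get(lower.charCodeAt(i));
    if (!next) return undefined;
    node = next;
  }
  if (node.variants.length === 0) return undefined;
  if (!caseSensitive) return node.variants[0];
  return node.variants.indexOf(text) !== -1 ? text : undefined;
}
```

With "ABC" and "abc" both inserted, a case-sensitive lookup of either returns that exact spelling, and "Abc" finds nothing.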

The dead pairwise per-file analyzer and the legacy processLines / processFileStream / processLine helpers were removed; the orchestrator now drives a single unified pipeline.

1.1.0 — Hot-Path analysis and a leaner JSON format

1.1.0 added an insight layer on top of the raw usage index, and reshaped the persisted analysis file.

Hot-Path Analysis

A new module, src/hotPathAnalyzer.ts, exposes pure functions: computeTableMetrics, computeEdgeMetrics, suggestIndexes, suggestCaches, and buildHotPathReport.

Per-table metrics: usageCount, fileFanOut, joinDegreeIn/Out, weighted PageRank centrality, and a tunable hotScore:

$$\text{hotScore} = \log(1 + \text{usageCount}) \cdot (1 + \text{joinDegree}) \cdot \sqrt{\text{fileFanOut}}$$
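
As a direct transcription of the formula (not the extension's actual code), the score could be computed like this. Two assumptions: the log is natural, and `joinDegree` is the sum of the in and out degrees. The configurable weights are left out because the post does not say how they enter the formula.

```typescript
// Assumed input shape for illustration.
interface TableMetrics {
  usageCount: number;
  joinDegree: number; // assumed: joinDegreeIn + joinDegreeOut
  fileFanOut: number;
}

// hotScore = log(1 + usageCount) * (1 + joinDegree) * sqrt(fileFanOut)
function hotScore(m: TableMetrics): number {
  return Math.log(1 + m.usageCount) * (1 + m.joinDegree) * Math.sqrt(m.fileFanOut);
}
```

For example, a table with 99 usages, join degree 3, and references in 16 files scores ln(100) × 4 × 4, roughly 73.7. A table with zero usages scores 0 regardless of the other terms.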

Per-edge metrics: coOccurrenceCount, linkConfidence, and edgeScore = coOccurrenceCount * linkConfidence.

Suggestions:

Index suggestions for join keys on top-quantile edges, emitted as ready-to-paste DDL: CREATE INDEX IX_<table>_<col> ON <table>(<col>).
Cache suggestions for read-heavy leaf tables (high usage, low join degree).
Cold-table report (usageCount === 0).
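
To make the index-suggestion step concrete, here is a rough sketch. The `JoinEdge` shape and the quantile rule (keep edges at or above the score found at position floor(q × n) in the sorted list) are assumptions; only the edgeScore formula and the DDL template come from the release notes.

```typescript
// Assumed edge shape: the join key lives on `table.column`.
interface JoinEdge {
  table: string;
  column: string;
  coOccurrenceCount: number;
  linkConfidence: number;
}

function edgeScore(e: JoinEdge): number {
  return e.coOccurrenceCount * e.linkConfidence;
}

// Keep edges whose score reaches the given quantile (0..1) of all scores.
function topQuantileEdges(edges: JoinEdge[], quantile: number): JoinEdge[] {
  if (edges.length === 0) return [];
  const sorted = edges.map(edgeScore).sort((a, b) => a - b);
  const i = Math.min(sorted.length - 1, Math.floor(quantile * sorted.length));
  const threshold = sorted[i];
  return edges.filter((e) => edgeScore(e) >= threshold);
}

// Emit the DDL in the format quoted above.
function indexDdl(e: JoinEdge): string {
  return `CREATE INDEX IX_${e.table}_${e.column} ON ${e.table}(${e.column})`;
}
```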

A Hot Paths tree view in the sidebar surfaces five sections: Top Hot Tables, Top Join Paths, Index Suggestions, Cache Candidates, and Cold / Unused Tables. Hot-table and edge leaves reveal the related table in the Database Explorer; index suggestions support a Copy DDL context action.

New commands: acacia-db.exportHotPathReport, acacia-db.hotPath.refresh, acacia-db.hotPath.diagnose, acacia-db.hotPath.copyIndexSuggestion.

Configuration lives under acaciaDb.hotPath.*: topN, weights.usage, weights.joins, weights.fanOut, minLinkConfidence, indexSuggestionQuantile, cacheCandidateQuantile, cacheCandidateMaxJoinDegree. Changing any setting refreshes the view without a reload.

Relationship-only filtering

A new filterToRelationshipsOnly setting, enabled by default, saves only references that participate in table relationships. On large codebases this cuts the file size by 80–95%. Tables with zero references are no longer saved or shown.

The filtering algorithm itself was rewritten: complexity went from roughly O(n² × m²) to O(n×m + f×r log r + f×r×w) by grouping references per file, sorting for early termination on proximity checks, and skipping single-table files. Filtering time on a 477-table codebase dropped from 10–20 s to 0.5–2 s. Results are cached after the first computation, so the JSON export no longer re-runs the filter.
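
The three tricks (group per file, sort, break early) fit in a short sketch. This is an illustration under assumptions, not the extension's code: `Ref`, `filterToRelationships`, and the fixed line `window` used as the proximity test are all hypothetical.

```typescript
interface Ref {
  file: string;
  line: number;
  tableName: string;
}

// Keep only references within `window` lines of a reference to a different
// table in the same file.
function filterToRelationships(refs: Ref[], window: number): Ref[] {
  // 1. Group references per file: O(n * m) overall.
  const byFile = new Map<string, Ref[]>();
  for (const r of refs) {
    const list = byFile.get(r.file);
    if (list) list.push(r);
    else byFile.set(r.file, [r]);
  }
  const kept: Ref[] = [];
  byFile.forEach((list) => {
    // 2. Single-table files cannot contain a relationship: skip them.
    const tables = new Set(list.map((r) => r.tableName));
    if (tables.size < 2) return;
    // 3. Sort by line (r log r) so the inner loop can stop early.
    const sorted = list.slice().sort((a, b) => a.line - b.line);
    const keep = new Array<boolean>(sorted.length).fill(false);
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        if (sorted[j].line - sorted[i].line > window) break; // all later refs are farther away
        if (sorted[j].tableName !== sorted[i].tableName) {
          keep[i] = true;
          keep[j] = true;
        }
      }
    }
    sorted.forEach((r, i) => {
      if (keep[i]) kept.push(r);
    });
  });
  return kept;
}
```

Because of the early `break`, each reference is compared only with neighbours inside the window, which is where the f×r×w term comes from.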

File-based JSON format

The persisted table_refs.json was reorganized from table-keyed to file-keyed: each entry is a file with a list of references (line, column, tableName). The new shape is ~40% smaller, doesn't trip JSON.stringify on very large workspaces, loads faster via direct object access, and diffs cleanly in version control. Relationships are simplified to table pairs and counts.
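
To illustrate the reshaping, here is a sketch that converts a table-keyed index into a file-keyed one. Both type definitions are assumptions: the old shape is guessed, and the new one only mirrors the (line, column, tableName) triple named above, with `column` taken to mean a character position.

```typescript
// Assumed old shape: table name -> references.
type TableKeyed = Record<string, { file: string; line: number; column: number }[]>;
// New shape as described: file -> references.
type FileKeyed = Record<string, { line: number; column: number; tableName: string }[]>;

function toFileKeyed(input: TableKeyed): FileKeyed {
  const byFile: FileKeyed = {};
  for (const tableName of Object.keys(input)) {
    for (const ref of input[tableName]) {
      if (!byFile[ref.file]) byFile[ref.file] = [];
      byFile[ref.file].push({ line: ref.line, column: ref.column, tableName });
    }
  }
  // Re-insert keys alphabetically so the serialized file diffs stably.
  const out: FileKeyed = {};
  for (const file of Object.keys(byFile).sort()) {
    out[file] = byFile[file].sort((a, b) => a.line - b.line || a.column - b.column);
  }
  return out;
}
```

Each file path is now written once as a key rather than repeated on every reference, which is one plausible source of the size reduction.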

Tree-view sorting was tightened across all levels — files by reference count, relationship files by instance count, proximity instances by distance (closest first) — while the JSON keeps an alphabetical file order for stable diffs.

1.2.0 — Heatmap and Mermaid ER export

1.2.0 adds two visualization paths over the data Acacia DB already collects.

Database Usage Heatmap

src/heatmapView.ts renders a tables × files matrix in a webview panel:

SVG with a log/linear color-scale toggle and adjustable cell size.
Hover tooltip showing table, file, and reference count.
Click-to-open: jumps to the first line where the table appears in the file.
Row totals on the right and column totals on the bottom as bar sparklines, plus a legend gradient with min/max.
Driven by the in-memory tableUsageMap, with .vscode/table_refs.json as a no-scan fallback.
Truncation warning when the matrix exceeds the configured caps.

Caps are configurable: acaciaDb.heatmap.maxTables (default 50, max 1000) and acaciaDb.heatmap.maxFiles (default 100, max 2000). Tables and files are ranked by total reference count before truncation.
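
A rough sketch of rank-then-truncate, assuming the usage data is available as a nested table → file → count record. `UsageMatrix`, `topByTotal`, and `truncateMatrix` are hypothetical names, and the alphabetical tie-break is an assumption.

```typescript
// counts[table][file] = number of references (assumed shape).
type UsageMatrix = Record<string, Record<string, number>>;

// Highest totals first; ties broken alphabetically for a stable order.
function topByTotal(totals: Map<string, number>, limit: number): string[] {
  return Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([name]) => name);
}

function truncateMatrix(counts: UsageMatrix, maxTables: number = 50, maxFiles: number = 100) {
  const tableTotals = new Map<string, number>();
  const fileTotals = new Map<string, number>();
  for (const table of Object.keys(counts)) {
    for (const file of Object.keys(counts[table])) {
      const n = counts[table][file];
      tableTotals.set(table, (tableTotals.get(table) ?? 0) + n);
      fileTotals.set(file, (fileTotals.get(file) ?? 0) + n);
    }
  }
  return {
    tables: topByTotal(tableTotals, maxTables),
    files: topByTotal(fileTotals, maxFiles),
    // Drives the truncation warning when either cap is exceeded.
    truncated: tableTotals.size > maxTables || fileTotals.size > maxFiles,
  };
}
```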

The command acacia-db.showUsageHeatmap is wired into the Database Explorer view title bar.

Mermaid ER Diagram Export

src/mermaidErExport.ts exposes a pure buildMermaidEr function (no vscode import) that turns detected column relationships into a Mermaid ER diagram:

Relationships are deduplicated by unordered table pair, and forward / backward / bidirectional directions are aggregated into a single Mermaid cardinality (}o--||, ||--o{, or }o--o{).
Optional per-entity column blocks pulled from tables_views.json, capped per table for readability.
Identifiers are sanitized for Mermaid's [A-Za-z_][A-Za-z0-9_]* rule; columns named PK / FK / UK are renamed (PK_col, etc.) to avoid colliding with Mermaid's reserved attribute-key tokens.
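
The sanitizing and deduplication rules can be sketched as below. This is not `buildMermaidEr` itself: the function names, the `Rel` shape, the `references` label, and the mapping from direction to cardinality symbol (the referencing side is treated as the "many" side) are all assumptions.

```typescript
// Force a name to match [A-Za-z_][A-Za-z0-9_]*.
function sanitizeId(name: string): string {
  const cleaned = name.replace(/[^A-Za-z0-9_]/g, "_");
  return /^[A-Za-z_]/.test(cleaned) ? cleaned : `_${cleaned}`;
}

// Rename columns that collide with Mermaid's PK / FK / UK attribute keys.
function safeColumn(name: string): string {
  const id = sanitizeId(name);
  return /^(PK|FK|UK)$/.test(id) ? `${id}_col` : id;
}

interface Rel {
  from: string; // referencing table (assumed meaning)
  to: string; // referenced table
}

function buildErLines(rels: Rel[]): string[] {
  const seen = new Map<string, { a: string; b: string; fwd: boolean; bwd: boolean }>();
  for (const r of rels) {
    const a = sanitizeId(r.from);
    const b = sanitizeId(r.to);
    // Order the pair so (x, y) and (y, x) share one key.
    const [lo, hi] = a < b ? [a, b] : [b, a];
    const key = `${lo}|${hi}`;
    const entry = seen.get(key) ?? { a: lo, b: hi, fwd: false, bwd: false };
    if (a === lo) entry.fwd = true;
    else entry.bwd = true;
    seen.set(key, entry);
  }
  const lines = ["erDiagram"];
  seen.forEach((e) => {
    const card = e.fwd && e.bwd ? "}o--o{" : e.fwd ? "}o--||" : "||--o{";
    lines.push(`    ${e.a} ${card} ${e.b} : references`);
  });
  return lines;
}
```

Feeding it orders→customers, customers→orders, and orders→items yields two relationship lines: a bidirectional one for the first pair and a one-to-many from items to orders.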

Three export targets via Quick Pick: open as Markdown preview, save to .vscode/erDiagram.md, or copy Mermaid source to clipboard. The command acacia-db.exportMermaidEr lives on the Column Explorer view title bar.

Configuration: acaciaDb.mermaidEr.maxTables (60), acaciaDb.mermaidEr.omitIsolated (true), acaciaDb.mermaidEr.includeColumns (true), acaciaDb.mermaidEr.maxColumnsPerTable (20).

Try it

Marketplace: manacacia.acacia-db
Repo: github.com/AcaciaMan/acacia-db

Point it at a tables_views.json and a source folder, run the analysis, and you'll get a usage tree, a relationship graph, hot-path suggestions, an ER diagram, and a heatmap — all from the same scan, all stored locally in .vscode/.

Feedback and issues welcome.

DEV Community