DEV Community

David Arno

How I Built an Automated JS/TS Repository Analyzer in C#

TL;DR

I built the JavaScript/TypeScript analysis engine for the Silverfish IDP (Internal Developer Portal). The engine automatically detects packaging tools, identifies component types, and extracts complete dependency graphs from repos. It handles monorepos, multiple lock file formats, and mixed JS/TS codebases, all without making assumptions about repo structure.

The Problem

At Silverfish Software, we're building an IDP that helps individual developers and engineering teams understand their entire codebase. But when you have hundreds of repositories spanning multiple languages, frameworks, and tools, how do you automatically make sense of it all?

For JavaScript and TypeScript repos specifically, the challenge is significant: every repo is different. Some use Yarn, others npm or pnpm. Some have monorepos with nested package.json files. Some mix JavaScript and TypeScript. Some have multiple lock files checked in (a real mess). And some don't have lock files at all.

I needed an analyzer that could handle all these cases automatically, with no manual configuration. No "please tell us which package manager you use" questions. Just point it at a repo and get back structured metadata about components, dependencies, and versions.

Step 1: Detect the Packaging Tool
The naive approach: Check if yarn.lock exists → use Yarn. Check if package-lock.json exists → use npm.

Reality is messier:

// Priority order matters
1. Check packageManager field in package.json ("yarn@4.1.0")
2. Look for lock files (yarn.lock, pnpm-lock.yaml, package-lock.json, bun.lock)
3. Check config files (.yarnrc.yml, pnpm-workspace.yaml)
4. Default to npm


The packageManager field was the key insight: Corepack reads it to decide which package manager (and which version) to run, so I treat it as the source of truth. If it says Yarn, it's Yarn, even if npm somehow created a lock file too.
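As a rough illustration, reading that field could look like the sketch below. `PackageManagerField` and `TryParse` are hypothetical names, not the analyzer's actual code, and the sketch assumes the value has the `name@version` shape shown above, optionally followed by a `+hash` suffix.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: not part of the Silverfish code base.
public static class PackageManagerField
{
    // Returns (tool, version) for values such as "yarn@4.1.0",
    // or null when the field is missing or malformed.
    // JsonDocument.Parse throws JsonException if package.json is not valid JSON.
    public static (string Tool, string Version)? TryParse(string packageJsonContent)
    {
        using var doc = JsonDocument.Parse(packageJsonContent);
        if (doc.RootElement.ValueKind != JsonValueKind.Object ||
            !doc.RootElement.TryGetProperty("packageManager", out var field) ||
            field.ValueKind != JsonValueKind.String)
        {
            return null;
        }

        var value = field.GetString()!;
        var at = value.IndexOf('@');
        if (at <= 0 || at == value.Length - 1) return null;

        var tool = value[..at].ToLowerInvariant();
        var version = value[(at + 1)..];

        // An integrity hash may follow the version after '+'; drop it.
        var plus = version.IndexOf('+');
        if (plus >= 0) version = version[..plus];

        if (version.Length == 0) return null;
        return (tool, version);
    }
}
```

Returning null rather than throwing for a missing field lets the caller fall through to the next detection step.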

I also had to handle conflicts: I found real repos with both yarn.lock and package-lock.json checked in. My solution? Detect all of them, report the conflict, and parse only the highest-priority one.

public static async Task<PackagingToolDetectionResult> DetectAsync(
    IReadOnlyCollection<string> repoPaths,
    Func<string, Task<string?>> readFileContentAsync)
{
    // 1. Check packageManager field first
    var fromPackageManager = await TryDetectFromPackageManagerFieldAsync(...);
    if (fromPackageManager is not null) return fromPackageManager;

    // 2. Check lock files
    var fromLockFile = TryDetectFromLockFiles(...);
    if (fromLockFile is not null) return ...;

    // 3. Check config files
    var fromConfigFile = TryDetectFromConfigFiles(...);
    if (fromConfigFile is not null) return ...;

    // 4. Default to npm; no lock file was found, so one needs generating
    return new(PackagingTool.Npm, true);
}

Result: (PackagingTool.Yarn, LockFileNeedsGenerating: false) or similar.
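The conflict handling described earlier (detect every lock file, report the conflict, parse only the highest-priority one) might be sketched like this. The `Pnpm` and `Bun` enum members, the `LockFileScanResult` record, and the priority order are assumptions made for the example; the order simply follows the sequence in which the lock files are listed above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Npm and Yarn appear in the article; Pnpm and Bun are assumed members.
public enum PackagingTool { Npm, Yarn, Pnpm, Bun }

// Hypothetical result type for this sketch.
public sealed record LockFileScanResult(PackagingTool? Chosen, IReadOnlyList<string> Found)
{
    // More than one lock file is reported as data, not raised as an error.
    public bool HasConflict => Found.Count > 1;
}

public static class LockFileScan
{
    // Highest priority first (assumed ordering).
    static readonly (string FileName, PackagingTool Tool)[] Known =
    {
        ("yarn.lock", PackagingTool.Yarn),
        ("pnpm-lock.yaml", PackagingTool.Pnpm),
        ("package-lock.json", PackagingTool.Npm),
        ("bun.lock", PackagingTool.Bun),
    };

    // rootFileNames: file names present next to the package.json being analyzed.
    public static LockFileScanResult Scan(IReadOnlyCollection<string> rootFileNames)
    {
        var found = Known.Where(k => rootFileNames.Contains(k.FileName)).ToList();
        var chosen = found.Count > 0 ? found[0].Tool : (PackagingTool?)null;
        return new LockFileScanResult(chosen, found.Select(k => k.FileName).ToList());
    }
}
```

Keeping the full `Found` list alongside `Chosen` is what lets a caller surface the conflict while still parsing a single file.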

Step 2: Identify Components and Their Type
Each package.json is a component. But what kind? And what does it do?

I classified each one as either a Package (published to npm) or a Library (internal or private), and determined its usage: Frontend, Backend, Fullstack, or Unknown.

The key was looking at dependencies:

static readonly HashSet<string> FrontendSignals = new() 
{ 
    "react", "vue", "@angular/core", "svelte", "react-router", "redux", ...
};

static readonly HashSet<string> BackendSignals = new()
{
    "express", "koa", "mongoose", "pg", "apollo-server", "prisma", ...
};

// If a package depends on react + express = fullstack
// If only react = frontend
// If only express = backend
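The three rules in those comments can be written as a small function. This is a sketch only: `ComponentUsage` and `UsageClassifier` are hypothetical names, and the signal sets contain just the example names listed above, not the analyzer's full lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum mirroring the four usage values named in the article.
public enum ComponentUsage { Unknown, Frontend, Backend, Fullstack }

public static class UsageClassifier
{
    // Shortened signal lists, taken from the article's examples.
    static readonly HashSet<string> FrontendSignals = new(StringComparer.Ordinal)
    {
        "react", "vue", "@angular/core", "svelte", "react-router", "redux",
    };

    static readonly HashSet<string> BackendSignals = new(StringComparer.Ordinal)
    {
        "express", "koa", "mongoose", "pg", "apollo-server", "prisma",
    };

    // dependencyNames: the keys of "dependencies" (and optionally
    // "devDependencies") from one package.json.
    public static ComponentUsage Classify(IEnumerable<string> dependencyNames)
    {
        var names = dependencyNames.ToList();
        var frontend = names.Any(FrontendSignals.Contains);
        var backend = names.Any(BackendSignals.Contains);

        return (frontend, backend) switch
        {
            (true, true) => ComponentUsage.Fullstack,
            (true, false) => ComponentUsage.Frontend,
            (false, true) => ComponentUsage.Backend,
            _ => ComponentUsage.Unknown,
        };
    }
}
```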

I also extracted language info:

// Pure JS? Check for no TypeScript signals
// TypeScript? Look for typescript pkg + @types/*
// Mixed? Has flow-bin + typescript OR tsconfig.json's allowJs = true
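One possible reading of those language rules is sketched below. The exact precedence between the signals is my assumption, as are the names `ComponentLanguage` and `LanguageDetector`. tsconfig.json files commonly contain comments and trailing commas, so the parser options allow both; an `extends` chain is not followed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical enum for this sketch.
public enum ComponentLanguage { JavaScript, TypeScript, Mixed }

public static class LanguageDetector
{
    static readonly JsonDocumentOptions TsconfigOptions = new()
    {
        CommentHandling = JsonCommentHandling.Skip,
        AllowTrailingCommas = true,
    };

    // dependencyNames: all dependency and devDependency names of the component.
    // tsconfigContent: text of tsconfig.json, or null if the file is absent.
    public static ComponentLanguage Detect(
        IReadOnlyCollection<string> dependencyNames, string? tsconfigContent)
    {
        var hasTypeScript = dependencyNames.Contains("typescript")
            || dependencyNames.Any(n => n.StartsWith("@types/", StringComparison.Ordinal));
        var hasFlow = dependencyNames.Contains("flow-bin");
        var allowJs = tsconfigContent is not null && ReadAllowJs(tsconfigContent);

        // Assumed precedence: any "mixed" signal wins over plain TypeScript.
        if (allowJs || (hasFlow && hasTypeScript)) return ComponentLanguage.Mixed;
        return hasTypeScript ? ComponentLanguage.TypeScript : ComponentLanguage.JavaScript;
    }

    static bool ReadAllowJs(string tsconfigContent)
    {
        using var doc = JsonDocument.Parse(tsconfigContent, TsconfigOptions);
        return doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("compilerOptions", out var options)
            && options.ValueKind == JsonValueKind.Object
            && options.TryGetProperty("allowJs", out var allowJs)
            && allowJs.ValueKind == JsonValueKind.True;
    }
}
```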

And pulled in version constraints:

// Node version: from engines.node in package.json or .nvmrc file
// TS version: from devDependencies
// ECMAScript target: from tsconfig.json compilerOptions
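Those three lookups could be sketched as follows. `VersionConstraints` and `VersionConstraintReader` are hypothetical names, and treating .nvmrc as a fallback for engines.node is an assumption about precedence that the article does not spell out.

```csharp
using System.Text.Json;

// Hypothetical record for this sketch; any value may be null when not declared.
public sealed record VersionConstraints(string? Node, string? TypeScript, string? EcmaScriptTarget);

public static class VersionConstraintReader
{
    // tsconfig.json often contains comments and trailing commas.
    static readonly JsonDocumentOptions TsconfigOptions = new()
    {
        CommentHandling = JsonCommentHandling.Skip,
        AllowTrailingCommas = true,
    };

    public static VersionConstraints Read(
        string packageJsonContent, string? nvmrcContent, string? tsconfigContent)
    {
        using var pkg = JsonDocument.Parse(packageJsonContent);

        // Assumed precedence: engines.node first, then the .nvmrc file.
        var node = GetString(pkg.RootElement, "engines", "node") ?? NullIfBlank(nvmrcContent);
        var typeScript = GetString(pkg.RootElement, "devDependencies", "typescript");

        string? target = null;
        if (tsconfigContent is not null)
        {
            using var tsconfig = JsonDocument.Parse(tsconfigContent, TsconfigOptions);
            target = GetString(tsconfig.RootElement, "compilerOptions", "target");
        }

        return new VersionConstraints(node, typeScript, target);
    }

    // Reads root[section][key] when it exists and is a string.
    static string? GetString(JsonElement root, string section, string key) =>
        root.ValueKind == JsonValueKind.Object
        && root.TryGetProperty(section, out var obj)
        && obj.ValueKind == JsonValueKind.Object
        && obj.TryGetProperty(key, out var value)
        && value.ValueKind == JsonValueKind.String
            ? value.GetString()
            : null;

    static string? NullIfBlank(string? s) =>
        string.IsNullOrWhiteSpace(s) ? null : s.Trim();
}
```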

Result: A JsComponent record with all metadata attached—used by Silverfish's dashboard to display component details instantly.

Step 3: Parse Lock Files (The Hard Part)
This was the gnarly part. Four different formats, each with quirks.

Yarn Lock (v1 Classic)
Looks like YAML, though it is Yarn's own format, with nested dependency lists:

"@pkgjs/parseargs@^0.11.0":
  version "0.11.0"
  resolved "https://registry.npmjs.org/..."
  dependencies:
    package-json "^6.0.0"

I wrote a line-by-line parser. The trick: track indentation to know when you're inside a package block vs. dependency list.
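To make the indentation idea concrete, here is a minimal sketch of such a parser. It is not the analyzer's real parser: `YarnEntry` and `YarnV1Lock` are hypothetical names, only `version` and `dependencies` are read, and it assumes the two-space indentation shown in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: one block of a yarn.lock v1 file.
public sealed record YarnEntry(
    IReadOnlyList<string> Specifiers,
    string Version,
    IReadOnlyDictionary<string, string> Dependencies);

public static class YarnV1Lock
{
    public static List<YarnEntry> Parse(string content)
    {
        var entries = new List<YarnEntry>();
        List<string>? specs = null;
        var version = "";
        var deps = new Dictionary<string, string>();
        var inDeps = false;

        // Emits the block collected so far and resets the state.
        void Flush()
        {
            if (specs is not null) entries.Add(new YarnEntry(specs, version, deps));
            specs = null; version = ""; deps = new Dictionary<string, string>(); inDeps = false;
        }

        foreach (var raw in content.Split('\n'))
        {
            var line = raw.TrimEnd('\r', ' ');
            var text = line.Trim();
            if (text.Length == 0 || text.StartsWith('#')) continue;
            var indent = line.Length - line.TrimStart(' ').Length;

            if (indent == 0 && text.EndsWith(':'))
            {
                // Block header: one or more comma-separated "name@range" specifiers.
                Flush();
                specs = text[..^1].Split(',').Select(Unquote).ToList();
            }
            else if (indent == 2)
            {
                // Field of the current block; only "dependencies:" opens a dependency list.
                inDeps = text == "dependencies:";
                if (text.StartsWith("version ", StringComparison.Ordinal))
                    version = Unquote(text["version ".Length..]);
            }
            else if (indent == 4 && inDeps)
            {
                // Dependency line: name, a space, then the quoted range.
                var space = text.IndexOf(' ');
                if (space > 0) deps[Unquote(text[..space])] = Unquote(text[(space + 1)..]);
            }
        }

        Flush();
        return entries;
    }

    static string Unquote(string s) => s.Trim().Trim('"');
}
```

The sketch splits header specifiers on commas, so a specifier whose range itself contains a comma would be split incorrectly.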

npm package-lock.json
Flat JSON structure (v2/v3):

{
  "packages": {
    "node_modules/lodash": {
      "version": "4.17.21",
      "dependencies": { ... }
    }
  }
}

Easier to parse with JsonDocument, but the key names have node_modules/ prefixes that need stripping.
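A minimal sketch of that approach with System.Text.Json follows. `NpmLock`, `StripNodeModules` and `ReadPackages` are hypothetical names. Nested installs appear under keys like node_modules/a/node_modules/b, so the sketch strips everything up to the last node_modules/ segment, and it skips the "" key, which describes the root project.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for package-lock.json v2/v3.
public static class NpmLock
{
    const string Marker = "node_modules/";

    // "node_modules/a/node_modules/@scope/b" -> "@scope/b".
    // Keys without the marker (for example workspace folders) are returned unchanged.
    public static string StripNodeModules(string path)
    {
        var i = path.LastIndexOf(Marker, StringComparison.Ordinal);
        return i < 0 ? path : path[(i + Marker.Length)..];
    }

    // Returns every (name, version) pair found under "packages".
    // The same name can appear more than once with different versions.
    public static List<(string Name, string Version)> ReadPackages(string lockContent)
    {
        var result = new List<(string Name, string Version)>();
        using var doc = JsonDocument.Parse(lockContent);
        if (doc.RootElement.ValueKind != JsonValueKind.Object ||
            !doc.RootElement.TryGetProperty("packages", out var packages) ||
            packages.ValueKind != JsonValueKind.Object)
        {
            return result;
        }

        foreach (var entry in packages.EnumerateObject())
        {
            if (entry.Name.Length == 0) continue; // "" is the root project itself
            if (entry.Value.ValueKind != JsonValueKind.Object ||
                !entry.Value.TryGetProperty("version", out var version) ||
                version.ValueKind != JsonValueKind.String)
            {
                continue;
            }
            result.Add((StripNodeModules(entry.Name), version.GetString()!));
        }

        return result;
    }
}
```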

pnpm-lock.yaml
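YAML whose keys encode package name and version. Newer lockfile versions write name@version; older ones, like this example, write /name/version: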

packages:
  /lodash/4.17.21:
    version: 4.17.21
    dependencies:
      react: 18.2.0

I treated this as mostly line-based text parsing since I didn't want to add a full YAML dependency. Works for the common cases.
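The fiddly bit of a text-based approach is splitting a package key into name and version, because scoped names contain both `@` and `/`. A sketch, with the hypothetical name `PnpmKey`, that accepts `/name/version`, `/name@version` and `name@version` keys, plus a parenthesised peer suffix:

```csharp
using System;

// Hypothetical helper: splits one key from the "packages:" section of pnpm-lock.yaml.
public static class PnpmKey
{
    public static (string Name, string Version)? TryParse(string key)
    {
        // Remove the trailing YAML colon, optional quotes and the leading slash.
        var k = key.Trim().TrimEnd(':').Trim('\'', '"').TrimStart('/');

        // Drop a parenthesised peer suffix such as "(react@18.2.0)".
        var paren = k.IndexOf('(');
        if (paren >= 0) k = k[..paren];

        // Search from index 1 so the leading '@' of a scoped name is ignored.
        var at = k.Length > 1 ? k.IndexOf('@', 1) : -1;

        // Prefer the name@version form; otherwise fall back to name/version.
        var split = at > 0 ? at : k.LastIndexOf('/');
        if (split <= 0 || split == k.Length - 1) return null;

        return (k[..split], k[(split + 1)..]);
    }
}
```

Other suffix styles are not handled here, which matches the "common cases" scope described above.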

Bun Lock
bun.lock is a JSONC file with array-based entries. It's the least common format I see. I parse the text file, but mark the older binary bun.lockb files as unparseable.

Step 4: Resolve Dependencies
Once I had a parsed lock file, I needed to extract:

  • Local dependencies (internal workspace packages like @company/shared)
  • Direct dependencies (what's explicitly in package.json)
  • Transitive dependencies (what your dependencies need)

// Read package.json dependencies
var directRanges = ReadDirectDependencyRanges(packageJsonContent);

// For each direct dep, look it up in the lock file
foreach (var (name, range) in directRanges)
{
    var pkg = Resolve(name, range, parsedLock);
    if (pkg != null)
    {
        // It's resolved to version X.Y.Z
        direct.Add(new ResolvedDependency(pkg.Name, pkg.Version, range));

        // Mark it visited and queue it to traverse its dependencies
        if (visited.Add($"{pkg.Name}@{pkg.Version}"))
            queue.Enqueue(pkg);
    }
}

// Breadth-first traversal (FIFO queue) to collect transitives
while (queue.TryDequeue(out var pkg))
{
    foreach (var (depName, depRange) in pkg.DependencyRanges)
    {
        var dep = Resolve(depName, depRange, parsedLock);
        // visited is a HashSet<string>; Add returns false for a package already
        // seen, so each name@version is traversed once, even with dependency cycles
        if (dep != null && visited.Add($"{dep.Name}@{dep.Version}"))
        {
            transitive.Add(...);
            queue.Enqueue(dep);
        }
    }
}

Result: Three lists of ResolvedDependency objects with exact versions and requested ranges. The Silverfish dashboard uses this to build the full dependency graph in its UI.
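For a version of the walk that can be run on its own, here is a sketch under one simplifying assumption: the lock file is indexed by "name@range" specifier (as yarn.lock v1 entries are), so resolving is a dictionary lookup rather than semver matching. `LockedPackage` and `DependencyWalker` are hypothetical names, and the `ResolvedDependency` shape is inferred from the snippet above.

```csharp
using System.Collections.Generic;

// Hypothetical shapes for this sketch.
public sealed record LockedPackage(
    string Name, string Version, IReadOnlyDictionary<string, string> DependencyRanges);

public sealed record ResolvedDependency(string Name, string Version, string RequestedRange);

public static class DependencyWalker
{
    // lockBySpecifier: parsed lock file indexed by "name@range" (assumption).
    public static (List<ResolvedDependency> Direct, List<ResolvedDependency> Transitive) Walk(
        IReadOnlyDictionary<string, string> directRanges,
        IReadOnlyDictionary<string, LockedPackage> lockBySpecifier)
    {
        var direct = new List<ResolvedDependency>();
        var transitive = new List<ResolvedDependency>();
        var visited = new HashSet<string>();
        var queue = new Queue<LockedPackage>();

        foreach (var (name, range) in directRanges)
        {
            if (!lockBySpecifier.TryGetValue($"{name}@{range}", out var pkg)) continue;
            direct.Add(new ResolvedDependency(pkg.Name, pkg.Version, range));
            if (visited.Add($"{pkg.Name}@{pkg.Version}")) queue.Enqueue(pkg);
        }

        // Breadth-first: the visited set guarantees termination on cycles.
        while (queue.TryDequeue(out var current))
        {
            foreach (var (depName, depRange) in current.DependencyRanges)
            {
                if (!lockBySpecifier.TryGetValue($"{depName}@{depRange}", out var dep)) continue;
                if (!visited.Add($"{dep.Name}@{dep.Version}")) continue;
                transitive.Add(new ResolvedDependency(dep.Name, dep.Version, depRange));
                queue.Enqueue(dep);
            }
        }

        return (direct, transitive);
    }
}
```

In this sketch a package that is both a direct dependency and reachable transitively is listed only as direct.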

Step 5: Handle Monorepos
Monorepos have multiple package.json files. The key insight: walk up the directory tree to find the root lock file.

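static IEnumerable<string> AncestorDirs(string dir)
{
    // Yields dir, then each parent, ending with "" (the repo root for repo-relative paths)
    var current = dir;
    while (true)
    {
        yield return current;
        if (string.IsNullOrEmpty(current)) break;
        // GetDirectoryName returns null at a filesystem root; treat that as the end
        current = Path.GetDirectoryName(current) ?? string.Empty;
    }
}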

So packages/web/package.json in an entria-style monorepo correctly finds the root yarn.lock instead of failing. Each workspace member gets its own component record in Silverfish.
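Put together, the lookup for a workspace member might look like the sketch below. It assumes repo-relative paths with `/` separators held in a set, and does its own string splitting to avoid platform-specific separator handling. `LockFileLocator` and `FindNearest` are hypothetical names, and the lock file order is the same assumed priority as in Step 1.

```csharp
using System.Collections.Generic;

// Hypothetical helper for this sketch.
public static class LockFileLocator
{
    // Assumed priority order, highest first.
    static readonly string[] LockFileNames =
    {
        "yarn.lock", "pnpm-lock.yaml", "package-lock.json", "bun.lock",
    };

    // packageJsonPath: e.g. "packages/web/package.json" (repo-relative, '/' separators).
    // repoPaths: every file path in the repo, in the same form.
    // Returns the closest lock file at or above the package.json, or null.
    public static string? FindNearest(string packageJsonPath, IReadOnlySet<string> repoPaths)
    {
        var dir = ParentOf(packageJsonPath);
        while (true)
        {
            foreach (var name in LockFileNames)
            {
                var candidate = dir.Length == 0 ? name : $"{dir}/{name}";
                if (repoPaths.Contains(candidate)) return candidate;
            }

            if (dir.Length == 0) return null; // reached the repo root without a match
            dir = ParentOf(dir);
        }
    }

    // "packages/web" -> "packages"; "packages" -> "" (the repo root).
    static string ParentOf(string path)
    {
        var slash = path.LastIndexOf('/');
        return slash < 0 ? "" : path[..slash];
    }
}
```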

How Silverfish Uses This
Once the analyzer extracts all this metadata, Silverfish:

  • Maps dependencies visually: showing which components depend on what
  • Flags version mismatches: when different packages pin different versions of the same library
  • Detects tech stacks: knowing which services are frontend, which are backend, which databases they use
  • Tracks upgrades: identifying outdated packages and planning coordinated updates
  • Enables governance: enforcing policies like "no direct jquery dependencies" or "all frontends must use React 18+"

Lessons Learned

  • Format-specific parsing is worth it: I could have given up on Yarn/pnpm/Bun and only parsed npm lock files. But each format's parser is ~100-150 lines and handles real repos that exist in the wild.

  • Conflicts are data, not errors: Instead of failing when I find multiple lock files, I report them. That's valuable information ("why do you have both yarn.lock and package-lock.json?").

  • Monorepos are normal: Walking ancestor directories for lock files + detecting internal workspace packages turned out to be essential, not an edge case.

  • Version constraints matter: Storing both the requested range (^1.2.3) and resolved version (1.2.5) proved useful. With both, you can spot deps that have a newer release still inside the declared range, so they can be upgraded without a breaking change.
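As an illustration of that last point, a caret range check is small enough to sketch. This is not a full semver implementation: `CaretRange` is a hypothetical name, only plain `^major.minor.patch` ranges are handled, and prerelease or build tags make it return false.

```csharp
using System;

// Hypothetical helper: caret ranges only, no prerelease support.
public static class CaretRange
{
    // True when version falls inside the caret range:
    //   ^1.2.3 allows >=1.2.3 and <2.0.0
    //   ^0.2.3 allows >=0.2.3 and <0.3.0
    //   ^0.0.3 allows only 0.0.3
    public static bool Satisfies(string range, string version)
    {
        if (!range.StartsWith('^')) return false;
        if (!Version.TryParse(range[1..], out var min)) return false;
        if (!Version.TryParse(version, out var v)) return false;

        // System.Version reports a missing patch component as Build == -1.
        if (min.Build < 0 || v.Build < 0) return false;
        if (v < min) return false;

        // The leftmost non-zero component of the range must stay the same.
        if (min.Major > 0) return v.Major == min.Major;
        if (min.Minor > 0) return v.Major == 0 && v.Minor == min.Minor;
        return v.Major == 0 && v.Minor == 0 && v.Build == min.Build;
    }
}
```

With the stored range and resolved version, a dependency is upgradeable in place when some published version is higher than the resolved one and still satisfies the range.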

What's Next
The JS/TS analyzer is one piece of Silverfish's language support. I'll be building similar analyzers for Python, Go, Java, and other ecosystems. The pattern is the same: detect the package manager, identify components, resolve dependencies, extract versions.

If you're trying to understand complex multi-language codebases at scale, this approach should help. The code is C# 14 with only standard library dependencies—no bloat.

Check out the Silverfish Dashboard to see the analyzer in action, or hit me up if you have questions about the implementation.
