Jonathan Pitter
I Detect 26 Frameworks Without AI. Here's How Deterministic File-Based Detection Works.

When I built Staxa, a platform that deploys isolated customer environments from source code, I needed to answer a deceptively hard question.

"The user pushed a GitHub repo with no Dockerfile. What framework is this, and how do I containerize it?"

The "easy" answer is to call an LLM: send the file listing, maybe the first 50 lines of the main config file, and ask it to identify the framework and generate a Dockerfile.

I tried this approach early on and rejected it. Here's why, and what I built instead.

Why Not an LLM?

Three reasons, all deal-breakers for a deployment pipeline:

Latency. Staxa promises a running tenant environment in ~60 seconds. That's 7 pipeline stages: provision, detect, generate, build, deploy, route, health check. An LLM API call adds 2-5 seconds of latency to the detection stage alone. On a bad day, you're waiting for rate limits or retries. For something that should take milliseconds, that's unacceptable.

Cost at scale. Every deployment triggers a detection. At scale, thousands of API calls per month for a task that's fundamentally pattern matching. The cost scales linearly with deployments for no good reason.

External dependency in a critical path. If the LLM provider has an outage, your deployment pipeline stops. For a platform that automates infrastructure provisioning, adding a third-party API call to the critical path is a liability. The platform is designed to minimize external dependencies: the fewer things that can go down, the more reliable the pipeline.

Deterministic file-based detection solves all three: sub-millisecond execution, zero marginal cost, zero external dependencies.

The Detection System: Three Phases

The FrameworkDetector receives a cloned repo directory and returns a DetectResult:

type DetectResult struct {
    Framework  string            // "nextjs", "django", "go", etc.
    Confidence string            // "high" or "medium"
    DetectedBy string            // "found next.config.js" or "found \"next\" in package.json"
    Metadata   map[string]string // Framework-specific extras
}

Detection runs in three phases, each more expensive than the last. If a phase finds a match, the remaining phases are skipped.

Phase 1: Root File Markers (High Confidence)

The cheapest check. Read the root directory listing (one syscall) and look for framework-specific marker files.

// Phase 1: root file markers (high confidence).
for _, fw := range priorityOrder {
    tmpl, ok := templates[fw]
    if !ok || tmpl.Alias != "" {
        continue
    }
    for _, marker := range tmpl.DetectFiles {
        if rootFiles[marker] {
            result = &DetectResult{
                Framework:  fw,
                Confidence: "high",
                DetectedBy: fmt.Sprintf("found %s", marker),
                Metadata:   make(map[string]string),
            }
            break
        }
    }
    if result != nil {
        break
    }
}

Each framework defines its own marker files in the database. For example:

| Framework | Marker Files |
| --- | --- |
| Next.js | next.config.js, next.config.mjs, next.config.ts |
| Nuxt | nuxt.config.js, nuxt.config.ts |
| Rails | Gemfile, config.ru, bin/rails |
| Go | go.mod |
| Django | manage.py |
| Spring Boot | pom.xml, build.gradle, build.gradle.kts |

If the repo root contains next.config.js, we know it's Next.js with high confidence. No file contents need to be read.

Phase 2: Dependency Markers (Medium Confidence)

If no root file marker matches, we open dependency files and search for framework-specific packages.

// Phase 2: dependency markers (medium confidence).
for _, fw := range priorityOrder {
    tmpl, ok := templates[fw]
    if !ok || tmpl.Alias != "" {
        continue
    }
    for depFile, packages := range tmpl.DetectDeps {
        if !rootFiles[depFile] {
            continue
        }
        content, err := readFileContent(filepath.Join(repoDir, depFile))
        if err != nil {
            continue
        }
        for _, pkg := range packages {
            if containsDependency(depFile, content, pkg) {
                result = &DetectResult{
                    Framework:  fw,
                    Confidence: "medium",
                    DetectedBy: fmt.Sprintf("found %q in %s", pkg, depFile),
                    Metadata:   make(map[string]string),
                }
                break
            }
        }
        if result != nil {
            break
        }
    }
    // Stop at the first match so a lower-priority framework
    // can't overwrite a higher-priority one.
    if result != nil {
        break
    }
}

The dependency matching is format-aware. containsDependency knows that package.json is JSON (check for a key), Cargo.toml is TOML (check for a dependency line), and Gemfile is plain text (case-insensitive search). It understands the structure well enough to avoid false positives.
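A sketch of what that format-aware matching could look like — the function name matches the post, but this body is an illustrative reconstruction, not Staxa's actual implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// containsDependency checks depFile's content for pkg using rules
// appropriate to that file's format.
func containsDependency(depFile, content, pkg string) bool {
	switch depFile {
	case "package.json":
		// JSON: look for the package as a key in (dev)dependencies.
		var m struct {
			Dependencies    map[string]string `json:"dependencies"`
			DevDependencies map[string]string `json:"devDependencies"`
		}
		if err := json.Unmarshal([]byte(content), &m); err != nil {
			return false
		}
		_, inDeps := m.Dependencies[pkg]
		_, inDev := m.DevDependencies[pkg]
		return inDeps || inDev
	case "Cargo.toml":
		// TOML: look for a dependency line like `actix-web = "4"`.
		for _, line := range strings.Split(content, "\n") {
			trimmed := strings.TrimSpace(line)
			if strings.HasPrefix(trimmed, pkg) {
				rest := strings.TrimSpace(strings.TrimPrefix(trimmed, pkg))
				if strings.HasPrefix(rest, "=") || strings.HasPrefix(rest, "{") {
					return true
				}
			}
		}
		return false
	default:
		// Gemfile, mix.exs, etc.: case-insensitive substring search.
		return strings.Contains(strings.ToLower(content), strings.ToLower(pkg))
	}
}

func main() {
	pkgJSON := `{"dependencies": {"next": "14.0.0", "express": "4.18.0"}}`
	fmt.Println(containsDependency("package.json", pkgJSON, "next"))    // true
	fmt.Println(containsDependency("package.json", pkgJSON, "fastify")) // false
	fmt.Println(containsDependency("Cargo.toml", `actix-web = "4"`, "actix-web")) // true
}
```

Parsing package.json as JSON (rather than substring matching) is what avoids false positives like a package named `next-auth` matching `next`.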

Examples:

| Framework | Dependency File | Package to Find |
| --- | --- | --- |
| Fastify | package.json | fastify |
| Express | package.json | express |
| Actix | Cargo.toml | actix-web |
| Phoenix | mix.exs | phoenix |
| Rails | Gemfile | rails |

This phase is "medium confidence" because a dependency file tells you what libraries are installed, but not necessarily which one is the primary framework. A repo with both express and next in package.json is Next.js, not Express, which is where priority ordering comes in.

Phase 3: Deep Scan (.NET Projects)

Some frameworks don't follow the "config file in root" convention. .NET projects bury their .csproj or .fsproj files in subdirectories (for example, src/MyApp/MyApp.csproj). Phase 3 does a shallow directory walk to find them:

// Phase 3: shallow scan for .NET projects in subdirectories.
if result == nil {
    if projectFile, err := findDotnetProject(repoDir); err == nil && projectFile != "" {
        result = &DetectResult{
            Framework:  "dotnet",
            Confidence: "high",
            DetectedBy: fmt.Sprintf("found %s", projectFile),
            Metadata:   map[string]string{"project_file": projectFile},
        }
    }
}

This is the most expensive phase (directory traversal), so it only runs when phases 1 and 2 found nothing.

The Priority System: Resolving Ambiguity

What happens when a repo has both go.mod and package.json? Or both Gemfile and config.ru?

The detector iterates frameworks in a fixed priority order, most specific first:

var priorityOrder = []string{
    "nextjs", "nuxt", "svelte", "remix", "nestjs",
    "phoenix",
    "rails", "sinatra",
    "django", "fastapi", "flask",
    "laravel",
    "actix", "axum",
    "spring-boot",
    "dotnet",
    "go",
    "fastify", "express",
}

The ordering is deliberate:

Frontend meta-frameworks first. Next.js, Nuxt, SvelteKit, and Remix are checked before their underlying runtimes. A Next.js project has both next.config.js and package.json; if Express were checked first, it would incorrectly match on package.json.

Full-stack frameworks before micro-frameworks. Rails before Sinatra. Django before Flask. Phoenix before generic Elixir. A Rails project contains config.ru (which Sinatra also uses), but the Rails-specific markers are more definitive.

Go and generic Node last. go.mod and package.json are present in many polyglot repos. They're the catch-all; if nothing more specific matches, these are the right fallbacks.

The priority list is evaluated in the same order for both Phase 1 (file markers) and Phase 2 (dependency markers). The first framework to match in priority order wins.

Metadata Enrichment: Framework-Specific Intelligence

After detection, the system enriches the result with framework-specific metadata that the Dockerfile generator needs:

func (d *FrameworkDetector) enrichMetadata(result *DetectResult, repoDir string) {
    switch result.Framework {
    case "nextjs":
        if !inspectNextStandalone(repoDir) {
            result.Metadata["template_override"] = "nextjs_standard"
            result.Metadata["suggest_standalone"] = "true"
        }
    case "actix", "axum":
        if name := parseCargoBinaryName(repoDir); name != "" {
            result.Metadata["binary_name"] = name
        }
    case "dotnet":
        if pf, err := findDotnetProject(repoDir); err == nil && pf != "" {
            result.Metadata["project_file"] = pf
        }
    case "phoenix":
        if name := parseElixirAppName(repoDir); name != "" {
            result.Metadata["binary_name"] = name
        }
    }
}

Next.js: Standalone Detection

Next.js has two deployment modes. The output: 'standalone' config produces a minimal, self-contained build. Without it, you get a full node_modules directory in the image.

The detector inspects the Next config file, strips comment lines to avoid false positives from commented-out config, normalizes whitespace, and checks for the standalone output setting:

func inspectNextStandalone(repoDir string) bool {
    for _, name := range []string{"next.config.js", "next.config.mjs", "next.config.ts"} {
        content, err := readFileContent(filepath.Join(repoDir, name))
        if err != nil {
            continue
        }
        // Strip // line comments to avoid false positives
        var uncommented strings.Builder
        for _, line := range strings.Split(content, "\n") {
            trimmed := strings.TrimSpace(line)
            if strings.HasPrefix(trimmed, "//") {
                continue
            }
            uncommented.WriteString(line)
        }
        normalized := strings.ReplaceAll(uncommented.String(), " ", "")
        if strings.Contains(normalized, "output:'standalone'") ||
            strings.Contains(normalized, `output:"standalone"`) {
            return true
        }
    }
    return false
}

If standalone isn't configured, the detector sets template_override: "nextjs_standard" so the Dockerfile generator uses a different template and emits a suggestion to the user:

Tip: Add output: 'standalone' to your next.config.js for smaller, optimized images

Rust: Binary Name from Cargo.toml

Rust builds produce a binary named after the [package] name in Cargo.toml. The Dockerfile template needs this name to copy the correct binary from the builder stage. The detector parses it out:

func parseCargoBinaryName(repoDir string) string {
    content, err := readFileContent(filepath.Join(repoDir, "Cargo.toml"))
    if err != nil {
        return ""
    }
    inPackage := false
    for _, line := range strings.Split(content, "\n") {
        trimmed := strings.TrimSpace(line)
        if strings.HasPrefix(trimmed, "[") {
            inPackage = trimmed == "[package]"
            continue
        }
        if inPackage && strings.HasPrefix(trimmed, "name") {
            parts := strings.SplitN(trimmed, "=", 2)
            if len(parts) == 2 {
                return strings.Trim(strings.TrimSpace(parts[1]), "\"'")
            }
        }
    }
    return ""
}

Without this, the Dockerfile would use a generic binary name and the build would fail at the COPY --from=builder stage when the binary doesn't exist at the expected path.

Everything Lives in the Database

Here's the design decision that ties this all together: every detection rule, every Dockerfile template, and every default is stored in the platform_config database table as JSONB.

INSERT INTO platform_config (key, value, description) VALUES
('dockerfile_templates.nextjs', '{
  "description": "Next.js (Node.js)",
  "detect_files": ["next.config.js", "next.config.mjs", "next.config.ts"],
  "detect_deps": {"package.json": ["next"]},
  "default_port": 3000,
  "default_build_cmd": "npm run build",
  "default_start_cmd": "npm start",
  "template": "FROM node:{{.RuntimeVersion}}-alpine AS deps\n..."
}', 'Dockerfile template for Next.js applications');

Each framework entry contains:

  • detect_files: root file markers for Phase 1
  • detect_deps: dependency file → package name mappings for Phase 2
  • template: Go text/template string for Dockerfile generation
  • default_port, default_build_cmd, default_start_cmd: defaults that can be overridden per-deployment

An operator can add support for a new framework or modify detection rules for an existing one with a single database insert. No code changes. No redeployment. The detector loads all templates from the database on every detection run.

Framework aliases handle sub-frameworks that share the same Dockerfile template. Gin, Fiber, Echo, and Chi all alias to the go template:

INSERT INTO platform_config (key, value, description) VALUES
('dockerfile_templates.gin',   '{"alias": "go"}', 'Alias → dockerfile_templates.go'),
('dockerfile_templates.fiber', '{"alias": "go"}', 'Alias → dockerfile_templates.go'),
('dockerfile_templates.echo',  '{"alias": "go"}', 'Alias → dockerfile_templates.go'),
('dockerfile_templates.chi',   '{"alias": "go"}', 'Alias → dockerfile_templates.go');
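Alias resolution is a one-hop indirection before template rendering. A sketch of how it might look — the `Template` struct and `resolveAlias` name are assumptions for illustration:

```go
package main

import "fmt"

// Template mirrors the fields relevant to alias resolution; the real
// struct also carries detect rules, defaults, and the Dockerfile body.
type Template struct {
	Alias       string
	Description string
}

// resolveAlias follows at most one alias hop (e.g. gin -> go) and
// returns the effective template plus the key it resolved to.
func resolveAlias(templates map[string]Template, key string) (Template, string) {
	tmpl := templates[key]
	if tmpl.Alias != "" {
		return templates[tmpl.Alias], tmpl.Alias
	}
	return tmpl, key
}

func main() {
	templates := map[string]Template{
		"go":  {Description: "Go (multi-stage build)"},
		"gin": {Alias: "go"},
	}
	tmpl, key := resolveAlias(templates, "gin")
	fmt.Println(key, tmpl.Description) // go Go (multi-stage build)
}
```

This is also why the detection loops skip entries with a non-empty `Alias`: an alias has no detection rules of its own, only a pointer to a template that does.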

From Detection to Dockerfile

Once the framework is detected, the DockerfileGenerator takes over:

gen := builder.NewDockerfileGenerator(cfg.Store)
tmpl, err := gen.LoadTemplate(ctx, templateKey)
// ...
rendered, err := gen.Generate(tmpl, builder.DockerfileVars{
    RuntimeVersion: runtimeVersion,
    Port:           svc.Port,
    Framework:      result.Framework,
    AppName:        svc.Name,
    BinaryName:     result.Metadata["binary_name"],
    ProjectFile:    result.Metadata["project_file"],
})

The template variables flow through naturally:

  • {{.RuntimeVersion}}: pinned per-service, or falls back to a sensible default (Node 20, Python 3.12, Go 1.23, etc.)
  • {{.Port}}: from service config, or the template's default_port
  • {{.BinaryName}}: parsed from Cargo.toml or mix.exs by the metadata enrichment phase
  • {{.ProjectFile}}: the relative path to the .csproj for .NET projects

The generated Dockerfile is then passed to the Buildah build pipeline (covered in the previous post) as if the user had written it themselves.

The Dashboard Integration: Real-Time Detection

Detection doesn't just happen at build time. When a user connects a GitHub repo in the dashboard wizard, the Go API runs the same detection logic against the GitHub API, before any code is cloned:

func (s *Server) handleAnalyzeRepo(w http.ResponseWriter, r *http.Request) {
    // 1. Get root tree via GitHub API
    tree, _ := s.githubApp.GetTree(ctx, installationID, owner, repo, branch)

    // 2. Determine which dependency files to fetch
    filesToFetch := ghpkg.DetectFilesToFetch(tree)

    // 3. Fetch them in parallel (bounded to 4 concurrent requests)
    fileContents := s.fetchFilesParallel(ctx, installationID,
        owner, repo, branch, filesToFetch)

    // 4. Detect framework from the fetched contents
    fwResult := ghpkg.DetectFrameworkFromDeps(tree, fileContents)
}

The parallel file fetching uses a bounded semaphore pattern, allowing up to 4 concurrent GitHub API calls:

var (
    wg      sync.WaitGroup
    mu      sync.Mutex
    results = make(map[string][]byte)
)
sem := make(chan struct{}, 4) // bounded concurrency

for _, name := range files {
    wg.Add(1)
    go func(filename string) {
        defer wg.Done()
        sem <- struct{}{}        // acquire a slot
        defer func() { <-sem }() // release it

        content, err := s.githubApp.GetFileContent(ctx,
            installationID, owner, repo, filename, branch)
        if err != nil {
            return // Non-fatal: skip files we can't fetch
        }

        mu.Lock()
        results[filename] = []byte(content)
        mu.Unlock()
    }(name)
}
wg.Wait()

The result appears instantly in the wizard UI: the framework card gets an "Auto-detected" badge, the runtime version auto-populates, and the user can override any of these before deploying.

The Fallback Strategy

What if detection fails? The platform has a configurable fallback strategy, stored in ... you guessed it, the database:

INSERT INTO platform_config (key, value, description) VALUES
('dockerfile_gen.fallback_strategy', '"error"',
 'What to do when framework is unknown and no Dockerfile: "error" or "generic_node"');

In "error" mode (the default), the deploy fails with a clear message telling the user to add a Dockerfile. In "generic_node" mode, it falls back to a generic Node.js Express template, a reasonable bet for repos where package.json exists but no specific framework was identified.
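The strategy check itself is a simple branch on the configured value. A sketch — the function name, signature, and error message are assumptions for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// DetectResult matches the struct from the post.
type DetectResult struct {
	Framework  string
	Confidence string
	DetectedBy string
	Metadata   map[string]string
}

// applyFallback decides what happens when no framework matched.
// fallbackStrategy is the value read from platform_config.
func applyFallback(fallbackStrategy string, hasPackageJSON bool) (*DetectResult, error) {
	if fallbackStrategy == "generic_node" && hasPackageJSON {
		// Clearly marked as a low-confidence guess.
		return &DetectResult{
			Framework:  "express",
			Confidence: "low",
			DetectedBy: "fallback:generic_node",
			Metadata:   make(map[string]string),
		}, nil
	}
	// "error" mode, or generic_node with no package.json: fail loudly.
	return nil, errors.New("no framework detected: add a Dockerfile to your repo")
}

func main() {
	res, err := applyFallback("generic_node", true)
	fmt.Println(res.Framework, res.Confidence, err) // express low <nil>

	_, err = applyFallback("error", true)
	fmt.Println(err != nil) // true
}
```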

The fallback result is clearly marked:

result = &builder.DetectResult{
    Framework:  "express",
    Confidence: "low",
    DetectedBy: "fallback:generic_node",
    Metadata:   make(map[string]string),
}

The "low" confidence and "fallback:generic_node" detection source are both surfaced to the user via SSE events, so there's no silent guessing.

The Numbers

| Metric | Value |
| --- | --- |
| Frameworks detected | 26 |
| Dockerfile templates | 19 (+ 7 aliases) |
| Detection phases | 3 (file markers → deps → deep scan) |
| Confidence levels | 3 (high, medium, low/fallback) |
| External API calls | 0 |
| Detection time | Sub-millisecond (after clone) |
| Template variables | 8 (RuntimeVersion, Port, BuildCommand, StartCommand, Framework, AppName, BinaryName, ProjectFile) |
| Configuration | 100% database-driven, zero hardcoded rules |

Try It

Push a repo with no Dockerfile and watch the detection work in real-time through the dashboard wizard or SSE event stream.

👉 Join the waitlist at staxa.dev


This is the third post in the Building Staxa series. Previously: the multi-tenancy problem and the Buildah build pipeline. Next up: how Kubernetes NetworkPolicies, ResourceQuotas, and namespace isolation create real tenant boundaries, and why tenant_id columns aren't enough.

Top comments (2)

david duymelinck

Isn't the files lookup a cause for false positives?

I'm no Ruby expert, so I checked, and config.ru isn't the config file. The .ru extension was the giveaway, because the Ruby extension is .rb, so I think that is a typo.
config.rb can be the Chef configuration file. And I assume there are other solutions that look for that file.
The Rails main config file is config/application.rb.

The first place I would look to find the framework are the package manager files. There can be multiple when the project uses multiple languages.
The package manager file itself tells you nothing about the framework. You seem to suggest it does in the case of Rails.

I'm focussing on Rails because that was the one solution that I found being off.

Jonathan Pitter

You're right that I need to be more specific when matching files. Thanks for that.

config.ru is a Rackup file (.ru = rackup), not a Ruby config file. It's present in any Rack-based app (Sinatra, Hanami, Roda), not just Rails.

stackoverflow.com/questions/550728...

The way Phase 1 works, if any marker in detect_files matches, the framework is detected. Since Rails is higher priority than Sinatra, a Sinatra app with a Gemfile and config.ru would be incorrectly detected as Rails. That's a false positive.

So I'm thinking of narrowing Rails' Phase 1 markers down to just bin/rails, the only file that's genuinely Rails-specific. Then let Phase 2 handle the rest by checking for the rails gem inside the Gemfile. That way:

  • bin/rails present → Rails, high confidence (Phase 1)
  • Gemfile with rails gem → Rails, medium confidence (Phase 2)
  • Gemfile with sinatra gem → Sinatra, medium confidence (Phase 2)
  • config.ru alone → Sinatra catches it via its own detect_files

I'll update the rules and try some evaluations when I get some free time today. Then update this post.