Viacheslav Bogdanov

Posted on Jun 4

Add a 50x+ faster duplicate-code gate to GitHub Actions with jscpd-rs

#npm #performance #devops #ai

Duplicate-code checks are useful, but they often become one more slow quality gate that teams run less often than they should.

That trade-off is getting worse. Large teams already create repeated code through parallel feature work, and AI coding agents make generating code even cheaper, which makes accidental copy-paste cheaper too. Reviewers still need deterministic checks that catch repetition before it settles into the codebase.

This post shows how to add a fast duplicate-code gate to GitHub Actions with jscpd-rs, a native Rust implementation of the common jscpd workflow.

If you only want the CI snippet, jump to the GitHub Actions workflow below and adjust the threshold and ignore list for your repository.

What is jscpd?

jscpd is a copy-paste detector for source code. It scans a project, finds duplicated fragments across files, writes reports for humans and CI systems, and can fail a build when the duplicated-line percentage crosses a configured threshold.
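The threshold gate itself is simple arithmetic. Here is a minimal sketch, assuming the gate compares the duplicated-line percentage (duplicated lines divided by total scanned lines, times 100) against the threshold and fails only when it is strictly above. The function names and sample counts are hypothetical, and jscpd-rs may count or round lines differently.

```python
def duplicated_percentage(duplicated_lines: int, total_lines: int) -> float:
    """Percentage of scanned lines that sit inside a detected clone."""
    if total_lines == 0:
        return 0.0  # empty scan: nothing can be duplicated
    # Multiply before dividing to keep simple cases exact (540 * 100 / 12000 == 4.5).
    return duplicated_lines * 100 / total_lines


def gate_passes(duplicated_lines: int, total_lines: int, threshold: float) -> bool:
    """Assumed rule: the build fails only when duplication is strictly above the threshold."""
    return duplicated_percentage(duplicated_lines, total_lines) <= threshold


if __name__ == "__main__":
    # Hypothetical repository: 12,000 scanned lines, 540 of them inside clones.
    pct = duplicated_percentage(540, 12_000)
    print(f"{pct:.2f}% duplicated")  # 4.50% duplicated
    print("pass" if gate_passes(540, 12_000, threshold=5) else "fail")  # pass
```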

jscpd-rs keeps the familiar workflow:

scan source trees from the CLI;
load .jscpd.json or package.json#jscpd;
generate console, JSON, SARIF, HTML, XML, CSV, Markdown, badge, and Xcode reports;
fail CI on a threshold;
expose the jscpd, jscpd-rs, and jscpd-server command names.

The difference is the hot path: file discovery, tokenization, matching, and reporting run natively in Rust. The npm package uses prebuilt platform binaries for Linux, macOS, and Windows. On those platforms, npm users do not need a Rust toolchain just to run the check. Unsupported platforms can install the CLI through Cargo.
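To make "tokenization and matching" concrete, here is a toy sketch of the general idea behind token-based copy-paste detectors: split each file into tokens, take every window of min_tokens consecutive tokens, and report windows that occur in more than one place. It is an illustration under simplifying assumptions (a crude regex tokenizer, a dictionary instead of a rolling hash, no merging of overlapping windows), not the jscpd-rs implementation. min_tokens only loosely mirrors the minTokens option.

```python
import re
from collections import defaultdict

# Toy tokenizer (an assumption, far cruder than a real per-language lexer):
# a run of word characters, or any single non-space symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(source: str) -> list[tuple[str, int]]:
    """Return (token, 1-based line number) pairs for one file."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            tokens.append((match.group(), line_no))
    return tokens


def find_clone_windows(files: dict[str, str], min_tokens: int) -> list[list[tuple[str, int, int]]]:
    """Group identical runs of `min_tokens` consecutive tokens across files.

    Each group lists (file name, start line, end line) for every place the same
    token sequence occurs. A dict keyed by the token tuple stands in for the
    rolling hash a production detector would typically use, and overlapping
    windows are not merged into larger clones.
    """
    occurrences = defaultdict(list)
    for name, source in files.items():
        tokens = tokenize(source)
        for start in range(len(tokens) - min_tokens + 1):
            window = tokens[start:start + min_tokens]
            key = tuple(text for text, _ in window)  # compare token text, not positions
            occurrences[key].append((name, window[0][1], window[-1][1]))
    return [locations for locations in occurrences.values() if len(locations) > 1]


if __name__ == "__main__":
    sample = "function add(a, b) {\n  return a + b;\n}\n"
    for group in find_clone_windows({"a.js": sample, "b.js": sample}, min_tokens=10):
        print(group)
```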

Quick local check

Try it locally before putting it in CI:

npx --yes jscpd-rs --threshold 5 --exitCode 1 .

That command scans the current project and exits with code 1 if duplication is above 5%.

The examples use npx --yes so CI does not stop on an interactive package install prompt.

For a more useful first run, also write reports to disk, both machine-readable (JSON, SARIF) and browsable (HTML):

npx --yes jscpd-rs \
  --threshold 5 \
  --exitCode 1 \
  --reporters console,json,sarif,html \
  --output report \
  --ignore "node_modules/**" \
  --ignore "dist/**" \
  --ignore "coverage/**" \
  --ignore "target/**" \
  .

The important outputs are:

terminal summary from console;
report/jscpd-report.json for scripts;
report/jscpd-sarif.json for GitHub Code Scanning;
report/html/ for a browsable local report.
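Because report/jscpd-report.json is meant for scripts, a small summary step is easy to bolt on. This sketch assumes the layout of upstream jscpd's JSON reporter: a statistics.total object with lines, duplicatedLines, and percentage, plus a duplicates array whose entries carry lines, firstFile, and secondFile (each with name, start, and end). Check a real jscpd-rs report before relying on those field names.

```python
import json
import sys
from pathlib import Path


def summarize(report: dict, limit: int = 5) -> list[str]:
    """Build readable lines from a jscpd-style JSON report (field names are assumed)."""
    total = report["statistics"]["total"]
    lines = [
        f"{total['duplicatedLines']} of {total['lines']} lines duplicated "
        f"({total['percentage']}%)"
    ]
    # Largest clones first, so the most valuable refactors surface at the top.
    clones = sorted(report.get("duplicates", []), key=lambda d: d["lines"], reverse=True)
    for clone in clones[:limit]:
        first, second = clone["firstFile"], clone["secondFile"]
        lines.append(
            f"{clone['lines']} lines: {first['name']}:{first['start']}-{first['end']}"
            f" <-> {second['name']}:{second['start']}-{second['end']}"
        )
    return lines


if __name__ == "__main__":
    report_path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("report/jscpd-report.json")
    if report_path.is_file():
        for line in summarize(json.loads(report_path.read_text(encoding="utf-8"))):
            print(line)
    else:
        print(f"No report at {report_path}; run jscpd-rs with the json reporter first.")
```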

Add it to GitHub Actions

Create .github/workflows/duplicate-code.yml:

name: duplicate-code

on:
  pull_request:
  push:
    branches: [main]

jobs:
  jscpd:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write

    steps:
      - uses: actions/checkout@v5

      - uses: actions/setup-node@v5
        with:
          node-version: 22

      - name: Run duplicate-code check
        run: |
          npx --yes jscpd-rs \
            --threshold 5 \
            --exitCode 1 \
            --reporters console,json,sarif \
            --output report \
            --ignore "node_modules/**" \
            --ignore "dist/**" \
            --ignore "coverage/**" \
            --ignore "target/**" \
            .

      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: report/jscpd-sarif.json

If you do not use GitHub Code Scanning, remove the security-events: write permission and the Upload SARIF step.

The duplicate-code step is still the gate. The SARIF upload only makes findings visible in the GitHub Security tab and code-scanning UI.
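If you skip Code Scanning but still want findings next to the code, the SARIF file can be turned into GitHub Actions workflow-command annotations (::warning file=...,line=...::message) in a follow-up step. A sketch, assuming report/jscpd-sarif.json follows the standard SARIF 2.1.0 layout (runs[].results[] with message.text and locations[].physicalLocation); the rule IDs and message wording that jscpd-rs emits may differ from the sample used here.

```python
import json
import sys
from pathlib import Path


def sarif_findings(sarif: dict) -> list[tuple[str, int, str]]:
    """Flatten SARIF 2.1.0 results into (file URI, start line, message) tuples."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            message = result.get("message", {}).get("text", "")
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "?")
                line = physical.get("region", {}).get("startLine", 0)
                findings.append((uri, line, message))
    return findings


def _escape_data(text: str) -> str:
    # Workflow commands need %, CR, and LF escaped in the message part.
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(text: str) -> str:
    # Property values (file=...) additionally need ':' and ',' escaped.
    return _escape_data(text).replace(":", "%3A").replace(",", "%2C")


def to_annotation(uri: str, line: int, message: str) -> str:
    """Format one finding as a GitHub Actions ::warning annotation."""
    return f"::warning file={_escape_property(uri)},line={line}::{_escape_data(message)}"


if __name__ == "__main__":
    sarif_path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("report/jscpd-sarif.json")
    if sarif_path.is_file():
        for uri, line, message in sarif_findings(json.loads(sarif_path.read_text(encoding="utf-8"))):
            print(to_annotation(uri, line, message))
    else:
        print(f"No SARIF file at {sarif_path}.")
```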

Use a config file for real projects

For a real repository, I prefer moving the policy into .jscpd.json and keeping the CI command short:

{
  "minLines": 5,
  "minTokens": 50,
  "threshold": 5,
  "reporters": ["console", "json", "sarif"],
  "output": "report",
  "ignore": [
    "node_modules/**",
    "dist/**",
    "coverage/**",
    "target/**",
    ".next/**",
    "generated/**",
    "**/*.snap"
  ],
  "gitignore": true,
  "noTips": true
}

Then the workflow step becomes:

- name: Run duplicate-code check
  run: npx --yes jscpd-rs .

This is easier to maintain: the threshold, ignored paths, and reporters live in one versioned policy file that both local runs and CI pick up, instead of being buried in workflow YAML.
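The "load .jscpd.json or package.json#jscpd" behavior can be pictured as a lookup with a fallback. A sketch of that lookup, assuming .jscpd.json wins when both files exist and that command-line flags override file values; both precedence rules are assumptions to check against the jscpd-rs user guide, and load_policy is a hypothetical helper, not part of jscpd-rs.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_policy(root: Path, cli_overrides: Optional[dict] = None) -> dict:
    """Hypothetical lookup: .jscpd.json, else package.json#jscpd, then CLI overrides."""
    policy: dict = {}
    jscpd_json = root / ".jscpd.json"
    package_json = root / "package.json"
    if jscpd_json.is_file():
        policy = json.loads(jscpd_json.read_text(encoding="utf-8"))
    elif package_json.is_file():
        policy = json.loads(package_json.read_text(encoding="utf-8")).get("jscpd", {})
    # Assumed precedence: explicit command-line values beat file values.
    policy.update(cli_overrides or {})
    return policy


if __name__ == "__main__":
    # Demo in a throwaway directory so the example never touches a real project.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "package.json").write_text(json.dumps({"jscpd": {"threshold": 5}}), encoding="utf-8")
        print(load_policy(root))
        (root / ".jscpd.json").write_text(json.dumps({"threshold": 4}), encoding="utf-8")
        print(load_policy(root, {"minTokens": 50}))
```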

Choosing a threshold

Start with a threshold that does not block the whole team on day one.

For existing codebases, I usually recommend:

Run the tool without failing CI and inspect the report.
Ignore generated files, build output, snapshots, and vendored code.
Set the threshold slightly above the current duplicated percentage.
Ratchet it down over time.
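The "slightly above, then ratchet down" steps can be automated in a small script. A minimal sketch, assuming you pass in today's duplicated percentage (for example statistics.total.percentage from the JSON report, if jscpd-rs keeps upstream's layout); the 0.5-point margin and the next_threshold name are arbitrary choices for illustration.

```python
import math
from typing import Optional


def next_threshold(current_pct: float, old_threshold: Optional[float], margin: float = 0.5) -> float:
    """Suggest a threshold just above today's duplication that never loosens the gate.

    The candidate is current duplication plus a small margin, rounded up to one
    decimal place. When an older threshold exists, the stricter value wins, so
    the gate only ever ratchets down.
    """
    # round() first so float noise like 50.000000000001 does not bump the ceiling.
    candidate = math.ceil(round((current_pct + margin) * 10, 6)) / 10
    if old_threshold is None:
        return candidate
    return min(candidate, old_threshold)


if __name__ == "__main__":
    print(next_threshold(7.23, None))  # first run: 7.8
    print(next_threshold(7.23, 8.0))   # tighter than before: 7.8
    print(next_threshold(9.1, 8.0))    # never loosen: 8.0
```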

For a new project, a stricter threshold is reasonable:

npx --yes jscpd-rs --threshold 3 --exitCode 1 src

The point is not to delete every repeated line immediately. The point is to stop new accidental duplication from entering silently.

Why speed matters

Slow checks are the first checks teams disable, move to nightly jobs, or run only before releases. That is exactly the wrong place for duplicate-code detection: copy-paste is cheapest to fix when the pull request is still fresh.

The current public benchmark suite for jscpd-rs uses pinned React, Next.js, and Prometheus revisions and compares against upstream jscpd with the same high-level inputs and options:

Case	Format	jscpd-rs avg	upstream jscpd avg	Speedup
React `f0dfee3`	JavaScript	0.197325s	10.413453s	52.77x
Next.js `2bbb67b9`	TypeScript	0.270786s	14.983243s	55.33x
Prometheus `a0524ee`	Go	0.083162s	4.842499s	58.23x

Those numbers are not a guarantee that every repository will see the same ratio. They are a public baseline for the current release gate: large enough to be useful, pinned enough to be reproducible, and focused on the kind of check you would actually run in CI.

The benchmark table is also published in the README performance section, along with the release-candidate command used to rerun the public suite.
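The Speedup column is the upstream average divided by the jscpd-rs average. Recomputing it from the published timings is a quick sanity check on the table; the numbers below are copied from the table, and speedups is just an illustrative helper.

```python
# Average wall-clock seconds copied from the benchmark table: (jscpd-rs, upstream jscpd).
BENCHMARKS = {
    "React f0dfee3": (0.197325, 10.413453),
    "Next.js 2bbb67b9": (0.270786, 14.983243),
    "Prometheus a0524ee": (0.083162, 4.842499),
}


def speedups() -> dict[str, float]:
    """Ratio of upstream time to jscpd-rs time for each benchmark case."""
    return {case: upstream / rs for case, (rs, upstream) in BENCHMARKS.items()}


if __name__ == "__main__":
    for case, ratio in speedups().items():
        print(f"{case}: {ratio:.2f}x")
    # React f0dfee3: 52.77x
    # Next.js 2bbb67b9: 55.33x
    # Prometheus a0524ee: 58.23x
```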

Compatibility model

The goal is practical compatibility with upstream jscpd, not a new tool with similar-looking output.

For the current 0.x releases, the compatibility gate is coverage-first: on the same inputs and options, jscpd-rs must not miss duplicated source lines reported by upstream jscpd. Extra findings from jscpd-rs show up in the compatibility reports and, when they are noisy, are treated as follow-up work.

That model is useful for CI adoption because missing real duplicated ranges is the dangerous failure mode. Exact clone-pair identity can differ, especially for multi-way clones, while still covering the same duplicated source ranges.
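The coverage-first rule can be checked mechanically: expand every reported clone into the (file, line) pairs it covers, then confirm that every pair from upstream jscpd also appears in the jscpd-rs result. A sketch, assuming both tools wrote JSON reports with upstream's duplicates[].firstFile/secondFile layout and that start and end are inclusive line numbers. The comparison logic is the point; this is not the project's actual compatibility harness.

```python
import json
import sys
from pathlib import Path


def covered_lines(report: dict) -> set[tuple[str, int]]:
    """Every (file, line) pair that falls inside any reported clone range."""
    covered = set()
    for clone in report.get("duplicates", []):
        for side in ("firstFile", "secondFile"):
            loc = clone[side]
            # Assumption: start and end are both inclusive line numbers.
            for line in range(loc["start"], loc["end"] + 1):
                covered.add((loc["name"], line))
    return covered


def missed_lines(upstream: dict, candidate: dict) -> set[tuple[str, int]]:
    """Lines upstream flags as duplicated that the candidate report does not cover.

    Extra candidate findings are ignored here; pair identity is not compared.
    """
    return covered_lines(upstream) - covered_lines(candidate)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: coverage_check.py UPSTREAM_REPORT.json CANDIDATE_REPORT.json")
    else:
        upstream = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
        candidate = json.loads(Path(sys.argv[2]).read_text(encoding="utf-8"))
        missed = missed_lines(upstream, candidate)
        for name, line in sorted(missed):
            print(f"missed {name}:{line}")
        sys.exit(1 if missed else 0)
```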

There are also intentional first-release limits:

dynamic npm reporters, stores, listeners, and plugins are not loaded;
HTML output is practical and self-contained, not pixel-perfect;
exact token totals may differ from upstream;
this is a native CLI and Rust library, not a JavaScript package API clone.

npm or Cargo?

Use npm when Node is already part of your workflow:

npm install -g jscpd-rs
jscpd --threshold 5 --exitCode 1 .

Or run it without a global install:

npx --yes jscpd-rs --threshold 5 --exitCode 1 .

Use Cargo when Rust is the natural toolchain for the project or when npm prebuilt binaries are not available for your platform:

cargo install jscpd-rs --locked
jscpd --threshold 5 --exitCode 1 .

Where to go next

Links:

Repository: https://github.com/vv-bogdanov/jscpd-rs
npm package: https://www.npmjs.com/package/jscpd-rs
crates.io: https://crates.io/crates/jscpd-rs
docs.rs: https://docs.rs/jscpd-rs
User guide: https://github.com/vv-bogdanov/jscpd-rs/blob/main/docs/user-guide.md
Migration notes: https://github.com/vv-bogdanov/jscpd-rs/blob/main/docs/migrating-from-jscpd.md
Longer launch note: https://vv-bogdanov.github.io/posts/fast-duplicate-code-detection-for-agents/

I would especially like feedback on:

repositories where jscpd-rs misses duplicates found by upstream jscpd;
report compatibility issues in JSON, SARIF, HTML, XML, CSV, or Markdown;
npm install friction on Linux, macOS, or Windows;
public benchmark cases that represent real monorepos;
formats where generic tokenization is too noisy or not sensitive enough.

If duplicate-code checks are currently too slow to keep in every pull request, try running this once:

npx --yes jscpd-rs --threshold 5 --exitCode 1 .

One run should show whether the check is cheap enough for your normal CI loop.
