Duplicate-code checks are useful, but they often become one more slow quality gate that teams run less often than they should.
That trade-off is getting worse. Large teams already create repeated code through parallel feature work. AI coding agents make code generation even cheaper, which also makes accidental copy-paste cheaper to create. Reviewers still need deterministic checks that catch repetition before it settles into the codebase.
This post shows how to add a fast duplicate-code gate to GitHub Actions with jscpd-rs, a native Rust implementation of the common jscpd workflow.
If you only want the CI snippet, jump to the GitHub Actions workflow below and adjust the threshold and ignore list for your repository.
What is jscpd?
jscpd is a copy-paste detector for source code. It scans a project, finds duplicated fragments across files, writes reports for humans and CI systems, and can fail a build when the duplicated-line percentage crosses a configured threshold.
jscpd-rs keeps the familiar workflow:
- scan source trees from the CLI;
- load .jscpd.json or package.json#jscpd;
- generate console, JSON, SARIF, HTML, XML, CSV, Markdown, badge, and Xcode reports;
- fail CI on a threshold;
- expose the jscpd, jscpd-rs, and jscpd-server command names.
The difference is the hot path: file discovery, tokenization, matching, and reporting run natively in Rust. The npm package uses prebuilt platform binaries for Linux, macOS, and Windows. On those platforms, npm users do not need a Rust toolchain just to run the check. Unsupported platforms can install the CLI through Cargo.
Quick local check
Try it locally before putting it in CI:
npx --yes jscpd-rs --threshold 5 --exitCode 1 .
That command scans the current project and exits with code 1 if duplication is above 5%.
The examples use npx --yes so CI does not stop on an interactive package install prompt.
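To make the pass/fail rule concrete, here is a minimal shell sketch of the comparison behind the gate. The gate function is a hypothetical helper written for this post, not part of jscpd-rs, and it assumes the rule is strictly "above the threshold", as described for the command above.

```shell
#!/bin/sh
# Hypothetical helper: mimics the gate decision only, not jscpd-rs internals.
# $1 = duplicated percentage, $2 = threshold.
# Returns 0 (pass) when the percentage is at or below the threshold, 1 otherwise.
gate() {
  awk -v pct="$1" -v max="$2" 'BEGIN { exit (pct + 0 > max + 0) }'
}

gate 4.2 5 && echo "pass"                               # prints "pass"
gate 6.8 5 || echo "fail: duplication above threshold"  # prints the failure message
```

A non-zero exit status is what makes a CI step fail, so the percentage itself never needs to be parsed by the workflow.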
For a more useful first run, write machine-readable reports:
npx --yes jscpd-rs \
  --threshold 5 \
  --exitCode 1 \
  --reporters console,json,sarif,html \
  --output report \
  --ignore "node_modules/**" \
  --ignore "dist/**" \
  --ignore "coverage/**" \
  --ignore "target/**" \
  .
The important outputs are:
- terminal summary from console;
- report/jscpd-report.json for scripts;
- report/jscpd-sarif.json for GitHub Code Scanning;
- report/html/ for a browsable local report.
Add it to GitHub Actions
Create .github/workflows/duplicate-code.yml:
name: duplicate-code
on:
  pull_request:
  push:
    branches: [main]
jobs:
  jscpd:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-node@v5
        with:
          node-version: 22
      - name: Run duplicate-code check
        run: |
          npx --yes jscpd-rs \
            --threshold 5 \
            --exitCode 1 \
            --reporters console,json,sarif \
            --output report \
            --ignore "node_modules/**" \
            --ignore "dist/**" \
            --ignore "coverage/**" \
            --ignore "target/**" \
            .
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: report/jscpd-sarif.json
If you do not use GitHub Code Scanning, remove the security-events: write permission and the Upload SARIF step.
The duplicate-code step is still the gate. The SARIF upload only makes findings visible in the GitHub Security tab and code-scanning UI.
Use a config file for real projects
For a real repository, I prefer moving the policy into .jscpd.json and keeping the CI command short:
{
  "minLines": 5,
  "minTokens": 50,
  "threshold": 5,
  "reporters": ["console", "json", "sarif"],
  "output": "report",
  "ignore": [
    "node_modules/**",
    "dist/**",
    "coverage/**",
    "target/**",
    ".next/**",
    "generated/**",
    "**/*.snap"
  ],
  "gitignore": true,
  "noTips": true
}
Then the workflow step becomes:
- name: Run duplicate-code check
  run: npx --yes jscpd-rs .
This is easier to maintain because the threshold, ignored paths, and reporters live with the code-quality policy instead of being hidden inside YAML.
Choosing a threshold
Start with a threshold that does not block the whole team on day one.
For existing codebases, I usually recommend:
- Run the tool without failing CI and inspect the report.
- Ignore generated files, build output, snapshots, and vendored code.
- Set the threshold slightly above the current duplicated percentage.
- Ratchet it down over time.
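The steps above can be sketched as a small ratchet. Both functions are hypothetical helpers, not jscpd-rs features, and rounding up to a whole number is my own assumption about what "slightly above" should mean; pick whatever headroom suits your team.

```shell
#!/bin/sh
# Hypothetical helpers for a threshold ratchet; not part of jscpd-rs.

# Round a measured duplication percentage up to the next whole number.
first_threshold() {
  awk -v pct="$1" 'BEGIN { p = pct + 0; t = int(p); if (t < p) t++; print t }'
}

# Keep the lower of the current threshold and the rounded-up measurement,
# so the gate can tighten over time but never loosens on its own.
# $1 = current threshold, $2 = newly measured percentage.
ratchet() {
  awk -v cur="$1" -v pct="$2" 'BEGIN {
    p = pct + 0; c = cur + 0
    t = int(p); if (t < p) t++
    r = (t < c) ? t : c
    print r
  }'
}

first_threshold 7.32   # prints 8
ratchet 8 5.9          # prints 6
ratchet 6 6.4          # prints 6 (a regression does not raise the threshold)
```

The measured percentage would come from the console summary or the JSON report of a non-failing run; the printed value is what you would then put into --threshold or .jscpd.json.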
For a new project, a stricter threshold is reasonable:
npx --yes jscpd-rs --threshold 3 --exitCode 1 src
The point is not to delete every repeated line immediately. The point is to stop new accidental duplication from entering silently.
Why speed matters
Slow checks are the first checks teams disable, move to nightly jobs, or run only before releases. That is exactly the wrong place for duplicate-code detection: copy-paste is cheapest to fix when the pull request is still fresh.
The current public benchmark suite for jscpd-rs uses pinned React, Next.js, and Prometheus revisions and compares against upstream jscpd with the same high-level inputs and options:
| Case | Format | jscpd-rs avg | upstream jscpd avg | Speedup |
|---|---|---|---|---|
| React f0dfee3 | JavaScript | 0.197325s | 10.413453s | 52.77x |
| Next.js 2bbb67b9 | TypeScript | 0.270786s | 14.983243s | 55.33x |
| Prometheus a0524ee | Go | 0.083162s | 4.842499s | 58.23x |
Those numbers are not a guarantee that every repository will see the same ratio. They are a public baseline for the current release gate: large enough to be useful, pinned enough to be reproducible, and focused on the kind of check you would actually run in CI.
The benchmark table is also published in the README performance section, along with the release-candidate command used to rerun the public suite.
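The speedup figures are consistent with dividing the upstream average by the jscpd-rs average. A short sketch to recheck that arithmetic, or to compute the same ratio from your own timings; the speedup function is a hypothetical helper, not part of the benchmark suite:

```shell
#!/bin/sh
# Hypothetical helper: ratio of two average wall-clock times in seconds.
# $1 = slower average (upstream jscpd), $2 = faster average (jscpd-rs).
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2fx\n", slow / fast }'
}

speedup 10.413453 0.197325   # React:      prints 52.77x
speedup 14.983243 0.270786   # Next.js:    prints 55.33x
speedup 4.842499 0.083162    # Prometheus: prints 58.23x
```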
Compatibility model
The goal is practical compatibility with upstream jscpd, not a new tool with similar-looking output.
For the current 0.x releases, the compatibility gate is coverage-first: on the same inputs and options, jscpd-rs must not miss duplicated source lines reported by upstream jscpd. Findings that only jscpd-rs reports are shown in the compatibility reports and treated as follow-up work when they are noisy.
That model is useful for CI adoption because missing real duplicated ranges is the dangerous failure mode. Exact clone-pair identity can differ, especially for multi-way clones, while still covering the same duplicated source ranges.
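As a rough illustration of what "coverage-first" means, the sketch below compares duplicated line ranges instead of clone pairs. It is not the project's actual compatibility harness. The input format is an assumption made for this example: one "path start end" range per line, with no spaces in paths, extracted from each tool's report by some other step.

```shell
#!/bin/sh
# Hypothetical coverage check; the "path start end" input format is assumed.

# Expand range lines into one "path:line" entry per duplicated source line.
expand_ranges() {
  awk '{ for (n = $2 + 0; n <= $3 + 0; n++) print $1 ":" n }' "$1" | LC_ALL=C sort -u
}

# Print duplicated lines that the upstream ranges ($1) contain
# but the jscpd-rs ranges ($2) do not. Empty output means full coverage.
missed_lines() {
  _up=$(mktemp)
  _rs=$(mktemp)
  expand_ranges "$1" > "$_up"
  expand_ranges "$2" > "$_rs"
  LC_ALL=C comm -23 "$_up" "$_rs"
  rm -f "$_up" "$_rs"
}

# Usage: missed_lines upstream-ranges.txt rs-ranges.txt
```

Because only line coverage is compared, two tools can group a multi-way clone into different pairs and still pass this check, while a range that upstream reports and jscpd-rs does not will show up line by line.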
There are also intentional first-release limits:
- dynamic npm reporters, stores, listeners, and plugins are not loaded;
- HTML output is practical and self-contained, not pixel-perfect;
- exact token totals may differ from upstream;
- this is a native CLI and Rust library, not a JavaScript package API clone.
npm or Cargo?
Use npm when Node is already part of your workflow:
npm install -g jscpd-rs
jscpd --threshold 5 --exitCode 1 .
Or run it without a global install:
npx --yes jscpd-rs --threshold 5 --exitCode 1 .
Use Cargo when Rust is the natural toolchain for the project or when npm prebuilt binaries are not available for your platform:
cargo install jscpd-rs --locked
jscpd --threshold 5 --exitCode 1 .
Where to go next
Links:
- Repository: https://github.com/vv-bogdanov/jscpd-rs
- npm package: https://www.npmjs.com/package/jscpd-rs
- crates.io: https://crates.io/crates/jscpd-rs
- docs.rs: https://docs.rs/jscpd-rs
- User guide: https://github.com/vv-bogdanov/jscpd-rs/blob/main/docs/user-guide.md
- Migration notes: https://github.com/vv-bogdanov/jscpd-rs/blob/main/docs/migrating-from-jscpd.md
- Longer launch note: https://vv-bogdanov.github.io/posts/fast-duplicate-code-detection-for-agents/
I would especially like feedback on:
- repositories where jscpd-rs misses duplicates found by upstream jscpd;
- report compatibility issues in JSON, SARIF, HTML, XML, CSV, or Markdown;
- npm install friction on Linux, macOS, or Windows;
- public benchmark cases that represent real monorepos;
- formats where generic tokenization is too noisy or not sensitive enough.
If duplicate-code checks are currently too slow to keep in every pull request, try running this once:
npx --yes jscpd-rs --threshold 5 --exitCode 1 .
That should be enough to see whether the check is cheap enough for your normal CI loop.