I Built fintech-fraud-sim: A TypeScript CLI for Synthetic Fraud Testing Data

#fintech #typescript #cli #testing

Fraud systems are hard to test well.

Not because engineers cannot write tests, but because the data needs to tell a story. A suspicious transaction is rarely suspicious because of one field. It is usually suspicious because of patterns:

a new account with too many beneficiaries
many transactions in a short window
failed logins followed by a device change
KYC failures mixed with country mismatches
transaction amounts far above normal behavior

Using production customer data for this is risky and often unacceptable. Basic mock data is usually too flat: each field is randomized on its own, so cross-field patterns like these never show up.

So I built fintech-fraud-sim, a TypeScript CLI that generates synthetic fintech users and transactions with configurable fraud patterns.

NPM package: https://www.npmjs.com/package/fintech-fraud-sim

Quick Start

npx fintech-fraud-sim generate --users 1000 --fraud-rate 0.08

That command generates:

users.csv
transactions.csv
users.json
transactions.json
summary.json

You can also choose a format:

npx fintech-fraud-sim generate --users 5000 --fraud-rate 0.12 --format csv

Or send output to a folder:

npx fintech-fraud-sim generate --users 2000 --fraud-rate 0.05 --format json --out ./data

Why I Built It

Fraud detection products need test data in many places:

dashboards
QA fixtures
rules engines
risk scoring services
transaction monitoring demos
analytics pipelines
prototype fraud models

But realistic fraud test data is not just random rows. It needs consistent signals across users and transactions.

For example, an account takeover scenario should not only mark a transaction as suspicious. The user should also show supporting signals such as failed logins, device changes, and a country mismatch.

That is the idea behind this CLI.

What the Data Looks Like

Generated users include fields like:

user_id
country
account_age_days
kyc_status
failed_kyc_attempts
device_count
ip_country
declared_country
failed_login_attempts_24h
beneficiary_count_24h
chargeback_count
is_fraud
fraud_pattern
risk_label
reason_codes

Generated transactions include:

transaction_id
user_id
timestamp
amount
currency
channel
beneficiary_id
beneficiary_country
device_id
ip_country
status
is_suspicious
fraud_pattern
reason_codes

The package does not generate real names, emails, phone numbers, BVNs, NINs, or bank account numbers.
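If you load the JSON output in TypeScript, it can help to write the shapes down. The sketch below takes the field names from the lists above, but every type is an assumption (for example that `reason_codes` is an array and `timestamp` is a string), so check them against a generated file before relying on them. `SimUser`, `SimTransaction`, and `findOrphanTransactions` are hypothetical names, not exports of the package.

```typescript
// Field names come from the lists above; all types are assumptions.
interface SimUser {
  user_id: string;
  country: string;
  account_age_days: number;
  kyc_status: string;
  failed_kyc_attempts: number;
  device_count: number;
  ip_country: string;
  declared_country: string;
  failed_login_attempts_24h: number;
  beneficiary_count_24h: number;
  chargeback_count: number;
  is_fraud: boolean;
  fraud_pattern: string | null;
  risk_label: string;
  reason_codes: string[];
}

interface SimTransaction {
  transaction_id: string;
  user_id: string;
  timestamp: string;
  amount: number;
  currency: string;
  channel: string;
  beneficiary_id: string;
  beneficiary_country: string;
  device_id: string;
  ip_country: string;
  status: string;
  is_suspicious: boolean;
  fraud_pattern: string | null;
  reason_codes: string[];
}

// Hypothetical helper: returns transactions whose user_id does not match
// any user in the dataset. Only the user_id fields are needed for the join.
function findOrphanTransactions<T extends Pick<SimTransaction, "user_id">>(
  users: Pick<SimUser, "user_id">[],
  transactions: T[],
): T[] {
  const knownIds = new Set(users.map((u) => u.user_id));
  return transactions.filter((tx) => !knownIds.has(tx.user_id));
}
```

An empty result means every transaction joins cleanly to a user, which is a cheap sanity check to run before feeding the data into a dashboard or pipeline.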

Supported Fraud Patterns

The CLI currently supports:

| Pattern | Description |
| --- | --- |
| `mule_account` | New account, high beneficiary count, rapid funds movement |
| `account_takeover` | Device change, country mismatch, failed login spike |
| `velocity_abuse` | Many transactions in a short period |
| `kyc_abuse` | Multiple failed KYC attempts and inconsistent country data |
| `chargeback_risk` | Prior chargebacks and high-value transactions |
| `transaction_spike` | Amount far above user baseline |
| `cross_border_anomaly` | IP or beneficiary country mismatch |
| `beneficiary_burst` | Many new beneficiaries within 24 hours |

You can select specific patterns:

npx fintech-fraud-sim generate \
  --users 1000 \
  --fraud-rate 0.08 \
  --patterns mule,account_takeover,velocity_abuse

mule is accepted as an alias for mule_account.
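These patterns are the kind of thing a monitoring rule would try to catch, so the generated data can serve as a target for your own rules. As an illustration, here is a minimal sketch of a velocity rule that counts each user's transactions inside a sliding time window. It is not how the CLI generates or labels `velocity_abuse`. The window, the threshold, and the assumption that `timestamp` is a string `Date.parse` can read are all mine, and `TxLike`, `maxInWindow`, and `flagVelocity` are hypothetical names.

```typescript
// Only the two fields this rule needs; timestamp format is an assumption.
interface TxLike {
  user_id: string;
  timestamp: string;
}

// Largest number of events that fit inside any window of `windowMs`.
// Classic two-pointer sweep over the sorted times.
function maxInWindow(times: number[], windowMs: number): number {
  const sorted = [...times].sort((a, b) => a - b);
  let best = 0;
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Shrink from the left until the window spans at most windowMs.
    while (sorted[end] - sorted[start] > windowMs) start++;
    best = Math.max(best, end - start + 1);
  }
  return best;
}

// Returns the user_ids with at least `threshold` transactions in some window.
function flagVelocity(txs: TxLike[], windowMs: number, threshold: number): string[] {
  const byUser = new Map<string, number[]>();
  for (const tx of txs) {
    const t = Date.parse(tx.timestamp);
    if (Number.isNaN(t)) continue; // skip rows with unparseable timestamps
    const list = byUser.get(tx.user_id) ?? [];
    list.push(t);
    byUser.set(tx.user_id, list);
  }
  const flagged: string[] = [];
  byUser.forEach((times, userId) => {
    if (maxInWindow(times, windowMs) >= threshold) flagged.push(userId);
  });
  return flagged;
}
```

Comparing the flagged users against the rows whose `fraud_pattern` is `velocity_abuse` gives a quick read on whether a rule's window and threshold are in a sensible range.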

Deterministic Generation

For tests and demos, deterministic output is important. Pass --seed to get repeatable results:

npx fintech-fraud-sim generate --users 1000 --fraud-rate 0.08 --seed demo

The same seed produces the same dataset, which makes the CLI useful in CI and repeatable test suites.
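To see why a string seed can make output repeatable, here is a minimal sketch of one common approach: hash the seed into a 32-bit integer (FNV-1a) and use it to initialize a small pseudo-random generator (mulberry32). This shows the general technique only. I am not claiming it is how fintech-fraud-sim implements --seed, and `hashSeed`, `mulberry32`, and `createRng` are names chosen for this example.

```typescript
// FNV-1a: turn a string seed into an unsigned 32-bit integer.
function hashSeed(seed: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < seed.length; i++) {
    h ^= seed.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small deterministic PRNG returning floats in [0, 1).
function mulberry32(state: number): () => number {
  let a = state;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed string in, same sequence of numbers out.
function createRng(seed: string): () => number {
  return mulberry32(hashSeed(seed));
}
```

Any generator that draws every random choice from a function like `createRng("demo")`, instead of `Math.random()`, will produce identical rows on every run with that seed.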

Testing

The project includes tests for:

generating users and transactions
fraud rate handling
seeded deterministic output
zero fraud rate behavior
fraud pattern logic
CSV and JSON writers
CLI execution

Run the suite with:

npm test

The test suite uses Node's built-in test runner with tsx, so TypeScript tests can run directly.

Example Use Cases

You can use the generated data to:

demo fraud dashboards
test transaction monitoring rules
validate CSV import workflows
build realistic QA fixtures
prototype risk scoring logic
test data pipelines without exposing customer records
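For the rule-testing and risk-scoring cases, the `is_fraud` label gives you something to measure against. The sketch below scores a rule by counting true and false positives against that label. It assumes `is_fraud` is a boolean once the JSON is parsed (CSV rows would need converting first), and `evaluateRule` and `RuleReport` are hypothetical names for this example.

```typescript
interface RuleReport {
  tp: number; // flagged and labeled fraud
  fp: number; // flagged but not labeled fraud
  fn: number; // missed fraud
  tn: number; // correctly left alone
  precision: number;
  recall: number;
}

// Runs `rule` over every row and compares its verdict to the is_fraud label.
function evaluateRule<T extends { is_fraud: boolean }>(
  rows: T[],
  rule: (row: T) => boolean,
): RuleReport {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let tn = 0;
  for (const row of rows) {
    const flagged = rule(row);
    if (flagged && row.is_fraud) tp++;
    else if (flagged) fp++;
    else if (row.is_fraud) fn++;
    else tn++;
  }
  return {
    tp,
    fp,
    fn,
    tn,
    // Report 0 when the denominator is empty, to avoid NaN.
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}
```

A rule such as `(u) => u.beneficiary_count_24h >= 5` (the threshold is arbitrary) can then be tuned against datasets generated with different --fraud-rate values.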

Safety Disclaimer

fintech-fraud-sim is for testing, education, and fraud-model prototyping only. It generates synthetic data and should not be treated as real customer data or used as a production fraud decisioning system.