If you've ever needed realistic business data for testing, demos, or development, you've probably used one of these:
- AdventureWorks — last updated 2014, SQL Server only, no real accounting
- Northwind — last updated ~2000, 13 tables, no financial integrity
- Faker/Mockaroo — random flat data with no relationships between tables
They all have the same problem: they don't reflect how a real business actually works.
A real business has sales that generate invoices, invoices that trigger payments, payments that hit the bank, and bank transactions that flow into double-entry journal entries. None of the above give you that.
So I built one that does.
What is sme-sim?
It's a day-by-day business simulator. You spin up a fake Australian retail company and let it operate for 2 financial years. Each simulated day, the company:
- Receives and fulfils customer orders
- Processes payments (some early, some late, some partial)
- Runs fortnightly payroll with real tax calculations
- Reorders inventory when stock drops below reorder points
- Generates double-entry journal entries for every financial event
- Lodges quarterly BAS (tax returns) with the ATO
After 2 years, you get 42 interconnected tables with 83,000+ rows and 44 foreign key relationships.
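To make the day-by-day idea concrete, here is a minimal sketch of such a loop. It is not the sme-sim engine: the `simulate` function, the event dictionaries, and the stock numbers are all hypothetical, and only two of the daily steps (fulfilling orders and reordering inventory) are modelled.

```python
import random
from datetime import date, timedelta


def simulate(seed: int, start: date, days: int) -> list[dict]:
    """Run a toy day-by-day loop and return an event log (hypothetical structure)."""
    rng = random.Random(seed)  # private RNG so a given seed always replays the same run
    stock, reorder_point, reorder_qty = 50, 20, 40  # made-up inventory settings
    events = []
    for offset in range(days):
        today = start + timedelta(days=offset)

        # Step 1: receive and fulfil orders (0 to 5 units a day, capped by stock on hand).
        demand = rng.randint(0, 5)
        shipped = min(demand, stock)
        stock -= shipped
        if shipped:
            events.append({"date": today, "type": "sale", "qty": shipped})

        # Step 2: reorder when stock drops below the reorder point.
        if stock < reorder_point:
            stock += reorder_qty  # simplification: the delivery arrives instantly
            events.append({"date": today, "type": "purchase", "qty": reorder_qty})
    return events


# The Australian financial year starts on 1 July.
log = simulate(seed=42, start=date(2024, 7, 1), days=30)
```

In a fuller simulator each event would also fan out into invoices, payments, and journal entries; the point here is only that the data is produced by stepping through time rather than by filling tables independently.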
What makes it different
1. End-to-end traceability
Every sale traces all the way through:
Customer → Sales Order → Sales Order Lines → Invoice → Payment
→ Bank Transaction → Journal Entry → Journal Entry Lines
You can pick any transaction and follow it across 8 tables. This is what real business data looks like.
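As an illustration of what following one transaction looks like, the sketch below builds a tiny in-memory SQLite database and joins across the eight tables in the chain. The table and column names are assumptions made for this example and may not match the real sme-sim schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal schema: one foreign key per hop in the chain.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
CREATE TABLE sales_order_lines (id INTEGER PRIMARY KEY, sales_order_id INTEGER REFERENCES sales_orders(id), amount REAL);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, sales_order_id INTEGER REFERENCES sales_orders(id));
CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER REFERENCES invoices(id), amount REAL);
CREATE TABLE bank_transactions (id INTEGER PRIMARY KEY, payment_id INTEGER REFERENCES payments(id));
CREATE TABLE journal_entries (id INTEGER PRIMARY KEY, bank_transaction_id INTEGER REFERENCES bank_transactions(id));
CREATE TABLE journal_entry_lines (id INTEGER PRIMARY KEY, journal_entry_id INTEGER REFERENCES journal_entries(id), account TEXT, debit REAL, credit REAL);

INSERT INTO customers VALUES (1, 'Example Customer');
INSERT INTO sales_orders VALUES (10, 1);
INSERT INTO sales_order_lines VALUES (100, 10, 110.0);
INSERT INTO invoices VALUES (20, 10);
INSERT INTO payments VALUES (30, 20, 110.0);
INSERT INTO bank_transactions VALUES (40, 30);
INSERT INTO journal_entries VALUES (50, 40);
INSERT INTO journal_entry_lines VALUES (500, 50, 'Bank', 110.0, 0.0);
INSERT INTO journal_entry_lines VALUES (501, 50, 'Accounts Receivable', 0.0, 110.0);
""")

# Follow sales order 10 from the customer through to the ledger lines.
trace = conn.execute("""
SELECT c.name, so.id, sol.id, inv.id, p.id, bt.id, je.id,
       jel.account, jel.debit, jel.credit
FROM customers c
JOIN sales_orders so         ON so.customer_id = c.id
JOIN sales_order_lines sol   ON sol.sales_order_id = so.id
JOIN invoices inv            ON inv.sales_order_id = so.id
JOIN payments p              ON p.invoice_id = inv.id
JOIN bank_transactions bt    ON bt.payment_id = p.id
JOIN journal_entries je      ON je.bank_transaction_id = bt.id
JOIN journal_entry_lines jel ON jel.journal_entry_id = je.id
WHERE so.id = ?
ORDER BY jel.id
""", (10,)).fetchall()

for row in trace:
    print(row)
```

Each returned row carries the IDs of every hop, so a missing link anywhere in the chain shows up as a missing row.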
2. Double-entry accounting that actually balances
Every financial event generates balanced journal entries. Debits always equal credits. Across 7,400+ entries, not a single one is unbalanced.
This matters because if you're testing accounting software, you need data where the books actually work. Random generators produce each row independently, so nothing forces debits to equal credits.
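One way to guarantee balance is to derive every line of an entry from the same source amount instead of generating lines independently. The sketch below posts a credit sale with 10% GST; the account names and the `sale_journal_entry` helper are hypothetical and not taken from sme-sim.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.10")
CENT = Decimal("0.01")
ZERO = Decimal("0")


def sale_journal_entry(net_amount: Decimal) -> list[tuple[str, Decimal, Decimal]]:
    """Return (account, debit, credit) lines for a credit sale including GST."""
    gst = (net_amount * GST_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    gross = net_amount + gst  # the debit is built from the credits, so it cannot drift
    return [
        ("Accounts Receivable", gross, ZERO),
        ("Sales Revenue", ZERO, net_amount),
        ("GST Collected", ZERO, gst),
    ]


def is_balanced(lines: list[tuple[str, Decimal, Decimal]]) -> bool:
    """An entry balances when total debits equal total credits."""
    total_debit = sum(debit for _, debit, _ in lines)
    total_credit = sum(credit for _, _, credit in lines)
    return total_debit == total_credit


entry = sale_journal_entry(Decimal("199.95"))
print(is_balanced(entry))
```

Using `Decimal` rather than floats keeps the comparison exact, which matters once thousands of entries are summed.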
3. Real tax compliance
The dataset uses real ATO (Australian Taxation Office) 2024-25 rules:
- PAYG withholding — actual tax brackets, not made-up percentages
- Medicare levy — 2% on taxable income
- Superannuation — 11.5% employer contribution
- GST — 10% on all sales and purchases
- Quarterly BAS — Business Activity Statements derived from the GL
Every payslip satisfies: Gross = Net + Tax. Every BAS return reconciles to the general ledger.
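The payslip invariant can be sketched with the 2024-25 resident tax brackets, the 2% Medicare levy, and the 11.5% super rate. This is a simplification under stated assumptions: it spreads annual tax evenly over 26 fortnights, whereas actual PAYG withholding uses the ATO's per-period withholding schedules, and it ignores the Medicare low-income thresholds. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
# 2024-25 resident brackets as (threshold, marginal rate above it), highest first.
BRACKETS = [
    (Decimal("190000"), Decimal("0.45")),
    (Decimal("135000"), Decimal("0.37")),
    (Decimal("45000"), Decimal("0.30")),
    (Decimal("18200"), Decimal("0.16")),
]
MEDICARE_LEVY = Decimal("0.02")
SUPER_RATE = Decimal("0.115")
PAY_PERIODS = 26  # fortnightly payroll


def annual_income_tax(income: Decimal) -> Decimal:
    """Income tax on annual taxable income, excluding the Medicare levy."""
    tax = Decimal("0")
    upper = income
    for threshold, rate in BRACKETS:
        if upper > threshold:
            tax += (upper - threshold) * rate
            upper = threshold
    return tax


def fortnightly_payslip(annual_salary: Decimal) -> dict[str, Decimal]:
    """Build one payslip; net is derived so that gross = net + tax always holds."""
    gross = (annual_salary / PAY_PERIODS).quantize(CENT, rounding=ROUND_HALF_UP)
    annual_tax = annual_income_tax(annual_salary) + annual_salary * MEDICARE_LEVY
    tax = (annual_tax / PAY_PERIODS).quantize(CENT, rounding=ROUND_HALF_UP)
    net = gross - tax
    # Super is an employer contribution on top of gross, not deducted from net pay.
    superannuation = (gross * SUPER_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return {"gross": gross, "tax": tax, "net": net, "super": superannuation}


slip = fortnightly_payslip(Decimal("90000"))
print(slip)
```

Deriving net pay as gross minus tax, rather than rounding each figure separately, is what keeps the identity exact to the cent.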
4. Temporal realism
The simulation creates patterns you'd see in a real business:
- Seasonal sales — camping equipment sells more in spring/summer
- Staff turnover — employees get hired, promoted, and terminated
- Late payments — some customers always pay late, others pay early
- Inventory cycles — stock levels fluctuate with demand and lead times
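Patterns like these can be produced with very small building blocks. The sketch below shows two hypothetical ones: a cosine demand multiplier that peaks around the start of December (between the Australian spring and summer), and a payment delay built from a per-customer habit plus noise. The amplitude, peak day, and payment terms are made-up parameters, not values from sme-sim.

```python
import math
import random
from datetime import date


def seasonal_multiplier(day: date, amplitude: float = 0.4, peak_day_of_year: int = 335) -> float:
    """Scale baseline demand between 1 - amplitude and 1 + amplitude over the year."""
    day_of_year = day.timetuple().tm_yday
    angle = 2 * math.pi * (day_of_year - peak_day_of_year) / 365
    return 1 + amplitude * math.cos(angle)


def payment_delay_days(rng: random.Random, habitual_offset: int, terms_days: int = 30) -> int:
    """Days from invoice to payment: terms, plus the customer's habit, plus noise."""
    return max(0, terms_days + habitual_offset + rng.randint(-3, 3))


rng = random.Random(7)
summer = seasonal_multiplier(date(2024, 12, 1))
winter = seasonal_multiplier(date(2024, 6, 1))
late_payer = payment_delay_days(rng, habitual_offset=14)    # habitually about two weeks late
early_payer = payment_delay_days(rng, habitual_offset=-10)  # habitually pays ahead of terms
```

Because the habit is stored per customer and only the noise is random, the same customers stay late or early across the whole simulation.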
Comparison
| Feature | AdventureWorks | Northwind | Faker | sme-sim |
|---|---|---|---|---|
| Tables | 71 | 13 | N/A | 42 |
| Cross-domain traceability | Partial | No | No | Full |
| Double-entry accounting | No | No | No | Yes |
| Tax compliance | US-only | None | None | AU + US |
| Temporal realism | Static | Static | Random | Simulated |
| FK relationships | Good | Basic | None | 44 enforced |
| Last updated | 2014 | ~2000 | N/A | 2025 |
| Deterministic | No | N/A | No | Yes (seeded RNG) |
Who is this for?
- Developers building ERP, accounting, CRM, or HR software
- QA teams testing complex workflows that span multiple modules
- Consultants who need realistic demo data without exposing client data
- Data engineers building ETL pipelines or data warehouses
- Students studying business systems, accounting, or databases
- AI/ML teams who need realistic training data for business intelligence models
Get the data
Browse all datasets → mindweave.tech/datasets
Free sample (~2,800 rows, 26 tables): github.com/MindweaveTech/sme-sim-sample
Full datasets:
- Complete SME Dataset — 42 tables, 83K+ rows — $49
- Domain packs (Accounting, Sales, HR, Inventory) — $19 each
- Multi-Company Bundle — 3 unique companies — $99
- Enterprise Pack — 5 unique companies, 400K+ rows — $199
Quick start
```bash
git clone https://github.com/MindweaveTech/sme-sim-sample.git
cd sme-sim-sample

# Load two sample CSVs into an in-memory SQLite database.
# ".mode line" after the imports prints results as "column = value".
sqlite3 :memory: <<'SQL'
.mode csv
.import sales_orders_sample.csv sales_orders
.import journal_entry_lines_sample.csv journal_lines
.mode line
SELECT count(*) AS total_orders FROM sales_orders;
-- Total debits should equal total credits across the whole ledger
SELECT
  sum(debit) AS total_debits,
  sum(credit) AS total_credits,
  round(sum(debit) - sum(credit), 2) AS difference
FROM journal_lines;
SQL
```
Output:
```
total_orders = 200
total_debits = 1847234.56
total_credits = 1847234.56
difference = 0.0
```
Debits equal credits. Every time.
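The query above checks the ledger as a whole. A stricter check groups by entry, because two unbalanced entries can cancel each other out in the grand total. The sketch below runs that check from Python on a few inline rows; the `journal_entry_id` column name is an assumption about the sample schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journal_lines (journal_entry_id INTEGER, debit REAL, credit REAL)")
insert_sql = "INSERT INTO journal_lines VALUES (?, ?, ?)"
conn.executemany(insert_sql, [
    (1, 110.0, 0.0), (1, 0.0, 100.0), (1, 0.0, 10.0),  # balanced sale with GST
    (2, 50.0, 0.0), (2, 0.0, 50.0),                    # balanced payment
])

# Return every entry whose debits and credits differ by half a cent or more.
UNBALANCED_SQL = """
SELECT journal_entry_id, ROUND(SUM(debit) - SUM(credit), 2) AS difference
FROM journal_lines
GROUP BY journal_entry_id
HAVING ABS(SUM(debit) - SUM(credit)) >= 0.005
ORDER BY journal_entry_id
"""
unbalanced = conn.execute(UNBALANCED_SQL).fetchall()

# Two deliberately broken entries that cancel out in the grand total.
conn.executemany(insert_sql, [(3, 5.0, 0.0), (4, 0.0, 5.0)])
grand_total_diff = conn.execute(
    "SELECT ROUND(SUM(debit) - SUM(credit), 2) FROM journal_lines"
).fetchone()[0]
offsetting = conn.execute(UNBALANCED_SQL).fetchall()
```

Here the grand total still nets to zero after the broken entries are added, while the per-entry query flags both of them.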
Technical details
- Engine: Python 3.14, SQLAlchemy 2.x, Click CLI
- Output formats: CSV, SQL (PostgreSQL), Parquet, SQLite
- Deterministic: Same seed = identical output. Seed 42 always produces "Outback Outdoor Supplies Pty Ltd"
- 12 domain modules: Company, Accounting, HR, Payroll, CRM, Sales, Purchasing, Inventory, Banking, Tax, Assets, Projects
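Determinism of this kind usually comes from routing every random choice through one seeded generator. The sketch below shows the idea with Python's `random.Random`; the word lists and the `make_company` helper are hypothetical, and this toy will not necessarily produce the same company name as sme-sim for a given seed.

```python
import random

# Hypothetical word lists for illustration only.
ADJECTIVES = ["Outback", "Coastal", "Summit"]
NOUNS = ["Outdoor Supplies", "Camping Co", "Trail Gear"]


def make_company(seed: int) -> dict:
    """Generate a toy company; all randomness flows through one seeded RNG."""
    rng = random.Random(seed)  # instance-level RNG, isolated from the global random state
    name = f"{rng.choice(ADJECTIVES)} {rng.choice(NOUNS)} Pty Ltd"
    customers = [f"CUST-{rng.randrange(10_000):04d}" for _ in range(3)]
    return {"name": name, "customers": customers}


first_run = make_company(42)
second_run = make_company(42)
print(first_run == second_run)
```

The same discipline has to extend to anything else that varies between runs, such as wall-clock timestamps or random UUIDs, which a simulator would derive from the simulated date and the seeded generator instead.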
Now available: US variant
Since launching the AU version, I've built a US compliance variant with:
- IRS 2024 federal tax brackets + $14,600 standard deduction
- FICA (Social Security 6.2% + Medicare 1.45%)
- State sales tax (~7.5%)
- Fiscal year aligned to the calendar year; the company is an LLC with an EIN
- US Chart of Accounts (GAAP-style)
Same 42-table structure, same referential integrity — just US-flavoured. Available as US Complete ($49) and US Multi-Company ($99).
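For comparison with the AU rules, here is a rough sketch of the US figures listed above. It assumes a single filer (the $14,600 standard deduction is the single-filer amount), applies the 2024 Social Security wage base of $168,600, and ignores the Additional Medicare Tax on high earners. The function names are hypothetical and this is annual tax, not per-paycheck withholding.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
STANDARD_DEDUCTION = Decimal("14600")
# 2024 federal brackets, single filer: (threshold, marginal rate above it), highest first.
BRACKETS = [
    (Decimal("609350"), Decimal("0.37")),
    (Decimal("243725"), Decimal("0.35")),
    (Decimal("191950"), Decimal("0.32")),
    (Decimal("100525"), Decimal("0.24")),
    (Decimal("47150"), Decimal("0.22")),
    (Decimal("11600"), Decimal("0.12")),
    (Decimal("0"), Decimal("0.10")),
]
SOCIAL_SECURITY_RATE = Decimal("0.062")
SOCIAL_SECURITY_WAGE_BASE = Decimal("168600")  # 2024 cap on wages subject to Social Security
MEDICARE_RATE = Decimal("0.0145")


def federal_income_tax(gross_wages: Decimal) -> Decimal:
    """Annual federal income tax after the standard deduction."""
    upper = max(Decimal("0"), gross_wages - STANDARD_DEDUCTION)
    tax = Decimal("0")
    for threshold, rate in BRACKETS:
        if upper > threshold:
            tax += (upper - threshold) * rate
            upper = threshold
    return tax.quantize(CENT, rounding=ROUND_HALF_UP)


def fica(gross_wages: Decimal) -> tuple[Decimal, Decimal]:
    """Employee share of (Social Security, Medicare) for the year."""
    social_security = min(gross_wages, SOCIAL_SECURITY_WAGE_BASE) * SOCIAL_SECURITY_RATE
    medicare = gross_wages * MEDICARE_RATE
    return (
        social_security.quantize(CENT, rounding=ROUND_HALF_UP),
        medicare.quantize(CENT, rounding=ROUND_HALF_UP),
    )


income_tax = federal_income_tax(Decimal("80000"))
social_security, medicare = fica(Decimal("80000"))
```

Unlike the Medicare portion, the Social Security portion stops growing once wages pass the wage base, which is why the cap is applied before the rate.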
Formats
All datasets ship in 4 formats: CSV, SQL (PostgreSQL), Parquet, and SQLite. Load into whatever tool you use — pandas, DuckDB, dbt, Power BI, raw SQL.
What's next
- UK variant (HMRC, PAYE, VAT, GBP)
- More industry presets (restaurant, consulting, e-commerce)
- Open-sourcing the simulation engine
Built by Mindweave Technologies. Browse all datasets → mindweave.tech/datasets
Feedback welcome: what domains or formats would be most useful for your workflow?