<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mindweave Technologies</title>
    <description>The latest articles on DEV Community by Mindweave Technologies (@mindweavetech).</description>
    <link>https://dev.to/mindweavetech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3858409%2F0c8f8887-4b6c-43dd-b126-d1c2bbbf6ff8.png</url>
      <title>DEV Community: Mindweave Technologies</title>
      <link>https://dev.to/mindweavetech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mindweavetech"/>
    <language>en</language>
    <item>
      <title>AdventureWorks Is Dead. Here's a 42-Table Business Dataset That Actually Balances.</title>
      <dc:creator>Mindweave Technologies</dc:creator>
      <pubDate>Thu, 02 Apr 2026 22:32:35 +0000</pubDate>
      <link>https://dev.to/mindweavetech/adventureworks-is-dead-heres-a-42-table-business-dataset-that-actually-balances-211n</link>
      <guid>https://dev.to/mindweavetech/adventureworks-is-dead-heres-a-42-table-business-dataset-that-actually-balances-211n</guid>
      <description>&lt;p&gt;If you've ever needed realistic business data for testing, demos, or development, you've probably used one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AdventureWorks&lt;/strong&gt; — last updated 2014, SQL Server only, no real accounting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Northwind&lt;/strong&gt; — last updated ~2000, 13 tables, no financial integrity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faker/Mockaroo&lt;/strong&gt; — random flat data with no relationships between tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They all have the same problem: &lt;strong&gt;they don't reflect how a real business actually works.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A real business has sales that generate invoices, invoices that trigger payments, payments that hit the bank, and bank transactions that flow into double-entry journal entries. None of the above give you that.&lt;/p&gt;

&lt;p&gt;So I built one that does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is sme-sim?
&lt;/h2&gt;

&lt;p&gt;It's a &lt;strong&gt;day-by-day business simulator&lt;/strong&gt;. You spin up a fake Australian retail company and let it operate for 2 financial years. Each simulated day, the company:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receives and fulfils customer orders&lt;/li&gt;
&lt;li&gt;Processes payments (some early, some late, some partial)&lt;/li&gt;
&lt;li&gt;Runs fortnightly payroll with real tax calculations&lt;/li&gt;
&lt;li&gt;Reorders inventory when stock drops below reorder points&lt;/li&gt;
&lt;li&gt;Generates double-entry journal entries for every financial event&lt;/li&gt;
&lt;li&gt;Lodges a quarterly BAS (Business Activity Statement, the periodic tax return) with the ATO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After 2 years, you get &lt;strong&gt;42 interconnected tables with 83,000+ rows&lt;/strong&gt; and 44 foreign key relationships.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes it different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. End-to-end traceability
&lt;/h3&gt;

&lt;p&gt;Every sale traces all the way through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Customer → Sales Order → Sales Order Lines → Invoice → Payment
    → Bank Transaction → Journal Entry → Journal Entry Lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can pick any transaction and follow it across 8 tables. This is what real business data looks like.&lt;/p&gt;
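
&lt;p&gt;As a rough illustration, the chain above could be walked with a single join. This is a minimal sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module against a toy in-memory schema. Every table and column name in it is an assumption made for the example, not the dataset's actual schema, the rows are invented, and the two line-level tables are left out for brevity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# Toy schema: table and column names are hypothetical, not the real sme-sim schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
CREATE TABLE invoices (id INTEGER PRIMARY KEY, sales_order_id INTEGER REFERENCES sales_orders(id), total REAL);
CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER REFERENCES invoices(id), amount REAL);
CREATE TABLE bank_transactions (id INTEGER PRIMARY KEY, payment_id INTEGER REFERENCES payments(id));
CREATE TABLE journal_entries (id INTEGER PRIMARY KEY, bank_transaction_id INTEGER REFERENCES bank_transactions(id));
INSERT INTO customers VALUES (1, 'Example Customer');
INSERT INTO sales_orders VALUES (10, 1);
INSERT INTO invoices VALUES (100, 10, 110.0);
INSERT INTO payments VALUES (1000, 100, 110.0);
INSERT INTO bank_transactions VALUES (5000, 1000);
INSERT INTO journal_entries VALUES (9000, 5000);
""")

# Follow one sales order from the customer through to its journal entry.
trace = conn.execute("""
SELECT c.name, so.id, inv.id, p.id, bt.id, je.id
FROM sales_orders so
JOIN customers c ON c.id = so.customer_id
JOIN invoices inv ON inv.sales_order_id = so.id
JOIN payments p ON p.invoice_id = inv.id
JOIN bank_transactions bt ON bt.payment_id = p.id
JOIN journal_entries je ON je.bank_transaction_id = bt.id
WHERE so.id = ?
""", (10,)).fetchone()

print(trace)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each inner join drops the row if a link in the chain is missing, so a non-empty result means the order is traceable end to end.&lt;/p&gt;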

&lt;h3&gt;
  
  
  2. Double-entry accounting that actually balances
&lt;/h3&gt;

&lt;p&gt;Every financial event generates balanced journal entries. Debits always equal credits. Across 7,400+ entries, not a single one is unbalanced.&lt;/p&gt;

&lt;p&gt;This matters because if you're testing accounting software, you need data where the books &lt;em&gt;actually work&lt;/em&gt;. Generators that fill each column independently can't guarantee this.&lt;/p&gt;
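
&lt;p&gt;One way to verify that claim is a per-entry check, which is stricter than comparing grand totals because two unbalanced entries can cancel each other out. The sketch below uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module on invented rows. The &lt;code&gt;journal_entry_id&lt;/code&gt; and &lt;code&gt;account&lt;/code&gt; columns are assumptions for the example and may not match the real column names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# Toy data: journal_entry_id and account are hypothetical column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal_lines "
    "(journal_entry_id INTEGER, account TEXT, debit REAL, credit REAL)"
)
conn.executemany(
    "INSERT INTO journal_lines VALUES (?, ?, ?, ?)",
    [
        (1, "Accounts Receivable", 110.0, 0.0),
        (1, "Sales Revenue", 0.0, 100.0),
        (1, "GST Collected", 0.0, 10.0),
        (2, "Bank", 110.0, 0.0),
        (2, "Accounts Receivable", 0.0, 110.0),
    ],
)

# An entry is unbalanced when its debits and credits differ after rounding to cents.
unbalanced = conn.execute("""
SELECT journal_entry_id
FROM journal_lines
GROUP BY journal_entry_id
HAVING round(sum(debit) - sum(credit), 2) != 0
""").fetchall()

print(unbalanced)  # an empty list means every entry balances
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;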

&lt;h3&gt;
  
  
  3. Real tax compliance
&lt;/h3&gt;

&lt;p&gt;The dataset uses real ATO (Australian Taxation Office) 2024-25 rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PAYG withholding&lt;/strong&gt; — actual tax brackets, not made-up percentages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medicare levy&lt;/strong&gt; — 2% on taxable income&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Superannuation&lt;/strong&gt; — 11.5% employer contribution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GST&lt;/strong&gt; — 10% on all sales and purchases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quarterly BAS&lt;/strong&gt; — Business Activity Statements derived from the GL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every payslip satisfies: &lt;code&gt;Gross = Net + Tax&lt;/code&gt;. Every BAS return reconciles to the general ledger.&lt;/p&gt;
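
&lt;p&gt;To make the arithmetic concrete, here is a small sketch of the payslip identity together with the super and GST rates listed above. The dollar figures are invented, the withheld tax is a placeholder rather than a real PAYG calculation, and applying super to the full gross amount with half-up rounding to cents is an assumption of mine, not necessarily what the simulator does.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
GST_RATE = Decimal("0.10")     # GST rate quoted in the list above
SUPER_RATE = Decimal("0.115")  # employer super rate quoted in the list above


def to_cents(amount):
    """Round a Decimal to whole cents (half-up is an assumed convention)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


# Invented payslip: the tax figure is a placeholder, not a PAYG calculation.
gross = Decimal("2500.00")
tax_withheld = Decimal("480.00")
net = gross - tax_withheld
employer_super = to_cents(gross * SUPER_RATE)  # paid on top of gross, not deducted

# Invented sale: GST is added on top of the GST-exclusive price.
sale_ex_gst = Decimal("100.00")
gst = to_cents(sale_ex_gst * GST_RATE)

assert gross == net + tax_withheld  # the payslip identity
print(net, employer_super, gst)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;Decimal&lt;/code&gt; instead of floats keeps every amount exact to the cent, which is what lets identities like this hold without a tolerance.&lt;/p&gt;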

&lt;h3&gt;
  
  
  4. Temporal realism
&lt;/h3&gt;

&lt;p&gt;The simulation creates patterns you'd see in a real business:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Seasonal sales&lt;/strong&gt; — camping equipment sells more in spring/summer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staff turnover&lt;/strong&gt; — employees get hired, promoted, and terminated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Late payments&lt;/strong&gt; — some customers always pay late, others pay early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory cycles&lt;/strong&gt; — stock levels fluctuate with demand and lead times&lt;/li&gt;
&lt;/ul&gt;
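
&lt;p&gt;For a feel of how a seasonal pattern like this could be produced, here is a hypothetical sketch. It is not the sme-sim engine: the multipliers, the baseline and the function names are all invented for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random

# Hypothetical demand pattern: higher from September to February, the
# southern-hemisphere spring and summer, and lower for the rest of the year.
PEAK_MONTHS = (9, 10, 11, 12, 1, 2)
BASE_DAILY_ORDERS = 10  # invented baseline


def expected_orders(month):
    """Seasonally adjusted mean number of orders for a day in the given month."""
    percent = 140 if month in PEAK_MONTHS else 80
    return BASE_DAILY_ORDERS * percent / 100


def simulate_day(month, rng):
    """Draw a noisy, non-negative order count around the seasonal mean."""
    return max(0, round(rng.gauss(expected_orders(month), 2.0)))


rng = random.Random(42)  # seeded so the run is repeatable
january = [simulate_day(1, rng) for _ in range(5)]
june = [simulate_day(6, rng) for _ in range(5)]
print(january, june)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;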

&lt;h3&gt;
  
  
  Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;AdventureWorks&lt;/th&gt;
&lt;th&gt;Northwind&lt;/th&gt;
&lt;th&gt;Faker&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;sme-sim&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-domain traceability&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Full&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Double-entry accounting&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tax compliance&lt;/td&gt;
&lt;td&gt;US-only&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AU + US&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temporal realism&lt;/td&gt;
&lt;td&gt;Static&lt;/td&gt;
&lt;td&gt;Static&lt;/td&gt;
&lt;td&gt;Random&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Simulated&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FK relationships&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;44 enforced&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Last updated&lt;/td&gt;
&lt;td&gt;2014&lt;/td&gt;
&lt;td&gt;~2000&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2025&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Optional (seedable)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes&lt;/strong&gt; (seeded RNG)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Who is this for?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; building ERP, accounting, CRM, or HR software&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA teams&lt;/strong&gt; testing complex workflows that span multiple modules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consultants&lt;/strong&gt; who need realistic demo data without exposing client data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data engineers&lt;/strong&gt; building ETL pipelines or data warehouses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Students&lt;/strong&gt; studying business systems, accounting, or databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML teams&lt;/strong&gt; who need realistic training data for business intelligence models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get the data
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://mindweave.tech/datasets" rel="noopener noreferrer"&gt;Browse all datasets → mindweave.tech/datasets&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free sample&lt;/strong&gt; (~2,800 rows, 26 tables):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/MindweaveTech/sme-sim-sample" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — clone and explore&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.kaggle.com/datasets/mindweavetech/australian-sme-business-dataset" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt; — download or use in notebooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Full datasets:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://mindweavetech.gumroad.com/l/trcdsq" rel="noopener noreferrer"&gt;Complete SME Dataset&lt;/a&gt; — 42 tables, 83K+ rows — &lt;strong&gt;$49&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mindweavetech.gumroad.com" rel="noopener noreferrer"&gt;Domain packs&lt;/a&gt; (Accounting, Sales, HR, Inventory) — &lt;strong&gt;$19 each&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mindweavetech.gumroad.com/l/lnulcg" rel="noopener noreferrer"&gt;Multi-Company Bundle&lt;/a&gt; — 3 unique companies — &lt;strong&gt;$99&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://mindweavetech.gumroad.com/l/olgtyb" rel="noopener noreferrer"&gt;Enterprise Pack&lt;/a&gt; — 5 unique companies, 400K+ rows — &lt;strong&gt;$199&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/MindweaveTech/sme-sim-sample.git
&lt;span class="nb"&gt;cd &lt;/span&gt;sme-sim-sample

&lt;span class="c"&gt;# Load into SQLite&lt;/span&gt;
sqlite3 :memory: &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;SQL&lt;/span&gt;&lt;span class="sh"&gt;'
.mode csv
.import sales_orders_sample.csv sales_orders
.import journal_entry_lines_sample.csv journal_lines
.mode line
SELECT count(*) as total_orders FROM sales_orders;
SELECT 
  sum(debit) as total_debits, 
  sum(credit) as total_credits,
  round(sum(debit) - sum(credit), 2) as difference
FROM journal_lines;
&lt;/span&gt;&lt;span class="no"&gt;SQL
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;total_orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="n"&gt;total_debits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1847234.56&lt;/span&gt;
&lt;span class="n"&gt;total_credits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1847234.56&lt;/span&gt;
&lt;span class="n"&gt;difference&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Debits equal credits. Every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical details
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engine:&lt;/strong&gt; Python 3.14, SQLAlchemy 2.x, Click CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output formats:&lt;/strong&gt; CSV, SQL (PostgreSQL), Parquet, SQLite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic:&lt;/strong&gt; Same seed = identical output. Seed 42 always produces "Outback Outdoor Supplies Pty Ltd"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12 domain modules:&lt;/strong&gt; Company, Accounting, HR, Payroll, CRM, Sales, Purchasing, Inventory, Banking, Tax, Assets, Projects&lt;/li&gt;
&lt;/ul&gt;
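
&lt;p&gt;The determinism point is easy to demonstrate in miniature. The sketch below is not the engine's code: &lt;code&gt;generate_orders&lt;/code&gt; is a hypothetical stand-in that shows the general technique of driving all randomness from one seeded generator.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random


def generate_orders(seed, days=5):
    """Return daily order counts that depend only on the given seed."""
    rng = random.Random(seed)  # private RNG instance, independent of global state
    return [rng.randint(0, 20) for _ in range(days)]


first_run = generate_orders(42)
second_run = generate_orders(42)

print(first_run == second_run)  # True: same seed, same sequence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing a dedicated &lt;code&gt;random.Random&lt;/code&gt; instance around, rather than calling the module-level functions, keeps the output stable even if other code touches the global generator.&lt;/p&gt;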

&lt;h2&gt;
  
  
  Now available: US variant
&lt;/h2&gt;

&lt;p&gt;Since launching the AU version, I've built a &lt;strong&gt;US compliance variant&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IRS 2024 federal tax brackets + $14,600 standard deduction&lt;/li&gt;
&lt;li&gt;FICA (Social Security 6.2% + Medicare 1.45%)&lt;/li&gt;
&lt;li&gt;State sales tax (~7.5%)&lt;/li&gt;
&lt;li&gt;Calendar-year fiscal year, LLC with EIN&lt;/li&gt;
&lt;li&gt;US Chart of Accounts (GAAP-style)&lt;/li&gt;
&lt;/ul&gt;
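
&lt;p&gt;As a quick worked example of the FICA line above, here is a simplified sketch of the employee-side withholding for one pay. The function name and the pay amount are invented, and it deliberately ignores the annual Social Security wage base and the Additional Medicare Tax, so treat it as an illustration of the two rates rather than a payroll implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from decimal import Decimal, ROUND_HALF_UP

SOCIAL_SECURITY_RATE = Decimal("0.062")  # employee share, as listed above
MEDICARE_RATE = Decimal("0.0145")        # employee share, as listed above
CENT = Decimal("0.01")


def fica_withholding(gross_pay):
    """Return (social_security, medicare) for one pay, rounded to cents.

    Simplification: no wage-base cap and no Additional Medicare Tax.
    """
    social_security = (gross_pay * SOCIAL_SECURITY_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    medicare = (gross_pay * MEDICARE_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return social_security, medicare


social_security, medicare = fica_withholding(Decimal("2000.00"))
print(social_security, medicare)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;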

&lt;p&gt;Same 42-table structure, same referential integrity — just US-flavoured. Available as &lt;a href="https://mindweavetech.gumroad.com/l/qnecpy" rel="noopener noreferrer"&gt;US Complete ($49)&lt;/a&gt; and &lt;a href="https://mindweavetech.gumroad.com/l/mfmfo" rel="noopener noreferrer"&gt;US Multi-Company ($99)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Formats
&lt;/h2&gt;

&lt;p&gt;All datasets ship in &lt;strong&gt;4 formats&lt;/strong&gt;: CSV, SQL (PostgreSQL), Parquet, and SQLite. Load into whatever tool you use — pandas, DuckDB, dbt, Power BI, raw SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;UK variant (HMRC, PAYE, VAT, GBP)&lt;/li&gt;
&lt;li&gt;More industry presets (restaurant, consulting, e-commerce)&lt;/li&gt;
&lt;li&gt;Open-sourcing the simulation engine&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Built by &lt;a href="https://mindweave.tech" rel="noopener noreferrer"&gt;Mindweave Technologies&lt;/a&gt;. &lt;strong&gt;&lt;a href="https://mindweave.tech/datasets" rel="noopener noreferrer"&gt;Browse all datasets →&lt;/a&gt;&lt;/strong&gt; Feedback welcome — what domains or formats would be most useful for your workflow?&lt;/p&gt;

</description>
      <category>database</category>
      <category>sql</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
