I'm open-sourcing ContractForge: define data ingestion intent once, run it natively anywhere

#dataengineering #datacontract #opensource #ai

Every data engineer who works across platforms knows this pain:

You build a clean ingestion layer for one platform. The next project is on another. You rewrite the same logic again.

The intent is always the same: ingest this source, this way, with these rules. Only the runtime changes: Databricks → Snowflake → Fabric → BigQuery.

The usual fix is a generic abstraction that "hides" the platform. But that almost always becomes a lowest common denominator — you lose Auto Loader, Iceberg, Unity Catalog, the very native features you chose the platform for.

So I built ContractForge to try a different idea: define the intent once, let each platform execute it its own native way.

The contract is just YAML

source:
  type: incremental_files
  path: s3://landing/orders
  format: json

target:
  catalog: main
  schema: bronze
  table: orders

mode: append
schema_policy: additive_only
quality_rules:
  not_null: [order_id]

The semantic core plans it, and each adapter renders native artifacts — Delta/Auto Loader on Databricks, Glue/Iceberg on AWS, and so on.
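
To make the split between core and adapters concrete, here is a minimal Python sketch of the idea. Everything in it is hypothetical: the `Contract` dataclass, the `render_databricks` and `render_aws` functions and the plan dictionaries they return are illustrative stand-ins, not ContractForge's actual API or the artifacts it generates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Platform-neutral ingestion intent (hypothetical model of the YAML above)."""
    source_path: str
    source_format: str
    catalog: str
    schema: str
    table: str
    mode: str


def render_databricks(c: Contract) -> dict:
    # Hypothetical adapter: Auto Loader ("cloudFiles") feeding a Delta table.
    return {
        "reader": "cloudFiles",
        "reader_options": {"cloudFiles.format": c.source_format},
        "path": c.source_path,
        "table_format": "delta",
        "target": f"{c.catalog}.{c.schema}.{c.table}",
        "mode": c.mode,
    }


def render_aws(c: Contract) -> dict:
    # Hypothetical adapter: a Glue job writing an Iceberg table.
    return {
        "reader": "glue_job",
        "path": c.source_path,
        "source_format": c.source_format,
        "table_format": "iceberg",
        "target": f"{c.schema}.{c.table}",
        "mode": c.mode,
    }


# One contract, several native renderings.
ADAPTERS = {"databricks": render_databricks, "aws": render_aws}

orders = Contract("s3://landing/orders", "json", "main", "bronze", "orders", "append")
plans = {name: render(orders) for name, render in ADAPTERS.items()}
```

The contract object never mentions Auto Loader or Iceberg. Those choices live only in the adapters, which is what lets each platform keep its native features.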

The part I care about most: honest portability

Most multi-platform tools lie — they "support everything" by silently degrading your semantics to fit the weakest engine.

ContractForge does the opposite. The planner returns one of:

SUPPORTED · SUPPORTED_WITH_WARNINGS · REVIEW_REQUIRED · UNSUPPORTED

If an adapter can't safely preserve your contract, it says so — instead of producing something that looks fine in a demo and breaks in production.
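
As a rough illustration of how a planner could arrive at one of those four verdicts, here is a small Python sketch. Only the four status names come from ContractForge; the `CAPABILITIES` table, the `engine_a` / `engine_b` adapter names and the `plan` function are assumptions made up for this example, and the real planner may work quite differently.

```python
from enum import Enum


class Support(Enum):
    # Declared from best to worst; the order is reused for ranking below.
    SUPPORTED = "SUPPORTED"
    SUPPORTED_WITH_WARNINGS = "SUPPORTED_WITH_WARNINGS"
    REVIEW_REQUIRED = "REVIEW_REQUIRED"
    UNSUPPORTED = "UNSUPPORTED"


# Hypothetical capability table: how faithfully each adapter preserves a feature.
CAPABILITIES = {
    "engine_a": {"append": Support.SUPPORTED, "additive_only": Support.SUPPORTED},
    "engine_b": {"append": Support.SUPPORTED, "additive_only": Support.REVIEW_REQUIRED},
}

SEVERITY = list(Support)  # index 0 is best, the last index is worst


def plan(adapter: str, required: list[str]) -> Support:
    """Return the weakest verdict across all features the contract needs."""
    caps = CAPABILITIES[adapter]
    # A feature the adapter does not list is treated as unsupported, not ignored.
    verdicts = [caps.get(feature, Support.UNSUPPORTED) for feature in required]
    return max(verdicts, key=SEVERITY.index, default=Support.SUPPORTED)


print(plan("engine_a", ["append", "additive_only"]).value)
print(plan("engine_b", ["append", "additive_only"]).value)
print(plan("engine_b", ["upsert"]).value)
```

The design choice worth noting is that the plan reports the worst verdict rather than an average: one feature that needs review is enough to flag the whole contract on that platform.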

What's in it today

A platform-neutral semantic core for ingestion contracts
Native adapters for Databricks, AWS, Snowflake, Microsoft Fabric and GCP
Bronze-to-gold patterns, write modes (append, overwrite, upsert, hash-diff upsert, historical & snapshot reconciliation)
Deterministic validation, evidence artifacts and platform parity reports
ContractForge-AI — turns user intent into deterministic contract inputs, never skipping validation
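
For readers who haven't met hash-diff upserts, here is a generic, in-memory Python sketch of the technique as it is commonly understood: hash the non-key columns of each incoming row and only write when the hash differs from what is stored. This is not ContractForge's implementation; the function names, the `_hash` column and the SHA-256 over JSON encoding are all assumptions chosen for illustration.

```python
import hashlib
import json


def row_hash(row: dict, key: str) -> str:
    # Hash every column except the business key; sorted keys keep it deterministic.
    payload = {k: v for k, v in row.items() if k != key}
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def hash_diff_upsert(target: dict, incoming: list[dict], key: str) -> dict:
    """Apply incoming rows to target (a dict keyed by `key`) and count outcomes."""
    stats = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in incoming:
        new_hash = row_hash(row, key)
        existing = target.get(row[key])
        if existing is None:
            stats["inserted"] += 1
        elif existing["_hash"] == new_hash:
            stats["unchanged"] += 1
            continue  # identical payload: skip the write entirely
        else:
            stats["updated"] += 1
        target[row[key]] = {**row, "_hash": new_hash}
    return stats


table = {}
first = hash_diff_upsert(table, [{"order_id": 1, "status": "new"}], "order_id")
second = hash_diff_upsert(
    table,
    [{"order_id": 1, "status": "new"}, {"order_id": 2, "status": "new"}],
    "order_id",
)
third = hash_diff_upsert(table, [{"order_id": 1, "status": "shipped"}], "order_id")
```

Re-sending an identical row is counted as unchanged and never rewritten, which is the kind of edge case worth trying to break with scenarios from real pipelines.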

It's a public technical preview

Not GA. I'm calling it a preview because it's already useful, documented and covered by 2,000+ tests — but still evolving as more real-world scenarios get validated.

Contributors welcome

The most useful thing right now is real-world pressure from people who do this for a living. If any of these sound like you, I'd love your help:

New adapters / sources — your platform or connector isn't covered yet? Let's add it.
Write modes & edge cases — break the upsert, hash-diff or snapshot logic with a scenario from your own pipelines.
Real contract scenarios — the more messy, real ingestion cases we validate, the stronger the core gets.
Docs & examples — found something unclear? That's a contribution too.
Just open an issue — even "this confused me" is valuable signal.

No contribution is too small — a typo fix, a question, or a "have you considered X?" all count. Check the contributing guide and pick anything that catches your eye.

GitHub: https://github.com/marquesantero/contractforge
Docs: https://marquesantero.github.io/contractforge/

What's your approach to ingestion across platforms today? Drop it in the comments. I genuinely want to know.