Every data engineer who works across platforms knows this pain:
You build a clean ingestion layer for one platform. The next project is on another. You rewrite the same logic again.
The intent is always the same: ingest this source, this way, with these rules. Only the runtime changes: Databricks → Snowflake → Fabric → BigQuery.
The usual fix is a generic abstraction that "hides" the platform. But that almost always becomes a lowest common denominator — you lose Auto Loader, Iceberg, Unity Catalog, the very native features you chose the platform for.
So I built ContractForge to try a different idea: define the intent once, let each platform execute it its own native way.
The contract is just YAML
source:
  type: incremental_files
  path: s3://landing/orders
  format: json
target:
  catalog: main
  schema: bronze
  table: orders
mode: append
schema_policy: additive_only
quality_rules:
  not_null: [order_id]
The semantic core plans it, and each adapter renders native artifacts — Delta/Auto Loader on Databricks, Glue/Iceberg on AWS, and so on.
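To make "one intent, many native renderings" concrete, here is a rough sketch of the shape of that idea. It is not ContractForge's actual code: the `Contract` fields, the `Adapter` protocol, both adapter classes and `plan_all` are hypothetical names invented for illustration, and the returned dicts only describe a plan rather than call any platform API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Contract:
    """Platform-neutral intent. Field names are illustrative only."""
    source_path: str
    source_format: str
    target_table: str
    mode: str


class Adapter(Protocol):
    def render(self, contract: Contract) -> dict: ...


class DatabricksAdapter:
    def render(self, contract: Contract) -> dict:
        # Auto Loader is exposed on Databricks as the "cloudFiles" source.
        return {
            "reader": "cloudFiles",
            "reader_options": {"cloudFiles.format": contract.source_format},
            "path": contract.source_path,
            "table_format": "delta",
            "target": contract.target_table,
            "mode": contract.mode,
        }


class AwsAdapter:
    def render(self, contract: Contract) -> dict:
        # A Glue job writing an Iceberg table; the keys are placeholders.
        return {
            "reader": "glue_job",
            "reader_options": {"format": contract.source_format},
            "path": contract.source_path,
            "table_format": "iceberg",
            "target": contract.target_table,
            "mode": contract.mode,
        }


def plan_all(contract: Contract, adapters: dict[str, Adapter]) -> dict[str, dict]:
    """One contract in, one native plan per platform out."""
    return {name: adapter.render(contract) for name, adapter in adapters.items()}


contract = Contract("s3://landing/orders", "json", "main.bronze.orders", "append")
plans = plan_all(contract, {"databricks": DatabricksAdapter(), "aws": AwsAdapter()})
for name, plan in plans.items():
    print(name, plan["reader"], plan["table_format"])
```

The point of the sketch is that the contract never mentions Delta or Iceberg. Each adapter decides how to honor the same intent with its own native building blocks.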
The part I care about most: honest portability
Most multi-platform tools lie — they "support everything" by silently degrading your semantics to fit the weakest engine.
ContractForge does the opposite. The planner returns one of:
SUPPORTED · SUPPORTED_WITH_WARNINGS · REVIEW_REQUIRED · UNSUPPORTED
If an adapter can't safely preserve your contract, it says so — instead of producing something that looks fine in a demo and breaks in production.
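One way to picture that behavior is a small sketch in which each adapter declares a status per contract feature and the planner reports the worst one. The four status names come from the list above. Everything else is an assumption made for illustration: the `assess` function, the feature keys, the two toy engines, and the "worst status wins" rule are not taken from ContractForge.

```python
from enum import IntEnum


class SupportStatus(IntEnum):
    # Ordered by severity so that max() picks the worst outcome.
    SUPPORTED = 0
    SUPPORTED_WITH_WARNINGS = 1
    REVIEW_REQUIRED = 2
    UNSUPPORTED = 3


def assess(required_features, capabilities):
    """Return the worst status across every feature the contract needs.

    A feature the adapter does not declare is treated as UNSUPPORTED,
    so nothing is silently degraded.
    """
    statuses = [
        capabilities.get(feature, SupportStatus.UNSUPPORTED)
        for feature in required_features
    ]
    return max(statuses, default=SupportStatus.SUPPORTED)


# Hypothetical feature keys derived from the example contract.
required = ["mode:append", "schema_policy:additive_only", "quality:not_null"]

engine_a = {
    "mode:append": SupportStatus.SUPPORTED,
    "schema_policy:additive_only": SupportStatus.SUPPORTED,
    "quality:not_null": SupportStatus.SUPPORTED,
}
engine_b = {
    "mode:append": SupportStatus.SUPPORTED,
    "schema_policy:additive_only": SupportStatus.REVIEW_REQUIRED,
    # "quality:not_null" is not declared at all.
}

print(assess(required, engine_a).name)
print(assess(required, engine_b).name)
```

Under these assumptions the first engine reports SUPPORTED, while the second reports UNSUPPORTED because one required feature is missing entirely.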
What's in it today
- A platform-neutral semantic core for ingestion contracts
- Native adapters for Databricks, AWS, Snowflake, Microsoft Fabric and GCP
- Bronze-to-gold patterns, write modes (append, overwrite, upsert, hash-diff upsert, historical & snapshot reconciliation)
- Deterministic validation, evidence artifacts and platform parity reports
- ContractForge-AI — turns user intent into deterministic contract inputs, never skipping validation
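For readers who have not met the term, a hash-diff upsert generally means hashing the non-key columns of each incoming row and rewriting a target row only when that hash differs from the stored one. The sketch below shows the generic technique in plain Python over an in-memory dict. It is not ContractForge's implementation, and `row_hash` and `hash_diff_upsert` are hypothetical helper names.

```python
import hashlib
import json


def row_hash(row, key):
    """Hash every column except the business key, in a stable column order."""
    payload = {k: v for k, v in row.items() if k != key}
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def hash_diff_upsert(target, incoming, key):
    """Apply incoming rows to target, a dict of key -> (hash, row).

    New keys are inserted, changed rows are overwritten, and rows whose
    hash matches the stored hash are left untouched.
    """
    stats = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in incoming:
        k = row[key]
        h = row_hash(row, key)
        existing = target.get(k)
        if existing is None:
            stats["inserted"] += 1
        elif existing[0] == h:
            stats["unchanged"] += 1
            continue  # identical content, skip the write
        else:
            stats["updated"] += 1
        target[k] = (h, row)
    return stats


table = {}
first = hash_diff_upsert(
    table,
    [{"order_id": 1, "status": "new"}, {"order_id": 2, "status": "new"}],
    key="order_id",
)
second = hash_diff_upsert(
    table,
    [
        {"order_id": 1, "status": "new"},      # same content
        {"order_id": 2, "status": "shipped"},  # changed content
        {"order_id": 3, "status": "new"},      # new key
    ],
    key="order_id",
)
print(first, second)
```

On a real platform the same comparison would typically be expressed as a merge statement against a stored hash column, which is where edge cases such as null handling and column ordering tend to surface.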
It's a public technical preview
Not GA. I'm calling it a preview because it's already useful, documented and covered by 2,000+ tests — but still evolving as more real-world scenarios get validated.
Contributors welcome
The most useful thing right now is real-world pressure from people who do this for a living. If any of these sound like you, I'd love your help:
- New adapters / sources: your platform or connector isn't covered yet? Let's add it.
- Write modes & edge cases: break the upsert, hash-diff or snapshot logic with a scenario from your own pipelines.
- Real contract scenarios: the more messy, real-world ingestion cases we validate, the stronger the core gets.
- Docs & examples: found something unclear? That's a contribution too.
- Just open an issue: even "this confused me" is valuable signal.
No contribution is too small — a typo fix, a question, or a "have you considered X?" all count. Check the contributing guide and pick anything that catches your eye.
GitHub: https://github.com/marquesantero/contractforge
Docs: https://marquesantero.github.io/contractforge/
What's your approach to ingestion across platforms today? Drop it in the comments. I genuinely want to know.