
Pawan Singh Kapkoti

How I Reverse-Engineered a Reverse ETL Tool and Wrote the Docs Nobody Had

drt is an open-source reverse ETL tool. Five destination connectors existed. No guide for building new ones. No documentation beyond the source code.

This post walks through the process of reverse-engineering the connector architecture, shipping five new connectors, and writing the official tutorial that got merged.

The approach

Start with the source, not the README. The actual implementation files tell you what the maintainers intended.

drt/destinations/base.py defines the Destination Protocol with one method:

from typing import Any, Protocol

# DestinationConfig, SyncOptions, and SyncResult are drt's own types,
# imported from elsewhere in the package.

class Destination(Protocol):
    def load(
        self,
        records: list[dict[str, Any]],
        config: DestinationConfig,
        sync_options: SyncOptions,
    ) -> SyncResult:
        ...

That is the entire interface. One method. Takes records, config, and options. Returns success/failure counts. Every destination - Slack, PostgreSQL, REST API, Discord - implements this same method.
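Because `Destination` is a `typing.Protocol`, any class with a matching `load()` method satisfies it structurally; no base class and no registration. A minimal sketch of what a conforming destination looks like, using a simplified stand-in for `SyncResult` (drt's real type has its own fields) and a toy stdout "destination" that is not part of drt:

```python
# Toy destination satisfying the one-method contract structurally.
# SyncResult's fields here are an assumption for illustration; only
# the load() signature comes from base.py.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SyncResult:
    success: int = 0
    failed: int = 0
    errors: list[str] = field(default_factory=list)


class StdoutDestination:
    """Hypothetical destination: 'loads' records by printing them."""

    def load(self, records: list[dict[str, Any]], config, sync_options) -> SyncResult:
        result = SyncResult()
        for record in records:
            print(record)          # a real connector would write to its sink here
            result.success += 1
        return result
```

No inheritance from `Destination` is required: as long as the `load()` signature matches, the isinstance-free protocol check is satisfied at type-check time.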

Mapping the architecture

I traced the full flow by reading backwards from the CLI:

CLI (_get_destination) -> isinstance check -> Destination.load()
                                                    |
                                            Config model (Pydantic)
                                            with type: Literal["xxx"]
                                                    |
                                            DestinationConfig union
                                            (discriminated by type field)

Four files. That is it. To add a new destination, you touch four files:

  1. Config model in drt/config/models.py - a Pydantic BaseModel with type: Literal["your_type"]
  2. Destination class in drt/destinations/your_dest.py - implements load()
  3. CLI registration in drt/cli/main.py - one isinstance branch
  4. Tests in tests/unit/test_your_dest.py

No plugin registry. No entry points. No dynamic discovery. Just a Pydantic discriminated union and an isinstance chain. Simple enough that I could hold the whole architecture in my head.
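The whole dispatch mechanism can be sketched in a few lines, assuming Pydantic v2. The `SlackConfig`/`PostgresConfig` models and their fields below are illustrative, not drt's actual models from `drt/config/models.py`:

```python
# Sketch of the discriminated-union + isinstance pattern (Pydantic v2).
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class SlackConfig(BaseModel):
    type: Literal["slack"]
    webhook_url: str


class PostgresConfig(BaseModel):
    type: Literal["postgres"]
    dsn: str
    table: str


# The union is discriminated by the "type" field, so YAML like
# `type: slack` selects SlackConfig during validation.
DestinationConfig = Annotated[
    Union[SlackConfig, PostgresConfig],
    Field(discriminator="type"),
]


class SyncSpec(BaseModel):
    destination: DestinationConfig


def get_destination(config) -> str:
    # The isinstance chain: one branch per connector, no registry.
    if isinstance(config, SlackConfig):
        return "SlackDestination"
    if isinstance(config, PostgresConfig):
        return "PostgresDestination"
    raise ValueError(f"unknown destination type: {config!r}")


spec = SyncSpec.model_validate(
    {"destination": {"type": "slack", "webhook_url": "https://example.invalid"}}
)
assert isinstance(spec.destination, SlackConfig)
```

Adding a connector means adding one model to the union and one branch to the chain; validation errors for an unknown `type` come from Pydantic for free.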

Five connectors from one pattern

Once the pattern is clear, building connectors becomes repetitive:

  • ClickHouse - database destination with batch inserts
  • Snowflake - cloud warehouse with snowflake-connector-python
  • Parquet - file-based output for data lake patterns
  • Teams - Microsoft Teams webhook notifications
  • CSV/JSON - simple file export

Each one followed the same pattern:

  • Config model with destination-specific fields
  • load() method iterating records with RowError on failure
  • resolve_env() for secrets (never hardcode credentials)
  • RateLimiter + with_retry() for HTTP destinations
  • try/finally for database connection cleanup
  • Respect on_error: "fail" returns early, "skip" continues
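The shared shape of every `load()` loop, sketched in isolation; `RowError`'s fields and the `_send()` helper are assumptions for illustration, not drt's actual code:

```python
# Per-record try/except building a RowError on failure, honoring
# on_error = "fail" (return early) vs "skip" (continue).
from dataclasses import dataclass, field


@dataclass
class RowError:
    index: int
    message: str


@dataclass
class SyncResult:
    success: int = 0
    failed: int = 0
    errors: list[RowError] = field(default_factory=list)


def load(records: list[dict], on_error: str = "skip") -> SyncResult:
    result = SyncResult()
    for i, record in enumerate(records):
        try:
            _send(record)  # destination-specific write (hypothetical)
            result.success += 1
        except Exception as exc:
            result.failed += 1
            result.errors.append(RowError(index=i, message=str(exc)))
            if on_error == "fail":
                break  # stop at the first failure
    return result


def _send(record: dict) -> None:
    # Stand-in for the real write; fails on a marker key for demo purposes.
    if "boom" in record:
        raise ValueError("bad record")
```

Database connectors wrap this loop in `try/finally` for connection cleanup; HTTP connectors wrap `_send()` with the rate limiter and retry helper.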

All five connectors were merged into the main branch.

Writing the tutorial nobody had

After five connectors, the pattern was clear. But the next contributor should not have to read five implementations to learn it. So the obvious next step was to write the guide.

PR: drt-hub/drt#332 - merged.

The tutorial walks through building a fictional Webhook destination step by step:

  1. Config model with Pydantic validators
  2. Destination class with the full load() implementation
  3. CLI registration (one line)
  4. Tests using pytest-httpserver for HTTP destinations or unittest.mock for databases

I included a checklist at the end - 14 items that every connector should satisfy. Things like "uses resolve_env() for secrets" and "respects on_error setting" and "builds RowError on per-row failures."
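For the database side, the test shape looks roughly like this with `unittest.mock`; the `make_destination_load` factory is hypothetical, standing in for a real connector's `load()`:

```python
# Mock the client so no real connection is needed, then assert that
# try/finally closes the connection even when an insert fails.
from unittest.mock import MagicMock


def make_destination_load(connect):
    """Hypothetical load() that inserts rows and always closes the conn."""
    def load(records):
        conn = connect()
        try:
            for record in records:
                conn.execute("INSERT ...", record)
        finally:
            conn.close()
        return len(records)
    return load


def test_connection_closed_even_on_failure():
    conn = MagicMock()
    conn.execute.side_effect = RuntimeError("insert failed")
    load = make_destination_load(lambda: conn)
    try:
        load([{"id": 1}])
    except RuntimeError:
        pass
    conn.close.assert_called_once()  # cleanup guaranteed by try/finally
```

The same structure covers the other checklist cases: swap the `side_effect` for the error-skip and error-fail paths, or drop it for the success path.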

Lessons on reverse engineering open source

  1. Start with the interface, not the implementation. base.py told me everything I needed to know about the contract. The implementations were just variations on the theme.

  2. Read the CLI entry point. _get_destination() showed me exactly how destinations are discovered and instantiated. No magic, no reflection, just isinstance checks.

  3. The config layer is the key. Pydantic discriminated unions with type: Literal["xxx"] meant the YAML config drives everything. Understanding the config model meant understanding the whole system.

  4. Test patterns are documentation. The existing tests showed me what the maintainers considered important: success path, error-skip, error-fail, missing credentials, connection cleanup.

  5. Write the docs you wish existed. Five implementations is enough context to write the guide. The next person should not have to repeat the journey.

The code
