
Pawan Singh Kapkoti

How I Reverse-Engineered a Reverse ETL Tool and Wrote the Docs Nobody Had

drt is an open-source reverse ETL tool. Five destination connectors existed. No guide for building new ones. No documentation beyond the source code.

This post walks through the process of reverse-engineering the connector architecture, shipping five new connectors, and writing the official tutorial that got merged.

The approach

Start with the source, not the README. The actual implementation files tell you what the maintainers intended.

drt/destinations/base.py defines the Destination Protocol with one method:

from typing import Any, Protocol

# DestinationConfig, SyncOptions, and SyncResult are drt's own types,
# imported from elsewhere in the package.

class Destination(Protocol):
    def load(
        self,
        records: list[dict[str, Any]],
        config: DestinationConfig,
        sync_options: SyncOptions,
    ) -> SyncResult:
        ...

That is the entire interface. One method. Takes records, config, and options. Returns success/failure counts. Every destination - Slack, PostgreSQL, REST API, Discord - implements this same method.
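Because `Destination` is a `typing.Protocol`, any class with a matching `load()` method satisfies it structurally; no base class and no registration. A minimal sketch of what a conforming destination looks like, using a simplified stand-in for `SyncResult` (drt's real type has its own fields) and a toy stdout "destination" that is not part of drt:

```python
# Toy destination satisfying the one-method contract structurally.
# SyncResult's fields here are an assumption for illustration; only
# the load() signature comes from base.py.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SyncResult:
    success: int = 0
    failed: int = 0
    errors: list[str] = field(default_factory=list)


class StdoutDestination:
    """Hypothetical destination: 'loads' records by printing them."""

    def load(self, records: list[dict[str, Any]], config, sync_options) -> SyncResult:
        result = SyncResult()
        for record in records:
            print(record)          # a real connector would write to its sink here
            result.success += 1
        return result
```

No inheritance from `Destination` is required: as long as the `load()` signature matches, the isinstance-free protocol check is satisfied at type-check time.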

Mapping the architecture

I traced the full flow by reading backwards from the CLI:

CLI (_get_destination) -> isinstance check -> Destination.load()
                                                    |
                                            Config model (Pydantic)
                                            with type: Literal["xxx"]
                                                    |
                                            DestinationConfig union
                                            (discriminated by type field)

Four files. That is it. To add a new destination, you touch four files:

  1. Config model in drt/config/models.py - a Pydantic BaseModel with type: Literal["your_type"]
  2. Destination class in drt/destinations/your_dest.py - implements load()
  3. CLI registration in drt/cli/main.py - one isinstance branch
  4. Tests in tests/unit/test_your_dest.py

No plugin registry. No entry points. No dynamic discovery. Just a Pydantic discriminated union and an isinstance chain. Simple enough that I could hold the whole architecture in my head.
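The whole dispatch mechanism can be sketched in a few lines, assuming Pydantic v2. The `SlackConfig`/`PostgresConfig` models and their fields below are illustrative, not drt's actual models from `drt/config/models.py`:

```python
# Sketch of the discriminated-union + isinstance pattern (Pydantic v2).
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class SlackConfig(BaseModel):
    type: Literal["slack"]
    webhook_url: str


class PostgresConfig(BaseModel):
    type: Literal["postgres"]
    dsn: str
    table: str


# The union is discriminated by the "type" field, so YAML like
# `type: slack` selects SlackConfig during validation.
DestinationConfig = Annotated[
    Union[SlackConfig, PostgresConfig],
    Field(discriminator="type"),
]


class SyncSpec(BaseModel):
    destination: DestinationConfig


def get_destination(config) -> str:
    # The isinstance chain: one branch per connector, no registry.
    if isinstance(config, SlackConfig):
        return "SlackDestination"
    if isinstance(config, PostgresConfig):
        return "PostgresDestination"
    raise ValueError(f"unknown destination type: {config!r}")


spec = SyncSpec.model_validate(
    {"destination": {"type": "slack", "webhook_url": "https://example.invalid"}}
)
assert isinstance(spec.destination, SlackConfig)
```

Adding a connector means adding one model to the union and one branch to the chain; validation errors for an unknown `type` come from Pydantic for free.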

Five connectors from one pattern

Once the pattern is clear, building connectors becomes repetitive:

  • ClickHouse - database destination with batch inserts
  • Snowflake - cloud warehouse with snowflake-connector-python
  • Parquet - file-based output for data lake patterns
  • Teams - Microsoft Teams webhook notifications
  • CSV/JSON - simple file export

Each one followed the same pattern:

  • Config model with destination-specific fields
  • load() method iterating records with RowError on failure
  • resolve_env() for secrets (never hardcode credentials)
  • RateLimiter + with_retry() for HTTP destinations
  • try/finally for database connection cleanup
  • Respect on_error: "fail" returns early, "skip" continues
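The shared shape of every `load()` loop, sketched in isolation; `RowError`'s fields and the `_send()` helper are assumptions for illustration, not drt's actual code:

```python
# Per-record try/except building a RowError on failure, honoring
# on_error = "fail" (return early) vs "skip" (continue).
from dataclasses import dataclass, field


@dataclass
class RowError:
    index: int
    message: str


@dataclass
class SyncResult:
    success: int = 0
    failed: int = 0
    errors: list[RowError] = field(default_factory=list)


def load(records: list[dict], on_error: str = "skip") -> SyncResult:
    result = SyncResult()
    for i, record in enumerate(records):
        try:
            _send(record)  # destination-specific write (hypothetical)
            result.success += 1
        except Exception as exc:
            result.failed += 1
            result.errors.append(RowError(index=i, message=str(exc)))
            if on_error == "fail":
                break  # stop at the first failure
    return result


def _send(record: dict) -> None:
    # Stand-in for the real write; fails on a marker key for demo purposes.
    if "boom" in record:
        raise ValueError("bad record")
```

Database connectors wrap this loop in `try/finally` for connection cleanup; HTTP connectors wrap `_send()` with the rate limiter and retry helper.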

All five connectors were merged into the main branch.

Writing the tutorial nobody had

After five connectors, the pattern was clear. But the next contributor should not have to read five implementations to learn it. So the obvious next step was to write the guide.

PR: drt-hub/drt#332 - merged.

The tutorial walks through building a fictional Webhook destination step by step:

  1. Config model with Pydantic validators
  2. Destination class with the full load() implementation
  3. CLI registration (one line)
  4. Tests using pytest-httpserver for HTTP destinations or unittest.mock for databases

I included a checklist at the end - 14 items that every connector should satisfy. Things like "uses resolve_env() for secrets" and "respects on_error setting" and "builds RowError on per-row failures."
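For the database side, the test shape looks roughly like this with `unittest.mock`; the `make_destination_load` factory is hypothetical, standing in for a real connector's `load()`:

```python
# Mock the client so no real connection is needed, then assert that
# try/finally closes the connection even when an insert fails.
from unittest.mock import MagicMock


def make_destination_load(connect):
    """Hypothetical load() that inserts rows and always closes the conn."""
    def load(records):
        conn = connect()
        try:
            for record in records:
                conn.execute("INSERT ...", record)
        finally:
            conn.close()
        return len(records)
    return load


def test_connection_closed_even_on_failure():
    conn = MagicMock()
    conn.execute.side_effect = RuntimeError("insert failed")
    load = make_destination_load(lambda: conn)
    try:
        load([{"id": 1}])
    except RuntimeError:
        pass
    conn.close.assert_called_once()  # cleanup guaranteed by try/finally
```

The same structure covers the other checklist cases: swap the `side_effect` for the error-skip and error-fail paths, or drop it for the success path.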

Lessons on reverse engineering open source

  1. Start with the interface, not the implementation. base.py told me everything I needed to know about the contract. The implementations were just variations on the theme.

  2. Read the CLI entry point. _get_destination() showed me exactly how destinations are discovered and instantiated. No magic, no reflection, just isinstance checks.

  3. The config layer is the key. Pydantic discriminated unions with type: Literal["xxx"] meant the YAML config drives everything. Understanding the config model meant understanding the whole system.

  4. Test patterns are documentation. The existing tests showed me what the maintainers considered important: success path, error-skip, error-fail, missing credentials, connection cleanup.

  5. Write the docs you wish existed. Five implementations is enough context to write the guide. The next person should not have to repeat the journey.

The code
