Pizofreude

Study Note dlt Fundamentals Course - Lesson 2: dlt Sources and Resources, Create First dlt Pipeline

Overview

In this lesson, you learned how to declare dlt resources, group them into a source, and run them through a dlt pipeline. You also learned about dlt transformers and how to use them to perform additional steps in the pipeline.

Key Concepts

  • dlt Resources: A resource is a logical grouping of data within a data source, typically holding data of similar structure and origin.
  • dlt Sources: A source is a logical grouping of resources, e.g., endpoints of a single API.
  • dlt Transformers: Special dlt resources that can be fed data from another resource to perform additional steps in the pipeline.

Creating a dlt Pipeline

A dlt pipeline loads data from resources, which can be grouped into a source. The first building block is a resource; here's an example of declaring one from a list of dictionaries:

import dlt

@dlt.resource
def my_dict_list():
    # A simple resource returning two Pokémon records
    return [
        {"id": 1, "name": "Pikachu"},
        {"id": 2, "name": "Charizard"}
    ]

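To actually load this data, you create a pipeline with dlt.pipeline and pass the resource to pipeline.run. A minimal sketch, where the pipeline name, the duckdb destination, and the dataset name are illustrative choices:

pipeline = dlt.pipeline(
    pipeline_name="pokemon_pipeline",  # illustrative name
    destination="duckdb",
    dataset_name="pokemon_data",
)

# Run the resource through the pipeline and print the load summary
load_info = pipeline.run(my_dict_list())
print(load_info)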

Using dlt Sources

A source is a logical grouping of resources. You can declare a source by decorating a function that returns or yields one or more resources with @dlt.source.

@dlt.source
def my_source():
    return [
        my_dict_list(),
        other_resource()  # another @dlt.resource-decorated function
    ]

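You can then pass the whole source to pipeline.run, and each resource it yields is loaded as its own table. A minimal sketch, reusing the pipeline object from above:

# Load all resources grouped by the source in one run
load_info = pipeline.run(my_source())
print(load_info)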

Using dlt Transformers

dlt transformers are special resources that can be fed data from another resource to perform additional steps in the pipeline.

import requests

@dlt.transformer
def get_pokemon_info(data):
    # Enrich each record with details fetched from the PokeAPI
    for pokemon in data:
        response = requests.get(f"https://pokeapi.co/api/v2/pokemon/{pokemon['id']}")
        pokemon['info'] = response.json()
    return data

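To wire the transformer to its parent, pipe the resource into it with the | operator (you can also declare the dependency with the data_from argument of @dlt.transformer). A minimal sketch, reusing the pipeline object from above:

# Pipe the parent resource into the transformer and load the result
load_info = pipeline.run(my_dict_list() | get_pokemon_info)
print(load_info)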

Exercise 1: Create a Pipeline for GitHub API - Repos Endpoint

  • Explore the GitHub API and understand the endpoint that lists public repositories for an organization.
  • Build the pipeline using dlt.pipeline, dlt.resource, and dlt.source to extract and load the data into a destination (a sketch follows this list).
  • Use a duckdb connection, the sql_client, or pipeline.dataset() to check the number of columns in the github_repos table.
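
One possible solution, sketched minimally: the pipeline settings and the unauthenticated, single-page request are illustrative assumptions (the real endpoint paginates and benefits from an access token), and the column check assumes a DBAPI-style cursor.

import dlt
import requests

@dlt.resource(name="github_repos")
def github_repos(org="dlt-hub"):
    # List public repositories for an organization (first page only)
    response = requests.get(f"https://api.github.com/orgs/{org}/repos")
    response.raise_for_status()
    yield response.json()

pipeline = dlt.pipeline(
    pipeline_name="github_pipeline",  # illustrative name
    destination="duckdb",
    dataset_name="github_data",
)
print(pipeline.run(github_repos()))

# Count columns in the loaded table via the sql_client
with pipeline.sql_client() as client:
    with client.execute_query("SELECT * FROM github_repos LIMIT 1") as cursor:
        print(len(cursor.description))  # assumes a DBAPI-style cursor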

Exercise 2: Create a Pipeline for GitHub API - Stargazers Endpoint

  • Create a dlt.transformer for the "stargazers" endpoint for the dlt-hub organization (a sketch follows this list).
  • Use the github_repos resource as the main resource feeding the transformer.
  • Use a duckdb connection, the sql_client, or pipeline.dataset() to check the number of columns in the github_stargazer table.
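
A minimal sketch of the transformer, assuming the github_repos resource and pipeline object from the Exercise 1 sketch; only the first page of stargazers per repository is fetched, and the table name here simply falls back to the function name:

@dlt.transformer(data_from=github_repos)
def github_stargazers(repos):
    # For each repository yielded by the parent resource, fetch its stargazers
    for repo in repos:
        response = requests.get(
            f"https://api.github.com/repos/dlt-hub/{repo['name']}/stargazers"
        )
        response.raise_for_status()
        yield response.json()

print(pipeline.run(github_stargazers))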

Reducing Nesting Level of Generated Tables

You can limit how deep dlt goes when generating nested tables and flattening dicts into columns. By default, the library will descend and generate nested tables for all nested lists, without limit.

@dlt.source(max_table_nesting=1)
def my_source():
    # Generate nested tables only one level deep; deeper data is loaded as JSON
    return [
        my_dict_list()
    ]


Typical Settings

  • max_table_nesting: the maximum depth at which dlt still generates nested tables; data nested deeper than this is loaded as JSON.

Next Steps

  • Proceed to the next lesson to learn more about dlt pipelines and how to use them to extract and load data into a destination.
