DEV Community

Cover image for Study Notes 4.3.2 - Testing and Documenting the Project
Pizofreude
Pizofreude

Posted on

Study Notes 4.3.2 - Testing and Documenting the Project

1. Introduction to Data Testing

  • Objective: Ensure data delivered to end-users is accurate and prevent building models on incorrect data.
  • DVT Tests: Data Validation Tests (DVT) are assumptions about data. They are SQL-based checks that validate data integrity.
    • Purpose: Identify errors quickly and stop execution if assumptions fail.
    • Implementation: Defined in YAML files and compiled into SQL code.

2. Types of DVT Tests

  • Four Default Tests:
    1. Unique Test: Ensures a field's uniqueness within a data model.
    2. Not Null Test: Ensures a field does not contain null values.
    3. Accepted Values Test: Validates that a field contains only specified values.
    4. Foreign Key Test: Ensures a field has a valid relationship with another table.
  • Example:
    • Payment Type Description: Must be one of the values 1, 2, 3, 4, or 5.
    • Pickup Location: Must relate to a valid taxi zone in the ref_taxi_lookup table.
    • Trip ID: Must be unique and not null (primary key).

3. Compiling and Running Tests

  • YAML File: Contains test definitions.
  • SQL Compilation: Tests are translated into SQL queries.
    • Example: SELECT * FROM production.my_schema.yellow_trip_data WHERE trip_id IS NULL.
  • Test Execution: If no values are returned, the test passes. If values are returned, the test fails or issues a warning.

4. Macros in DVT

  • Cross-Database Macros: Abstract SQL flavor differences across databases (e.g., BigQuery vs. PostgreSQL).
    • Example: DATE_TRUNC and SAFE_CAST macros adapt to the underlying database.
  • Usage: Simplifies SQL code and ensures compatibility across databases.

5. Code Generation with Kosan Package

  • Purpose: Automates the generation of YAML files and base models.
  • Features:
    • Generate Source: Creates YAML files for database sources.
    • Generate Base Model: Creates CTE-based models.
    • Generate Model YAML: Automates YAML file creation for models.
  • Usage:
    • Add the Kosan package to the project.
    • Use functions like generate_model_yaml to create YAML files for multiple models.

6. Adding Tests to Models

  • Steps:
    1. Define unique and not null tests for primary keys (e.g., trip_id).
    2. Add relationship tests for foreign keys (e.g., pickup_location_id must relate to ref_taxi_lookup).
    3. Add accepted values tests for specific fields (e.g., payment_type must be 1, 2, 3, 4, or 5).
  • Severity Levels:
    • Warning: Issues a warning but does not stop execution.
    • Error: Stops execution if the test fails.
    • Threshold: Define thresholds for warnings (e.g., up to 10 records can fail).

7. Building and Testing the Project

  • Process:
    1. Build the project, starting with sources and seeds.
    2. Execute tests for each model as it is built.
    3. Parallel execution: Independent models are built simultaneously.
  • Test Results:
    • Warnings or errors are displayed for failed tests.
    • Example: A warning is issued if payment_type contains values outside the accepted range.

8. Documentation in DVT

  • Purpose: Create comprehensive documentation for the project.
  • Sources of Documentation:
    1. YAML Files: Descriptions of models and columns.
    2. Code: Lineage and dependencies extracted from the code.
    3. Information Schema: Metadata like table size.
  • Generating Documentation:
    • Use dbt docs generate to create documentation files.
    • Use dbt docs serve to host documentation locally.
  • Documentation Features:
    • Lineage diagrams.
    • Test results and descriptions.
    • Compiled SQL code and dependencies.

9. Hosting Documentation

  • Locally: Use dbt docs serve to host documentation on localhost.
  • Cloud: Documentation can be hosted on dbt Cloud.
  • Example: Documentation includes model descriptions, column details, and test results.

10. Additional Tools and Packages

  • dbt Utils: Provides additional utilities for testing and documentation.
  • dbt Expectations: Based on Great Expectations, offers advanced testing capabilities.

11. Key Takeaways

  • Testing: Essential for ensuring data accuracy and integrity.
  • Automation: Use tools like Kosan to automate YAML and model generation.
  • Documentation: Critical for maintaining transparency and understanding of the project.
  • Flexibility: DVT macros and packages allow for cross-database compatibility and advanced testing.

Example YAML Snippet for Tests

version: 2

models:
  - name: yellow_trip_data
    columns:
      - name: trip_id
        tests:
          - unique
          - not_null
      - name: pickup_location_id
        tests:
          - relationships:
              to: ref_taxi_lookup
              field: location_id
      - name: payment_type
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5]

Enter fullscreen mode Exit fullscreen mode

Commands for Documentation

# Generate documentation
dbt docs generate

# Serve documentation locally
dbt docs serve

Enter fullscreen mode Exit fullscreen mode

By following these steps and utilizing the tools discussed, you can ensure your data models are robust, well-tested, and thoroughly documented.

AWS Security LIVE!

Join us for AWS Security LIVE!

Discover the future of cloud security. Tune in live for trends, tips, and solutions from AWS and AWS Partners.

Learn More

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay