Study Notes 4.1.1 - Analytics Engineering Basics

1. Introduction to Analytics Engineering

  • Analytics Engineering is a relatively new role that bridges the gap between Data Engineers and Data Analysts.
  • The evolution of cloud data warehouses (e.g., BigQuery, Snowflake, Redshift) has significantly reduced storage and computation costs.
  • Tools like Fivetran and Stitch have simplified the extract-and-load steps of data pipelines, enabling the ELT pattern discussed below.
  • SQL-first BI tools like Looker and Mode have introduced version control into data workflows, improving self-service analytics and data governance.
  • The rise of these tools has changed how data teams operate and how stakeholders consume data, creating a need for the Analytics Engineer role.

2. Traditional Data Team Roles & the Role of Analytics Engineers

Traditional Data Roles

  1. Data Engineer
    • Prepares and maintains data infrastructure.
    • Ensures data pipelines are functional and efficient.
    • Focuses on the technical aspect of data storage and processing.
  2. Data Analyst
    • Works with data stored in the infrastructure to generate business insights.
    • Primarily uses SQL, BI tools, and dashboards to answer business questions.
  3. Data Scientist (not explicitly discussed but relevant)
    • Uses statistical and machine learning techniques to analyze data.
    • Writes code but is not necessarily a software engineer.

Challenges & the Emergence of Analytics Engineers

  • Data Scientists & Analysts are writing more code, but they are not trained as software engineers.
  • Data Engineers are skilled in software engineering but lack knowledge about business-specific data usage.
  • Analytics Engineers fill this gap by applying software engineering best practices (e.g., modularity, version control, CI/CD) to the work of Data Analysts and Scientists.

[Figure: roles in a data team]

3. Tools & Responsibilities of an Analytics Engineer

Core Responsibilities

  • Integrating software engineering best practices into data transformation workflows.
  • Managing data pipelines efficiently.
  • Ensuring data quality, reliability, and accessibility.
  • Collaborating closely with Data Engineers, Analysts, and Business Teams.

Common Tools Used by Analytics Engineers

  1. Data Loading Tools: Fivetran, Stitch (managed extract-and-load tools).
  2. Data Storage Management: Cloud Data Warehouses (BigQuery, Snowflake, Redshift).
  3. Data Transformation & Modeling: dbt (Data Build Tool), Dataform (a short dbt sketch follows this list).
  4. Data Visualization & BI Tools: Google Data Studio, Looker, Mode.
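
To make the transformation layer concrete, here is a minimal sketch of what a dbt model might look like. All model and column names (stg_orders, stg_customers, order_id) are hypothetical; the real piece of dbt machinery shown is ref(), which resolves to the upstream model's table and records the dependency, keeping transformations modular and version-controlled.

```sql
-- models/fct_orders.sql: a hypothetical dbt model (names are illustrative).
-- dbt compiles ref() into the fully qualified name of the upstream model
-- and tracks the dependency in the transformation graph.
with orders as (
    select * from {{ ref('stg_orders') }}
),
customers as (
    select * from {{ ref('stg_customers') }}
)
select
    orders.order_id,
    orders.order_date,
    orders.amount,
    customers.customer_name
from orders
join customers
    on orders.customer_id = customers.customer_id
```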

[Figure: analytics engineering tooling]


4. ETL vs. ELT: Key Differences & Use Cases

ETL (Extract, Transform, Load)

  • Process: Data is extracted from sources and transformed before being loaded into the warehouse.
  • Advantages:
    • Ensures clean and structured data before storage.
    • More compliance-friendly (suitable for regulated industries).
  • Disadvantages:
    • Slower implementation due to upfront transformation.
    • Requires defined schema before data is loaded.

ELT (Extract, Load, Transform)

  • Process: Data is extracted and loaded into the warehouse as-is, then transformed there (sketched below).
  • Advantages:
    • Faster and more flexible (data is readily available for analysis).
    • Leverages cheap cloud storage and computation.
  • Disadvantages:
    • Data might be less structured initially, requiring on-the-fly transformations.
    • Can lead to messy datasets if not managed properly.
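
As a rough illustration of the ELT pattern, the snippet below assumes a hypothetical raw.raw_events table has already been loaded into the warehouse unchanged; the cleanup then happens in-warehouse with plain SQL. All table and column names are invented for the example.

```sql
-- ELT: the raw table was loaded first, exactly as extracted (names hypothetical).
-- Transformation happens afterwards, inside the warehouse, using its compute.
create table analytics.clean_events as
select
    cast(event_id as integer)          as event_id,
    lower(trim(event_type))            as event_type,
    cast(event_timestamp as timestamp) as event_ts
from raw.raw_events
where event_id is not null;  -- a basic quality filter, applied after loading
```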

[Figure: ETL vs. ELT]


5. Data Modeling Concepts (Kimball’s Dimensional Modeling)

Goals of Dimensional Modeling

  • Make data understandable to business users.
  • Optimize for fast query performance.
  • Unlike third normal form (3NF) modeling, dimensional modeling accepts some redundancy in exchange for usability and query performance.

Fact Tables vs. Dimension Tables

1. Fact Tables

  • Contain measurements, metrics, or facts related to business processes.
  • Represent actions or events (e.g., sales, orders).
  • Can be thought of as verbs in data modeling.

2. Dimension Tables

  • Provide context to fact tables.
  • Contain descriptive attributes (e.g., customer, product, location).
  • Can be thought of as nouns in data modeling.

Star Schema

  • A widely used data modeling approach where a fact table is surrounded by multiple dimension tables, forming a star-like structure.
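
A minimal, hypothetical star schema might look like the sketch below: one fact table of sales (the "verb") surrounded by descriptive dimension tables (the "nouns"). The SQL is generic ANSI-style; constraint syntax varies by warehouse, and all names are illustrative.

```sql
-- A hypothetical star schema (generic SQL; names and types are illustrative).
create table dim_customer (
    customer_key  integer primary key,
    customer_name varchar(100),
    region        varchar(50)
);

create table dim_product (
    product_key  integer primary key,
    product_name varchar(100),
    category     varchar(50)
);

-- The fact table records sales events and points at the dimensions.
create table fact_sales (
    sale_id      integer primary key,
    customer_key integer references dim_customer (customer_key),
    product_key  integer references dim_product (product_key),
    sale_date    date,
    quantity     integer,
    amount       numeric(10, 2)  -- the measurable fact
);

-- A typical analytical query: aggregate the facts, let dimensions give context.
select
    d.region,
    p.category,
    sum(f.amount) as total_sales
from fact_sales f
join dim_customer d on f.customer_key = d.customer_key
join dim_product  p on f.product_key  = p.product_key
group by d.region, p.category;
```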

6. Data Processing Analogy: The Kitchen Model

Kimball’s book presents an analogy comparing data processing to a restaurant workflow:

The Kitchen Model

  1. Staging Area (Raw Data) → Storage Room
    • Similar to raw ingredients in a restaurant.
    • Not accessible to everyone; only data engineers who know how to handle raw data work with it.
  2. Processing Area (Data Models) → Kitchen
    • Like a kitchen where raw ingredients are turned into prepared meals.
    • Handled by Analytics Engineers using data transformation tools (e.g., dbt, SQL transformations); see the sketch after this list.
    • Focus on efficiency, standardization, and consistency.
  3. Presentation Area (Final Data) → Dining Hall
    • The final stage where data is presented to business users.
    • Similar to serving prepared food to restaurant customers.
    • Data is structured, visualized, and easy to interpret using BI tools.
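
In warehouse terms, the three areas often map to separate schemas or model layers. The sketch below uses hypothetical schema and table names to show how data might flow from the storage room through the kitchen to the dining hall.

```sql
-- Hypothetical schema layering that mirrors the kitchen analogy (names invented).

-- 1. Storage room: raw data lands in a staging schema, loaded as-is,
--    and is touched only by data engineers (e.g. staging.raw_orders).

-- 2. Kitchen: analytics engineers clean and standardize the raw data.
create table processing.orders as
select
    cast(order_id as integer) as order_id,
    cast(order_date as date)  as order_date,
    upper(status)             as order_status
from staging.raw_orders
where order_id is not null;

-- 3. Dining hall: a business-friendly summary served to BI tools.
create table presentation.daily_orders as
select order_date, count(*) as order_count
from processing.orders
group by order_date;
```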

7. Summary & Key Takeaways

  • Analytics Engineers bridge the gap between Data Engineers and Analysts by incorporating software engineering practices into data analytics.
  • They use modern tools like dbt, Fivetran, Snowflake, and BI tools to model, transform, and present data.
  • Understanding ETL vs. ELT is crucial for deciding how data should be processed efficiently.
  • Kimball’s Dimensional Modeling helps make data user-friendly and performant, prioritizing usability over strict normalization.
  • The Kitchen Model is a helpful analogy to understand the stages of data processing, from raw data to business-ready insights.
