DEV Community

Rakan
Rakan

Posted on

Data Lake Vs Data Warehouse Vs Data Mart

Well, data is everywhere every second of every day.As a backend developer, I used to dodge terms related to data engineering. However, due to a recent project, I've started learning more about it.


how-docker-works


So, I came across these terms: Data Lake, Data Warehouse, and Data Mart. I will break them down into simple terms that I can understand.

The format will be as follows:
- Definition: (Definition)
- Characteristics: (Characteristics)
- Why it exists: (Why it exists)
- Tools: (Tools that can be used to implement it)

  1. Data Lake:
    • Definition: A huge storage space for all raw data (For example: JSON, Videos, Database dumps, etc) where everything is dumped without organization.
    • Characteristics:
      • Stores raw data without modification.
      • Store structured, semi-structured, and unstructured data.
      • Can be Used for the entire data lifecycle.
    • Why it exists: Data is valuable nowdays and it can be used for many things. So, store it and you can use it later when you need it.
    • Tools:
      • Free: Hadoop Distributed File System (HDFS)
      • Paid: Amazon S3, Azure Data Lake Storage, Google Cloud Storage
  2. Data Warehouse:
    • Definition: An organized storage place where data is structured and cleaned.
    • Characteristics:
      • Stores data in a structured way.
      • Requires transformed and cleaned data.
      • Time-variant data, meaning any existing data will be archived after perid of time (Example: 1 year) and stored in the Data Lake.
    • Why it exists: Since data is stored in a structured way, it can be used for reporting and analysis.
    • Tools:
      • Free: PostgreSQL, MySQL, MariaDB (limitions: not scalable for HUGE data and not optimized for analytics purposes)
      • Paid: Amazon Redshift, Google BigQuery
  3. Data Mart:
    • Definition: A subset of a Data Warehouse, with a focus on specific topics.
    • Characteristics:
      • Users don't need advanced technical knowledge.
      • Subset of a Data Warehouse, smaller and topic-focused.
      • Users have read-only access to specific information.
    • Why it exists: Provides users a quick and easy access to data for specific topics.
    • Tools:
      • Free: Microsoft Power BI (limited features)
      • Paid: Microsoft Power BI, Tableau, QlikView, Looker

Resources:

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay