Definition
A central repository of information that can be analyzed to make more informed decisions.
A data warehouse is specially designed for data analytics, which involves reading large amounts of data to understand relationships and trends across the data.
A data warehouse requires that the data be organized in a tabular format.
Architectures
Data Source Layer: Operational systems, CRM, ERP, and external files (Excel, JSON, SQL) that provide raw data.
Data Staging Area: A temporary storage area where data is cleaned, transformed, and integrated before loading into the main warehouse.
Storage Layer (Warehouse/Data Marts): The central database (RDBMS or Cloud) holding historical and structured data. It may use a Star or Snowflake Schema for organizing data.
Data Access/Analytics Layer: Front-end tools for reporting, dashboards, and BI (e.g., SQL tools, Tableau).
Metadata Layer: Data that describes the structure, relationships, and context of the stored data.
Benefits
- Informed decision making
- Consolidated data from many sources
- Historical data analysis
- Data quality, consistency, and accuracy
- Separation of analytics processing from transactional databases, which improves the performance of both systems
ETL (Extract Transform Load)
Definition:
is the process of combining data from multiple sources into a large, central repository called a data warehouse, By applying the process of extract, transform, and load (ETL).
OLAP (Online analytical processing)
Definition:
is a technology used to analyze data stored in a data warehouse by organizing it into multiple dimensions, enabling fast queries and meaningful business insights.
Data modeling
Data modeling: is the representation of data in data warehouses or online analytical processing (OLAP) databases as a star or snowflake schema.
Operations
- Roll up
- Drill down
- Slice
- Dice
- Pivot
Top comments (0)