QN : Ingest and transform data in a lakehouse

#architecture #database #dataengineering #sql

lakehouse has two storage areas ; Files and Tables
Files
- Store structured, queryable data by sql
- Supports schema definitions and ACID transactions
Tables
- Stores Raw or semi-structured data(CSV, parquet, JSON)
- No schema support
- Flexible for data explorations
Schema allows for logical ordering of data on business functions or domain (sales,marketing etc)
A dbo schema is enabled by default once a lakehouse is created
Schema-enabled lakehouses also support schema-level permissions and cross-workspace queries using the four-part namespace
Lakehouse mode : Lakehouse Explorer and SQL analytics endpoint
- Lakehouse Explorer: Allows managing, Update, create, upload of data.You can switch between tables in the lakehouse
- SQL anlytics endpoit : Does not allow modifying of the underlying data. You can query using TSQL at read only mode.
Loading data into lakehouse:

-Shortcuts allow you to reference external data reducing copies. Access is managed by One Lake.

Schema shortcuts map an entire schema to a folder of Delta tables in another lakehouse.
SQL analytics endpoint provides read-only access to lakehouse tables using T-SQL queries.
SQL USE CASES : adhoc queries, BI connections to power bi or azure data studio, Data validation
You can use SQL views to store reusable query logic. Views are useful when you need to apply business rules, simplify complex joins, or provide curated data for downstream consumers.
You can use Spark SQL for SQL-like queries or PySpark for programmatic data manipulation in Notebooks.
Spark SQL works well for familiar SQL patterns. PySpark provides greater flexibility for complex transformations and integration with Python libraries.
Power BI is the business intelligence and reporting layer in Fabric. It serves as the consumption layer where business users access data through interactive reports and dashboards.
Power BI can connect to lakehouse data in two ways:
- Query the SQL analytics endpoint
- Create a semantic model

DEV Community