- lakehouse has two storage areas ; Files and Tables
- Files
- Store structured, queryable data by sql
- Supports schema definitions and ACID transactions
- Tables
- Stores Raw or semi-structured data(CSV, parquet, JSON)
- No schema support
- Flexible for data explorations
- Schema allows for logical ordering of data on business functions or domain (sales,marketing etc)
- A dbo schema is enabled by default once a lakehouse is created
- Schema-enabled lakehouses also support schema-level permissions and cross-workspace queries using the four-part namespace
- Lakehouse mode : Lakehouse Explorer and SQL analytics endpoint
- Lakehouse Explorer: Allows managing, Update, create, upload of data.You can switch between tables in the lakehouse
- SQL anlytics endpoit : Does not allow modifying of the underlying data. You can query using TSQL at read only mode.
- Loading data into lakehouse:
- Upload data into files/ folders on the explorer
- Load into delta tables (no code)
- Transform using power query in dataflow gen2
- INgest into notebooks using apache spark (programmatically)
- Use Copy data to move data into differnt sources using data factory pipelines
-Shortcuts allow you to reference external data reducing copies. Access is managed by One Lake.
- Schema shortcuts map an entire schema to a folder of Delta tables in another lakehouse.
- SQL analytics endpoint provides read-only access to lakehouse tables using T-SQL queries.
- SQL USE CASES : adhoc queries, BI connections to power bi or azure data studio, Data validation
- You can use SQL views to store reusable query logic. Views are useful when you need to apply business rules, simplify complex joins, or provide curated data for downstream consumers.
- You can use Spark SQL for SQL-like queries or PySpark for programmatic data manipulation in Notebooks.
- Spark SQL works well for familiar SQL patterns. PySpark provides greater flexibility for complex transformations and integration with Python libraries.
- Power BI is the business intelligence and reporting layer in Fabric. It serves as the consumption layer where business users access data through interactive reports and dashboards.
- Power BI can connect to lakehouse data in two ways:
- Query the SQL analytics endpoint
- Create a semantic model
Top comments (0)