- Dataflows are a type of cloud-based ETL (Extract, Transform, Load) tool for building and executing scalable data transformation processes.
- Dataflows offer a wide variety of transformations and can be run manually, on a refresh schedule, or as part of a data pipeline orchestration.
- A dataflow bundles all of its transformations, which reduces data prep time. Its output can then be loaded into a new table, included in a data pipeline, or used as a data source by data analysts.
- Dataflows can also be horizontally partitioned: once you create a global dataflow, data analysts can build on it to create specialized semantic models for specific needs.
- Dataflows let you promote reusable ETL logic, which avoids the need to create more connections to your data source.
Benefits:
- Extend data with consistent data, such as a standard date dimension table.
- Give self-service users separate access to a subset of the data warehouse.
- Optimize performance with dataflows, which enable extracting data once for reuse, reducing data refresh time for slower sources.
- Simplify data source complexity by only exposing dataflows to larger analyst groups.
- Ensure consistency and quality of data by enabling users to clean and transform data before loading it to a destination.
- Simplify data integration by providing a low-code interface that ingests data from various sources.
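
To make the "standard date dimension table" benefit concrete, here is a minimal sketch in plain Python of what such a table might contain. In a real dataflow you would build this with Power Query transformations; the function name `build_date_dimension` and the column names are assumptions chosen for illustration, not part of any Fabric API.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end, inclusive.

    Hypothetical helper: the column names below are illustrative only.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")

    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240101
            "date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": current.isoweekday() >= 6,
        })
        current += timedelta(days=1)
    return rows


# Build the table once and let every consumer reuse the same rows,
# which mirrors the "extract once for reuse" idea described above.
date_dim = build_date_dimension(date(2024, 1, 1), date(2024, 1, 7))
weekend_keys = [row["date_key"] for row in date_dim if row["is_weekend"]]
```

Because every downstream model would read the same table, attributes such as quarter or weekend flags are computed one way only, which is the consistency benefit the list describes.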
Limitations:
- Dataflows aren't a replacement for a data warehouse.
- Row-level security isn't supported.
- A Fabric capacity workspace is required.

You can create a Dataflow Gen2 in the Data Factory workload, in a Power BI workspace, or directly in the lakehouse. Since our scenario is focused on data ingestion, let's look at the Data Factory workload experience. Dataflows Gen2 use Power Query Online to visualize transformations.