Data engineering in Microsoft Fabric enables users to design, build, and maintain infrastructures and systems that let their organizations collect, store, process, and analyze large volumes of data.
Fabric data engineering enables you to:
- Create and manage your data using a lakehouse
- Design data pipelines to copy data into your lakehouse
- Use Spark job definitions to submit batch or streaming jobs to a Spark cluster
- Use notebooks to write code for ELT processes
What is a Lakehouse:
A data architecture that enables organizations to store and manage structured and unstructured data in a single location, using a variety of tools and frameworks to process and analyze that data, e.g. SQL queries on the SQL endpoint.
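As a small illustration, here is a minimal sketch of querying lakehouse data with Spark SQL from a Fabric notebook (not the SQL endpoint itself). It assumes a default lakehouse is attached to the notebook; the `sales` table and its columns are hypothetical, and `spark` is the session the notebook provides.

```python
# Minimal sketch: query a Delta table in the attached lakehouse with Spark SQL.
# Assumes a Fabric notebook with a default lakehouse attached; the "sales"
# table and its columns are hypothetical. `spark` is provided by the notebook.
df = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
""")
df.show()
```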
What is an Apache Spark job definition:
A Spark job definition is a set of instructions that defines how to run a job on a Spark cluster.
For instance: the input and output data sources, the transformations to apply, and the configuration settings for the Spark application.
Spark job definitions let data engineers submit batch or streaming jobs to a Spark cluster and perform transformations on data hosted in the lakehouse.
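Below is a minimal sketch of the kind of batch PySpark script you might upload as the main definition file of a Spark job definition. The table and column names (`raw_orders`, `orders_clean`, `amount`) are hypothetical, and it assumes the job has a lakehouse configured as its default data source.

```python
# batch_clean_orders.py -- minimal sketch of a PySpark script that could be
# uploaded as the main definition file of a Spark job definition.
# Table and column names ("raw_orders", "orders_clean", "amount") are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch_clean_orders").getOrCreate()

# Read raw data from a lakehouse table, keep valid rows, stamp the load date.
raw = spark.read.table("raw_orders")
clean = raw.filter(F.col("amount") > 0).withColumn("load_date", F.current_date())

# Write the result back to the lakehouse as a Delta table.
clean.write.format("delta").mode("overwrite").saveAsTable("orders_clean")

spark.stop()
```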
What is a notebook:
An interactive compute environment that allows users to create and share documents containing live code, equations, visualizations, and narrative text.
Users can write code in Python, R, and Scala to perform data ingestion, preparation, analysis, and other data-related tasks.
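For example, a single notebook cell might ingest a file from the lakehouse Files area, prepare it, and save it as a table. This is a minimal sketch: the path `Files/customers.csv` and the column names are hypothetical, and `spark` is again the session the notebook provides.

```python
# Minimal notebook-cell sketch: ingest a CSV from the lakehouse Files area,
# do light preparation, and save it as a Delta table. The path
# "Files/customers.csv" and the column names are hypothetical.
from pyspark.sql import functions as F

customers = (
    spark.read.option("header", True).csv("Files/customers.csv")
         .dropDuplicates(["customer_id"])
         .withColumn("ingested_at", F.current_timestamp())
)
customers.write.format("delta").mode("overwrite").saveAsTable("customers")
```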
What is a data pipeline:
A series of steps used to collect, process, and transform raw data into a format that can be used for analysis and decision-making.
Data pipelines are crucial because they move data from source to destination in a reliable, scalable, and efficient way.
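Fabric data pipelines are assembled visually from activities (for example, a Copy data activity followed by a notebook activity), but the underlying idea is just an ordered series of steps. The sketch below is a conceptual illustration of that collect, process, and load flow; every name in it is hypothetical.

```python
# Conceptual sketch only: a data pipeline as an ordered series of steps
# (collect -> process -> load). Fabric pipelines are built visually from
# activities; all names below are hypothetical.
def collect_raw_data() -> list[dict]:
    # Stand-in for reading records from a source system or landing zone.
    return [{"id": 1, "amount": "42.5"}, {"id": 2, "amount": "-1"}]

def process(records: list[dict]) -> list[dict]:
    # Parse types and drop invalid rows.
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in records
        if float(r["amount"]) > 0
    ]

def load(records: list[dict]) -> None:
    # Stand-in for writing curated records to their destination.
    print(f"loaded {len(records)} records")

load(process(collect_raw_data()))
```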
Reference: Data Engineering in Microsoft Fabric.