- A data pipeline is a sequence of activities that orchestrates an overall process, typically extracting data, transforming it, and loading it.
- Pipelines automate ETL (extract, transform, load) processes. Control flow activities manage branching, looping, and other logic within these processes.
- Graphical pipeline canvas: a UI for building pipelines with minimal or no coding.
ACTIVITIES
Activities are the executable tasks in a pipeline. The outcome of a particular activity can be success, failure, or completion, and that outcome can be used to direct the flow to the next activity.
- Data transformation activities: activities that encapsulate data transfer operations, such as:
  - Copy Data activity: extracts data from a source and loads it into a destination.
  - Data Flow activity: applies transformations to the data as it is being transferred.
  - Notebook activity: runs Spark code.
  - Stored Procedure activity: runs SQL code.
  - Delete Data activity: deletes existing data.
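To make the idea of activities and their outcomes concrete, here is a minimal Python sketch. It is not the API of any pipeline product: `Outcome`, `Activity`, and `run_activities` are hypothetical names invented for illustration, and the sketch models only success and failure, with in-memory lists standing in for a real source and destination.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    # Hypothetical outcome values; "completion" is not modeled separately here.
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Activity:
    # Hypothetical stand-in for one executable task in a pipeline.
    name: str
    task: Callable[[], None]

    def run(self) -> Outcome:
        try:
            self.task()
        except Exception:
            return Outcome.FAILURE
        return Outcome.SUCCESS


def run_activities(activities: list[Activity]) -> dict[str, Outcome]:
    """Run activities in order and stop at the first failure."""
    results: dict[str, Outcome] = {}
    for activity in activities:
        results[activity.name] = activity.run()
        if results[activity.name] is Outcome.FAILURE:
            break
    return results


# In-memory stand-ins for a source and a destination (illustrative data).
source_rows = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4"}]
destination: list[dict] = []


def copy_data() -> None:
    # Copy step: extract rows from the source and load them into the destination.
    destination.extend(dict(row) for row in source_rows)


def convert_amounts() -> None:
    # Transformation step: convert the amount column from text to float.
    for row in destination:
        row["amount"] = float(row["amount"])


results = run_activities(
    [Activity("copy_data", copy_data), Activity("convert_amounts", convert_amounts)]
)
print({name: outcome.value for name, outcome in results.items()})
```

Stopping at the first failure is only one possible policy; a real pipeline lets you choose which outcome of an activity triggers the next one.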
CONTROL FLOW ACTIVITIES
Activities that implement loops and conditional branching, and that manage variables and parameter values. These help implement complex pipeline logic.
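The three kinds of control flow mentioned above (looping, conditional branching, and variable management) can be sketched in plain Python. The table names, row counts, and helper functions below are hypothetical and only illustrate the logic; in a real pipeline each of these would be a control flow activity configured on the canvas.

```python
# Hypothetical row counts per source table (illustrative data only).
table_row_counts = {"customers": 120, "orders": 0, "products": 45}

log: list[str] = []


def load_table(name: str) -> None:
    # Stand-in for an activity that loads one table.
    log.append(f"load {name}")


def skip_table(name: str) -> None:
    # Stand-in for an activity that records a skipped table.
    log.append(f"skip {name}")


loaded_count = 0  # plays the role of a pipeline variable

for name, row_count in table_row_counts.items():  # loop over items
    if row_count > 0:  # conditional branch
        load_table(name)
        loaded_count += 1  # update the variable
    else:
        skip_table(name)

print(log)
print(loaded_count)
```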
PARAMETERS
Pipelines can be parameterized so that specific values are supplied each time the pipeline runs. Using parameters increases the reusability and flexibility of the pipeline.
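As a rough illustration of why parameters help with reuse, the Python sketch below runs the same pipeline logic with different values. `PipelineParameters`, `run_export_pipeline`, and the parameter names are assumptions made up for this example, not part of any real pipeline tool, and the copy step itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineParameters:
    # Hypothetical parameters; a real pipeline defines its own.
    source_table: str
    destination_folder: str
    file_format: str = "csv"  # default used when the caller gives no value


def run_export_pipeline(params: PipelineParameters) -> str:
    """Return the destination path this run would write to.

    The actual copy is omitted; only the use of parameters is shown.
    """
    return f"{params.destination_folder}/{params.source_table}.{params.file_format}"


# The same pipeline definition is reused with different parameter values.
print(run_export_pipeline(PipelineParameters("orders", "landing")))
print(run_export_pipeline(PipelineParameters("customers", "archive", "parquet")))
```

Without parameters, each table and folder combination would need its own copy of the pipeline.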
PIPELINE RUNS
Each time a pipeline is executed, a data pipeline run is initiated. Runs can be started on demand.