DEV Community

Alec Dutcher

DP-203 Study Guide - Manage batches and pipelines

Study guide

Azure Batch

  • Azure Batch
    • Platform to run high-performance computing jobs in parallel at large scale
    • Manages cluster of machines and supports autoscaling
    • Allows you to install applications that can run as a job
    • Schedule and run jobs on cluster machines
    • Pay per minute for resources used
  • How it works
    • Pool = cluster of machines/nodes
    • Slot = set of resources used to execute a task
    • Define number of slots per node
      • Increase slots per node to improve performance without increasing cost
    • Job assigns tasks to slots on nodes
    • Application is installed on each node to execute the tasks
    • Specify application packages at pool or task level
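The node/slot relationship above can be sketched with simple arithmetic (the node count, slot counts, and the idea of per-node billing are illustrative, not Azure pricing):

```python
# Illustrative sketch: task slots per node trade concurrency against nothing,
# since a pool billed per node costs the same regardless of slot count.

def pool_capacity(nodes: int, slots_per_node: int) -> int:
    """Total tasks the pool can run in parallel (nodes x slots per node)."""
    return nodes * slots_per_node

# Same 4-node pool, so the same cost, but very different throughput:
baseline = pool_capacity(nodes=4, slots_per_node=1)   # 4 parallel tasks
tuned    = pool_capacity(nodes=4, slots_per_node=4)   # 16 parallel tasks

print(baseline, tuned)  # 4 16
```

In practice, slots per node should not exceed what the VM size's cores and memory can actually support, or tasks will contend for resources.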

Configure the batch size

  • In the portal (Batch account)
    • Choose Pools in the left-side panel
    • Add a new pool and name it
    • Define the OS image (publisher and sku)
    • Choose VM size (determines cores and memory)
    • Choose fixed or auto scale for nodes
      • If fixed, select number of nodes
    • Choose application packages and versions, uploading files if necessary
    • Use Mount configuration to mount storage file shares, specifying the account name and access key of the storage account
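The portal fields above map onto a pool definition roughly shaped like the following. This is a plain-dict sketch of the settings, not an SDK call, and every name and size in it (pool id, image, VM size, package, share) is a made-up example:

```python
# Sketch of the pool settings gathered in the portal (hypothetical values).
pool_config = {
    "id": "my-batch-pool",                       # pool name (assumed)
    "image": {
        "publisher": "microsoftwindowsserver",   # OS image publisher
        "sku": "2019-datacenter",                # OS image sku
    },
    "vm_size": "Standard_D2s_v3",                # determines cores and memory
    "scale": {
        "mode": "fixed",                         # fixed or auto scale
        "target_dedicated_nodes": 4,             # node count when fixed
    },
    "application_packages": [
        {"id": "ffmpeg", "version": "4.3.1"},    # hypothetical package
    ],
    "mount_configuration": [
        {"file_share": "myshare",                # storage file share to mount
         "account_name": "mystorageacct",
         "account_key": "<access-key>"},         # from the storage account
    ],
}

print(pool_config["vm_size"])
```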

Trigger batches

  • In the portal (Batch)
    • Confirm that the pool is in steady state and the nodes are in idle state
    • Choose Jobs in the left-side panel and add a new job
    • Name the job and select the pool
    • Open the job and select Tasks in the left-side panel
    • Define name and description
    • Enter the command in the command line box that will run on each machine
      • Reference installed packages with %AZ_BATCH_APP_PACKAGE_#%
      • Reference path to input fileshare with -i S:<file_path>
      • Reference path to output with S:<file_path>
    • Submit task
  • In Azure Data Factory and Azure Synapse
    • To run a single task in ADF
      • Create linked service to Azure Batch
        • Need Batch account name, account endpoint, and primary access key from the Keys section in the Batch portal
        • Also need the name of the pool
      • Create pipeline to run Custom Batch activity
        • Select linked service under the Azure Batch option in the activity settings
      • Define command to execute utility
        • Enter in the Command box under Settings for the activity
    • To run multiple tasks in parallel
      • Get list of files using Get Metadata activity in the General option
        • Configure data set and linked service with Azure File Storage
        • Use the Field list to select Child items
      • Use a ForEach activity to iterate through the Child items
        • Use dynamic content in the Command to add the filename for each file
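A task command line like the one described above can be sketched as string assembly (a Windows pool is assumed; the `ffmpeg` package name/version and the `S:\` mount paths are hypothetical, and the env-var pattern follows the `%AZ_BATCH_APP_PACKAGE_#%` convention noted earlier):

```python
# Sketch of building a Batch task command line (hypothetical app and paths).

def task_command(input_file: str) -> str:
    """Command that runs an installed app package against a mounted share."""
    app = "%AZ_BATCH_APP_PACKAGE_ffmpeg#4.3.1%"   # installed app package
    return (f"cmd /c {app}\\ffmpeg.exe "
            f"-i S:\\input\\{input_file} "        # input on mounted share
            f"S:\\output\\{input_file}")          # output on mounted share

print(task_command("clip1.mp4"))
```

In the ForEach scenario, ADF dynamic content (for example `@item().name` from the Get Metadata child items) substitutes each filename into the Command box instead of a hardcoded value.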

Handle failed batch loads

  • Failure types
    • Infrastructure - pool and node errors
    • Application - job and task errors
  • Pool errors
    • Resizing failure - pool is unable to provision a node within the resize timeout window (default is 15 mins)
    • Insufficient quota - the account has a limited core quota; if the requested allocation exceeds it, the resize fails (raise a support ticket to increase the quota)
    • Scaling failures - formula is used to determine autoscaling, and formula evaluation can fail (check logs to find issue)
  • Node issues
    • App package download failure - node set to unusable, needs to be reimaged
    • Node OS updates - tasks can be interrupted by updates, auto update can be disabled
    • Node in unusable state - even if the pool is ready, a node can be in an unusable state (VM crash, firewall block, invalid app package) and needs to be re-imaged
    • Node disk is full
  • Rebooting and re-imaging can be done in the Batch portal under Pools
  • The Connect option in portal allows you to use RDP/SSH to connect to the VM
    • Define user details
    • Set as Admin
    • Download RDP file and enter user credentials
    • This opens Server Manager window where you can navigate the file system to check application package installations

Validate batch loads

  • Job errors
    • Timeout
      • Max wall clock time defines max time allowed for job to run from the time it was created
      • Default value is unlimited
      • If max is reached, running tasks are killed
      • Increase max wall clock value to prevent timeout
    • Failure of job-related tasks
      • Each job can have a job preparation task and a job release task
      • The job prep task runs on each node before the job's tasks run there
      • The job release task runs on each node when the job terminates
      • Failures can occur in these tasks
  • Task errors
    • Task waiting - dependency on another task
    • Task timeout - check max wall clock time
    • Missing app packages or resource files
    • Error in command defined in the task
    • Check stdout and stderr logs for details
  • In the Batch portal under node details, you can specify a container where log files are stored for future reference
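The task-level checks above can be sketched as a simple triage over the fields Batch reports for a task. The dict keys here are shorthand for illustration, not the actual Azure object model:

```python
# Illustrative triage of a task's status (field names are simplified,
# not the real Azure Batch API shape).

def diagnose(task: dict) -> str:
    """Map a task's reported state to a rough failure category."""
    if task.get("state") == "active" and task.get("depends_on"):
        return "waiting on dependency"                   # task waiting
    if task.get("exceeded_max_wall_clock"):
        return "timeout: raise max wall clock time"      # task timeout
    if task.get("missing_resources"):
        return "missing app packages or resource files"
    if task.get("exit_code", 0) != 0:
        return "command failed: check stdout/stderr logs"
    return "ok"

print(diagnose({"exit_code": 1}))  # command failed: check stdout/stderr logs
```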

Configure batch retention

  • Retention time defines how long to keep task directory on node once task is complete
  • Configure at Job level or Task level
    • Retention time field in advanced settings
    • Default is 7 days (the directory can be cleaned up sooner if the task or job is deleted)
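The retention window amounts to a simple expiry calculation from the task's completion time (the 7-day figure is the default noted above; the specific timestamps are example values):

```python
from datetime import datetime, timedelta

# Sketch: when a task directory becomes eligible for cleanup on the node.
DEFAULT_RETENTION = timedelta(days=7)   # Batch default retention time

def cleanup_time(task_completed: datetime,
                 retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Earliest time the task directory may be removed from the node."""
    return task_completed + retention

done = datetime(2024, 1, 1, 12, 0)
print(cleanup_time(done))  # 2024-01-08 12:00:00
```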

Manage data pipelines in Azure Data Factory or Azure Synapse Pipelines

  • Ways to run pipelines
    • Debug Run
      • Don't need to save changes
      • Directly run pipelines with draft changes
      • Manual, can't be scheduled
    • Trigger Run
      • Need to publish changes first
      • Only runs published version of pipeline
      • Can be manual or scheduled

Schedule data pipelines in Data Factory or Azure Synapse Pipelines

  • Trigger types
    • Scheduled - run on wall-clock schedule
    • Tumbling window - run at periodic intervals while maintaining state
    • Storage event - run pipeline when a file is uploaded to or deleted from a storage account
    • Custom event trigger - runs pipeline when event is raised by Azure Event Grid
  • Scheduled vs tumbling triggers
    • Scheduled
      • Only supports future-dated loads
      • Does not maintain state; fire-and-forget
    • Tumbling
      • Can run back-dated and future-dated loads
      • Maintains state (completed loads)
      • Passes start and end timestamps of window as parameters
      • Can be used to add dependency between pipelines, allowing complex scenarios
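The tumbling-window behavior — back-dated windows, each carrying its start and end timestamps — can be pictured locally. The ADF trigger does this windowing for you; this sketch only illustrates how the intervals and parameters line up:

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, interval):
    """Yield contiguous (windowStart, windowEnd) pairs covering [start, end)."""
    window_start = start
    while window_start < end:
        window_end = min(window_start + interval, end)
        yield window_start, window_end   # passed to the pipeline as parameters
        window_start = window_end

# Back-dated daily windows for the first three days of 2024:
wins = list(tumbling_windows(datetime(2024, 1, 1),
                             datetime(2024, 1, 4),
                             timedelta(days=1)))
print(len(wins))  # 3
print(wins[0][0], wins[0][1])
```

Because each window is tracked as its own run with its own state, a failed window can be re-run without disturbing the others — which is what enables the dependency chaining mentioned above.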

Implement version control for pipeline artifacts

  • Authoring modes
    • Live mode (default)
      • Authoring directly against pipelines
      • No option to save draft changes
      • Need to publish to save valid changes
      • Need manually created ARM templates to deploy pipelines to other environments
    • Git Repo mode
      • Repo can be in Azure DevOps (ADO) or GitHub
      • All artifacts can be stored in source control
      • Draft changes can be saved even if not valid
      • Autogenerates ARM templates for deployment in other environments
      • Enables DevOps features (PRs, reviews, collab)

Manage Spark jobs in a pipeline

  • Pipeline activities for Spark
    • Synapse - Spark notebook, Spark job
    • Databricks - notebook, Jar file, Python file
    • HDInsight activities - Spark Jar/script
  • Monitoring Spark activities
    • Monitoring built in to ADF
    • Platform monitoring (Synapse, Databricks)
      • In ADF/Synapse, go to Monitor --> Apache Spark applications and select a specific run for details
    • Spark UI
