Sospeter Mong'are

Posted on Jul 2

Understanding Integration Runtimes in Azure Data Factory (ADF)

#azure #azuredatafactory #beginners #programming

When people begin learning Azure Data Factory (ADF), one concept that often causes confusion is the Integration Runtime (IR). Pipelines, datasets, and linked services are fairly intuitive, but many beginners wonder:

"If Azure Data Factory orchestrates everything, what exactly is an Integration Runtime?"

The simplest answer is this:

An Integration Runtime is the compute infrastructure that Azure Data Factory uses to move data, transform data, and execute activities.

Think of Azure Data Factory as the brain that plans the work, while the Integration Runtime is the muscle that carries it out.

Why Do We Need an Integration Runtime?

Imagine you're a logistics manager.

You create a schedule for transporting goods from Nairobi to Mombasa. You know what should be transported, when it should leave, and where it should arrive.

However, without a truck, nothing actually moves.

Azure Data Factory works the same way.

A pipeline defines what should happen, but the Integration Runtime is responsible for actually performing the work.

It is responsible for:

Moving data between different data sources
Running data transformation jobs
Connecting securely to cloud and on-premises systems
Dispatching pipeline activities to the compute services that run them, such as a stored procedure on a database

Without an Integration Runtime, your pipeline has instructions but no execution engine.

Types of Integration Runtime

Azure Data Factory provides three types of Integration Runtime, each designed for different scenarios.

1. Azure Integration Runtime

This is the default Integration Runtime provided and managed by Microsoft.

You don't install or maintain any servers. Azure automatically provisions and scales the compute resources needed to execute your workloads.

It is ideal when both your source and your destination are cloud data stores that Azure can reach without going through a private network.

Common use cases include:

Copying data from Azure SQL Database to Azure Data Lake
Moving data between cloud storage accounts
Running Mapping Data Flows
Executing cloud-native activities

A typical architecture looks like this:

Azure SQL Database
        │
        ▼
Azure Integration Runtime
        │
        ▼
Azure Data Lake Storage

Advantages

Fully managed by Microsoft
Automatically scales
No infrastructure to maintain
Highly available

For most cloud-to-cloud data movement, Azure Integration Runtime is the recommended choice.
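
To make this a little more concrete, here is a minimal sketch of how a linked service ends up on the Azure Integration Runtime. ADF stores linked services as JSON, and the optional connectVia property names the runtime to use; when it is left out, the default Azure IR, AutoResolveIntegrationRuntime, is used. The Python below only builds and inspects such a definition as a plain dictionary. The linked service name and the runtime_for helper are hypothetical, and the connection string is a placeholder.

```python
import json

# Name of the default Azure IR that ADF falls back to when a linked
# service does not reference a runtime explicitly.
DEFAULT_AZURE_IR = "AutoResolveIntegrationRuntime"


def runtime_for(linked_service: dict) -> str:
    """Return the name of the IR a linked service definition points at.

    Hypothetical helper for illustration; it is not part of any Azure SDK.
    """
    connect_via = linked_service["properties"].get("connectVia")
    if connect_via is None:
        return DEFAULT_AZURE_IR
    return connect_via["referenceName"]


# Linked service for an Azure SQL Database with no connectVia property.
azure_sql_linked_service = {
    "name": "AzureSqlSalesDb",  # hypothetical name
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {"connectionString": "<connection string placeholder>"},
    },
}

print(json.dumps(azure_sql_linked_service, indent=2))
print(runtime_for(azure_sql_linked_service))
```

Because the definition has no connectVia entry, the helper falls back to the default Azure IR, which is the behaviour you get for free in most cloud-to-cloud pipelines.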

2. Self-hosted Integration Runtime (SHIR)

What happens if your data is not in Azure?

Suppose your organization has an on-premises SQL Server inside a corporate network protected by a firewall.

The Azure Integration Runtime runs in Microsoft's cloud, so it cannot reach a server that sits behind that firewall.

This is where the Self-hosted Integration Runtime comes in.

The Self-hosted IR is software that you install on a Windows machine, either a physical server or a virtual machine, within your network.

It acts as a secure bridge between Azure and your private environment.

Example architecture:

On-Prem SQL Server
        │
        ▼
Self-hosted Integration Runtime
        │
        ▼
Azure Blob Storage

Because the runtime resides inside your network, it can reach your local databases directly. It talks to Azure Data Factory over outbound HTTPS connections, so you do not need to open inbound ports on your firewall.

Common data sources reached through a Self-hosted IR

SQL Server
Oracle Database
MySQL
PostgreSQL
SAP
Local file servers
Private APIs
Any data source behind a firewall

This is one of the most commonly used Integration Runtimes in enterprise environments.
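
As a rough sketch of how this is wired up, the Self-hosted IR is registered in the data factory as a resource of type SelfHosted, and any linked service that should go through it names it in connectVia. The Python below builds both definitions as plain dictionaries that mirror the JSON ADF stores. The names OnPremSelfHostedIR and OnPremSqlServer are hypothetical, and the connection string is a placeholder.

```python
import json

# Definition of the Self-hosted IR as it is registered in the data factory.
self_hosted_ir = {
    "name": "OnPremSelfHostedIR",  # hypothetical name
    "properties": {
        "type": "SelfHosted",
        "description": "Runs on a Windows server inside the corporate network",
    },
}

# Linked service for the on-premises SQL Server. The connectVia block is
# what routes every connection through the Self-hosted IR.
sql_server_linked_service = {
    "name": "OnPremSqlServer",  # hypothetical name
    "properties": {
        "type": "SqlServer",
        "typeProperties": {"connectionString": "<connection string placeholder>"},
        "connectVia": {
            "referenceName": self_hosted_ir["name"],
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(sql_server_linked_service, indent=2))
```

The installation of the runtime software on the Windows machine is a separate step; these definitions only tell ADF that the runtime exists and which connections should use it.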

3. Azure-SSIS Integration Runtime

Many organizations have existing ETL processes built using SQL Server Integration Services (SSIS).

Rather than rebuilding those packages from scratch, Azure allows you to run them in the cloud using the Azure-SSIS Integration Runtime.

Example:

SSIS Package
      │
      ▼
Azure-SSIS Integration Runtime
      │
      ▼
Azure SQL Database

This option is mainly used during cloud migration projects where businesses want to continue using their existing SSIS investments.
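
In a pipeline, an existing package is typically started with an Execute SSIS Package activity that points at the SSIS runtime. The sketch below builds such an activity definition as a Python dictionary, assuming the package has been deployed to an SSIS catalog (SSISDB). The ssis_activity helper, the runtime name, and the package path are all hypothetical, and only a few of the activity's properties are shown.

```python
import json


def ssis_activity(name: str, runtime_name: str, package_path: str) -> dict:
    """Build a minimal Execute SSIS Package activity definition.

    Hypothetical helper for illustration; it is not part of any Azure SDK.
    """
    return {
        "name": name,
        "type": "ExecuteSSISPackage",
        "typeProperties": {
            # The SSIS runtime that will execute the package.
            "connectVia": {
                "referenceName": runtime_name,
                "type": "IntegrationRuntimeReference",
            },
            # Assumes the package lives in the SSIS catalog (SSISDB).
            "packageLocation": {"type": "SSISDB", "packagePath": package_path},
            "loggingLevel": "Basic",
        },
    }


activity = ssis_activity(
    name="RunNightlyLoad",                              # hypothetical
    runtime_name="MyAzureSsisIR",                       # hypothetical
    package_path="MyFolder/MyProject/NightlyLoad.dtsx",  # hypothetical
)
print(json.dumps(activity, indent=2))
```

The package itself is unchanged; the activity only tells ADF which package to run and on which runtime.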

A Real-World Example

Imagine you work for a bank.

Every night, customer transactions stored in an on-premises SQL Server need to be copied into Azure Data Lake for reporting and analytics.

The architecture would look something like this:

Control flow:

ADF Pipeline
      │
      ▼
Self-hosted Integration Runtime

Data flow:

On-Prem SQL Server
      │
      ▼
Self-hosted Integration Runtime
      │
      ▼
Azure Data Lake Storage

Here's what happens:

The pipeline starts according to a schedule.
Azure Data Factory sends the job to the Self-hosted Integration Runtime.
The Self-hosted IR connects to the SQL Server.
It reads the required data.
It securely transfers the data into Azure Data Lake.
The pipeline completes successfully.

Notice that Azure Data Factory never directly connects to the SQL Server. The Self-hosted Integration Runtime performs that task.
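
The reason the Self-hosted IR does the work here, even though the destination is in Azure, is a simple rule: if either the source or the sink linked service points to a Self-hosted IR, the copy runs on that Self-hosted IR. The sketch below models the rule in plain Python. The LinkedService class, the runtime_for_copy function, and all names are hypothetical, and the cloud-to-cloud branch is deliberately simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedService:
    """Tiny stand-in for an ADF linked service (hypothetical model)."""
    name: str
    runtime: str       # name of the Integration Runtime it connects via
    self_hosted: bool  # True if that runtime is a Self-hosted IR


def runtime_for_copy(source: LinkedService, sink: LinkedService) -> str:
    """Pick the runtime that executes a copy between two linked services."""
    # If either side goes through a Self-hosted IR, the copy runs there.
    for linked_service in (source, sink):
        if linked_service.self_hosted:
            return linked_service.runtime
    # Simplification for this sketch: for cloud-to-cloud copies,
    # return the sink's runtime.
    return sink.runtime


# The bank scenario: on-premises SQL Server to Azure Data Lake.
sql_server = LinkedService("OnPremSqlServer", "BankSelfHostedIR", True)
data_lake = LinkedService("BankDataLake", "AutoResolveIntegrationRuntime", False)

print(runtime_for_copy(sql_server, data_lake))  # prints "BankSelfHostedIR"
```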

Which Integration Runtime Should You Use?

The choice depends on where your data resides and what type of workload you're running.

Scenario	Recommended Integration Runtime
Azure SQL Database to Azure Blob Storage	Azure Integration Runtime
Azure SQL Database to Azure Data Lake	Azure Integration Runtime
SQL Server on-premises to Azure	Self-hosted Integration Runtime
Oracle behind a firewall	Self-hosted Integration Runtime
Local files to Azure Storage	Self-hosted Integration Runtime
Running Mapping Data Flows	Azure Integration Runtime
Running existing SSIS packages	Azure-SSIS Integration Runtime
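
The table boils down to two questions: are you running SSIS packages, and is any data store behind a private network? Here is a small sketch of that decision in Python. The recommend_runtime function and its parameters are hypothetical and only encode the rows of the table above, not every option ADF offers.

```python
def recommend_runtime(private_network: bool = False, ssis_packages: bool = False) -> str:
    """Return the Integration Runtime suggested by the table above."""
    if ssis_packages:
        return "Azure-SSIS Integration Runtime"
    if private_network:
        return "Self-hosted Integration Runtime"
    return "Azure Integration Runtime"


# Each scenario from the table, expressed as flags for the function.
scenarios = [
    ("Azure SQL Database to Azure Blob Storage", {}),
    ("SQL Server on-premises to Azure", {"private_network": True}),
    ("Oracle behind a firewall", {"private_network": True}),
    ("Running Mapping Data Flows", {}),
    ("Running existing SSIS packages", {"ssis_packages": True}),
]

for description, flags in scenarios:
    print(f"{description}: {recommend_runtime(**flags)}")
```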

Understanding the Difference Between Pipelines and Integration Runtime

One common misconception among beginners is assuming that pipelines execute the work.

They don't.

A pipeline is simply an orchestration layer.

Think of the relationship this way:

Pipeline decides what should happen.
Activity defines the specific task.
Linked Service stores the connection information.
Dataset represents the data being used.
Integration Runtime performs the actual work.

Take away the Integration Runtime and these pieces still describe the job, but nothing runs it.
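
One way to see how these pieces hang together is to follow the references from a pipeline down to a runtime: an activity names its datasets, each dataset names a linked service, and each linked service names an Integration Runtime. The sketch below walks that chain using heavily simplified dictionaries. They are not the real ADF JSON schema, and every name in them is hypothetical.

```python
# Simplified stand-ins for ADF resources (not the real JSON schema).
linked_services = {
    "OnPremSqlServer": {"runtime": "OnPremSelfHostedIR"},
    "DataLake": {"runtime": "AutoResolveIntegrationRuntime"},
}

datasets = {
    "TransactionsTable": {"linked_service": "OnPremSqlServer"},
    "TransactionsFiles": {"linked_service": "DataLake"},
}

pipeline = {
    "name": "NightlyTransactions",
    "activities": [
        {
            "name": "CopyTransactions",
            "inputs": ["TransactionsTable"],
            "outputs": ["TransactionsFiles"],
        }
    ],
}


def trace(pipeline: dict) -> list:
    """Follow activity -> dataset -> linked service -> runtime references."""
    rows = []
    for activity in pipeline["activities"]:
        for dataset_name in activity["inputs"] + activity["outputs"]:
            linked_service_name = datasets[dataset_name]["linked_service"]
            runtime = linked_services[linked_service_name]["runtime"]
            rows.append((activity["name"], dataset_name, linked_service_name, runtime))
    return rows


for activity_name, dataset_name, linked_service_name, runtime in trace(pipeline):
    print(f"{activity_name} -> {dataset_name} -> {linked_service_name} -> {runtime}")
```

Notice that the pipeline itself never mentions a runtime. The runtime only shows up at the end of the chain, which is why the pipeline can be read as pure orchestration.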

A Simple Analogy

Imagine you're running a courier company.

The customer places an order.
The operations manager plans the delivery.
The address tells the driver where to collect and deliver the package.
The package is the data.
The truck transports the package.

In Azure Data Factory:

Customer order = Business requirement
Pipeline = Delivery plan
Activity = Delivery task
Linked Service = Pickup and destination addresses
Dataset = The package
Integration Runtime = The truck

Without the truck, nothing gets delivered.

Final Thoughts

Integration Runtime is one of the most important concepts in Azure Data Factory because it is responsible for executing your data integration workloads.

Choosing the correct Integration Runtime depends on your environment:

Use Azure Integration Runtime for cloud-to-cloud data movement and transformations.
Use Self-hosted Integration Runtime when accessing on-premises or private network resources.
Use Azure-SSIS Integration Runtime when migrating or running existing SSIS packages in Azure.

Once you understand the role of the Integration Runtime, Azure Data Factory becomes much easier to visualize. Instead of thinking of it as just another Azure service, you'll see it as a complete orchestration platform where pipelines coordinate the work, and Integration Runtimes provide the execution power that makes everything happen.

DEV Community