Azure Data Factory (ADF) makes it easy to move, transform, and automate data workflows in the cloud. In this post, I will walk through creating a simple ADF pipeline, from setting up core resources and GitHub integration to copying data between storage containers and monitoring the entire process.
1. Creating a Data Factory Instance
To get started, I created a new data factory resource in Azure named coredata-datafactory1.
Navigation Steps:
- Log in to the Azure Portal.
- Select “Create a resource” > “Analytics” > “Data Factory.”
- Fill in the resource details: subscription, resource group, a unique factory name, region, and version (V2), leaving the other tabs at their defaults.
- Refer to Step 2 for Git configuration (this can also be done later).
- Proceed to the “Review + create” tab and confirm.
- After deployment, hit “Go to resource.”
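If you prefer to script this step instead of clicking through the portal, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-datafactory). The subscription ID, the resource group name coredata-rg, and the eastus region are placeholder assumptions, not values from this walkthrough.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

SUBSCRIPTION_ID = "<your-subscription-id>"   # placeholder
RESOURCE_GROUP = "coredata-rg"               # assumed resource group name
FACTORY_NAME = "coredata-datafactory1"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create (or update) the V2 data factory in the chosen region.
factory = adf_client.factories.create_or_update(
    RESOURCE_GROUP,
    FACTORY_NAME,
    Factory(location="eastus"),  # use the region you picked in the portal
)
print(factory.name, factory.provisioning_state)
```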
2. Setting Up GitHub Integration
Source control is key for managing changes and collaborating with ease. I set up my GitHub account and created a private repository named coredata-azuredatafactory.
Then I authorized ADF to access the repository and linked it to the data factory.
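The same Git configuration can also be applied programmatically. The sketch below is an illustration only: the GitHub account name, the main collaboration branch, and the root folder are placeholder assumptions you would replace with your own values.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import FactoryGitHubConfiguration, FactoryRepoUpdate

SUBSCRIPTION_ID = "<your-subscription-id>"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

repo_update = FactoryRepoUpdate(
    factory_resource_id=(
        f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/coredata-rg"
        "/providers/Microsoft.DataFactory/factories/coredata-datafactory1"
    ),
    repo_configuration=FactoryGitHubConfiguration(
        account_name="<github-account>",              # your GitHub user or org (placeholder)
        repository_name="coredata-azuredatafactory",
        collaboration_branch="main",                  # assumed branch name
        root_folder="/",
    ),
)

# The repo configuration is applied per region (location id) of the factory.
adf_client.factories.configure_factory_repo("eastus", repo_update)
```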
3. Exploring Azure Data Factory Studio
Next, I launched the Data Factory Studio. The homepage provides a simple navigation pane, making it easy to design and monitor your data flows.
4. Creating a Storage Account
For this demo, I created a storage account named coredatadatastorage1. The account serves as both the data source and destination.
Navigation Steps:
- In the Azure Portal, click “Create a resource” > “Storage” > “Storage account.”
- Specify the required configuration (resource group, region, account name, redundancy).
- On the “Review + create” screen, review the settings and click “Create.”
- Once deployed, go to the resource overview page.
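For completeness, here is a hedged sketch of creating the same storage account with azure-mgmt-storage; the resource group, region, and SKU are assumed demo values.

```python
# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "coredata-rg"          # assumed resource group name
ACCOUNT_NAME = "coredatadatastorage1"

storage_client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Account creation is a long-running operation, hence begin_create(...).result().
poller = storage_client.storage_accounts.begin_create(
    RESOURCE_GROUP,
    ACCOUNT_NAME,
    StorageAccountCreateParameters(
        sku=Sku(name="Standard_LRS"),   # locally redundant storage is enough for a demo
        kind="StorageV2",
        location="eastus",              # assumed region
    ),
)
account = poller.result()
print(account.name, account.primary_endpoints.blob)
```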
5. Designing the Data Pipeline
I created a simple pipeline called data_copy_pipeline to copy data from an input directory to an output directory in Blob Storage.
- Open Data Factory Studio and click on “Author.”
- Select “Pipeline” and click “New pipeline.”
- Name your pipeline (data_copy_pipeline).
- Go to “Activities” and, under “Move and transform,” drag the “Copy data” activity onto the canvas.
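The same pipeline can be defined in code. This sketch assumes datasets named ds_in and ds_out (created in the next two sections' sketches) and uses the blob source/sink pattern rather than the Binary format chosen in the Studio; all of these names are illustrative.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "coredata-rg"          # assumed
FACTORY_NAME = "coredata-datafactory1"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# One Copy activity: read from the input dataset, write to the output dataset.
copy_activity = CopyActivity(
    name="CopyInputToOutput",
    inputs=[DatasetReference(type="DatasetReference", reference_name="ds_in")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="ds_out")],
    source=BlobSource(),
    sink=BlobSink(),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "data_copy_pipeline", pipeline
)
```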
6. Creating Datasets
For this pipeline, I needed datasets to represent the source (input file) and sink (output file).
Source Dataset:
- Choose Azure Blob Storage as the source.
- Since my input is a plain text file (.txt), I selected the Binary format.
Sink Dataset:
- Set Azure Blob Storage as the destination (same as the source).
- Name the output file (e.g., test_data_out.log).
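A scripted version of the two datasets might look like the sketch below. I am using the AzureBlobDataset model for simplicity (the Studio flow above uses the Binary format), and the data container with input/output folders, the ds_in/ds_out names, and the linked-service name are all illustrative assumptions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset,
    DatasetResource,
    LinkedServiceReference,
)

SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "coredata-rg"          # assumed
FACTORY_NAME = "coredata-datafactory1"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Both datasets point at the same linked service (created in the next section).
ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="BlobStorageLinkedService"
)

# Source dataset: the test input file in the (assumed) data/input folder.
ds_in = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=ls_ref,
    folder_path="data/input",
    file_name="test_data.txt",
))
adf_client.datasets.create_or_update(RESOURCE_GROUP, FACTORY_NAME, "ds_in", ds_in)

# Sink dataset: where the Copy activity writes the output file.
ds_out = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=ls_ref,
    folder_path="data/output",
    file_name="test_data_out.log",
))
adf_client.datasets.create_or_update(RESOURCE_GROUP, FACTORY_NAME, "ds_out", ds_out)
```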
7. Configuring Linked Services
Each dataset requires a linked service that defines how ADF connects to the underlying storage.
- Create a linked service for the source by providing the storage account credentials and selecting the container or directory.
- Select the test input file (e.g., test_data.txt).
- Repeat for the sink (output container and path).
- Test the connection to ensure the setup is correct.
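If you script the linked service, one minimal approach is a connection-string-based linked service as sketched below. The name BlobStorageLinkedService matches the dataset sketch above, and the account key is a placeholder that should really come from Key Vault or an environment variable rather than source code.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "coredata-rg"          # assumed
FACTORY_NAME = "coredata-datafactory1"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Connection string for the storage account created earlier.
# <account-key> is a placeholder; load it from Key Vault or an env var in practice.
conn_str = SecureString(value=(
    "DefaultEndpointsProtocol=https;AccountName=coredatadatastorage1;"
    "AccountKey=<account-key>;EndpointSuffix=core.windows.net"
))

linked_service = LinkedServiceResource(
    properties=AzureStorageLinkedService(connection_string=conn_str)
)
adf_client.linked_services.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "BlobStorageLinkedService", linked_service
)
```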
8. Running & Monitoring the Pipeline
With everything set, I manually triggered the pipeline using the “Trigger now” button.
- After execution, navigate to “Monitor” and check the status under “Pipeline runs.”
- Review activity details under “Activity runs.”
- Inspect the details tab for run-specific metadata and logs.
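Triggering and monitoring can also be done from Python. This sketch polls the run status and then queries the activity runs, mirroring what the Monitor tab shows; the factory and pipeline names are the ones used in this post, and everything else is a placeholder.

```python
import time
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "coredata-rg"          # assumed
FACTORY_NAME = "coredata-datafactory1"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Trigger the pipeline (the scripted equivalent of "Trigger now").
run = adf_client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, "data_copy_pipeline", parameters={}
)

# Poll until the run reaches a terminal state.
while True:
    pipeline_run = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    print("Pipeline run status:", pipeline_run.status)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

# Query activity runs: the same data the Monitor tab shows under "Activity runs".
filters = RunFilterParameters(
    last_updated_after=datetime.now() - timedelta(days=1),
    last_updated_before=datetime.now() + timedelta(days=1),
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    RESOURCE_GROUP, FACTORY_NAME, run.run_id, filters
)
for activity in activity_runs.value:
    print(activity.activity_name, activity.status)
```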
9. Verifying the Output
Finally, I verified that the output file was written to the specified sink directory in Blob Storage (test_data_out.log).
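To double-check the result outside the portal, a short azure-storage-blob snippet can list and download the copied blob. The data container and output/ folder follow the same assumed layout as in the dataset sketch, and the account key is again a placeholder.

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Same connection string used for the linked service (placeholder account key).
CONN_STR = (
    "DefaultEndpointsProtocol=https;AccountName=coredatadatastorage1;"
    "AccountKey=<account-key>;EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(CONN_STR)
container = service.get_container_client("data")      # assumed container name

# List blobs under the output folder and download the copied file.
for blob in container.list_blobs(name_starts_with="output/"):
    print("Found:", blob.name, blob.size, "bytes")

content = container.download_blob("output/test_data_out.log").readall()
print(content.decode("utf-8"))
```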
Summary
By following these steps, you can quickly spin up your own Azure Data Factory, connect it with source control, design simple data pipelines, and move data between Azure storage resources. The visual tools and straightforward setup make it approachable, even for those just getting started with cloud data engineering.
Conclusion
Azure Data Factory empowers users to orchestrate cloud-scale data movement and transformation with minimal setup or code. Its integration with GitHub gives you confidence in version control, and the monitoring features keep you in the loop at every stage. Next, you can explore scheduling, data transformations, and integrating with other services—but even a basic pipeline gives you a strong foundation for bigger data projects.
Thank you for reading! If you found this post helpful or inspiring, please leave a comment below with your thoughts or questions. I would love to hear your feedback and experiences. Feel free to share this article with friends or colleagues who might benefit too.
Keep transforming and exploring new data possibilities with Azure Data Factory!