Shashank Mishra

Azure & Python : List container blobs

Recently, I came across a project requirement to list all the blobs in a Storage Account container and store the blob names in a CSV file.

I would like to share the Python script I wrote for this task, keeping this tutorial as simple as possible.

I have divided this tutorial into 4 parts:

  1. Prerequisites
  2. Connection to Azure Storage Container
  3. Listing container blobs
  4. Writing the blob names to a CSV file

Prerequisites

  • Python (and PIP)
  • A code editor (I use VS Code)
  • A Microsoft Azure account (with storage account created)

The following package is required. It can be installed by running the command below in PowerShell, Command Prompt, or a terminal (if on a Linux system):

  1. Azure Blob Storage module (azure-storage-blob)

pip install azure-storage-blob

Connection to Azure Storage Container

There are many ways to connect to a container. I will cover connecting with a Connection String, a SAS Token, and a SAS URL. The Managed Identity and Key Vault methods also require some configuration on Azure, which is beyond the scope of this tutorial (I will discuss them in another tutorial).

Get Connection String/SAS Token via Azure Portal

  • Connection string

Go to your storage account in the portal, scroll down the left-hand panel, and click Access keys. On the right-hand side you will find a pair of account keys, each with its own connection string.

Connection string

  • SAS Token/URL

Go to your storage account in the portal, scroll down the left-hand panel, and click Shared access signature. Generate the tokens by selecting the check boxes that match your requirements. See the screenshots below for reference.

SAS URL

Note: The connection string generated here can also be used. The only difference between it and the one from the section above is that the string generated here (like the token and the URL) has an expiry date.

Code

  • Connection String

Description

In lines 5 and 6, the code asks for the connection string and the container name respectively. The reason is that we want to establish a connection to one particular container.

blob_source_service_client = BlobServiceClient.from_connection_string(source_container_connection_string)

In the above snippet, blob_source_service_client stores the connection instance for the storage account.

source_container_client = blob_source_service_client.get_container_client(source_container_name)

Here, using the storage account's connection instance, we establish a connection to a specific container. The resulting instance is stored in source_container_client and returned.
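Putting the two calls together, the connection function might look like the sketch below. This is not the original script: the function name connect_with_connection_string and its parameter names are assumptions chosen for illustration, and the azure-storage-blob package must be installed for the call to succeed.

```python
def connect_with_connection_string(connection_string, container_name):
    """Return a client for a single container (hypothetical helper name)."""
    # Imported inside the function so a missing package only raises
    # when a connection is actually attempted.
    from azure.storage.blob import BlobServiceClient

    # Connection instance for the whole storage account.
    blob_source_service_client = BlobServiceClient.from_connection_string(
        connection_string
    )

    # Narrow the account-level client down to one container.
    source_container_client = blob_source_service_client.get_container_client(
        container_name
    )
    return source_container_client
```

In main(), the two values could be read with input() and passed straight to this function.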

  • SAS Token

Description

Here the requirements are a bit different: along with the SAS Token, the storage account URL is required as well. This changes the function's parameter list as well as the function call in main(). The significant difference is:

blob_source_service_client = BlobServiceClient(account_url = account_url, credential = token)
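As a rough sketch, the SAS Token variant of the connection function could look like this. The function name connect_with_sas_token and the parameter names are assumptions, not taken from the original script. The account URL usually has the form https://<account-name>.blob.core.windows.net.

```python
def connect_with_sas_token(account_url, token, container_name):
    """Return a container client authenticated with a SAS token
    (hypothetical helper name)."""
    # Imported lazily; requires the azure-storage-blob package.
    from azure.storage.blob import BlobServiceClient

    # The SAS token is passed as the credential for the account URL.
    blob_source_service_client = BlobServiceClient(
        account_url=account_url, credential=token
    )

    # From here on, everything matches the Connection String version.
    return blob_source_service_client.get_container_client(container_name)
```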

  • SAS URL

Description

blob_source_service_client = BlobServiceClient(source_container_sas_token)

The only major difference here is in line 5: we pass the SAS URL directly to BlobServiceClient (despite its name, source_container_sas_token holds the full SAS URL here). Everything else is the same as in the Connection String section.
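A SAS URL is simply the account URL with the SAS token appended as its query string, which is why it can be passed on its own. The small sketch below, using only the standard library, illustrates that relationship. The helper name split_sas_url is hypothetical, and the URL in the usage comment is a made-up placeholder, not a working signature.

```python
from urllib.parse import urlsplit, urlunsplit


def split_sas_url(sas_url):
    """Split a SAS URL into (account_url, sas_token). Hypothetical helper."""
    parts = urlsplit(sas_url)
    # Rebuild the URL without the query string to get the account URL.
    account_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # The query string is the SAS token itself.
    return account_url, parts.query


# Usage with a placeholder URL:
# account_url, token = split_sas_url(
#     "https://myaccount.blob.core.windows.net/?sv=2021-06-08&sig=abc"
# )
```

The two returned values are what the SAS Token method expects as account_url and credential.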

Read more at docs.microsoft.com

Listing container blobs

In the section above we saw how to establish and return a connection instance. Let's jump right into the next step.

Description

This function accepts two arguments: first, the connection instance we created earlier, and second, the path for which the blobs have to be listed. The function prints the blobs present in the container under the given path.

One important thing to note is that source_blob_list is an iterable object. Its exact type is azure.core.paging.ItemPaged, and yes, list_blobs() supports pagination as well.

In line 8, I append the blob names to a list. This list is then passed to the create_csv(blob_list) function, discussed in the next section.

To list all the blobs in the container, set blob_path to an empty string (i.e. ''), or omit the parameter altogether and remove the argument from list_blobs() as well.
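The listing step described above can be sketched as follows. This is a reconstruction under assumptions rather than the original function: the name list_container_blobs is hypothetical, and the path is passed through the name_starts_with argument of list_blobs().

```python
def list_container_blobs(source_container_client, blob_path=""):
    """Print and collect the names of the blobs under blob_path
    (hypothetical function name)."""
    # An empty path is turned into None so that no prefix filter is
    # applied and every blob in the container is listed.
    source_blob_list = source_container_client.list_blobs(
        name_starts_with=blob_path or None
    )

    blob_list = []
    # source_blob_list is a paged iterable; looping over it fetches
    # further pages as needed.
    for blob in source_blob_list:
        print(blob.name)
        blob_list.append(blob.name)
    return blob_list
```

The returned list is what would then be handed to create_csv(blob_list).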

Writing the blob names to a CSV file

Coming to the last part, this should be relatively simple and self-explanatory.

A new CSV file is generated at the location of the script, with the following contents:

Generated CSV
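A minimal sketch of create_csv(blob_list) using the standard csv module is shown below. The optional csv_path parameter, the default file name blob_list.csv, and the "Blob Name" header row are assumptions made for illustration, so the original script's output may look different.

```python
import csv


def create_csv(blob_list, csv_path="blob_list.csv"):
    """Write one blob name per row to a CSV file and return its path."""
    # newline="" lets the csv module handle line endings itself,
    # which avoids blank rows on Windows.
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Blob Name"])  # assumed header row
        for blob_name in blob_list:
            writer.writerow([blob_name])
    return csv_path
```

With the default relative path, the file is created in the directory the script is run from.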

break;

I have created a script that comprises all of the above code, neatly packed together.

Find the script here.

I hope this tutorial was helpful. Reach out to me or comment below with any suggestions or queries.

Thanks for reading.

😁
