Recently, I came across a project requirement where I had to list all the blobs in an Azure Storage Account container and store the blob names in a CSV file.
In this tutorial, I'll share the Python script I created for this task, keeping things as simple as possible.
I'll divide this tutorial into 4 parts:
- Prerequisites
- Connection to Azure Storage Container
- Listing container blobs
- Writing the blob names to a CSV file
Prerequisites
- Python (and PIP)
- A code editor (I use VS Code)
- A Microsoft Azure account (with storage account created)
The following package is required. It can be installed by running the command below in PowerShell, Command Prompt or Terminal (on Linux or macOS):
- Azure Blob Storage client library for Python (azure-storage-blob)
pip install azure-storage-blob
Connection to Azure Storage Container
There are many ways to connect to a container. I'll cover connecting with a Connection String, a SAS Token and a SAS URL. The Managed Identity and Key Vault methods also require some configuration on the Azure side, which is beyond the scope of this tutorial (I'll discuss them in another tutorial).
Get Connection String/SAS Token via Azure Portal
- Connection string
Go to your storage account in the portal. In the left-hand panel, scroll down and click Access keys. On the right-hand side you will find two account keys (key1 and key2), each with its own connection string.
- SAS Token/URL
Go to your storage account in the portal. In the left-hand panel, scroll down and click Shared access signature. Select the check boxes (allowed services, resource types, permissions) that match your requirements, set the expiry and generate the SAS and connection string. For listing blobs you need at least the Blob service, the Container resource type and the List permission. See the screenshots below for reference.
Note: The connection string generated here can also be used. The main difference from the one in the previous section is that this string (as well as the SAS token and URL) has an expiry date and grants only the permissions you selected.
Code
- Connection String
Description
In lines 5 and 6, the code asks for the connection string and the container name respectively. This is because we want to establish a connection to a particular container.
blob_source_service_client = BlobServiceClient.from_connection_string(source_container_connection_string)
In the above snippet, the connection instance to the storage account is stored in blob_source_service_client.
source_container_client = blob_source_service_client.get_container_client(source_container_name)
Here, using the storage account's connection instance, we create a client for a specific container, store it in source_container_client and return it.
- SAS Token
Description
Here the requirements are a bit different: along with the SAS Token, the storage account URL is required as well. This changes the function's parameter list as well as the function call in main(). The significant difference is:
blob_source_service_client = BlobServiceClient(account_url=account_url, credential=token)
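For the Azure public cloud, the account URL follows the pattern https://<account-name>.blob.core.windows.net (sovereign clouds use a different endpoint suffix). The two hypothetical helpers below, which are not part of the original script, build that URL from the account name and strip the leading "?" that the SAS token copied from the portal usually starts with, so the pair can be passed as account_url and credential.

```python
def build_account_url(account_name: str, endpoint_suffix: str = "core.windows.net") -> str:
    """Hypothetical helper: blob service endpoint for a storage account."""
    return f"https://{account_name}.blob.{endpoint_suffix}"


def normalise_sas_token(token: str) -> str:
    """Hypothetical helper: trim whitespace and any leading '?' from a SAS token."""
    return token.strip().lstrip("?")
```

Usage would look like BlobServiceClient(account_url=build_account_url("myaccount"), credential=normalise_sas_token(token)), where "myaccount" is a placeholder.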
- SAS URL
Description
blob_source_service_client = BlobServiceClient(source_container_sas_url)
The only major difference here is in line 5: we pass the SAS URL directly to BlobServiceClient, which reads the SAS token from the URL's query string. The rest is the same as in the Connection String section.
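The SAS URL is simply an endpoint URL followed by "?" and the SAS token, which is why the SAS URL and SAS Token methods are interchangeable. As a sketch using only the standard library, the hypothetical helper below (not from the original script) splits a SAS URL into the account URL and token expected by the token-based call.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit


def split_sas_url(sas_url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a SAS URL into (endpoint_url, sas_token)."""
    parts = urlsplit(sas_url)
    # Everything before '?' is the endpoint; drop a trailing '/' for tidiness
    endpoint_url = urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))
    # The raw query string is the SAS token itself
    return endpoint_url, parts.query
```

If the URL includes a container path, the endpoint keeps it; note that BlobServiceClient itself discards any container or blob path in its account_url.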
Read more at docs.microsoft.com
Listing container blobs
In the section above, we saw how to establish and return a connection instance. Let's jump right into the next section.
Description
This function accepts two arguments: first, the connection instance we created earlier, and second, the path for which the blobs have to be listed. It prints the blobs present in the container under the given path.
One important thing to note is that source_blob_list is an iterable object of type azure.core.paging.ItemPaged, and yes, list_blobs() supports pagination as well: results are fetched from the service page by page as you iterate.
In line 8, I append each blob name to a list. This list is then passed to the create_csv(blob_list) function, discussed in the next section.
To list all the blobs in the container, set blob_path to an empty string ('') or omit the parameter altogether and remove the argument from list_blobs().
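A minimal sketch of such a listing function, assuming the ContainerClient returned by the connection step. The name list_blob_names is a placeholder. name_starts_with is the list_blobs() keyword that filters by prefix, so the "path" is matched as a prefix of blob names (for example "folder1/"); passing None lists everything. Because the function only calls list_blobs() and reads .name, any object with a compatible method can stand in for testing.

```python
from typing import List


def list_blob_names(container_client, blob_path: str = "") -> List[str]:
    """Hypothetical helper: print and collect blob names under blob_path."""
    blob_names = []
    # list_blobs() returns a lazily paged iterator (ItemPaged);
    # an empty path becomes None, which means "no prefix filter"
    source_blob_list = container_client.list_blobs(name_starts_with=blob_path or None)
    for blob in source_blob_list:
        print(blob.name)
        blob_names.append(blob.name)
    return blob_names
```

If you need explicit page control, ItemPaged also provides by_page(), and list_blobs() accepts a results_per_page keyword.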
Writing the blob names to a CSV file
Coming to the last part: this should be relatively simple and self-explanatory.
A new CSV file containing the blob names is generated in the current working directory (the script's folder, if you run the script from there).
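A minimal sketch of create_csv(blob_list) using Python's built-in csv module. The file_path parameter, its default name and the header row are assumptions rather than details of the original script. Using csv.writer instead of joining strings means blob names that contain commas or quotes are escaped correctly.

```python
import csv
from typing import Iterable


def create_csv(blob_list: Iterable[str], file_path: str = "blob_names.csv") -> None:
    """Write one blob name per row; file name and header are assumptions."""
    # newline="" stops the csv module from adding blank lines on Windows;
    # a relative path is resolved against the current working directory
    with open(file_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Blob Name"])  # hypothetical header row
        for blob_name in blob_list:
            writer.writerow([blob_name])
```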
I have created a script that brings all of the above code together.
Find the script here.
I hope this tutorial was helpful. Feel free to reach out to me or comment below with any suggestions or questions.
Thanks for reading.
😁