DEV Community

Cover image for Azure function to generate excel file using CSV
Madhu Sharma
Madhu Sharma

Posted on

Azure function to generate excel file using CSV

Prerequisites:

  • Azure function app with blob trigger setup

NOTE This article will not explain how to set up azure function with blob trigger but here is one post that might help
Azure Functions - Creating a new function

In this article, we will see how to read and process the CSV file uploaded to Azure Blob Storage using Azure Functions. We will be using python as programming language.

This article will cover:

  1. Reading CSV from azure blob storage using python
  2. Adding python matplotlib plot without saving graph image locally
  3. Add data to excel using xlsxwriter
  4. Uploading BytesIO/blob to Azure

Lets get started..

1. Reading CSV from azure blob:

# Read CSV
file_content = myblob.read().decode("utf-8")

# Load csv data to dataframe
df = pd.read_csv(StringIO(file_content)) 
Enter fullscreen mode Exit fullscreen mode

2. Add image as stream to excel file to blob:

#Image stream to add image
image_stream=BytesIO()  

# Insert image stream into excel
sheet=writer.sheets[sheet_name]
sheet.insert_image('F6', 'graph.png', { 'image_data': image_stream}) 
Enter fullscreen mode Exit fullscreen mode

3. Add data to excel using xlsxwriter :

# Create Excel 
xlsx_bytes = BytesIO()
writer = pd.ExcelWriter(xlsx_bytes, engine='xlsxwriter', mode='A')
Enter fullscreen mode Exit fullscreen mode

4. Uploading Blob:

# Uploading generated output excel sheet
blob_client.upload_blob(xlsx_bytes.getvalue(), blob_type="BlockBlob", overwrite=True)
Enter fullscreen mode Exit fullscreen mode

Complete Code:

#Import libraries  
import logging
from azure.storage.blob import BlobServiceClient
import io
from io import BytesIO, StringIO
import azure.functions as func
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xlsxwriter

# Function that creates bar chart and add to excel sheet
def add_sheet(sheet_name, cost_key_label, cost_value_label, file_content, writer):
    logging.info(f'add_sheet: {sheet_name}')

    # Calculate data from csv
    logging.info('Calculate data from csv')
    df = pd.read_csv(StringIO(file_content)) # read csv data as string 
    cost=df.groupby(by=cost_key_label)[cost_value_label].sum()

    # Calculate data from csv
    df_data=df.groupby(by=cost_key_label)[cost_value_label].sum()
    df_data.to_excel(writer, sheet_name=sheet_name) #writer 

    # Create Image
    logging.info('Create image')
    image_stream=BytesIO() #image stream to add image as stream
    plt.clf() # clear the plt object to avoid image overlap 

    # Label for x and y axis of bar chart
    plt.xlabel(cost_key_label, fontsize=10)
    plt.ylabel(cost_value_label, fontsize=10)


    cost_keys=cost.keys().tolist()
    cost_values=cost.tolist()
    plt.bar(cost_keys, cost_values)
    plt.xticks(fontsize=5.5)
    plt.yticks(fontsize=5.5)
    # save image as stream
    plt.savefig(image_stream, dpi = 100)

    # Insert Image into excel
    logging.info('Insert image into excel')
    sheet=writer.sheets[sheet_name]
    sheet.insert_image('F6', 'graph.png', { 'image_data': image_stream})

# main function 
def main(myblob: func.InputStream):
    logging.info(f"File Name: {myblob.name}")

    if not myblob.name.endswith(".csv"):
        return

    # Read CSV
    logging.info("Read csv file")
    file_content = myblob.read().decode("utf-8")

    # Create Excel 
    logging.info('Create excel')
    xlsx_bytes = BytesIO()
    writer = pd.ExcelWriter(xlsx_bytes, engine='xlsxwriter', mode='A')

    #Add sheet function call
    add_sheet('cost_analysis', 'InvoiceSectionName', 'CostInBillingCurrency', file_content, writer)
    add_sheet('subscription_cost', 'SubscriptionName', 'CostInBillingCurrency', file_content, writer)

   # Closing writer object
    writer.save()

    # Upload excel to Azure Blob Storage
    logging.info("Upload excel to Azure Blob Storage")
    blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob_client = blob_service_client.get_blob_client(container=CONTAINER_NAME, blob=myblob.name + '.xlsx')
    blob_client.upload_blob(xlsx_bytes.getvalue(), blob_type="BlockBlob", overwrite=True)

Enter fullscreen mode Exit fullscreen mode

.

Hope this Helps! :)

Top comments (0)