DEV Community

bismillahkani for AWS Community Builders

Posted on

When Generative AI meets AWS Community Builders: Unveiling the Ultimate AWS Cost Advisor

Building a Retrieval Augmented Generation (RAG) Pipeline with AWS SageMaker, Hugging Face Transformer and Langchain


Introduction

AWS Community Builders gave us an interesting opportunity to integrate AI tools and solve AWS use cases.

This unique collaboration sparked our collective imagination and led us to brainstorm the perfect solution. After countless hours of innovation and collaboration, we are thrilled to unveil a tool that could redefine the AWS cost-savings and FinOps landscape: a Q&A chatbot powered by SageMaker, HuggingFace, and LangChain, and supercharged with Retrieval Augmented Generation. Join us on this journey as we present a solution that harmonizes the worlds of generative AI and AWS Solution Architect skills.

AWS Cost Advisor is an AI tool designed to help users optimize their cloud costs by providing cost recommendations, identifying cost anomalies, and offering cost-saving suggestions. The tool is built using Retrieval-Augmented Generation (RAG), a combination of retrieval-based and generation-based techniques. Its knowledge database is built from the AWS Cost Management FAQs, stored as a vector database integrated into the RAG pipeline. In this blog post, we demonstrate the Cost Advisor tool and its improved performance compared to a generic text-generation Q&A assistant.


Problem Statement

The AWS Cost Advisor tool was designed to help users optimize their cloud costs by providing cost recommendations, identifying cost anomalies, and offering cost-saving suggestions. However, the tool faced accuracy and relevancy problems: a generic text-generation Q&A assistant would often provide irrelevant answers or fail to answer the user's question at all. To solve this problem, we implemented the RAG pipeline.

No more Unnecessary Cash Burn


Available Solutions

Before we delve into our solution, let's explore the existing options. Conventional cost optimization tools often lack the sophistication to comprehensively analyze complex billing data and provide granular insights. Additionally, chatbots typically struggle to handle the intricacies of AWS services and generate meaningful recommendations. These limitations hinder Solution Architects' progress, making it challenging to unlock the full potential of savings in the cloud.

Existing solutions like Cost Explorer and AWS Budgets can feel a lot like DIY.

Tools, tools, and lots of knobs and levers everywhere


Retrieval-Augmented Pipeline

The RAG pipeline uses a combination of retrieval-based and generation-based techniques to provide more relevant and accurate answers. The pipeline consists of three components: a retriever, a reader, and a generator. The retriever retrieves the most relevant documents from the knowledge database, the reader extracts the answer from the retrieved documents, and the generator generates a more concise and accurate answer.
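To make the three stages concrete, here is a deliberately tiny, self-contained sketch in plain Python. The in-memory FAQ list, the word-overlap scoring, and the function names are all illustrative stand-ins for the real retriever and LLM; only the prompt template matches the one used with the Flan-T5 endpoint later in this post.

```python
# Toy sketch of the RAG stages over an in-memory FAQ "knowledge base".
# The scoring function and names are illustrative, not the actual pipeline.

FAQ_DOCS = [
    "The quickest way to get started with the AWS Cost Management tools is to access the Billing Dashboard.",
    "AWS Cost Explorer lets you explore your AWS costs and usage at a high level and at a detailed level.",
    "You can currently save up to 50 custom AWS Cost Explorer reports.",
]

def retrieve(question, docs, top_k=1):
    """Retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(question, context):
    """Generator input: the prompt template used for the Flan-T5 endpoint."""
    return f"Answer based on context:\n\n{context}\n\n{question}"

question = "How do I get started with AWS Cost Management?"
context = "\n".join(retrieve(question, FAQ_DOCS))
prompt = build_prompt(question, context)
```

In the real pipeline, the retriever searches an embedding index instead of counting shared words, and the prompt is sent to the LLM endpoint to generate the final answer.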

Solution Architecture


High-level solution overview

HuggingFace Open-Source LLM Models

The HuggingFace open-source large language models (LLMs) provide the generation-based component of AWS Cost Advisor's Retrieval-Augmented Generation (RAG) pipeline. These models are pre-trained on large amounts of text data and can generate high-quality responses to natural-language questions. The HuggingFace LLMs are hosted as SageMaker endpoints.

from sagemaker import image_uris, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# parse_response_model_flan_t5, model_version, and aws_role are defined
# earlier in the notebook.
_MODEL_CONFIG_ = {
    "huggingface-text2text-flan-t5-xxl": {
        "instance type": "ml.g4dn.xlarge",
        "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
        "parse_function": parse_response_model_flan_t5,
        "prompt": """Answer based on context:\n\n{context}\n\n{question}""",
    },
    "huggingface-textembedding-gpt-j-6b": {
        "instance type": "ml.g4dn.2xlarge",
        "env": {"SAGEMAKER_MODEL_SERVER_WORKERS": "1", "TS_DEFAULT_WORKERS_PER_MODEL": "1"},
    },
}
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

for model_id in _MODEL_CONFIG_:
    endpoint_name = name_from_base(f"aws-costadvisor-rag-{model_id}")
    inference_instance_type = _MODEL_CONFIG_[model_id]["instance type"]

    # Retrieve the inference container uri. This is the base HuggingFace container image for the default model above.
    deploy_image_uri = image_uris.retrieve(
        region=None,
        framework=None,  # automatically inferred from model_id
        image_scope="inference",
        model_id=model_id,
        model_version=model_version,
        instance_type=inference_instance_type,
    )
    # Retrieve the model uri.
    model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="inference"
    )
    model_inference = Model(
        image_uri=deploy_image_uri,
        model_data=model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=endpoint_name,
        env=_MODEL_CONFIG_[model_id]["env"],
    )
    model_predictor_inference = model_inference.deploy(
        initial_instance_count=1,
        instance_type=inference_instance_type,
        predictor_cls=Predictor,
        endpoint_name=endpoint_name,
    )
    print(f"{bold}Model {model_id} has been deployed successfully.{unbold}{newline}")
    _MODEL_CONFIG_[model_id]["endpoint_name"] = endpoint_name
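Once the endpoints are deployed, querying the Flan-T5 model is a matter of posting a JSON payload. The payload shape below follows the JumpStart text2text convention (`text_inputs` plus generation parameters); the helper function and question/context values are hypothetical, so check your model's documentation before relying on the exact fields.

```python
import json

# Hypothetical sketch of building a request for the deployed Flan-T5 endpoint.
def build_payload(question, context, max_length=200):
    prompt = f"Answer based on context:\n\n{context}\n\n{question}"
    return json.dumps({"text_inputs": prompt, "max_length": max_length})

payload = build_payload(
    "How do I get started?",
    "The quickest way to get started is the Billing Dashboard.",
)

# With boto3, the invocation would look roughly like:
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName=endpoint_name, ContentType="application/json", Body=payload
# )
```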

LangChain Retrieval QA

The LangChain Retrieval QA chain is used by AWS Cost Advisor as the retrieval-based component of the RAG pipeline. It retrieves the most relevant documents from the knowledge database based on the user's query, using a combination of semantic search and keyword matching to ensure the retrieved documents are highly relevant.
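The idea of blending semantic similarity with keyword matching can be sketched in a few lines. In this toy version, character-trigram overlap stands in for embedding-based semantic similarity; the real pipeline would use the GPT-J embedding endpoint instead, and the weights here are arbitrary.

```python
# Illustrative hybrid scoring: keyword overlap plus a crude character-trigram
# stand-in for semantic similarity. Treat this as a toy, not the real retriever.

def trigrams(text):
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def hybrid_score(query, doc, w_keyword=0.5, w_semantic=0.5):
    q_words, d_words = set(query.lower().split()), set(doc.lower().split())
    keyword = len(q_words & d_words) / max(len(q_words), 1)
    q_tri, d_tri = trigrams(query), trigrams(doc)
    semantic = len(q_tri & d_tri) / max(len(q_tri | d_tri), 1)
    return w_keyword * keyword + w_semantic * semantic

docs = [
    "You can save up to 50 custom AWS Cost Explorer reports.",
    "Amazon S3 stores objects in buckets.",
]
query = "How many Cost Explorer reports can I save?"
best = max(docs, key=lambda d: hybrid_score(query, d))
```

Combining both signals helps when a query uses exact product names (keyword match) as well as when it paraphrases the FAQ wording (semantic match).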

AWS Cost Management FAQs

Q: Who should use the AWS Cost Management products?
A: We have yet to meet a customer who does not consider cost management a priority. AWS Cost Management tools are used by IT professionals, financial analysts, resource managers, and developers across all industries to access detailed information related to their AWS costs and usage, analyze their cost drivers and usage trends, and take action on their insights.

Q: How do I get started with the AWS Cost Management tools?
A: The quickest way to get started with the AWS Cost Management tools is to access the Billing Dashboard. From there, you can access a number of products that can help you to better understand, analyze, and control your AWS costs, including, but not limited to, AWS Cost Explorer, AWS Budgets, and the AWS Cost & Usage Report.

Q: What are the benefits of using AWS Cost Explorer?
A: AWS Cost Explorer lets you explore your AWS costs and usage at both a high level and a detailed level of analysis, empowering you to dive deeper using a number of filtering dimensions (e.g., AWS Service, Region, Member Account). AWS Cost Explorer also gives you access to a set of default reports to help you get started, while also allowing you to create custom reports from scratch.

Q: What kinds of default reports are available?
A: AWS Cost Explorer provides a set of default reports to help you get familiar with the available filtering dimensions and the types of analyses that can be done using AWS Cost Explorer. These reports include a breakdown of your top 5 cost-accruing AWS services, an analysis of your overall Amazon EC2 usage, an analysis of the total costs of your member accounts, and the Reserved Instance Utilization and Coverage reports.

Q: Can I create and save custom AWS Cost Explorer reports?
A: Yes. You can currently save up to 50 custom AWS Cost Explorer reports.

FAISS Vector Database

The FAISS vector database is used by AWS Cost Advisor to store the vector representations of the documents in the knowledge database. These vector representations are used by the retriever component of the RAG pipeline to retrieve the most relevant documents based on the user's query. The FAISS vector database allows for fast and efficient retrieval of documents, ensuring that the user receives a response in real-time.
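Conceptually, a vector store maps document IDs to embedding vectors and answers "which stored vectors are closest to this query vector?" The sketch below uses hand-made three-dimensional vectors and brute-force cosine similarity; FAISS does the same job with real embeddings and optimized index structures, so every name and number here is illustrative.

```python
import math

# Conceptual sketch of a vector store: map document IDs to vectors and
# return the nearest ones to a query vector. Vectors here are made up; in
# the real pipeline they come from the GPT-J embedding endpoint.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

index = {
    "billing-dashboard-faq": [0.9, 0.1, 0.0],
    "cost-explorer-faq": [0.2, 0.8, 0.1],
}

def nearest(query_vec, index, top_k=1):
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:top_k]

hits = nearest([0.85, 0.15, 0.05], index)
```

FAISS replaces this linear scan with approximate nearest-neighbor structures, which is what keeps retrieval fast enough for real-time responses.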

from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

# embeddings (a SageMaker endpoint embeddings wrapper) is defined earlier
# in the notebook.
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),
)
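The `chunk_size=300, chunk_overlap=0` settings mean the FAQ text is sliced into roughly 300-character pieces before embedding. A simplified stand-in for that splitting step looks like this (the real `CharacterTextSplitter` prefers to break on separator boundaries rather than cutting mid-character-run, so this is only an approximation):

```python
# Rough illustration of fixed-size chunking with no overlap, as configured
# above with chunk_size=300, chunk_overlap=0.
def split_text(text, chunk_size=300, chunk_overlap=0):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

faq_text = "Who should use the AWS Cost Management products? " * 20
chunks = split_text(faq_text)
```

Smaller chunks keep each embedded passage focused on one FAQ entry, which tends to make retrieval more precise.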

To summarize: AWS Cost Advisor leverages HuggingFace open-source large language models (LLMs) for the generation-based component of the RAG pipeline, LangChain Retrieval QA for the retrieval-based component, and a FAISS vector database to store the vector representations of the documents in the knowledge database, which the retriever uses to find the most relevant documents.


Talk is cheap. Show me the code. ~ Linus Torvalds


Without any further ado, check out our Solution on Github:
https://github.com/bismillahkani/aws-cost-advisor/tree/main


Performance Comparison with Examples

To demonstrate the improved performance of the AWS Cost Advisor tool, we compared it to a generic text generation Q&A assistant. We asked both tools the same question: "How to get started with AWS cost and usage report?" The generic text generation Q&A assistant provided an irrelevant answer, while the AWS Cost Advisor tool provided a relevant answer with cost-saving suggestions.

Answer generated by LLM
Create an AWS account for a company or an individual. Navigate to the report section, then click on Create report. Select the Report template, and click Create Report.

Answer generated by RAG
The quickest way to get started with the AWS Cost Management tools is to access the Billing Dashboard.


How we're different

Enter AMAC (Ask Me Anything/AWS Cost Advisor), a fusion of cutting-edge technologies poised to redefine the AWS FinOps experience. Powered by HuggingFace, the industry-leading natural language processing (NLP) platform, and LangChain, an innovative language understanding tool, our chatbot possesses unrivaled contextual comprehension and robust question-answering abilities. By integrating FinOps and AWS Solution Architect expertise, we empower the chatbot with the knowledge to provide insightful cost optimization recommendations specific to participants' AWS deployments.

But that's not all – our Q&A Chatbot goes beyond traditional approaches by incorporating Retrieval Augmented Generation. This groundbreaking technique seamlessly blends retrieval-based models and generative models to deliver the most accurate and contextually rich responses. The chatbot efficiently scans massive billing datasets, extracts pertinent information, and then generates coherent and concise answers, providing participants with actionable insights tailored to their specific cost optimization needs.


Real-life use-cases

The AWS Cost Advisor AI Hackathon Scenario: Imagine a team of Cloud/DevOps/FinOps/Solution Architects/Engineers eager to optimize costs in their AWS deployments. Empowered by our Q&A Chatbot, they effortlessly navigate through complex billing reports, analyzing costs and identifying areas for improvement. The chatbot comprehends their queries, retrieves relevant cost data, and provides instant, accurate recommendations, enabling the team to make informed decisions that result in substantial cost savings.


Conclusion (Why we should win)

Our solution is a game-changer for the AWS Cost Management landscape, and we're just getting started: this is a side project for an AI hackathon (imagine what it could become as a full-blown project!). It provides AWS customers with a powerful and efficient tool that significantly enhances their ability to optimize costs in their AWS deployments. By combining the cutting-edge technologies of SageMaker, HuggingFace, LangChain, and Retrieval Augmented Generation with the expertise of AWS Solution Architects, our Q&A chatbot offers unparalleled comprehension, lightning-fast retrieval, and human-like responses.


Meet the team behind this:

Top comments (4)

OMKAR KADAM

It was a great experience working on this @bismillahkani @md__mostafa and Manoj P.

Md. Mostafa Al Mahmud

This is the team spirit that I saw on this project. Thank you @bismillahkani Manoj P. @omkarokkadam

Jason Dunn [AWS]

Congratulations for being one of the honourable mentions! It was great seeing the creativity and intelligence put into these projects. 🙌

Saravanan Gnanaguru

Congratulations Omkar and team 👏