<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: selvakumar palanisamy</title>
    <description>The latest articles on DEV Community by selvakumar palanisamy (@selvapal).</description>
    <link>https://dev.to/selvapal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F638740%2Fb8effa86-bceb-481e-bb2b-a3d12af97ec7.jpeg</url>
      <title>DEV Community: selvakumar palanisamy</title>
      <link>https://dev.to/selvapal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/selvapal"/>
    <language>en</language>
    <item>
      <title>AWS re:Invent 2025 Key Announcements: What They Mean for the Future of Cloud, AI &amp; Enterprise Tech</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Thu, 04 Dec 2025 12:42:35 +0000</pubDate>
      <link>https://dev.to/selvapal/aws-reinvent-2025-key-announcements-what-they-mean-for-the-future-of-cloud-ai-enterprise-tech-4g5f</link>
      <guid>https://dev.to/selvapal/aws-reinvent-2025-key-announcements-what-they-mean-for-the-future-of-cloud-ai-enterprise-tech-4g5f</guid>
      <description>&lt;p&gt;AWS re:Invent 2025 made one thing very clear: AWS is fully committing to an AI-first, agent-driven, and hybrid-ready future. &lt;/p&gt;

&lt;p&gt;With major announcements across generative AI, infrastructure, privacy, cost optimisation, and hybrid cloud, the keynote set the direction for how modern applications will be built and scaled over the next decade.&lt;/p&gt;

&lt;p&gt;Below is a summary of the key announcements and what this means for developers, enterprises, and the cloud industry as a whole.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Announcement&lt;/th&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;P6e GB300 NVIDIA GPU instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compute / AI&lt;/td&gt;
&lt;td&gt;New EC2 instances using NVIDIA GB300 NVL72 systems, offering ~20× the compute of the prior P5en generation for huge training and inference jobs - aimed at frontier-scale AI and agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS AI Factories&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hybrid AI infra&lt;/td&gt;
&lt;td&gt;Lets customers deploy AWS AI infrastructure (Ultra servers, Trainium, Bedrock) directly into their own data centres, giving "private AWS-like regions" for regulated or on-prem workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Trainium 3 Ultra servers GA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI accelerators&lt;/td&gt;
&lt;td&gt;Third-gen Trainium Ultra servers become generally available with big boosts in compute, memory bandwidth and energy efficiency, turning racks into AI supercomputers for training LLMs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Trainium 4 preview&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI accelerators&lt;/td&gt;
&lt;td&gt;Next-gen Trainium is announced with large jumps in compute and bandwidth over Trn3, targeting future "absurdly large" frontier models and long-term AI roadmap planning.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mistral Large &amp;amp; Mistral 3 open-weights in Bedrock&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Models / Inference&lt;/td&gt;
&lt;td&gt;High-performance open-weights models from Mistral are added to Amazon Bedrock, giving customers more choice and flexibility for both heavy reasoning and efficient edge/latency-sensitive use cases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Nova 2 model family (Lite, Pro, Sonic)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Models / Inference&lt;/td&gt;
&lt;td&gt;New foundation models optimised for cost and quality: Lite for fast, cheap tasks; Pro for complex reasoning and agent workflows; Sonic for low-latency speech-to-speech conversations - AWS's main answer to other frontier models.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Nova 2 Omni&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multimodal AI&lt;/td&gt;
&lt;td&gt;A unified multimodal model that ingests text, images, audio, and video and can output both text and images, simplifying scenarios like watching a video presentation and generating summaries plus visuals in one shot.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Amazon Nova Forge ("Novella" training)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom model training&lt;/td&gt;
&lt;td&gt;A training service that lets customers start from Nova checkpoints and train "open training models" with their own data (mid-training, not just fine-tuning), producing custom "Novella" models tailored to their domain.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Policy in AgentCore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agentic AI / Governance&lt;/td&gt;
&lt;td&gt;A policy engine for Amazon Bedrock AgentCore that defines what agents are allowed to do, with which tools and under what conditions - similar to IAM for agents - giving deterministic safety controls beyond prompts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AgentCore Evaluations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agentic AI / QA&lt;/td&gt;
&lt;td&gt;Built-in evaluation tools for AgentCore with pre-made and custom metrics (correctness, safety, usefulness) to continuously score and monitor agent behaviour in production - essentially QA for agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kiro Autonomous Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dev tooling / Agents&lt;/td&gt;
&lt;td&gt;An autonomous development agent (built on Kiro) that can take a goal, plan work, update code across repos, write tests, and open PRs with minimal supervision - targeted at long-running engineering tasks like big refactors.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS Security Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security / Agents&lt;/td&gt;
&lt;td&gt;A specialised security agent on AgentCore that reviews code and configs, flags policy/security violations and suggests fixes, integrating with pipelines to act as a continuous compliance assistant.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS DevOps Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ops / Agents&lt;/td&gt;
&lt;td&gt;An AgentCore-based DevOps agent that automates provisioning, CI/CD changes, config checks and rollbacks, effectively serving as an always-on extra DevOps team member.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;X8i memory-optimised instances (Intel Xeon 6)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;New X-family EC2 instances with custom Intel Xeon 6 chips, delivering up to ~50% more memory for big in-memory workloads like SAP HANA or large databases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Next-gen AMD EPYC memory instances (3TB RAM)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;New AMD-based instances with up to 3 TB RAM, giving another option for very large memory-bound applications at competitive price-performance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;C8a instances (AMD EPYC)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;CPU-optimised instances using latest AMD EPYC, promising around 30% better performance for compute-heavy tasks such as game servers or batch processing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;C8iNE instances (Intel + Nitro v6)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compute / Networking&lt;/td&gt;
&lt;td&gt;Network-enhanced compute instances combining Intel Xeon 6 with Nitro v6 to deliver about 2.5× better packet performance per vCPU, aimed at security appliances, firewalls and network-intensive services.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;M8 AZN high-clock instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compute / Low-latency&lt;/td&gt;
&lt;td&gt;New M-family instances with very high single-threaded clock speeds, aimed at latency-critical workloads like gaming, real-time analytics and trading systems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EC2 M3 Ultra Mac&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apple / Dev&lt;/td&gt;
&lt;td&gt;One of two new Mac instances, providing Apple Silicon-based environments for building and testing macOS/iOS apps in the cloud with more power and scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EC2 M4 Max Mac&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apple / Dev&lt;/td&gt;
&lt;td&gt;The second new Mac instance type using the latest Apple chips, giving even higher performance for iOS/macOS CI pipelines and multi-platform app shops.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lambda Durable Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless&lt;/td&gt;
&lt;td&gt;Lambda is extended to support long-running, stateful functions that can run for hours or days with resumability and retries - ideal for workflows waiting on agents, human approvals or long processes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 max object size increased to 50 TB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;S3's individual object limit jumps from 5 TB to 50 TB, simplifying storage of huge datasets, high-res media, and very large model checkpoints without chunking.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Batch Operations 10× faster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage / Data ops&lt;/td&gt;
&lt;td&gt;S3 Batch Operations are significantly sped up, reducing time and cost for bulk tasks like tagging, copying, and transforming data at petabyte scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Intelligent tiering for S3 Tables (Iceberg)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage / Analytics&lt;/td&gt;
&lt;td&gt;S3 Tables (Apache Iceberg) gain intelligent tiering, automatically moving colder table data to cheaper storage classes and potentially cutting costs by up to ~80%.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Table replication across regions/accounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage / DR&lt;/td&gt;
&lt;td&gt;S3 Tables can now be replicated across regions and accounts, enabling globally consistent query performance and simpler multi-region analytics setups.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Access Points for FSx for NetApp ONTAP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage / Integration&lt;/td&gt;
&lt;td&gt;S3 Access Points are extended to FSx for NetApp ONTAP so ONTAP file data can be accessed like native S3 objects, easing hybrid file/object workflows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;S3 Vectors GA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage / Vector DB&lt;/td&gt;
&lt;td&gt;Native vector storage in S3 (S3 Vectors) becomes generally available, designed to hold and search trillions of embeddings with much lower cost than many bespoke vector databases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GPU-accelerated vector indexing for OpenSearch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search / AI&lt;/td&gt;
&lt;td&gt;OpenSearch adds GPU acceleration for building vector indices, shrinking indexing time by around 10× and cost by ~75%, which is important for large-scale semantic search.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;EMR Serverless - no local storage config needed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Analytics / Big data&lt;/td&gt;
&lt;td&gt;EMR Serverless clusters no longer require you to provision local storage, removing a major configuration hassle and making EMR closer to "pure" serverless big-data processing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GuardDuty support for ECS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security / Threat detection&lt;/td&gt;
&lt;td&gt;Amazon GuardDuty's threat detection expands to ECS workloads, enabling managed anomaly and malware detection for containerised apps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Security Hub GA with new analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security / Posture mgmt&lt;/td&gt;
&lt;td&gt;AWS Security Hub becomes generally available with real-time risk analytics, trend views and cleaner pricing, centralising security findings across services.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Unified CloudWatch log store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;CloudWatch introduces a unified data store that aggregates logs from AWS services and third-party tools (like Okta, CrowdStrike) into one searchable, analytics-ready location.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RDS storage expansion for SQL Server &amp;amp; Oracle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Databases&lt;/td&gt;
&lt;td&gt;Amazon RDS lifts storage limits up to 256 TB for SQL Server and Oracle, increasing capacity and I/O throughput for very large enterprise databases.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Configurable vCPU counts for RDS SQL Server&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Databases / Licensing&lt;/td&gt;
&lt;td&gt;You can now set custom vCPU configurations for RDS SQL Server, helping tune instance sizing to optimise Microsoft licence spending.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;RDS support for SQL Server Developer Edition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Databases / Dev &amp;amp; test&lt;/td&gt;
&lt;td&gt;RDS adds support for SQL Server Developer Edition at zero licence cost, making it easier and cheaper to build and test SQL Server-backed apps in the cloud.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Database Savings Plans&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Databases / Pricing&lt;/td&gt;
&lt;td&gt;New Savings Plans for databases offer up to about 35% discounts across multiple engines, finally giving a unified, predictable cost model for long-running DB workloads.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
    </item>
    <item>
      <title>Building an AI-Powered Terraform Assistant</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Wed, 13 Aug 2025 11:17:50 +0000</pubDate>
      <link>https://dev.to/selvapal/building-an-ai-powered-terraform-assistant-5cfd</link>
      <guid>https://dev.to/selvapal/building-an-ai-powered-terraform-assistant-5cfd</guid>
      <description>&lt;p&gt;Infrastructure as Code (IaC) has revolutionized how we manage cloud resources, and Terraform is at the forefront of this shift. However, writing, validating, and debugging HCL (HashiCorp Configuration Language) can still be a time-consuming process. What if you could simply describe your desired infrastructure in plain English and have an AI generate, validate, and even correct the code for you?&lt;br&gt;
This post breaks down a powerful Streamlit application that does exactly that. This "Terraform Code Assistant" leverages the OpenAI API to create a seamless workflow for generating IaC for AWS, Azure, or Google Cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Architecture: How It Works&lt;/strong&gt;&lt;br&gt;
The application is built with a clear, three-step workflow in mind: Generate, Validate, and Correct. It combines the user-friendly interface of Streamlit with the power of OpenAI's language models and the reliability of the Terraform CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 1: Setting Up the Environment Automatically&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we can do anything with Terraform, we need to have the Terraform executable available. In a local environment, this is a simple download. But for a web app deployed on a platform like Streamlit Community Cloud, we need a more robust solution.&lt;br&gt;
The get_terraform_executable function handles this automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform Detection:&lt;/strong&gt; It first checks the user's operating system (Linux, Windows, etc.) and architecture (amd64, arm64) to determine the correct Terraform version to download.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downloading and Unzipping:&lt;/strong&gt; It fetches the appropriate zip file from HashiCorp's official releases page, extracts it, and makes the terraform binary executable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Caching for Efficiency:&lt;/strong&gt; The @st.cache_resource decorator is crucial here. It ensures that Terraform is downloaded only once when the app first starts. For all subsequent user sessions, the cached executable is used, making the app much faster and more efficient.&lt;/p&gt;
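
&lt;p&gt;Putting those three pieces together, here is a condensed sketch of what get_terraform_executable might look like. The version pin, timeout, and POSIX binary name are assumptions for illustration; the real app may differ (on Windows the binary would be terraform.exe):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import io
import os
import platform
import stat
import zipfile

import requests
import streamlit as st

TERRAFORM_VERSION = "1.9.5"  # assumed pin; the real app may track a newer release

@st.cache_resource
def get_terraform_executable() -&amp;gt; str:
    """Download the Terraform CLI once per app instance and return its path."""
    system = platform.system().lower()   # "linux", "darwin", or "windows"
    machine = platform.machine().lower()
    arch = "arm64" if machine in ("arm64", "aarch64") else "amd64"
    url = (
        f"https://releases.hashicorp.com/terraform/{TERRAFORM_VERSION}/"
        f"terraform_{TERRAFORM_VERSION}_{system}_{arch}.zip"
    )
    binary = os.path.join(os.getcwd(), "terraform")
    if not os.path.exists(binary):
        archive = requests.get(url, timeout=60)
        archive.raise_for_status()
        with zipfile.ZipFile(io.BytesIO(archive.content)) as zf:
            zf.extract("terraform", os.getcwd())
        # Make the extracted binary executable (POSIX platforms).
        os.chmod(binary, os.stat(binary).st_mode | stat.S_IEXEC)
    return binary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;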

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 2: The User Interface and Configuration&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A good tool needs an intuitive interface. The app is divided into a sidebar for configuration and a main area for interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Sidebar&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The sidebar contains all the necessary setup options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cloud Provider Selection:&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;A simple dropdown lets the user choose between AWS, Azure, and Google Cloud. This choice is passed to the AI to ensure it generates the correct provider-specific code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. API Key Management:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The app securely loads the OPENAI_API_KEY from Streamlit's secrets management. It provides clear instructions for the user if the key is not found, ensuring a smooth setup process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Instructions:&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;A "How to use" section guides the user through the app's workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 3: The AI-Powered Workflow in Action&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is where the magic happens. The three main buttons—Generate, Validate, and Correct—drive the entire process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate with AI&lt;/strong&gt;&lt;br&gt;
When a user describes their infrastructure (e.g., "An S3 bucket for logging and a t3.small EC2 instance") and clicks this button, the app:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Constructs a detailed system prompt for the OpenAI API. This prompt instructs the AI to act as a Terraform expert for the selected cloud provider and to return only a clean block of HCL code.&lt;/li&gt;
&lt;li&gt;Sends the user's request to the gpt-4o model.&lt;/li&gt;
&lt;li&gt;Parses the AI's response to extract the HCL code block and displays it in the code editor.&lt;/li&gt;
&lt;/ol&gt;
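
&lt;p&gt;Assuming the v1-style openai Python SDK and illustrative prompt wording, that flow might be sketched as follows (generate_hcl is a hypothetical helper name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI(api_key=api_key)  # api_key loaded from st.secrets as above

def generate_hcl(description: str, provider: str) -&amp;gt; str:
    """Ask the model for a clean, provider-specific block of HCL."""
    system_prompt = (
        f"You are a Terraform expert for {provider}. "
        "Return only one fenced code block of valid HCL, with no commentary."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": description},
        ],
    )
    text = resp.choices[0].message.content
    if "```" in text:  # strip the fences and an optional "hcl" language tag
        block = text.split("```")[1]
        return block.removeprefix("hcl").lstrip("\n")
    return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;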

&lt;p&gt;&lt;strong&gt;Validate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the code is generated, the user can validate it. Clicking this button triggers a background process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A temporary directory is created, and the generated Terraform code is saved to a main.tf file.&lt;/li&gt;
&lt;li&gt;The app runs terraform init to download the necessary provider plugins.&lt;/li&gt;
&lt;li&gt;It then runs terraform validate to check the syntax and configuration.&lt;/li&gt;
&lt;li&gt;The output (success or error) is captured and displayed in the results area.&lt;/li&gt;
&lt;/ol&gt;
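
&lt;p&gt;A sketch of that background process, using only the standard library plus two real Terraform CLI flags: -backend=false skips remote-state setup during init (only provider plugins are needed for validation), and -no-color keeps the output clean for display:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import subprocess
import tempfile

def validate_hcl(terraform_bin: str, hcl_code: str) -&amp;gt; tuple[bool, str]:
    """Save the code to main.tf in a temp dir, then run init and validate."""
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "main.tf"), "w") as f:
            f.write(hcl_code)
        for args in (["init", "-backend=false"], ["validate", "-no-color"]):
            result = subprocess.run(
                [terraform_bin, *args],
                cwd=workdir, capture_output=True, text=True,
            )
            if result.returncode != 0:
                return False, result.stderr or result.stdout
    return True, "Success! The configuration is valid."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;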

&lt;p&gt;&lt;strong&gt;Correct with AI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the validation fails, the user doesn't have to debug the code manually. The Correct with AI button becomes active, and clicking it will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new prompt for the AI that includes both the incorrect code and the specific validation error message from Terraform.&lt;/li&gt;
&lt;li&gt;Ask the AI to act as a code correction expert and fix the error.&lt;/li&gt;
&lt;li&gt;The corrected code is then returned and placed back into the editor, ready for re-validation.&lt;/li&gt;
&lt;/ol&gt;
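
&lt;p&gt;Reusing the client from the generation sketch, the correction step might look like this (correct_hcl is again a hypothetical helper name):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def correct_hcl(bad_code: str, error_message: str, provider: str) -&amp;gt; str:
    """Feed the failing code plus Terraform's exact error back to the model."""
    prompt = (
        f"This Terraform code for {provider} failed validation.\n\n"
        f"Code:\n{bad_code}\n\n"
        f"Validation error:\n{error_message}\n\n"
        "Act as a code correction expert: fix the error and return only HCL."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;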

&lt;p&gt;This Terraform Code Assistant is a powerful example of how AI can be integrated into DevOps workflows to boost productivity and lower the barrier to entry for managing cloud infrastructure. By combining a smart, automated setup, a clean user interface, and a powerful generate-validate-correct loop, this Streamlit app transforms a complex task into a simple, conversational experience.&lt;/p&gt;

&lt;p&gt;Try this Terraform code-generation AI agent app:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://terraformcodegen-selvapal-poc.streamlit.app" rel="noopener noreferrer"&gt;https://terraformcodegen-selvapal-poc.streamlit.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repo : &lt;a href="https://github.com/selvakumarsai/terraformcodegen" rel="noopener noreferrer"&gt;https://github.com/selvakumarsai/terraformcodegen&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy Terraforming!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>A Deep Dive into CrewAI and Agentic Design</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Wed, 11 Jun 2025 23:31:37 +0000</pubDate>
      <link>https://dev.to/selvapal/a-deep-dive-into-crewai-and-agentic-design-46kj</link>
      <guid>https://dev.to/selvapal/a-deep-dive-into-crewai-and-agentic-design-46kj</guid>
      <description>&lt;p&gt;&lt;strong&gt;Mastering Mock Interviews with AI:&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Deep Dive into CrewAI and Agentic Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Are you preparing for a technical interview and wishing you had a personalized, intelligent interviewer to practice with? Look no further! My latest project, ai_mock_interview, demonstrates a powerful application of AI agents using the CrewAI framework to create a dynamic and realistic mock interview experience.&lt;/p&gt;

&lt;p&gt;This blog post will walk you through the core components of the ai_mock_interview project, highlighting how specific Python functions are designed to act as intelligent agents and how they collaborate within a "crew" to deliver a comprehensive mock interview and feedback session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Power of Agentic AI with CrewAI&lt;/strong&gt;&lt;br&gt;
Before diving into the code, let's briefly understand the underlying magic. CrewAI is an open-source framework for orchestrating role-playing autonomous AI agents. It allows you to define agents with specific roles, backstories, and goals, and then assign tasks to them. These agents can communicate, delegate, and collaborate to achieve a common objective, mimicking a real-world team.&lt;br&gt;
In the ai_mock_interview project, there are several agents, each responsible for a distinct phase of the interview process.&lt;/p&gt;

&lt;p&gt;Refer to my GitHub repo: &lt;a href="https://github.com/selvakumarsai/ai_mock_interview" rel="noopener noreferrer"&gt;https://github.com/selvakumarsai/ai_mock_interview&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/selvakumarsai/ai_mock_interview/blob/main/interview_practice_system.py" rel="noopener noreferrer"&gt;https://github.com/selvakumarsai/ai_mock_interview/blob/main/interview_practice_system.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's break down the key Python functions and classes. At a high level, the interview flows through six stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Preparation: Research the company and role, then generate a primary interview question.&lt;/li&gt;
&lt;li&gt;Concurrent Follow-up Generation: While the user answers, a follow-up question is already being prepared in the background.&lt;/li&gt;
&lt;li&gt;User Interaction: The main question is presented, and the user's answer is captured.&lt;/li&gt;
&lt;li&gt;Initial Evaluation: The user's first answer is evaluated against the model answer, providing immediate feedback.&lt;/li&gt;
&lt;li&gt;Dynamic Follow-up: The pre-generated follow-up question is presented, allowing for a deeper assessment.&lt;/li&gt;
&lt;li&gt;Final Evaluation: The follow-up answer is also evaluated, completing the mock interview cycle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This multi-stage, agent-driven approach provides a robust, interactive, and highly valuable tool for anyone looking to sharpen their technical interview skills. By leveraging AI agents, we've created a system that is not only automated but also intelligent and adaptable, simulating the dynamic nature of real-world interviews.&lt;br&gt;
Check out the script to see this sophisticated agentic design in action!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defining the Output Structure:&lt;/strong&gt; QuestionAnswerPair&lt;br&gt;
Before delving into the agents, we define a Pydantic BaseModel to structure the output of our question-generating agents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class QuestionAnswerPair(BaseModel):
    """Schema for the question and its correct answer."""
    question: str = Field(..., description="The technical question to be asked")
    correct_answer: str = Field(..., description="The correct answer to the question")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This QuestionAnswerPair class ensures that when an agent generates a question, it also provides the correct_answer in a standardized format, which is crucial for the evaluation phase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Company Research Specialist Agent (company_researcher)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This agent is the intelligence gatherer, laying the groundwork for relevant questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Object:&lt;/strong&gt; company_researcher (an instance of Agent).&lt;br&gt;
&lt;strong&gt;Role:&lt;/strong&gt; "Company Research Specialist"&lt;br&gt;
&lt;strong&gt;Backstory:&lt;/strong&gt; "You are an expert in researching companies and creating technical interview questions. You have deep knowledge of tech industry hiring practices and can create relevant questions that test both theoretical knowledge and practical skills."&lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; "Gather information about the company and create interview questions with answers"&lt;br&gt;
&lt;strong&gt;Tools:&lt;/strong&gt; It's equipped with SerperDevTool(), allowing it to perform web searches to gather company-specific information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search_tool = SerperDevTool()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;company_researcher = Agent(
    role="Company Research Specialist",
    goal="Gather information about the company and create interview questions with answers",
    backstory="""You are an expert in researching companies and creating technical interview questions.
    You have deep knowledge of tech industry hiring practices and can create relevant
    questions that test both theoretical knowledge and practical skills.""",
    tools=[search_tool],
    verbose=True,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration:&lt;/strong&gt; This agent is assigned the create_company_research_task, which uses its research capabilities to provide a summary of the company's technical requirements and interview process. This output then informs the question_preparer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_company_research_task(company_name: str, role: str, difficulty: str) -&amp;gt; Task:
    return Task(
        description=f"""Research {company_name} and gather information about:
        1. Their technical interview process
        2. Common interview questions for {role} positions at {difficulty} difficulty level
        3. Technical stack and requirements

        Provide a summary of your findings.""",
        expected_output="A report about the company's technical requirements and interview process",
        agent=company_researcher,
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. The Question and Answer Preparer Agent (question_preparer)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This agent is the content creator, responsible for crafting the interview questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python Object:&lt;/strong&gt; question_preparer (an instance of Agent).&lt;br&gt;
&lt;strong&gt;Role:&lt;/strong&gt; "Question and Answer Preparer"&lt;br&gt;
&lt;strong&gt;Backstory:&lt;/strong&gt; "You are an experienced technical interviewer who knows how to create challenging yet fair technical questions and provide detailed model answers. You understand how to assess different skill levels and create questions that test both theoretical knowledge and practical problem-solving abilities."&lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; "Prepare comprehensive questions with model answers"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;question_preparer = Agent(
    role="Question and Answer Preparer",
    goal="Prepare comprehensive questions with model answers",
    backstory="""You are an experienced technical interviewer who knows how to create
    challenging yet fair technical questions and provide detailed model answers.
    You understand how to assess different skill levels and create questions that
    test both theoretical knowledge and practical problem-solving abilities.""",
    verbose=True,
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration:&lt;/strong&gt; This agent is assigned the create_question_preparation_task. It takes the research from the company_researcher and then generates a technical question at the specified difficulty, along with a comprehensive model answer, adhering to the QuestionAnswerPair Pydantic schema for structured output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_question_preparation_task(difficulty: str) -&amp;gt; Task:
    return Task(
        description=f"""Based on the company research, create:
        1. A technical question at {difficulty} difficulty level that tests both theory and practice
        2. A comprehensive model answer that covers all key points
        3. Key points to look for in candidate answers

        The question should be appropriate for {difficulty} difficulty level - challenging but fair, and the answer should be detailed.""",
        expected_output="A question and its correct answer",
        output_pydantic=QuestionAnswerPair,
        agent=question_preparer,
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. The Answer Evaluator Agent (answer_evaluator)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This agent is the critic, providing crucial feedback on the candidate's answers.&lt;br&gt;
&lt;strong&gt;Python Object:&lt;/strong&gt; answer_evaluator (an instance of Agent).&lt;br&gt;
&lt;strong&gt;Role:&lt;/strong&gt; "Answer Evaluator"&lt;br&gt;
&lt;strong&gt;Backstory:&lt;/strong&gt; "You are a senior technical interviewer who evaluates answers against the expected solution. You know how to identify if an answer is technically correct and complete."&lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; "Evaluate if the given answer is correct for the question"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;answer_evaluator = Agent(
    role="Answer Evaluator",
    goal="Evaluate if the given answer is correct for the question",
    backstory="""You are a senior technical interviewer who evaluates answers
    against the expected solution. You know how to identify if an answer is
    technically correct and complete.""",
    verbose=True,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration:&lt;/strong&gt; This agent is central to the create_evaluation_task. It receives the original question, the user's answer, and the correct answer, then provides a detailed evaluation, including correctness, key points covered/missing, and an explanation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_evaluation_task(
    question: str, user_answer: str, correct_answer: str
) -&amp;gt; Task:
    return Task(
        description=f"""Evaluate if the given answer is correct for the question:
        Question: {question}
        Answer: {user_answer}
        Correct Answer: {correct_answer}
        Provide:
        1. Whether the answer is correct (Yes/No)
        2. Key points that were correct or missing
        3. A brief explanation of why the answer is correct or incorrect""",
        expected_output="Evaluation of whether the answer is correct for the question with feedback",
        agent=answer_evaluator,
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. The Follow-up Question Specialist Agent (follow_up_questioner)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This agent simulates a dynamic interview by generating follow-up questions.&lt;br&gt;
&lt;strong&gt;Python Object:&lt;/strong&gt; follow_up_questioner (an instance of Agent).&lt;br&gt;
&lt;strong&gt;Role:&lt;/strong&gt; "Follow-up Question Specialist"&lt;br&gt;
&lt;strong&gt;Backstory:&lt;/strong&gt; "You are an expert technical interviewer who knows how to create meaningful follow-up questions that probe deeper into a candidate's knowledge and understanding. You can create questions that build upon previous answers and test different aspects of the candidate's technical expertise."&lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; "Create relevant follow-up questions based on the context"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;follow_up_questioner = Agent(
    role="Follow-up Question Specialist",
    goal="Create relevant follow-up questions based on the context",
    backstory="""You are an expert technical interviewer who knows how to create
    meaningful follow-up questions that probe deeper into a candidate's knowledge
    and understanding. You can create questions that build upon previous answers
    and test different aspects of the candidate's technical expertise.""",
    verbose=True,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration:&lt;/strong&gt; This agent is tasked by create_follow_up_question_task. It takes the original question, company context, role, and difficulty to craft a new question that deepens the assessment, also returning its output as a QuestionAnswerPair.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_follow_up_question_task(
    question: str, company_name: str, role: str, difficulty: str
) -&amp;gt; Task:
    return Task(
        description=f"""Based on the following context, create a relevant follow-up question:
        Original Question: {question}
        Company: {company_name}
        Role: {role}
        Difficulty Level: {difficulty}

        Create a follow-up question that:
        1. Builds upon the original question
        2. Tests deeper understanding of the topic
        3. Is appropriate for the specified difficulty level
        4. Is relevant to the company and role

        The follow-up question should be challenging but fair, and should help
        assess the candidate's technical depth and problem-solving abilities.""",
        expected_output="A follow-up question that builds upon the original question",
        output_pydantic=QuestionAnswerPair,
        agent=follow_up_questioner,
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Orchestrating the Interview&lt;/strong&gt;&lt;br&gt;
The system uses two main crews, one for preparation and one for evaluation, plus an additional crew that generates follow-up questions concurrently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A. The preparation_crew (Question Preparation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents:&lt;/strong&gt; company_researcher, question_preparer&lt;br&gt;
&lt;strong&gt;Tasks:&lt;/strong&gt;&lt;br&gt;
     create_company_research_task: company_researcher researches the  company, role, and difficulty.&lt;br&gt;
    create_question_preparation_task: question_preparer uses the research to generate the primary technical question and its correct answer.&lt;br&gt;
&lt;strong&gt;Process:&lt;/strong&gt; Process.sequential – the tasks are executed one after another.&lt;br&gt;
&lt;strong&gt;Outcome:&lt;/strong&gt; This crew's kickoff() method returns a QuestionAnswerPair object containing the main question and its correct answer, ready to be presented to the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    preparation_crew = Crew(
        agents=[company_researcher, question_preparer],
        tasks=[
            create_company_research_task(company_name, role, difficulty),
            create_question_preparation_task(difficulty),
        ],
        process=Process.sequential,
        verbose=True,
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;B. The evaluation_crew (Answer Evaluation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents:&lt;/strong&gt; answer_evaluator&lt;br&gt;
&lt;strong&gt;Tasks:&lt;/strong&gt;&lt;br&gt;
    create_evaluation_task: answer_evaluator assesses the user's provided answer against the expected correct answer for the initial question.&lt;br&gt;
&lt;strong&gt;Process:&lt;/strong&gt; Process.sequential&lt;br&gt;
&lt;strong&gt;Outcome:&lt;/strong&gt; This crew's kickoff() provides a detailed textual evaluation of the user's response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    evaluation_crew = Crew(
        agents=[answer_evaluator],
        tasks=[
            create_evaluation_task(
                question=preparation_result.pydantic.question,
                user_answer=user_answer,
                correct_answer=preparation_result.pydantic.correct_answer,
            )
        ],
        process=Process.sequential,
        verbose=True,
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;C. The follow_up_crew (Follow-up Question Generation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This crew is created and kicked off asynchronously using asyncio, meaning it runs in the background while the user answers the main question.&lt;br&gt;
&lt;strong&gt;Agents:&lt;/strong&gt;follow_up_questioner&lt;br&gt;
&lt;strong&gt;Tasks:&lt;/strong&gt;&lt;br&gt;
    create_follow_up_question_task: The follow_up_questioner generates a relevant follow-up question based on the initial question and context.&lt;br&gt;
&lt;strong&gt;Process:&lt;/strong&gt; Process.sequential&lt;br&gt;
&lt;strong&gt;Outcome:&lt;/strong&gt; By the time the user finishes answering the first question, the follow_up_question_task (which is an asyncio.Task) is awaited, and its result (a QuestionAnswerPair for the follow-up) is retrieved. This allows for a more seamless interview flow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    follow_up_question_task = asyncio.create_task(
        generate_follow_up_question(
            question=preparation_result.pydantic.question,
            company_name=company_name,
            role=role,
            difficulty=difficulty,
        )
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The start_interview_practice&lt;/strong&gt; function orchestrates these crews. It first runs the preparation_crew, then concurrently initiates the follow_up_crew while prompting the user for an answer to the main question. Once the user answers, the evaluation_crew assesses the first answer. Finally, it presents the pre-generated follow-up question and evaluates the user's second response.&lt;/p&gt;
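
&lt;p&gt;For reference, here is a condensed, simplified sketch of how start_interview_practice might tie the crews together. It reuses the helpers shown above and assumes generate_follow_up_question runs its crew without blocking the event loop (for example via CrewAI's kickoff_async); the real function in the repo is more complete:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio

async def start_interview_practice(company_name: str, role: str, difficulty: str) -&amp;gt; None:
    # 1. Preparation: research the company and draft the main question.
    preparation_result = preparation_crew.kickoff()
    main_pair = preparation_result.pydantic  # a QuestionAnswerPair

    # 2. Start the follow-up crew in the background.
    follow_up_task = asyncio.create_task(
        generate_follow_up_question(
            question=main_pair.question,
            company_name=company_name, role=role, difficulty=difficulty,
        )
    )

    # 3. Ask the user (in a thread, so the background task keeps running),
    #    then evaluate the first answer.
    user_answer = await asyncio.to_thread(input, f"{main_pair.question}\nYour answer: ")
    evaluation_crew = Crew(
        agents=[answer_evaluator],
        tasks=[create_evaluation_task(main_pair.question, user_answer,
                                      main_pair.correct_answer)],
        process=Process.sequential,
    )
    print(evaluation_crew.kickoff())

    # 4. The follow-up question is ready by now; present and evaluate it too.
    follow_up_pair = (await follow_up_task).pydantic
    follow_up_answer = await asyncio.to_thread(
        input, f"{follow_up_pair.question}\nYour answer: "
    )
    # ...the final evaluation mirrors step 3 with follow_up_pair.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;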

</description>
      <category>genai</category>
      <category>crewai</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>MCP — Azure CLI integration</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Sun, 06 Apr 2025 07:28:11 +0000</pubDate>
      <link>https://dev.to/selvapal/mcp-azure-cli-integration-of5</link>
      <guid>https://dev.to/selvapal/mcp-azure-cli-integration-of5</guid>
      <description>&lt;p&gt;You’ve probably heard of Model Context Protocol (MCP), which has recently attracted a lot of attention from the AI community.&lt;/p&gt;

&lt;p&gt;We will discuss what MCP is and why it is important in this post.&lt;br&gt;
Think of MCP as “USB-C for AI integrations,” an open standard that enables consistent connections between AI models and a wide range of applications and data sources. &lt;/p&gt;

&lt;p&gt;Simply put, MCP eliminates the need for separate adapters or unique code for each software application by allowing an AI assistant to communicate with them all using a standard language.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp20jdt5eigqx9aaskcgj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp20jdt5eigqx9aaskcgj.png" alt="Image description" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MCP functions similarly to providing your AI assistant with a universal remote control for all of your electronic gadgets and services.&lt;/p&gt;

&lt;p&gt;Using an AI assistant with external tools without MCP is similar to having a number of appliances, each with its own plug and no universal outlet. Everywhere, developers had to contend with disjointed integrations. Your AI IDE may, for instance, employ one technique to pull code from GitHub, another to retrieve information from a database, and a third to automate a design tool; each of these integrations requires a unique adapter. &lt;/p&gt;

&lt;p&gt;This is not only time-consuming, but it is also fragile and non-scalable.&lt;/p&gt;

&lt;p&gt;MCP is based on a client-server architecture, which allows a host programme to communicate with several servers.&lt;/p&gt;

&lt;p&gt;Three main parts form the framework of the protocol:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Host&lt;/li&gt;
&lt;li&gt;Client&lt;/li&gt;
&lt;li&gt;Server&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we get into detail on each, here is a high-level summary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Server:&lt;/strong&gt; The server functions as an internal translator for the app; it interprets a natural language request (from an AI) and carries out the corresponding action within the app.&lt;br&gt;
These adapters are small and lightweight, each wrapping a particular programme or service, and they make that application's functionality (its “services”) uniformly available. An MCP server's responsibilities include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Discovery:&lt;/strong&gt; They describe what actions and capabilities the application offers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Command parsing:&lt;/strong&gt; They convert incoming AI instructions into exact API calls or application commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response formatting:&lt;/strong&gt; Involves taking the data, confirmation messages, and other output from the app and formatting it so the AI model can interpret it. Typically, this is done as text or structured data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error Handling :&lt;/strong&gt; Handle exceptions and erroneous requests and provide helpful error messages.&lt;/p&gt;
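
&lt;p&gt;To make these responsibilities concrete, here is a minimal sketch of an MCP server in Python using the official mcp SDK's FastMCP helper. The azure-cli-mcp project discussed below is written in Java; this sketch, including the run_az tool name and the bare-bones az wrapper, is purely illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("azure-cli")  # illustrative server name

@mcp.tool()
def run_az(command: str) -&amp;gt; str:
    """Tool discovery: this docstring is what the AI sees. Runs 'az ...'."""
    # Command parsing: turn the AI's instruction into an exact CLI call.
    result = subprocess.run(
        ["az"] + command.split(), capture_output=True, text=True,
    )
    # Response formatting + error handling: plain text the model can read.
    if result.returncode != 0:
        return f"Error: {result.stderr}"
    return result.stdout

if __name__ == "__main__":
    # stdio transport: the client launches this process and talks over stdin/stdout.
    mcp.run(transport="stdio")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;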

&lt;p&gt;&lt;strong&gt;MCP Client:&lt;/strong&gt; The AI assistant includes an MCP client component. This client maintains a 1:1 connection to an MCP server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fforabewe0haar81vkg70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fforabewe0haar81vkg70.png" alt="Image description" width="800" height="833"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have played with the Azure CLI MCP Server, an MCP server that wraps the Azure CLI, adds a prompt that improves how it works, and exposes it to the AI assistant.&lt;/p&gt;

&lt;p&gt;It has access to the full Azure CLI, so it can do anything the Azure CLI can do. This MCP server currently only works with the stdio transport, so it should run locally on your machine, using your Azure CLI credentials.&lt;br&gt;
This server can run as a Java application or inside a Docker container.&lt;/p&gt;

&lt;p&gt;Install Claude Desktop:&lt;br&gt;
&lt;a href="https://claude.ai/download" rel="noopener noreferrer"&gt;https://claude.ai/download&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kxrx2wcv8d9owvbi388.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kxrx2wcv8d9owvbi388.png" alt="Image description" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install and configure the server with Java&lt;/p&gt;

&lt;p&gt;Install the Azure CLI: you can do this by following the instructions here (&lt;a href="https://learn.microsoft.com/en-us/cli/azure/install-azure-cli" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/cli/azure/install-azure-cli&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Authenticate to your Azure account. You can do this by running az login in your terminal.&lt;/p&gt;

&lt;p&gt;Make sure you have Java 17 or higher installed&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Download azure-cli-mcp&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/jdubois/azure-cli-mcp/releases" rel="noopener noreferrer"&gt;https://github.com/jdubois/azure-cli-mcp/releases&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Download the latest release: gh release download --repo jdubois/azure-cli-mcp --pattern='azure-cli-mcp.jar'&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; % gh release download --repo jdubois/azure-cli-mcp --pattern='azure-cli-mcp.jar'
To get started with GitHub CLI, please run:  gh auth login
Alternatively, populate the GH_TOKEN environment variable with a GitHub API authentication token.

 % gh auth login
? Where do you use GitHub? GitHub.com
? What is your preferred protocol for Git operations on this host? HTTPS
? Authenticate Git with your GitHub credentials? Yes
? How would you like to authenticate GitHub CLI? Login with a web browser

! First copy your one-time code: xxxxxxxx
Press Enter to open https://github.com/login/device in your browser... 
✓ Authentication complete.
- gh config set -h github.com git_protocol https
✓ Configured git protocol
✓ Logged in as xxxxxxxxxxxxx

 % gh release download --repo jdubois/azure-cli-mcp --pattern='azure-cli-mcp.jar'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use the server from Claude Desktop, add the server to your claude_desktop_config.json file. Please note that you need to point to the location where you downloaded the azure-cli-mcp.jar file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;% open ~/Library/Application\ Support/Claude

% touch ~/Library/Application\ Support/Claude/claude_desktop_config.json

% vi ~/Library/Application\ Support/Claude/claude_desktop_config.json

{
    "mcpServers": {
        "azure-cli": {
            "command": "java",
            "args": [
                "-jar",
              "~/Downloads/azure-cli-mcp.jar"
            ]
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an Azure resource group and storage blob using Claude Desktop.&lt;/p&gt;

&lt;p&gt;Open Claude Desktop.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfw777ipi9l8crwz5ndl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfw777ipi9l8crwz5ndl.png" alt="Image description" width="800" height="601"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts03s5jc3xhhei34v5e4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts03s5jc3xhhei34v5e4.png" alt="Image description" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0tnk36g1j3kdzdmz3e0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0tnk36g1j3kdzdmz3e0.png" alt="Image description" width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0koeox95lxd1q5fooxk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0koeox95lxd1q5fooxk.png" alt="Image description" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr7ridhoe2n28gyf0da8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr7ridhoe2n28gyf0da8.png" alt="Image description" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18ub55vczx7iiuqokqxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18ub55vczx7iiuqokqxh.png" alt="Image description" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Is it secure? No. This MCP server executes az commands for you, so an attacker might use it to execute any other command. The current implementation only supports the stdio transport, like the majority of MCP servers now in use. It is designed to operate locally on your computer using your Azure CLI credentials, much like you would if you were doing it yourself.&lt;br&gt;
It is entirely feasible to have this MCP server support the HTTP transport and Azure token authentication in the future, allowing for remote use by multiple users. That step will come once the SDK and the MCP standard are more stable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>mcp</category>
      <category>azure</category>
    </item>
    <item>
      <title>AWS Q Developer</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Fri, 28 Mar 2025 09:54:59 +0000</pubDate>
      <link>https://dev.to/selvapal/aws-q-developer-3ejn</link>
      <guid>https://dev.to/selvapal/aws-q-developer-3ejn</guid>
      <description>&lt;p&gt;AWS Q Developer is an AI-powered coding assistant designed to help developers write, debug, and optimize code more efficiently. It offers real-time code suggestions, automatic documentation generation, and AWS service integrations, making cloud development more intuitive and productive.&lt;/p&gt;

&lt;p&gt;Unlike generic AI code assistants, AWS Q Developer is specifically tailored for AWS environments, providing optimized recommendations for AWS SDKs, APIs, and cloud services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-Driven Code Suggestions&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Offers real-time, context-aware code completions.&lt;/li&gt;
&lt;li&gt;Provides function recommendations based on natural language comments.&lt;/li&gt;
&lt;li&gt;Supports multiple programming languages, including Python, Java, JavaScript, TypeScript, and C#.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;AWS Service Integrations&lt;/strong&gt;   &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Helps developers efficiently integrate AWS services like Lambda,    S3, DynamoDB, EC2, and API Gateway.&lt;/li&gt;
&lt;li&gt;   Suggests optimal configurations for AWS resources.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Security Scanning and Best Practices&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identifies vulnerabilities and misconfigurations in code.&lt;/li&gt;
&lt;li&gt;Provides security best practices based on AWS Well-Architected          Framework.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Automated Code Refactoring and Optimisation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Improves existing code by suggesting optimizations.&lt;/li&gt;
&lt;li&gt;Helps migrate legacy code to modern AWS cloud architectures.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Natural Language Query Support&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Developers can ask questions in plain English, and AWS Q Developer retrieves relevant AWS documentation, code snippets, or troubleshooting steps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Seamless IDE Integration&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Works with VS Code, IntelliJ IDEA, AWS Cloud9, and JetBrains IDEs.&lt;/li&gt;
&lt;li&gt;Provides in-line AI assistance within the developer's workflow.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Setting Up AWS Q Developer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: Install AWS Q Developer in Your IDE&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Open VS Code, IntelliJ IDEA, or AWS Cloud9.&lt;/li&gt;
&lt;li&gt; Install the AWS Toolkit extension.&lt;/li&gt;
&lt;li&gt; Enable AWS Q Developer within the AWS Toolkit settings.&lt;/li&gt;
&lt;li&gt; Sign in with your AWS IAM credentials.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 2: Start Using AWS Q Developer for Code Assistance&lt;br&gt;
Write comments describing a function, and AWS Q Developer will generate the corresponding code.&lt;br&gt;
Type incomplete functions, and AWS Q will autocomplete them based on best practices.&lt;/p&gt;

&lt;p&gt;Step 3: Use AWS Q Developer for Security and Best Practices&lt;br&gt;
Run security scans to detect vulnerabilities in AWS SDK usage.&lt;br&gt;
Get AWS Well-Architected Framework recommendations for cloud applications.&lt;/p&gt;

&lt;p&gt;Step 4: Automate Code Documentation&lt;br&gt;
Generate API documentation by running AWS Q Developer’s documentation feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using AWS Q Developer in Python&lt;/strong&gt;&lt;br&gt;
Below is an example of AWS Q Developer generating a function to upload a file to Amazon S3:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

def upload_file_to_s3(bucket_name, file_path, object_name):
    """Uploads a file to an S3 bucket."""
    s3 = boto3.client('s3')
    try:
        s3.upload_file(file_path, bucket_name, object_name)
        print(f"File {file_path} uploaded to {bucket_name}/{object_name}")
    except Exception as e:
        print(f"Error uploading file: {e}")

upload_file_to_s3("my-bucket", "localfile.txt", "uploaded-file.txt")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;AWS Q Developer can generate this function automatically from a simple comment like:&lt;/p&gt;

&lt;p&gt;Function to upload a file to Amazon S3&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AWS Bedrock Guardrails: Enhancing AI Security and Compliance</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Fri, 14 Mar 2025 09:42:27 +0000</pubDate>
      <link>https://dev.to/selvapal/aws-bedrock-guardrails-enhancing-ai-security-and-compliance-57mk</link>
      <guid>https://dev.to/selvapal/aws-bedrock-guardrails-enhancing-ai-security-and-compliance-57mk</guid>
      <description>&lt;p&gt;AWS Bedrock Guardrails is a set of security and content moderation tools designed to help organizations govern the use of generative AI models within Amazon Bedrock. It enables developers to define policies that restrict harmful, biased, or non-compliant content generation while maintaining the flexibility to build customized AI-driven applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Filtering and Moderation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically detects and blocks harmful, offensive, or inappropriate content in AI-generated responses.&lt;/li&gt;
&lt;li&gt;Supports configurable thresholds to determine the severity of filtered content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Customisable Policy Enforcement&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows businesses to define domain-specific restrictions to prevent AI-generated content that violates organizational policies.&lt;/li&gt;
&lt;li&gt;Policies can be fine-tuned based on industry-specific compliance needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bias and Ethical AI Governance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detects and mitigates potential biases in AI-generated outputs.&lt;/li&gt;
&lt;li&gt;Supports ethical AI principles by ensuring fair and unbiased content generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Logging and Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrates with AWS CloudWatch and AWS Audit Manager to log AI model responses for compliance auditing.&lt;/li&gt;
&lt;li&gt;Enables tracking and review of AI-generated outputs for continuous improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Seamless Integration with AWS Services&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with Amazon Bedrock models such as Anthropic Claude, AI21, Stability AI, and others.&lt;/li&gt;
&lt;li&gt;Can be integrated with Amazon Lex, Amazon Kendra, and AWS Lambda for extended use cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Improved Security: Prevents harmful content generation in AI applications.&lt;/li&gt;
&lt;li&gt;Regulatory Compliance: Helps organizations adhere to industry regulations (e.g., GDPR, HIPAA).&lt;/li&gt;
&lt;li&gt;Trust and Transparency: Builds user trust by ensuring AI-generated content is ethical and safe.&lt;/li&gt;
&lt;li&gt;Scalability: Works across various AWS services, making it easy to scale AI governance.&lt;/li&gt;
&lt;li&gt;Customisation: Tailor policies based on specific organizational requirements.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Use Cases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise AI Chatbots&lt;/strong&gt;&lt;br&gt;
Prevents chatbots from generating inappropriate, misleading, or harmful responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Moderation in Social Media and E-commerce&lt;/strong&gt;&lt;br&gt;
Filters user-generated content for offensive language or policy violations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Healthcare and Finance Applications&lt;/strong&gt;&lt;br&gt;
Ensures AI-generated responses comply with industry regulations and ethical standards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legal and Compliance Review&lt;/strong&gt;&lt;br&gt;
Logs AI interactions for auditability and compliance checks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to Configure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Step 1: Enable AWS Bedrock Guardrails&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Log in to the AWS Management Console.&lt;/li&gt;
&lt;li&gt; Navigate to Amazon Bedrock.&lt;/li&gt;
&lt;li&gt; Under Guardrails, enable the service for your AI models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 2: Define Content Policies&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create a new Guardrails policy.&lt;/li&gt;
&lt;li&gt; Specify restricted topics, language filters, and severity levels.&lt;/li&gt;
&lt;li&gt; Apply predefined compliance templates if needed.&lt;/li&gt;
&lt;/ol&gt;
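
&lt;p&gt;Content policies can also be created programmatically with the boto3 bedrock control-plane client. A minimal sketch; the filter types, strengths, and messages below are illustrative placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

# Guardrails are managed through the "bedrock" control-plane client
bedrock = boto3.client("bedrock")

# Create a guardrail with a content filter policy (values are examples)
response = bedrock.create_guardrail(
    name="demo-guardrail",
    description="Blocks hateful or violent content",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    blockedInputMessaging="This request was blocked by policy.",
    blockedOutputsMessaging="The response was blocked by policy.",
)

print(response["guardrailId"], response["version"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;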

&lt;p&gt;Step 3: Integrate Guardrails with AI Models&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Attach the Guardrails policy to a specific AI model in Amazon Bedrock.&lt;/li&gt;
&lt;li&gt; Configure API access for AI applications.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 4: Monitor and Audit AI Responses&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Use AWS CloudWatch to monitor flagged content.&lt;/li&gt;
&lt;li&gt; Enable logging in AWS Audit Manager for compliance tracking.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below is a sample Python function, suitable for use in AWS Lambda, that invokes an Amazon Bedrock model and enforces a Guardrails policy (the model ID, guardrail ID, and version are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

def moderate_ai_response(prompt):
    bedrock_client = boto3.client("bedrock")

    response = bedrock_client.invoke_model(
        modelId="anthropic-clause-v1",
        body={"prompt": prompt}
    )

    # Apply Guardrails filtering
    guardrails_client = boto3.client("bedrock-guardrails")
    moderation_result = guardrails_client.moderate_content(
        content=response["body"]
    )

    if moderation_result["flagged"]:
        return "Content blocked due to policy violations."
    else:
        return response["body"]

# Example usage
user_input = "Tell me something controversial."
print(moderate_ai_response(user_input))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>AWS VPC Lattice</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Thu, 13 Mar 2025 07:43:15 +0000</pubDate>
      <link>https://dev.to/selvapal/aws-vpc-lattice-ajj</link>
      <guid>https://dev.to/selvapal/aws-vpc-lattice-ajj</guid>
      <description>&lt;p&gt;AWS VPC Lattice is a fully managed application networking service that simplifies service-to-service communication across VPCs and AWS accounts. It enables users to connect, secure, and monitor service-to-service communications without managing complex network topologies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cross-VPC and Cross-Account Communication: Enables seamless communication between services across multiple VPCs and AWS accounts.&lt;/li&gt;
&lt;li&gt;Service Discovery and Load Balancing: Provides automatic service discovery and distributes traffic across service instances.&lt;/li&gt;
&lt;li&gt;Centralized Access Management: Integrates with IAM to define access policies.&lt;/li&gt;
&lt;li&gt;Observability and Monitoring: Offers built-in monitoring via AWS CloudWatch and AWS X-Ray.&lt;/li&gt;
&lt;li&gt;Security and Compliance: Supports TLS encryption, authentication, and authorization mechanisms.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AWS VPC Lattice setup consists of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Services: Applications deployed in different VPCs.&lt;/li&gt;
&lt;li&gt;VPC Lattice Service Network: A logical boundary to define services and connectivity.&lt;/li&gt;
&lt;li&gt;Target Groups: Define the endpoints where requests are routed.&lt;/li&gt;
&lt;li&gt;Listeners and Rules: Define how traffic is managed and directed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpdb3sjwvkiqzpyifpnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpdb3sjwvkiqzpyifpnl.png" alt="Image description" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setup below spans three AWS accounts: service, central, and consumer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service Account – VPC Lattice service&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# VPC Lattice Module
module "vpc_lattice_service" {
  source  = "aws-ia/amazon-vpc-lattice-module/aws"
  version = "0.0.2"

  services = {
    lambdaservice = {
      name        = "lambda-service"
      auth_type   = "AWS_IAM"
      auth_policy = local.auth_policy

      listeners = {
        http_listener = {
          name     = "httplistener"
          port     = 80
          protocol = "HTTP"
          default_action_forward = {
            target_groups = {
              lambdatarget = { weight = 100 }
            }
          }
        }
      }
    }
  }

  target_groups = {
    lambdatarget = {
      type = "LAMBDA"
      targets = {
        lambdafunction = { id = aws_lambda_function.lambda.arn }
      }
    }
  }
}

# VPC Lattice service Auth Policy
locals {
  auth_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "*"
        Effect    = "Allow"
        Principal = "*"
        Resource  = "*"
      }
    ]
  })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Resource Share&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Resource Share
resource "aws_ram_resource_share" "resource_share" {
  name                      = "Amazon VPC Lattice service"
  allow_external_principals = true
}

# Principal Association
resource "aws_ram_principal_association" "principal_association" {
  principal          = var.central_aws_account
  resource_share_arn = aws_ram_resource_share.resource_share.arn
}

# Resource Association - VPC Lattice service
resource "aws_ram_resource_association" "lattice_service_share" {
  for_each = module.vpc_lattice_service.services

  resource_arn       = each.value.attributes.arn
  resource_share_arn = aws_ram_resource_share.resource_share.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Central Account – VPC Lattice service network&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# VPC Lattice Module
module "vpclattice_service_network" {
  source  = "aws-ia/amazon-vpc-lattice-module/aws"
  version = "0.0.2"

  service_network = {
    name        = "centralized-service-network"
    auth_type   = "AWS_IAM"
    auth_policy = local.auth_policy
  }

  services = { for k, v in var.lattice_services: k =&amp;gt; { identifier = v } }

  depends_on = [aws_ram_resource_share_accepter.share_accepter]
}

# VPC Lattice service network Auth Policy
locals {
  auth_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "*"
        Effect    = "Allow"
        Principal = "*"
        Resource  = "*"
      }
    ]
  })
}

# Accepting VPC Lattice services from Service AWS Account
resource "aws_ram_resource_share_accepter" "share_accepter" {
  share_arn = local.lattice_services.ram_share
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Resource Share&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Resource Share
resource "aws_ram_resource_share" "resource_share" {
  name                      = "Amazon VPC Lattice service network"
  allow_external_principals = true
}

# Principal Association
resource "aws_ram_principal_association" "principal_association" {
  principal          = var.consumer_aws_account
  resource_share_arn = aws_ram_resource_share.resource_share.arn
}

# Resource Association - VPC Lattice service network
resource "aws_ram_resource_association" "lattice_service_network_share" {
  resource_arn       = module.vpclattice_service_network.service_network.arn
  resource_share_arn = aws_ram_resource_share.resource_share.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Consumer Account – VPC Lattice VPC association&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module "vpc_lattice_vpc_association" {
  source  = "aws-ia/amazon-vpc-lattice-module/aws"
  version = "0.0.2"

  service_network = { identifier = var.service_network }

  vpc_associations = {
    vpc1 = {
      vpc_id             = module.vpc1.vpc_attributes.id
      security_group_ids = [aws_security_group.vpc1_lattice_sg.id]
    }
  }

  depends_on = [
    aws_ram_resource_share_accepter.share_accepter
  ]
}

module "vpc1" {
  source  = "aws-ia/vpc/aws"
  version = "4.3.0"

  name       = "vpc1"
  cidr_block = "10.0.0.0/24"
  az_count   = 2

  subnets = {
    workload  = { netmask = 28 }
    endpoints = { netmask = 28 }
  }
}

resource "aws_security_group" "vpc1_lattice_sg" {
  name        = "lattice-sg-vpc1"
  description = "VPC Lattice SG - VPC1"
  vpc_id      = module.vpc1.vpc_attributes.id

  ingress {
    description = "HTTPS access"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/24"] 
  }

  egress {
    description = "Any traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Accepting VPC Lattice service network from Central AWS Account
resource "aws_ram_resource_share_accepter" "share_accepter" {
  share_arn = local.service_network.ram_share
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;VPC Module&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module "vpc2" {
  source  = "aws-ia/vpc/aws"
  version = "4.3.0"

  name       = "vpc2"
  cidr_block = "10.0.0.0/24"
  az_count   = 2

  vpc_lattice = {
    service_network_identifier = var.service_network_id
    security_group_ids         = [aws_security_group.vpc2_lattice_sg.id]
  }

  subnets = {
    workload  = { netmask = 28 }
    endpoints = { netmask = 28 }
  }

  depends_on = [
    aws_ram_resource_share_accepter.share_accepter
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
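
&lt;p&gt;Because the service and service network use AWS_IAM auth, a client inside an associated VPC must SigV4-sign its requests for the vpc-lattice-svcs service. A minimal Python sketch; the service DNS name and region are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import botocore.session
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Placeholder: the DNS name VPC Lattice assigns to the service
SERVICE_URL = "https://lambda-service-0123456789abcdef.7d67968.vpc-lattice-svcs.eu-west-1.on.aws/"
REGION = "eu-west-1"

# Sign the request with SigV4 for the "vpc-lattice-svcs" service
credentials = botocore.session.get_session().get_credentials()
request = AWSRequest(method="GET", url=SERVICE_URL)
SigV4Auth(credentials, "vpc-lattice-svcs", REGION).add_auth(request)

# Send it with the signed headers (must run from an associated VPC)
response = requests.get(SERVICE_URL, headers=dict(request.headers))
print(response.status_code, response.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;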



</description>
      <category>aws</category>
      <category>network</category>
      <category>terraform</category>
      <category>vpc</category>
    </item>
    <item>
      <title>How to run Llama model locally on MacBook Pro and Function calling in LLM -Llama web search agent breakdown</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Wed, 03 Jul 2024 12:36:13 +0000</pubDate>
      <link>https://dev.to/selvapal/how-to-run-llama-model-locally-on-macbook-pro-and-function-calling-in-llm-llama-web-search-agent-breakdown-12dl</link>
      <guid>https://dev.to/selvapal/how-to-run-llama-model-locally-on-macbook-pro-and-function-calling-in-llm-llama-web-search-agent-breakdown-12dl</guid>
      <description>&lt;p&gt;Easily install Open source Large Language Models (LLM) locally on your Mac with Ollama.On a basic M1 Pro Macbook with 16GB memory, this configuration takes approximately 10 to 15 minutes to get going. The model itself starts up in less than ten seconds after the setup is finished.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &amp;gt;&amp;gt; &lt;a href="https://ollama.com/download/mac" rel="noopener noreferrer"&gt;https://ollama.com/download/mac&lt;/a&gt; and download the software for your OS&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdae5p6r6lgelusbmrhw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdae5p6r6lgelusbmrhw.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2. Open the downloaded Ollama application and move it to the Applications folder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvqnbrc9qwechygqivfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvqnbrc9qwechygqivfu.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3. Run the model from a terminal:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

ollama run llama3


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Depending on your network speed, the download and build take ten to fifteen minutes to finish. It is operating properly if the browser displays "Ollama is running" when you visit &lt;a href="http://localhost:11434" rel="noopener noreferrer"&gt;http://localhost:11434&lt;/a&gt;.&lt;/p&gt;
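
&lt;p&gt;The same local endpoint also exposes a simple REST API, so you can call the model programmatically. A minimal Python sketch, assuming the llama3 model has been pulled as above (requires the requests package):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

# Ollama serves a local REST API on port 11434
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)

# With stream=False the full completion arrives in one JSON object
print(resp.json()["response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;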

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


(venv) admin@admins-MacBook-Pro selvapal % ollama run llama3
&amp;gt;&amp;gt;&amp;gt; list the climate chnage reasons 
Here are some of the main reasons contributing to climate change:

1. **Greenhouse gases**: The burning of fossil fuels such as coal, oil, and gas releases carbon dioxide (CO2), methane (CH4), and other greenhouse gases into the atmosphere, trapping heat 
and leading to global warming.
2. **Deforestation**: The clearance of forests for agriculture, urbanization, and logging reduces the ability of trees to absorb CO2, a key greenhouse gas, contributing to climate change.
3. **Land use changes**: Changes in land use, such as the conversion of natural habitats into agricultural lands or cities, can lead to increased emissions of greenhouse gases and loss of 
carbon sequestration capabilities.
4. **Agriculture**: The production of meat, especially beef, and other animal products leads to increased methane emissions and deforestation for livestock grazing and feed crop 
production.
5. **Industrial processes**: Industrial activities such as cement production, steel manufacturing, and chemical processing release large amounts of greenhouse gases into the atmosphere.
6. **Population growth**: As the global population grows, so does the demand for energy, food, and resources, leading to increased emissions and consumption.
7. **Consumption patterns**: The way we live our daily lives, including transportation, diet, and consumer choices, contributes to climate change through energy use, waste generation, and 
resource depletion.
8. **Waste management**: Inadequate waste management practices, such as the disposal of plastic waste in landfills or oceans, can lead to methane emissions and contribute to climate 
change.
9. **Fugitive emissions**: The extraction, processing, and transportation of fossil fuels can release large amounts of greenhouse gases into the atmosphere, often unnoticed or unreported.
10. **Aerosol emissions**: The burning of biomass, such as wood or agricultural waste, releases aerosols that can trap heat and contribute to climate change.
11. **Nitrous oxide (N2O) emissions**: Agricultural practices, such as fertilizer use and manure management, release N2O, a potent greenhouse gas.
12. **Chlorofluorocarbons (CFCs)**: The use of CFC-containing products, such as refrigerants and propellants, can lead to the destruction of stratospheric ozone and contribute to climate 
change.
13. **Methane emissions from natural sources**: Natural sources, such as wetlands, rice paddies, and termites, release methane into the atmosphere, which contributes to climate change.
14. **Ocean fertilization**: The addition of nutrients to the ocean to stimulate phytoplankton growth can lead to increased CO2 absorption but also releases other greenhouse gases and 
alters marine ecosystems.
15. **Geoengineering**: Large-scale engineering projects aimed at mitigating climate change, such as solar radiation management or carbon capture and storage, can have unintended 
consequences and potentially exacerbate climate change.

These are some of the main reasons contributing to climate change. It's essential to understand these factors to develop effective strategies for reducing greenhouse gas emissions and 
mitigating the impacts of climate change.

&amp;gt;&amp;gt;&amp;gt; Send a message (/? for help)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Benchmark Llama3 performance&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

admin@admins-MacBook-Pro ~ % git clone https://github.com/selvakumarsai/llm-benchmark.git
Cloning into 'llm-benchmark'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 28 (delta 12), reused 6 (delta 1), pack-reused 0
Receiving objects: 100% (28/28), 16.26 KiB | 555.00 KiB/s, done.
Resolving deltas: 100% (12/12), done.
admin@admins-MacBook-Pro ~ % cd llm-benchmark
admin@admins-MacBook-Pro llm-benchmark % pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting ollama (from -r requirements.txt (line 1))
  Downloading ollama-0.2.1-py3-none-any.whl.metadata (4.2 kB)
Requirement already satisfied: pydantic in /Users/admin/Library/Python/3.9/lib/python/site-packages (from -r requirements.txt (line 2)) (2.8.0)
Requirement already satisfied: httpx&amp;lt;0.28.0,&amp;gt;=0.27.0 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from ollama-&amp;gt;-r requirements.txt (line 1)) (0.27.0)
Requirement already satisfied: annotated-types&amp;gt;=0.4.0 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from pydantic-&amp;gt;-r requirements.txt (line 2)) (0.7.0)
Requirement already satisfied: pydantic-core==2.20.0 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from pydantic-&amp;gt;-r requirements.txt (line 2)) (2.20.0)
Requirement already satisfied: typing-extensions&amp;gt;=4.6.1 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from pydantic-&amp;gt;-r requirements.txt (line 2)) (4.6.2)
Requirement already satisfied: anyio in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (4.4.0)
Requirement already satisfied: certifi in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (2023.5.7)
Requirement already satisfied: httpcore==1.* in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (1.0.5)
Requirement already satisfied: idna in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (3.4)
Requirement already satisfied: sniffio in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (1.3.1)
Requirement already satisfied: h11&amp;lt;0.15,&amp;gt;=0.13 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from httpcore==1.*-&amp;gt;httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (0.14.0)
Requirement already satisfied: exceptiongroup&amp;gt;=1.0.2 in /Users/admin/Library/Python/3.9/lib/python/site-packages (from anyio-&amp;gt;httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama-&amp;gt;-r requirements.txt (line 1)) (1.2.1)
Downloading ollama-0.2.1-py3-none-any.whl (9.7 kB)
Installing collected packages: ollama
Successfully installed ollama-0.2.1

admin@admins-MacBook-Pro llm-benchmark % python3.10 -m pip install ollama                                        
Collecting ollama
  Using cached ollama-0.2.1-py3-none-any.whl.metadata (4.2 kB)
Requirement already satisfied: httpx&amp;lt;0.28.0,&amp;gt;=0.27.0 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from ollama) (0.27.0)
Requirement already satisfied: anyio in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (4.4.0)
Requirement already satisfied: certifi in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (2024.6.2)
Requirement already satisfied: httpcore==1.* in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (1.0.5)
Requirement already satisfied: idna in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (3.7)
Requirement already satisfied: sniffio in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (1.3.1)
Requirement already satisfied: h11&amp;lt;0.15,&amp;gt;=0.13 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from httpcore==1.*-&amp;gt;httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (0.14.0)
Requirement already satisfied: exceptiongroup&amp;gt;=1.0.2 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from anyio-&amp;gt;httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (1.2.1)
Requirement already satisfied: typing-extensions&amp;gt;=4.1 in /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages (from anyio-&amp;gt;httpx&amp;lt;0.28.0,&amp;gt;=0.27.0-&amp;gt;ollama) (4.12.2)
Using cached ollama-0.2.1-py3-none-any.whl (9.7 kB)
Installing collected packages: ollama
Successfully installed ollama-0.2.1

admin@admins-MacBook-Pro llm-benchmark % python3.10 benchmark.py --verbose --prompts "how to do organic farming"

Verbose: True
Skip models: []
Prompts: ['how to do organic farming']
Evaluating models: ['llama3:latest', 'gemma:latest']



Benchmarking: llama3:latest
Prompt: how to do organic farming
Organic farming is a type of agriculture that avoids the use of synthetic fertilizers, pesticides, and genetically modified organisms (GMOs). Instead, organic farmers rely on natural methods to control pests and diseases, and use compost and other natural amendments to improve soil fertility. Here are some key principles and practices of organic farming:

1. **Soil conservation**: Organic farmers focus on building healthy soil through the use of crop rotations, cover crops, and organic amendments like compost.
2. **Crop rotation**: Rotating crops helps to break disease cycles, improves soil structure, and increases biodiversity.
3. **Composting**: Composting is a key process in organic farming that turns waste into nutrient-rich fertilizer.
4. **Manure management**: Organic farmers use animal manure as a natural fertilizer and to improve soil health.
5. **Integrated pest management (IPM)**: Instead of relying on pesticides, organic farmers use IPM techniques like crop rotation, biological control, and cultural controls to manage pests.
6. **Cover cropping**: Planting cover crops in the off-season helps to prevent erosion, adds nutrients to the soil, and provides habitat for beneficial insects.
7. **Biodiversity conservation**: Organic farming aims to conserve biodiversity by promoting ecosystem services and supporting beneficial insects and microorganisms.
8. **Minimum tillage or no-till**: Reducing soil disturbance helps to preserve soil structure, reduce erosion, and promote soil biota.
9. **Organic amendments**: Using natural amendments like compost, manure, or green manure instead of synthetic fertilizers.
10. **Record keeping**: Organic farmers keep detailed records of their farming practices, including crop rotations, pest management, and soil health.

Some specific techniques used in organic farming include:

1. **Biodynamic agriculture**: A holistic approach that considers the moon's cycles and uses natural preparations to promote soil fertility and plant growth.
2. **Permaculture**: A design system that aims to create self-sustaining ecosystems by mimicking nature's patterns and relationships.
3. **Agroforestry**: Integrating trees into agricultural landscapes to improve soil health, reduce erosion, and increase biodiversity.
4. **Green manure**: Planting legumes or other cover crops to fix nitrogen in the soil and add organic matter.
5. **Crop rotation with legumes**: Incorporating legume crops like beans or peas into crop rotations to improve soil fertility.

To get started with organic farming, you can:

1. **Research local regulations**: Familiarize yourself with national and regional laws regarding organic farming practices and certifications.
2. **Start small**: Begin by converting a small portion of your land to organic methods and gradually scale up as you gain experience.
3. **Join an organic farm network**: Connect with other organic farmers, attend workshops, and share knowledge to learn from their experiences.
4. **Develop a business plan**: Create a plan for marketing and selling your organic products, including pricing and target markets.
5. **Monitor and adjust**: Continuously monitor your soil health, crop yields, and pest management strategies, and make adjustments as needed.

Remember that transitioning to organic farming takes time, patience, and dedication. Start by making small changes and gradually build up your knowledge and skills.Response: 

----------------------------------------------------
        llama3:latest
            Prompt eval: 13.14 t/s
            Response: 6.23 t/s
            Total: 6.31 t/s

        Stats:
            Prompt tokens: 16
            Response tokens: 654
            Model load time: 2.06s
            Prompt eval time: 1.22s
            Response time: 105.04s
            Total time: 108.32s
----------------------------------------------------



Benchmarking: gemma:latest
Prompt: how to do organic farming


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Function Calling - Llama web search agent breakdown&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Larger models like GPT-4 and Anthropic's Claude are fine-tuned for function calling: they can be given tools such as online search, decide whether a tool is needed, and execute it.&lt;/p&gt;

&lt;p&gt;Smaller models, such as those run locally with Ollama, need precise tool definitions and explicit routing.&lt;/p&gt;

&lt;p&gt;LangChain lets you define conditional routing to create directed graph flows in the form of a state machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqihccyysqw9runqs7d3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqihccyysqw9runqs7d3.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The web research agent first goes through a routing step, where it looks at the user's query and determines whether context is needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is the first LLM call. If it determines that context is not needed, it goes straight to the generation step and produces the final output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If it determines that context is needed, it goes to a transform-query state, where it takes the user's initial question and optimises it for a web search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The optimised search query then goes into the web search step, and all of the context gathered there is used to generate the final report.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;First, load the LangChain dependencies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Displaying final output format
from IPython.display import display, Markdown, Latex
# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph 
from typing_extensions import TypedDict
import os


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Set up the environment variables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Environment Variables
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "xxxxxxxxxxxxxxxxxx" 
os.environ["LANGCHAIN_PROJECT"] = "L3 Research Agent"


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next, define the LLM:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Defining LLM
local_llm = 'llama3'
llama3 = ChatOllama(model=local_llm, temperature=0)
llama3_json = ChatOllama(model=local_llm, format='json', temperature=0)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Install the web search API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pip install -U duckduckgo-search


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Web Search Tool

wrapper = DuckDuckGoSearchAPIWrapper(max_results=25)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
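
&lt;p&gt;A quick sanity check of the web search tool defined above (it returns the result snippets as a single string):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Quick test of the web search tool defined above
snippets = web_search_tool.invoke("Tesla recent news")
print(snippets[:500])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;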

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7veqh1x7di15o8wvt01i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7veqh1x7di15o8wvt01i.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, create the prompts used for function-call routing and for the graph nodes.&lt;/p&gt;

&lt;p&gt;First, the report generation prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Generation Prompt

generate_prompt = PromptTemplate(
    template="""

    &amp;lt;|begin_of_text|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;system&amp;lt;|end_header_id|&amp;gt; 

    You are an AI assistant for Research Question Tasks, that synthesizes web search results. 
    Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know. 
    keep the answer concise, but provide all of the details you can in the form of a research report. 
    Only make direct references to material if provided in the context.

    &amp;lt;|eot_id|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;user&amp;lt;|end_header_id|&amp;gt;

    Question: {question} 
    Web Search Context: {context} 
    Answer: 

    &amp;lt;|eot_id|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;assistant&amp;lt;|end_header_id|&amp;gt;""",
    input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Second, define the router prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Router

router_prompt = PromptTemplate(
    template="""

    &amp;lt;|begin_of_text|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;system&amp;lt;|end_header_id|&amp;gt;

    You are an expert at routing a user question to either the generation stage or web search. 
    Use the web search for questions that require more context for a better answer, or recent events.
    Otherwise, you can skip and go straight to the generation phase to respond.
    You do not need to be stringent with the keywords in the question related to these topics.
    Give a binary choice 'web_search' or 'generate' based on the question. 
    Return the JSON with a single key 'choice' with no preamble or explanation. 

    Question to route: {question} 

    &amp;lt;|eot_id|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;assistant&amp;lt;|end_header_id|&amp;gt;

    """,
    input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's up?"
print(question_router.invoke({"question": question}))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;{'choice': 'generate'}&lt;/p&gt;

&lt;p&gt;Third, define the query transformation prompt, which rewrites the user's question into an optimised query for the web search state:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Query Transformation

query_prompt = PromptTemplate(
    template="""

    &amp;lt;|begin_of_text|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;system&amp;lt;|end_header_id|&amp;gt; 

    You are an expert at crafting web search queries for research questions.
    More often than not, a user will ask a basic question that they wish to learn more about, however it might not be in the best format. 
    Reword their query to be the most effective web search string possible.
    Return the JSON with a single key 'query' with no preamble or explanation. 

    Question to transform: {question} 

    &amp;lt;|eot_id|&amp;gt;

    &amp;lt;|start_header_id|&amp;gt;assistant&amp;lt;|end_header_id|&amp;gt;

    """,
    input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's happened recently with Tesla?"
print(query_chain.invoke({"question": question}))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;{'query': 'Tesla recent news'}&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Define the graph state and nodes for conditional routing&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        search_query: revised question for web search
        context: web_search result
    """
    question : str
    generation : str
    search_query : str
    context : str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """

    print("Step: Generating Final Response")
    question = state["question"]
    context = state["context"]

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """

    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    search_query = gen_query["query"]
    return {"search_query": search_query}


# Node - Web Search

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')

    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

def route_question(state):
    """
    route question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    if output['choice'] == "web_search":
        print("Step: Routing Query to Web Search")
        return "websearch"
    elif output['choice'] == 'generate':
        print("Step: Routing Query to Generation")
        return "generate"


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Define and compile the workflow&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Finally, define the agent to run the query&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    display(Markdown(output["generation"]))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Test the different flows&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Test it out!
run_agent("What's been up with Tesla recently?")



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step: Routing Query&lt;br&gt;
Step: Routing Query to Web Search&lt;br&gt;
Step: Optimizing Query for Web Search&lt;br&gt;
Step: Searching the Web for: "Tesla recent news"&lt;br&gt;
Step: Generating Final Response&lt;/p&gt;

&lt;p&gt;Based on the provided web search context, here's what's been up with Tesla recently:&lt;/p&gt;

&lt;p&gt;Tesla is reportedly preparing to build a $25,000 electric car built on its next-generation engineering platform.&lt;br&gt;
Elon Musk has announced that Tesla will launch new EVs in 2025, including affordable ones, which will blend current and next-generation platforms.&lt;br&gt;
The company has set a goal to start production of a new mass-market electric vehicle codenamed "Redwood" in mid-2025.&lt;br&gt;
Tesla has produced its 20 millionth 4680 cell at Gigafactory Texas, a key step for its new vehicle programs. The 4680 cell is designed to reduce battery cost by over 50% and has a capacity of about 100 Wh.&lt;br&gt;
Tesla reported first-quarter adjusted earnings per share of $0.45, below the estimated $0.52, on revenue of $21.30 billion, which missed forecasts for $22.3 billion.&lt;br&gt;
The company set a new delivery record for the fourth quarter and met its 2023 delivery target, shaking off a third-quarter miss and assuaging investors concerned with any hiccups as it prepares to launch new products.&lt;br&gt;
Tesla has issued two recalls on the Cybertruck, its third and fourth since the model was introduced late last year. The latest recall affects almost all of the nearly 12,000 trucks on the road.&lt;br&gt;
The company is currently testing approximately 35 trucks for its long-range Semi, which will have a range of up to 500 miles.&lt;br&gt;
Tesla's stock rose as much as 4.5% on Tuesday after the company delivered a record number of vehicles in the three months to the end of June.&lt;br&gt;
Overall, it seems that Tesla is preparing to launch new products and expand its offerings, including affordable electric cars and long-range Semi trucks. The company has also faced some challenges, including recalls and missed earnings estimates, but has managed to set new delivery records and meet its 2023 delivery target.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


# Test it out!
run_agent("How are you doing today?")



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Step: Routing Query&lt;br&gt;
Step: Routing Query to Generation&lt;br&gt;
Step: Generating Final Response&lt;/p&gt;

&lt;p&gt;I'm just an AI, I don't have feelings or emotions like humans do. I am functioning properly and ready to assist with any research questions you may have. I don't have personal experiences or opinions, so I won't be able to provide a subjective assessment of my "mood" or "well-being."&lt;/p&gt;

&lt;p&gt;Ref : &lt;a href="https://www.youtube.com/watch?v=-lnR9oU0Jl8&amp;amp;t=0s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=-lnR9oU0Jl8&amp;amp;t=0s&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>genai</category>
      <category>langchain</category>
      <category>functioncalling</category>
    </item>
    <item>
      <title>IaC Security Scanner - Generative AI app with PartyRock</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Thu, 29 Feb 2024 09:52:08 +0000</pubDate>
      <link>https://dev.to/selvapal/iac-security-scanner-generative-ai-app-with-partyrock-ibl</link>
      <guid>https://dev.to/selvapal/iac-security-scanner-generative-ai-app-with-partyrock-ibl</guid>
      <description>&lt;p&gt;Infrastructure-as-code (IaC) release security is growing in importance because to the speed at which the digital world is developing. By meeting this important requirement, the AI-Powered IaC Security Scanner indicates a substantial advancement in the field of IaC security.&lt;/p&gt;

&lt;p&gt;The IaC Security Scanner with AI Powered is not a simple tool to use. It has native support for numerous IaC systems, such as Kubernetes, Terraform, and AWS CloudFormation. As a result, your IaC installations are safe across all platforms and support a wide range of compatibilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features of the AI-Powered IaC Security Scanner&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In-Depth Vulnerability Assessment&lt;/li&gt;
&lt;li&gt;Compliance with Industry Standards&lt;/li&gt;
&lt;li&gt;User-Friendly Interface&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All you need to do is visit the PartyRock website to sign up, and the magic happens in mere seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step #1 : Sign In&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhr9ucyotuhajkr1xsy9u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhr9ucyotuhajkr1xsy9u.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step #2 : Write your Prompt with your Idea&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiv8sw3oalgiotjkmjd2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpiv8sw3oalgiotjkmjd2.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Generated App&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq7qc3lj3ow3frprqd9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flq7qc3lj3ow3frprqd9z.png" alt="Image description" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test the app with sample Terraform code&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F275efobb2niyy13scw2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F275efobb2niyy13scw2g.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Published App : &lt;a href="https://partyrock.aws/u/selvapal/UI-aMM_I3/IaC-Security-Scan-App"&gt;https://partyrock.aws/u/selvapal/UI-aMM_I3/IaC-Security-Scan-App&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sample code used for testing&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_cloudwatch_log_group" "cloudwatch_log_group" {
  name = "msk_cluster_cloudwatch_group-${random_uuid.randuuid.result}"
}

resource "aws_msk_configuration" "msk_cluster_config" {
  kafka_versions = [var.msk_cluster_version]
  name           = "msk-${lower(var.environment)}-cluster-cfg-${random_uuid.randuuid.result}"
  server_properties = &amp;lt;&amp;lt;PROPERTIES
auto.create.topics.enable = true
delete.topic.enable = true
PROPERTIES
}

resource "aws_msk_cluster" "msk_cluster" {
  count                  = length(var.private_subnet_cidrs)
  cluster_name           = "msk-${lower(var.environment)}-cluster-${random_uuid.randuuid.result}"
  kafka_version          = var.msk_cluster_version
  number_of_broker_nodes = var.broker_nodes

  broker_node_group_info {
    instance_type   = var.msk_cluster_instance_type
    ebs_volume_size = var.msk_ebs_volume_size
    client_subnets = [
      "${aws_subnet.private_subnet.0.id}",
      "${aws_subnet.private_subnet.1.id}",
      "${aws_subnet.private_subnet.2.id}"
    ]
    security_groups = [aws_security_group.KafkaClusterSG.id]
  }

  /*
  client_authentication {
    tls {
      certificate_authority_arns = [aws_acmpca_certificate_authority.pca.arn]
    }
  }
*/

configuration_info {
  arn = aws_msk_configuration.msk_cluster_config.arn
  revision = 1
}
  encryption_info {
    encryption_in_transit {
      client_broker = var.encryption_type
    }
  }

  enhanced_monitoring = var.monitoring_type

  logging_info {
    broker_logs {
      cloudwatch_logs {
        enabled   = true
        log_group = aws_cloudwatch_log_group.cloudwatch_log_group.name
      }
    }
  }

  tags = merge(
    local.common-tags,
    {
      Name = "msk-${lower(var.environment)}-cluster"
    }
  )
}

output "zookeeper_connect_string" {
  value = aws_msk_cluster.msk_cluster.*.zookeeper_connect_string
}

output "bootstrap_brokers" {
  description = "Plaintext connection host:port pairs"
  value       = aws_msk_cluster.msk_cluster.*.bootstrap_brokers
}

output "bootstrap_brokers_tls" {
  description = "TLS connection host:port pairs"
  value       = aws_msk_cluster.msk_cluster.*.bootstrap_brokers_tls
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>aws</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Chat with your SQL database</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Mon, 26 Feb 2024 08:20:18 +0000</pubDate>
      <link>https://dev.to/selvapal/chat-with-your-sql-database-kej</link>
      <guid>https://dev.to/selvapal/chat-with-your-sql-database-kej</guid>
      <description>&lt;p&gt;Creating complex SQL queries can be difficult when managing databases. Especially for people who are not SQL professionals, this might be challenging. It is obvious that there is a need for an approachable solution that makes creating SQL queries easier.&lt;/p&gt;

&lt;p&gt;Existing techniques for creating SQL queries can be time-consuming and frequently necessitate a thorough grasp of the underlying database structure. Certain tools could be useful for creating queries, but they might require additional customisation to fit different databases or ensure security and privacy.&lt;/p&gt;

&lt;p&gt;A helpful open-source Python framework called Vanna AI takes a two-step approach to simplifying SQL generation. First, it trains a Retrieval-Augmented Generation (RAG) model on your data, and then it asks questions to generate SQL queries that are specific to your database.&lt;/p&gt;

&lt;p&gt;Vanna provides a simple and flexible solution to the prevalent problem of SQL query generation, making database querying easier and more accessible for everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Vanna Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vanna is a Python module that helps you create precise SQL queries for your database by leveraging LLMs through retrieval augmentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyem99fbxxpe180nclyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyem99fbxxpe180nclyv.png" alt="Image description" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vanna works in two easy steps: train a RAG "model" on your data, then ask questions that return SQL queries, which can be set up to run automatically on your database.&lt;/p&gt;

&lt;p&gt;vn.train(...): trains the RAG "model" on your data; each call adds to the reference corpus.&lt;br&gt;
vn.ask(...): asks questions, using the reference corpus to generate SQL queries that can be run on your database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install and Import Vanna&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;%pip install vanna
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Requirement already satisfied: MarkupSafe&amp;gt;=2.0 in /usr/local/lib/python3.10/dist-packages (from Jinja2&amp;gt;=3.0-&amp;gt;flask-&amp;gt;vanna) (2.1.5)
Requirement already satisfied: six&amp;gt;=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil&amp;gt;=2.8.1-&amp;gt;pandas-&amp;gt;vanna) (1.16.0)
Installing collected packages: kaleido, vanna
Successfully installed kaleido-0.2.1 vanna-0.1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import vanna
from vanna.remote import VannaDefault
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Log In to Vanna&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vanna provides a function to get an API key. You'll get a code sent to your e-mail. You can save your API key for future usage so that you don't have to log in every time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api_key = vanna.get_api_key('xxxxxyyyy@gmail.com')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Check your email for the code and enter it here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here, "chinook" is a public model that refers to the Chinook sample database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vanna_model_name = 'chinook' # This is the name of the RAG model. This is typically associated with a specific dataset.
vn = VannaDefault(model=vanna_model_name, api_key=api_key)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Connect to the Database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here we're connecting to a SQLite database, but you can connect to any SQL database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
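
&lt;p&gt;The chinook model above is already trained. For your own database you would first build the reference corpus yourself; here is a brief sketch based on Vanna's documented training methods (the DDL and documentation strings are purely illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: three ways to add to the reference corpus
# DDL statements teach Vanna the table structure
vn.train(ddl="CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")

# Free-form documentation captures business context
vn.train(documentation="Sales are recorded per track in the InvoiceLine table.")

# Known-good SQL shows Vanna how your data is typically queried
vn.train(sql="SELECT Name FROM Artist ORDER BY Name")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
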



&lt;p&gt;&lt;strong&gt;Ask Questions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now we're going to use vn.ask to ask questions. It will generate SQL, run the SQL, show a results table, and generate a chart.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vn.ask("What are the top 5 artists by sales?")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT a.ArtistId, a.Name, SUM(il.Quantity) AS TotalSales
FROM Artist a
JOIN Album al ON a.ArtistId = al.ArtistId
JOIN Track t ON al.AlbumId = t.AlbumId
JOIN InvoiceLine il ON t.TrackId = il.TrackId
GROUP BY a.ArtistId, a.Name
ORDER BY TotalSales DESC
LIMIT 5;

   ArtistId                     Name  TotalSales
0        90              Iron Maiden         140
1       150                       U2         107
2        50                Metallica          91
3        22             Led Zeppelin          87
4       113  Os Paralamas Do Sucesso          45
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hxe925xiqj2826p8ysi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hxe925xiqj2826p8ysi.png" alt="Image description" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Launch the User Interface&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How to connect to Athena and Redshift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Athena&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from pyathena import connect

conn_details = {...}  # fill this with your connection details
conn_athena = connect(**conn_details)

def run_sql_athena(sql: str) -&amp;gt; pd.DataFrame:
    df = pd.read_sql(sql, conn_athena)
    return df

vn.run_sql = run_sql_athena
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Redshift&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import psycopg2

conn_details = {...}  # fill this with your connection details
conn_redshift = psycopg2.connect(**conn_details)

def run_sql_redshift(sql: str) -&amp;gt; pd.DataFrame:
    df = pd.read_sql_query(sql, conn_redshift)
    return df

vn.run_sql = run_sql_redshift
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of Vanna’s functionality does not need your specific data, but it does need your schema (e.g. SQL queries and structure: table names, column names).&lt;/p&gt;

&lt;p&gt;It’s completely open source and free, and works with Snowflake, Postgres, Redshift, and most other databases. &lt;/p&gt;

&lt;p&gt;Reference: vanna.ai&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>database</category>
    </item>
    <item>
      <title>Generative AI - Large Language Models</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Tue, 28 Nov 2023 12:11:59 +0000</pubDate>
      <link>https://dev.to/selvapal/generative-ai-large-language-models-da8</link>
      <guid>https://dev.to/selvapal/generative-ai-large-language-models-da8</guid>
      <description>&lt;p&gt;&lt;strong&gt;What are Large Language Models (LLMs)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Large Language Models (LLMs) from OpenAI, such as GPT-3 and GPT-4, are machine learning algorithms that are trained on data to understand and generate human-like prose. These models are created utilising neural networks that have millions or even billions of parameters, allowing them to perform difficult tasks like translation, summarization, question answering, and even creative writing. &lt;/p&gt;

&lt;p&gt;LLMs analyse the patterns and links between words and phrases to produce coherent and contextually relevant output. They are trained on large and diverse datasets, which frequently include portions of the internet, novels, and other texts. Despite their ability to replicate such features in the text they write, they are not sentient and do not possess comprehension or emotions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-Performing Foundation Models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BERT &lt;br&gt;
GPT &lt;br&gt;
BLOOM&lt;br&gt;
FLAN-T5&lt;br&gt;
PaLM&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Different use cases and tasks&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Essay Writing &lt;/li&gt;
&lt;li&gt;Summarization &lt;/li&gt;
&lt;li&gt;Translation &lt;/li&gt;
&lt;li&gt;Information Retrieval &lt;/li&gt;
&lt;li&gt;Invoke APIs and actions -&amp;gt; Action call -&amp;gt; External applications &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rkeuqx37ju757qsbc22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rkeuqx37ju757qsbc22.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Transformer Architecture
&lt;/h2&gt;

&lt;p&gt;Transformers are a type of neural network architecture that has been gaining popularity. Transformers were adopted by OpenAI in their language models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqdtdo5gw5minfipccmx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqdtdo5gw5minfipccmx.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encoder&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Encodes inputs (“prompts”) with contextual understanding&lt;br&gt;
and produces one vector per input token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decoder&lt;/strong&gt;&lt;br&gt;
Accepts input tokens and generates new tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoxymnb65gqczub391k3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzoxymnb65gqczub391k3.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input Embedding:&lt;/strong&gt; The first step in the Transformer model is to convert the input text into numerical representations called embeddings. Each word in the input sentence is transformed into a vector of numbers that captures its meaning. These embeddings provide a way for the model to understand the input data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-Head Attention (Encoder):&lt;/strong&gt; The Encoder in the Transformer processes the input embeddings. One of its key components is the Multi-Head Attention mechanism. It’s like having multiple experts, each specializing in paying attention to different aspects of the input. The Multi-Head Attention calculates different attention scores for each word in the sentence, indicating its relevance to other words in the same sentence. This helps the model understand the context and relationships between words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feed Forward Neural Networks (Encoder):&lt;/strong&gt; After Multi-Head Attention, the Transformer uses a Feed Forward Neural Network for additional processing. This network applies non-linear transformations to the attention outputs, introducing more complex relationships between words in the sentence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Masked Multi-Head Attention (Decoder):&lt;/strong&gt; During the decoding phase, the Transformer uses Masked Multi-Head Attention in the Decoder. The purpose of this mechanism is to prevent the model from cheating and looking ahead at future words during generation. The mask ensures that the Decoder only attends to the already generated words in the output sequence, ensuring a proper step-by-step generation of the output sentence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Softmax:&lt;/strong&gt; The Softmax function is used in two different contexts in the Transformer.&lt;br&gt;
Multi-Head Attention Weights: the attention scores calculated in both the Encoder and Decoder are passed through the Softmax function to convert them into a probability distribution, which represents how much attention each word should receive.&lt;br&gt;
Output Layer: the final layer of the Transformer’s Decoder generates the output probabilities for each word in the target vocabulary, which are normalised into a valid probability distribution using Softmax. The word with the highest probability is selected as the model’s predicted next word.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear Function (Output Embedding):&lt;/strong&gt; After the output probabilities are obtained through Softmax, the Transformer uses a Linear function to map these probabilities back to the same embedding dimension as the input embeddings. This step is crucial as it ensures the Decoder’s output is in the same format as the input embeddings, making it possible to pass it through additional layers of the Decoder or connect it to other parts of the model.&lt;/p&gt;

&lt;p&gt;In summary, the Transformer model employs input embeddings, multi-head attention, feed-forward neural networks in the encoder, masked multi-head attention, and softmax in the decoder, as well as linear functions for output embeddings. These components work together to enable the Transformer's extraordinary capacity to recognise and synthesise natural language sequences, making it a game-changing paradigm in natural language processing.&lt;/p&gt;
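
&lt;p&gt;To make the attention-then-softmax steps above concrete, here is a toy single-head scaled dot-product attention in NumPy. Random vectors stand in for learned projections; this is a sketch of the mechanism, not a full Transformer layer:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_k = 4, 8                    # 4 tokens, 8-dimensional queries/keys/values
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))

scores = Q @ K.T / np.sqrt(d_k)        # how relevant each token is to every other
weights = softmax(scores, axis=-1)     # each row becomes a probability distribution
output = weights @ V                   # context-aware representation per token
print(weights.round(2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
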
&lt;h2&gt;
  
  
  Prompting and Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In-context Learning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Zero-shot learning is a setting in which the model performs a task without being given any examples of it in the prompt. The model must rely entirely on the knowledge it acquired during pre-training, which makes this the hardest setting for the model.&lt;/p&gt;

&lt;p&gt;One-shot learning is a setting in which the model is given a single example of the task in the prompt. This is easier for the model than zero-shot learning, since it has one demonstration to work from.&lt;/p&gt;

&lt;p&gt;Few-shot learning is a setting in which the model is given a handful of examples of the task in the prompt. The extra demonstrations generally make the task easier for the model than one-shot prompting, and considerably easier than zero-shot prompting. The prompt sketches below illustrate all three settings.&lt;/p&gt;
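
&lt;p&gt;A quick illustration of what the three settings look like as prompts (the sentiment task and reviews here are made up for the example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Zero-shot: no examples, just the task
zero_shot = """Classify the sentiment of this review: 'The movie was great!'
Sentiment:"""

# One-shot: a single worked example before the real input
one_shot = """Review: 'I loved this film.' Sentiment: Positive
Review: 'The movie was great!' Sentiment:"""

# Few-shot: several worked examples before the real input
few_shot = """Review: 'I loved this film.' Sentiment: Positive
Review: 'It was a waste of time.' Sentiment: Negative
Review: 'The movie was great!' Sentiment:"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
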

&lt;p&gt;&lt;strong&gt;Inference Parameters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshym3mopy2894vrxw0l7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshym3mopy2894vrxw0l7.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Parameters are needed to manage and control the text generation process. They help in:&lt;/p&gt;

&lt;p&gt;•Controlling the length of the generated text.&lt;br&gt;
•Managing the randomness and diversity in the generated text.&lt;br&gt;
•Ensuring the relevance and coherence of the generated text.&lt;br&gt;
•Avoiding the generation of inappropriate or nonsensical text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The max_new_tokens parameter&lt;/strong&gt; restricts the number of tokens generated by the model. It helps regulate the length of the generated text.&lt;/p&gt;

&lt;p&gt;Controlling the length of the output is critical in real-world applications. For instance, in summarization tasks, you might want to limit the length to ensure concise summaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top-K Sampling&lt;/strong&gt;&lt;br&gt;
Top-k sampling involves the model selecting the next token from the top k most likely tokens.&lt;/p&gt;

&lt;p&gt;Advanced Use: Effective in activities that require a balance of diversity and relevancy, such as chatbot responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Top-P Sampling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Top-p sampling involves the model selecting the next token from the smallest set of most-likely tokens whose cumulative probability reaches at least p.&lt;/p&gt;

&lt;p&gt;Advanced Use: Effective in situations when you wish to ensure a given level of probability, such as when creating text in a specific style or theme.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temperature&lt;/strong&gt;&lt;br&gt;
temperature affects the probability distribution of the next token. Higher values make the output more random, while lower values make it more deterministic.&lt;/p&gt;

&lt;p&gt;Advanced Use: Adjusting temperature can help in tasks like text-based games or simulations where varying levels of unpredictability are desired.&lt;/p&gt;
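
&lt;p&gt;As a hedged illustration of how these parameters fit together in code, here is how they map onto the Hugging Face transformers generate() API (gpt2 is used only because it is small; any causal LM works the same way):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,   # caps the length of the generated text
    do_sample=True,      # sample instead of always picking the top token
    top_k=50,            # sample only from the 50 most likely tokens
    top_p=0.9,           # ...further restricted to 90% cumulative probability
    temperature=0.7,     # below 1.0 is more deterministic, above 1.0 more random
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
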
&lt;h2&gt;
  
  
  Generative AI project Life cycle
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwsgwzddjwa7j4e14fjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwsgwzddjwa7j4e14fjo.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define the scope accurately and narrow down the use case&lt;/li&gt;
&lt;li&gt;Select a model, or build one from scratch; estimate feasibility&lt;/li&gt;
&lt;li&gt;Evaluate performance; carry out additional training, or in-context learning, if required&lt;/li&gt;
&lt;li&gt;Fine-tune with supervised learning&lt;/li&gt;
&lt;li&gt;Fine-tune the model to align with human preferences&lt;/li&gt;
&lt;li&gt;Apply reinforcement learning with human feedback&lt;/li&gt;
&lt;li&gt;Align the model to your preferences&lt;/li&gt;
&lt;li&gt;Iterate: the process is highly iterative&lt;/li&gt;
&lt;li&gt;Plan for additional infrastructure requirements&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  How Are Large Language Models Trained?
&lt;/h2&gt;

&lt;p&gt;Pre-training an LLM: the model learns from unstructured text data collected through web scraping and from various data sources and corpora; training datasets typically range from gigabytes to petabytes.&lt;/p&gt;

&lt;p&gt;During training, the model weights are updated to minimise the loss of the training objective.&lt;br&gt;
How many patterns the model can capture depends on its architecture.&lt;/p&gt;

&lt;p&gt;The dataset's quality needs to be improved before it is used for model training, to address bias and remove harmful content.&lt;/p&gt;

&lt;p&gt;After data-quality filtering, often only 1-3% of the original tokens are kept for pre-training the model.&lt;/p&gt;
&lt;h2&gt;
  
  
  Computational challenges of training LLMs
&lt;/h2&gt;

&lt;p&gt;Training LLMs often fails with the error message:&lt;br&gt;
"CUDA out of memory"&lt;/p&gt;

&lt;p&gt;CUDA (Compute Unified Device Architecture) is a collection of libraries and tools for performing large-scale computations on GPUs.&lt;br&gt;
PyTorch and TensorFlow rely on CUDA to accelerate the matrix multiplications and other heavy calculations that training requires.&lt;/p&gt;

&lt;p&gt;Calculating the approximate GPU RAM needed to store 1B parameters helps develop an intuition for the scale involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 parameter = 4 bytes (32-bit float)&lt;/li&gt;
&lt;li&gt;1B parameters = 4 x 10^9 bytes = 4 GB at 32-bit full precision, just to store the model weights&lt;/li&gt;
&lt;li&gt;Training requires additional memory on top of the weights:
•Model parameters (weights): 4 bytes per parameter
•Adam optimiser (2 states): +8 bytes per parameter
•Gradients: +4 bytes per parameter
•Activations and temp memory (variable size): +8 bytes per parameter (high-end estimate)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: 4 bytes per parameter to store the weights, plus roughly 20 extra bytes per parameter of training state.&lt;br&gt;
Memory needed to train the model can therefore be up to ~20x the memory needed to store it.&lt;br&gt;
A model that takes 4 GB to store can require around 80 GB of GPU RAM to train.&lt;/p&gt;
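
&lt;p&gt;The same back-of-the-envelope arithmetic in a few lines of Python:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;params = 1_000_000_000            # 1B parameters
bytes_per_param_weights = 4       # 32-bit float weights

store_gb = params * bytes_per_param_weights / 1e9
print(f"Store weights only: {store_gb:.0f} GB")                # 4 GB

# Training state on top of the weights (per parameter, high-end estimate):
extra = 8 + 4 + 8                 # Adam states + gradients + activations/temp
print(f"Extra training state: {params * extra / 1e9:.0f} GB")

# Rule of thumb from the text: training can need ~20x the storage footprint
print(f"Train (~20x rule of thumb): {store_gb * 20:.0f} GB")   # ~80 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
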
&lt;h2&gt;
  
  
  Fine-tuning your model
&lt;/h2&gt;

&lt;p&gt;Data parallelism allows for the use of multiple GPUs to process different parts of the same data simultaneously, speeding up training time.&lt;/p&gt;
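
&lt;p&gt;A minimal PyTorch illustration of the idea (DataParallel replicates the model and splits each batch across available GPUs; larger jobs typically use DistributedDataParallel instead):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import torch
import torch.nn as nn

model = nn.Linear(512, 512)                  # stand-in for a real model
if torch.cuda.device_count() &amp;gt; 1:
    model = nn.DataParallel(model)           # one replica per GPU, batch is split
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
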
&lt;h2&gt;
  
  
  Parameter-Efficient Fine-Tuning (PEFT)
&lt;/h2&gt;

&lt;p&gt;PEFT employs fine-tuning only on a small subset of the model’s parameters, while freezing most of the pre-trained network. This tactic mitigates catastrophic forgetting and significantly cuts computational and storage costs.&lt;/p&gt;

&lt;p&gt;1.Task-Guided Prompt Tuning: This technique utilizes task-specific prompts to guide the LLM’s output, obviating the need to retrain the entire model for a specific task.&lt;/p&gt;

&lt;p&gt;2.Low-Rank Adaptation (LoRA): LoRA approximates weight updates with a pair of low-rank matrices, sharply decreasing the number of fine-tuned parameters while preserving LLM performance (see the sketch below).&lt;/p&gt;

&lt;p&gt;3.Adapters: These small, specialized layers can be added to the LLM for task adaptation, providing flexibility and performance improvement.&lt;/p&gt;

&lt;p&gt;4.Task-Relevant Prefix Tuning: Fine-tuning the LLM on representative prefixes related to the task at hand enhances performance and task adaptability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foip2llsd9zpl54rtfe4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foip2llsd9zpl54rtfe4k.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Reinforcement learning from Human Feedback (RLHF)
&lt;/h2&gt;

&lt;p&gt;LLMs can behave incorrectly or give undesirable responses: outputs that are not Helpful, Honest, or Harmless.&lt;br&gt;
Additional fine-tuning guided by human preferences extends the model's capabilities and produces a better-aligned, fine-tuned model.&lt;/p&gt;

&lt;p&gt;Fine-tuning with human feedback maximises helpfulness and relevance to the human prompt, which helps minimise harm and avoid dangerous topics.&lt;/p&gt;

&lt;p&gt;RLHF is an approach in artificial intelligence and machine learning that combines reinforcement learning techniques with human guidance to improve the learning process. It involves training an agent or model to make decisions and take action in an environment while receiving feedback from human experts. The input humans can be in the form of rewards, preferences, or demonstrations, which helps guide the model’s learning process. RLHF enables the agent to adapt and learn from the expertise of humans, allowing for more efficient and effective learning in complex and dynamic environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pcg528d1vvb6i6wgr2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pcg528d1vvb6i6wgr2g.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RLHF has three phases:&lt;br&gt;
•First, pick a pre-trained model as the primary model. Starting from a pre-trained model is important because it avoids the large amount of training data required to train a language model from scratch.&lt;/p&gt;

&lt;p&gt;•Second, create a separate reward model. The reward model is trained on input from people who are shown two or more examples of the model's outputs and asked to rank them by quality. Using this information, the reward model scores the primary model's outputs.&lt;/p&gt;

&lt;p&gt;•Third, the reward model receives outputs from the main model and produces a quality score that indicates how well the main model performed. This feedback is incorporated into the main model to improve its performance on subsequent tasks.&lt;/p&gt;
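
&lt;p&gt;A schematic of the three phases in plain Python; every function here is a toy stand-in (no real RL library), meant only to show how the pieces connect:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import random

def generate(policy, prompt):           # phase 1: the pre-trained main model responds
    return f"{policy['name']} answer to: {prompt}"

def reward_model(prompt, response):     # phase 2: trained from human quality rankings
    return random.random()              # stand-in quality score in [0, 1]

def update_policy(policy, score):       # phase 3: the score feeds back into the main model
    policy["skill"] += 0.01 * score     # placeholder for a PPO-style update
    return policy

policy = {"name": "main-model", "skill": 0.0}
for prompt in ["Summarise RLHF", "Explain reward models"]:
    response = generate(policy, prompt)
    score = reward_model(prompt, response)
    policy = update_policy(policy, score)

print(policy)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
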
&lt;h2&gt;
  
  
  Constitutional AI
&lt;/h2&gt;

&lt;p&gt;Constitutional AI (CAI) is similar to RLHF except instead of human feedback, it learns through AI feedback.&lt;/p&gt;

&lt;p&gt;At a high-level, there are two stages of Constitutional AI (CAI): the Reflection stage and the Reinforcement stage.&lt;/p&gt;

&lt;p&gt;CAI Stage 1: Reflection&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What we start with: Baseline model&lt;/li&gt;
&lt;li&gt;What we end with: Supervised Learning CAI (SL-CAI) model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1.Ask the LLM to generate toxic responses.&lt;/p&gt;

&lt;p&gt;2.Give the LLM a set of rules to follow (or a Constitution). Present the toxic responses back to the LLM and ask if they accord with the Constitution.&lt;/p&gt;

&lt;p&gt;3.Ask the LLM to generate a revised response. (Repeat revision until the responses follow the Constitution.)&lt;/p&gt;

&lt;p&gt;4.This creates a synthetic dataset, which you can use for training. Fine-tune (or train) the baseline model on this synthetic dataset to create responses that more closely follow the Constitution. &lt;/p&gt;

&lt;p&gt;5.Through this process, you get the SL-CAI model, the intermediate model.&lt;/p&gt;

&lt;p&gt;CAI Stage 2: Reinforcement&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What we start with: SL-CAI model&lt;/li&gt;
&lt;li&gt;What we end with: Reinforcement Learning CAI (RL-CAI) model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;1.Ask the SL-CAI (from stage 1) to generate toxic responses. (Repeat multiple times for a given question.) &lt;br&gt;
2.Since each question has multiple responses, we can now create multiple choice questions.&lt;br&gt;
3.Give those multiple choice questions to the SL-CAI model and ask it to select the answer that best follows the Constitution. &lt;br&gt;
4.This creates a second synthetic dataset, which you can use to train a reward model to do reinforcement learning.&lt;br&gt;
5.Train a reward model (which is different from the baseline or SL-CAI models) on that dataset to predict which answers align best with the Constitution. This reward model becomes a teacher to the final model.&lt;br&gt;
6.Use this reward model to reinforce desired behavior and punish undesired behavior. This nudging method teaches the final model the Constitution. &lt;br&gt;
7.Through this process, you get the RL-CAI model, the final model! &lt;/p&gt;
&lt;h2&gt;
  
  
  LLM optimization Techniques
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Preliminary Assessment and Baseline Establishment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Understand the LLM’s Capabilities: Assess the general knowledge and abilities of the LLM in its base form.&lt;/p&gt;

&lt;p&gt;Establish a Performance Baseline: Determine the LLM's initial performance on your target task to identify areas for improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Develop Initial Prompts: Create clear, structured prompts tailored to the task at hand.&lt;br&gt;
Iterative Testing and Refinement: Continuously test and refine these prompts based on the LLM's output quality and relevance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG) Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Introduce Contextual Data: Implement RAG to provide the LLM with access to relevant, domain-specific content.&lt;/p&gt;

&lt;p&gt;Evaluate and Adjust RAG: Monitor the LLM's performance with RAG, tweaking the content and its relevance as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-Tuning with Specific Datasets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Curate Specialized Datasets: Select or create datasets that are highly relevant to the specific task.&lt;br&gt;
Fine-Tune the LLM: Continue the LLM's training with these datasets to specialize its capabilities for the task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Combining Fine-Tuning and RAG&lt;/strong&gt;&lt;br&gt;
•Integrate RAG with Fine-Tuned Models: Use RAG to supplement the fine-tuned model with additional contextual information.&lt;br&gt;
•Optimize for Balance: Ensure a balance between the LLM's general knowledge and its specialized capabilities.&lt;br&gt;
 &lt;strong&gt;Performance Evaluation and Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Continuous Evaluation: Regularly assess the LLM’s performance on the target task, using both qualitative and quantitative measures.&lt;br&gt;
Feedback Loop for Improvement: Use the insights from evaluations to further refine the prompts, RAG implementation, and fine-tuning.&lt;br&gt;
&lt;strong&gt;Deployment and Real-World Testing&lt;/strong&gt;&lt;br&gt;
Deploy the Optimized LLM: Implement the optimized LLM in a real-world scenario or a testing environment that closely mimics actual use cases.&lt;br&gt;
Monitor and Adjust in Real-Time: Continuously monitor the LLM’s performance in real-world applications, making adjustments as needed based on user feedback and performance data.&lt;br&gt;
&lt;strong&gt;Iterative Improvement&lt;/strong&gt;&lt;br&gt;
Long-Term Optimization: Recognize that LLM optimization is an ongoing process. Regularly revisit and update the model with new data, techniques, and insights.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80l2p8cobpfiracwu4cy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80l2p8cobpfiracwu4cy.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Retrieval augmented generation (RAG)
&lt;/h2&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) combines the advanced text-generation capabilities of GPT and other large language models with information retrieval to provide precise and contextually relevant answers. This approach improves language models' ability to understand and process user queries by integrating the latest and most relevant data.&lt;/p&gt;

&lt;p&gt;RAG is about feeding language models the information they need. Instead of asking the LLM directly (as with general-purpose models), we first retrieve accurate data from a well-maintained knowledge library and then use that context to answer. When the user sends a query (question) to the retriever, vector embeddings (numerical representations) are used to find the most relevant documents. Once the needed information is found in the vector database, it is used to ground the answer returned to the user. This largely reduces the possibility of hallucinations and lets you update the model's knowledge without retraining it, which is a costly process. Here's a very simple diagram that shows the process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwy6ehx2a2bfrlxgt9da8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwy6ehx2a2bfrlxgt9da8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Program-aided Language model (PAL)
&lt;/h2&gt;

&lt;p&gt;Artificial Intelligence (AI) continues to evolve at a rapid pace, with strides in generative capabilities playing a critical role in this ever-evolving landscape. One such transformative leap is the advent of Program-Aided Language models (PAL), an innovative approach that changes how Large Language Models (LLMs) solve reasoning tasks by offloading computation to an interpreter. This article delves into how PAL works and how it results in superior AI performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4xkgj12xc6gq421bp55.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4xkgj12xc6gq421bp55.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM Powered Applications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangChain is an advanced platform that provides developers with a seamless and intuitive interface to leverage the power of LLM in their applications. It offers a range of APIs and tools that simplify the integration of LLM into your projects, enabling you to unlock the full potential of language processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl355n91i3ngdmyyvmmri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl355n91i3ngdmyyvmmri.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Langchain
&lt;/h2&gt;

&lt;p&gt;LangChain is a framework for developing applications powered by language models. It enables applications that:&lt;br&gt;
•Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)&lt;br&gt;
•Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)&lt;br&gt;
The LangChain framework consists of several parts:&lt;br&gt;
•LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.&lt;br&gt;
•LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.&lt;br&gt;
•LangServe: A library for deploying LangChain chains as a REST API.&lt;br&gt;
•LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq82a3ckqbays5xagshl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq82a3ckqbays5xagshl.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Building Generative Applications - High-Level App Framework
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47dvyt2rtdf1xocxg3l4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47dvyt2rtdf1xocxg3l4.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have tried a couple of the AWS SageMaker LLM notebooks shared in the AWS Bedrock workshop to get to know the LLM libraries and responses: contextual generation, and image and code generation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "&amp;lt;REGION_NAME&amp;gt;"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "&amp;lt;YOUR_PROFILE&amp;gt;"
# os.environ["BEDROCK_ASSUME_ROLE"] = "&amp;lt;YOUR_ROLE_ARN&amp;gt;"  # E.g. "arn:aws:..."

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Invoke the Bedrock LLM model&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.llms.bedrock import Bedrock

inference_modifier = {'max_tokens_to_sample':4096, 
                      "temperature":0.5,
                      "top_k":250,
                      "top_p":1,
                      "stop_sequences": ["\n\nHuman"]
                     }

textgen_llm = Bedrock(model_id = "anthropic.claude-v2",
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a LangChain custom prompt template&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.prompts import PromptTemplate

# Create a prompt template that has multiple input variables
multi_var_prompt = PromptTemplate(
    input_variables=["customerServiceManager", "customerName", "feedbackFromCustomer"], 
    template="""

Human: Create an apology email from the Service Manager {customerServiceManager} to {customerName} in response to the following feedback that was received from the customer: 
&amp;lt;customer_feedback&amp;gt;
{feedbackFromCustomer}
&amp;lt;/customer_feedback&amp;gt;

Assistant:"""
)

# Pass in values to the input variables
prompt = multi_var_prompt.format(customerServiceManager="Bob", 
                                 customerName="John Doe", 
                                 feedbackFromCustomer="""Hello Bob,
     I am very disappointed with the recent experience I had when I called your customer support.
     I was expecting an immediate call back but it took three days for us to get a call back.
     The first suggestion to fix the problem was incorrect. Ultimately the problem was fixed after three days.
     We are very unhappy with the response provided and may consider taking our business elsewhere.
     """
     )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;num_tokens = textgen_llm.get_num_tokens(prompt)
print(f"Our prompt has {num_tokens} tokens")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = textgen_llm(prompt)

email = response[response.index('\n')+1:]

print_ww(email)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to sincerely apologize for the poor service you recently received from our customer support
team. It is unacceptable that it took multiple days for us to respond and resolve your issue.

As the Service Manager, I take full responsibility for this situation. I will be looking into why
there were delays in responding and getting your problem fixed correctly. It is our top priority to
provide prompt, knowledgeable support so our customers' needs are addressed efficiently.

I understand your frustration with this experience and do not blame you for considering taking your
business elsewhere. We value you as a customer and want to regain your trust. Please let me know if
there is anything I can do to make this right. I would be happy to offer you a discount on your next
purchase or provide a refund as an apology for the inconvenience.

Our goal is to deliver excellent customer service and support. This situation shows we have more
work to do. I will be implementing changes to our processes and additional training for our staff to
prevent something like this from happening again.

I appreciate you taking the time to share your feedback. It will help us improve. I sincerely hope
you will give us another chance to provide you with the positive experience you deserve. Please feel
free to contact me directly if you have any other concerns.

Sincerely,

Bob
Service Manager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code Translation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "&amp;lt;REGION_NAME&amp;gt;"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "&amp;lt;YOUR_PROFILE&amp;gt;"
# os.environ["BEDROCK_ASSUME_ROLE"] = "&amp;lt;YOUR_ROLE_ARN&amp;gt;"  # E.g. "arn:aws:..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.llms.bedrock import Bedrock

inference_modifier = {'max_tokens_to_sample':4096, 
                      "temperature":0.5,
                      "top_k":250,
                      "top_p":1,
                      "stop_sequences": ["\n\nHuman"]
                     }

textgen_llm = Bedrock(model_id = "anthropic.claude-v2",
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Vehicle Fleet Management Code written in C++
sample_code = """
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

class Vehicle {
protected:
    std::string registrationNumber;
    int milesTraveled;
    int lastMaintenanceMile;

public:
    Vehicle(std::string regNum) : registrationNumber(regNum), milesTraveled(0), lastMaintenanceMile(0) {}

    virtual void addMiles(int miles) {
        milesTraveled += miles;
    }

    virtual void performMaintenance() {
        lastMaintenanceMile = milesTraveled;
        std::cout &amp;lt;&amp;lt; "Maintenance performed for vehicle: " &amp;lt;&amp;lt; registrationNumber &amp;lt;&amp;lt; std::endl;
    }

    virtual void checkMaintenanceDue() {
        if ((milesTraveled - lastMaintenanceMile) &amp;gt; 10000) {
            std::cout &amp;lt;&amp;lt; "Vehicle: " &amp;lt;&amp;lt; registrationNumber &amp;lt;&amp;lt; " needs maintenance!" &amp;lt;&amp;lt; std::endl;
        } else {
            std::cout &amp;lt;&amp;lt; "No maintenance required for vehicle: " &amp;lt;&amp;lt; registrationNumber &amp;lt;&amp;lt; std::endl;
        }
    }

    virtual void displayDetails() = 0;

    ~Vehicle() {
        std::cout &amp;lt;&amp;lt; "Destructor for Vehicle" &amp;lt;&amp;lt; std::endl;
    }
};

class Truck : public Vehicle {
    int capacityInTons;

public:
    Truck(std::string regNum, int capacity) : Vehicle(regNum), capacityInTons(capacity) {}

    void displayDetails() override {
        std::cout &amp;lt;&amp;lt; "Truck with Registration Number: " &amp;lt;&amp;lt; registrationNumber &amp;lt;&amp;lt; ", Capacity: " &amp;lt;&amp;lt; capacityInTons &amp;lt;&amp;lt; " tons." &amp;lt;&amp;lt; std::endl;
    }
};

class Car : public Vehicle {
    std::string model;

public:
    Car(std::string regNum, std::string carModel) : Vehicle(regNum), model(carModel) {}

    void displayDetails() override {
        std::cout &amp;lt;&amp;lt; "Car with Registration Number: " &amp;lt;&amp;lt; registrationNumber &amp;lt;&amp;lt; ", Model: " &amp;lt;&amp;lt; model &amp;lt;&amp;lt; "." &amp;lt;&amp;lt; std::endl;
    }
};

int main() {
    std::vector&amp;lt;Vehicle*&amp;gt; fleet;

    fleet.push_back(new Truck("XYZ1234", 20));
    fleet.push_back(new Car("ABC9876", "Sedan"));

    for (auto vehicle : fleet) {
        vehicle-&amp;gt;displayDetails();
        vehicle-&amp;gt;addMiles(10500);
        vehicle-&amp;gt;checkMaintenanceDue();
        vehicle-&amp;gt;performMaintenance();
        vehicle-&amp;gt;checkMaintenanceDue();
    }

    for (auto vehicle : fleet) {
        delete vehicle; 
    }

    return 0;
}
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.prompts import PromptTemplate

# Create a prompt template that has multiple input variables
multi_var_prompt = PromptTemplate(
    input_variables=["code", "srcProgrammingLanguage", "targetProgrammingLanguage"], 
    template="""

Human: You will be acting as an expert software developer in {srcProgrammingLanguage} and {targetProgrammingLanguage}. 
You will translate the code below from {srcProgrammingLanguage} to {targetProgrammingLanguage} while following coding best practices.

{code}

Assistant: """
)

# Pass in values to the input variables
prompt = multi_var_prompt.format(code=sample_code, srcProgrammingLanguage="C++", targetProgrammingLanguage="Java")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code translation from C++ to Java
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = textgen_llm(prompt)

target_code = response[response.index('\n')+1:]

print_ww(target_code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.ArrayList&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Vehicle&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;protected&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;registrationNumber&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;protected&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;milesTraveled&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;protected&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;lastMaintenanceMile&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Vehicle&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;regNum&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;registrationNumber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;regNum&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;milesTraveled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;lastMaintenanceMile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;addMiles&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;miles&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;milesTraveled&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;miles&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;performMaintenance&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lastMaintenanceMile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;milesTraveled&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Maintenance performed for vehicle: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;registrationNumber&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;checkMaintenanceDue&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="n"&gt;milesTraveled&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lastMaintenanceMile&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Vehicle: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;registrationNumber&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" needs maintenance!"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"No maintenance required for vehicle: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;registrationNumber&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;displayDetails&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Truck&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Vehicle&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;capacityInTons&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Truck&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;regNum&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;super&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regNum&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;capacityInTons&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;displayDetails&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Truck with Registration Number: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;registrationNumber&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;", Capacity: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
&lt;span class="n"&gt;capacityInTons&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" tons."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Vehicle&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Car&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;regNum&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;carModel&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;super&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regNum&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;carModel&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;displayDetails&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Car with Registration Number: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;registrationNumber&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;", Model: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
&lt;span class="s"&gt;"."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Main&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Vehicle&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Vehicle&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;();&lt;/span&gt;

    &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Truck&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"XYZ1234"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ABC9876"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Sedan"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Vehicle&lt;/span&gt; &lt;span class="n"&gt;vehicle&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;vehicle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;displayDetails&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="n"&gt;vehicle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addMiles&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10500&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
      &lt;span class="n"&gt;vehicle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;checkMaintenanceDue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="n"&gt;vehicle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;performMaintenance&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="n"&gt;vehicle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;checkMaintenanceDue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="nc"&gt;The&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="n"&gt;differences&lt;/span&gt; &lt;span class="n"&gt;from&lt;/span&gt; &lt;span class="no"&gt;C&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nl"&gt;Java:&lt;/span&gt;

&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nc"&gt;Includes&lt;/span&gt; &lt;span class="n"&gt;changed&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;imports&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nl"&gt;std:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nl"&gt;std:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nc"&gt;Pointers&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;object&lt;/span&gt; &lt;span class="n"&gt;creation&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nc"&gt;Override&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;overridden&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nc"&gt;Virtual&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt; &lt;span class="n"&gt;changed&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="kd"&gt;public&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nc"&gt;Destructors&lt;/span&gt; &lt;span class="n"&gt;not&lt;/span&gt; &lt;span class="n"&gt;needed&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nc"&gt;Java&lt;/span&gt;

&lt;span class="no"&gt;I&lt;/span&gt; &lt;span class="n"&gt;followed&lt;/span&gt; &lt;span class="nc"&gt;Java&lt;/span&gt; &lt;span class="n"&gt;naming&lt;/span&gt; &lt;span class="n"&gt;conventions&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;coding&lt;/span&gt; &lt;span class="n"&gt;standards&lt;/span&gt; &lt;span class="n"&gt;like&lt;/span&gt; &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="n"&gt;variables&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;camelCase&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt;
&lt;span class="n"&gt;object&lt;/span&gt; &lt;span class="n"&gt;oriented&lt;/span&gt; &lt;span class="n"&gt;design&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt; &lt;span class="nc"&gt;Let&lt;/span&gt; &lt;span class="n"&gt;me&lt;/span&gt; &lt;span class="n"&gt;know&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt; &lt;span class="n"&gt;have&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt; &lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reference:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;https://github.com/langchain-ai/langchain&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/pulse/what-constitutional-ai-alexandra-barr/" rel="noopener noreferrer"&gt;https://www.linkedin.com/pulse/what-constitutional-ai-alexandra-barr/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@kanikaadik07/generative-ai-project-life-cycle-55ce9092e24a" rel="noopener noreferrer"&gt;https://medium.com/@kanikaadik07/generative-ai-project-life-cycle-55ce9092e24a&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/pavanbelagatti/a-beginners-guide-to-building-llm-powered-applications-with-langchain-2d6e"&gt;https://dev.to/pavanbelagatti/a-beginners-guide-to-building-llm-powered-applications-with-langchain-2d6e&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.superannotate.com/blog/rag-explained" rel="noopener noreferrer"&gt;https://www.superannotate.com/blog/rag-explained&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.coursera.org/learn/generative-ai-with-llms" rel="noopener noreferrer"&gt;https://www.coursera.org/learn/generative-ai-with-llms&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/aws-samples/amazon-bedrock-workshop" rel="noopener noreferrer"&gt;https://github.com/aws-samples/amazon-bedrock-workshop&lt;/a&gt;&lt;/p&gt;

</description>
      <category>largelanguagemodel</category>
      <category>bedrock</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS - AI Q&amp;A App using Kendra, Bedrock LLM and Streamlit (UI)</title>
      <dc:creator>selvakumar palanisamy</dc:creator>
      <pubDate>Mon, 30 Oct 2023 12:05:30 +0000</pubDate>
      <link>https://dev.to/selvapal/aws-ai-qa-app-using-kendra-bedrock-llm-and-streamlitui-1hkn</link>
      <guid>https://dev.to/selvapal/aws-ai-qa-app-using-kendra-bedrock-llm-and-streamlitui-1hkn</guid>
      <description>&lt;p&gt;For the last two weeks, I've been learning about generative AI and its use cases, and here is my first technical blog about how to build your own Q&amp;amp;A AI app utilising AWS Kendra and the Bedrock generative AI service. &lt;/p&gt;

&lt;p&gt;This blog is for you if you want to learn more about the power of generative AI on AWS.  &lt;/p&gt;

&lt;p&gt;I'll show you how to create a Q&amp;amp;A app with Amazon Bedrock, an Amazon Kendra index, and Streamlit (UI).&lt;/p&gt;

&lt;p&gt;Before we start developing the app, let's look at how an LLM can respond to queries about our own topics. &lt;/p&gt;

&lt;p&gt;There are two approaches to enabling an LLM to understand and answer such queries.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Fine-tune the LLM on text data addressing the topic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Retrieval-Augmented Generation (RAG), a technique that incorporates a retrieval component into the generation process. This allows you to retrieve relevant information and feed it into the generative model as an additional source of data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We will go with option 2.&lt;/p&gt;

&lt;p&gt;RAG requires an external "knowledge database" to store and retrieve essential information. Consider this database to be our LLM's external long-term memory.&lt;/p&gt;

&lt;p&gt;A semantic search database will be used to retrieve information that is semantically related to our query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database for semantic search&lt;/strong&gt;&lt;br&gt;
A semantic search database uses natural language processing to understand the meaning, context, and relationships of the words and phrases in a query, rather than simply matching keywords, in order to provide highly relevant search results.&lt;/p&gt;

&lt;p&gt;This approach is based on the idea that a search engine should aim to understand the user's intent and the relationships between the words used, not just match the terms in the query.&lt;/p&gt;

&lt;p&gt;Because it captures intent rather than merely matching phrases, semantic search gives more specific and meaningful results. This makes it particularly useful for sophisticated queries, such as scientific research, medical information, and legal documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS services&lt;/strong&gt;&lt;br&gt;
For the Generative AI LLMs:&lt;/p&gt;

&lt;p&gt;AWS Bedrock&lt;/p&gt;

&lt;p&gt;For the knowledge database:&lt;/p&gt;

&lt;p&gt;AWS Kendra&lt;br&gt;
 AWS S3&lt;/p&gt;

&lt;p&gt;The diagram below shows how these AWS services interact:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--30zKl5pQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5upu0f9madg0nd69c6zi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--30zKl5pQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5upu0f9madg0nd69c6zi.png" alt="Image description" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Does the Q&amp;amp;A App Work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The personal documents are kept in an S3 bucket.&lt;/p&gt;

&lt;p&gt;The Kendra index is connected to the bucket through an S3 connector. Every N minutes, the index scans the S3 bucket for new data. When new content is uploaded to the bucket, it is automatically processed and saved to the Kendra index.&lt;/p&gt;

&lt;p&gt;When a user runs a query using the Streamlit app, the app performs the following actions:&lt;/p&gt;

&lt;p&gt;Retrieves the information relevant to the supplied query from Kendra.&lt;br&gt;
Assembles the prompt.&lt;br&gt;
Sends the prompt to one of the available Bedrock LLMs and outputs the response.&lt;/p&gt;
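
&lt;p&gt;Putting those three actions together, the heart of the app is a short retrieve-augment-generate loop. Below is a conceptual sketch only; the function names are illustrative, and the concrete version of each step is shown in the sections that follow.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def answer_with_rag(query):
    # 1. Semantic search: fetch the passages relevant to the query (Kendra)
    docs = retrieve_from_knowledge_base(query)
    # 2. Augment: insert the retrieved context and the query into the prompt
    prompt = build_prompt(query, docs)
    # 3. Generate: send the prompt to a Bedrock LLM and return its answer
    return call_llm(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;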

&lt;p&gt;One of the best aspects of utilising AWS Kendra (in conjunction with AWS S3) as our knowledge database is that the "Ingest Process" (as shown in the diagram above) is totally automated, so you don't have to do anything.&lt;/p&gt;

&lt;p&gt;When we add, update, or delete a document from the S3 bucket, the content is automatically processed and saved in Kendra.&lt;/p&gt;
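
&lt;p&gt;The only manual step is getting documents into the bucket in the first place. A minimal sketch with boto3 (the bucket name and file are placeholders - use the bucket created by the Terraform files):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

s3 = boto3.client("s3")

# Upload a document; on its next scan the Kendra S3 connector will ingest it
s3.upload_file("microservices-book.pdf", "my-private-docs-bucket",
               "microservices-book.pdf")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;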

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By default, Bedrock gives you access only to the Amazon Titan LLM. To use any of the third-party LLMs (the Anthropic and AI21 Labs models), you must request access separately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1E6O-ViK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mwe7aenxg28hev5utw8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1E6O-ViK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mwe7aenxg28hev5utw8v.png" alt="Image description" width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CmgFsSH9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jv9imwpqcnhopfkuqhmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CmgFsSH9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jv9imwpqcnhopfkuqhmh.png" alt="Image description" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy the required AWS services&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To make the app work we need to deploy the following AWS services:&lt;/p&gt;

&lt;p&gt;An S3 bucket for uploading our private docs.&lt;br&gt;
A Kendra index with an S3 connector.&lt;br&gt;
An IAM role with the required permissions to make everything work.&lt;/p&gt;

&lt;p&gt;Use the Terraform files in the GitHub repository to create the required services in your AWS account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/selvakumarsai/ai-qa-app-awskendra-benrock-streamli.git

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;admin@192-168-1-191 infra % terraform apply
data.aws_caller_identity.current: Reading...
data.aws_region.current: Reading...
data.aws_region.current: Read complete after 0s [id=us-east-1]
.
.
.
.
aws_kendra_index.kendra_docs_index: Creating...
aws_kendra_index.kendra_docs_index: Still creating... [10s elapsed]
aws_kendra_index.kendra_docs_index: Still creating... [20s elapsed]
aws_kendra_index.kendra_docs_index: Still creating... [30s elapsed]
aws_kendra_index.kendra_docs_index: Creation complete after 38s [id=f40139ce-f7fb-4ca9-a95f-759431c91fdb]
aws_kendra_data_source.kendra_docs_s3_connector: Creating...
aws_kendra_data_source.kendra_docs_s3_connector: Creation complete after 4s [id=cbfb3da7-660b-4f38-b7f0-b3964548609e/f40139ce-f7fb-4ca9-a95f-759431c91fdb]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Simple UI:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A text input field where users can type the question they want to ask.&lt;br&gt;
A numeric input where users can set the LLM max tokens.&lt;br&gt;
A numeric input where users can set the LLM temperature.&lt;br&gt;
A dropdown to select which AWS Bedrock LLM to use to generate the response.&lt;br&gt;
And a submit button.&lt;/p&gt;
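
&lt;p&gt;As a rough sketch, the UI above can be expressed in a few lines of Streamlit. The widget labels and model IDs here are illustrative; the real app lives in the repository:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import streamlit as st

st.title("AWS document Q&amp;amp;A")

query = st.text_input("Your question")
max_tokens = st.number_input("Max tokens", min_value=1, max_value=4096, value=512)
temperature = st.number_input("Temperature", min_value=0.0, max_value=1.0, value=0.7)
model_id = st.selectbox("Bedrock LLM", [
    "anthropic.claude-v2",
    "amazon.titan-text-express-v1",
    "ai21.j2-ultra-v1",
])

if st.button("Submit"):
    # answer_with_rag is the RAG flow sketched earlier; the knobs above are
    # passed through to the Bedrock call in the real app
    st.write(answer_with_rag(query))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;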

&lt;p&gt;&lt;strong&gt;How to run the app&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repository has a .env file that contains the environment variables required for the app to run successfully:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KENDRA_INDEX='&amp;lt;kendra-index&amp;gt;'
AWS_BEDROCK_REGION='&amp;lt;bedrock-region&amp;gt;'
AWS_KENDRA_REGION='&amp;lt;region-where-kendra-index-is-deployed&amp;gt;'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
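
&lt;p&gt;A common way to load these variables in Python is the python-dotenv package. A minimal sketch, assuming the .env file sits next to app.py:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads the .env file from the current working directory

KENDRA_INDEX = os.environ["KENDRA_INDEX"]
AWS_BEDROCK_REGION = os.environ["AWS_BEDROCK_REGION"]
AWS_KENDRA_REGION = os.environ["AWS_KENDRA_REGION"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;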



&lt;p&gt;&lt;strong&gt;Restore dependencies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you install Streamlit, you also get a command-line (CLI) utility whose purpose is to run Streamlit apps.&lt;br&gt;
Simply run the following command to launch the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--k3SD3rlk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xuzgqoo7tc7l8hwmpg0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k3SD3rlk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xuzgqoo7tc7l8hwmpg0x.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieve the relevant information from Kendra&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LangChain AmazonKendraRetriever class will be used to obtain the appropriate docs from our knowledge database (AWS Kendra).&lt;/p&gt;

&lt;p&gt;The AmazonKendraRetriever class makes use of Amazon Kendra's Retrieve API to query the Amazon Kendra index and retrieve the docs most relevant to the user query.&lt;/p&gt;

&lt;p&gt;To construct the RAG pattern, the AmazonKendraRetriever class will be plugged into a LangChain chain. &lt;/p&gt;
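
&lt;p&gt;A minimal sketch of wiring up the retriever (the index id is a placeholder taken from your .env file, and LangChain's import paths have moved between versions, so check the one you have installed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.retrievers import AmazonKendraRetriever

retriever = AmazonKendraRetriever(
    index_id="&amp;lt;kendra-index&amp;gt;",  # placeholder - your Kendra index id
    region_name="us-east-1",          # region where the index is deployed
    top_k=3,                          # number of passages to retrieve
)

docs = retriever.get_relevant_documents("What is a microservice?")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;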

&lt;p&gt;The boto3 Kendra client's retrieve method allows us to retrieve relevant documents from our knowledge database.&lt;br&gt;
After retrieving the documents from Kendra, we combine them into a single "string".&lt;/p&gt;

&lt;p&gt;This "string" indicates the context that will be added to the prompt, indicating to the Bedrock LLM that it will only be able to answer using the information provided in this context. It cannot develop a solution to our question using data from outside this context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prepare the prompt&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We build the prompt that will be sent to a Bedrock LLM.&lt;/p&gt;

&lt;p&gt;The placeholders query and docs can be found within the prompt.&lt;/p&gt;

&lt;p&gt;The app will insert the user query into the query placeholder, and add the context retrieved from Kendra to the docs placeholder.&lt;/p&gt;

&lt;p&gt;The final step is to send the prompt to one of the Bedrock LLMs via the invoke_model method of the boto3 Bedrock client and receive the response.&lt;/p&gt;
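
&lt;p&gt;Here is a sketch of those two steps for one of the Anthropic models. The prompt wording, model ID, and defaults are illustrative; Claude models on Bedrock expect the Human/Assistant framing shown below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Prompt template with the query and docs placeholders described above
prompt_template = (
    "\n\nHuman: Answer the question using only the context below.\n"
    "Context:\n{docs}\n\n"
    "Question: {query}"
    "\n\nAssistant:"
)

def ask_llm(query, docs, max_tokens=512, temperature=0.7):
    body = json.dumps({
        "prompt": prompt_template.format(docs=docs, query=query),
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;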

&lt;p&gt;&lt;strong&gt;Testing the Q&amp;amp;A app&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s test if the Q&amp;amp;A app works correctly.&lt;/p&gt;

&lt;p&gt;Remember that the Microsoft .NET Microservices book was used to populate the knowledge base, so any questions we ask should be about that specific topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--koqTKd70--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/trd7j5zxn482oxvoabb5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--koqTKd70--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/trd7j5zxn482oxvoabb5.png" alt="Image description" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This repository also has a Dockerfile in case you prefer to run the app in a container:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t aws-rag-app .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
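
&lt;p&gt;After building the image, something like the following should start the container (8501 is Streamlit's default port, and --env-file passes in the same .env variables described earlier):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --env-file .env -p 8501:8501 aws-rag-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;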

</description>
    </item>
  </channel>
</rss>
