<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: saikrishna1729</title>
    <description>The latest articles on DEV Community by saikrishna1729 (@saikrishna1729).</description>
    <link>https://dev.to/saikrishna1729</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1005099%2F3fde8fa5-21fe-4e3c-b036-eb57d5701f18.png</url>
      <title>DEV Community: saikrishna1729</title>
      <link>https://dev.to/saikrishna1729</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saikrishna1729"/>
    <language>en</language>
    <item>
      <title>Understanding the Model Context Protocol (MCP) and using it with Amazon Q Developer CLI</title>
      <dc:creator>saikrishna1729</dc:creator>
      <pubDate>Tue, 06 May 2025 06:10:31 +0000</pubDate>
      <link>https://dev.to/saikrishna1729/using-mcp-servers-in-amazon-q-developer-cli-4hc8</link>
      <guid>https://dev.to/saikrishna1729/using-mcp-servers-in-amazon-q-developer-cli-4hc8</guid>
      <description>&lt;h2&gt;
  
  
  What is the Model Context Protocol (MCP)?
&lt;/h2&gt;

&lt;p&gt;MCP is an open protocol developed by Anthropic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; It is designed to standardize the way applications provide contextual information (like data from files, APIs, or other tools) to Large Language Models (LLMs). Think of it as giving LLMs controlled access to the specific information they need to perform complex tasks accurately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt; MCP can be compared to a USB-C port for AI applications. Just as USB-C offers a standard connection for various devices and peripherals, MCP provides a standardized interface for connecting AI models to different data sources and tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aims to unlock more practical AI applications to enhance productivity.&lt;/li&gt;
&lt;li&gt;Enables application vendors to develop "MCP servers".&lt;/li&gt;
&lt;li&gt;Allows users to employ AI applications that consume services and interact with tools provided by these MCP servers.&lt;/li&gt;
&lt;li&gt;In essence, MCP facilitates a standardized communication layer, making it easier for AI models to access and utilize external tools and data sources required for their tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For further details, refer to the official introduction: &lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/introduction&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The protocol has become popular because it provides significant value to developers, whether used in their daily workflows or built into GenAI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basic Components of MCP
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol relies on two fundamental components:&lt;br&gt;
&lt;strong&gt;MCP Server:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is the &lt;strong&gt;server-side implementation&lt;/strong&gt; of the protocol.&lt;/li&gt;
&lt;li&gt;It is responsible for exposing relevant tools or data sources.&lt;/li&gt;
&lt;li&gt;The primary purpose of these tools is to provide additional context to the LLM when requested by a client.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is the official list of MCP servers available to consume - &lt;a href="https://github.com/modelcontextprotocol/servers" rel="noopener noreferrer"&gt;MCP Servers&lt;/a&gt;&lt;/p&gt;
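
&lt;p&gt;To make the server side concrete, below is a minimal sketch of an MCP server in Python. It assumes the official &lt;code&gt;mcp&lt;/code&gt; SDK package and its FastMCP helper; the tool itself is only an illustrative example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install mcp
from mcp.server.fastmcp import FastMCP

# Name the server; clients see this when they connect.
mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int):
    """Add two numbers and return the sum."""
    return a + b

if __name__ == "__main__":
    # FastMCP runs over STDIO by default, which is how most
    # desktop MCP clients launch local servers.
    mcp.run()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;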

&lt;p&gt;&lt;strong&gt;MCP Client:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is a designated application (like an AI agent or chatbot interface) that interacts with an MCP Server.&lt;/li&gt;
&lt;li&gt;It requests information or tool execution from the server to augment the LLM's context.&lt;/li&gt;
&lt;li&gt;Communication methods between the client and server can be:&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;STDIO (Standard Input/Output)&lt;/strong&gt;: For direct process communication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE (Server-Sent Events)&lt;/strong&gt;: Often used via Web API calls for web-based applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples: &lt;a href="https://claude.ai/download" rel="noopener noreferrer"&gt;Claude Desktop&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing.html" rel="noopener noreferrer"&gt;Amazon Q&lt;/a&gt;, &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;VS Code agent mode&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, we will see how to set up the Amazon Q Developer CLI and configure one of the MCP servers published by AWS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 : Install Amazon Q Developer CLI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Installation instructions here - &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing.html" rel="noopener noreferrer"&gt;Setup&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 : Login to Amazon Q cli&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;q login&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will then be prompted to choose a license. To get started, you can choose the free one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4uhek7t9kh5232ddy0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4uhek7t9kh5232ddy0k.png" alt="Image description" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Then, you will be redirected to log in to your AWS Builder account. Please log in using your credentials.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Then, allow Q Developer access to the command line as shown below.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexusly5lnew5r8kdlwe6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexusly5lnew5r8kdlwe6.png" alt="Image description" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Interact with Amazon Q Developer on AWS-related topics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Type the command below to start a Q Developer chat session.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;q chat&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Example : Provide best practice to deploy a static website on AWS and services to use&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr902y2owedrwzae01vi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr902y2owedrwzae01vi1.png" alt="Image description" width="800" height="590"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yeah! It generates the recommendation quite beautifully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Now let's do some MCP fun
&lt;/h2&gt;

&lt;p&gt;AWS is developing MCP servers for various use cases. The full list is here: &lt;br&gt;
&lt;a href="https://awslabs.github.io/mcp/" rel="noopener noreferrer"&gt;AWS MCP Servers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, let's load one of them, the "AWS Documentation MCP Server", which reads the latest AWS documentation and returns the context to the LLM used by Amazon Q.&lt;/p&gt;

&lt;p&gt;Below are the steps to configure MCP servers for Amazon Q.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install uv, a fast Python package manager built in Rust (similar to pip)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.astral.sh/uv/getting-started/installation/#installation-methods" rel="noopener noreferrer"&gt;Steps to install UV&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Create the mcp.json file and copy the server configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Refer to the link &lt;a href="https://awslabs.github.io/mcp/servers/aws-documentation-mcp-server/" rel="noopener noreferrer"&gt;AWS Documentation MCP Server&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the JSON snippet below&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mcpServers": {
    "awslabs.aws-documentation-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Create the mcp.json file at the path &lt;strong&gt;~/.aws/amazonq/mcp.json&lt;/strong&gt;. If the directory does not exist yet, create it first with &lt;code&gt;mkdir -p ~/.aws/amazonq&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;touch ~/.aws/amazonq/mcp.json&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then paste the JSON snippet above into it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Launch Amazon Q and have fun with MCP
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3qoo8qmpzmjbosimex4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3qoo8qmpzmjbosimex4.png" alt="Image description" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's ask a question about an AWS service, as below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Suggest me a best service to use in AWS for relational database , this should be with low cost and highly available. Refer to the latest documentation available&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4w8c87nrqhxl48ksep3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4w8c87nrqhxl48ksep3.png" alt="Image description" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will prompt you to trust the required tools; you can trust them selectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xkmrvbgiw0c9pv4jfa1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xkmrvbgiw0c9pv4jfa1.png" alt="Image description" width="800" height="854"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can explore other MCP servers provided by AWS or the wider community to further enhance Amazon Q's capabilities for different tasks.&lt;/p&gt;

&lt;p&gt;Hope you got a good idea of what MCP is and how to start using it with Amazon Q Developer. Happy Q'ing!&lt;/p&gt;

</description>
      <category>amazonq</category>
      <category>aws</category>
      <category>mcp</category>
      <category>genai</category>
    </item>
    <item>
      <title>Bedrock Cross-Region Inference: Tackling ratelimits and regional availability of inference</title>
      <dc:creator>saikrishna1729</dc:creator>
      <pubDate>Fri, 18 Apr 2025 14:36:08 +0000</pubDate>
      <link>https://dev.to/saikrishna1729/bedrock-cross-region-inference-tackling-ratelimits-and-regional-availability-of-inference-59i8</link>
      <guid>https://dev.to/saikrishna1729/bedrock-cross-region-inference-tackling-ratelimits-and-regional-availability-of-inference-59i8</guid>
      <description>&lt;p&gt;Keeping your AI applications online and running smoothly, even when lots of people use them at once, is super important.&lt;/p&gt;

&lt;p&gt;Good news! Bedrock has a cool feature called &lt;strong&gt;cross-region inference&lt;/strong&gt; that makes building resilient and highly available GenAI applications much easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, What Exactly is Cross-Region Inference?
&lt;/h2&gt;

&lt;p&gt;Imagine you have a popular AI app powered by Bedrock.&lt;/p&gt;

&lt;p&gt;Normally, all the AI thinking happens in one specific AWS Region. But what if that region gets really busy (since it is a shared resource)? Or what if there's a temporary issue? Your app might slow down or stop working for some users.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock's cross-region inference is like having backup locations ready to help. It automatically sends your requests for AI processing (like asking the model a question or asking it to generate text) to other available AWS Regions within the same general area (like the US or Europe).&lt;/p&gt;

&lt;p&gt;This means your application doesn't just rely on one spot. It can tap into resources from other spots nearby, helping ensure your users get a consistent and speedy experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Makes Your AI App Resilient
&lt;/h2&gt;

&lt;p&gt;The main reason cross-region inference is a game-changer is the resiliency it adds. By spreading the work across multiple AWS Regions, it gives you several key benefits that make your AI applications much more robust:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Handles Traffic Jumps Easily&lt;/strong&gt;: When your app suddenly gets popular, cross-region inference helps handle the rush. It automatically sends requests to regions with available capacity. You don't have to guess how much traffic you'll get or build complex systems to manage it yourself. Bedrock checks your main region first and, if needed, smartly sends the request to another region that can handle it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Model Availability&lt;/strong&gt;: If one region is facing a temporary capacity crunch, your requests can be sent to another. This greatly increases the chances that your request will be completed successfully, helping keep your service running continuously for your users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased Throughput&lt;/strong&gt;: By using compute power from different regions, your application can process more requests overall. This means your app can handle a higher volume of activity without performance dropping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Failover&lt;/strong&gt;: If processing a request in the initial region fails for some reason, Bedrock will automatically try to send it to another working region within the group. This built-in safety net makes your AI applications much more reliable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Extra Cost for Resiliency&lt;/strong&gt;: This is fantastic! There's no additional charge for using cross-region inference. You pay the same price per token (the units used for processing text) as you would if the request was handled only in your original region.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started: It's Not Automatic, But It Is Simple!
&lt;/h2&gt;

&lt;p&gt;So, is this cross-region magic automatic as soon as you use Bedrock? Not quite.&lt;/p&gt;

&lt;p&gt;The routing of the request across regions is automatic once you tell Bedrock you want to use this feature, but you need to make a small change in your code to enable it.&lt;/p&gt;

&lt;p&gt;The key to enabling cross-region inference is using something called &lt;strong&gt;inference profile IDs&lt;/strong&gt;. Instead of telling Bedrock the specific, single-region address (ARN) of an AI model, you use a special ID that represents the model and tells Bedrock it can use the cross-region capability.&lt;/p&gt;

&lt;p&gt;Here’s how you get started specifically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discover the Right Inference Profile IDs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon Bedrock provides special IDs called "system-defined" inference profiles for models that support cross-region inference. These IDs cover specific geographical areas (like a group of US regions or a group of EU regions).&lt;/p&gt;

&lt;p&gt;How to find them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Console:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Go to the Amazon Bedrock console.&lt;/li&gt;
&lt;li&gt;Look for an option like "Cross-region Inference" on the left menu. Here, you can browse the available profiles for your regions and easily copy the IDs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CLI&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;You can use a command-line tool.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;aws bedrock list-inference-profiles&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Look for profiles listed with the type "SYSTEM_DEFINED". These IDs will often start with a prefix like &lt;code&gt;us.&lt;/code&gt; or &lt;code&gt;eu.&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Boto3 (Python SDK):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you're coding in Python, you can use the AWS SDK:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# Replace "your-aws-region" with the region you're working from
&lt;/span&gt;&lt;span class="n"&gt;bedrock_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-aws-region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_inference_profiles&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;ul&gt;
&lt;li&gt;This will list the available profiles, including the system-defined cross-region ones.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
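
&lt;p&gt;If you only want the cross-region profile IDs, you can filter for the system-defined type. Here is a small sketch, assuming the &lt;code&gt;typeEquals&lt;/code&gt; filter and the &lt;code&gt;inferenceProfileSummaries&lt;/code&gt; response shape of this API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Ask the API for system-defined profiles only and print their IDs.
response = bedrock.list_inference_profiles(typeEquals="SYSTEM_DEFINED")
for profile in response["inferenceProfileSummaries"]:
    print(profile["inferenceProfileId"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;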

&lt;p&gt;&lt;strong&gt;Use the Inference Profile ID in Your Code:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once you have the inference profile ID (it will look something like &lt;code&gt;us.anthropic.claude-3-5-sonnet-20240620-v1:0&lt;/code&gt;), you simply use this ID instead of the regular model ARN when you make your requests to the Bedrock API (using &lt;code&gt;InvokeModel&lt;/code&gt; or &lt;code&gt;Converse&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using a single-region ARN
# model_id = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
# ... call bedrock_runtime.invoke_model(modelId=model_id, ...)
&lt;/span&gt;
&lt;span class="c1"&gt;# You would use:
&lt;/span&gt;
&lt;span class="c1"&gt;# Using the cross-region inference profile ID
&lt;/span&gt;&lt;span class="n"&gt;inference_profile_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-3-5-sonnet-20240620-v1:0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# Example ID
&lt;/span&gt;
&lt;span class="c1"&gt;# Replace "your-source-region" with the region you are making the request from
&lt;/span&gt;&lt;span class="n"&gt;bedrock_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-source-region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inference_profile_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Use the profile ID here!
&lt;/span&gt;    &lt;span class="n"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me something interesting.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bedrock sees the inference profile ID and knows it has the flexibility to route your request to any available region within that profile's set.&lt;/p&gt;

&lt;p&gt;So, while you don't manage the traffic routing yourself (Bedrock does that automatically!), you do need to make the initial step of changing your code to use the inference profile ID instead of the standard model ARN. It's a straightforward change that unlocks powerful resiliency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring Your Applications
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock gives you visibility into how cross-region inference is working. If your request gets re-routed to another region, this information is recorded in your AWS CloudTrail logs and Amazon Bedrock Model Invocation Logs.&lt;/p&gt;

&lt;p&gt;You'll find details like an &lt;code&gt;inferenceRegion&lt;/code&gt; key which tells you where the request was actually processed. By looking at these logs (for example, in Amazon CloudWatch), you can see when requests are being served from different regions. This helps you understand your application's traffic flow and how effectively cross-region inference is handling demand spikes.&lt;/p&gt;
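
&lt;p&gt;As an illustration, here is one way to aggregate that field with a CloudWatch Logs Insights query from Python. It assumes you have enabled model invocation logging to a CloudWatch log group; the log group name below is a placeholder for whatever you configured.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Placeholder: the log group you configured for model invocation logging.
query_id = logs.start_query(
    logGroupName="/aws/bedrock/modelinvocations",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString=(
        "fields @timestamp, inferenceRegion "
        "| filter ispresent(inferenceRegion) "
        "| stats count(*) by inferenceRegion"
    ),
)["queryId"]

# Poll until the query finishes, then print the per-region counts.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] == "Complete":
        break
    time.sleep(1)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;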

&lt;h2&gt;
  
  
  Important Things to Keep in Mind
&lt;/h2&gt;

&lt;p&gt;While cross-region inference is great for resiliency, here are a few points to be aware of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; There might be a slight delay (double-digit milliseconds in testing) if a request needs to be re-routed to another region.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Residency:&lt;/strong&gt; Your main data stays in your source region. However, the input prompts and output results for a specific inference request might be processed in another region within the same geographical group (like US or EU). Make sure this fits with any data location rules or compliance requirements you have.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rest assured, all data transfer between regions happens over AWS's secure network.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supported Models and Regions:&lt;/strong&gt; This feature works with a specific list of models and within defined geographical sets of regions (US, EU, etc.). Check the Bedrock documentation to confirm that the models you want to use and the regions you operate in are supported.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html" rel="noopener noreferrer"&gt;AWS Link&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>llm</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Get handson with Latest LLM models for Free with Github</title>
      <dc:creator>saikrishna1729</dc:creator>
      <pubDate>Wed, 16 Apr 2025 14:03:10 +0000</pubDate>
      <link>https://dev.to/saikrishna1729/get-handson-with-latest-llm-models-for-free-with-github-205n</link>
      <guid>https://dev.to/saikrishna1729/get-handson-with-latest-llm-models-for-free-with-github-205n</guid>
      <description>&lt;p&gt;If you want to develop a generative AI application, you can use GitHub Models to find and experiment with AI models for free. &lt;/p&gt;

&lt;p&gt;Yes, you heard it right: for FREE!&lt;/p&gt;


&lt;p&gt;GitHub now provides an easy way to use various LLM models through a single inference endpoint. All that you need is a GitHub profile to get started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create a GitHub profile at github.com or use your existing GitHub account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps to Access the Playground:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://github.com/marketplace/models" rel="noopener noreferrer"&gt;github.com/marketplace/models&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Click "Model: Select a Model" (typically at the top left of the page) and choose a desired model from the dropdown menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you click on the model, you will be redirected to the playground. Here you can access the chat interface to test and experience the model directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnbfh130yqivrjlgkvia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnbfh130yqivrjlgkvia.png" alt="Image description" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vfqqnxcuigvorpbqxpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6vfqqnxcuigvorpbqxpp.png" alt="Image description" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps to Get a Developer Token for API Use:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to use this in your code for experimentation, you can get a free API Key (Developer Token).&lt;/p&gt;

&lt;p&gt;In the model's playground or page, click on the "Use this model" option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2qdhgo4a2ljba7ir2a2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv2qdhgo4a2ljba7ir2a2.png" alt="Image description" width="703" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, click on the "Get Developer Token" option as shown below. This token allows you to experiment with the model for free in your own code via the API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g1wncq5w3nhxlejwccu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g1wncq5w3nhxlejwccu.png" alt="Image description" width="703" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's it! You now have access to experiment with the model from your own code.&lt;/p&gt;

&lt;p&gt;The GitHub Models catalog includes various models, potentially including recent ones like GPT-4.1 (check the marketplace for current availability).&lt;/p&gt;

&lt;p&gt;Hope this post is helpful for you.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Unlocking Private Connections in Azure AI Foundry 🔐</title>
      <dc:creator>saikrishna1729</dc:creator>
      <pubDate>Tue, 15 Apr 2025 11:51:42 +0000</pubDate>
      <link>https://dev.to/saikrishna1729/unlocking-private-connections-in-azure-ai-foundry-35l4</link>
      <guid>https://dev.to/saikrishna1729/unlocking-private-connections-in-azure-ai-foundry-35l4</guid>
      <description>&lt;p&gt;Microsoft has introduced &lt;strong&gt;Azure AI Foundry&lt;/strong&gt;, rebranding the existing Azure AI Studio. This platform serves as a new central hub for designing, customizing, and managing AI applications and agents effectively at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Azure AI Foundry?
&lt;/h3&gt;

&lt;p&gt;With Azure AI Foundry, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore a wide variety of models, services, and capabilities.&lt;/li&gt;
&lt;li&gt;Build AI applications tailored to your specific goals.&lt;/li&gt;
&lt;li&gt;Facilitate scalability, transforming proofs of concept into full-fledged production applications with ease.&lt;/li&gt;
&lt;li&gt;Leverage continuous monitoring and refinement features to support long-term success.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;p&gt;Azure AI Foundry primarily consists of two components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Azure AI Foundry Hub&lt;/strong&gt;: This is the core, region-specific infrastructure used to interface with AI models. You need to create a Hub first.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Azure AI Foundry Project&lt;/strong&gt;: Built upon a Hub, Projects are where you deploy specific AI models (like Phi-3, DeepSeek, Mistral). These are also region-specific.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The native &lt;strong&gt;Azure AI Foundry SDK&lt;/strong&gt; aims to provide a simpler and more unified experience for developers building Generative AI applications.&lt;/p&gt;
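
&lt;p&gt;For context, inference against a serverless model endpoint from a Project typically looks like the sketch below. It uses the &lt;code&gt;azure-ai-inference&lt;/code&gt; package; the endpoint URL and key environment variable are placeholders for your own deployment's values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install azure-ai-inference
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: your serverless endpoint and its key.
client = ChatCompletionsClient(
    endpoint="https://xxx.region.models.ai.azure.com",
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

response = client.complete(messages=[UserMessage(content="Hello!")])
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;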

&lt;h3&gt;
  
  
  The Challenge: Private Connections
&lt;/h3&gt;

&lt;p&gt;While the documentation for this service is comprehensive, the guidance on configuring &lt;strong&gt;private network connections&lt;/strong&gt; for model inference seemed a bit unclear in my experience. This post aims to fill that gap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-by-Step: Setting Up Private Endpoints
&lt;/h3&gt;

&lt;p&gt;To establish a private connection for inference to models deployed in an Azure AI Foundry Project, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Navigate to your &lt;strong&gt;Azure AI Foundry Hub&lt;/strong&gt; resource within the Azure portal.&lt;/li&gt;
&lt;li&gt; From the left-side menu, select &lt;strong&gt;Settings&lt;/strong&gt;, then &lt;strong&gt;Networking&lt;/strong&gt;. Click on the &lt;strong&gt;Private endpoint connections&lt;/strong&gt; tab and select &lt;strong&gt;+ Private endpoint&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; When filling out the forms to create the private endpoint:

&lt;ul&gt;
&lt;li&gt;On the &lt;strong&gt;Basics&lt;/strong&gt; tab, ensure the selected &lt;strong&gt;Region&lt;/strong&gt; matches the region of your virtual network.&lt;/li&gt;
&lt;li&gt;On the &lt;strong&gt;Resource&lt;/strong&gt; tab, select &lt;code&gt;amlworkspace&lt;/code&gt; as the &lt;strong&gt;Target sub-resource&lt;/strong&gt;. (Internally, it leverages Azure Machine Learning Workspace infrastructure).&lt;/li&gt;
&lt;li&gt;On the &lt;strong&gt;Virtual Network&lt;/strong&gt; tab, select the target &lt;strong&gt;Virtual network&lt;/strong&gt; and &lt;strong&gt;Subnet&lt;/strong&gt; you wish to connect from.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Configure any additional network settings as required, review your settings on the &lt;strong&gt;Review + create&lt;/strong&gt; tab, and then click &lt;strong&gt;Create&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This process should automatically configure the necessary DNS records within your private DNS zones associated with the virtual network.&lt;/p&gt;

&lt;p&gt;And that's it! No further configuration should be necessary. Attempts to reach your serverless model endpoint (e.g., &lt;code&gt;xxx.region.models.ai.azure.com&lt;/code&gt;) should now resolve privately.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it Works
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9ta2jj1xugfqm63h2sx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9ta2jj1xugfqm63h2sx.png" alt="Image description" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reason this works is that Azure automatically creates a &lt;strong&gt;CNAME&lt;/strong&gt; DNS record for your endpoint, similar to:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;deploymentname&amp;gt;.&amp;lt;ProjectID&amp;gt;.models.&amp;lt;region&amp;gt;.privatelink.api.azureml.ms&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The Private Endpoint you created specifically handles DNS resolution for this &lt;code&gt;.privatelink&lt;/code&gt; address, ensuring traffic stays within your private network.&lt;/p&gt;
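
&lt;p&gt;A quick way to verify this from a machine inside the connected virtual network is to resolve the endpoint hostname and check that it returns a private IP. A small sketch (the hostname is a placeholder for your deployment's endpoint):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ipaddress
import socket

# Placeholder: replace with your serverless model endpoint hostname.
host = "xxx.region.models.ai.azure.com"

# Inside the VNet this should follow the privatelink CNAME and land on a
# private IP; outside the VNet it resolves to a public one.
for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP):
    ip = info[4][0]
    print(ip, "private" if ipaddress.ip_address(ip).is_private else "public")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;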




&lt;p&gt;Hopefully, this guide clarifies the process for setting up private connections in Azure AI Foundry.&lt;/p&gt;

&lt;p&gt;Documentation for reference  : &lt;a href="https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal" rel="noopener noreferrer"&gt;Azure AI Foundry Private Endpoints&lt;/a&gt; &lt;/p&gt;

</description>
      <category>azureaifoundry</category>
      <category>privateaccess</category>
    </item>
    <item>
      <title>AWS Bedrock : Interface Claude LLM using Python</title>
      <dc:creator>saikrishna1729</dc:creator>
      <pubDate>Sat, 10 Aug 2024 14:32:37 +0000</pubDate>
      <link>https://dev.to/saikrishna1729/aws-bedrock-interface-claude-llm-using-python-d1g</link>
      <guid>https://dev.to/saikrishna1729/aws-bedrock-interface-claude-llm-using-python-d1g</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Amazon Bedrock, a fully managed service by AWS, empowers developers to rapidly build and scale generative AI applications using foundation models (FMs). It offers a diverse selection of Large Language Models (LLMs) from leading providers like Amazon, Anthropic, AI21 Labs, and Meta.&lt;/p&gt;

&lt;p&gt;In this guide, I'll walk you through the simple steps to get started with Amazon Bedrock using the AWS Python SDK.&lt;/p&gt;




&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before we dive in, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Credentials&lt;/strong&gt;: Ensure your machine has properly configured AWS credentials with the necessary IAM policy:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [\n
        {
            "Sid": "BedrockFullAccess",
            "Effect": "Allow",
            "Action": ["bedrock:*"],
            "Resource": "*"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Python 3.10 or later, with &lt;strong&gt;pip&lt;/strong&gt; installed.&lt;/li&gt;
&lt;li&gt;Install the Python packages below: &lt;strong&gt;boto3&lt;/strong&gt; for the AWS SDK, &lt;strong&gt;chainlit&lt;/strong&gt; for a simple UI framework. &lt;a href="https://docs.chainlit.io/get-started/installation" rel="noopener noreferrer"&gt;More about Chainlit&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install boto3
pip install chainlit 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enable Bedrock Foundation Models:&lt;/strong&gt; Enable the Bedrock foundation models in the AWS Console. &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html#getting-started-model-access" rel="noopener noreferrer"&gt;Follow the instructions here&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;We will be using a &lt;strong&gt;Claude 3&lt;/strong&gt; LLM (Haiku, in the code below) for this application in the&lt;br&gt;
  &lt;strong&gt;us-east-1&lt;/strong&gt; AWS region. (Refer to the documentation for the regions where Bedrock is available if you want to use some other region.)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd9fuh2rv89gfov1jl41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdd9fuh2rv89gfov1jl41.png" alt="Image description" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; During setup, you'll be asked to provide company and use case information. Complete this step as required.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Please allow a few minutes for the model to become available after enabling it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86jwbyhvwoxeq539panz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86jwbyhvwoxeq539panz.png" alt="Image description" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Interfacing with Python
&lt;/h3&gt;

&lt;p&gt;Once the prerequisites are in place, create a Python file (e.g., &lt;code&gt;main.py&lt;/code&gt;) with the following code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Replace &lt;code&gt;question&lt;/code&gt; variable in the code with any other sample question you want LLM to answer.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import json

def chat_with_bedrock(p_message):
    bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

    messages = [{
        "role": "user",
        "content": p_message
    }]

    body = json.dumps({
    "max_tokens": 256,
    "messages": messages,
    "anthropic_version": "bedrock-2023-05-31"
    })

    response = bedrock.invoke_model(body=body, modelId="anthropic.claude-3-haiku-20240307-v1:0")

    response_body = json.loads(response.get("body").read())
    return response_body.get("content")

# replace the question with your query
question = "Who is US President in 2020"

response = chat_with_bedrock(question)

print(response[0].get("text"))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample response &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The President of the United States in 2020 is Donald Trump. He was elected in 2016 and his current term runs until January 20, 2021.&lt;br&gt;
Some key facts about Donald Trump's presidency in 2020:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;He is the 45th President of the United States.&lt;/li&gt;
&lt;li&gt;He is a member of the Republican Party.&lt;/li&gt;
&lt;li&gt;His vice president is Mike Pence.&lt;/li&gt;
&lt;li&gt;Major events in 2020 included the COVID-19 pandemic, economic crisis, protests against police brutality, and the 2020 presidential election.&lt;/li&gt;
&lt;li&gt;He ran for re-election against Democratic candidate Joe Biden in the 2020 election.&lt;/li&gt;
&lt;li&gt;The 2020 election took place on November 3, 2020. Biden won both the popular vote and electoral college.&lt;/li&gt;
&lt;li&gt;However, Trump did not concede the election and made unsubstantiated claims of widespread voter fraud before leaving office.
So in summary, Donald Trump served as president throughout 2020, but lost his bid for re-election to Joe Biden, who was inaugurated as the 46th president on January 20, 2021.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;This example demonstrates how to interface with AWS Bedrock using the AWS SDK for Python. You can further enhance this by implementing a prompt template or experimenting with different models available in Bedrock.&lt;/p&gt;
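
&lt;p&gt;For instance, since we installed &lt;strong&gt;chainlit&lt;/strong&gt; in the prerequisites, here is a minimal sketch of wrapping the same function in a chat UI. It assumes chainlit's &lt;code&gt;@cl.on_message&lt;/code&gt; callback API; save it as &lt;code&gt;app.py&lt;/code&gt; and start it with &lt;code&gt;chainlit run app.py&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import chainlit as cl

# Assumes chat_with_bedrock() from main.py above is importable
# (guard its example call with: if __name__ == "__main__":).
from main import chat_with_bedrock

@cl.on_message
async def on_message(message: cl.Message):
    # Forward the user's text to Bedrock and reply with the first content block.
    answer = chat_with_bedrock(message.content)
    await cl.Message(content=answer[0].get("text")).send()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;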

&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; To use a different model in Bedrock, request access and update the &lt;code&gt;modelId&lt;/code&gt; in the code accordingly.&lt;/p&gt;

&lt;p&gt;Thank you!&lt;/p&gt;

</description>
      <category>awsbedrock</category>
      <category>aws</category>
      <category>genai</category>
    </item>
  </channel>
</rss>
