<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Viktor Ardelean</title>
    <description>The latest articles on DEV Community by Viktor Ardelean (@viktorardelean).</description>
    <link>https://dev.to/viktorardelean</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1046360%2Fbc493e96-a5e9-4848-9014-d31bcc865e1d.jpeg</url>
      <title>DEV Community: Viktor Ardelean</title>
      <link>https://dev.to/viktorardelean</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/viktorardelean"/>
    <language>en</language>
    <item>
      <title>MCP: The REST Revolution of AI - Why This Protocol Changes Everything</title>
      <dc:creator>Viktor Ardelean</dc:creator>
      <pubDate>Wed, 16 Apr 2025 17:20:30 +0000</pubDate>
      <link>https://dev.to/viktorardelean/mcp-the-rest-revolution-of-ai-why-this-protocol-changes-everything-4p75</link>
      <guid>https://dev.to/viktorardelean/mcp-the-rest-revolution-of-ai-why-this-protocol-changes-everything-4p75</guid>
      <description>&lt;p&gt;You probably looked up this post to make sense of this new piece of AI jargon, and I get it—it's not easy to keep up with all the terminology or to know how everything fits together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good news! You're in the right place!&lt;/strong&gt; &lt;br&gt;
Follow along and you'll understand what all this MCP buzz is about!&lt;/p&gt;
&lt;h2&gt;
  
  
  Before MCP: The Evolution of LLM Apps
&lt;/h2&gt;

&lt;p&gt;To better understand how MCP came to be, let's look at how LLM apps have evolved.&lt;/p&gt;
&lt;h3&gt;
  
  
  Simple LLM Interaction
&lt;/h3&gt;

&lt;p&gt;First, we built AI apps that interacted directly with an LLM. This was great for getting answers to general questions based on the knowledge the model was trained on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F027usmygan5qd0lavj3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F027usmygan5qd0lavj3i.png" alt="Image description" width="800" height="693"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The downside? &lt;strong&gt;The LLM model couldn't take any actions in the real world.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, you could ask the LLM, "What are the best Caribbean islands with white sandy beaches?" and it might tell you about Turks and Caicos or Aruba. But if you then ask, "Can you book me a flight to Aruba for next weekend?" it would have to respond with something like "I don't have the ability to search or book flights" because it can't access real-time flight data or booking systems.&lt;/p&gt;
&lt;h3&gt;
  
  
  LLM with Tools
&lt;/h3&gt;

&lt;p&gt;To solve this limitation, we equipped LLMs with tools.&lt;br&gt;
&lt;strong&gt;Think of tools as separate services that execute specific actions&lt;/strong&gt; (search flights, book hotels, etc.).&lt;/p&gt;

&lt;p&gt;With tools ready, you could build an AI travel assistant where you first tell the LLM, "You have access to a flight_search tool that takes departure_city, destination_city, and dates as parameters, and a hotel_booking tool that takes location, check_in_date, check_out_date, and guests as parameters."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcof8qyj42jwol2ex8diy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcof8qyj42jwol2ex8diy.png" alt="Image description" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;
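&lt;p&gt;In code, that instruction usually becomes a structured tool definition. Here's a minimal sketch in the JSON-Schema style most LLM tool-calling APIs accept; the field names mirror the example above, and the exact envelope varies by provider:&lt;/p&gt;

```python
# Hypothetical tool definitions in the JSON-Schema style that most
# LLM tool-calling APIs accept; the exact envelope varies by provider.
flight_search_tool = {
    "name": "flight_search",
    "description": "Search flights between two cities for given dates.",
    "parameters": {
        "type": "object",
        "properties": {
            "departure_city": {"type": "string"},
            "destination_city": {"type": "string"},
            "departure_date": {"type": "string", "format": "date"},
            "return_date": {"type": "string", "format": "date"},
        },
        "required": ["departure_city", "destination_city", "departure_date"],
    },
}

hotel_booking_tool = {
    "name": "hotel_booking",
    "description": "Book a hotel for a location, date range, and guest count.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "check_in_date": {"type": "string", "format": "date"},
            "check_out_date": {"type": "string", "format": "date"},
            "guests": {"type": "integer", "minimum": 1},
        },
        "required": ["location", "check_in_date", "check_out_date"],
    },
}

tools = [flight_search_tool, hotel_booking_tool]
```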

&lt;p&gt;Now when a user asks, "I want to go to Aruba next weekend from Chicago. Can you help me plan this trip?", the LLM might respond: "I'll help you plan your Aruba trip! Let me check flight options first."&lt;/p&gt;

&lt;p&gt;Then it would call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flight_search(departure_city='Chicago', destination_city='Aruba', departure_date='2025-04-25', return_date='2025-04-27')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After receiving flight results, it might say: "I found several flights! Now let me check hotels."&lt;/p&gt;

&lt;p&gt;Then call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hotel_booking(location='Aruba', check_in_date='2025-04-25', check_out_date='2025-04-27', guests=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works and enables the LLM to take actions, but there's a major downside: &lt;strong&gt;complexity!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For every tool you want to use, you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write custom code and maintain it&lt;/li&gt;
&lt;li&gt;Handle API authentication and error cases&lt;/li&gt;
&lt;li&gt;Process response formats&lt;/li&gt;
&lt;li&gt;Update your code when APIs change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Behind the scenes, your AI app needs code that looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;flight_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;departure_city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;destination_city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;departure_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_date&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Connect to airline API with authentication
&lt;/span&gt;    &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AIRLINE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Format request
&lt;/span&gt;    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{...}&lt;/span&gt;  &lt;span class="c1"&gt;# Format parameters correctly
&lt;/span&gt;
    &lt;span class="c1"&gt;# Make API call
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://airline-api.com/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Handle errors
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;handle_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Process and format response
&lt;/span&gt;    &lt;span class="n"&gt;flights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_flight_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;flights&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And you need similar code for every single integration. When the airline API changes, you have to update your code. It's a maintenance nightmare!&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Provider Solutions (Like AWS Bedrock Agents)
&lt;/h3&gt;

&lt;p&gt;Cloud providers recognized this complexity and started offering solutions like AWS Bedrock Agents to simplify the process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2aul5sragec09ag9df5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2aul5sragec09ag9df5g.png" alt="Image description" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With these solutions, much of the orchestration is abstracted away. But &lt;strong&gt;you still need to write custom code&lt;/strong&gt; to execute the actions. If the airline API changes its parameters or response format, you still need to update your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem MCP Solves
&lt;/h2&gt;

&lt;p&gt;Did you notice the pattern? There's a &lt;strong&gt;hard dependency between the AI app/agent and the tools it uses&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;If something changes in an external API, your tool code breaks. &lt;strong&gt;Everything is tightly coupled.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP - A Paradigm Shift in AI Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol) introduces a fundamental change in paradigm.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of AI developers writing custom tool code for every integration, MCP provides a standardized protocol for LLMs to communicate with external services.&lt;/p&gt;
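&lt;p&gt;Under the hood, MCP is built on JSON-RPC 2.0. A &lt;code&gt;tools/call&lt;/code&gt; request from the AI app to an MCP server looks roughly like this (tool name and arguments are borrowed from our travel example):&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "flight_search",
    "arguments": {
      "departure_city": "Chicago",
      "destination_city": "Aruba"
    }
  }
}
```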

&lt;p&gt;With MCP, there's a clear separation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;LLMs focus on understanding user requests and reasoning&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP servers handle the specific domain functionality&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2zkr2zu73xc62vg38tb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2zkr2zu73xc62vg38tb.png" alt="Image description" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's see how our travel planning example changes with MCP:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User asks: "I want to go to Aruba next weekend from Chicago"&lt;/li&gt;
&lt;li&gt;Your AI app connects to a travel MCP server&lt;/li&gt;
&lt;li&gt;The LLM says to the MCP server: "I need flight options from Chicago to Aruba for next weekend"&lt;/li&gt;
&lt;li&gt;The MCP server handles all the API calls, authentication, and formatting&lt;/li&gt;
&lt;li&gt;The MCP server returns structured results to the LLM&lt;/li&gt;
&lt;li&gt;The LLM presents the options to the user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;You don't need to write any custom integration code.&lt;/strong&gt; The MCP server handles all those details for you!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's like the difference between building your own payment processing from scratch versus integrating with Stripe.&lt;/strong&gt; MCP servers do for AI what API providers did for web development - they provide ready-made capabilities that you can simply plug into.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP is the New REST
&lt;/h2&gt;

&lt;p&gt;Just like services communicate with each other through REST APIs, &lt;strong&gt;LLMs now communicate with servers through MCP.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Remember when REST revolutionized web development by providing a standard way for systems to talk to each other? &lt;strong&gt;MCP is doing the same thing for AI systems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It creates a clean separation of concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI apps focus on reasoning and user interaction&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP servers focus on providing specific capabilities&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As an AI developer, you don't need to write and maintain custom tool code. And since the MCP server handles the integration details, you're insulated from changes in the underlying systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future: MCP Alongside REST, Then Beyond
&lt;/h2&gt;

&lt;p&gt;Today, most digital services expose REST APIs. My prediction is that &lt;strong&gt;we'll soon see services offering both REST APIs and MCP servers side by side.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just as REST largely replaced SOAP for most web service integrations, &lt;strong&gt;MCP could eventually replace traditional REST APIs&lt;/strong&gt; for many AI-driven use cases.&lt;/p&gt;

&lt;p&gt;Why? Because MCP is designed specifically for the way modern AI needs to interact with services—with rich context and semantic understanding rather than just rigid data structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Picture
&lt;/h2&gt;

&lt;p&gt;MCP servers are like interfaces. Your LLM only needs to know what the server offers and doesn't depend on the concrete implementation.&lt;/p&gt;

&lt;p&gt;This is huge for AI development because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reduced complexity&lt;/strong&gt; - No more custom tool code for every integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better maintenance&lt;/strong&gt; - MCP servers handle API changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardization&lt;/strong&gt; - A common protocol that works across different AI systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialization&lt;/strong&gt; - LLMs can focus on what they do best&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;The next time you hear about MCP, remember: &lt;strong&gt;it's not just another AI buzzword. It's a fundamental architectural pattern that's reshaping how AI systems interact with the world&lt;/strong&gt;—just like REST did for web services years ago.&lt;/p&gt;

&lt;p&gt;Are you building AI applications? &lt;strong&gt;MCP might just be the abstraction you've been waiting for.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>aws</category>
      <category>ai</category>
      <category>genai</category>
    </item>
    <item>
      <title>DynamoDB Transactions with AWS Step Functions</title>
      <dc:creator>Viktor Ardelean</dc:creator>
      <pubDate>Sat, 19 Oct 2024 14:56:24 +0000</pubDate>
      <link>https://dev.to/viktorardelean/dynamodb-transactions-with-aws-step-functions-3k7d</link>
      <guid>https://dev.to/viktorardelean/dynamodb-transactions-with-aws-step-functions-3k7d</guid>
      <description>&lt;h2&gt;
  
  
  1. Overview
&lt;/h2&gt;

&lt;p&gt;In this post, we'll explore how to leverage direct service integrations in &lt;em&gt;AWS Step Functions&lt;/em&gt; to build a workflow for executing &lt;em&gt;DynamoDB&lt;/em&gt; transactions. &lt;em&gt;AWS Step Functions&lt;/em&gt; are an excellent tool for breaking down business workflows into individual steps, promoting separation of concerns and encapsulating discrete actions within each step.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Use Case
&lt;/h2&gt;

&lt;p&gt;Let's consider a real-life scenario to demonstrate this approach. We start with an object stored in &lt;em&gt;Amazon S3&lt;/em&gt;. &lt;strong&gt;When the file is deleted, we must remove an item from two &lt;em&gt;DynamoDB&lt;/em&gt; tables&lt;/strong&gt;. To ensure data consistency, we'll wrap both delete operations inside a transaction, preventing a situation where one delete succeeds while the other fails.&lt;/p&gt;

&lt;p&gt;Here's an example of an &lt;em&gt;Amazon EventBridge&lt;/em&gt; rule that captures all &lt;em&gt;delete&lt;/em&gt; events from a specific &lt;em&gt;Amazon S3&lt;/em&gt; bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "detail": {
    "bucket": {
      "name": ["bucket_name"]
    },
    "deletion-type": ["Permanently Deleted"]
  },
  "detail-type": ["Object Deleted"],
  "source": ["aws.s3"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. The Traditional Lambda Solution
&lt;/h2&gt;

&lt;p&gt;A classic design would involve enabling &lt;em&gt;Amazon S3&lt;/em&gt; event notifications to &lt;em&gt;Amazon EventBridge&lt;/em&gt;. Once the event reaches the event bus, an &lt;em&gt;Amazon EventBridge&lt;/em&gt; rule would trigger an &lt;em&gt;AWS Lambda&lt;/em&gt; function to execute the &lt;em&gt;DynamoDB&lt;/em&gt; transaction. Here's what this architecture might look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiuobquzw9p8o1tfmb60n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiuobquzw9p8o1tfmb60n.png" alt="Image description" width="771" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's examine a potential &lt;em&gt;AWS Lambda&lt;/em&gt; function implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def lambda_handler(event, context):
    # Extract the S3 object key from the EventBridge event
    s3_key = event.get["detail"]["object"]["key"]

    # Construct your DynamoDB delete operations
    delete_item_in_table_A = {
        'Delete': {
            'TableName': "ddb_table_a",
            'Key': {
                'YourPrimaryKeyAttributeName': {'S': s3_key}
            }
        }
    }

    delete_item_in_table_B = {
        'Delete': {
            'TableName': "ddb_table_b",
            'Key': {
                'YourPrimaryKeyAttributeName': {'S': s3_key}
            }
        }
    }

    # Perform a DynamoDB transaction to ensure both deletes happen together
    response = dynamodb.transact_write_items(
        TransactItems=[delete_item_in_table_A, delete_item_in_table_B]
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Delete transaction succeeded')
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this solution is concise and functional, it has some drawbacks. The &lt;em&gt;AWS Lambda&lt;/em&gt; function merely receives an event and performs an API call, without any substantial business logic. It serves as a simple connector in a data pipeline chain, executing some delete operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;AWS Lambda&lt;/em&gt; functions that primarily connect different services or transform events without complex business logic can often be replaced with service integrations.&lt;/strong&gt; &lt;br&gt;
Let's explore this alternative approach.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. The &lt;em&gt;AWS Step Functions&lt;/em&gt; Solution
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Amazon EventBridge&lt;/em&gt; supports &lt;em&gt;AWS Step Functions&lt;/em&gt; as a target, allowing us to replace the &lt;em&gt;AWS Lambda&lt;/em&gt; function with an &lt;em&gt;AWS Step Functions&lt;/em&gt; workflow. This approach enables us to build a no-code solution using &lt;em&gt;DynamoDB&lt;/em&gt; direct service integrations within the workflow.&lt;/p&gt;

&lt;p&gt;Here's an overview of this solution:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fli4lyy7o5wysj7utbfi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fli4lyy7o5wysj7utbfi1.png" alt="Image description" width="771" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's dive into the implementation of the &lt;em&gt;AWS Step Functions&lt;/em&gt; workflow.&lt;/p&gt;
&lt;h3&gt;
  
  
  4.1 &lt;em&gt;AWS Step Functions&lt;/em&gt; Service Integrations
&lt;/h3&gt;

&lt;p&gt;Since our use case doesn't involve complex logic, we can build our workflow using service integrations. &lt;em&gt;AWS Step Functions&lt;/em&gt; offers two types of integrations with other AWS services:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AWS SDK Integrations:&lt;/strong&gt; These cover over 200 services and are similar to API calls you'd make in an &lt;em&gt;AWS Lambda&lt;/em&gt; function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized Integrations:&lt;/strong&gt; Available for about 20 core services, these add convenience by automatically converting output to JSON and handling asynchronous tasks, eliminating the need for custom polling mechanisms.&lt;/li&gt;
&lt;/ol&gt;
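&lt;p&gt;You can tell the two flavors apart by the task's &lt;code&gt;Resource&lt;/code&gt; ARN. For &lt;em&gt;DynamoDB&lt;/em&gt;, for example:&lt;/p&gt;

```text
arn:aws:states:::aws-sdk:dynamodb:transactWriteItems   # AWS SDK integration
arn:aws:states:::dynamodb:putItem                      # optimized integration
```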

&lt;p&gt;We have two AWS SDK integration options, &lt;em&gt;DynamoDB:TransactWriteItems&lt;/em&gt; and &lt;em&gt;DynamoDB:ExecuteTransaction&lt;/em&gt;, to wrap both delete statements in a transaction.&lt;/p&gt;

&lt;p&gt;Let's explore both implementations.&lt;/p&gt;
&lt;h3&gt;
  
  
  4.2 &lt;em&gt;DynamoDB:TransactWriteItems&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;TransactWriteItems&lt;/em&gt; API allows for synchronous, atomic write operations across multiple items. It supports up to 100 actions (Put, Update, Delete, or ConditionCheck) in different tables within the same AWS account and region. &lt;strong&gt;This API doesn't allow read operations within the transaction and ensures all actions either succeed or fail together.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using this approach, we need just a single step in our workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0lhh3eq21pzau1hae76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0lhh3eq21pzau1hae76.png" alt="Image description" width="428" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the workflow's ASL (Amazon States Language) definition for the transaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Comment": "DynamoDB Transaction for Delete Statements",
  "StartAt": "DeleteTransaction",
  "States": {
    "DeleteTransaction": {
      "Type": "Task",
      "Parameters": {
        "TransactItems": [
          {
            "Delete": {
              "TableName": "ddb_table_a",
              "Key": {
                "PK": {
                  "S.$": "$.detail.object.key"
                }
              }
            }
          },
          {
            "Delete": {
              "TableName": "ddb_table_b",
              "Key": {
                "PK": {
                  "S.$": "$.detail.object.key"
                }
              }
            }
          }
        ]
      },
      "Resource": "arn:aws:states:::aws-sdk:dynamodb:transactWriteItems",
      "End": true
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 &lt;em&gt;DynamoDB:ExecuteTransaction&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;ExecuteTransaction&lt;/em&gt; API allows for transactional reads or writes using PartiQL statements. A transaction can contain up to 100 statements, &lt;strong&gt;but all operations must be either reads or writes, not a mix&lt;/strong&gt;. It ensures that all statements in the transaction are executed atomically.&lt;/p&gt;

&lt;p&gt;Our workflow would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszkvgum1e1cmsvdt88k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszkvgum1e1cmsvdt88k4.png" alt="Image description" width="423" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Defining the delete statements for this approach can be tricky. Here's an example implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Comment": "DynamoDB Transaction for Delete Statements",
  "StartAt": "ExecuteTransaction",
  "States": {
    "ExecuteTransaction": {
      "Type": "Task",
      "Parameters": {
        "TransactStatements": [
          {
            "Statement": "DELETE FROM \"ddb_table_a\" WHERE PK = ?",
            "Parameters": [
              {
                "S.$": "$.detail.object.key"
              }
            ]
          },
          {
            "Statement": "DELETE FROM \"ddb_table_b\" WHERE PK = ?",
            "Parameters": [
              {
                "S.$": "$.detail.object.key"
              }
            ]
          }
        ]
      },
      "Resource": "arn:aws:states:::aws-sdk:dynamodb:executeTransaction",
      "End": true
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both cases, we only need to define a single state with the delete statements. This approach eliminates the need for maintaining code, dealing with cold starts, or managing runtime updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Considerations
&lt;/h2&gt;

&lt;p&gt;When it comes to costs, choosing the Express workflow type is the most economical option for such a simple and fast workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An often overlooked fact is that &lt;em&gt;AWS Step Functions Express&lt;/em&gt; offers a minimum 64 MB configuration option, which is more cost-effective than the minimum 128 MB &lt;em&gt;AWS Lambda&lt;/em&gt; function.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To illustrate, let's consider a scenario with 3 million invocations, each lasting 100ms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A 128 MB &lt;em&gt;AWS Lambda&lt;/em&gt; function in us-east-1 would cost $0.51&lt;/li&gt;
&lt;li&gt;A 64 MB &lt;em&gt;Express Step Function&lt;/em&gt; in the same region would cost $0.31&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This demonstrates the potential for significant cost savings when using &lt;em&gt;AWS Step Functions&lt;/em&gt; for simple workflows. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By leveraging &lt;em&gt;AWS Step Functions&lt;/em&gt; with direct service integrations, we can create efficient, no-code solutions for executing &lt;em&gt;DynamoDB&lt;/em&gt; transactions. This approach offers several advantages over traditional &lt;em&gt;AWS Lambda&lt;/em&gt;-based implementations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Simplified architecture with reduced code maintenance&lt;/li&gt;
&lt;li&gt;Improved separation of concerns&lt;/li&gt;
&lt;li&gt;Potential cost savings, especially for simple, high-volume workflows&lt;/li&gt;
&lt;li&gt;Elimination of cold starts and runtime management&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As we've seen, both &lt;em&gt;TransactWriteItems&lt;/em&gt; and &lt;em&gt;ExecuteTransaction&lt;/em&gt; APIs provide robust options for implementing transactional operations in &lt;em&gt;DynamoDB&lt;/em&gt; through &lt;em&gt;AWS Step Functions&lt;/em&gt;. The choice between them depends on your specific use case and whether you need to include read operations in your transactions.&lt;/p&gt;

&lt;p&gt;By adopting this serverless, no-code approach, you can simplify your data pipeline processes and focus more on building scalable, maintainable applications in AWS.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>stepfunctions</category>
    </item>
    <item>
      <title>Unleashing OpenSearch: Best Practices for 1 Billion Documents on AWS</title>
      <dc:creator>Viktor Ardelean</dc:creator>
      <pubDate>Wed, 05 Jul 2023 15:11:14 +0000</pubDate>
      <link>https://dev.to/viktorardelean/unleashing-opensearch-best-practices-for-1-billion-documents-on-aws-2gi9</link>
      <guid>https://dev.to/viktorardelean/unleashing-opensearch-best-practices-for-1-billion-documents-on-aws-2gi9</guid>
      <description>&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Setting up an OpenSearch cluster in AWS to handle big data volumes is crucial to ensuring optimal performance, scalability, and availability. &lt;/p&gt;

&lt;p&gt;In this blog post, we will explore important considerations and recommendations for configuring an OpenSearch cluster specifically designed to manage 1 billion documents, each sized at 2KB. &lt;/p&gt;

&lt;p&gt;By following these recommendations, such as properly allocating resources and implementing appropriate indexing strategies, we can ensure efficient ingestion and management of the 2TB data set. &lt;/p&gt;
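&lt;p&gt;The 2TB figure follows directly from the document count and size; a quick sanity check:&lt;/p&gt;

```python
# Back-of-envelope sizing: 1 billion documents at 2 KB each (decimal units).
docs = 1_000_000_000
doc_size_kb = 2
total_tb = docs * doc_size_kb / 1_000_000_000  # KB to TB
print(total_tb)  # 2.0
```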

&lt;p&gt;This will enable us to harness the full potential of OpenSearch and effectively process and analyze the extensive data at hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Optimizing Performance and Scalability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Dedicated Master Nodes
&lt;/h3&gt;

&lt;p&gt;It is recommended to use Multi-AZ with Standby and deploy three dedicated master nodes for optimal stability and fault tolerance.&lt;/p&gt;

&lt;p&gt;Avoid choosing an even number of dedicated master nodes to ensure the necessary quorum for electing a new master in case of failures. &lt;strong&gt;Three dedicated master nodes offer two backup nodes and the required quorum, providing a reliable configuration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The following table provides recommended minimum dedicated master instance types for efficient management of OpenSearch clusters, considering factors like instance count and shard count.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance count&lt;/th&gt;
&lt;th&gt;Master node RAM size&lt;/th&gt;
&lt;th&gt;Maximum supported shard count&lt;/th&gt;
&lt;th&gt;Recommended minimum dedicated master instance type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1–10&lt;/td&gt;
&lt;td&gt;8 GiB&lt;/td&gt;
&lt;td&gt;10K&lt;/td&gt;
&lt;td&gt;m5.large.search or m6g.large.search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11–30&lt;/td&gt;
&lt;td&gt;16 GiB&lt;/td&gt;
&lt;td&gt;30K&lt;/td&gt;
&lt;td&gt;c5.2xlarge.search or c6g.2xlarge.search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31–75&lt;/td&gt;
&lt;td&gt;32 GiB&lt;/td&gt;
&lt;td&gt;40K&lt;/td&gt;
&lt;td&gt;r5.2xlarge.search or r6g.2xlarge.search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;76–125&lt;/td&gt;
&lt;td&gt;64 GiB&lt;/td&gt;
&lt;td&gt;75K&lt;/td&gt;
&lt;td&gt;r5.2xlarge.search or r6g.2xlarge.search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;126–200&lt;/td&gt;
&lt;td&gt;128 GiB&lt;/td&gt;
&lt;td&gt;75K&lt;/td&gt;
&lt;td&gt;r5.4xlarge.search or r6g.4xlarge.search&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Based on our specific use case, we can opt for a configuration consisting of three m6g.large.search master nodes. &lt;/p&gt;
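
&lt;p&gt;The tiers in the table above can be captured in a small helper. This is just a sketch of the lookup, not an official API; the Graviton instance types and node-count thresholds are taken directly from the table:&lt;/p&gt;

```python
# Recommended minimum dedicated master instance types from the table above,
# keyed by the maximum data-node count each tier supports (Graviton variants).
MASTER_INSTANCE_TIERS = [
    (10, "m6g.large.search"),
    (30, "c6g.2xlarge.search"),
    (75, "r6g.2xlarge.search"),
    (125, "r6g.2xlarge.search"),
    (200, "r6g.4xlarge.search"),
]

def recommended_master_instance(data_node_count):
    """Return the smallest recommended dedicated master instance type."""
    for max_nodes, instance_type in MASTER_INSTANCE_TIERS:
        if max_nodes >= data_node_count:
            return instance_type
    raise ValueError("Clusters above 200 data nodes need a custom review")

print(recommended_master_instance(3))  # m6g.large.search
```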

&lt;h3&gt;
  
  
  2.2 Scaling Data Nodes
&lt;/h3&gt;

&lt;p&gt;When dealing with substantial data volumes, such as the massive 2TB dataset we have, it becomes crucial to appropriately scale the number of data nodes in our OpenSearch cluster.&lt;/p&gt;

&lt;p&gt;Scaling the data nodes allows us to distribute the data across multiple nodes, ensuring efficient storage, retrieval, and processing of the extensive dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To efficiently manage a 2TB dataset in OpenSearch, we can start with three data nodes and conduct thorough performance testing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Monitoring metrics such as indexing throughput and response times during testing will determine if additional nodes are necessary.&lt;/p&gt;

&lt;p&gt;By gradually adding nodes based on performance analysis, we can effectively distribute the workload and ensure optimal dataset management.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Shard Count
&lt;/h3&gt;

&lt;p&gt;To optimize search performance in OpenSearch, careful consideration of the shard count in our index is crucial. Increasing the number of shards can significantly improve efficiency, particularly when dealing with large datasets. &lt;strong&gt;For search-focused workloads, aiming for shard sizes between 10-30 GB is recommended, while sizes between 30-50 GB work well for write-heavy scenarios.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For instance, let's consider a use case where both reads and writes occur equally. In such cases, aiming for a shard size of around 30 GB is ideal. &lt;/p&gt;

&lt;p&gt;We can approximate the number of primary shards required using the formula &lt;strong&gt;(source_data) * (1 + indexing_overhead) / desired_shard_size&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For our example, the approximate number of primary shards would be (2,000 GB * 1.1) / 30 GB ≈ 73.33.&lt;/p&gt;

&lt;p&gt;To ensure an even distribution of shards across our three data nodes, a suitable shard count would be 72. This allocation helps balance the workload and maximizes the utilization of resources within our OpenSearch cluster. &lt;/p&gt;

&lt;p&gt;As our data volume grows, adjusting the shard count to maintain optimal performance becomes essential. Adapting the shard count based on the evolving data volume ensures efficient distribution and processing of data across the cluster. We can achieve optimal search performance in OpenSearch by continuously monitoring and optimizing the shard count. &lt;/p&gt;
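
&lt;p&gt;As a quick sanity check, the shard-count arithmetic above can be sketched in a few lines of Python (the 2,000 GB and 30 GB figures mirror our example):&lt;/p&gt;

```python
def primary_shard_count(source_data_gb, desired_shard_size_gb,
                        indexing_overhead=0.10):
    """Approximate primary shards: source * (1 + overhead) / shard size."""
    return source_data_gb * (1 + indexing_overhead) / desired_shard_size_gb

# 2 TB (~2,000 GB) of source data, targeting ~30 GB shards:
approx = primary_shard_count(2000, 30)
print(round(approx, 2))  # 73.33

# Round down to a multiple of the data-node count for even distribution:
data_nodes = 3
shards = int(approx // data_nodes) * data_nodes
print(shards)  # 72
```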

&lt;h3&gt;
  
  
  2.4 Adding Replicas
&lt;/h3&gt;

&lt;p&gt;In OpenSearch, replica shards are exact copies of an index's primary shards, distributed across the data nodes in a cluster. The presence of replica shards ensures data redundancy and increases read capacity. &lt;/p&gt;

&lt;p&gt;When indexing, each document is written to its primary shard and then replicated to every replica, so all copies of the shard are involved. Search queries, by contrast, can be served by either a primary or a replica shard, which is why a different number of shards participates in each operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using at least one replica is strongly recommended, as it enhances redundancy and improves read capacity&lt;/strong&gt;. Additional replicas increase these benefits, offering even greater data redundancy and improving the ability to handle read operations efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Storage Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Estimating Storage Requirements
&lt;/h3&gt;

&lt;p&gt;To account for replicas and various storage considerations in OpenSearch, we must allocate approximately 6 TB of total storage (2 TB for the primary shards and 2 TB for each of the two replicas). &lt;/p&gt;

&lt;p&gt;However, it's important to consider additional factors that affect storage requirements. OpenSearch Service reserves 20% of storage space for segment merges, logs, and other internal operations, with a maximum of 20 GiB per instance. This means that the total reserved space can vary depending on the number of instances in your domain.&lt;/p&gt;

&lt;p&gt;To calculate the minimum storage requirement for our example with 2 replicas and 2TB of source data, we can use the simplified formula provided:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source data * (1 + number of replicas) * 1.45 = minimum storage requirement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Substituting the values:&lt;/p&gt;

&lt;p&gt;2TB * (1 + 2) * 1.45 = 8.7TB&lt;/p&gt;

&lt;p&gt;Therefore, the minimum storage requirement for replicas and other factors would be approximately 8.7TB.&lt;/p&gt;
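
&lt;p&gt;The storage formula translates directly into code; a minimal sketch using our 2TB, two-replica example:&lt;/p&gt;

```python
def minimum_storage_tb(source_data_tb, replicas, overhead_factor=1.45):
    """Source data * (1 + replicas) * 1.45, where the 1.45 factor folds in
    indexing overhead and the space OpenSearch Service reserves internally."""
    return source_data_tb * (1 + replicas) * overhead_factor

print(round(minimum_storage_tb(2, 2), 1))  # 8.7 (TB)
```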

&lt;h3&gt;
  
  
  3.2 Handling Full Reindex
&lt;/h3&gt;

&lt;p&gt;Handling a full reindex is an important aspect of managing data in OpenSearch. In scenarios where we need to reindex our data, it becomes necessary to maintain two indices concurrently: one containing the current data and another for the new data. &lt;/p&gt;

&lt;p&gt;This approach allows us to perform the reindexing process seamlessly without any downtime. However, it's essential to consider that this scenario requires allocating twice the storage estimated in the previous chapter. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To simultaneously accommodate both indices, we need to ensure sufficient storage capacity to store the existing and newly reindexed data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Considering the increased storage requirement, we must allocate a minimum of 2 * 8.7TB = 17.4TB.&lt;/p&gt;

&lt;p&gt;Additionally, it's worth noting that OpenSearch Service allows for dynamic scaling of storage during a full reindex. This means we can add storage capacity specifically for the reindexing process and scale it back down once the reindex is complete. This dynamic allocation prevents us from continuously paying for unused storage and optimizes cost efficiency. &lt;/p&gt;
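
&lt;p&gt;The copy itself is driven by the standard Reindex API. Below is a minimal sketch of the request body; the index names are hypothetical, and with an HTTP client the payload would be POSTed to the domain's /_reindex endpoint:&lt;/p&gt;

```python
import json

# Hypothetical index names for illustration: copy everything from the
# current index into the newly created one.
reindex_body = {
    "source": {"index": "documents-v1"},
    "dest": {"index": "documents-v2"},
}

print(json.dumps(reindex_body, indent=2))
```

Once the copy completes, traffic can be switched to the new index and the old one deleted, after which the extra storage can be scaled back down.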

&lt;h3&gt;
  
  
  3.3 Instance Type Selection
&lt;/h3&gt;

&lt;p&gt;When selecting hardware for our OpenSearch cluster, it's crucial to consider storage requirements, shard count, and workload characteristics. The number of shards per data node should align with the node's JVM heap memory, &lt;strong&gt;typically aiming for 25 shards or fewer per GiB of heap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To ensure efficient processing, &lt;strong&gt;it's recommended to have an initial scale point of 1.5 vCPUs per shard&lt;/strong&gt;. For instance, with 72 primary shards, we would need approximately 108 vCPUs.&lt;/p&gt;

&lt;p&gt;To accommodate this, scaling our data nodes to 5 would be suitable, resulting in approximately 43.2 shards per node (216 shards in total, including the two replicas). In this case, selecting a robust instance type like the m6g.12xlarge.search, with 48 vCPUs and 192 GiB of RAM, would be advisable.&lt;/p&gt;

&lt;p&gt;However, additional instances may be required if performance falls short of expectations, tests fail, or the CPUUtilization or JVMMemoryPressure metrics run high. As instances are added, OpenSearch automatically redistributes shards across the cluster, helping to balance the workload and optimize performance.&lt;/p&gt;
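
&lt;p&gt;The vCPU arithmetic above can be checked with a short back-of-the-envelope calculation (72 primary shards and two replicas, as in our example):&lt;/p&gt;

```python
VCPUS_PER_SHARD = 1.5   # initial scale point recommended above

def vcpus_needed(shard_count):
    """Approximate vCPUs required at 1.5 vCPUs per shard."""
    return shard_count * VCPUS_PER_SHARD

primary_shards = 72
replicas = 2
total_shards = primary_shards * (1 + replicas)  # 216 shards overall

print(vcpus_needed(primary_shards))  # 108.0 vCPUs for the primaries
print(total_shards / 5)              # 43.2 shards per node on 5 data nodes
```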

&lt;h2&gt;
  
  
  4. Monitoring
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Monitor Resource Utilization
&lt;/h3&gt;

&lt;p&gt;It is essential to implement continuous monitoring of CPU, memory, and storage usage in our OpenSearch cluster to ensure optimal performance. By monitoring these metrics, we can identify any resource bottlenecks or imbalances and take appropriate actions to address them. &lt;/p&gt;

&lt;p&gt;Regularly reviewing resource utilization allows us to make informed decisions regarding resource allocation, ensuring that our cluster operates efficiently and effectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Configure CloudWatch Alarms
&lt;/h3&gt;

&lt;p&gt;Setting up CloudWatch alarms is a proactive approach to staying informed about the health and performance of our OpenSearch cluster. &lt;/p&gt;

&lt;p&gt;By defining thresholds for key metrics such as CPU utilization, storage usage, and search latency, we can receive timely alerts when any of these metrics breach the specified limits. These alarms enable us to quickly identify and address potential issues before they impact overall cluster performance. &lt;/p&gt;

&lt;p&gt;Regularly reviewing the cluster and instance metrics provides valuable insights into the behavior and patterns of our cluster, allowing us to fine-tune and optimize its configuration for optimal performance.&lt;/p&gt;
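
&lt;p&gt;As an illustration, the parameters for a CPU alarm might look like the dict below. The alarm name, threshold, and evaluation window are illustrative choices, not prescriptions; with boto3, the dict would be passed to a CloudWatch client's put_metric_alarm call:&lt;/p&gt;

```python
# Illustrative CloudWatch alarm parameters for sustained high CPU on an
# OpenSearch Service domain (domain metrics are published under AWS/ES).
alarm = {
    "AlarmName": "opensearch-cpu-high",  # hypothetical name
    "Namespace": "AWS/ES",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 300,               # 5-minute windows
    "EvaluationPeriods": 3,      # i.e. sustained for 15 minutes
    "Threshold": 80.0,           # percent
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
}

print(alarm["AlarmName"])
```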

&lt;h2&gt;
  
  
  5. Deployment Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Multi-AZ Deployment
&lt;/h3&gt;

&lt;p&gt;We should deploy our data nodes across multiple Availability Zones (AZs) to achieve high availability and fault tolerance. &lt;/p&gt;

&lt;p&gt;By evenly distributing our data nodes across AZs, we ensure redundancy and resilience in our OpenSearch cluster. &lt;/p&gt;

&lt;p&gt;This approach safeguards against AZ failures, reducing the risk of downtime and ensuring the continuous operation of our cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Subnet Distribution
&lt;/h3&gt;

&lt;p&gt;Dividing our data nodes into multiple subnets distributed across different AZs is advisable to ensure high availability and fault tolerance. &lt;/p&gt;

&lt;p&gt;Distributing our data nodes across subnets enhances the cluster's resilience to network-related issues within a specific AZ. This practice improves fault isolation capabilities and minimizes the impact of potential subnet-level disruptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Enable Log Publishing
&lt;/h2&gt;

&lt;p&gt;To effectively troubleshoot performance, stability, and user activity in our OpenSearch cluster, enabling log publishing and utilizing relevant logs for analysis is crucial. We can direct OpenSearch error logs, search slow logs, indexing slow logs, and audit logs to CloudWatch Logs by enabling log publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Restrict Wildcard Usage
&lt;/h2&gt;

&lt;p&gt;We must implement access controls and permissions in our OpenSearch cluster to ensure data security and prevent accidental data loss. &lt;/p&gt;

&lt;p&gt;Specifically, we must enforce restrictions on destructive wildcard operations by requiring explicit action names.&lt;/p&gt;

&lt;p&gt;This measure mitigates the risk of unintentional data loss by ensuring that users must explicitly specify the action name when performing destructive operations.&lt;/p&gt;

&lt;p&gt;By making the following API call, we enable the setting that ensures a specific action name is required for destructive operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PUT /_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  8. Conclusion
&lt;/h2&gt;

&lt;p&gt;Careful planning and configuration are essential when setting up our OpenSearch cluster in AWS to handle 1 billion documents. By following these best practices, we can optimize our cluster's performance, scalability, and availability, ensuring efficient management of large data volumes. Implementing dedicated master nodes, scaling data nodes, configuring shards and replicas, estimating storage requirements, and monitoring resource utilization are key factors for a robust and reliable OpenSearch deployment.&lt;/p&gt;

&lt;p&gt;It is important to customize the calculations and configurations in this blog post to our specific requirements and workload characteristics. Regular monitoring, log analysis, and optimization will enable us to maintain a high-performing OpenSearch cluster capable of effectively handling extensive data volumes.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>awscommunitybuilders</category>
      <category>elasticsearch</category>
    </item>
  </channel>
</rss>
