How integrating code execution with the Model Context Protocol is revolutionizing AI agent efficiency and scalability
Introduction
As AI agents become increasingly sophisticated, they're connecting to more tools, APIs, and data sources than ever before. The Model Context Protocol (MCP) has emerged as a powerful open standard for connecting AI agents to external systems. However, traditional MCP implementations face significant challenges around context window management and token efficiency.
In this article, we'll explore how integrating code execution with MCP addresses these challenges, enabling more efficient, scalable, and capable AI agents. We'll dive into real-world examples, implementation patterns, and best practices.
Understanding the Model Context Protocol (MCP)
The Model Context Protocol (MCP) is an open standard that enables AI agents to seamlessly connect to external systems, tools, and data sources. Think of it as a universal adapter that allows your AI agent to interact with:
- Cloud Services: Google Drive, Salesforce, AWS, Azure
- Databases: PostgreSQL, MongoDB, Redis
- APIs: REST APIs, GraphQL endpoints
- File Systems: Local and remote file storage
- Development Tools: GitHub, Jira, Slack
MCP provides a standardized way for agents to discover, invoke, and interact with these resources, making it easier to build powerful, multi-tool AI applications.
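To make discovery and invocation concrete, here is a minimal client sketch. It follows my reading of the official MCP Python SDK, so treat the exact import paths, method names, the server command, and the search_files tool name as assumptions to verify against the SDK version you use:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: an MCP server we can launch as a local subprocess
server_params = StdioServerParameters(command="python", args=["my_mcp_server.py"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server exposes
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Invoke one of the discovered tools (hypothetical tool name and arguments)
            result = await session.call_tool("search_files", arguments={"query": "Q4 report"})
            print(result)

asyncio.run(main())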
The Problem: Challenges with Traditional MCP
Challenge 1: Context Window Overload
The Issue:
When an AI agent connects to multiple tools via MCP, all tool definitions must be loaded into the model's context window. As the number of tools grows, this consumes increasingly large amounts of context space.
Real-World Impact:
- An agent connecting to 50 tools might need 20,000+ tokens just for tool definitions
- This leaves less room for actual conversation and task execution
- Response times increase as the model processes more tokens
- Costs escalate with every additional tool
Example Scenario:
Agent Context Window (128K tokens):
├── Tool Definitions: 25,000 tokens (50 tools × 500 tokens avg)
├── Conversation History: 15,000 tokens
├── System Instructions: 5,000 tokens
└── Available for Work: 83,000 tokens (65% of capacity)
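The arithmetic behind that breakdown is simple; all figures are the assumed averages from the diagram above:
# Rough context budget for the scenario above
context_window = 128_000
tool_definitions = 50 * 500          # 50 tools x ~500 tokens each = 25,000
conversation_history = 15_000
system_instructions = 5_000
available = context_window - tool_definitions - conversation_history - system_instructions
print(available, round(available / context_window * 100))   # 83000 tokens, ~65% of capacity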
Challenge 2: Excessive Token Consumption
The Issue:
Every intermediate result from tool calls must pass through the model's context window. For large datasets or multi-step operations, this creates a token consumption spiral.
Real-World Impact:
- Downloading a 10,000-row spreadsheet consumes tokens for the entire dataset
- Processing results requires the data to be in context multiple times
- Complex workflows can easily exceed context limits
- Costs multiply with each intermediate result
Example Scenario:
Task: Process sales data from Google Drive and update Salesforce
Without Code Execution:
1. Download spreadsheet: 50,000 tokens (entire file in context)
2. Model processes data: 50,000 tokens (still in context)
3. Update Salesforce: 5,000 tokens (results in context)
Total: 105,000 tokens consumed
The Solution: Code Execution with MCP
Code execution with MCP allows agents to run code in a secure, sandboxed environment. Instead of passing all data through the model's context, agents can:
- Execute code to process data locally
- Filter and transform results before returning
- Maintain state across operations
- Build reusable functions and workflows
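The examples throughout this article call an execute_code helper to do this. That helper is not part of MCP itself; here is a minimal in-process sketch, assuming the code string is treated as the body of a function (which is why later examples can use return statements). A real deployment would run the code in a sandbox instead, as covered under Security Considerations below.
import textwrap

def execute_code(code: str):
    '''Hypothetical helper: run a code string and return its result.

    The string is wrapped in a function body so `return` works, then
    executed in a fresh namespace. Running untrusted code in-process
    like this is NOT safe; a production version would execute it in a
    container, VM, or other sandbox.
    '''
    wrapped = "def __agent_task__():\n" + textwrap.indent(code, "    ")
    namespace = {}
    exec(wrapped, namespace)              # define __agent_task__
    return namespace["__agent_task__"]()  # run it and return its result

# Example: execute_code("return 2 + 2") evaluates to 4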
Key Benefits with Real-World Examples
1. Progressive Disclosure
What It Means:
Agents can load tool definitions on-demand rather than loading everything upfront. Tools are presented as code on a filesystem, allowing the model to read definitions only when needed.
Example:
# Traditional Approach (All tools loaded)
tools = [
"gdrive.read_file",
"gdrive.write_file",
"gdrive.list_files",
"salesforce.get_record",
"salesforce.update_record",
"salesforce.create_lead",
# ... 44 more tools
]
# Result: 25,000 tokens consumed upfront
# With Code Execution (Progressive Disclosure)
# Agent searches for what it needs:
search_tools("salesforce")
# Result: Only loads Salesforce tools (2,000 tokens)
# Later, when needed:
search_tools("gdrive")
# Result: Loads Google Drive tools (1,500 tokens)
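search_tools is not a standard MCP call; it stands in for whatever progressive-disclosure mechanism you build. One minimal sketch, assuming each server's tool definitions have already been mirrored to a ./tools/<server>/<tool>.json layout on the execution filesystem:
import json
from pathlib import Path

TOOLS_DIR = Path("./tools")  # assumed layout: ./tools/<server>/<tool>.json

def search_tools(query: str) -> list[dict]:
    '''Load only the tool definitions whose server or tool name matches the query.'''
    query = query.lower()
    matches = []
    for definition_file in TOOLS_DIR.glob("*/*.json"):
        server, tool_name = definition_file.parent.name, definition_file.stem
        if query in server.lower() or query in tool_name.lower():
            matches.append(json.loads(definition_file.read_text()))
    return matches

# Only the matching definitions are ever read into context:
salesforce_tools = search_tools("salesforce")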
Benefits:
- 80% reduction in initial context load
- Faster agent startup times
- Lower costs per interaction
- More context available for actual work
2. Efficient Data Handling
What It Means:
Agents can filter, transform, and process large datasets within the execution environment before returning results to the model.
Example: Sales Data Processing
Without Code Execution:
# Agent downloads entire spreadsheet
spreadsheet = gdrive.get_sheet("sales_data_2024.xlsx")
# Result: 10,000 rows × 20 columns = 200,000 data points
# Tokens consumed: ~50,000 tokens
# Model must process all data in context
# Agent asks: "Find all sales over $10,000 in Q4"
# Model processes 10,000 rows in context
# Additional tokens: ~30,000 tokens
# Total: 80,000 tokens
With Code Execution:
# Agent executes code to filter data
code = """
import pandas as pd
# Load spreadsheet
df = pd.read_excel('sales_data_2024.xlsx')
# Filter: Q4 sales over $10,000
q4_sales = df[
(df['Quarter'] == 'Q4') &
(df['Amount'] > 10000)
]
# Return only filtered results
return q4_sales.to_dict('records')
"""
result = execute_code(code)
# Result: Only 150 rows returned
# Tokens consumed: ~5,000 tokens
# Savings: 75,000 tokens (94% reduction)
Real-World Use Case:
An agent processing customer feedback from a CSV file with 100,000 rows can:
- Filter by sentiment (positive/negative)
- Extract only relevant columns
- Aggregate statistics
- Return a summary instead of raw data
Result: 95% token reduction, faster processing, lower costs.
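A sketch of what that could look like, following the same execute_code pattern as the examples above (the file name and column names are assumptions):
code = """
import pandas as pd

# Assumed columns: 'sentiment', 'rating', 'product' -- adjust to the real file
df = pd.read_csv('customer_feedback.csv')
negative = df[df['sentiment'] == 'negative']

# Only this compact summary enters the model's context, not 100,000 rows
return {
    'total_rows': len(df),
    'negative_count': len(negative),
    'negative_share': round(len(negative) / len(df), 3),
    'avg_rating_by_product': negative.groupby('product')['rating'].mean().round(2).to_dict()
}
"""
result = execute_code(code)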
3. Enhanced Control Flow
What It Means:
Agents can use familiar programming constructs (loops, conditionals, error handling) to execute complex workflows in a single step.
Example: Multi-Step Data Pipeline
Without Code Execution:
# Step 1: Get data from API
api_data = api_client.get_data()
# Tokens: 5,000
# Step 2: Process in model context
# Model decides: "I need to transform this"
transformed = transform_data(api_data)
# Tokens: 5,000 (data still in context)
# Step 3: Save to database
db.save(transformed)
# Tokens: 2,000
# Step 4: Update status
status.update("completed")
# Tokens: 1,000
# Total: 13,000 tokens, 4 round trips
With Code Execution:
code = """
import requests
import json
from database import save_to_db
from status import update_status
try:
# Step 1: Get data
response = requests.get('https://api.example.com/data')
data = response.json()
# Step 2: Transform
transformed = [
{
'id': item['id'],
'name': item['name'].upper(),
'value': item['value'] * 1.1
}
for item in data
]
# Step 3: Save to database
save_to_db(transformed)
# Step 4: Update status
update_status("completed")
return {"status": "success", "records": len(transformed)}
except Exception as e:
update_status("failed")
return {"status": "error", "message": str(e)}
"""
result = execute_code(code)
# Total: 2,000 tokens (only result returned)
# Savings: 11,000 tokens (85% reduction)
# Single round trip instead of 4
Benefits:
- 85% reduction in token usage
- 75% reduction in latency (fewer round trips)
- Better error handling
- More reliable workflows
4. Privacy Preservation
What It Means:
Intermediate results remain within the execution environment by default. Sensitive data only enters the model's context if explicitly returned.
Example: Processing Sensitive Customer Data
Without Code Execution:
# Agent needs to process customer PII
customers = database.get_customers()
# All customer data (names, emails, SSNs) enters context
# Tokens: 20,000
# Privacy Risk: HIGH - All PII in model context
# Model processes data
anonymized = anonymize(customers)
# Tokens: 20,000 (still in context)
With Code Execution:
code = """
from database import get_customers
from anonymize import hash_pii
# Get customers (stays in execution environment)
customers = get_customers()
# Anonymize within execution environment
anonymized = [
{
'id': hash_pii(c['id']),
'segment': c['segment'],
'value': c['lifetime_value']
}
for c in customers
]
# Only return anonymized data
return anonymized
"""
result = execute_code(code)
# Only anonymized data enters context
# Tokens: 3,000 (only summary data)
# Privacy Risk: LOW - No PII in model context
Use Cases:
- Healthcare data processing (HIPAA compliance)
- Financial data analysis (PCI compliance)
- Personal information handling (GDPR compliance)
5. State Persistence and Skill Development
What It Means:
Agents can maintain state across operations by writing to files, and can save reusable code implementations for future tasks.
Example: Building a Reusable Data Analysis Function
# Agent creates a reusable function
code = """
import pandas as pd
import json
def analyze_sales_data(file_path, filters):
'''
Reusable function for sales data analysis
'''
df = pd.read_excel(file_path)
# Apply filters
for key, value in filters.items():
df = df[df[key] == value]
# Calculate metrics
metrics = {
'total_sales': df['amount'].sum(),
'average_sale': df['amount'].mean(),
'count': len(df),
'top_product': df.groupby('product')['amount'].sum().idxmax()
}
return metrics
# Save function for future use
with open('utils/sales_analyzer.py', 'w') as f:
f.write(inspect.getsource(analyze_sales_data))
"""
execute_code(code)
# Later, agent can reuse the function
code = """
from utils.sales_analyzer import analyze_sales_data
result = analyze_sales_data(
'sales_2024.xlsx',
{'quarter': 'Q4', 'region': 'North'}
)
return result
"""
result = execute_code(code)
Benefits:
- Agents build a library of reusable functions
- State persists across sessions
- Agents become more capable over time
- Reduced token usage for repeated operations
Implementation Examples
Example 1: E-commerce Data Pipeline
Scenario: An agent needs to sync product data from Shopify to a local database, filter for high-value items, and generate a report.
code = """
import requests
import sqlite3
from datetime import datetime
# Step 1: Fetch products from Shopify API
shopify_url = "https://your-store.myshopify.com/admin/api/2024-01/products.json"
headers = {"X-Shopify-Access-Token": "your_token"}
response = requests.get(shopify_url, headers=headers)
products = response.json()['products']
# Step 2: Filter high-value products (price > $100)
high_value = [
p for p in products
if float(p.get('variants', [{}])[0].get('price', 0)) > 100
]
# Step 3: Save to local database
conn = sqlite3.connect('products.db')
cursor = conn.cursor()
for product in high_value:
cursor.execute('''
INSERT OR REPLACE INTO products
(id, title, price, updated_at)
VALUES (?, ?, ?, ?)
''', (
product['id'],
product['title'],
product['variants'][0]['price'],
datetime.now().isoformat()
))
conn.commit()
conn.close()
# Step 4: Generate summary
summary = {
'total_products': len(products),
'high_value_count': len(high_value),
'sync_timestamp': datetime.now().isoformat()
}
return summary
"""
result = execute_code(code)
# Returns: {"total_products": 5000, "high_value_count": 1200, ...}
# Only summary in context, not 5000 products!
Token Savings:
- Without code execution: ~150,000 tokens (all products in context)
- With code execution: ~500 tokens (only summary)
- 99.7% reduction
Example 2: Multi-Source Data Aggregation
Scenario: An agent needs to aggregate data from multiple sources (Google Sheets, Airtable, and a REST API) and create a unified report.
code = """
import pandas as pd
import requests
from gdrive import get_sheet
from airtable import get_records
# Step 1: Get data from Google Sheets
sheet_data = get_sheet('1ABC123', 'Sheet1')
df_sheets = pd.DataFrame(sheet_data)
# Step 2: Get data from Airtable
airtable_data = get_records('app123', 'table1')
df_airtable = pd.DataFrame(airtable_data)
# Step 3: Get data from REST API
api_response = requests.get('https://api.example.com/data')
api_data = api_response.json()
df_api = pd.DataFrame(api_data)
# Step 4: Merge and deduplicate
df_merged = pd.concat([df_sheets, df_airtable, df_api])
df_merged = df_merged.drop_duplicates(subset=['id'])
# Step 5: Calculate aggregated metrics
report = {
'total_records': len(df_merged),
'sources': {
'google_sheets': len(df_sheets),
'airtable': len(df_airtable),
'api': len(df_api)
},
'date_range': {
'earliest': df_merged['date'].min(),
'latest': df_merged['date'].max()
},
'summary_stats': {
'total_value': df_merged['value'].sum(),
'average_value': df_merged['value'].mean()
}
}
# Save merged data to file for future use
df_merged.to_csv('merged_data.csv', index=False)
return report
"""
result = execute_code(code)
# Complex multi-source aggregation in one step
# Only aggregated report in context, not raw data
Example 3: Intelligent File Processing
Scenario: An agent needs to process a directory of PDF files, extract text, summarize each, and create an index.
code = """
import os
from PyPDF2 import PdfReader
from pathlib import Path
def extract_and_summarize_pdf(file_path):
'''Extract text and create summary for a PDF'''
reader = PdfReader(file_path)
text = ""
for page in reader.pages:
text += page.extract_text()
# Simple summary: first 200 chars
summary = text[:200] + "..." if len(text) > 200 else text
return {
'file': os.path.basename(file_path),
'pages': len(reader.pages),
'char_count': len(text),
'summary': summary
}
# Process all PDFs in directory
pdf_dir = Path('./documents')
pdf_files = list(pdf_dir.glob('*.pdf'))
index = []
for pdf_file in pdf_files:
try:
info = extract_and_summarize_pdf(pdf_file)
index.append(info)
except Exception as e:
index.append({
'file': pdf_file.name,
'error': str(e)
})
# Save index to file
import json
with open('pdf_index.json', 'w') as f:
json.dump(index, f, indent=2)
# Return summary
return {
'total_files': len(pdf_files),
'processed': len([i for i in index if 'error' not in i]),
'total_pages': sum(i.get('pages', 0) for i in index),
'index_file': 'pdf_index.json'
}
"""
result = execute_code(code)
# Processed 50 PDFs, extracted text, created summaries
# Only summary statistics in context, not all PDF text
Security Considerations
Implementing code execution with MCP requires careful attention to security:
1. Sandboxing
# Example: Docker-based sandbox
sandbox_config = {
'image': 'python:3.11-slim',
'memory_limit': '512m',
'cpu_limit': '1.0',
'network': 'none', # No network access
'read_only': True, # Read-only filesystem
'timeout': 30 # 30 second timeout
}
2. Resource Limits
- Memory: Limit to prevent memory exhaustion
- CPU: Throttle CPU usage
- Time: Set execution timeouts
- Disk: Limit disk space usage
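If code runs in a local subprocess rather than a container, a rough equivalent of these limits is available from the standard library (POSIX only); a sketch:
import resource
import subprocess

def limit_resources():
    # Runs in the child process just before the interpreter starts (POSIX only)
    resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024,) * 2)    # 512 MB address space
    resource.setrlimit(resource.RLIMIT_CPU, (10, 10))                   # 10 seconds of CPU time
    resource.setrlimit(resource.RLIMIT_FSIZE, (5 * 1024 * 1024,) * 2)   # 5 MB max file size

completed = subprocess.run(
    ["python", "agent_script.py"],   # hypothetical script written by the agent
    preexec_fn=limit_resources,      # apply the limits in the child process
    timeout=30,                      # wall-clock timeout
    capture_output=True,
)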
3. Code Validation
# Example: Validate code before execution
class SecurityError(Exception):
    '''Raised when submitted code contains a blocked pattern.'''

def validate_code(code):
    # Naive blocklist -- illustrative only; string matching is easy to
    # bypass and is no substitute for proper sandboxing
    dangerous_patterns = [
        'import os',
        'subprocess',
        '__import__',
        'eval(',
        'exec('
    ]
    for pattern in dangerous_patterns:
        if pattern in code:
            raise SecurityError(f"Dangerous pattern detected: {pattern}")
    return True
4. Monitoring and Logging
- Log all code executions
- Monitor resource usage
- Alert on suspicious patterns
- Audit trail for compliance
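A minimal sketch of execution logging and the audit trail, wrapping the execute_code helper sketched earlier:
import hashlib
import logging
import time

logging.basicConfig(filename="code_executions.log", level=logging.INFO)
logger = logging.getLogger("agent.sandbox")

def execute_code_logged(code: str):
    '''Run code via execute_code (defined earlier) and keep an audit trail.'''
    code_hash = hashlib.sha256(code.encode()).hexdigest()[:12]
    start = time.monotonic()
    logger.info("execution started hash=%s bytes=%d", code_hash, len(code))
    try:
        result = execute_code(code)
        logger.info("execution succeeded hash=%s duration=%.2fs", code_hash, time.monotonic() - start)
        return result
    except Exception:
        logger.exception("execution failed hash=%s duration=%.2fs", code_hash, time.monotonic() - start)
        raise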
Best Practices
1. Start Small
Begin with simple operations and gradually add complexity as you gain confidence in your security setup.
2. Use Type Hints and Validation
code = """
def process_data(data: list[dict]) -> dict:
# Type hints help the model understand expected inputs/outputs
return {'count': len(data)}
"""
3. Error Handling
Always include error handling in executed code:
code = """
try:
result = risky_operation()
return {'status': 'success', 'data': result}
except Exception as e:
return {'status': 'error', 'message': str(e)}
"""
4. Document Functions
Well-documented functions help agents understand and reuse code:
code = """
def calculate_metrics(data):
'''
Calculate key metrics from sales data.
Args:
data: List of dictionaries with 'amount' and 'date' keys
Returns:
Dictionary with 'total', 'average', and 'count'
'''
# Implementation...
"""
When to Use Code Execution with MCP
Use code execution when:
- ✅ Processing large datasets (>1000 rows)
- ✅ Working with multiple data sources
- ✅ Complex data transformations
- ✅ Privacy-sensitive operations
- ✅ Repetitive operations that can be automated
- ✅ Multi-step workflows
Stick with traditional MCP when:
- ❌ Simple, single-step operations
- ❌ Small datasets (<100 rows)
- ❌ One-off queries
- ❌ Security requirements prohibit code execution
Conclusion
Integrating code execution with MCP represents a significant leap forward in AI agent efficiency and capability. By enabling:
- Progressive disclosure of tools
- Efficient data processing within execution environments
- Enhanced control flow with familiar programming constructs
- Privacy preservation through local processing
- State persistence and skill development
we can build AI agents that are not only more powerful but also more cost-effective, scalable, and privacy-conscious.
The key is to balance the benefits against the implementation complexity and security considerations. With proper sandboxing, resource limits, and monitoring, code execution with MCP can transform how we build AI applications.
As AI agents continue to evolve, this pattern will become increasingly important for building production-ready, scalable AI systems that can handle real-world complexity while maintaining efficiency and security.