Sauveer Ketan

Posted on • Originally published at Medium

Strands AI Functions: Write Python Functions in Natural Language

This post is written for architects and developers already familiar with Amazon Strands Agents SDK.

AWS's experimental new library lets you write AI-powered Python functions in natural language — and the LLM writes the implementation at runtime.

Most real-world agentic workflows still require a lot of traditional code. For example, imagine accepting an uploaded invoice file in an unknown format and converting it into a clean, normalized DataFrame for use in the rest of the workflow. With traditional code, you write format-detection logic, transformation pipelines, prompt templates, response parsers, and retry loops — dozens of lines before you've even gotten to the business logic. What if you could just describe what you want and let the model figure out the rest?

That's exactly what Strands AI Functions is designed to do. Released by AWS as part of the newly launched Strands Labs experimental organization, AI Functions is a Python library that gives developers a disciplined, intent-based approach to build reliable, AI-powered pipelines — without writing traditional prompt orchestration, parsing, and retry logic for the AI components.

"At a high level, AI Functions lets you describe intent, while the framework handles execution, correction, and validation."


What Is Strands Labs?

Before diving into AI Functions, a quick note on where it comes from. In early 2026, AWS launched Strands Labs — a separate GitHub organization designed as an innovation sandbox for experimental agentic AI projects. Think of it as the frontier research wing of the Strands Agents SDK.

Strands Labs launched with three projects: Robots (physical AI agents), Robots Sim (simulation environments), and AI Functions — the one that should immediately catch the attention of any developer or architect building AI-powered pipelines.

AI Functions is experimental. Expect breaking changes. It is not yet production-ready, but the concepts are production-relevant today — understanding them now puts you ahead of the curve.


The Core Idea: Functions Written in Natural Language

An AI Function looks like a normal Python function decorated with @ai_function. But instead of writing code in the function body, you write a docstring in natural language that describes what the function should do.

AI Functions are implemented on top of the Strands Agent runtime. Any valid strands.Agent option (such as model, tools, system_prompt) can be passed to the decorator.

When an AI Function is called, the library will automatically:

  • Create a Strands agent
  • Generate a prompt based on the docstring template and the provided arguments
  • Parse and validate the result
  • Return it as a typed Python object

From the outside, it behaves like any other Python function.

from ai_functions import ai_function
from pydantic import BaseModel

class MeetingSummary(BaseModel):
    attendees: list[str]
    summary: str
    action_items: list[str]

@ai_function
def summarize_meeting(transcripts: str) -> MeetingSummary:
    """
    Write a summary of the following meeting in less than 50 words.
    <transcripts>
    {transcripts}
    </transcripts>
    """

# Call it just like any normal Python function
result = summarize_meeting(transcript_text)
print(result.summary)

This function takes typed inputs, returns a typed Pydantic model, and the library handles everything in between — creating the agent, running the model, parsing the output, and returning a validated Python object. To the rest of your codebase, it's just a function call.
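To make that pipeline concrete, here is a stdlib-only sketch of the fill-prompt, run-model, parse, validate flow. This is not the library's internals: fake_model and the dataclass-based MeetingSummary are illustrative stand-ins (the real library uses a Strands agent and Pydantic).

```python
import json
from dataclasses import dataclass

@dataclass
class MeetingSummary:
    attendees: list
    summary: str
    action_items: list

def fake_model(prompt: str) -> str:
    # Stand-in for the LLM call: returns JSON matching the target schema
    return json.dumps({
        "attendees": ["Ana", "Bo"],
        "summary": "Ana and Bo agreed to ship the beta on Friday.",
        "action_items": ["Ship beta"],
    })

def call_ai_function(template: str, **kwargs) -> MeetingSummary:
    prompt = template.format(**kwargs)           # 1. fill the docstring template
    raw = fake_model(prompt)                     # 2. run the model
    result = MeetingSummary(**json.loads(raw))   # 3. parse into a typed object
    assert isinstance(result.summary, str)       # 4. minimal validation
    assert isinstance(result.attendees, list)
    return result

result = call_ai_function(
    "Summarize: <transcripts>{transcripts}</transcripts>",
    transcripts="Ana: ship Friday? Bo: yes.",
)
print(result.summary)  # Ana and Bo agreed to ship the beta on Friday.
```

The caller only ever sees step 4's typed object, which is what makes the function call feel ordinary from the outside.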


Post-Conditions

This is the feature that sets Strands AI Functions apart from every other "just call the LLM" approach. The core philosophy is that you should never rely on prompt engineering alone to guarantee output correctness. Instead, you define post-conditions — validation functions that run after the AI produces its output.

If a post-condition fails, the library automatically feeds the error back to the agent in a self-correcting loop, up to a configurable number of attempts. Your pipeline either gets a validated result or fails cleanly — no silent garbage output sneaking through.

from ai_functions import ai_function, PostConditionResult

# Standard Python validator
def check_length(response: MeetingSummary):
    length = len(response.summary.split())
    assert length <= 50, f"Summary is {length} words, must be ≤ 50"

# Or use another AI Function as a validator!
@ai_function
def check_style(response: MeetingSummary) -> PostConditionResult:
    """
    Check if the summary uses bullet points and provides sufficient context.
    <summary>{response.summary}</summary>
    """

@ai_function(post_conditions=[check_length, check_style], max_attempts=5)
def summarize_meeting(transcripts: str) -> MeetingSummary:
    """Write a concise meeting summary. <transcripts>{transcripts}</transcripts>"""

Notice that a post-condition can itself be an AI Function — enabling sophisticated validation of stylistic or semantic constraints that would be impossible to express in pure Python logic.
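The self-correcting loop itself is plain control flow. Here is a hypothetical sketch of the mechanic, with a toy generate function standing in for the model call; the real library wires the feedback into the agent conversation rather than a string argument.

```python
def check_length(text: str) -> None:
    n = len(text.split())
    assert n <= 5, f"Summary is {n} words, must be <= 5"

def run_with_post_conditions(generate, post_conditions, max_attempts=5):
    feedback = ""
    for _ in range(max_attempts):
        result = generate(feedback)        # the model sees the prior failure text
        try:
            for check in post_conditions:
                check(result)
            return result                  # every check passed: validated result
        except AssertionError as err:
            feedback = str(err)            # fed back on the next attempt
    raise RuntimeError(f"No valid result after {max_attempts} attempts")

# Toy generator: verbose at first, shortens once it sees feedback
def generate(feedback: str) -> str:
    if not feedback:
        return "this first draft is definitely far too long to pass"
    return "short valid draft"

print(run_with_post_conditions(generate, [check_length]))  # short valid draft
```

Note the two exits: a validated result or a clean exception, with no path that returns unchecked output.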


The Three Pillars of AI Functions

1. Natural Language Instructions

Describe what you want in plain language, either as a docstring or as a string returned from the function body.

2. Post-Conditions

Define explicit validation rules that the AI output must satisfy, triggering automatic self-correcting retries.

3. Python Integration

AI Functions aim to feel like a natural extension of the programming language itself, enabling new kinds of programming patterns and abstractions. They return real Python objects and integrate directly into existing codebases, rather than producing raw text.
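The "returned string" variant of pillar 1 matters when the instruction must be built dynamically. This toy decorator (not the library's actual implementation) shows both sources of the prompt, with an f-string stand-in for the model call:

```python
def ai_function(fn):
    """Toy decorator: take the prompt from the docstring, or, if the
    body returns a string, use that as a dynamically built prompt."""
    def wrapper(**kwargs):
        body_prompt = fn(**kwargs)  # body may construct the prompt itself
        prompt = body_prompt if body_prompt is not None else fn.__doc__.format(**kwargs)
        return f"LLM({prompt})"     # stand-in for the actual model call
    return wrapper

@ai_function
def from_docstring(topic: str) -> str:
    """Summarize {topic}."""

@ai_function
def from_body(topic: str, formal: bool) -> str:
    tone = "formally" if formal else "casually"
    return f"Summarize {topic} {tone}."

print(from_docstring(topic="rust"))          # LLM(Summarize rust.)
print(from_body(topic="rust", formal=True))  # LLM(Summarize rust formally.)
```

The body-returned form lets ordinary Python logic (conditionals, loops, lookups) shape the instruction before the model ever sees it.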


The Universal Data Loader: A Good Use Case

Consider one of the most compelling examples in the official documentation. You're building a webapp that accepts invoice uploads in any format — JSON, CSV, PDF, SQLite. Normally, you'd write format-detection logic and separate transformation pipelines for each format.

With AI Functions and code_execution_mode="local" enabled, the agent inspects the file at runtime, determines the format, writes the appropriate loading and transformation code, and returns a properly-typed Pandas DataFrame — complete with schema validation via a post-condition.

When using a Python executor (code_execution_mode="local"), all input variables to the AI function are automatically loaded into the Python environment. This means the agent can directly reference and manipulate these variables in the generated code without needing to parse them from the prompt.

from ai_functions import ai_function
from pandas import DataFrame

@ai_function(
    post_conditions=[check_invoice_dataframe],
    code_execution_mode="local",
    code_executor_additional_imports=["pandas", "sqlite3"],
)
def import_invoice(path: str) -> DataFrame:
    """
    The file `{path}` contains purchase logs. Extract them into a DataFrame
    with columns: product_name (str), quantity (int), price (float),
    purchase_date (datetime).
    """

# Works on JSON, CSV, SQLite - the agent figures it out
df = import_invoice('data/invoice.json')
df2 = import_invoice('data/invoice.sqlite3')

In practice, this works best when combined with strong post-conditions and constrained execution environments.
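The variable-injection behavior is worth internalizing: arguments become real names in the executor's namespace. A minimal sketch using exec(); the "generated" code here is a fixed string, and running genuinely model-generated code this way is exactly the risk the security note below addresses.

```python
import os
import tempfile

def run_generated_code(generated_code: str, **input_vars):
    # Input variables become real names in the execution namespace,
    # so generated code references them directly; no prompt parsing needed.
    namespace = dict(input_vars)
    exec(generated_code, namespace)
    return namespace.get("result")

# A fixed string standing in for model-generated loading code
code = """
with open(path) as f:          # `path` was injected, not parsed from text
    result = f.read().upper()
"""

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("invoice data")
    tmp = f.name

print(run_generated_code(code, path=tmp))  # INVOICE DATA
os.remove(tmp)
```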

⚠️ Security Note

Right now, Strands AI Functions support only "local" execution. Local code execution carries inherent risk. AWS recommends running this inside a Docker container or sandbox environment. Remote sandboxed execution is on the roadmap. Things to consider:

  • Run local execution in read-only filesystems
  • Use network restrictions
  • Strip secrets from runtime environment
  • Treat AI-generated code as untrusted by default
  • Add observability — because each agent step is explicit, failures and retries are inspectable, making it far easier to debug than ad-hoc prompt pipelines

Async and Parallel Workflows

AI Functions support async definitions natively, enabling parallel agentic workflows that dramatically reduce wall-clock time. In the stock report example from the official documentation, two research agents run concurrently using asyncio.gather() before their results are combined into a final report — a pattern that maps perfectly onto real-world multi-step analysis pipelines.

import asyncio

from ai_functions import ai_function
from pandas import DataFrame

@ai_function(tools=[...])
async def research_news(stock: str) -> str:
    """Research and summarize current news for: {stock}"""

@ai_function(tools=[...])
async def research_price(stock: str, past_days: int) -> DataFrame:
    """Retrieve 30-day historical prices for {stock} using yfinance."""

async def stock_workflow(stock: str):
    # Both agents run in parallel
    news, prices = await asyncio.gather(
        research_news(stock),
        research_price(stock, past_days=30)
    )
    write_report(stock, news, prices)
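Stripped of the AI calls, the concurrency win is plain asyncio. In this self-contained sketch, two 0.2-second sleeps stand in for agent calls, and gather finishes in roughly one sleep's worth of wall-clock time, not two:

```python
import asyncio
import time

async def research_news(stock: str) -> str:
    await asyncio.sleep(0.2)               # stand-in for an agent call
    return f"news for {stock}"

async def research_price(stock: str) -> str:
    await asyncio.sleep(0.2)               # stand-in for another agent call
    return f"prices for {stock}"

async def workflow(stock: str):
    start = time.perf_counter()
    news, prices = await asyncio.gather(   # both coroutines run concurrently
        research_news(stock),
        research_price(stock),
    )
    return news, prices, time.perf_counter() - start

news, prices, elapsed = asyncio.run(workflow("AMZN"))
print(news, prices, f"{elapsed:.2f}s")     # elapsed is ~0.2s, not 0.4s
```

With real agents, each sleep becomes a multi-second model round trip, so the savings scale with the slowest branch rather than the sum of all branches.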

AI Functions as Tools in Multi-Agent Systems

Here's where it gets architecturally interesting — AI Functions can be registered as tools within other agents — both other AI Functions and regular Strands Agents. This creates a composable, hierarchical agent architecture where each layer does exactly what it's best at.

@ai_function(description="Search the web and return a summary", tools=[...])
def websearch(query: str) -> str:
    """Research `{query}` online and return a summary of findings."""

@ai_function(tools=[websearch])  # websearch is now a tool for this agent
def report_writer(topic: str) -> str:
    """Research the following topic and write a report: {topic}"""

# Also works with regular Strands Agents:
# agent = Agent(model=..., tools=[websearch])
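Without the AI layer, the composition is simply functions registered as callable tools of other functions. A hypothetical sketch of that shape (a real agent lets the model decide which tool to invoke; this one just calls them all):

```python
def make_agent(name: str, tools: list):
    # Register tools by function name, mirroring tools=[websearch]
    toolbox = {t.__name__: t for t in tools}

    def run(task: str) -> str:
        # A real agent lets the model pick tools; here we call every one
        findings = [tool(task) for tool in toolbox.values()]
        return f"{name} report on {task!r}: " + "; ".join(findings)

    return run

def websearch(query: str) -> str:
    # Stand-in for the web-searching AI Function
    return f"web results for {query}"

report_writer = make_agent("writer", tools=[websearch])
print(report_writer("graph databases"))
```

Because each layer exposes the same plain-function interface, agents can be nested arbitrarily deep without any layer knowing whether its tools are deterministic code or other agents.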

Why This Matters for AWS Architects

If you're working with Strands, this library slots in naturally. The default model is Claude on Bedrock, and you can swap in any Strands-supported model.

For architects and platform engineers, the important takeaway is not just the library itself, but the pattern it represents:

  • Separation of intent from implementation — declare what you need, not how to do it
  • Deterministic guardrails around non-deterministic AI using post-conditions
  • Composable, reusable AI components — functions as first-class building blocks for agent graphs
  • Native Python support — return real objects, not raw strings, maintaining type safety across your pipeline

As the lines between traditional software engineering and AI engineering continue to blur, frameworks like Strands AI Functions point toward a near future where the "implementation" of a function and the "intent" of a function can finally be decoupled — the developer specifies the intent, the model fulfills it, and post-conditions enforce it.


When NOT to Use AI Functions

These functions are inherently non-deterministic, which makes them a poor fit for certain scenarios:

  • Low-latency paths
  • Hard real-time requirements
  • Strong deterministic compliance constraints (e.g., finance)

Additionally, AI Functions can incur higher costs than deterministic pipelines due to repeated LLM invocations (retries, validation passes, and tool calls). Cost controls and limits are therefore essential.


Getting Started

If you want to experiment hands-on, getting started is refreshingly simple. The official documentation includes a clean QuickStart with the meeting summarizer example, and the Strands Labs GitHub has complete examples for stock report generation, multi-agent orchestration, and context management for long-running tasks. The AWS Blog release post also provides more details.
