This week at AWS re:Invent 2025, Amazon S3 Vectors reached general availability, bringing purpose-built vector storage directly into object storage. S3 Vectors now supports up to 2 billion vectors per index (40x the preview capacity), delivers query latencies around 100ms for frequent queries, and integrates with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Service. About a month earlier, Amazon Nova Multimodal Embeddings became available in Amazon Bedrock, providing a unified embedding model that handles text, images, audio, video, and documents through a single model.
The combination of these two services creates an interesting opportunity: store any content in Amazon S3, generate embeddings with Nova, index them in S3 Vectors, and you have semantic search across everything—without managing vector database infrastructure.
That idea became SemStash, a semantic storage system that lets you store any content and find it using natural language. Instead of remembering exact file names or maintaining folder hierarchies, you describe what you're looking for: "the presentation about Q3 revenue" or "photos from the beach trip."
In this post, I'll walk through how SemStash works, the architecture decisions behind it, and how you can use it both as a human tool and as persistent memory for AI agents.
The Core Idea
The fundamental concept is straightforward: when you upload a file, SemStash stores it in S3 and generates a vector embedding using Amazon Nova. This embedding captures what the content means, not just what it contains. When you search, SemStash converts your query into an embedding and finds content with similar meaning.
This architecture means you can search across different media types. Upload a photo of a sunset, then find it by searching for "evening sky with orange colors." Upload a meeting recording, then find it by asking for "discussion about the new product launch."
Understanding the Building Blocks
Before diving into the implementation, it helps to understand how each underlying technology works.
Amazon S3 Vectors
S3 Vectors introduces vector buckets—a new bucket type specifically designed for storing and querying vector embeddings. Unlike regular S3 buckets that store objects, vector buckets organize data into vector indexes where you can run similarity queries. Each vector bucket can hold up to 10,000 indexes, and with the GA release each index can store up to 2 billion vectors.
The key operations are putting vectors (with optional metadata for filtering), querying for similar vectors, and managing the index lifecycle. Writes are strongly consistent, meaning you can query immediately after inserting. S3 Vectors handles the optimization of your vector data automatically as it evolves, maintaining performance without manual tuning.
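As a rough illustration, here is what those two core operations look like as request payloads. The field names (`vectorBucketName`, `indexName`, `float32`, `topK`) follow the boto3 `s3vectors` client as I understand it and may differ in your SDK version; the bucket and index names are placeholders:

```python
# Request payloads for the two core S3 Vectors operations, shown as plain
# dicts. Bucket and index names below are hypothetical.

put_request = {
    "vectorBucketName": "my-stash-vectors",
    "indexName": "content",
    "vectors": [
        {
            "key": "/photos/sunset.jpg",               # matches the content object key
            "data": {"float32": [0.12, -0.03, 0.88]},  # embedding (truncated here)
            "metadata": {"tags": ["vacation"], "path": "/photos/"},
        }
    ],
}

query_request = {
    "vectorBucketName": "my-stash-vectors",
    "indexName": "content",
    "queryVector": {"float32": [0.10, -0.01, 0.90]},
    "topK": 5,
    "filter": {"path": "/photos/"},   # optional metadata filter
    "returnMetadata": True,
}

# With boto3 these would be passed as keyword arguments, e.g.:
#   s3v = boto3.client("s3vectors")
#   s3v.put_vectors(**put_request)
#   results = s3v.query_vectors(**query_request)
```

Because writes are strongly consistent, a query issued right after `put_vectors` returns will see the new vector.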
What makes S3 Vectors particularly interesting for this use case is the cost model. Traditional vector databases often require provisioned capacity, but S3 Vectors follows the S3 pattern: you pay for what you store and query, with no infrastructure to manage. For applications like SemStash where queries might be infrequent but storage needs to be durable and scalable, this works well.
Amazon Nova Multimodal Embeddings
Embedding models convert content into numerical vectors that capture semantic meaning. What makes Nova Multimodal Embeddings different from earlier models is that it handles multiple content types through a single model, mapping them all into the same semantic space.
This unified approach means that an embedding from a text description and an embedding from an image can be compared directly. You can search your photo library with a text query like "person smiling on beach" and find matching images, even though the query is text and the content is visual. The same applies to audio and video: search for "piano music" and find audio files containing piano, or search for "outdoor interview" and find video clips matching that description.
Nova supports four embedding dimensions: 256, 384, 1024, and 3072. Higher dimensions capture more semantic nuance but require more storage. The model uses Matryoshka Representation Learning, which means the first N dimensions of a larger embedding work as a valid smaller embedding, giving you flexibility to balance precision against storage costs.
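To see the Matryoshka property in practice, here is a small sketch (plain Python, with toy numbers standing in for real Nova output) that truncates an embedding to a smaller dimension and re-normalizes it so cosine similarity still behaves:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` values of a Matryoshka-style embedding and
    re-normalize to unit length, so cosine similarity remains meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    # Dot product; equals cosine similarity when both vectors are unit-length.
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dim "full" embedding standing in for a 3072-dim Nova vector
full = truncate_embedding([0.5, 0.1, -0.3, 0.8, 0.2, -0.1, 0.4, 0.05], 8)
small = truncate_embedding(full, 4)   # keep only the first 4 dimensions

print(len(small), round(cosine(small, small), 6))  # 4 1.0
```

The same truncation applied to both the stored vectors and the query vector is what lets a 1024-dimension index trade some precision for a third of the storage of a 3072-dimension one.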
For synchronous operations, Nova handles up to 8,192 tokens of text or 30 seconds of audio/video. Longer content can be processed asynchronously with automatic segmentation.
How SemStash Works
The architecture separates content storage from semantic indexing.
The core library handles all AWS interactions. Five interfaces—CLI, Python API, Model Context Protocol (MCP) server, Web UI, and REST API—share this same core, ensuring consistent behavior across all access methods.
Storage Design
Each stash consists of two S3 buckets: one standard bucket for your files and one vector bucket for embeddings. The content bucket stores original files with their metadata. The vector bucket uses S3 Vectors to store embeddings with matching keys, enabling fast similarity search.
This design keeps content and embeddings synchronized: when you delete a file, its embedding is also removed. The check command verifies consistency, and sync repairs any drift that might occur.
Content Type Handling
Different content types require different processing before embedding:
Text files (.txt, .md, JSON, HTML, CSV, XML) are embedded directly. The embedding captures the semantic meaning of the text content.
Images (JPEG, PNG, GIF, WebP) are embedded visually. Nova understands the visual content, so you can search for "red car" or "person smiling" and find matching images.
Audio files (MP3, WAV, FLAC, OGG) are processed for semantic content. You can search recordings by their spoken content or audio characteristics.
Video content (MP4, WebM, MOV, MKV) is embedded considering both visual and audio elements. Search for "presentation with charts" or "outdoor interview" and find matching clips.
Documents receive special handling. PDF files are rendered as images and embedded visually, preserving layout and graphics. Word documents, PowerPoint presentations, and Excel spreadsheets have their text extracted and embedded, making all their content searchable.
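A simple way to picture this routing is a dispatch table from file extension to modality. This is an illustrative sketch, not SemStash's actual internals:

```python
from pathlib import Path

# Hypothetical dispatch table mirroring the content-type handling
# described above; the real implementation may differ.
MODALITIES = {
    "text":     {".txt", ".md", ".json", ".html", ".csv", ".xml"},
    "image":    {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "audio":    {".mp3", ".wav", ".flac", ".ogg"},
    "video":    {".mp4", ".webm", ".mov", ".mkv"},
    "document": {".pdf", ".docx", ".pptx", ".xlsx"},
}

def modality_for(path: str) -> str:
    """Return the embedding modality for a file, based on its extension."""
    ext = Path(path).suffix.lower()
    for modality, extensions in MODALITIES.items():
        if ext in extensions:
            return modality
    raise ValueError(f"Unsupported content type: {ext}")

print(modality_for("vacation-photo.JPG"))  # image
print(modality_for("report.pdf"))          # document
```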
Using SemStash from the Command Line
The CLI provides the primary human interface. Install it with uv:
```shell
uv tool install git+https://github.com/danilop/semstash.git
```
Creating a stash sets up the S3 bucket and vector index:
```shell
semstash init my-stash
```
Uploading follows a path model similar to a filesystem. Every piece of content has a path starting with /, and the trailing slash determines whether you're specifying a folder or an exact path:
```shell
# Upload to root (file keeps its original name)
semstash my-stash upload vacation-photo.jpg /

# Upload to a folder (trailing slash = folder)
semstash my-stash upload meeting-notes.txt /notes/

# Upload with tags for organization
semstash my-stash upload *.jpg /photos/ --tag vacation --tag 2024
```
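The trailing-slash rule can be summarized in a few lines of Python. This is an illustrative re-implementation, not SemStash's own code:

```python
from pathlib import PurePosixPath

def resolve_target(filename: str, target: str) -> str:
    """Apply the trailing-slash rule: a target ending in '/' is a folder
    (the file keeps its name); otherwise the target is the exact path."""
    if target.endswith("/"):
        return str(PurePosixPath(target) / PurePosixPath(filename).name)
    return target

print(resolve_target("vacation-photo.jpg", "/"))         # /vacation-photo.jpg
print(resolve_target("meeting-notes.txt", "/notes/"))    # /notes/meeting-notes.txt
print(resolve_target("notes.txt", "/docs/summary.txt"))  # /docs/summary.txt
```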
Searching uses natural language:
```shell
semstash my-stash query "beach sunset"
semstash my-stash query "financial projections for next year"
semstash my-stash query "action items from last meeting"
```
Results are ranked by semantic similarity. You can filter by tags or path:
```shell
semstash my-stash query "sunset" --tag photos
semstash my-stash query "meeting notes" --path /notes/
```
The Python API
For programmatic access, the same functionality is available through a Python library:
```python
from semstash import SemStash

# Create and initialize storage
stash = SemStash("my-stash")
stash.init()

# Upload content to root
result = stash.upload("photo.jpg", target="/", tags=["vacation", "beach"])
print(f"Stored at: {result.path}")  # /photo.jpg

# Upload to a folder (preserves filename)
result = stash.upload("notes.txt", target="/docs/")
print(f"Stored at: {result.path}")  # /docs/notes.txt

# Query semantically
for item in stash.query("sunset on beach", top_k=5):
    print(f"{item.score:.2f} - {item.path}")
    print(f"  Download: {item.url}")

# Query with path filter
for item in stash.query("meeting notes", path="/docs/"):
    print(f"{item.path}: {item.score:.2f}")

# Get content metadata and URL
content = stash.get("/photo.jpg")
print(f"Type: {content.content_type}, Size: {content.file_size}")

# Browse a folder
for item in stash.browse("/docs/").items:
    print(f"{item.path}: {item.content_type}")

# Download content locally
stash.download("/photo.jpg", "./local-copy.jpg")

# Delete when done
stash.delete("/photo.jpg")
```
The API supports all the same operations as the CLI: init(), open(), upload(), query(), get(), download(), delete(), browse(), check(), sync(), and destroy().
Web Interface
For browser-based access, SemStash includes a web interface that provides a visual way to interact with your semantic storage:
```shell
semstash web
# Open http://localhost:8000/ui/
```
The web interface makes SemStash accessible without command-line experience:

- The dashboard at /ui/ shows storage statistics and provides quick actions.
- The upload page at /ui/upload supports drag-and-drop file uploads with target path specification.
- Browse pages at /ui/browse/{path} offer paginated content lists with folder navigation.
- The search page at /ui/search provides semantic search with relevance scores displayed alongside results.
- Content pages at /ui/content/{path} show previews, metadata, and download/delete options.
The browse interface lets you navigate your content by folder structure. Semantic search results display with relevance scores, making it clear how well each result matches your query.
Configure the server with environment variables:
```shell
export SEMSTASH_BUCKET=my-stash
export SEMSTASH_HOST=0.0.0.0   # Optional: bind address
export SEMSTASH_PORT=8000      # Optional: port number
semstash web
```
REST API
The same server that hosts the web interface exposes a REST API for programmatic HTTP access. Interactive documentation is available at /docs. Key endpoints:
- `POST /init`: Create new storage
- `POST /open`: Open existing storage
- `POST /upload`: Upload files (multipart form with target path)
- `GET /query?q=...`: Semantic search (supports a `path=` filter)
- `GET /content/{path}`: Get metadata and download URL
- `DELETE /content/{path}`: Remove content
- `GET /browse/{path}`: List stored content at a path
- `GET /stats`: Storage statistics
- `GET /check`: Consistency check
- `POST /sync`: Repair inconsistencies
- `DELETE /destroy`: Remove storage (irreversible)
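As a sketch of how a client might build these request URLs (assuming the default localhost address; performing the actual GET requires the server to be running, so the HTTP call itself is left out):

```python
from urllib.parse import urlencode, quote

BASE = "http://localhost:8000"   # default semstash web address

# Semantic search with a path filter (mirrors the CLI's --path option)
query_url = f"{BASE}/query?" + urlencode({"q": "beach sunset", "path": "/photos/"})
print(query_url)

# Content paths go straight into the URL path; quote() leaves slashes intact
content_url = f"{BASE}/content/{quote('photos/sunset.jpg')}"
print(content_url)

# With the server running, fetching query_url (e.g. via requests or curl)
# would return the ranked results as JSON.
```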
MCP Server for AI Agents
The Model Context Protocol (MCP) server gives AI assistants persistent semantic memory. Start it with:
```shell
semstash mcp
```
For MCP-compatible assistants, add to your configuration:
```json
{
  "mcpServers": {
    "semstash": {
      "command": "semstash",
      "args": ["mcp"],
      "env": {
        "SEMSTASH_BUCKET": "my-agent-memory"
      }
    }
  }
}
```
The MCP server exposes tools for uploading content, querying semantically, browsing stored items, and managing the stash. Agents can save information they discover and retrieve it in future conversations.
Using SemStash with Strands Agents
Strands Agents provides first-class support for MCP, making it straightforward to give agents access to SemStash as persistent memory. Here's how to connect a Strands agent to the SemStash MCP server:
```python
from mcp import stdio_client, StdioServerParameters
from strands import Agent
from strands.tools.mcp import MCPClient

# Create the MCP client for SemStash
semstash_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(
        command="semstash",
        args=["mcp"],
        env={"SEMSTASH_BUCKET": "agent-memory"}
    )
))

# Use the MCP client with an agent
with semstash_client:
    tools = semstash_client.list_tools_sync()
    agent = Agent(tools=tools)

    # The agent can now store and retrieve information semantically
    agent("Save this information: The quarterly report deadline is March 15th")

    # Later, the agent can recall it
    agent("When is the quarterly report due?")
```
The agent doesn't need to know the exact phrasing or keywords used when information was stored. The semantic search finds relevant content based on meaning, so asking about "quarterly report due date" will find content stored with "quarterly report deadline."
For agents that need to combine SemStash with other tools, you can include the MCP client alongside other tool providers:
```python
from strands import Agent
from strands.tools.mcp import MCPClient
from strands_tools import calculator, current_time
from mcp import stdio_client, StdioServerParameters

semstash_client = MCPClient(lambda: stdio_client(
    StdioServerParameters(
        command="semstash",
        args=["mcp"],
        env={"SEMSTASH_BUCKET": "agent-memory"}
    )
))

with semstash_client:
    mcp_tools = semstash_client.list_tools_sync()

    # Combine MCP tools with other tools
    agent = Agent(tools=[calculator, current_time] + mcp_tools)

    # Agent can use all tools together
    agent("What time is it, and do I have any meetings scheduled today?")
```
This pattern lets agents build knowledge over time. An agent working on a research project can save findings as it discovers them, then recall relevant information when answering questions or generating reports.
Configuration and Tuning
SemStash works with sensible defaults but supports customization through environment variables or a configuration file.
These environment variables configure the web server, MCP server, and Python API. The CLI takes the bucket name as a command argument instead.
```shell
SEMSTASH_BUCKET=my-stash     # Bucket name (for web/MCP/Python API)
SEMSTASH_REGION=us-east-1    # AWS region
SEMSTASH_DIMENSION=3072      # Embedding dimension (256, 384, 1024, 3072)
```
Or through a configuration file (semstash.toml or .semstash.toml):
```toml
[aws]
region = "us-east-1"

[embeddings]
dimension = 3072

[output]
format = "table"  # or "json"
```
The embedding dimension is the main tuning parameter. Higher dimensions (3072) capture more semantic nuance, while lower dimensions (256, 384, 1024) reduce storage costs with some accuracy trade-off. The dimension is set when you create a stash and cannot be changed afterward—when you open an existing stash, SemStash automatically uses its configured dimension.
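A quick back-of-the-envelope for the storage side: float32 embeddings take 4 bytes per dimension, so raw vector data scales linearly with the dimension you choose. Actual S3 Vectors billing also counts keys and metadata, so treat these numbers as lower-bound estimates:

```python
def embedding_bytes(dimension: int, num_vectors: int) -> int:
    """Raw float32 vector data: 4 bytes per dimension per vector."""
    return dimension * 4 * num_vectors

one_million = 1_000_000
for dim in (256, 384, 1024, 3072):
    mb = embedding_bytes(dim, one_million) / 1_000_000
    print(f"{dim:>4} dims: {mb:,.0f} MB per million vectors")
```

At a million vectors, dropping from 3072 to 1024 dimensions cuts raw vector storage from roughly 12 GB to about 4 GB.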
AWS Requirements
SemStash requires AWS credentials with permissions for S3 (creating and managing buckets, uploading and downloading objects), S3 Vectors (creating indexes, storing and querying vectors), and Bedrock (invoking the Nova embeddings model).
The default region is us-east-1. To use other regions, check AWS regional availability for Amazon Bedrock, Amazon S3, and S3 Vectors.
Maintenance
A few commands help keep your stash healthy:
```shell
# Verify content and embeddings are synchronized
semstash my-stash check

# Repair any inconsistencies
semstash my-stash sync

# See storage statistics
semstash my-stash stats

# Permanently remove a stash (irreversible)
semstash my-stash destroy --force
```
The check command reports orphaned embeddings (vectors without content) or missing embeddings (content without vectors). The sync command repairs these issues by removing orphans and regenerating missing embeddings.
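Conceptually, the check reduces to two set differences between the keys in the content bucket and the keys in the vector index. An illustrative sketch, not the actual implementation:

```python
def check_consistency(content_keys: set, vector_keys: set):
    """Return (orphaned, missing): embeddings without content,
    and content without embeddings."""
    orphaned = vector_keys - content_keys
    missing = content_keys - vector_keys
    return orphaned, missing

content = {"/photo.jpg", "/notes/meeting.txt"}
vectors = {"/photo.jpg", "/old-deleted-file.pdf"}

orphaned, missing = check_consistency(content, vectors)
print(sorted(orphaned))  # ['/old-deleted-file.pdf']
print(sorted(missing))   # ['/notes/meeting.txt']

# sync would then delete the orphans from the vector index
# and re-embed the missing content.
```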
What I Learned Building This
Working with S3 Vectors and Nova Multimodal Embeddings together highlighted a few things.
The unified semantic space that Nova provides enables searches that would be difficult to express otherwise. Searching for text and finding images based on meaning, or finding video clips from audio descriptions, opens up workflows that don't fit the traditional keyword-search model. I found myself uploading content without worrying about file organization, trusting that I could describe what I needed later.
S3 Vectors fits naturally into applications where you need durable, scalable vector storage without managing infrastructure. The serverless model—pay for what you store and query—aligns well with applications where usage patterns might be bursty or unpredictable. The GA release this week at re:Invent brought significant improvements: 2 billion vectors per index, ~100ms latencies for frequent queries, and integration with Bedrock Knowledge Bases.
Building for multiple interfaces (CLI, Python API, Web UI, REST API, MCP server) from a shared core turned out to be the right decision early on. Each interface serves a different use case, but they all exercise the same underlying logic, which made testing more straightforward and behavior consistent.
The most interesting use case that emerged was using SemStash as memory for AI agents through the MCP server. Agents can accumulate knowledge over time—saving information they discover and retrieving it in future conversations—without the application developer building custom storage infrastructure. This pattern of "semantic memory" for agents feels like it has broader applications beyond what I've implemented here.
The code is available on GitHub under the MIT license. I'd be curious to hear how others use it, particularly for the agent memory use case.




