Using an AI agent to work with an RDF triple store graph DB
The Problem That Started It All
Picture this: You're a developer who needs to query an Amazon Neptune graph database. You open the SPARQL documentation, see something like this:
SELECT ?subject ?predicate ?object
WHERE {
  ?subject ?predicate ?object .
  FILTER(REGEX(STR(?subject), "http://example.com/users/"))
}
LIMIT 100
And you think: "There has to be a better way."
That was me recently. I had a Neptune database full of valuable graph data, but I was spending more time wrestling with query syntax than extracting insights. So I did what any developer would do in 2025 — I built an AI agent to handle the complexity for me.
The result is Neptune Query Shell — an AI-powered interface that lets you query Neptune databases using natural language, with support for SPARQL, Gremlin, and OpenCypher.
The AI-Driven Solution Journey
Building this tool wasn't about following a master plan. It was about letting AI help solve each challenge as it emerged, iterating through problems that every graph database developer faces.
Iteration 1: Natural Language Query Interface
The Challenge: Graph query languages are complex and intimidating.
The AI Solution: Let the AI write the queries for me.
Instead of learning SPARQL syntax:
SELECT ?person ?age ?location
WHERE {
  ?person a :Person .
  ?person :age ?age .
  ?person :location ?location .
  FILTER (?age > 30 && ?location = "London")
}
Just describe what you want:
💬 Find all people over 30 in London
The AI agent generates the appropriate query, executes it against Neptune, and provides insights about the results.
Iteration 2: Schema Discovery Agent
The Challenge: Users don't know what's in their own databases.
The AI Solution: Let AI automatically explore and map the database structure.
Traditional approach:
{
"vertices": [
{"label": "???", "properties": {"???": "???"}}
]
}
AI-powered approach:
🔍 AI discovering database structure...
✅ Schema discovery completed!
📄 Generated schema/user_schema.json with your database structure:
- Found 3 entity types: Person, Company, Location
- Found 3 relationship types: WORKS_FOR, LIVES_IN, KNOWS
- Discovered 15 properties across all entities
- Extracted 4 RDF namespaces for SPARQL queries
The AI agent systematically explores the database using discovery queries, analyzes the structure, and generates a complete schema configuration file. No more manual database inspection.
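The post doesn't show the discovery queries themselves, but the core idea is simple: a handful of generic SPARQL introspection queries that enumerate types and predicates. Here's a minimal sketch, assuming a plain `execute` callable that wraps the Neptune client; the function name and dictionary layout are illustrative, not the project's actual API:

```python
# Sketch of SPARQL-based schema discovery. The query strings are standard
# SPARQL; everything else here is an assumed shape, not the project's code.

DISCOVERY_QUERIES = {
    # Distinct rdf:type values -> entity types (Person, Company, ...)
    "entity_types": "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 100",
    # Distinct predicates -> relationship and property types
    "predicates": "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 500",
}

def discover_schema(execute):
    """Run each discovery query and collect the results into one dict.

    `execute` is any callable taking a SPARQL string and returning a list
    of binding rows, e.g. a thin wrapper around the Neptune HTTP client.
    """
    return {name: execute(query) for name, query in DISCOVERY_QUERIES.items()}
```

The AI agent then analyzes these raw results to produce the human-readable summary above and the generated schema file.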
Iteration 3: The Context Window Solution
The Challenge: What happens when your query returns 10,000 records but your AI context window can only handle 1,000?
The Naive Approach (that breaks):
Neptune Query → 10,000 records → AI Context → TOKEN OVERFLOW → 💥
The AI-Driven Solution: Dual-path architecture with intelligent CSV export.
User Experience:
💬 Your request: Find all people in London
🤖 AI: Found 1,247 people in London (showing first 50):
[Rich table with sample results]
I notice most work in the tech industry. Would you like to explore by occupation?
💬 Export to CSV
🤖 AI: ✅ Exported all 1,247 records to london_people_20241025_223045.csv (1.2 MB)
The AI gets enough data to provide meaningful insights without crashing, while users get access to complete datasets through CSV export.
Iteration 4: Multi-Language Support
The Challenge: Neptune supports three different query languages with different syntax patterns.
The AI Solution: Template-based abstraction that lets one AI agent handle all languages.
class QueryLanguage(Enum):
    SPARQL = "sparql"
    GREMLIN = "gremlin"
    OPENCYPHER = "opencypher"
# Language-specific AI instructions
templates/query_languages/
├── sparql_instructions.j2
├── gremlin_instructions.j2
└── opencypher_instructions.j2
The same AI agent can now generate queries in SPARQL, Gremlin, or OpenCypher based on the database configuration, with specialized instructions for each language while sharing the same core architecture.
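With that layout, picking the right instruction template reduces to a lookup from the enum value to a file path. A minimal sketch (the `instructions_template` helper and `TEMPLATE_DIR` constant are assumptions; the enum is repeated here so the snippet stands alone):

```python
from enum import Enum
from pathlib import Path

class QueryLanguage(Enum):
    SPARQL = "sparql"
    GREMLIN = "gremlin"
    OPENCYPHER = "opencypher"

# Mirrors the templates/query_languages/ directory shown above.
TEMPLATE_DIR = Path("templates/query_languages")

def instructions_template(language: QueryLanguage) -> Path:
    """Return the Jinja2 instruction file for the configured language."""
    return TEMPLATE_DIR / f"{language.value}_instructions.j2"
```

Because the naming convention matches the enum values, adding a fourth query language would mean one new enum member and one new template file, with no changes to the agent itself.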
Why Strands Agent Framework?
The key to making this work was choosing the right AI framework. After evaluating several options, I chose the Strands Agent SDK for reliable tool calling:
from typing import Any, Dict, Optional

from strands import tool

class AIQueryGenerator(BaseNeptuneAgent):
    @tool
    async def execute_neptune_query(self, query: str) -> Dict[str, Any]:
        """AI can execute queries like calling a native function"""
        return await self.query_service.execute_query(query, for_ai_context=True)

    @tool
    async def export_to_csv(self, filename: Optional[str] = None) -> Dict[str, Any]:
        """AI can export results on demand"""
        return self.query_service.export_last_results(filename)
Key Benefits:
- Native `@tool` decorator for seamless AI-function integration
- Async/await support for Neptune's HTTP API
- Built-in streaming for real-time feedback
- Reliable Bedrock integration
The AI agent can execute queries and export data as naturally as calling any Python function. No complex prompt engineering required — just tools that work.
The Context Window Problem & CSV Export Strategy
Here's a problem every AI developer faces: large datasets break AI context windows.
The solution isn't to limit query results — it's to be smart about what the AI sees versus what the user gets:
async def execute_query(self, query: str, for_ai_context: bool = False) -> Dict[str, Any]:
    """Dual-path query execution"""
    # Always store complete results
    complete_results = await self.neptune_client.execute_query(query)
    self.last_complete_results = complete_results

    if for_ai_context:
        # Truncate for AI - prevent token overflow
        ai_results = complete_results[:50]  # Smart limit
        return {"results": ai_results, "truncated": len(complete_results) > 50}

    return complete_results
Why This Works:
- ✅ AI gets enough data to understand patterns (50 records)
- ✅ AI provides meaningful insights without crashing
- ✅ Users get complete datasets via CSV export
- ✅ No token limits, no failures, best of both worlds
The CSV export isn't just a nice-to-have feature — it's the solution that makes AI-powered large dataset analysis practical.
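The export side of the dual path isn't shown in the post; here's a minimal sketch of the idea using Python's standard `csv` module. The function name, `prefix` parameter, and return shape are assumptions, but the timestamped filename follows the pattern seen in the session above:

```python
import csv
from datetime import datetime
from pathlib import Path

def export_last_results(rows, filename=None, prefix="results"):
    """Write the complete (untruncated) result set to a CSV file.

    `rows` is a list of dicts, one per record. If no filename is given,
    a timestamped name like results_20241025_223045.csv is generated.
    This is a sketch, not the project's actual export implementation.
    """
    if not rows:
        raise ValueError("no results to export")
    if filename is None:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"{prefix}_{stamp}.csv"
    path = Path(filename)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return {"file": str(path), "records": len(rows)}
```

The key point is that this function reads from the stored `last_complete_results`, not from the 50-record slice the AI saw, so the export is always the full dataset.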
Neptune Query Shell vs AWS Neptune MCP
AWS provides their own Neptune MCP server for tool-based query execution. Here's how they serve different needs:
| Feature | AWS Neptune MCP | Neptune Query Shell | Best Use Case |
|---|---|---|---|
| Query Languages | ✅ Gremlin, OpenCypher | ✅ SPARQL, Gremlin, OpenCypher | When you need SPARQL support |
| Interface Style | Tool-based execution | Conversational AI | Different interaction models |
| Schema Access | ✅ get_graph_schema tool | ✅ AI auto-discovery + file generation | Both provide schema access |
| Result Processing | Raw JSON responses | Rich tables + AI insights | Need visualization |
| Data Export | ❌ Not included | ✅ Smart CSV export | Large datasets |
| Setup Complexity | Low (MCP tool config) | Low (terminal + schema config) | Both are easy to set up |
| Learning Curve | Low | Low | Both are beginner-friendly |
Key Differentiator: Neptune Query Shell supports SPARQL queries and provides a conversational AI interface, while AWS Neptune MCP focuses on simple tool-based execution for Gremlin/OpenCypher.
AWS Neptune MCP excels at:
- MCP workflow integration (call tools from AI assistants)
- Simple query execution in existing AI applications
- When you need basic Neptune access via MCP protocol
Neptune Query Shell excels at:
- SPARQL databases and RDF/semantic web applications
- Interactive exploration with AI guidance and insights
- Learning graph databases through conversation
- Large dataset analysis with export capabilities
Architecture Overview
Key Components:
- AI Query Generator - Natural language → Query translation with result insights
- Schema Discovery Agent - Automatic database exploration and schema generation
- Query Execution Service - Dual-path result handling with context window management
- Neptune Client - Multi-language query support with connection management
Why This Approach Still Matters
You might wonder: "Why not just use AWS's Neptune MCP server or write queries manually?" Here's why the conversational AI approach adds value:
Philosophy Differences:
- Manual Querying: "Learn the syntax, write the query"
- AWS Neptune MCP: "Here are tools to execute Gremlin/OpenCypher queries"
- Neptune Query Shell: "Let's have a conversation about your data"
Real-World Benefits:
- For Developers: Query databases without learning complex syntax
- For Data Scientists: Get insights and complete datasets for analysis
- For Learning: Understand graph databases by seeing AI-generated queries
- For Teams: Mixed skill levels can all access graph data effectively
Plus, being open source means full customization, community improvements, no vendor lock-in, and learning opportunities.
Key Features in Action
1. Intelligent Result Display
Raw Neptune output:
{"results": {"bindings": [{"person": {"type": "uri", "value": "http://..."}}]}}
Neptune Query Shell output:
┌─────────────────┬─────┬──────────┬──────────────┐
│ Name │ Age │ Location │ Company │
├─────────────────┼─────┼──────────┼──────────────┤
│ Alice Johnson │ 34 │ London │ TechCorp │
│ Bob Smith │ 28 │ London │ DataCorp │
└─────────────────┴─────┴──────────┴──────────────┘
🤖 AI Insights: Most people in London work in tech (87%).
Average age is 31. Would you like to explore by industry?
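Getting from the raw output to that table starts with flattening the W3C SPARQL JSON results format (`{"results": {"bindings": [...]}}`) into plain row dicts. A minimal sketch of that first step; the table rendering itself (done with a rich-text library in the shell) is omitted, and the function name is an assumption:

```python
def flatten_bindings(response):
    """Convert a raw SPARQL JSON response into a list of plain row dicts.

    Each binding maps a variable name to a cell like
    {"type": "uri", "value": "http://..."}; we keep only the value,
    which is what the table display and CSV export need.
    """
    bindings = response.get("results", {}).get("bindings", [])
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]
```

Once the rows are plain dicts, the same structure feeds the table renderer, the AI context slice, and the CSV exporter.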
2. Streaming AI Process
Watch the AI work through your request in real-time:
💬 Your request: Find the most connected users in our network
🤖 AI Thinking Process:
─────────────────────────
I need to find users with the most connections. Let me generate a query to count relationships per user...
🔍 Executing Neptune query...
Based on the results, I can see the connection patterns. Let me analyze this data...
─────────────────────────
🤖 Processing complete!
Real-World Impact
Since launching Neptune Query Shell, I've seen it solve real problems:
For Developers:
- "I can finally explore our graph database without spending hours on SPARQL docs"
- "The schema discovery saved me days of manual database inspection"
- "My team can now query the graph database without learning query languages"
For Data Scientists:
- "The CSV export lets me analyze large datasets in familiar tools"
- "AI insights help me discover patterns I wouldn't have thought to look for"
For Learning:
- "I'm actually learning SPARQL by seeing what the AI generates"
- "The conversational interface makes graph databases approachable"
Try It Yourself
Ready to experience AI-powered graph querying?
🚀 Get Started: github.com/karthiks3000/neptune-query-shell
⭐ Star the repo if you find it useful
💬 Join the discussion: Share your Neptune challenges and see how the community can help
🤝 Contribute: Check out the issues tab for ways to improve the project
The README contains complete setup instructions and examples to get you querying in minutes.
The Bigger Picture
Graph databases are incredibly powerful but historically hard to use. The future isn't about replacing traditional querying — it's about giving developers multiple ways to interact with their data:
- Manual queries for precise control
- MCP tools for application integration
- AI assistance for exploration and learning
- Natural language for business users
Tools like Neptune Query Shell make graph data accessible to teams with mixed technical skills, while still providing the power and flexibility that experienced developers need.
The iterative, AI-driven development approach I used here — letting AI help solve each challenge as it emerged — is becoming a powerful pattern for building developer tools. Sometimes the best solutions come from asking "How can AI help?" at each step of the journey.