DEV Community: Yasith wijesuriya

Efficient Vector Storage for AI: Why I Chose Pinecone with AWS

Yasith wijesuriya — Fri, 02 Jan 2026 04:52:53 +0000

When building Generative AI applications, one of the biggest challenges is managing massive amounts of vector embeddings. Recently, I integrated Pinecone with an AWS-based AI pipeline, and the results were impressive. In this article, I want to share my hands-on experience and why this combination is a "game-changer" for AI engineers.

Why Pinecone?
While AWS offers Amazon OpenSearch Service for vector search, I found Pinecone to be exceptionally developer-friendly for vector storage. It is a managed, cloud-native vector database designed specifically for high-performance AI applications.

Seamless Integration with AWS
What I love about Pinecone is how easily it "plugs" into the AWS ecosystem. Here are three ways I used it:

Serverless Scaling with AWS Lambda: I used AWS Lambda to trigger Pinecone API calls. Since both are serverless, you don’t have to manage any infrastructure. You simply scale your logic and your storage as needed.

Using Amazon Bedrock for Embeddings: I connected Amazon Bedrock (using models like Titan) to generate embeddings from raw data and then stored those vectors directly in Pinecone. This makes building RAG (Retrieval-Augmented Generation) applications much simpler.

Connectivity with LangChain & LlamaIndex: If you are using frameworks like LangChain, Pinecone serves as a robust vector store that can be initialised with just a few lines of code while running on AWS EC2 or ECS.
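
As a rough sketch of the Bedrock-to-Pinecone flow described above (not the exact code from my project): the helpers below take the clients as arguments, so they work with a real `boto3` Bedrock runtime client and a real Pinecone index, or with stand-ins during testing. The model ID, the `text` metadata field, the event shape, and the index name `docs` are assumptions for illustration.

```python
import json

# Assumed model ID; check which Titan embedding models are enabled in your account.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(bedrock_client, text, model_id=TITAN_MODEL_ID):
    """Ask a Bedrock Titan embedding model for the vector of `text`."""
    response = bedrock_client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def store_document(index, bedrock_client, doc_id, text):
    """Embed `text` and upsert it into Pinecone, keeping the raw text as metadata."""
    vector = embed_text(bedrock_client, text)
    index.upsert(vectors=[{"id": doc_id, "values": vector, "metadata": {"text": text}}])
    return len(vector)


def search(index, bedrock_client, question, top_k=3):
    """Embed the question and return the closest stored vectors."""
    query_vector = embed_text(bedrock_client, question)
    return index.query(vector=query_vector, top_k=top_k, include_metadata=True)


def make_handler(index, bedrock_client):
    """Build a Lambda-style handler; the {"id", "text"} event shape is hypothetical."""
    def handler(event, context):
        dims = store_document(index, bedrock_client, event["id"], event["text"])
        return {"statusCode": 200, "body": json.dumps({"id": event["id"], "dimensions": dims})}
    return handler


# In a real Lambda you would create the clients once at module scope, for example:
#   import os, boto3
#   from pinecone import Pinecone
#   bedrock = boto3.client("bedrock-runtime")
#   index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("docs")
#   lambda_handler = make_handler(index, bedrock)
```

Creating the clients outside the handler lets warm Lambda invocations reuse them instead of reconnecting on every call.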

My Key Takeaways

Speed: Retrieval latency was very low in my pipeline, which is crucial for real-time AI chat applications.
Simplicity: You don’t need to be a database expert to set up an index in Pinecone.
Cost-effectiveness: With the serverless tier, you pay only for what you use, which makes it a good fit for startups and individual builders.

Conclusion
If you are building AI on AWS, I highly recommend giving Pinecone a try as your vector database. It removes much of the operational overhead, so you can focus on building better AI applications.

Using ChatPromptTemplate with create_tool_calling_agent in LangChain (Instead of create_agent)

Yasith wijesuriya — Mon, 29 Dec 2025 15:58:05 +0000

If you’ve used LangChain before, you’ve probably started with create_agent().

It works.
It’s fast.
But once you move into real agent systems (tools, memory, multi-agent, LangGraph), it starts to feel… limiting.

This post explains why I stopped using create_agent() and switched to ChatPromptTemplate + create_tool_calling_agent instead.

What’s wrong with create_agent()?

Nothing is wrong with it — it’s just too abstract.

Some pain points I hit:

❌ Hard to control the actual prompt

❌ Message roles feel hidden

❌ Tool calls are difficult to debug

❌ Doesn’t scale well for multi-agent or graph-based flows

If you want full control over how your agent thinks, you need to build it more explicitly.

1. ChatPromptTemplate (No hidden prompts)

Instead of relying on built-in agent prompts, I define everything myself.

Why I like this:
- I control system rules
- I control where history goes
- I control where tool reasoning lives

Nothing is magic. Everything is visible.

2. MessagesPlaceholder (This part is important)

Two placeholders matter the most:
- chat_history: Keeps conversation context
- agent_scratchpad: Holds the agent's intermediate tool calls and tool results

3. create_tool_calling_agent (The real upgrade)

This agent:

- Understands tool schemas
- Decides when to call tools
- Uses tool results in its reasoning loop

Compared to create_agent(), this feels way more intentional. No guessing. No hidden behaviour.

4. AgentExecutor (Run the loop)

Set verbose=True.
Seriously. It saves hours when debugging tool behavior.

When this approach actually makes sense

Use this if you’re building:
- Research agents
- Tool-heavy workflows
- Multi-agent systems
- LangGraph pipelines

If you just want a quick demo, create_agent() is fine.
But for real systems, this pattern scales much better.

Final thoughts

> Agents don’t become powerful because of the model.
> They become powerful because of:

- clear prompts
- clear memory
- clear tool usage

ChatPromptTemplate + create_tool_calling_agent gives you all three.

(This post is based on my own experience and opinions while working with LangChain. There’s no single “right way” to build agents; this is just the approach that worked best for me.)

Optimized Image Hosting: Why I Integrated Cloudflare R2 with my AWS Backend

Yasith wijesuriya — Sun, 28 Dec 2025 05:53:32 +0000

As a developer, choosing the right storage solution is always a balance between performance, ease of use, and cost. For my latest project, I decided to take a Hybrid Cloud approach by combining the power of AWS with the cost-efficiency of Cloudflare R2.

In this article, I’ll explain why I made this choice and how it benefits my application's architecture.

The Challenge: AWS S3 vs. Cloudflare R2

When I started building my backend on AWS, Amazon S3 was my first thought for image storage. It is the industry standard for durability and security. However, for my specific use case, two factors led me to explore Cloudflare R2:

- Egress Fees: Amazon S3 charges for data transferred out to the internet. For an image-heavy site, these "egress fees" can grow quickly and become hard to predict as traffic increases.
- Simplicity: Cloudflare R2 charges no egress fees and has a very simple setup process.
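
To make the egress point concrete, here is a back-of-the-envelope sketch. The per-GB rate is a placeholder, not a quoted price, and the 500 KB average image size is made up; the calculation ignores caching, free tiers, and request fees. Plug in current numbers from the providers' pricing pages.

```python
def monthly_egress_cost(avg_image_kb, monthly_views, rate_per_gb):
    """Estimate the monthly data-transfer-out cost for serving images."""
    gb_out = avg_image_kb * monthly_views / 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return gb_out * rate_per_gb


HYPOTHETICAL_RATE = 0.09  # placeholder USD per GB, for illustration only

for views in (100_000, 1_000_000, 10_000_000):
    with_egress = monthly_egress_cost(500, views, HYPOTHETICAL_RATE)
    zero_egress = monthly_egress_cost(500, views, 0.0)
    print(f"{views:>10,} views: {with_egress:8.2f} USD vs {zero_egress:.2f} USD")
```

The cost scales linearly with traffic, which is why a zero-egress model is attractive for image-heavy sites.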

Why this Hybrid Approach Works

I still believe AWS is the king of compute and AI. By using AWS Lambda to handle my backend logic (and OpenAI API integrations), I get the best performance. But by pointing my image storage to Cloudflare R2, I save significantly on bandwidth costs.

Key Benefits I Noticed:

1. S3 Compatibility: Since Cloudflare R2 is S3-compatible, I could use the standard AWS SDKs to interact with it.
2. Global Speed: Leveraging Cloudflare’s global network ensures my images load fast for users everywhere without needing to configure complex CDN distributions.
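
A minimal sketch of what S3 compatibility means in practice, assuming `boto3` and R2 API credentials: the same `put_object` call you would use with S3, pointed at the R2 endpoint. The helper names, bucket name, and object key are hypothetical.

```python
def r2_endpoint(account_id):
    """R2's S3-compatible endpoint for a Cloudflare account."""
    return f"https://{account_id}.r2.cloudflarestorage.com"


def make_r2_client(account_id, access_key_id, secret_access_key):
    """Create a standard boto3 S3 client that talks to R2 instead of AWS."""
    import boto3  # imported lazily so the other helpers work without boto3 installed

    return boto3.client(
        "s3",
        endpoint_url=r2_endpoint(account_id),
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name="auto",  # R2 uses "auto" rather than an AWS region
    )


def upload_image(client, bucket, key, data, content_type="image/jpeg"):
    """Upload image bytes with a content type so browsers render the file correctly."""
    client.put_object(Bucket=bucket, Key=key, Body=data, ContentType=content_type)
    return key


# Usage (credentials come from an R2 API token created in the Cloudflare dashboard):
#   client = make_r2_client(ACCOUNT_ID, ACCESS_KEY_ID, SECRET_ACCESS_KEY)
#   with open("cover.jpg", "rb") as f:
#       upload_image(client, "mebius", "covers/cover.jpg", f.read())
```

Because only the endpoint and credentials change, switching a bucket between S3 and R2 later mostly means changing how the client is built.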

A Look at My Setup

Here is a glimpse of my Cloudflare R2 dashboard where I manage the assets for my project, "Mebius":

Figure: Managing image assets within the Cloudflare R2 'mebius' bucket.

When Would I Stick to Pure AWS S3?

Even though I used R2 here, AWS S3 remains my go-to choice for:

- Deep Integration: If I were using Amazon Rekognition for image analysis or Amazon SageMaker for ML, keeping data in S3 is much more efficient.
- Strict Security: For enterprise apps requiring complex IAM policies, AWS S3's security granularity is unmatched.

Conclusion

Cloud architecture is not about choosing one provider and sticking to it; it's about choosing the right tool for the right job. By combining AWS's computational strength with Cloudflare's affordable storage, I built a solution that is both powerful and cost-effective.

I am excited to keep exploring more AWS services as I grow this project!