<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: elbanic</title>
    <description>The latest articles on DEV Community by elbanic (@elbanic).</description>
    <link>https://dev.to/elbanic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3549440%2Ff8620869-9aaa-44c5-9f02-64529998ef60.jpg</url>
      <title>DEV Community: elbanic</title>
      <link>https://dev.to/elbanic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/elbanic"/>
    <language>en</language>
    <item>
      <title>We fail every day, but that failure becomes the success of our next attempt (feat. Claude Code)</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Wed, 18 Feb 2026 21:50:32 +0000</pubDate>
      <link>https://dev.to/elbanic/we-fail-every-day-but-that-failure-becomes-the-success-of-our-next-attempt-feat-claude-code-16cg</link>
      <guid>https://dev.to/elbanic/we-fail-every-day-but-that-failure-becomes-the-success-of-our-next-attempt-feat-claude-code-16cg</guid>
      <description>&lt;h2&gt;
  
  
  Dev Sentinel: Recording Failure So We Learn From It
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"We fail every day, but that failure becomes the success of our next attempt."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Failure Repeats
&lt;/h3&gt;

&lt;p&gt;Not long ago, a production incident occurred in a service I operate.&lt;/p&gt;

&lt;p&gt;An SQS-triggered Lambda handler processed a single event that spawned an excessive number of DynamoDB transactions in parallel. Eventually, the Lambda hit its timeout limit.&lt;/p&gt;

&lt;p&gt;It was a reminder that Promise-based parallelism is not always optimal. IO-bound workloads sometimes require concurrency control or even sequential execution. The code appeared logically sound, yet it failed under real-world resource constraints.&lt;/p&gt;
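&lt;p&gt;As a minimal sketch of the fix pattern (bounding concurrency instead of launching every transaction at once), the function below is illustrative, not the actual incident code:&lt;/p&gt;

```javascript
// Minimal sketch: run async tasks with a fixed concurrency cap instead of
// firing them all at once with Promise.all. Names here are illustrative.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (true) {
      const i = next;
      next += 1;
      if (i >= items.length) return;
      results[i] = await worker(items[i], i);
    }
  }
  // Spawn at most `limit` runners; each pulls the next item when it frees up.
  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}
```

&lt;p&gt;With &lt;code&gt;limit&lt;/code&gt; set to, say, 5, a burst of DynamoDB writes becomes a bounded stream rather than an unbounded fan-out.&lt;/p&gt;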

&lt;p&gt;I had a similar experience in a toy project.&lt;/p&gt;

&lt;p&gt;A memory-based persistence layer worked perfectly in a local environment, but after deploying to a cloud-based Docker setup, consistency tests failed. The root cause was a stateful design in an environment where container instances could change per request.&lt;/p&gt;

&lt;p&gt;Both incidents taught more than simple bug fixes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Code may be logically correct, but if it ignores the constraints of its execution environment, the system will eventually break.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lessons like these pay off directly in future projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Every Failure Lead to Success?
&lt;/h3&gt;

&lt;p&gt;We fail every day. We encounter bugs, make incorrect assumptions, and struggle with performance issues.&lt;/p&gt;

&lt;p&gt;Becoming a senior engineer often means accumulating these experiences until they turn into intuition. Operational incidents, scaling failures, and flawed architectural decisions gradually build a sense that "something feels risky."&lt;/p&gt;

&lt;p&gt;However, in reality, not every failure translates into future success. Many lessons are never recorded, never revisited, and similar mistakes reappear in different contexts.&lt;/p&gt;

&lt;p&gt;The problem is not failure itself. The problem is that failure disappears.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dev Sentinel: Detecting and Preserving Failure
&lt;/h3&gt;

&lt;p&gt;Dev Sentinel started from this observation. The core idea is simple: Detect failure cycles, record them, and surface them when a similar situation arises again.&lt;/p&gt;

&lt;p&gt;I define failure as a cycle: Attempt → Frustration → (Abandonment | Resolution)&lt;/p&gt;
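&lt;p&gt;The cycle is small enough to write down as a transition table. This sketch is purely illustrative, not Dev Sentinel's internal model:&lt;/p&gt;

```javascript
// The failure cycle as a tiny state machine. The state names mirror the
// cycle above; the advance() API is hypothetical.
const TRANSITIONS = {
  attempt: ["frustration"],
  frustration: ["abandonment", "resolution"],
  abandonment: [],
  resolution: [],
};

function advance(state, next) {
  if (TRANSITIONS[state].includes(next)) return next;
  throw new Error("invalid transition: " + state + " to " + next);
}
```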

&lt;p&gt;Within this cycle, I designed two concepts.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Experience Capture
&lt;/h4&gt;

&lt;p&gt;Detect frustration, then track whether the attempt ended in abandonment or resolution, and store it as experience. This is not just logging. It structures the context: what the problem was, where the difficulty emerged, and how it concluded.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc25vrpc3qfim6au7ofmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc25vrpc3qfim6au7ofmy.png" alt="Experience Capture process showing how Dev Sentinel detects and records failure cycles" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Active Recall
&lt;/h4&gt;

&lt;p&gt;When similar frustration is detected again, surface relevant past experiences. The goal is not to solve the problem automatically, but to remind the developer: "You have seen something like this before."&lt;/p&gt;

&lt;p&gt;To implement this, Dev Sentinel leverages Claude Code's UserPromptSubmit and Stop hooks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;During UserPromptSubmit, it detects signals of frustration (phrasing, repeated attempts, context shifts, etc.).&lt;/li&gt;
&lt;li&gt;During Stop, it determines whether the failure cycle concluded and stores the structured experience.&lt;/li&gt;
&lt;/ul&gt;
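&lt;p&gt;A UserPromptSubmit hook boils down to a small script that receives the prompt and looks for frustration signals. The check below is a deliberately naive keyword sketch; the keyword list and the surrounding wiring are assumptions, not Dev Sentinel's actual heuristics.&lt;/p&gt;

```javascript
// Hypothetical frustration detector for a UserPromptSubmit hook.
// A real hook script would read the JSON payload from stdin; here we
// expose only the core check. The keyword list is an assumption.
const SIGNALS = ["still failing", "doesn't work", "same error", "tried everything"];

function detectFrustration(prompt) {
  const text = prompt.toLowerCase();
  return SIGNALS.some((signal) => text.includes(signal));
}
```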

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77wmg1domms38yahy38l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77wmg1domms38yahy38l.png" alt="Active Recall mechanism showing how Dev Sentinel surfaces relevant past experiences when similar patterns are detected" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What If Dev Sentinel Had Existed Then?
&lt;/h3&gt;

&lt;p&gt;Looking back at the earlier incidents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda timeout caused by excessive parallelism&lt;/li&gt;
&lt;li&gt;Stateful design failure in a containerized environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both were not merely implementation mistakes. They shared a pattern: underestimating environmental constraints.&lt;/p&gt;

&lt;p&gt;If Dev Sentinel had existed, then in a later project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When designing high-concurrency processing logic&lt;/li&gt;
&lt;li&gt;When introducing in-memory caching casually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It could have surfaced past experiences as a reminder.&lt;/p&gt;

&lt;p&gt;Not to provide the answer, but to ask: "Have you seen this pattern fail before?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Between Organizational Runbooks and Personal Memory
&lt;/h3&gt;

&lt;p&gt;Many organizations reduce failure through runbooks, SOPs, wikis, and knowledge bases. These are highly effective. However, organizational knowledge does not always align perfectly with individual context.&lt;/p&gt;

&lt;p&gt;Dev Sentinel takes the opposite direction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capture individual failure experiences first&lt;/li&gt;
&lt;li&gt;Accumulate them over time&lt;/li&gt;
&lt;li&gt;Return them to the individual when relevant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The expectation is that accumulated personal experience leads to better design judgment, and that judgment ultimately influences projects and organizations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Closing Thoughts
&lt;/h3&gt;

&lt;p&gt;We fail every day. But failure does not automatically become an asset. Failures that are not remembered are repeated.&lt;/p&gt;

&lt;p&gt;Dev Sentinel is not a grand AI system. It is a small mechanism to prevent failure from quietly fading away. Tonight feels like a good time to reflect on what today's lesson was.&lt;/p&gt;

&lt;p&gt;git repo: &lt;a href="https://github.com/elbanic/dev-sentinel" rel="noopener noreferrer"&gt;https://github.com/elbanic/dev-sentinel&lt;/a&gt;&lt;/p&gt;

</description>
      <category>software</category>
      <category>failure</category>
      <category>lessonlearned</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>I Built an AI System Design Interviewer That Runs in My Mac Terminal</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Wed, 11 Feb 2026 06:39:31 +0000</pubDate>
      <link>https://dev.to/elbanic/i-built-an-ai-system-design-interviewer-that-runs-in-my-mac-terminal-1k61</link>
      <guid>https://dev.to/elbanic/i-built-an-ai-system-design-interviewer-that-runs-in-my-mac-terminal-1k61</guid>
      <description>&lt;h2&gt;
  
  
  Recently, I've been noticing something strange.
&lt;/h2&gt;

&lt;p&gt;Developers around me are writing no code.&lt;/p&gt;

&lt;p&gt;AI tools help with architecture, implementation, testing, documentation, and even debugging. In many workflows, developers are shifting from coding to directing.&lt;/p&gt;

&lt;p&gt;If coding itself is changing, it raises an interesting question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will coding interviews eventually lose relevance?&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I don't have a definitive answer. But one trend is very clear: companies are putting increasing emphasis on system design interviews.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And those are notoriously hard to practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem With System Design Interviews
&lt;/h2&gt;

&lt;p&gt;Algorithm problems are solo-friendly. You can grind them on LeetCode anytime.&lt;/p&gt;

&lt;p&gt;System design interviews are different.&lt;/p&gt;

&lt;p&gt;They require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Someone to guide the conversation&lt;/li&gt;
&lt;li&gt;Follow-up questions that adapt to your answers&lt;/li&gt;
&lt;li&gt;Verbal explanation of your thought process&lt;/li&gt;
&lt;li&gt;Structured feedback after the session&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Building a Virtual System Design Interviewer (Mac CLI)
&lt;/h2&gt;

&lt;p&gt;That led me to build a virtual system design interview tool that runs entirely inside the Mac terminal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8tyav46ods57zg7p4fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8tyav46ods57zg7p4fm.png" alt=" " width="800" height="859"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's built using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swift&lt;/strong&gt; for the native Mac experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strands Agents&lt;/strong&gt; to orchestrate the AI interviewer workflow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MLX Whisper&lt;/strong&gt; for speech recognition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen3 TTS&lt;/strong&gt; to generate interviewer voice responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet&lt;/strong&gt; to generate structured interview feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal wasn't to build another chat-based Q&amp;amp;A bot.&lt;/p&gt;

&lt;p&gt;I wanted it to feel like a real interview.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the Interview Works
&lt;/h2&gt;

&lt;p&gt;Once the session starts, the AI interviewer leads a ~30 minute system design interview based on a selected topic.&lt;/p&gt;

&lt;p&gt;The experience is fully voice-driven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The AI interviewer asks a question (spoken via TTS)&lt;/li&gt;
&lt;li&gt;You answer using your microphone&lt;/li&gt;
&lt;li&gt;The conversation continues dynamically&lt;/li&gt;
&lt;li&gt;Every question and answer is transcribed automatically&lt;/li&gt;
&lt;/ol&gt;
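&lt;p&gt;The flow above can be sketched as a single loop. Every function here is a placeholder for the real TTS, microphone, and Whisper integrations, not the project's actual API:&lt;/p&gt;

```javascript
// Illustrative shape of the voice-driven interview loop. The io.* calls
// are placeholders for TTS (Qwen3 TTS), mic capture, and STT (MLX Whisper).
async function runInterview(session, durationMs, io) {
  const deadline = Date.now() + durationMs;
  const transcript = [];
  while (deadline - Date.now() > 0) {
    const question = await io.nextQuestion(session, transcript);
    if (question === null) break;              // interviewer ends the session
    await io.speak(question);                  // spoken via TTS
    const audio = await io.record();           // answer from the microphone
    const answer = await io.transcribe(audio); // transcribed automatically
    transcript.push({ question, answer });
  }
  return transcript;
}
```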

&lt;p&gt;After the interview ends, the system generates a detailed evaluation report.&lt;/p&gt;




&lt;h2&gt;
  
  
  Making the Interview Feel Real
&lt;/h2&gt;

&lt;p&gt;I integrated Qwen3 TTS to simulate a speaking interviewer. It's not perfect, but it dramatically improves immersion and forces you to explain ideas verbally - which is critical in real interviews.&lt;/p&gt;

&lt;p&gt;The interview can end in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You stop it manually with the &lt;code&gt;/end&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;The 30-minute timer expires (you still need to send the &lt;code&gt;/end&lt;/code&gt; command to wrap up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After that, the AI generates structured feedback including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strength analysis&lt;/li&gt;
&lt;li&gt;Weakness detection&lt;/li&gt;
&lt;li&gt;Communication clarity evaluation&lt;/li&gt;
&lt;li&gt;Architectural decision quality&lt;/li&gt;
&lt;li&gt;Tradeoff reasoning depth&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Example of What Gets Captured
&lt;/h2&gt;

&lt;p&gt;Because every interaction is transcribed, you can review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where you hesitated&lt;/li&gt;
&lt;li&gt;How your design evolved&lt;/li&gt;
&lt;li&gt;Whether your explanations were structured&lt;/li&gt;
&lt;li&gt;How well you justified tradeoffs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turned out to be one of the most valuable features for me personally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Limitations
&lt;/h2&gt;

&lt;p&gt;This project is still experimental.&lt;/p&gt;

&lt;p&gt;Right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Session audio is not recorded&lt;/li&gt;
&lt;li&gt;🔜 Local LLM support is planned&lt;/li&gt;
&lt;li&gt;🔜 Streaming LLM integration is planned&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current version prioritizes simplicity and a low-friction setup.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo / Source Code
&lt;/h2&gt;

&lt;p&gt;If you're curious or want to try it yourself:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/elbanic/sdi.coach" rel="noopener noreferrer"&gt;https://github.com/elbanic/sdi.coach&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For more on system design interview preparation, check out my &lt;a href="https://dev.to/elbanic/hacking-system-design-interviews-with-an-ai-tutor-1jnj"&gt;AI System Design Tutor&lt;/a&gt;, which helps you learn system design concepts right within your IDE.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;It's an uncertain time to be a developer.&lt;/p&gt;

&lt;p&gt;The industry is evolving fast. Hiring expectations are shifting. The tools we rely on are changing every year.&lt;/p&gt;

&lt;p&gt;But building tools to adapt to those changes feels like the most reliable way forward.&lt;/p&gt;

&lt;p&gt;If this helps someone practice system design or sparks new ideas, that alone makes the project worth it.&lt;/p&gt;

&lt;p&gt;If you have feedback, feature ideas, or want to contribute, I'd genuinely love to hear from you.&lt;/p&gt;

</description>
      <category>interview</category>
      <category>whisper</category>
      <category>swift</category>
      <category>texttospeech</category>
    </item>
    <item>
      <title>dev.echo - Crushing Communication Debt</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Wed, 28 Jan 2026 08:04:47 +0000</pubDate>
      <link>https://dev.to/elbanic/crushing-communication-debt-an-ai-survival-guide-for-global-devs-3195</link>
      <guid>https://dev.to/elbanic/crushing-communication-debt-an-ai-survival-guide-for-global-devs-3195</guid>
      <description>&lt;h2&gt;
  
  
  I am not a native English speaker
&lt;/h2&gt;

&lt;p&gt;Yet I work as a software developer in a professional environment where English is the standard. As AI becomes more deeply integrated into our lives and business processes, the volume of communication has skyrocketed, and we are under constant pressure to produce faster results. Translation technology has made massive leaps, and we now live in an era of real-time machine translation, yet keeping up with the torrent of real-time spoken English remains a significant challenge for non-native speakers like me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Concept of "Communication Debt"
&lt;/h2&gt;

&lt;p&gt;Let me be clear: my lack of English fluency does not diminish my value. I continue to solve complex business and technical problems, proving my engineering capabilities every day. However, language barriers create friction. When I miss a nuance or fail to answer quickly because I'm processing the language rather than the logic, I accrue what I call "communication debt."&lt;/p&gt;

&lt;p&gt;I started thinking about how AI could help solve the specific moments where I fall behind due to language struggles. For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Losing Focus&lt;/strong&gt;: Someone is explaining a concept during a meeting, and my mind wanders for a split second, causing me to lose the thread.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Deer in Headlights" Moment&lt;/strong&gt;: During a review of my software design, I get an unexpected question. I know the answer, but the sudden pressure makes me freeze, and I miss the chance to articulate my thoughts effectively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Onboarding Struggles&lt;/strong&gt;: Joining a new team is hard enough, but when the conversation is filled with unfamiliar jargon and acronyms, "ramping up" becomes twice as difficult.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Enter the AI Assistant
&lt;/h2&gt;

&lt;p&gt;What if I had a dedicated assistant to bridge these gaps? Here is how it would handle those scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scenario 1 (Losing Focus)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Assistant: "The current discussion is about [Topic X]. They are currently debating how to optimize the performance of [Topic X]."&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scenario 2 (Design Review Pressure)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Assistant: "The [Feature X] you implemented last August supports your argument here. Remember, [Feature X] is a robust function already proven in our live service."&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scenario 3 (New Team Jargon)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Assistant: "The team is discussing [Term X]. This is the core technology for this project, defined as..."&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How It Works: &lt;code&gt;dev.echo&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The role of this assistant is simple but powerful. It listens to meetings in real-time to summarize the conversation, and it maintains a knowledge base of my existing notes, documents, and deliverables. By combining an AI agent with RAG (Retrieval-Augmented Generation), it provides rich, relevant context the moment I need to catch up.&lt;/p&gt;
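&lt;p&gt;The retrieval half of that pipeline can be sketched in a few lines. This toy version scores notes by word overlap purely for illustration; a real RAG system, dev.echo included, would use embeddings and a vector index instead.&lt;/p&gt;

```javascript
// Toy sketch of RAG retrieval: score stored notes against the live
// meeting snippet and return the best matches. A production system
// would use embeddings and a vector index, not word overlap.
function retrieve(notes, query, topK) {
  const queryWords = new Set(query.toLowerCase().split(/\s+/));
  const scored = notes.map((note) => {
    const words = note.text.toLowerCase().split(/\s+/);
    const hits = words.filter((w) => queryWords.has(w)).length;
    return { note, score: hits / words.length };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, topK).map((entry) => entry.note);
}
```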

&lt;p&gt;I built this tool specifically for Mac + AWS users. However, if you prefer not to use AWS, you can run it entirely offline using a local LLM (Llama).&lt;/p&gt;

&lt;p&gt;You can explore the project here: &lt;a href="https://github.com/elbanic/dev.echo" rel="noopener noreferrer"&gt;dev.echo on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmsz2hx5crhe3k2mh2et.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmsz2hx5crhe3k2mh2et.gif" alt="dev.echo demo" width="560" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A vital note on compliance&lt;/strong&gt;: Many companies have strict policies regarding recording or transcribing meetings. Please ensure you fully understand and adhere to your company's security and privacy policies before using this tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Language should not be a barrier to demonstrating your technical expertise. This tool is my attempt to level the playing field, ensuring that "communication debt" doesn't stand in the way of great engineering.&lt;/p&gt;

&lt;p&gt;I am curious to hear your thoughts. How could this software be used to help you in your daily work?&lt;/p&gt;

</description>
      <category>tooling</category>
      <category>softskills</category>
      <category>multilingual</category>
      <category>ai</category>
    </item>
    <item>
      <title>Agentic Software Engineering (ASE): The Software Development Landscape is Shifting Again</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Tue, 02 Dec 2025 19:40:00 +0000</pubDate>
      <link>https://dev.to/elbanic/agentic-software-engineering-ase-the-software-development-landscape-is-shifting-again-4bc4</link>
      <guid>https://dev.to/elbanic/agentic-software-engineering-ase-the-software-development-landscape-is-shifting-again-4bc4</guid>
      <description>&lt;h2&gt;
  
  
  Agentic Software Engineering (ASE): The Software Development Landscape is Shifting Again
&lt;/h2&gt;

&lt;p&gt;These days, we often hear stories like "Successful CI/CD Automation with a Coding Agent" or "An MVP for a side project completed in just a few days." It’s becoming clear that AI agents are moving beyond being mere assistants and are diving &lt;strong&gt;deep into the actual development workflow.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Industry buzzwords like AI-driven development, AI-DLC (AI-driven Development Life Cycle), and Agentic Software Engineering (SE 3.0) are already rampant. I want to consolidate this rapidly evolving trend from the perspective of a working developer and call it &lt;strong&gt;"Agentic Software Engineering (ASE)."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ASE is far more than simple code generation. It signifies a new software development framework where the agent understands and actively participates in the entire software development lifecycle, including &lt;strong&gt;testing, refactoring, and deployment.&lt;/strong&gt; This is beyond the level of a "slightly smarter autocomplete." It’s a concept that encompasses agents which take all code and documentation within a microservice as context, combine various tools, and produce actual change lists.&lt;/p&gt;

&lt;p&gt;Since the dawn of software, the fundamental framework of Requirements, Design, Implementation, Testing, and Operations has remained largely unchanged. These five phases are the bedrock of software engineering and are unlikely to shift easily. The change ASE will bring is the ability to execute this lifecycle with &lt;strong&gt;better quality products and with far greater agility.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yet, few people can clearly answer &lt;em&gt;how&lt;/em&gt; this change will happen or &lt;em&gt;what future&lt;/em&gt; awaits. We can feel AI transforming the development paradigm, but the pace of change is so rapid that predicting the next step is difficult. With concepts and tools constantly changing every year or two, no one can confidently state &lt;strong&gt;"What the Next State is."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most practical approach we can take is to look back and re-examine &lt;strong&gt;what past paradigm shifts actually changed.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What OOP, TDD, and Agile Actually Transformed
&lt;/h3&gt;

&lt;p&gt;What did the major paradigm shifts in software development history actually change?&lt;/p&gt;

&lt;h4&gt;
  
  
  1. OOP: The Structuring Paradigm that Allowed Software to Withstand Complexity
&lt;/h4&gt;

&lt;p&gt;Before the advent of Object-Oriented Programming (OOP), procedural programming was the mainstream. Conceptualized in the 1970s via Smalltalk and popularized by languages like C++, Java, and C#, OOP became the dominant force. It structured software whose complexity was exploding, from large-scale GUI applications and game engines to enterprise business systems, using concepts like classes, objects, and messages.&lt;/p&gt;

&lt;p&gt;The core insight is that the paradigm shift from procedural to object-oriented allowed us to manage software at a &lt;strong&gt;"maintainable level of complexity."&lt;/strong&gt; OOP provided the absolute foundation for improving software quality and productivity for decades to come.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. TDD: The Practical Technique that Reduced Defect Density and Boosted Maintainability
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Realizing-Quality-Improvement-Through-Test-Driven-Development-Results-and-Experiences-of-Four-Industrial-Teams-nagappan_tdd.pdf" rel="noopener noreferrer"&gt;As demonstrated in famous case studies by IBM and Microsoft, Test-Driven Development (TDD) showed remarkable results&lt;/a&gt;, reducing &lt;strong&gt;pre-release defect density by 40–90%&lt;/strong&gt; upon adoption.&lt;/p&gt;

&lt;p&gt;While initial development time saw a slight increase, the benefits (reduced bug count, increased test coverage, improved code structure, and easier refactoring) dramatically enhanced quality and maintainability from a &lt;strong&gt;total lifecycle perspective.&lt;/strong&gt; TDD changed the 'practical technique' of writing code by providing developers with a 'tight feedback loop.'&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Agile &amp;amp; Scrum: The Methodology that Enabled Organizational Pivoting
&lt;/h4&gt;

&lt;p&gt;The transition from Waterfall to Agile/Scrum was more than a change in development methodology; it was a shift in organizational culture. It moved away from "slow, document-heavy project management" to "iterative experimentation and pivoting by small teams."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.forbes.com/sites/stevedenning/2019/06/02/how-amazon-became-agile" rel="noopener noreferrer"&gt;Amazon's use of the 'two-pizza team' structure to enable fast&lt;/a&gt;, customer-centric experimentation and frequent deployments is a prime example of how Agile can transform an &lt;strong&gt;organization's speed of execution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here are the key common threads we must note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OOP&lt;/strong&gt; changed &lt;strong&gt;Code Structure&lt;/strong&gt; and &lt;strong&gt;Reusability.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TDD&lt;/strong&gt; changed &lt;strong&gt;Confidence&lt;/strong&gt; in the written code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agile&lt;/strong&gt; changed &lt;strong&gt;Organization, Process, and Culture.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they enabled us to build software &lt;strong&gt;"more robustly, more frequently, and more safely."&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  The Value and Prerequisites of ASE: A Future That Is Not Free
&lt;/h3&gt;

&lt;p&gt;Just as the shift from procedural programming to OOP required learning a new paradigm and changing team culture, the transition to ASE will be similar. Code will still exist, and the framework of Requirements/Design/Testing/Operations will remain.&lt;/p&gt;

&lt;p&gt;What changes is &lt;strong&gt;"How quickly we can validate and deploy new ideas."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The core values of ASE are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent-Level Reusability:&lt;/strong&gt; Managing prompts, workflows, and tool combinations as &lt;strong&gt;reusable "agent modules"&lt;/strong&gt; to elevate the unit of development work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximization of Experimentation Speed:&lt;/strong&gt; Securing the overwhelming speed to rapidly prototype new ideas using the &lt;strong&gt;"1 Human + Agent Stack"&lt;/strong&gt; combination, pushing them all the way through testing and deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this combination takes hold, the speed of software development and the density of experimentation are highly likely to scale to a degree incomparable to the past.&lt;/p&gt;

&lt;h4&gt;
  
  
  ASE is Not an 'Installation,' It's a 'Re-architecture'
&lt;/h4&gt;

&lt;p&gt;There is a critical caveat, however. ASE is not an automatic change handed out simply by adopting a coding agent or following the trend. &lt;strong&gt;The environment must first be set up&lt;/strong&gt; for the agent to do real work.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;prerequisites&lt;/strong&gt; for an agent to unleash its full potential are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rich Context:&lt;/strong&gt; &lt;strong&gt;All source code, design documents, specs, and issues&lt;/strong&gt; across microservices must be linked and searchable, providing the agent with abundant context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perfect Feedback Loop:&lt;/strong&gt; A &lt;strong&gt;unit/E2E testing environment and a sandbox&lt;/strong&gt; covering almost all code must be established, allowing the agent to self-generate and execute tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Workflow:&lt;/strong&gt; A &lt;strong&gt;robust CI/CD pipeline&lt;/strong&gt; capable of running static analysis and security scans is essential.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In organizations lacking this infrastructure, the agent will ultimately remain just a smarter autocomplete within the IDE. ASE is less about &lt;strong&gt;"installing a new tool"&lt;/strong&gt; and more about &lt;strong&gt;re-architecting the entire development environment&lt;/strong&gt; to be agent-friendly.&lt;/p&gt;

&lt;h4&gt;
  
  
  Who Will Lead This Change?
&lt;/h4&gt;

&lt;p&gt;Past patterns will repeat. Just as not every company could instantly rewrite all existing code in OOP when it first emerged, ASE will likely follow a similar trajectory.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some teams won't move beyond treating the agent as a smart coding assistant for implementation tasks.&lt;/li&gt;
&lt;li&gt;Other teams will preemptively build frameworks that delegate testing, documentation, refactoring, and deployment to the agent.&lt;/li&gt;
&lt;li&gt;These advanced teams will achieve &lt;strong&gt;more frequent deployments and tighter experiments with fewer people.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;The skeleton of software engineering remains, but the paradigm is shifting once again. This time, the AI agent is at the center of that transformation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Agentic Software Engineering (ASE)"&lt;/strong&gt; is not just a methodology for improving efficiency; it is an opportunity to fundamentally redefine how we build and deliver software to the world.&lt;/p&gt;

&lt;p&gt;The question is simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Will you merely observe this era, or will you be the one to design your team's environment?"&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aidlc</category>
      <category>agents</category>
      <category>agenticsoftwareengineering</category>
    </item>
    <item>
      <title>Hacking System Design Interviews with an AI Tutor</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Thu, 27 Nov 2025 23:26:00 +0000</pubDate>
      <link>https://dev.to/elbanic/hacking-system-design-interviews-with-an-ai-tutor-1jnj</link>
      <guid>https://dev.to/elbanic/hacking-system-design-interviews-with-an-ai-tutor-1jnj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;"In an age where AI writes our code/algorithms, what new skills are interviewers starting to ask? 🤔"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My recent conversations with HR specialists have confirmed a significant shift: many companies are intensifying their focus on the extended system design interview - an expanded form of the coding interview. As AI coding agents rise, simple algorithm writing is losing its power to differentiate candidates. The core competency of a modern software developer is now the ability to understand the entire product lifecycle holistically and to implement robust, resilient systems.&lt;/p&gt;

&lt;p&gt;This change led me to a question: "What if a developer could study System Design right within their IDE, guided by an AI Tutor, without ever leaving that environment?"&lt;/p&gt;

&lt;p&gt;The result is the System Design Tutor Project. I've implemented an AI designed to mentor and guide interview preparation, much like a seasoned engineer sitting right beside you.&lt;/p&gt;




&lt;h2&gt;
  
  
  The AI Tutor's 'Flow of Knowledge'
&lt;/h2&gt;

&lt;p&gt;The System Design Tutor primarily uses a RAG (Retrieval-Augmented Generation) system to assist with interview preparation. Think of it as an organized librarian for system design - it rapidly finds and delivers accurate, in-depth knowledge to the learner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1gz58salap09a7fbi87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1gz58salap09a7fbi87.png" alt="System Design Tutor Architecture showing the sync and question workflows" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sync Workflow: Stocking the Library
&lt;/h3&gt;

&lt;p&gt;The Sync Workflow is how the AI Tutor keeps its knowledge base up-to-date.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Triggered by a button click in the TutorUI, the Sync Data mechanism downloads source materials from a trusted GitHub repository (e.g., &lt;a href="https://github.com/donnemartin/system-design-primer" rel="noopener noreferrer"&gt;donnemartin's system-design-primer&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Llama Embed then splits these extensive documents into small, AI-digestible chunks and converts each one into a vector embedding.&lt;/li&gt;
&lt;li&gt;Finally, these transformed knowledge chunks are stored in ChromaDB, a vector database, ensuring the tutor can rapidly access and reference the information at any time.&lt;/li&gt;
&lt;/ul&gt;
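&lt;p&gt;The three sync steps above can be sketched in a few lines of Python. This is a toy model only: a bag-of-words counter and a plain dict stand in for the Llama Embed model and ChromaDB, and the chunk size is illustrative.&lt;/p&gt;

```python
# Minimal sketch of the Sync Workflow: split source documents into
# chunks, embed each chunk, and store the vectors. A toy bag-of-words
# "embedding" and a plain dict stand in for Llama Embed and ChromaDB.
from collections import Counter

def chunk(text, size=50):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def sync(docs):
    """Index every chunk of every source document into the 'vector store'."""
    store = {}
    for doc_id, text in docs.items():
        for i, piece in enumerate(chunk(text)):
            store[f"{doc_id}#{i}"] = embed(piece)
    return store

# e.g. one 120-word primer page becomes 3 chunks of up to 50 words each
store = sync({"primer/scalability": "load balancer cache queue shard " * 24})
print(len(store))  # → 3
```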

&lt;h3&gt;
  
  
  Question Workflow: Personalized Tutoring on Demand
&lt;/h3&gt;

&lt;p&gt;This is the intelligent core that handles the actual interaction.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The user's question (e.g., &lt;em&gt;"Teach me the design of a Real-time Chat System"&lt;/em&gt;) is received via Cline.&lt;/li&gt;
&lt;li&gt;Cline calls the Tutor MCP Server to query the Tutor Agent, which holds the core AI logic.&lt;/li&gt;
&lt;li&gt;The Tutor Agent analyzes the query and initiates the RAG process by searching ChromaDB for all relevant stored knowledge.&lt;/li&gt;
&lt;li&gt;By combining the retrieved knowledge with a powerful Large Language Model (like AWS Bedrock Claude), it generates a comprehensive, structured learning guide - not just a simple text dump.&lt;/li&gt;
&lt;li&gt;Cline then presents this guide to the user, creating an engaging, conversational learning experience.&lt;/li&gt;
&lt;/ul&gt;
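&lt;p&gt;A minimal sketch of the retrieval half of this workflow, assuming a toy word-overlap score in place of ChromaDB's vector search; the assembled prompt is what would be handed to the LLM for generation.&lt;/p&gt;

```python
# Sketch of the Question Workflow's RAG step: score stored chunks
# against the query, keep the best matches, and build the LLM prompt.
# The overlap score and in-memory store are toy stand-ins for ChromaDB
# and its vector search.
def similarity(query_words, chunk_text):
    """Toy relevance score: how many query words appear in the chunk."""
    words = set(chunk_text.lower().split())
    return sum(1 for w in query_words if w in words)

def build_prompt(query, store, top_k=2):
    q = query.lower().split()
    ranked = sorted(store.items(), key=lambda kv: similarity(q, kv[1]), reverse=True)
    context = "\n".join(text for _, text in ranked[:top_k])
    # The real Tutor Agent sends this combined prompt to the LLM.
    return f"Context:\n{context}\n\nQuestion: {query}"

store = {
    "chat#0": "Real-time chat needs WebSocket connections and message queues",
    "db#0": "Normalize relational schemas before denormalizing for reads",
}
prompt = build_prompt("design a real-time chat system", store)
print("WebSocket" in prompt)  # → True
```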

&lt;h3&gt;
  
  
  A Quick Demo
&lt;/h3&gt;

&lt;p&gt;You will need to install the &lt;a href="https://github.com/elbanic/SystemDesignTutor" rel="noopener noreferrer"&gt;VS Code Extension: SystemDesignTutor&lt;/a&gt; before starting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. User → Cline: "Teach me the design of a Real-time Chat System"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gf9h08x7ih16fhxr4g8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gf9h08x7ih16fhxr4g8.png" alt="User asking Cline about real-time chat system design" width="800" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Cline → AI Tutor: Utilizes ChromaDB and the LLM to create a structured tutoring session.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7sn0q4xa0cc18dr8uwpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7sn0q4xa0cc18dr8uwpl.png" alt="AI Tutor processing the query using RAG" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cline → User: Instantly receives a systematic guide covering everything from Load Balancer placement to Scalability and Consistency trade-offs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv8n5o2dfj04k4j14z81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv8n5o2dfj04k4j14z81.png" alt="Comprehensive system design guide generated by AI Tutor" width="800" height="619"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ox840665eloivws28l7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ox840665eloivws28l7.png" alt="Detailed architecture recommendations from AI Tutor" width="800" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full source code is available here:&lt;/strong&gt; &lt;a href="https://github.com/elbanic/SystemDesignTutor" rel="noopener noreferrer"&gt;&lt;strong&gt;https://github.com/elbanic/SystemDesignTutor&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unbreakable Foundation: Quality of Knowledge
&lt;/h2&gt;

&lt;p&gt;Using this AI Tutor, I realized that the value of the system is ultimately defined by the quality of its content. While the system-design-primer was an excellent starting point, the potential for this RAG system lies in continuously injecting diverse, verified, and high-quality content.&lt;/p&gt;

&lt;p&gt;In an era of generative AI, where incomplete information is easily exposed, our challenge is to focus on foundational principles and robust design. AI tools are here to help us focus on what truly matters.&lt;/p&gt;

&lt;p&gt;For more on building RAG systems, check out my guide on &lt;a href="https://dev.to/elbanic/building-an-aws-based-rag-pipeline-1lp1"&gt;building an AWS-based RAG pipeline&lt;/a&gt; and &lt;a href="https://dev.to/elbanic/context-management-with-kiro-part-1-21hc"&gt;context management strategies&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What was your biggest roadblock when learning System Design, and how do you see an AI Tutor changing that learning process for you or your team?&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>aws</category>
      <category>notimetocode</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Unlocking Automation with Strands Agents and Bedrock</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Tue, 25 Nov 2025 01:20:00 +0000</pubDate>
      <link>https://dev.to/elbanic/unlocking-automation-with-strands-agents-and-bedrock-3noa</link>
      <guid>https://dev.to/elbanic/unlocking-automation-with-strands-agents-and-bedrock-3noa</guid>
      <description>&lt;h2&gt;
  
  
  Introducing Strands Agents &amp;amp; The YourDay Project Overview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strands Agents&lt;/strong&gt; is a powerful, &lt;strong&gt;code-first&lt;/strong&gt; framework that dramatically simplifies the process of building and operating complex, multi-tool &lt;strong&gt;Agentic systems&lt;/strong&gt;. It provides a robust structure for orchestrating Large Language Models (LLMs) and integrating them with external services.&lt;/p&gt;

&lt;p&gt;For more insights on AI agent development and context management, check out my series on &lt;a href="https://dev.to/elbanic/context-management-with-kiro-part-1-21hc"&gt;context management with Kiro&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To showcase how easily Strands Agents can be utilized, I've built a practical example: the &lt;strong&gt;YourDay&lt;/strong&gt; project. This project combines the framework with the powerful &lt;strong&gt;Claude Sonnet&lt;/strong&gt; model from AWS Bedrock and publicly available APIs (Open-Meteo, GNews) to automatically generate a daily summary and post it to the Obsidian note app. Let's see how Strands Agents makes this complex automation straightforward.&lt;/p&gt;




&lt;h2&gt;
  
  
  YourDay
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Overview
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;YourDay&lt;/strong&gt; is an AI agent based on the Strands SDK, designed to automate the following tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collects current &lt;strong&gt;weather data&lt;/strong&gt; via Open-Meteo's APIs.&lt;/li&gt;
&lt;li&gt;Fetches the &lt;strong&gt;top 10 news headlines&lt;/strong&gt; from the GNews API.&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;AWS Bedrock Claude Sonnet 4.5&lt;/strong&gt; to transform the collected data into a formatted summary report in Markdown.&lt;/li&gt;
&lt;li&gt;Automatically publishes the summary to an Obsidian Vault via the &lt;strong&gt;Obsidian MCP (Model Context Protocol) server&lt;/strong&gt;, automating the creation of the Daily Weather &amp;amp; News note.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  System Diagram
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfipsnwrrx4walozrkga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfipsnwrrx4walozrkga.png" alt="YourDay System Architecture" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  API Keys
&lt;/h3&gt;

&lt;p&gt;To run YourDay, the following access permissions and keys are required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GNews API Key&lt;/strong&gt;: For collecting news data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Bedrock Access (Claude Sonnet)&lt;/strong&gt;: For generating the AI summary (requires an AWS account).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Obsidian API Key&lt;/strong&gt;: For integration with the Obsidian MCP server.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code Implementation: The Heart of the Agentic System
&lt;/h3&gt;

&lt;p&gt;Now let's implement the code. YourDay is structured to integrate various clients (Weather, News, Obsidian) and uses the Strands Agent to orchestrate the generation of the final Markdown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. YourDayAgent: The Core Agent Defining the Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The YourDayAgent class collects the data, calls the Bedrock model via &lt;code&gt;self.agent.invoke_async(prompt=prompt)&lt;/code&gt; to generate the summary, and finally posts the result to Obsidian.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra1bmzjgfycfy7l5ynrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra1bmzjgfycfy7l5ynrz.png" alt="YourDayAgent implementation code showing workflow orchestration with Strands SDK" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;
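&lt;p&gt;The workflow's shape, sketched with hypothetical stub clients; the placeholder summary line is where the real agent calls &lt;code&gt;self.agent.invoke_async(prompt=prompt)&lt;/code&gt; against Bedrock.&lt;/p&gt;

```python
# Shape of the YourDayAgent workflow. The client classes and the note
# name are stand-ins; only the orchestration order mirrors the article.
import asyncio

class YourDayAgent:
    def __init__(self, weather, news, publisher):
        self.weather, self.news, self.publisher = weather, news, publisher

    async def _run_async(self):
        # 1. Collect weather and headlines concurrently.
        weather, headlines = await asyncio.gather(
            self.weather.fetch(), self.news.top_headlines())
        # 2. Build the summarization prompt; the real agent hands this to
        #    self.agent.invoke_async(prompt=prompt) on Bedrock Claude.
        prompt = f"Summarize in Markdown.\nWeather: {weather}\nNews: {headlines}"
        summary = "# Daily Note\n" + prompt  # placeholder for the LLM response
        # 3. Publish the Markdown note to Obsidian.
        await self.publisher.put_note("Daily Note.md", summary)
        return summary

class Stub:
    """Hypothetical stand-in for the Weather/News/Obsidian clients."""
    async def fetch(self):
        return "clear skies"
    async def top_headlines(self):
        return ["Headline 1"]
    async def put_note(self, path, body):
        self.saved = (path, body)

stub = Stub()
result = asyncio.run(YourDayAgent(stub, stub, stub)._run_async())
print(result.splitlines()[0])  # → # Daily Note
```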

&lt;p&gt;&lt;strong&gt;2. WeatherClient: Open-Meteo Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The WeatherClient uses Open-Meteo's &lt;strong&gt;Geocoding API&lt;/strong&gt; to convert the location name into coordinates, and then calls the &lt;strong&gt;Forecast API&lt;/strong&gt; to fetch current weather information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvgkbzigostyi3vnrsfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvgkbzigostyi3vnrsfm.png" alt="WeatherClient implementation" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;
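&lt;p&gt;The two chained requests can be sketched as URL builders against Open-Meteo's public endpoints (parameter names follow Open-Meteo's documentation; nothing is actually sent here):&lt;/p&gt;

```python
# The two Open-Meteo calls the WeatherClient chains together: geocode
# the place name, then fetch the current weather for the coordinates.
from urllib.parse import urlencode

GEOCODING = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST = "https://api.open-meteo.com/v1/forecast"

def geocoding_url(name):
    """Step 1: resolve a location name to coordinates."""
    return f"{GEOCODING}?{urlencode({'name': name, 'count': 1})}"

def forecast_url(latitude, longitude):
    """Step 2: fetch the current weather for those coordinates."""
    params = {"latitude": latitude, "longitude": longitude,
              "current_weather": "true"}
    return f"{FORECAST}?{urlencode(params)}"

print(forecast_url(37.5665, 126.978))
```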

&lt;p&gt;&lt;strong&gt;3. NewsClient: GNews API Integration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The NewsClient uses the provided NEWS_API_KEY to call the GNews top-headlines endpoint, refining and returning a list of top news articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6yxajs4kdwq9t2gud5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6yxajs4kdwq9t2gud5m.png" alt="NewsClient implementation integrating GNews API for top headlines" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. ObsidianClient: Publishing to Obsidian&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ObsidianClient uses a PUT request targeting the Obsidian &lt;strong&gt;MCP server&lt;/strong&gt; to write the Markdown content to a specific path within the Vault.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwl9e63mlmrci3en4ino.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwl9e63mlmrci3en4ino.png" alt="ObsidianClient implementation publishing markdown content to Obsidian vault via MCP server" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;
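&lt;p&gt;A sketch of that PUT request, assuming the conventions of the Obsidian Local REST API plugin (the localhost port and &lt;code&gt;/vault/&lt;/code&gt; path are assumptions); the request is only constructed, not sent:&lt;/p&gt;

```python
# Sketch of the ObsidianClient's publish step: an authenticated PUT of
# the Markdown body to a vault path. Port, path, and note name are
# illustrative assumptions, and no network call is made.
import urllib.request

def build_put(note_path, markdown, api_key, base="https://127.0.0.1:27124"):
    return urllib.request.Request(
        f"{base}/vault/{note_path}",
        data=markdown.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "text/markdown",
        },
    )

req = build_put("Daily/2025-11-25.md", "# Daily Note", "OBSIDIAN_API_KEY")
print(req.get_method(), req.full_url)
```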

&lt;p&gt;&lt;strong&gt;5. CLI: Execution Interface&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cli.py&lt;/code&gt; loads the configuration, initializes the clients and the YourDayAgent, and then calls &lt;code&gt;agent.run()&lt;/code&gt; to execute the entire workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgflxekm3v0utlbxwfe44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgflxekm3v0utlbxwfe44.png" alt="CLI implementation loading configuration and initializing YourDay agent workflow" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This kicks off the &lt;code&gt;_run_async&lt;/code&gt; method of YourDayAgent described in step 1, which executes the entire workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Demo: Agent Execution Result&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's run the script now. When the &lt;code&gt;yourday run&lt;/code&gt; command is executed in the CLI, you can see the agent progress through the steps of &lt;strong&gt;loading configuration&lt;/strong&gt;, &lt;strong&gt;initializing clients&lt;/strong&gt;, and &lt;strong&gt;generating the summary&lt;/strong&gt;. The screenshot below illustrates the successful final posting to Obsidian, including successful communication with Bedrock and the path of the newly created Markdown file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyeax0otjmons8xzb5e0p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyeax0otjmons8xzb5e0p.png" alt="YourDay execution demo showing successful agent workflow and Obsidian note creation" width="800" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduling the Task (Cronjob)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To ensure this software runs automatically every morning, you can schedule it with &lt;strong&gt;cron&lt;/strong&gt; (on Linux/macOS).&lt;/p&gt;

&lt;p&gt;Here's an example of scheduling this script as a cronjob to run every morning at 7:00 AM. You'll need to adjust the paths to suit your environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx7uf4101kcmjte5k2uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx7uf4101kcmjte5k2uh.png" alt="Cronjob configuration for scheduling YourDay agent to run daily at 7 AM" width="800" height="184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The complete source package including the code above is available here: &lt;a href="https://github.com/elbanic/yourday" rel="noopener noreferrer"&gt;https://github.com/elbanic/yourday&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Build Your First Agent
&lt;/h2&gt;

&lt;p&gt;Strands Agents makes building automated &lt;strong&gt;Agentic workflows&lt;/strong&gt; a reality by neatly orchestrating complex LLM calls and external service integrations. Through the YourDay project, we can experience the era where agents handle routine daily tasks simply by defining API keys and the workflow.&lt;/p&gt;

&lt;p&gt;What's in your daily routine that could be automated with an agent?&lt;/p&gt;

</description>
      <category>aws</category>
      <category>agents</category>
      <category>notimetocode</category>
      <category>strands</category>
    </item>
    <item>
      <title>Building an AWS-Based RAG Pipeline</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Fri, 21 Nov 2025 21:02:00 +0000</pubDate>
      <link>https://dev.to/elbanic/building-an-aws-based-rag-pipeline-1lp1</link>
      <guid>https://dev.to/elbanic/building-an-aws-based-rag-pipeline-1lp1</guid>
      <description>&lt;h2&gt;
  
  
  The Generational AI Blind Spot
&lt;/h2&gt;

&lt;p&gt;We've got a fantastic new &lt;strong&gt;AI coding assistant&lt;/strong&gt;, but when you ask it about your company's proprietary service architecture, it gives you a generic shrug (or worse, a confident hallucination). Why? Because our AI lives on the general internet, and our team's hard-won knowledge is locked behind firewalls and scattered across tools like &lt;strong&gt;Confluence, Slack, and internal documentation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is a pain point for everyone. As service developers, we need that agent to understand the nuances of our codebase and team conventions. As service operators, we can't afford to waste time retracing the steps of a solved production issue - we need the fastest, most reliable fix, instantly.&lt;/p&gt;

&lt;p&gt;If you're interested in how to systematically manage context for AI coding assistants, check out my series on &lt;a href="https://dev.to/elbanic/context-management-with-kiro-part-1-21hc"&gt;context management with Kiro&lt;/a&gt; and &lt;a href="https://dev.to/elbanic/context-bridging-the-ai-agents-blind-spot-38d6"&gt;Context - Bridging the AI Agent's Blind Spot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Stories&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;As a &lt;strong&gt;service developer&lt;/strong&gt;, I want to provide context to my coding agent based on the knowledge my team possesses.&lt;/li&gt;
&lt;li&gt;As a &lt;strong&gt;service operator&lt;/strong&gt;, I want to find the right and shortest path to solving similar issues my team has previously encountered.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Power of Retrieval Augmented Generation (RAG)
&lt;/h3&gt;

&lt;p&gt;The bridge between general LLM intelligence and specific team expertise is &lt;strong&gt;Retrieval Augmented Generation (RAG)&lt;/strong&gt;. This article details the construction of a robust, AWS-based RAG pipeline designed to eliminate those "blind spots." I'll walk you through the entire ETL process - extracting changes every six hours, cleaning sensitive info with Comprehend, and indexing for code structure (&lt;strong&gt;Neptune&lt;/strong&gt;) and semantics (&lt;strong&gt;OpenSearch&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;Our goal is simple: to create an AI agent that is not only intelligent but institutionally smart, giving software developers a way to hand the right context to their AI agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecting the RAG Data Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4cejp4o87gy1ihx0gs4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4cejp4o87gy1ihx0gs4.png" alt="RAG Pipeline Architecture" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building a RAG system that delivers high-quality, up-to-date answers requires a disciplined and automated data pipeline. Our AWS-based RAG system is designed to periodically collect diverse internal materials - from wikis to Slack threads - and process, embed, and index them for the question answering engine.&lt;/p&gt;

&lt;p&gt;The core of this system is an &lt;strong&gt;ETL (Extract, Transform, Load)&lt;/strong&gt; workflow, orchestrated by &lt;strong&gt;AWS Step Functions&lt;/strong&gt; and triggered every six hours or whenever the source data is updated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extract&lt;/strong&gt;: Changes are incrementally synced from diverse data sources and saved to an &lt;strong&gt;S3 raw-data bucket&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transform/Process&lt;/strong&gt;: Sensitive information (like PII) is detected and masked using &lt;strong&gt;Amazon Comprehend&lt;/strong&gt;. Document formats are standardized and saved to an &lt;strong&gt;S3 processed-data bucket&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load/Index&lt;/strong&gt;: Embeddings are generated, and the data is indexed in two places: &lt;strong&gt;Amazon OpenSearch Service&lt;/strong&gt; for vector search and &lt;strong&gt;Amazon Neptune&lt;/strong&gt; for code structure and relational information. If you'd like to see what these two databases can do, check out &lt;a href="https://dev.to/elbanic/context-bridging-the-ai-agents-blind-spot-38d6"&gt;this article on context management&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
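&lt;p&gt;The masking step can be sketched as follows, using the entity shape (&lt;code&gt;Type&lt;/code&gt;, &lt;code&gt;BeginOffset&lt;/code&gt;, &lt;code&gt;EndOffset&lt;/code&gt;) that Comprehend's &lt;code&gt;detect_pii_entities&lt;/code&gt; returns; the entity list is hard-coded here instead of coming from a real boto3 call:&lt;/p&gt;

```python
# Sketch of the Transform step: mask sensitive spans using the entity
# format returned by Amazon Comprehend's detect_pii_entities. The
# entities below are hand-written examples, not real API output.
def mask_pii(text, entities):
    """Replace each detected span with its entity type, working right
    to left so earlier offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text

doc = "Contact alice@example.com for access."
entities = [{"Type": "EMAIL", "BeginOffset": 8, "EndOffset": 25}]
print(mask_pii(doc, entities))  # → Contact [EMAIL] for access.
```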

&lt;h3&gt;
  
  
  Data Sources and Incremental Ingestion
&lt;/h3&gt;

&lt;p&gt;The ingestion stage focuses on efficiency by only processing what has changed. We pull from everywhere: internal wikis (Confluence, SharePoint), documents (Word/PDF), and collaboration tools (Slack/Teams message threads).&lt;/p&gt;

&lt;p&gt;To handle large datasets efficiently, &lt;strong&gt;AWS Step Functions&lt;/strong&gt; and &lt;strong&gt;Lambda functions/AWS Glue jobs&lt;/strong&gt; are scheduled (via EventBridge) to incrementally sync &lt;em&gt;only&lt;/em&gt; the content that has been changed or newly created since the last collection. This is critical for keeping the knowledge base fresh without wasting resources. We use APIs (like Confluence or Slack) to fetch content and store the results in an &lt;strong&gt;S3 raw-data bucket&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Preprocessing and Transformation for Quality
&lt;/h3&gt;

&lt;p&gt;The quality of your RAG answer is directly proportional to the quality of your source documents. This stage ensures a clean, consistent knowledge base.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filtering and Cleansing&lt;/strong&gt;: We use &lt;strong&gt;Amazon Comprehend's PII detection&lt;/strong&gt; to automatically identify and mask sensitive items (names, emails) before they're embedded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format Standardization&lt;/strong&gt;: We convert various formats (Word, PDF) into plain text using tools like &lt;strong&gt;Amazon Textract&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking&lt;/strong&gt;: For optimal retrieval, the cleaned content is split into smaller, meaningful chunks (typically 200–500 tokens). Each chunk gets a unique ID and reference metadata, which is stored in the &lt;strong&gt;S3 processed-data bucket&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
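&lt;p&gt;A minimal sketch of the chunking step, approximating tokens with words; the ID scheme and metadata fields are illustrative, not the pipeline's actual schema:&lt;/p&gt;

```python
# Sketch of the Chunking step: split cleaned text into bounded pieces
# and attach the ID and source metadata stored alongside each chunk.
# Words stand in for tokens; field names are hypothetical.
import hashlib

def make_chunks(doc_text, source_uri, size=300):
    words = doc_text.split()
    chunks = []
    for i in range(0, len(words), size):
        body = " ".join(words[i:i + size])
        chunks.append({
            "chunk_id": hashlib.sha1(f"{source_uri}:{i}".encode()).hexdigest()[:12],
            "source": source_uri,   # reference metadata for citations
            "offset": i,            # position of the chunk in the document
            "text": body,
        })
    return chunks

chunks = make_chunks("word " * 650, "confluence/page-42")
print(len(chunks), chunks[0]["source"])  # → 3 confluence/page-42
```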

&lt;h3&gt;
  
  
  Dual Indexing: Vectors and Graphs
&lt;/h3&gt;

&lt;p&gt;Effective RAG often requires more than just semantic similarity. By using two specialized databases, we can enrich the context with both meaning and structure.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Generation&lt;/strong&gt;: We call models like &lt;strong&gt;Amazon Titan Text Embeddings&lt;/strong&gt; via &lt;strong&gt;Amazon Bedrock&lt;/strong&gt; to generate the vector embedding for each document chunk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector DB (Semantic Search)&lt;/strong&gt;: The generated embedding vectors are stored in the k-NN vector index of &lt;strong&gt;Amazon OpenSearch Service&lt;/strong&gt;. This enables &lt;strong&gt;real-time vector search&lt;/strong&gt; to find the most semantically similar document chunks relevant to the user's query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph DB (Structural Search)&lt;/strong&gt;: Content with relationships - like source code, configuration files, or relational data - is stored in &lt;strong&gt;Amazon Neptune&lt;/strong&gt;. This allows the RAG query to perform a &lt;strong&gt;knowledge graph query&lt;/strong&gt; to retrieve associated entities, enriching the context with structural understanding.&lt;/li&gt;
&lt;/ul&gt;
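&lt;p&gt;For the vector half, the search request sent to OpenSearch follows the k-NN plugin's query shape; the index field name here is a hypothetical example, and in production the query vector comes from a Bedrock Titan embedding call.&lt;/p&gt;

```python
# The shape of the k-NN search body for the semantic-search half of the
# dual index. Field name and vector values are illustrative.
import json

def knn_query(embedding, k=5, field="chunk_vector"):
    """Build an OpenSearch k-NN query body for the top-k nearest chunks."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": embedding, "k": k}}},
        "_source": ["chunk_id", "text", "source"],
    }

body = knn_query([0.12, -0.08, 0.33], k=3)
print(json.dumps(body["query"]))
```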

&lt;h3&gt;
  
  
  Query Processing and Serving
&lt;/h3&gt;

&lt;p&gt;This is where the user's question becomes a relevant, cited answer.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query Reception (MCP server)&lt;/strong&gt;: The query hits an &lt;strong&gt;AWS API Gateway&lt;/strong&gt; endpoint, which is forwarded to a backend &lt;strong&gt;Lambda function&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG Retrieval&lt;/strong&gt;: The Lambda function calls the &lt;strong&gt;Bedrock RAG API (/retrieveAndGenerate)&lt;/strong&gt;. It retrieves the top &lt;em&gt;k&lt;/em&gt; similar chunks from &lt;strong&gt;OpenSearch&lt;/strong&gt; and simultaneously queries &lt;strong&gt;Neptune&lt;/strong&gt; for additional structural context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Generation&lt;/strong&gt;: All of this retrieved context (the document text chunks and key entity information) is passed in a single prompt to &lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt; (in practice, any capable LLM can be substituted). The model synthesizes this evidence and the original question to formulate the final response. The response &lt;strong&gt;must&lt;/strong&gt; include the source of information to ensure trustworthiness.&lt;/li&gt;
&lt;/ol&gt;
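&lt;p&gt;A sketch of the serving Lambda under these assumptions: the payload follows the shape of &lt;code&gt;retrieve_and_generate&lt;/code&gt; in boto3's &lt;code&gt;bedrock-agent-runtime&lt;/code&gt; client, the knowledge base ID and model ARN are placeholders, and no AWS call is actually made here.&lt;/p&gt;

```python
# Sketch of the serving Lambda: receive the query from API Gateway and
# build the RetrieveAndGenerate request. IDs/ARNs are placeholders and
# the boto3 call itself is left as a comment.
import json

def build_request(query, kb_id="KB_ID_PLACEHOLDER",
                  model_arn="arn:aws:bedrock:PLACEHOLDER"):
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def handler(event, context=None):
    """API Gateway-style Lambda entry point."""
    query = json.loads(event["body"])["query"]
    request = build_request(query)
    # In production:
    #   boto3.client("bedrock-agent-runtime").retrieve_and_generate(**request)
    return {"statusCode": 200, "body": json.dumps({"echo": request["input"]["text"]})}

resp = handler({"body": json.dumps({"query": "How do we roll back a deploy?"})})
print(resp["statusCode"])  # → 200
```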




&lt;h2&gt;
  
  
  The Evolution of Context Engineering
&lt;/h2&gt;

&lt;p&gt;We're still iterating to find the optimal way to develop a highly accurate agent. There will always be a gap when it comes to fully keeping up with human context and nuance. However, by defining tasks, breaking them into manageable units, and building a &lt;strong&gt;team-specific knowledge base&lt;/strong&gt; that serves as a &lt;strong&gt;context engine&lt;/strong&gt;, we can continuously evolve our agents to be more accurate, more reliable, and ultimately, more valuable to the developers and operators who rely on them.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>notimetocode</category>
      <category>aws</category>
      <category>knowledgebase</category>
    </item>
    <item>
      <title>Context - Bridging the AI Agent's Blind Spot</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Wed, 19 Nov 2025 01:46:00 +0000</pubDate>
      <link>https://dev.to/elbanic/context-bridging-the-ai-agents-blind-spot-38d6</link>
      <guid>https://dev.to/elbanic/context-bridging-the-ai-agents-blind-spot-38d6</guid>
      <description>&lt;h3&gt;
  
  
  Context - Bridging the AI Agent's Blind Spot
&lt;/h3&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/elbanic/context-management-with-kiro-part-1-21hc"&gt;last post&lt;/a&gt;, I introduced how to leverage &lt;strong&gt;Kiro's Steering&lt;/strong&gt; feature to automatically record task completion history and use it as persistent context for subsequent work. This time, we'll dive into a more &lt;strong&gt;technical approach to feeding context&lt;/strong&gt; to the AI agent, focusing on securing code quality and stability.&lt;/p&gt;

&lt;p&gt;For those interested in enterprise-scale context management, I've also written about building an AWS-based RAG pipeline that can handle institutional knowledge at scale.&lt;/p&gt;

&lt;p&gt;AI agents generally tend to do &lt;strong&gt;only what they are asked.&lt;/strong&gt; If you request a modification to a specific function in the source code, the agent typically changes just that part. The problem arises when the modified function is &lt;strong&gt;referenced in multiple other locations&lt;/strong&gt;. While agents sometimes figure out related code to adjust, they often fail to consider the full &lt;strong&gt;side effects.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ultimately, unless the developer provides a &lt;strong&gt;perfect prompt&lt;/strong&gt; that accounts for the entire code flow and includes all relevant parts, it's difficult to produce high-quality code in large projects. This issue grows exponentially with the project's size.&lt;/p&gt;




&lt;h3&gt;
  
  
  Limitations of Existing Context Provisioning Methods
&lt;/h3&gt;

&lt;p&gt;To address this, several services have emerged that turn the entire code repository into a &lt;strong&gt;Knowledge Base&lt;/strong&gt;, delivered to the agent via &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; or an &lt;strong&gt;MCP (Model Context Protocol) Server&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Context&lt;/strong&gt;: This is an open-source VSCode plugin for Anthropic's Claude code agent. It &lt;strong&gt;embeds code chunks&lt;/strong&gt; and stores them in a vector database. When a query is made, it uses &lt;strong&gt;semantic search&lt;/strong&gt; to find relevant code, inserting it into Claude's prompt context. This allows it to handle large codebases (millions of lines) by delivering only the necessary information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codebuddy&lt;/strong&gt;: An AI coding helper for JetBrains IDEs and VSCode. Like Cursor, it &lt;strong&gt;scans and stores the entire codebase in a vector database&lt;/strong&gt;, enabling the agent to answer questions about the repository.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, these approaches commonly require &lt;strong&gt;cloud access&lt;/strong&gt; or an &lt;strong&gt;external database&lt;/strong&gt;. That is a real constraint for organizations with restrictive software-usage policies or isolated networks (e.g., air-gapped systems). Furthermore, LLM calls are &lt;strong&gt;not cheap&lt;/strong&gt;: costs accrue with usage.&lt;/p&gt;

&lt;p&gt;These issues could be solved by an &lt;strong&gt;open-source local knowledge base that is easy to build and run.&lt;/strong&gt; That is the motivation behind the &lt;strong&gt;local context knowledge base&lt;/strong&gt; I've been developing, which I introduce here.&lt;/p&gt;




&lt;h3&gt;
  
  
  Local Context Bank - Converting the Local Codebase into a Knowledge Graph
&lt;/h3&gt;

&lt;p&gt;One of the key methodologies in context engineering is &lt;strong&gt;'retrieving relevant knowledge from a vector store.'&lt;/strong&gt; &lt;em&gt;GraphLoom&lt;/em&gt; is being developed with the goal of implementing this method, optimized for the local environment.&lt;/p&gt;

&lt;p&gt;The core objectives GraphLoom aims to achieve are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Independent Operation&lt;/strong&gt;: No need to call external APIs or remote MCPs, allowing usage in isolated environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Source Context&lt;/strong&gt;: Ability to provide semantic context from various documents, including code, specs, design decisions, and implementation summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular Context&lt;/strong&gt;: Support for adding context from independent code packages (e.g., back-end, front-end, microservices).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Local Context Bank&lt;/em&gt; combines two core technologies to provide semantic search, impact analysis, and specification-code synchronization for a local codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9j96385fvu2g026nttr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9j96385fvu2g026nttr.png" alt="graphloom architecture" width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Modeling the Codebase as a Graph: Understanding Relationships for Impact Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Local Context Bank&lt;/em&gt; stores the source code as a &lt;strong&gt;graph of relationships&lt;/strong&gt; between functions, classes, and modules. When a part is modified, it traverses this graph to find all related nodes, notifying the agent of potential impacts.&lt;/p&gt;

&lt;p&gt;This approach is crucial for inferring side effects. For example, if you change function A, graph traversal identifies other functions that call it, related data structures, etc., prompting the agent to &lt;strong&gt;review and modify them together.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It constructs a Code Knowledge Graph (CKG) by using &lt;strong&gt;Abstract Syntax Tree (AST) parsing&lt;/strong&gt; (via tools like &lt;strong&gt;Tree-sitter&lt;/strong&gt;) and &lt;strong&gt;LSP&lt;/strong&gt; (Language Server Protocol) to extract dependency relations (function calls, variable references, import relationships). This CKG makes it easy for the agent to quickly grasp the scope of potential changes.&lt;/p&gt;
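&lt;p&gt;The core idea is easy to sketch: extract call relations from an AST, then walk the edges in reverse to find every caller a change could touch. The toy Python example below uses only the standard library's &lt;code&gt;ast&lt;/code&gt; module and illustrates the approach, not GraphLoom's actual implementation (which relies on Tree-sitter and LSP for multi-language support); the sample source and function names are invented for the demo.&lt;/p&gt;

```python
# Toy sketch of a code knowledge graph: parse Python source with the
# stdlib ast module, record which function calls which, then walk the
# reverse edges to find everything affected by a change.
import ast
from collections import defaultdict

SOURCE = """
def load_config():
    return {"retries": 3}

def fetch(url):
    cfg = load_config()
    return url, cfg

def sync_all(urls):
    return [fetch(u) for u in urls]
"""

def build_call_graph(source):
    """Map each function name to the set of function names it calls."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for fn in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[fn.name].add(node.func.id)
    return graph

def impacted_by(graph, changed):
    """Walk reverse call edges: who (transitively) calls `changed`?"""
    callers = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    seen, stack = set(), [changed]
    while stack:
        for c in callers[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

graph = build_call_graph(SOURCE)
print(sorted(impacted_by(graph, "load_config")))  # ['fetch', 'sync_all']
```

&lt;p&gt;Changing &lt;code&gt;load_config&lt;/code&gt; flags both &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;sync_all&lt;/code&gt; — exactly the "review and modify them together" signal described above.&lt;/p&gt;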

&lt;h3&gt;
  
  
  2. Contextual Retrieval via Vector DB: Semantic Search for Intent
&lt;/h3&gt;

&lt;p&gt;In addition to the graph structure, &lt;em&gt;Local Context Bank&lt;/em&gt; embeds the project's code and documentation and stores them in a &lt;strong&gt;local vector database&lt;/strong&gt;. When the agent needs information, it can query naturally, and the system retrieves the semantically relevant parts using &lt;strong&gt;semantic search&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For instance, if the agent asks, "Find the configuration related to this function," the vector DB finds code snippets or specs with high similarity and provides them as context. This &lt;strong&gt;RAG-based approach&lt;/strong&gt; helps overcome the context window limitations of the LLM by retrieving &lt;strong&gt;only the relevant code&lt;/strong&gt; from large repositories. Since it avoids reading all files for every request, it also offers &lt;strong&gt;cost-saving benefits in a local environment.&lt;/strong&gt;&lt;/p&gt;
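&lt;p&gt;The retrieval mechanics can be sketched without any external dependencies. In the toy example below, plain bag-of-words token counts stand in for real learned embeddings (which a system like this would obtain from an embedding model), and hypothetical indexed snippets — the file paths and summaries are invented for illustration — are ranked by cosine similarity:&lt;/p&gt;

```python
# Minimal sketch of the retrieval step: embed documents and a query as
# vectors, rank by cosine similarity, return the top match as context.
# Real systems use learned embeddings; token counts stand in here so
# the example stays dependency-free.
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase token counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical indexed snippets (file path -> content summary).
corpus = {
    "config/loader.py": "load yaml configuration file for the service",
    "net/client.py": "http client with retry and timeout handling",
    "db/session.py": "database session pooling and transactions",
}
index = {path: embed(text) for path, text in corpus.items()}

def retrieve(query, k=2):
    """Return the k most semantically similar snippet paths."""
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, index[p]), reverse=True)
    return ranked[:k]

print(retrieve("where is the configuration for this service loaded", k=1))
# ['config/loader.py']
```

&lt;p&gt;The query never mentions a file name, yet the configuration loader surfaces first — the same mechanism that lets an agent ask "find the configuration related to this function" and receive only the relevant snippets as context.&lt;/p&gt;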

&lt;h3&gt;
  
  
  Demo: GraphLoom Workflow
&lt;/h3&gt;

&lt;p&gt;I named this software gloom. Below are the steps for indexing the &lt;em&gt;Local Context Bank&lt;/em&gt; project locally, showing how an agent uses it through MCP.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initialization&lt;/strong&gt;: Set up the GraphLoom environment and local vector DB.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0jbz0l08us3ntwophnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0jbz0l08us3ntwophnp.png" alt="graphloom init" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Creation&lt;/strong&gt;: Specify the project directory for analysis.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlfhfgyxktmbxtz1jn39.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlfhfgyxktmbxtz1jn39.png" alt="graphloom workspace" width="800" height="51"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Indexing &amp;amp; Graph Build&lt;/strong&gt;: Parse code and documents to build the knowledge graph and vector DB.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouqtsxqodw6dsfu6lnvx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouqtsxqodw6dsfu6lnvx.png" alt="graphloom indexing 1" width="800" height="128"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6noxepzakx6twwun3l9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6noxepzakx6twwun3l9.png" alt="graphloom indexing 2" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Impact&lt;/strong&gt;: You can query the nodes related to a specific code change through the CLI.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl44tpollg9axsrzpoelx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl44tpollg9axsrzpoelx.png" alt="graphloom impact" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;MCP integration:&lt;/strong&gt; The Agent queries the context using natural language.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pg32v2kbiw6zjbvekwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pg32v2kbiw6zjbvekwg.png" alt="graphloom mcp" width="800" height="991"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;
&lt;strong&gt;Results&lt;/strong&gt;: Agents can utilize the provided output as context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczbrptu5jvbub7hrt6h2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczbrptu5jvbub7hrt6h2.png" alt="graphloom result" width="800" height="997"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Expanding the Potential of Coding Agents
&lt;/h3&gt;

&lt;p&gt;I built this software for my own use, and it has noticeably cut the time needed to provide context to coding agents. Developers who work with coding agents will increasingly need context-management capabilities in their engineering practice. If a commercial service isn't an option, implementing something similar yourself can still improve an agent's efficiency.&lt;/p&gt;

&lt;p&gt;The ultimate goal of &lt;em&gt;(Local) Context Bank&lt;/em&gt; is to reduce the time developers spend manually managing context and encourage agents to generate high-quality code that considers side effects.&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>contextengineering</category>
      <category>notimetocode</category>
    </item>
    <item>
      <title>Context Management with Kiro (Part 1)</title>
      <dc:creator>elbanic</dc:creator>
      <pubDate>Tue, 18 Nov 2025 06:35:00 +0000</pubDate>
      <link>https://dev.to/elbanic/context-management-with-kiro-part-1-21hc</link>
      <guid>https://dev.to/elbanic/context-management-with-kiro-part-1-21hc</guid>
      <description>&lt;h3&gt;
  
  
  The Weight of Context for Humans and AI Coding Assistants
&lt;/h3&gt;

&lt;p&gt;With the advancement of AI, coding assistants have become deeply embedded in everyday development. But let's momentarily set aside AI and consider how vital &lt;strong&gt;context&lt;/strong&gt; is when we work with people: if a colleague abruptly asked you about a single line of code, with no shared background from earlier discussions or the project's history, you'd feel just as lost.&lt;/p&gt;

&lt;p&gt;Just as collaboration is impossible without context, &lt;strong&gt;context is everything in collaboration with an AI coding agent.&lt;/strong&gt; The agent must accurately understand the project's goals, architecture, and existing codebase to serve as a truly "smart" partner.&lt;/p&gt;

&lt;p&gt;I've been using spec-driven development tools like &lt;strong&gt;Kiro (or VSCode + Cline)&lt;/strong&gt; for my personal coding projects. While these tools offer incredible productivity, the process of &lt;strong&gt;constantly prompting and providing context&lt;/strong&gt; for better code quality led to the following pain points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Repetitive Work&lt;/strong&gt;: Having to re-enter the project's &lt;strong&gt;relevant context&lt;/strong&gt; every time I start a new session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Limits&lt;/strong&gt;: Issues where the agent's &lt;strong&gt;token limit&lt;/strong&gt; was hit during long sessions, causing previous conversation history to be cut off.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this article, I will explain how I overcame these inefficiencies and introduce my method for &lt;strong&gt;systematically managing context&lt;/strong&gt; for the AI coding assistant throughout my projects.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Automation Strategy for Recording and Utilizing 'Memory'
&lt;/h3&gt;

&lt;p&gt;To solve the recurring problem of context entry, I devised a method to &lt;strong&gt;automatically select and include highly relevant context.&lt;/strong&gt; This involves leveraging the &lt;em&gt;Kiro Steering&lt;/em&gt; or &lt;em&gt;Cline Rule&lt;/em&gt; features.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;About Kiro Steering and Cline Rules&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kiro Steering&lt;/strong&gt;: This is a core feature of Kiro, the AWS-provided, agent-based IDE. It's a system that permanently guides the AI agent's behavior by providing continuous project knowledge (e.g., coding conventions, architectural decisions) stored as Markdown files in the &lt;code&gt;.kiro/steering/&lt;/code&gt; directory. This grants the agent an Institutional Memory of the project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cline Rule&lt;/strong&gt;: A feature provided by the VSCode extension Cline, which offers system-level guidelines or preferences. These are typically stored as Markdown files in the &lt;code&gt;.clinerules/&lt;/code&gt; directory (or a file) at the project root, delivering project-wide context and rules to the agent to ensure consistent behavior.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Practical Application: 'Memory Management' Instructions in the Steering File
&lt;/h3&gt;

&lt;p&gt;I leveraged these features to give the agent specific &lt;strong&gt;Instructions&lt;/strong&gt; demanding &lt;strong&gt;continuous context management&lt;/strong&gt; for the project. In a real Kiro project, the &lt;code&gt;.kiro/steering/implementation.md&lt;/code&gt; file specifies the following rules. These instructions are key to ensuring the agent records new implementations and is mandated to reference past records before starting the next task.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upon Implementation Completion (Mandatory Recording)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MUST&lt;/strong&gt;: Write an implementation summary or task completion status as Markdown and save it to the &lt;code&gt;.kiro_context/&lt;/code&gt; folder. (Purpose: Record implementation history)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SHOULD&lt;/strong&gt;: Write design decisions or technical specifications as Markdown and save them to the &lt;code&gt;.design_decision/&lt;/code&gt; folder. (Purpose: Record design decisions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Before Starting Work (Mandatory Reference)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MUST&lt;/strong&gt;: Read the &lt;code&gt;.kiro_context/*.md&lt;/code&gt; files and understand the entire context. (Purpose: Reference previous implementation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SHOULD&lt;/strong&gt;: Refer to the &lt;code&gt;.design_decision/*.md&lt;/code&gt; files to grasp the design intent. (Purpose: Understand design intent)&lt;/li&gt;
&lt;/ul&gt;
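&lt;p&gt;Put together, these rules could be expressed in &lt;code&gt;.kiro/steering/implementation.md&lt;/code&gt; roughly as follows — a paraphrased sketch, since the post doesn't show the file's exact wording:&lt;/p&gt;

```markdown
# Implementation Workflow Rules

## Upon Implementation Completion
- MUST: Save an implementation summary or task completion status as
  Markdown in `.kiro_context/` (purpose: record implementation history).
- SHOULD: Save design decisions or technical specifications as
  Markdown in `.design_decision/` (purpose: record design decisions).

## Before Starting Work
- MUST: Read the `.kiro_context/*.md` files and understand the entire
  context (purpose: reference previous implementations).
- SHOULD: Refer to the `.design_decision/*.md` files to grasp the
  design intent (purpose: understand design intent).
```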




&lt;h3&gt;
  
  
  Context Recording Example in the GraphLoom Project
&lt;/h3&gt;

&lt;p&gt;Through these instructions, the agent is instructed to summarize the implementation after completing each task and save it to the &lt;code&gt;.kiro_context&lt;/code&gt; folder. If you check the GraphLoom repository and &lt;code&gt;.kiro_context&lt;/code&gt;, you can see the files the agent is actually recording:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.kiro_context/
├── task_1_implementation_summary.md
├── task_2_implementation_summary.md
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consequently, when the agent needs to reference previously implemented code, there is no need for me to explain the details repeatedly. The agent can &lt;strong&gt;understand the project's current status and implementation history&lt;/strong&gt; simply by reading the summary files it recorded itself.&lt;/p&gt;

&lt;p&gt;Interestingly, even when these files are included in the Rule or Steering instructions, the agent doesn't always read every file. The agent appears to &lt;strong&gt;make its own relevance judgment&lt;/strong&gt; on the current task and selectively reads only the most necessary context files. Therefore, &lt;strong&gt;writing clear and intuitive titles and content&lt;/strong&gt; for the summary files is a crucial factor in helping the agent with its 'selective knowledge utilization.'&lt;/p&gt;




&lt;h3&gt;
  
  
  Faster, More Accurate Development
&lt;/h3&gt;

&lt;p&gt;By establishing this system for context management, the time spent inputting context during development has been drastically reduced. The greatest advantages are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent Perspective&lt;/strong&gt;: The agent maintains &lt;strong&gt;continuous project memory&lt;/strong&gt; without worrying about token limits, and it performs subsequent tasks &lt;strong&gt;faster and more accurately&lt;/strong&gt; because there's no need for repetitive prompting of detailed previous work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Perspective&lt;/strong&gt;: The context files provide &lt;strong&gt;automated project documentation&lt;/strong&gt;, allowing both the agent and me to quickly verify past implementations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately, this method is more than just a developer convenience; it's a core strategy that advances AI agent collaboration from &lt;strong&gt;'one-time conversations'&lt;/strong&gt; to &lt;strong&gt;'sustainable knowledge sharing.'&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/elbanic/context-bridging-the-ai-agents-blind-spot-38d6?"&gt;next post&lt;/a&gt;, I'll introduce GraphLoom, a tool I personally developed for context management, and provide specific examples of its use.&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>contextengineering</category>
      <category>notimetocode</category>
    </item>
  </channel>
</rss>
