DEV Community: Gaurav Gaur

Gen AI Isn't Solving the Problem Most Development Teams Actually Have

Gaurav Gaur — Tue, 16 Jun 2026 11:19:44 +0000

It was an afternoon when one of our reconciliation flows started throwing NullPointerExceptions in production. The fix, once we found it, was two lines. Finding those two lines took nearly six hours. Three engineers and endless log grepping. Tracing through an integration application with JSF UI that predates most of the libraries we take for granted today. No modern APIs exposed. No clean service boundary to isolate the problem. Just a chain of legacy integration points that required someone to hold the full mental map of the system in their head to understand what was breaking where.

Six months into using Gen AI extensively in software delivery, I keep coming back to those production bugs. They still feel like the norm, not the exception.

The Conference Demo vs Your Monday Morning

The keynote demos are genuinely impressive. The presenter opens a laptop, talks to an AI assistant, and a fully functional web service materialises. The crowd cheers. It looks like the future.

And it is — for some problems. But most of the development work I do, the work I actually live in, does not look like spinning up a new service. It looks like debugging a reconciliation failure buried inside a legacy integration platform. It looks like a cron batch job that failed silently and left no useful trace. It looks like a data transformation bug hidden four systems deep in a component last documented sometime around 2018 by somebody who no longer works for the organisation. There are still systems working with XMLs.

The demos are not wrong. They just solve a problem that I rarely have.

The Productivity Paradox

After six months with Gen AI tools, I have landed on something that feels uncomfortably true:

Developers were never slow at writing code. They were slow at figuring out **where* to write it.*

Think about the last bug you fixed. How long did the actual code change take once you understood the problem? Five minutes? Ten? The hours were spent elsewhere. Reading unfamiliar code. Tracing execution paths. Building a mental model of the system. Validating assumptions. Eliminating false leads. The bottleneck was understanding - cognitive overhead. Not typing.

Gen AI has undeniably accelerated code generation. Boilerplate code, unit tests, documentation drafts, simple feature implementations — all of these can now be produced significantly faster than before. I've seen feature development accelerate in my own team. But the dominant bottleneck in most enterprise systems was never code generation. It was comprehension. The hidden dependencies. The opaque patterns that only become visible after somebody has spent enough time on a codebase to develop intuition about it. That part stayed exactly where it was.

We got dramatically faster at the part that was already manageable.

A Real Example: When Context Lives Outside the Code

Recently, I investigated a reconciliation issue where transactions appeared incorrect on the UI. The AI was genuinely useful at first. It quickly highlighted suspicious code paths within the application and suggested plausible root causes.

None of them were entirely correct. After several hours we eventually traced the issue through an integration database, a legacy transformation component, and an upstream system that communicated through this legacy integration layer. The component predated modern API standards and I could not write any MCP or connect to its database in any way.

The final code change took minutes. Understanding the path the data had taken through the system consumed most of the day. The interesting lesson was not that the AI failed. The interesting lesson was that the AI only had visibility into part of the problem. The knowledge required to solve the issue lived across multiple systems, integration layers, historical design decisions, and several teams.
The real challenge here was reconstructing context.

Gen AI Amplifies What Already Exists

One thing surprised me during the last six months. Gen AI amplifies whatever it touches. Not just the good parts.

If your applications are well documented, integration contracts are clear, and operational practices are mature, AI can deliver meaningful productivity gains. The model has enough context to reason effectively.

If your environment contains technical debt, unclear ownership boundaries, fragile deployment processes, or institutional knowledge locked inside a handful of experienced engineers, AI surfaces those weaknesses just as effectively. I've seen AI-generated changes pass every unit test and then fail immediately when they hit an integration layer. The problem wasn't the generated code. The problem was that the integration contract existed only in somebody's head. That issue existed long before Gen AI arrived. AI simply made it impossible to ignore.

Gen AI has become a forcing function for problems that were always there. Pipeline complexity, team silos, missing runbooks — they did not appear with AI. In many ways, Gen AI acts like an amplifier. Good engineering practices become more valuable. Weak engineering practices become more visible. If you were already carrying process debt, Gen AI lands in that environment and makes the debt harder to defer.

This observation aligns closely with findings from the DORA research programme. The report has increasingly highlighted that AI's impact is heavily influenced by the surrounding engineering system rather than the tooling itself.

Maybe We've Been Measuring The Wrong Thing

Most AI productivity discussions focus on output.

Lines of code written.

Features completed.

Pull requests merged.

Like someone was telling me that they have written a service while they were on a train journey. But enterprise software rarely fails because developers could not type quickly enough. It fails because knowledge is fragmented. Teams cannot trace dependencies. Integration contracts are undocumented. Critical operational knowledge exists only inside someone's head.

If Gen AI is teaching us anything, it may be that software engineering productivity has always been a knowledge-management problem disguised as a coding problem. The more I work with AI, the more I suspect that this distinction matters.

The Real Unlock: Context, Not Code

I have been trying to articulate where productivity genuinely improves. The best answer I have today is simple:

Productivity is fewer hours spent chasing failures you cannot explain.

The contained problems — bugs where the scope is clear and the relevant context is available — are noticeably faster to solve. Describe the issue, provide the relevant code and prompt right. The AI often gets you surprisingly close to the answer.

The difficult problems are different. A failure appears in one system but originates three hops upstream. The relevant context is spread across source code, operational dashboards, integration platforms, databases, support tickets, and undocumented assumptions accumulated over years. AI cannot reason about information it cannot access.
The real opportunity is not generating more code. It is about It is helping developers navigate complexity and reducing the search space. Surfacing hidden patterns. Making system behaviour more understandable.

That is fundamentally a context problem. Not a coding problem.

The Bottleneck Simply Moves

Another pattern I've observed is that AI often shifts bottlenecks rather than eliminating them. Feature development that previously took several days may now take hours.

But deployment approvals, compliance reviews, security assessments, testing processes, release windows, and governance controls have not accelerated at the same pace. The queue simply moves downstream. Code gets generated faster. Organisational throughput does not necessarily improve.

In highly regulated environments, the distance between "code written" and "code deployed" often remains the dominant constraint. If anything, AI has made that gap more visible. The development teams can move faster. The surrounding delivery system frequently cannot.

It is not uncommon that releases are moved because of end of month activities or business peak activities.

The Next Enterprise Problem: Agent Sprawl

Another challenge is already emerging. For years, organisations struggled with tool sprawl. Now we are creating agent sprawl.

One team uses GitHub Copilot. Another uses Claude and someone using cursor. Someone else is experimenting with MCP servers, vector databases, and workflow orchestration platforms.

Teams are building custom internal agents to help the platform. Every agent develops its own context boundary, permissions model, knowledge source, and operational behaviour. Over time this begins to look familiar. We have seen similar patterns before with microservices, cloud platforms, and integration technologies. Multiple solutions for same problem with slight variations.

Without standardisation, governance, and ownership, complexity grows faster than value. Access to AI is no longer the limiting factor. Integration and consistency increasingly are.

What This Means For Engineering Teams

The AI story is becoming less about individual tools and more about organisational architecture. The teams that will benefit most from Gen AI are unlikely to be the teams with the most sophisticated prompts.

They will be the teams that can feed AI enough and have invested in:

Strong observability
Well-defined integration contracts
Discoverable architectural knowledge
Consistent engineering practices

In other words, the foundations that good engineering organisations have always needed.

The uncomfortable reality is that AI cannot compensate for missing context. It can only amplify whatever context already exists. And I do not have a clean answer for how you achieve that in a large organisation. I am not sure anyone does yet.

Final Thoughts

I still believe Gen AI represents a genuine shift in software engineering. I am faster than I was a year ago. My team is faster than it was a year ago. Many activities that once felt repetitive now take a fraction of the time.That is real progress.

But I still think about that production issue. Six hours searching for two lines of code. Three engineers on a call. A JSF application none of us originally designed. A chain of integration points that only became understandable after piecing together knowledge from multiple systems and multiple people.

Gen AI did not change that day. It could not. The dominant problem was never code generation. The problem was that the knowledge required to understand the system was fragmented and largely invisible.

The biggest productivity gains over the next few years may not come from better models. They may come from making systems more legible. Better observability. Clearer integration contracts. Architectural decisions that live somewhere other than inside somebody's head. Gen AI did not create the need for those things. It simply made it much harder to pretend they were optional.

Have you seen a similar pattern in your organisation? Where AI works brilliantly for contained problems but struggles at integration boundaries? I'd be interested to hear what has worked — and what hasn't — in your teams.

References

The 4 Signals That Actually Predict Production Failures - Part 2

Gaurav Gaur — Wed, 22 Apr 2026 06:24:58 +0000

A practical guide

In the first part, I covered the two initial signals to diagnose that something is wrong:

Latency
Traffic

Those two alone explain a surprising number of production incidents. But they don't explain everything. Rising latency tells you a problem is developing. Traffic tells you what the system is dealing with.

I mentioned two more signals:

Errors
Saturation

These two tell you something more important - whether the system is approaching failure. And this is where monitoring becomes truly operational. I will cover those two signals in this blog. Let us start with Errors.

Errors - The most misunderstood signal

Many teams think error monitoring is simple. It is about counting failures. Raise an alert when they increase. In practice, error metrics are rarely that straightforward.

The first mistake teams make is treating all errors as equal. They are not. Some errors are expected and some errors are harmless. Others indicate an outage in progress. Monitoring must differ between them. Otherwise alerts become noise. And noisy alerts get ignored, which defeats the entire purpose. I have seen production systems where engineers simply muted error alerts because they fired every few hours.

Error rate is more important than error count

Raw error counts are misleading. What do you think - ten errors per minute might be catastrophic or irrelevant?

It depends on traffic. If you process:

100 requests per minute → 10 errors = disaster
100,000 requests per minute → 10 errors = background noise

Error rate is what matters. A simple production alert looks like this:

It means alert when:

Error rate > 2%

This works far better than static thresholds because it scales automatically with traffic.

4xx vs 5xx - Critical distinction

One of the most common monitoring mistakes is combining 4xx and 5xx errors. They represent completely different problems. Let me talk through them.

5xx errors

These indicate system failures:

Exceptions
Timeouts
Dependency failures
Resource exhaustion

5xx errors should almost always trigger alerts. They mean the system is failing users.

4xx errors

These usually indicate client behaviour:

Invalid input
Authentication failures
Missing resources

Most of the time, 4xx errors should not page engineers. But they should still be monitored. Their spikes often reveal integration problems.

Partner systems misbehaving
Clients sending unexpected requests
Sometimes bots discovering your APIs

I once saw a system where 40% of traffic suddenly became 401 responses. Nothing was broken in my service. A client service had deployed a change with an incorrect token configuration. The service was healthy. The integration was not. Without separate 4xx monitoring we would never have noticed.

Error budget thinking

Once services mature, error monitoring becomes less about incidents and more about error budgets.

Instead of asking "Did we have errors?" You ask “Did we exceed acceptable failure levels?”
Example SLO:

99.9% success rate

That allows:

0.1% failure

Error budgets prevent overreaction to minor fluctuations. Without them, teams end up firefighting dashboards instead of protecting user experience. In most post-mortems, latency and errors are symptoms. Saturation is usually the cause. Let us move to the next indicator – saturation.

Saturation — Where failures actually begin

If latency is the early warning signal, saturation is the root cause signal. Most production outages start with a resource limit somewhere. I am not necessarily talking about CPU or memory. I am talking about less obvious resources like thread pools, connection pools, queue consumers, file descriptors, and rate limits.

These limits quietly fill up until requests start waiting and then timing out. Then they start failing. By the time error rates increase, saturation has usually been happening for a while.

CPU and Memory - Necessary but not enough

Infrastructure metrics still matter. They just don't tell the whole story.
Monitor:

CPU utilization
Memory usage
Disk I/O
Network throughput

Example:
rate(container\_cpu\_usage\_seconds\_total[1m])

and:
container\_memory\_usage\_bytes

The Metrics that break systems most often

As I mentioned in my previous blog, you need effective metrics. In this section I will list a few metrics that can prove useful.

Connection pool usage.

Monitor connection pool usage. When a connection pool fills up - requests queue internally, latency increases, timeouts appear, and errors follow.
In this scenario CPU can still be 30%. Memory can still be healthy. The service still looks "green." Except users are waiting seconds for responses.

Example — Monitoring a connection pool

Micrometer automatically exposes Hikari metrics:

hikaricp_connections_active

hikaricp_connections_idle

hikaricp_connections_pending

The critical one is:

hikaricp_connections_pending

If pending connections increase steadily, saturation is approaching and action is needed.

Kubernetes saturation signals

Container platforms introduce new saturation points. An important metric to monitor is:

kube_pod_container_status_restarts_total

Restarts indicate instability.

And:

container_cpu_cfs_throttled_seconds_total

CPU throttling causes latency spikes even when CPU usage looks normal. That one surprises a lot of teams.

Dependency Metrics — The missing visibility layer

Most services are only as reliable as their dependencies – databases, caches, APIs, queues, and third-party integrations.

When dependencies slow down, your service slows down. But if you only monitor your service, you won't see the cause. You only see the symptoms. Dependency metrics close that gap. Without them, incident investigations turn into guesswork.

Downstream Latency Metrics

Every external call should have a latency metric. Even if the dependency is "reliable." Especially then.

Simple example:

Timer.Sample sample = Timer.start(registry);

Response response = paymentClient.process(request);

sample.stop(
 registry.timer("payment.api.latency")
);

During incidents, this metric often points directly at the problem.

Dependency error metrics

Track dependency failures separately.

Example:

payment_api_errors_total

This helps answer:

Are we failing… or is the dependency failing?

That distinction saves time during incidents.

Database metrics — Where many incidents begin

Databases rarely fail suddenly. They slowly degrade. I have seen these follow a pattern. First queries take slightly longer. Then pools begin filling. Then request latency increases. Then timeouts appear.

The progression is almost always the same. Which means the signals are predictable.

Query latency

Slow queries often trigger cascading failures.

Track:

db_query_duration_seconds

Watch percentiles and not averages. The same rule applies as service latency.

Connection pool usage

Database pools deserve dedicated dashboards.

Track:

db_connections_active

db_connections_idle

Pool exhaustion is a classic outage pattern.

Lock contention

Lock waits produce unpredictable latency spikes, especially under load.

Important metrics include:

Lock wait time
Deadlocks
Blocked queries

These metrics explain incidents that otherwise look random.

Queue metrics — The early warning

Event-driven systems fail differently and have a different pattern. Instead of request latency increasing, queues begin filling. Messages accumulate silently. Until delays become visible. Queue metrics often detect issues earlier than service metrics.

Queue Depth

Example metric:

messages_available

If depth increases steadily, it means something is wrong.

Either:

Producers too fast
Consumers too slow
Dependencies degraded

Queue depth is one of the most reliable early warning signals in distributed systems.

Consumer Lag

For streaming systems, lag is critical.

Example:

kafka_consumer_lag

Lag increasing means consumers cannot keep up. Eventually processing delays impact users.

Pattern worth recognizing

After enough incidents you start recognizing patterns. One of the most common looks like this:

Dependency latency increases
Connection pools fill
Request latency increases
Queues grow
Errors appear

When you see that progression on dashboards, you already know the story before investigation begins. Good monitoring turns incidents into recognizable shapes. And recognizable shapes reduce stress during outages.

Experienced engineers eventually learn that most outages are not mysterious. They follow patterns. Because uncertainty is what makes incidents difficult.

Not complexity.

I hope you find these useful, I will continue the discussion in the final blog of this series.

The 4 Signals That Actually Predict Production Failures - Part 1

Gaurav Gaur — Tue, 24 Mar 2026 13:52:23 +0000

Monitoring is not a dashboard only problem

In the last couple of years, I have moved across a few product teams. Every time I walk into an engineering team and ask how monitoring works. I get a standard response.

There is a dashboard

Usually, Grafana. Sometimes Kibana or Splunk. Team members have passed me links buried in wiki pages and nobody updates them anymore.

But dashboards don't make a system observable. I have worked on large scale cloud projects. All of them have dashboards but still struggle to answer basic questions during an incident.

What can fail?

The answer to this question can change how you think about monitoring. To me, monitoring is a system thinking problem. Any cloud solution that I work on is not a single application anymore. With cloud solutions, I am talking about API Gateways, event streams, microservices, containers, managed cloud services, and external integrations.

Something is always lagging somewhere, and all users are going to tell you is "The system is slow".

It might mean that either an API timing out or queue is backing up, or a container restarting or network path degraded.

After years of working on enterprise solutions, I can tell you that bugs do not cause major production incidents. They are often caused by lack of visibility, meaning missing the right metrics.

In this article, I try to put down some of the monitoring metrics that I find useful. These can help you diagnose incidents and keep the system reliable.

Monitoring vs Observability – Why it matters?

I can say that monitoring used to mean one thing:

Is the server alive?

If CPU was below 80% and memory was below 70%, everything looked fine.

Modern systems are not the same. You can have a healthy-looking infrastructure, a normal CPU utilisation, a healthy memory graph and can still have a production outage. I have seen services returning timeouts for hours while dashboards showed everything green. That can happen as traditional monitoring focuses on resources while failures can occur during interactions between services.

This is where observability comes in. Monitoring answers: "Is something wrong?" and observability answers: "Why is it wrong?" You need both.

What all you need?

In practice, you need three things. Metrics are used to detect problem, logs to explain errors and traces to discover latency. If your metrics are wrong, you would never know something is failing. And if you don't know something is failing, you never check logs and traces. Which is why metrics are the entry point of any investigation.

Typical monitoring challenges

Most of the time, teams don't have strategies for monitoring. It is the last backlog item to be picked up before the final production release. One service team adds a dashboard, another adds alerts and a third team introduces a different naming convention.

Six months down the line, you get duplicate metrics and inconsistent naming. There are no standard dashboards and alerts that nobody trusts. Eventually, teams ignore alerts, stop relying on monitoring and fall back to guesswork. That is a dangerous place to be.

One pattern I have seen repeatedly is metric explosion without clarity. A service exposes 400 metrics, and nobody knows which one matters.

Good monitoring is not about collecting more metrics. It is about collecting the right metrics. A production-ready service rarely needs more than 10–20 core metrics and a small number of critical alerts. Everything else is investigation detail. Not operational signal.

Four signals that every service needs

I recommend every service must expose below four signals. I sometimes refer to them as minimum survival metrics.

Latency
Traffic
Errors
Saturation

They can help you diagnose most production incidents. Let us discuss them.

Latency – Earliest signal

Latency is usually the earliest signal that something is going wrong. Long before error rates increase… Long before services fail… Latency begins to drift upward.

Slow systems become broken systems eventually. If you only monitor errors, you will detect problems late. If you monitor latency, you can catch problems early.

Why averages are false indicators?

Average latency is one of the most misleading metrics in production systems, however, many teams are still monitoring it. Consider a scenario where 990 requests complete in 40ms and 10 requests take 4 seconds. It means an average latency of about 80 ms. But a user experiencing the slow requests would strongly disagree.

Observing percentiles can solve this problem. You should track all below:

p50 for a typical user experience
p95 for degraded experience, and
p99 for worst-case experience

If p99 starts moving, it is the first clue during the incidents. Even if averages look perfect. Below is a code snippet to produce percentile metrics.

@RestController
public class OrderController {
    private final Timer orderTimer;

    public OrderController(MeterRegistry registry) {
        this.orderTimer =
            registry.timer("order.api.latency");
    }

    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable String id) {
        return orderTimer.record(() -> {
            return orderService.getOrder(id);
        });
    }
}

What good latency dashboards look like?

A good latency dashboard should answer one question instantly

Are users experiencing slow responses?

That means, you are looking for percentiles and not averages with clear trends. If you need five minutes to interpret a graph, the dashboard is not working.

Traffic — System load

Traffic on your services tells you what the system is dealing with. Without traffic metrics, you cannot interpret latency or errors correctly. An error spike during a traffic spike means something very different from an error spike during normal load.

Traffic metrics include requests per second or events per second or messages per second or also batch rates. Most incidents begin with a traffic change. Sometimes expected and sometimes not.

A common pattern that I have always observed – Traffic increases and that increases latency. Integrations slow down and errors appear. Without traffic metrics, the root cause looks mysterious. With traffic metrics, it becomes obvious.

Prometheus query example

Requests per second:

rate(http_server_requests_seconds_count[1m])

This metric alone explains a surprising number of incidents.

Production incident

I once observed a latency spike that looked like a system failure. When I checked, CPU and memory looked normal, pods were healthy, error rates were low, and everything looked fine except traffic.

Traffic had doubled within ten minutes because of a partner integration retry loop. Nothing was technically broken, but the system was overloaded. Without traffic metrics, I would have spent hours investigating. That is the difference good monitoring makes.

I will continue this discussion in my next blog post.

Vibe Coding - Conversational Software Development - Part 3 Prompt Discipline

Gaurav Gaur — Mon, 11 Aug 2025 14:51:54 +0000

Introduction

It is speculated that vibe coding could fundamentally change how we build software. Instead of writing lines of code, we will describe our goals in plain English and working software will be generated in response.

In my last post, I experimented with a few Vibe Coding tools and shared my hands-on experience. If you have been following closely, you might have noticed something subtle but important that I am using natural language as an interface. The words I choose shape how AI interprets and builds software. And behind that process lies a critical yet often overlooked layer of the system prompt.

So, What Exactly is a Prompt?

Think of Vibe Coding as a chat-driven engineering environment. Every message you send, or the prompt is not just casual conversation. It is more like writing an operating manual for your AI assistant.
A prompt sets the ground rules. It can define your preferred tech stack, coding style, naming conventions, or whether the AI should ask for clarification before making assumptions. In other words, it’s your interface for aligning the AI with your intent.

System prompt

Why Prompts Matter

From my experience, if the prompt is not clear or consistent, things can quickly go off track. Here are just a few issues I ran into when prompts were vague:

The AI picked the wrong programming language.
It introduced unfamiliar and sometimes unnecessary libraries.
It ignored earlier context and gave contradictory results.

Even with advanced tools like ChatGPT, Claude, or Cursor, vagueness in instructions can lead to unpredictable behaviour. It is not about the quality of the model rather it is about the clarity of the direction we give it.

Configure System Prompts

A benefit of using most of the modern AI platforms is that they allow users to define system level prompts. You can define the prompts either globally (across entire workspace) or local (for every project). This helps maintain consistency and avoid repeating the context over and over.

I now make it a habit to start every coding session by explicitly setting the system prompt. It is like configuring your dev environment but in a conversational format.

Designing an Effective Prompt

I am still learning as I go, but I want to share a sample prompt that has worked well for me. The idea is to set clear constraints right from the start. It gives the AI less room for misinterpretation and reduces friction during the session.
Here is a sample system prompt I often use:

Consider yourself a frontend developer.

All UI components should use React and Tailwind CSS.

Use JavaScript only and avoid any external libraries unless specified.

Ask for clarification if any requirement is unclear.

Focus on clean and modular code.

This prompt does a few important things:

It defines the role of the assistant (a frontend developer).
It sets technology boundaries—no Python, TypeScript, or surprise libraries.
It encourages the AI to ask questions if something is ambiguous.

You can easily extend this prompt to add more context depending on your project needs. For example:

All UI components must be accessible.
Ensure mobile responsiveness.
The backend is built on Java-based APIs.

This initial alignment streamlines development by cutting down on time consumption and limiting AI interactions. You can direct AI to focus according to your development approach and technology choices.

Another observation is that AI assistants demonstrate higher effectiveness when working with commonly used frameworks and tools such as React, Tailwind, and Node. These models have seen far more examples of those technologies, which means you’ll get more reliable and relevant responses.

New Developers: Don't Overthink It

If you have been following along, the discussion so far might make it feel like you need to master a dozen concepts before you even begin with Vibe Coding. But that is not true.
If you are just getting started, my advice is to set a few clear boundaries and get going. Let us take an example to create an interactive dashboard. Here is an example of a prompt that works well to start with:

I am a new developer. I want to build an interactive data dashboard.
Can you suggest a tech stack that is easy to maintain and well-supported?

Most AI assistants like ChatGPT, Claude, and Gemini will then help you through your upcoming steps. The assistants will pose clarifying questions about your requirements which allows them to develop both your tech stack and system prompt.

Tools That Help You Craft Better Prompts

As I continue experimenting, I have come to realize how important the right prompt is. And the good news? You don't have to guess. Tools like:

can help you test, iterate, and refine your prompts in real time. Below is an example of how I refined one of mine using Google Gemini.

My Prompt Refinement Process

I started with this base prompt to explore London Air Quality Data.

I want to build an interactive dashboard for exploring data located at London Air Quality Data (Excel).

Please ask me questions to better understand the requirements.

Also then help me write a system prompt that guides you to use React, Tailwind, and D3 for visualization.

If London Air Quality Data is not available at above link, I have committed a copy of it, as CSV, at the time writing this post.

Gemini asked a few smart follow-up questions about:

what kind of visualizations I had in mind,
the type of interactivity I expected,
the layout preferences, and
how I planned to handle the data source.

To narrow the scope, I directed the focus to only one sheet:

Only evaluate the ‘Monthly Averages’ sheet. Assume everything else

Based on that, Gemini helped me generate a refined system prompt that I could use to generate my dashboard. Below is the output prompt

### System Prompt:

You are an expert web development assistant tasked with building an interactive data exploration dashboard. 
Your goal is to generate a detailed plan, including component structure and initial code snippets, for a React application that visualizes air quality monthly averages.

__Project Objective__
Create an interactive dashboard to explore monthly average air quality data for London, sourced from a specific Excel file.

### Core Technologies:
- Frontend Framework: React
- Styling Framework: Tailwind CSS
- Visualization Library: D3.js

### Data Source:
- __URL__: https://data.london.gov.uk/download/290a22f1-5eef-4801-8286-3f288b49093a/acce7f88-70f0-4fd0-9160-f02a9d96b2c3/air-quality-london.xlsx
- Specific Sheet to Use: "monthly averages"

### Data Interpretation & Preparation:
1. Download and parse the Excel file from the provided URL.
2. Focus exclusively on the data within the "monthly averages" sheet.
3. Identify key columns for visualization:
  - Date (or equivalent column representing the month/year for aggregation)
  - Site Name (or equivalent for location/monitoring station)
  - Pollutant (or equivalent for the type of pollutant, e.g., NO2, PM2.5)
  - Value (or equivalent for the monthly average concentration)
4. Data cleaning and transformation should include:
  - Converting date strings/numbers to valid Date objects.
  - Handling potential missing or invalid data points gracefully (e.g., excluding them from calculations or using interpolation if appropriate for visualization context, but primarily focusing on showing available data).

### Dashboard Functionality & Interactivity:
1. Data Loading: Efficiently load and parse the "monthly averages" data from the Excel file upon component mount. Use an appropriate library like xlsx (SheetJS) for Excel parsing.
2. Filters/Controls:
  - Pollutant Type Filter: Allow users to select one or more pollutant types (e.g., a multi-select dropdown or checkboxes).
  - Monitoring Station Filter: Allow users to select one or more specific monitoring stations/sites.
  - Year Filter: Provide a way to filter data by year (e.g., a dropdown for available years or a range slider if multiple years are selected).
3. Visualizations (D3.js):
  - Time Series Line Chart: The primary visualization should be a line chart showing the monthly average trend over time for selected pollutants and stations.
    -Each line should represent a combination of pollutant and station (or allow choosing a single pollutant to compare multiple stations, or vice versa).
    - Axes should be clearly labeled (e.g., "Date" on X-axis, "Concentration" on Y-axis).
  - Interactive Tooltips: On hover, display detailed information for specific data points (e.g., date, pollutant, station, exact monthly average value).
4. User Interface (React & Tailwind CSS):
  - Clean, responsive, and intuitive dashboard layout.
  - Filter controls should be easily accessible (e.g., in a sidebar or header).
  - The main visualization area should be prominent.
  - Use Tailwind CSS for all styling, ensuring a modern and clean aesthetic.

### Output Requirements:
1. A high-level React component structure (e.g., App.js, Dashboard.js, Chart.js, FilterControls.js).
2. Pseudo-code or a conceptual outline for data fetching and initial parsing of the Excel sheet.
3. An example of how a D3 line chart component might receive data and render, focusing on the integration with React and general D3 patterns.
4. Considerations for managing application state (e.g., selected filters, loaded data).

Final Thoughts

System prompts require continuous refinement as they need to match the evolving needs of your project just like we rewrite code to enhance its clarity and maintainability. Your prompts should evolve to match your project's development to reflect:

new tools or stack changes,
updated coding patterns or style guides,
and shifts in architecture or design decisions.

A good prompt is more than basic instructions to your assistant. You can consider it as a design contract between you and your AI assistant. In my next and final post I will continue to discuss how to further fine tune the prompts.

My suggestion for now is not to stress too much about getting it perfect the first time. Start simple, iterate, and treat your prompt like part of the engineering process where your intent meets implementation.

[Boost]

Gaurav Gaur — Sun, 20 Jul 2025 09:04:26 +0000

Gaurav Gaur

Jul 19 '25

Vibe Coding - Conversational Software Development - Part 2 In Practice

#programming #ai #vibecoding #productivity

5 min read

Vibe Coding - Conversational Software Development - Part 2 In Practice

Gaurav Gaur — Sat, 19 Jul 2025 22:15:00 +0000

Introduction

In my previous blog post, I introduced the concept of Vibe Coding. It is one of the new ways that is attracting even non-programmers. Users can describe their thoughts using natural language and AI tools would convert that into a working application. Spotting this opportunity, I thought I should experiment and understand what that actually looks like in action. I took this opportunity to test out a few tools and see how they really impact my workflow.

It is not just about automating tasks; it is about changing our behaviour on how to approach a problem. To me, it feels like a declarative approach, especially when you are navigating a new framework or language for the first time.

Smarter Coding in the IDE: GitHub Copilot

I first started with the most common tool that is gaining popularity in the corporate world. It is GitHub Copilot. I have been using it regularly in VS Code, and while it is not exactly magic, it is undoubtedly helpful. When I am deep into the code and I use it for quick assistance with things like scaffolding code, writing tests, or debugging tricky edge cases. It saves me time context switching to browser. You can interact with Copilot right in your code or you can open a chat window to explain your issue.

Copilot now has an “agent mode,” which allows you to give broader instructions that can span across multiple files at feature level. It is not flawless and can sometimes come up with generic solutions. But most of the time it helps to cut down time spent on the boilerplate code.

The best part of copilot is that it is embedded in the editor, which means it is just a click away whenever I need it. Also, it is context-aware, it often makes suggestions that fits with existing application style and architecture.

Terminal-First Development: Claude Code

There are developers who love working in the terminal. I would suggest tools like Claude Code or OpenCodeX CLI for them. These are worth checking out. Although it works from the terminal, but it understands your entire codebase and can implement changes based on natural language commands.

I recently tried out Claude Code for a side project. From the CLI, I was able to set up a basic project, refactor functions, and even tidy up some old scripts. It is not a traditional simple text-based interface, and I recommend it as a must have in your toolkit.

Anthropic has done a good job with Claude Code documentation and working examples. It is worth checking out if you are curious.

AI in the Browser: Replit

The other tool I tried was Replit. It has latest AI features in browser and target developers above beginner level. I provided it below prompt to generate me a Trello like dashboard to manage agile teams.

Generate me an app something like Trello where I can track my tasks from to-do, in-progress and done. It should have a board to move tickets and ticket analysis capabilities

What really impressed me was how collaborative the experience is—you provide a prompt, and it guides you through the development process step by step. It feels almost like working alongside a junior developer who has done their homework. You are not just getting code; you are getting a plan, followed by clean, organized output. And if something goes wrong, Replit adapts and tries again. It really helps you navigate the development journey, generating and even debugging as it goes.

This tool is especially convenient for generating a quick prototype. While it might not be the best fit for highly complex systems, it shines when you need to get something up and running in a browser environment. Below is a sample application that it generated for my above prompt. It has done a decent job and has added features like tags for tickets, included date on every ticket and even presenting in-progress bar per ticket.

A Growing Ecosystem

These are not the only available options in the market. The app store for AI-driven development is rapidly filling up, and each tool has its own feature set and niche.

Lovable is a great option for working with UIs and interfaces that you can tweak visually. It is another non-IDE alternative that lets you can interact with simple natural language text prompts. You can describe what you want, and it updates the interface accordingly. It also supports backend connections, integrations, and multi-user collaboration.
Bolt.new is another available option for full stack apps and frameworks like Next.js or Svelte, and mobile via Expo. I think Bolt is outstanding if you are a beginner. The design outputs are sometimes a bit better than what I was expecting.
Another similar tool is V0 by Vercel. It is more developer focussed online tool. It is built by the team behind Next.js and it supports modern web technologies out of the box.

The tool to consider and adopt really depends on your problem statement.

Final Thoughts

I think AI tools are enhancing our programming capabilities. If I have to pick one of those, I will pick that seamlessly blends into the background, providing just the right amount of assistance without being intrusive. Tools are only valuable if they help you build faster and improves overall experience. I will continue the discussion and explore System Prompts in part 3 of this series.

Vibe Coding - Conversational Software Development - Part 1 Introduction

Gaurav Gaur — Fri, 04 Jul 2025 13:49:24 +0000

Introduction

Since I started coding, I have seen developer communities strive to make programming more human-readable—almost like writing in English or a preferred language. Many modern languages introduced syntactic sugar to make code more intuitive and conversational. These efforts have made significant advancements, but now, we are witnessing something far more transformative.

Natural language can now be translated directly into functional software. The concept is widely referred as Vibe Coding. It is an AI first approach for rapid software development. Let me try to explain the idea with the help of a step-by-step diagram that I have added below. As the picture shows, you put down your thoughts or overall idea as a prompt. You direct what step you want to achieve or what is your end goal. The chat-based AI works on your prompt and comes up with a generated code. You preview the output of the code and can fine-tune it further. Once you are happy, you put that code into your server.

End-to-end experience with chat-based AI development

The term Vibe Coding was popularized in early 2025 by researcher Andrej Karpathy, known for his work in Open API and Tesla. He defines Vibe Coding as a process of describing goals in natural language, letting advanced models handle implementation, and iterating quickly without worrying too much about the underlying code. In short - you describe, and large language models generate and refine. ¹

My Exploration of Vibe Coding

I recently started to explore this new notion of coding. While the definition makes it look straightforward, its applications vary depending on what you want to achieve and the tools you use. So far, I have seen it manifesting in several forms based on what you want to get out of it. You can perform things like:

Rapid Prototyping – This is my favourite use case. It can be used for a quick UI mock-up or lightweight programs to validate a hypothesis.
Automating Repetitive Tasks – Developers constantly look for ways to reduce toil.
Experimentation – You can ask how a library works, generate example usage, or compare multiple implementations—all in minutes.
Problem Solving and Debugging – Something, I use for debugging obscure issues. Although, I am not convinced that it is saving time.

Shifting Focus from Syntax to Problem Solving

In my opinion, Vibe Coding shifts your focus from typing every bracket or declaring every variable to problem solving. You concentrate on delivering features instead of obsessing over implementation details.

You can use AI tools to bounce off ideas, look for alternative implementations, and race through your development. But be cautious, at the end of the day, it is still your code. You are responsible for verifying it and test it and making it production ready. You must critically review the code whether all edge cases are handled and make any tweaks, if needed.

AI Driven Tools

As I mentioned earlier that it does not matter whether you are a new developer or a seasoned programmer - you can leverage Vibe coding to your advantage. Just like you would hire the right candidate for a specific role, you need to use the right AI platform for the right use case.

In this series of blogs, I intend to explore several available options and categorize them based on how and where they can be used effectively.

Easy Entry: Chat Based Builders

Let us start with the most accessible option that is chat-based builders. If you are new to all this, the easiest way to get started is using conversational AI tools. All you need is a well-structured prompt.
These tools gained traction with popularity of Anthropic’s Claude artefact. It lets you build live dashboard, tools and interfaces directly in chat window.

Try prompting:

Create a dashboard for analysing social media posts performance using most relevant visualisation from D3.

Claude will generate a live preview that you can tweak, inspect, and even publish. Below is a screenshot of what it generated for me.

Claude generated Social Media Analytics dashboard

Tools like ChatGPT, Google Gemini and Microsoft Copilot offer similar features. You can generate fully functional code with simple natural language prompts.

Try something fun like

Build a simple recipe idea generator. I want to input a few ingredients I have, and it suggests a random recipe. I should be able to save recipes I like and add my own custom recipes

You will get a working prototype that you can experiment with or customize further. Here is another screenshot—this one is from ChatGPT.

ChatGPT generated Recipe Ideas UI

You can download this auto generated code and share it wider. You can execute the same prompt on Google Gemini and Microsoft Copilot to compare the results, and may share in the comments.

What Is Next

In part 2 of the series, I will continue this discussion and add some of my experiences of working with these tools.

References

Wikipedia. "Vibe Coding." Modified May 10, 2025 ↩

Scatter Gather Pattern: Designing High-Performing Distributed Systems

Gaurav Gaur — Wed, 22 Mar 2023 13:51:46 +0000

Introduction

In this blog post, I will explore the Scatter-Gather Pattern, a cloud scalability design that can process large amounts of data and perform time-consuming, intricate computations. The pattern works like a guide for creating distributed systems to achieve parallelism. The approach can significantly reduce processing times for an online application to cater to as many users. The idea is to break down an operation into independent tasks to achieve approximately constant response time.

What is Scatter Gather Pattern

Let us take an example to understand the problem where the pattern can be effective. Suppose an application that runs on a single-core processor. It can process an incoming request in 64 seconds to produce a result. If we migrate the same application to a 16-core processor, it can generate the same output in roughly four seconds. The multicore processor will spawn sixteen threads to work in parallel to compute the result, with a few extra microseconds for managing multiple processing threads. A four-second response time is good but still considered slugging for a web application. Upgrading the processor further, verticle scaling, will mitigate the problem, not solve it.

The Scatter-Gather pattern can solve this problem. The structure of the Scatter-Gather is like a tree, with the application controller acting as the root. When the controller receives an incoming request, it divides it into multiple independent tasks and assigns them to available leaf nodes, the “scatter” phase. Leaf nodes are multiple machines or containers available on a distributed network. Each processor leaf works independently and in parallel on its computation, producing a portion of the response. Once they have completed their task, each leaf sends its partial response back to the root, which collects them all as part of the “gather” phase. The root controller then combines the partial responses and returns the final result to the client. The diagram below pictures the pattern.

Scatter-Gather pattern illustration

The strategy enables us to exploit cloud infrastructure to allocate as many virtual machines as needed and scale horizontally. The application can distribute the load dynamically to achieve the desired response time. If a particular leaf node has a high response time (e.g., due to noisy neighbors), the root controller can also redistribute the load for a quicker response. Let us examine a use case in the passage below.

Use Case

Now that we understand the pattern, let's explore a practical use case where it can demonstrate remarkable effectiveness despite its apparent simplicity. Imagine a website of a rail company that sells tickets to its customers. Suppose a customer plans a weekend trip from London and wants to optimize expenses by identifying how far he can travel with the same budget. He submits a query to the website to list all direct trains departing from London throughout the country.

One approach to handle this task is to send individual queries to each rail operator in the UK, collect the results, and then present them to the user. However, this will be impractical, as the application issues a query at a time. To improve the efficiency of this solution, we can divide the UK into five regions: Eastern, Northwest and Central, Wales and Western, Scotland, and Southern. When a request arrives, the root controller divides it and assigns it to the five leaf nodes to process in parallel. Each leaf node concentrates on a specific region, which reduces the need to search the entire country. Each leaf node returns a list of all the direct trains from London to that region. Finally, the root controller merges all the results to create a final list to display to the customer.

Implementation Details

The previous section has demonstrated a practical use case for improving the scalability of a web application. However, it is essential to consider the following points before adopting this tactic:

1. Handling non-responsive leaf nodes

Even though the applications are using a cloud environment, there is always a possibility that machines may become unavailable due to network or infrastructure issues. Moreover, the leaf nodes might communicate with external APIs (as we saw in the example use case previously) that may not respond promptly. To mitigate these issues, the root controller must set an upper limit on response time, how long it can wait for a reply. If a leaf node does not respond within that time, the root controller can ignore it and aggregate the results it has already gathered to formulate a response for the user. This approach can minimize the impact on the user by sacrificing a small portion of the result.

2. Decoupling the root controller from processing leaf nodes

It's crucial to avoid coupling the root controller with the leaf nodes to improve the architecture. A tight coupling between them would mean that the root controller continuously tracks working leaf nodes, leading to unnecessary overhead on the application. Instead, we can use a message broker. A pub-sub communication style can create a loosely coupled root controller and leaf nodes. Whenever there is an incoming request, the root controller broadcasts messages to the potential working leaf nodes. All the working nodes can then subscribe to the incoming message, process it, and publish their results to a separate queue. The root controller can consume the results from that queue, aggregate them, and respond to the user. The figure below illustrates the implementation.

Scatter-Gather pattern implementation for decoupled root controller

3. Asynchronous request processing

The pattern implementation discussed so far works best for immediate responses, ranging from a few milliseconds to a second. However, this implementation would not work for computation-intensive tasks, which can consume minutes to hours. Some examples of these computational tasks include data-intensive applications such as scientific simulations, data analytics, or machine learning. To effectively tackle these scenarios, it is essential to divide the root controller into two distinct components: a dispatcher and a processor. This division would enable us to process requests asynchronously. Below are the steps for such asynchronous communication.

When a request comes in, the dispatcher accepts the request and generates a unique identifier. The dispatcher replies to the user with an estimated time to finish the task and provides the previously generated unique identifier to track the request.
The request is then published to the leaf nodes.
The leaf nodes process the request and send their results to a queue subscribed by the processor.
The processor service combines the results received from individual processor leaves and persists them into a database.
Once the estimated time has elapsed, the user can provide the unique identifier to fetch the result. If the result is unavailable within the estimated time, the processor can either request the user to check later or provide a partial result.

The figure shows a step-by-step asynchronous implementation.

Scatter-Gather pattern asynchronous implementation

Important Considerations – Number of Leaf Nodes

After seeing the implementation in the previous section, the next crucial step is to choose the optimum number of leaf nodes. The pattern seems to suggest that adding more leaf processors and increasing parallelization can improve efficiency, but it comes at the cost of managing overhead.

When a request comes in, the root controller must divide it, send it to the distribution infrastructure, and so on. While the cost of splitting, reading, and writing from the network is negligible at first compared to the computation logic, this cost grows as we scale leaf nodes. If parallelization continues, the overhead may exceed the computation cost.
The next consideration is that the root controller waits for the result from all leaf nodes, which means the overall response time depends on the slowest node. It is commonly known as the "straggler" problem, and it can considerably delay the system's completion time due to a single slow node. To mitigate the straggler problem, established techniques such as speculative execution and dynamic partitioning can distribute the load evenly.
The most favourable number of leaf nodes also depends on the availability and accuracy of service. Suppose a service wants to achieve 97% availability without sacrificing the result, and the cloud vendor environment provides a leaf node at 99% availability. In this case, choosing scatter-gather seems to be the right choice. However, a careful calculation reveals that the service can have at most three processing leaves. Adding a fourth processing leaf might lead to an SLA breach, due to the probability of node failure (0.99 × 0.99 × 0.99 × 0.99 == 0.96). It gets even worse if we add more leaves.

Therefore, choosing the number of leaf nodes is critical for designing a successful scatter-gather implementation.

Summary

The scatter-gather pattern is a distributed computing pattern. The pattern has several advantages such as improved performance, increased fault tolerance, and better scalability. However, the choice of the optimal number of processing leaf nodes is vital to achieving high performance. Overall, the scatter-gather pattern is a powerful tool for developing high-performing distributed applications.

Originally published on https://www.gaurgaurav.com/patterns/scatter-gather/

Can I Code Without My Laptop?

Gaurav Gaur — Wed, 08 Mar 2023 13:32:26 +0000

Learning Adaptability

A few weeks ago, my laptop crashed during a meeting. It was painful as I was about to start on an exciting new feature that my Product Owner (PO) had just proposed. I immediately rushed to the IT department for assistance, and they informed me that they needed to take a backup and completely rebuild my laptop. They estimated that rebuilding would take slightly over a half day to complete.

Feeling frustrated, I asked myself: "Can I code without my laptop?". In the past, I would have answered 'NO' without hesitation. But on second thought, I realized that I know my system well and am also familiar with the domain. After more introspection, I recognized that I was already doing it without consciously realizing it. So, I went to my PO and requested him to print the new requirements for me.

From Requirements to Success Criteria

There are numerous factors that a software engineer must consider before writing even a single line of code. First and foremost is understanding the business problem and who the actors are. Once you have a clear understanding of the requirements, it enables you to identify any flaws in the requirements or if it contradicts any existing features. You can then break it down into manageable pieces and think about how to reuse those pieces or determine if something already exists. This process helps you to define the final success criteria.

Strategies and Considerations

Once you understand the problem landscape, the next step is to start thinking about solutions and strategies. Consider whether the operation will be compute-intensive or data-intensive. Can you offload any work to the client to reduce the server load? Are there any successful known solutions, patterns, or techniques that can be utilized? Should the results be cached, and if so, where? Additionally, consider breaking down the feature and delegating work to other team members to work in parallel.

In addition to the technical considerations, it's crucial to consider the nature of the application you're developing. Is it a web application or an API? You must also consider who will consume the service - web clients, iOS, Android, B2B, and B2C clients. Additionally, you need to focus on contracts and communication channels, both of which are crucial components.

Beyond Development

Have we discussed security yet? Are you planning to place this new feature behind already available authentication flows and firewalls? Have you considered data in transit and data at rest? Are there any alerting mechanisms in place in case anything goes wrong?

So far, everything looks perfect. You know the problem, the strategy to develop the solution, the consuming client, deployment, and security. Once the feature is ready, it will deploy to production. However, have you considered what happens when it's in production? Do you need any configuration changes? Are there any potential obstacles to scaling the application? How will you address any performance issues that arise in production? Is there a prebuilt dashboard for reporting the critical metrics of the new feature?

Producing Robust and Maintainable Code

Ahh, we have listed almost everything, and now it's time to start writing code. But, before you begin, you should consider the testability and maintainability of your code. You first think about your test before the code. Test Driven Development (TDD) should guide your development process to build a robust feature. This technique can help you solve problems more effectively and efficiently.

Now that you have completed the necessary groundwork, it's time to translate your well-thought-out heuristic into code using your preferred programming language and claim the story points.

Journey to Deliver High-Quality

Junior developers may not have developed the ability to think that comprehensively yet, while seasoned software engineers know what it takes to deliver a component. And as you progress to be a tech lead, you will perform similar tasks, but for many components, simultaneously. Although it may seem like a lot of hard work, once you start doing it, it will become more intuitive. The only tedious part of this entire process is actually typing the code and committing it, for which I need my laptop.

Originally published on https://www.gaurgaurav.com/2023/random/can-i-code-without-my-laptop/

Pipes And Filters Pattern - Streamlining Data Processing in Distributed Systems

Gaurav Gaur — Mon, 06 Mar 2023 14:09:18 +0000

Introduction

Applications today collect an infinite amount of data. Many applications need to transform this data before applying any meaningful business logic. Tackling this complex data or a similar processor intensive task without a thought-through strategy can have a high-performance impact. This article introduces a scalability pattern – pipes and filters – that promotes reusability and is appropriate for such scenarios.

Problem Context

Consider a scenario where incoming data triggers a sequence of processing steps, where each step brings data closer to the desired output state. The origin of data is referred to as a data source. Examples of data sources could be home IoT devices, a video feed from roadside cameras, or continuous inventory updates from warehouses. The processing steps during the transformation usually execute a specific operation and are referred to as filters. These processing steps are independent and do not have a side-effect, i.e., running a step does not depend on any other steps. Each filtering step reads the data, performs a transforming operation based on local data and produces an output. Once the data has gone through the filters, it reaches its final processed stage where it is consumed, referred to as Data Sink.
A straightforward implementation could be a complete service that takes the data input, performs all the steps sequentially and produce an output. The modules within the service perform the required step and pass on the data to the next module. Although the solution initially looks good as it hides all the complexity of data processing, the use of such a monolithic service will have listed problems:

The solution limits code reuse.
Any change in a processing filter step will lead to a release of all the filters.
The slowest processing filter step can become a bottleneck impacting the overall throughput of the service.
As the solution scales, it will scale all the processing steps. Such scaling will lead to excessive resource utilisation in the cloud when not intended.

Thus, the above approach is inflexible, non-scalable and against reusability{:target="_blank"}. A good solution must address all the above concerns. Let us try to find a solution to this in the next section.

Solution

We can improve the suboptimal solution from the previous section by splitting the monolithic service into a series of components or functions, each performing a single processing filter step. These functions are combined to form a pipeline where functions receive a standard input and produce an accepted output. Such a decomposition will introduce loose coupling into the solution. It would allow effortless removal, replacement, rearrangement, or plug-in of a new filter step in pipelines. Moreover, it will enable code reuse if we need a similar processing filter in any other pipeline, the existing filters can be shared.

The solution also addresses the issue of the slowest processing filter step being a bottleneck, as it provides an opportunity to scale individual functions. The slowest running function can run parallel instances to spread the load and improve throughput. The individual scaling of components will also save us from any extra costs.

Finally, the processing filters can take advantage of cloud infrastructure. For instance, we can request a specific virtual machine based on the behavior of a filter - processor-intensive or memory-intensive.

When to use this pattern

As we now have a better knowledge of the pattern, this section explores when to think about the pattern.

The pattern is valuable for breaking down complex processing workflows, where each step is independent.
The processing steps have distinctive hardware needs or scalability requirements, and, in some cases, even preferred programming language requirements.
The pattern can address flexibility concerns in workflows where requirements and business processes are changing constantly.
The pattern is efficient in removing bottlenecks in workflows by parallel and distributed processing of individual steps.

Important Considerations

The pattern has many advantages but consider the below points before adopting this pattern.

The increased flexibility and clear separation come at the expense of complexity, especially when individual filter steps are too granular. Moreover, transferring data and communication between filters is an overhead.
There is always a risk of losing the messages between the filters; thus, a thought-through strategy is needed to mitigate these scenarios.
Like above, there is also a need for a recovery strategy if a pipeline fails. Can a new message be injected into the pipeline, or is there a provision to save the state of the pipeline?
Each filter step must be stateless and must have sufficient context. Since these filter steps operate in isolation, each filter must be provided with enough inputs to work.

Implementation Details

Multiple implementations of the pattern are possible based on the nature of the solution.

Dynamic Pipeline Behavior

Data-Driven Strategy:
It is also known as a push strategy. The data source writes a message into the pipeline. Filters read the incoming message, process it, and push it further. Finally, the message reaches the data sink, where it is consumed.

Demand-Driven Strategy:
A data-driven strategy pushes the message through the pipeline, while in a demand-driven strategy, a message is pulled. Whenever the data sink initiates a read, the data source produces a new message. The filters then process the new message, and the final transformed result is sent to the data sink in response to the original call.

Message Behavior

During the discussions until now, we notice that the message travels through the pipeline, typically via message brokers. However, we can skip sending the entire data through the pipeline. An alternative could be to store the data at a temporary location on a distributed file system or in a preferred database. Each filter step can pass the storage location of the data from where it can be read.

The implementation reduces the data travelling through the pipe. Moreover, the latest state of the data is always persisted in case the pipeline fails. But the disadvantage of the implementation is that it increases I/O operations, which increases the chances of exceptions.

Filter Behavior

The number of filters should be developed carefully in a pipeline. While designing filters, it is wise to break down a large filter into multiple constituent steps to minimize the complexity. On the other hand, filters that are light on processing can be combined into a single step to reduce the overhead. Filter steps introduced in a pipeline should be good enough to maximize concurrency, but small enough to minimize the overhead of transferring data through the pipeline stages.

Sample Use case

Previous sections have made clear how to use this pattern. This section provides an example to explain pattern usage. Let us say we are collecting a continuous stream of data for a machine learning project. Since the data is collected from multiple sources, the raw data could be in different formats, incomplete, or dirty. Raw data must be cleaned and structured before conducting any analysis on it. Data cleaning is a tedious and time-consuming process but is a necessary step to get a quality result. The data cleaning process consists of six steps, specifically

Remove duplicates,
remove irrelevant data,
fix structural errors,
filter unwanted outliers,
handle missing data, and finally
validate your data.

Although we can write a service that hides the complexity of data cleaning behind, it would have the same demerits as were listed earlier. It would be worth creating a pipeline for the data cleaning, as a couple of steps in the above pipeline may need specialized processing and hardware. Furthermore, there are multiple off-the-shelf tools available that can be used in the pipeline.

Summary

In the end, I would say that pattern helps create scalable applications. The pattern resembles a water supply system, where a stream of water flows through the channel via various filters. The number of filters installed on the supply channel is based on the final usage of water. For example, water supplied to a household will pass through more filters than water supplied for irrigation.

Originally published on https://www.gaurgaurav.com/patterns/pipes-and-filters/

Load Balancing Pattern: A Must For Scalability

Gaurav Gaur — Tue, 28 Feb 2023 10:47:14 +0000

Introduction

Any modern website on the internet today receives thousands of hits, if not millions. Without any scalability strategy, the website is either going to crash or significantly degrade in performance. A situation we want to avoid. As a known fact, adding more powerful hardware or scaling vertically will only delay the problem. However, adding multiple servers or scaling horizontally, without a well-thought-through approach, may not reap the benefits to their full extent.

The recipe for creating a highly scalable system in any domain is to use proven software architecture patterns. Software architecture patterns enable us to create cost-effective systems that can handle billions of requests and petabytes of data. The article describes the most basic and popular scalability pattern known as Load Balancing. The concept of Load Balancing is essential for any developer building a high-volume traffic site in the cloud. The article first introduces the Load balancer, then discusses the type of Load balancers, next is load balancing in the cloud, followed by Open-source options and finally a few pointers to choose load balancers.

What is a Load Balancer?

A load balancer is a traffic manager that distributes incoming client requests across all servers that can process them. The pattern helps us realize the full potential of cloud computing by minimizing the request processing time and maximizing capacity utilization. The traffic manager dispatches the request only to the available servers, and hence, the pattern works well with scalable cloud systems. Whenever a new server is added to the group, the load balancer starts dispatching requests to it and scales up. On the contrary, if a server goes down, the dispatcher redirects requests to other available servers in the group and scales down, which helps us save money.

Types of Load Balancers

After getting the basics of Load Balancer, the next is to familiarize with the load balancing algorithms. There are broadly 2 types of load balancing algorithms.

Static Load Balancers

Static load balancers distribute the incoming traffic equally as per the algorithms.

Round Robin is the most fundamental and default algorithm to perform load balancing. It distributes the traffic sequentially to a list of servers in a group. The algorithm assumes that the application is stateless and each request from the client can be handled in isolation. Whenever a new request comes in it goes to the next available server in the sequence. As the algorithm is basic, it is not suited for most cases.
Weighted Round Robin is a variant of round robin where administrators can assign weightage to servers. A server with a higher capacity will receive more traffic than others. The algorithm can address the scenario where a group has servers of varying capacities.
Sticky Session also known as the Session Affinity algorithm is best suited when all the requests from a client need to be served by a specific server. The algorithm works by identifying the requests coming in from a particular client. The client can be identified either by using the cookies or by the IP address. The algorithm is more efficient in terms of data, memory and using cache but can degrade heavily if a server start getting stuck with excessively long sessions. Moreover, if a server goes down, the session data will be lost.
IP Hash is another way to route the requests to the same server. The algorithm uses the IP address of the client as a hashing key and dispatches the request based on the key. Another variant of this algorithm uses the request URL to determine the hash key.

Dynamic Load Balancers

Dynamic load balancers, as the name suggests, consider the current state of each server and dispatch incoming requests accordingly.

Least Connection dispatches the traffic to the server with the fewest number of connections. The assumption is that all the servers are equal and the server having a minimum number of connections would have the maximum resources available.
Weighted Least Connection is another variant of least connection. It provides an ability for an administrator to assign weightage to servers with higher capacity so that requests can be distributed based on the capacity.
Least Response Time considers the response time along with the number of connections. The requests are dispatched to the server with the fewest connections and minimum average response time. The principle is to ensure the best service to the client.
Adaptive or Resource-based dispatches the load and makes decisions based on the resources i.e., CPU and memory available on the server. A dedicated program or agent runs on each server that measures the available resources on a server. The load balancer queries the agent to decide and allocate the incoming request.

Load Balancing in Cloud

A successful cloud strategy is to use load balancers with Auto Scaling. Typically, cloud applications are monitored for network traffic, memory consumption and CPU utilization. These metrics and trends can help define the scaling policies to add or remove the application instances dynamically. A load balancer in the cloud considers the dynamic resizing and dispatches the traffic based on available servers. The section below describes a few of the popularly known solutions in the cloud:

AWS - Elastic Load Balancing (ELB)

Amazon ELB is highly available and scalable load balancing solution. It is ideal for applications running in AWS. Below are 4 different choices of Amazon ELB to pick from:

Application Load Balancer used for load balancing of HTTP and HTTPS traffic.
Network Load Balancer is used for load balancing both TCP, UDP and TLS traffic.
Gateway Load Balancer is used to deploy, scale, and manage third-party virtual appliances.
Classic Load Balancer is used for load balancing across multiple EC2 instances.

GCP – Cloud Load Balancing

Google Cloud Load Balancing is a highly performant and scalable offering from Google. It can support up to 1 million+ queries per second. It can be divided into 2 major categories i.e., internal, and external. Each major category is further classified based on the incoming traffic. Below are a few load balancer types.

A complete guide to compare all the available load balancers can be found on the Google load balancer page.

Microsoft Azure Load Balancer

Microsoft Azure load balancing solution provides 3 different types of load balancers:

Standard Load Balancer - Public and internal Layer 4 load balancer
Gateway Load Balancer - High performance and high availability load balancer for third-party Network Virtual Appliances.
Basic Load Balancer - Ideal for small scale application

Open-Source Load Balancing Solution

Although a default choice is always to use the vendor specific cloud load balancer, there are a few open-source load balancer options available. Below is a couple of those.

NGINX

NGINX provides NGINX Plus and NGINX, modern load balancing solutions. There are many popular websites including Dropbox, Netflix and Zynga, that are using load balancers from NGINX. The NGINX load balancing solutions are high performance and can help improve the efficiency and reliability of a high traffic website.

Cloudflare

Cloudflare is another popular load balancing solution. It offers different tiers of load balancer to meet specific customer needs. Pricing plans are based on the services, health checks and security provided.

Zero Trust platform plans
Websites & application services plans
Developer platform plans
Enterprise plan
Network services

Choosing Load Balancer

It is evident from the sections above that a load balancer can have a big impact on the applications. Thus, picking up the right solution is essential. Below are a few considerations to make the decision.

Identifying the short term and long-term goals of a business can help drive the decision. The business requirements should help identify the expected traffic, growth regions and region of the service. Business considerations should also include the level of availability, the necessity of encryptions or any other security concerns that need to be addressed.
There are ample options available in the market. Identifying the necessary features for the application can help pick the right solution. As an example, the load balancer should be able to handle the incoming traffic for the application such as HTTP/HTTPS or SSL or TCP. Another example is a load balancer used for internal traffic has different security concerns than external load balancers.
Cloud vendors provide various support tiers and pricing plans. A detailed comparison of the total cost of ownership, features and support tier can help identify the right choice for the project.

Most experts agree that it is a best practice to use a load balancer to manage the traffic which is critical to cloud applications. With the use of a load balancer, applications can serve the requests better and also save costs.