DEV Community: Vivek Alhat

The Builder Design Pattern

Vivek Alhat — Sat, 29 Mar 2025 19:13:00 +0000

When creating objects in programming, sometimes we need to set many optional values. If we use constructors for this, it can get messy quickly and cause constructor hell. The builder pattern helps us solve this problem by providing a step-by-step way to create objects.

Why Use the Builder Pattern?

Consider a User class where some properties (like age and email) are optional. Without the Builder Pattern, you might need multiple constructors or pass undefined for missing values. This can make code hard to read.

The Builder Pattern solves this by:

Making object creation more readable.
Avoiding unnecessary or confusing constructor arguments.

Example: Creating a User Object with a Builder

Let's take a look at how the Builder Pattern works in TypeScript.

Step 1: Define the `User` Class

class User {
  constructor(
    public name: string,
    public age?: number,
    public email?: string
  ) {}
}

This User class has three properties: name, age, and email. Only name is required.

Step 2: Create a `UserBuilder` Class

interface IUserBuilder {
  setName(name: string): this;
  setAge(age: number): this;
  setEmail(email: string): this;
  build(): User;
}

class UserBuilder implements IUserBuilder {
  private user: Partial<User> = {};

  setName(name: string): this {
    this.user.name = name;
    return this;
  }

  setAge(age: number): this {
    this.user.age = age;
    return this;
  }

  setEmail(email: string): this {
    this.user.email = email;
    return this;
  }

  build(): User {
    if (!this.user.name) {
      throw new Error("User must have a name");
    }
    return new User(this.user.name, this.user.age, this.user.email);
  }
}

Step 3: Using the `UserBuilder`

const user = new UserBuilder()
  .setName("John Doe")
  .setAge(35)
  .setEmail("johndoe@email.com")
  .build();

console.log(user);

Output:

User { name: 'John Doe', age: 35, email: 'johndoe@email.com' }

If we try to create a user without a name, we get an error:

const newUser = new UserBuilder().setAge(70).build(); // Error: User must have a name

Benefits of Using the Builder Pattern

✅ Improves readability – Each property is set explicitly.
✅ Reduces constructor overloads – No need to pass undefined for optional parameters.
✅ Flexible object creation – Callers set only the properties they need, and new optional properties can be added without changing existing call sites.

When to Use the Builder Pattern

Use the Builder Pattern when:

You have a class with many optional properties.
You want to make object creation more readable.
You want to enforce required fields before creating the object.
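The last point, enforcing required fields at build time, can also be sketched in Python. This is a minimal illustration rather than the implementation from the repository; the `User` dataclass and the `UserBuilder` method names are assumptions that simply mirror the TypeScript example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    age: Optional[int] = None
    email: Optional[str] = None


class UserBuilder:
    """Collects values step by step and validates them in build()."""

    def __init__(self) -> None:
        self._name: Optional[str] = None
        self._age: Optional[int] = None
        self._email: Optional[str] = None

    def set_name(self, name: str) -> "UserBuilder":
        self._name = name
        return self  # returning self enables method chaining

    def set_age(self, age: int) -> "UserBuilder":
        self._age = age
        return self

    def set_email(self, email: str) -> "UserBuilder":
        self._email = email
        return self

    def build(self) -> User:
        # The required field is checked once, right before construction.
        if not self._name:
            raise ValueError("User must have a name")
        return User(self._name, self._age, self._email)


user = UserBuilder().set_name("John Doe").set_age(35).build()
print(user)
```

Calling `UserBuilder().set_age(70).build()` raises a `ValueError`, because no name was set.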

Conclusion

The Builder Pattern is a simple yet powerful way to create objects, especially when dealing with optional parameters. It makes your code cleaner, easier to understand, and less error-prone.

I am implementing design patterns in Go, Python, and TypeScript. You can find the repository here.

Happy coding! 🚀

The Singleton Design Pattern

Vivek Alhat — Fri, 07 Mar 2025 09:23:58 +0000

Design patterns are essential for writing clean, scalable, and maintainable code. One of the most commonly used design patterns is the Singleton Pattern. In this article, we’ll explore what the Singleton pattern is, why it’s useful, and how to implement it with an example.

What is the Singleton Design Pattern?

The Singleton pattern is a creational design pattern that ensures a class has only one instance and provides a global point of access to that instance. This pattern is useful when you need to control access to shared resources, such as logging mechanisms, configuration settings, or database connections.

Characteristics of the Singleton Pattern

Single Instance: Ensures that only one instance of the class exists.
Global Access Point: Provides a way to access the single instance.
Lazy Initialization: The instance is created only when it is first needed.
Thread Safety (Optional): In multi-threaded environments, precautions should be taken to avoid multiple instances.

How to Implement the Singleton Pattern

Let's implement a Logger class using the Singleton pattern in TypeScript:

class Logger {
  // private static instance variable
  private static instance: Logger;

  // private constructor to prevent external initialization
  private constructor() {}

  // static method to get the instance
  public static getInstance(): Logger {
    if (!Logger.instance) {
      console.log("Creating new instance");
      Logger.instance = new Logger();
    }
    return Logger.instance;
  }
}

// Usage
const logger1 = Logger.getInstance();
const logger2 = Logger.getInstance();

if (logger1 === logger2) {
  console.log("Both are the same loggers");
}

Explanation:

Private Constructor: Prevents external instantiation.
Static Instance Variable: Stores the single instance.
Static Method (getInstance): Ensures only one instance is created and provides global access.
Lazy Initialization: The instance is created only when first accessed.
Ensuring Uniqueness: Checking if logger1 === logger2 confirms that both variables reference the same instance.

Use Cases of Singleton Pattern

The Singleton pattern is useful in scenarios where multiple instances could lead to inconsistencies or unnecessary resource usage. Some common use cases include:

Logging: A single instance ensures consistent logging throughout the application.
Configuration Management: Prevents multiple conflicting configuration objects.
Database Connection Pooling: Avoids multiple redundant connections to the database.

Pros and Cons of Singleton Pattern

Pros:

✔ It ensures a single point of access to a resource.
✔ It saves memory by preventing multiple unnecessary instances.
✔ It is useful for managing shared states like caching and logging.

Cons:

✖ It can introduce global state, which makes debugging and testing harder.
✖ It is not inherently thread-safe, so it needs additional handling in multi-threaded environments.
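To make the thread-safety point concrete, here is a minimal Python sketch of a singleton guarded by a lock, using double-checked locking. It is an illustration under assumptions, not the code from the repository: the `Logger` name mirrors the TypeScript example, while `_lock` and `worker` are hypothetical names introduced for the demo.

```python
import threading


class Logger:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Fast path: skip the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock: another thread may have
                # created the instance while this one was waiting.
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance


instances = []


def worker():
    # Each thread asks for a Logger; all should receive the same object.
    instances.append(Logger())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(obj is instances[0] for obj in instances))
```

The second check inside the lock is what prevents two threads that both passed the first check from each creating an instance.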

Conclusion

The Singleton pattern is a powerful and commonly used design pattern that ensures only one instance of a class exists.

I am implementing design patterns in Go, Python, and TypeScript. You can find the repository here.

Happy coding! 🚀

Understanding Design Patterns: A Beginner's Guide

Vivek Alhat — Tue, 04 Feb 2025 17:15:44 +0000

If you've worked on a software project that became difficult to manage as it grew, you've likely encountered design challenges that could have been solved with a structured approach. This is where design patterns come in. They provide reusable solutions to common problems in software development, improving code maintainability, readability, and scalability.

In this article, we’ll explore what design patterns are, why they are useful, their different types, and their pros and cons. This article will serve as a foundation for the upcoming articles in this series, where we will dive deeper into specific design patterns.

What Are Design Patterns?

Design patterns are proven solutions to recurring design problems in software development. They are not specific implementations but rather templates that can be used to solve particular problems in different contexts.

Why Use Design Patterns?

Software development is often about solving problems efficiently. Instead of reinventing the wheel every time, developers can leverage design patterns to:

Improve Code Reusability – Patterns provide well-established solutions that can be adapted across different projects.
Enhance Maintainability – Well-structured code is easier to read, modify, and debug.
Reduce Development Time – With predefined solutions, developers can focus on implementation rather than figuring out the architecture from scratch.

Types of Design Patterns

Design patterns are categorized into three main types:

1. Creational Patterns

These patterns deal with object creation mechanisms, improving flexibility and reusability. Some common creational patterns include:

Singleton – Ensures only one instance of a class exists.
Factory – Provides an interface for creating objects but allows subclasses to alter the type of objects that will be created.
Builder – Separates object construction from its representation, allowing step-by-step creation of complex objects.

2. Structural Patterns

These patterns focus on composing classes and objects to form larger structures while ensuring flexibility and efficiency. Examples include:

Adapter – Allows incompatible interfaces to work together.
Decorator – Dynamically adds responsibilities to an object without modifying its code.
Facade – Provides a simplified interface to a larger system.

3. Behavioral Patterns

These patterns deal with communication between objects, focusing on improving the flexibility of interactions. Examples include:

Observer – Defines a one-to-many dependency where multiple objects are notified of state changes in another object.
Strategy – Enables selecting an algorithm’s implementation at runtime.
Command – Encapsulates requests as objects, allowing parameterization of clients and request queuing.
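As a small illustration of one behavioral pattern, the Strategy idea of selecting an algorithm at runtime can be sketched in a few lines of Python. The discount functions and the `checkout` helper are hypothetical names chosen for this example only.

```python
from typing import Callable, Dict


# Each strategy is a function with the same signature.
def no_discount(total: float) -> float:
    return total


def ten_percent_off(total: float) -> float:
    return total * 0.9


STRATEGIES: Dict[str, Callable[[float], float]] = {
    "none": no_discount,
    "ten_percent": ten_percent_off,
}


def checkout(total: float, strategy_name: str) -> float:
    # The algorithm is looked up at runtime; checkout() does not change
    # when a new strategy is added to the table.
    return STRATEGIES[strategy_name](total)


print(checkout(200.0, "none"))
print(checkout(200.0, "ten_percent"))
```

The calling code depends only on the shared signature, so strategies can be swapped without touching `checkout`.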

Pros and Cons of Design Patterns

✅ Pros:

Increases Code Reusability – Solutions can be used across multiple projects.
Enhances Maintainability – Code is more modular and easier to refactor.
Improves Scalability – Helps in designing software that can evolve over time.
Standardized Solutions – Reduces ambiguity in architecture discussions.

❌ Cons:

Overuse Can Lead to Complexity – Applying patterns unnecessarily can make code more complicated.
Learning Curve – Understanding and properly implementing design patterns requires time and experience.
Performance Overhead – Some patterns, like Decorator and Observer, may introduce additional processing costs.

References

Refactoring Guru

Conclusion

Design patterns are powerful tools that help developers build efficient, maintainable, and scalable software.

In the upcoming articles of this series, we will explore specific design patterns in detail. Stay tuned!

Building a Multi-Agent Blog Writer Using Crew AI

Vivek Alhat — Sun, 29 Dec 2024 14:26:44 +0000

Creating high-quality, engaging blog content often involves multiple stages, from planning and writing to editing. With Crew AI, you can build a multi-agent system that automates this process using AI, streamlining content creation while maintaining quality. In this blog, we’ll explore what Crew AI is, its key components, and how to build a blog writing multi-agent system step by step.

What is Crew AI?

Crew AI is a Python-based framework for orchestrating intelligent agents to collaborate on complex tasks. Each agent represents a specialized entity with a distinct role, expertise, and goals. By defining tasks and assigning them to agents, Crew AI enables you to build workflows where agents seamlessly interact to achieve a shared goal.

Key Concepts

Agent
An Agent represents an individual entity responsible for executing specific tasks. Each agent has a:
- Role: Defines its primary responsibility.
- Goal: Specifies the desired outcome for its tasks.
- Backstory: Provides contextual information to enhance its performance.
Task
A Task represents a unit of work assigned to an agent. It includes:
- Description: A detailed explanation of what the agent must do.
- Expected Output: Criteria for evaluating task completion.
- Assigned Agent: The agent responsible for executing the task.
Crew
A Crew is a collection of agents working together to complete tasks. The tasks can be executed sequentially or hierarchically, depending on the Process configuration.
Process
The Process defines how tasks are executed. It can be:
- Sequential: Tasks are completed in a predefined order.
- Hierarchical: Tasks are executed in a managerial hierarchy.

In this example, we use the groq/llama-3.1-8b-instant language model as the LLM. This model provides the natural language capabilities for agents to perform their roles effectively. You'll need an API key to interact with this LLM. You can create your Groq account here.

Let’s dive into building a blog writing system where agents collaboratively plan, write, and edit a blog post.

1. Setting Up the Environment

Before starting, ensure you have the required dependencies installed.

pip install crewai groq

2. Initialize the LLM

The language model is the backbone of our system. Define it as follows:

from crewai import LLM  

llm = LLM(  
    model="groq/llama-3.1-8b-instant",  
    api_key="",  # Replace with your API key  
)

This sets up the LLM that all agents will use for processing text-based tasks.
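Rather than pasting the key into the source file, you may prefer to read it from an environment variable. The helper below is a minimal sketch that uses only the standard library; the variable name `GROQ_API_KEY` and the function name `get_groq_api_key` are assumptions made for this example.

```python
import os


def get_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Hypothetical usage with the LLM defined above:
# llm = LLM(model="groq/llama-3.1-8b-instant", api_key=get_groq_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of a crew run.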

3. Defining Agents

a) Content Planner
The Content Planner is responsible for creating a structured outline for the blog post. Define it with a clear goal and backstory to guide its outputs:

from crewai import Agent  

planner = Agent(  
    role="Content Planner",  
    goal="Develop a comprehensive and structured content outline for {topic}, including key sections, subsections, and supporting points.",  
    backstory="An expert content strategist skilled at breaking down complex topics into manageable parts.",  
    llm=llm,  
)

b) Content Writer
The Content Writer expands the outline into a detailed blog post:

writer = Agent(  
    role="Content Writer",  
    goal="Produce captivating and informative blog posts about {topic}.",  
    backstory="A versatile writer passionate about simplifying complex ideas.",  
    llm=llm,  
)

c) Content Editor
The Content Editor refines and polishes the blog post to ensure quality:

editor = Agent(  
    role="Content Editor",  
    goal="Refine the blog post, ensuring clarity, coherence, and grammatical accuracy.",  
    backstory="A meticulous editor with a strong eye for detail.",  
    llm=llm,  
)

4. Defining Tasks

a) Content Planning Task
The first task involves creating an outline:

from crewai import Task  

plan = Task(  
    description="Create a detailed content outline for {topic}, including main sections and key points.",  
    expected_output="A structured outline for {topic}, with headings and bullet points.",  
    agent=planner,  
)

b) Content Writing Task
The second task transforms the outline into a complete blog post:

write = Task(  
    description="Transform the outline into a 1000-word blog post in markdown format.",  
    expected_output="A comprehensive blog post with clear language and examples.",  
    agent=writer,  
)

c) Content Editing Task
The final task involves refining the blog post:

edit = Task(  
    description="Review and polish the blog post, ensuring quality and alignment with the outline.",  
    expected_output="A polished, error-free blog post with enhanced structure and tone.",  
    agent=editor,  
)

5. Assembling the Crew

Combine the agents and tasks into a crew:

from crewai import Crew, Process  

crew = Crew(  
    agents=[planner, writer, editor],  
    tasks=[plan, write, edit],  
    verbose=True,  
    process=Process.sequential,  # Execute tasks in sequence  
)

6. Kicking Off the Process

Prompt the user for the blog topic and start the process:

topic = input("Topic?\n")  
edit.output_file = topic + ".md"  # Save the final output as a markdown file  
crew.kickoff(inputs={"topic": topic})

How It Works

The Content Planner creates an outline for the given topic.
The Content Writer expands the outline into a detailed blog post.
The Content Editor reviews and refines the content for quality.

The entire process runs sequentially, ensuring each task builds upon the output of the previous one.

Building a simple rate limiter in Go

Vivek Alhat — Sat, 28 Dec 2024 18:10:37 +0000

Rate limiting is a critical concept in web development and API design. It ensures that users or systems can only make a limited number of requests to a server within a specific time frame. In this blog post, we’ll explore what rate limiting is, why it’s essential, and how to implement a simple rate limiter in Go.

What Is Rate Limiting?

Imagine a theme park with a roller coaster ride that can only accommodate 10 people every 10 minutes. If more than 10 people try to get on within that timeframe, they’ll have to wait. This analogy mirrors the principle of rate limiting in software systems.

In technical terms, rate limiting restricts the number of requests a client (e.g., a user, device, or IP address) can send to a server within a predefined period. It helps:

Prevent abuse and ensure fair usage of resources.
Protect servers from being overwhelmed by excessive traffic.
Avoid costly overuse of third-party APIs or services.

For example, an API might allow 100 requests per minute per user. If a user exceeds this limit, the server denies further requests until the limit resets.

How Does Rate Limiting Work?

One common way to implement rate limiting is through the token bucket algorithm. Here’s how it works:

A bucket starts with a fixed number of tokens (e.g., 10).
Each request removes one token from the bucket.
If the bucket has no tokens left, the request is denied.
Tokens are replenished at a steady rate (e.g., 1 token every second) until the bucket is full.
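Before moving to Go, the four steps above can be sketched in Python. This is a simplified, single-threaded illustration and not the Go implementation that follows; the `TokenBucket` class and the injectable `clock` parameter are assumptions added so the behaviour can be shown without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum number of tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # the bucket starts full
        self.clock = clock
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each allowed request costs one token
            return True
        return False


# A fake clock makes the behaviour easy to follow without sleeping.
fake_now = [0.0]
bucket = TokenBucket(capacity=10, refill_rate=1.0, clock=lambda: fake_now[0])

results = [bucket.allow() for _ in range(11)]  # 10 allowed, the 11th denied
fake_now[0] += 1.0                             # one second later: 1 token is back
results.append(bucket.allow())
print(results)
```

With a bucket of 10 tokens and a refill rate of 1 token per second, the first ten requests pass, the eleventh is denied, and one more passes after a second.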

Building a Simple Rate Limiter in Go

Let’s dive into building a rate limiter in Go that allows each client a burst of 3 requests and refills tokens at a rate of 3 per minute.

Step 1: Define the Rate Limiter Structure

We’ll use a sync.Mutex to ensure thread safety and store information like the current number of tokens, the maximum capacity, the refill rate, and the time of the last refill.

package main

import (
    "sync"
    "time"
)

type RateLimiter struct {
    tokens         float64   // Current number of tokens
    maxTokens      float64   // Maximum tokens allowed
    refillRate     float64   // Tokens added per second
    lastRefillTime time.Time // Last time tokens were refilled
    mutex          sync.Mutex
}

func NewRateLimiter(maxTokens, refillRate float64) *RateLimiter {
    return &RateLimiter{
        tokens:         maxTokens,
        maxTokens:      maxTokens,
        refillRate:     refillRate,
        lastRefillTime: time.Now(),
    }
}

Step 2: Implement Token Refill Logic

Tokens should be replenished periodically based on the elapsed time since the last refill.

func (r *RateLimiter) refillTokens() {
    now := time.Now()
    duration := now.Sub(r.lastRefillTime).Seconds()
    tokensToAdd := duration * r.refillRate

    r.tokens += tokensToAdd
    if r.tokens > r.maxTokens {
        r.tokens = r.maxTokens
    }
    r.lastRefillTime = now
}

Step 3: Check If a Request Is Allowed

The Allow method will determine if a request can proceed based on the available tokens.

func (r *RateLimiter) Allow() bool {
    r.mutex.Lock()
    defer r.mutex.Unlock()

    r.refillTokens()

    if r.tokens >= 1 {
        r.tokens--
        return true
    }
    return false
}

Step 4: Apply Rate Limiting Per IP

To limit requests per client, we’ll create a map of IP addresses to their respective rate limiters.

type IPRateLimiter struct {
    limiters map[string]*RateLimiter
    mutex    sync.Mutex
}

func NewIPRateLimiter() *IPRateLimiter {
    return &IPRateLimiter{
        limiters: make(map[string]*RateLimiter),
    }
}

func (i *IPRateLimiter) GetLimiter(ip string) *RateLimiter {
    i.mutex.Lock()
    defer i.mutex.Unlock()

    limiter, exists := i.limiters[ip]
    if !exists {
        // Bucket of 3 tokens, refilled at 0.05 tokens per second (3 per minute)
        limiter = NewRateLimiter(3, 0.05)
        i.limiters[ip] = limiter
    }

    return limiter
}

Step 5: Create Middleware for Rate Limiting

Finally, we’ll create an HTTP middleware that enforces the rate limit for each client.

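// These imports belong in the single import block at the top of the file,
// alongside "sync" and "time".
import (
    "fmt"
    "net"
    "net/http"
)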

func RateLimitMiddleware(ipRateLimiter *IPRateLimiter, next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        ip, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            http.Error(w, "Invalid IP", http.StatusInternalServerError)
            return
        }

        limiter := ipRateLimiter.GetLimiter(ip)
        if limiter.Allow() {
            next(w, r)
        } else {
            http.Error(w, "Rate Limit Exceeded", http.StatusTooManyRequests)
        }
    }
}

Step 6: Set Up the Server

Here’s how to hook it all together and test the rate limiter.

func handleRequest(w http.ResponseWriter, _ *http.Request) {
    fmt.Fprintf(w, "Request processed successfully at %v\n", time.Now())
}

func main() {
    ipRateLimiter := NewIPRateLimiter()

    mux := http.NewServeMux()
    mux.HandleFunc("/", RateLimitMiddleware(ipRateLimiter, handleRequest))

    fmt.Println("Server starting on :8080")
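    // ListenAndServe only returns on failure (e.g., the port is already in use).
    if err := http.ListenAndServe(":8080", mux); err != nil {
        fmt.Println("Server error:", err)
    }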
}

Testing the Rate Limiter

Start the server and test it using curl or your browser:

curl -X GET http://localhost:8080

Send 3 requests quickly: All should succeed.
Send a 4th request right away: You should see a Rate Limit Exceeded message.
Wait for 20 seconds and try again: One token has been refilled (20 s × 0.05 tokens/s), so one more request should succeed. A full refill to 3 tokens takes 60 seconds.
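The refill arithmetic behind these tests can be checked with a few lines of Python. This is only a worked calculation using the values from the example (3 tokens maximum, 0.05 tokens per second); the helper names `tokens_after` and `seconds_until` are hypothetical.

```python
def tokens_after(current: float, elapsed_seconds: float,
                 refill_rate: float, max_tokens: float) -> float:
    """Tokens in the bucket after elapsed_seconds, capped at max_tokens."""
    return min(max_tokens, current + elapsed_seconds * refill_rate)


def seconds_until(target: float, current: float, refill_rate: float) -> float:
    """Seconds needed to grow from current to target tokens."""
    return max(0.0, (target - current) / refill_rate)


# Values from the example: 3 tokens max, 0.05 tokens per second.
print(tokens_after(0, 20, 0.05, 3))   # tokens available 20 s after draining the bucket
print(seconds_until(3, 0, 0.05))      # time for an empty bucket to refill completely
```

An empty bucket gains roughly one token every 20 seconds, so a single request becomes possible after about 20 seconds and the full burst of 3 after about a minute.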

Source Code

GitHub Repo

Understanding Database Consistency

Vivek Alhat — Wed, 25 Dec 2024 09:34:06 +0000

When working with databases, one of the most important concepts to understand is consistency. It ensures that the data in your database remains reliable and meaningful. Let’s dive into what database consistency is, its types, and why it’s essential for your applications.

What Is Database Consistency?

In simple terms, database consistency refers to the correctness and validity of the data stored in a database. Whenever data is added, modified, or deleted, the database should ensure it remains in a consistent state. This means that all defined rules, constraints, and relationships between data must be upheld.

For example, imagine you’re transferring money between two bank accounts. If $100 is deducted from one account, it must be added to the other account; anything else would result in inconsistent data.

Consistency is a key pillar of the ACID properties of databases:

Atomicity: Transactions are all-or-nothing.
Consistency: The database moves from one valid state to another.
Isolation: Concurrent transactions don’t interfere with each other.
Durability: Once a transaction is committed, it’s permanent.
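The bank transfer example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions: the `accounts` table, the `CHECK (balance >= 0)` rule, and the `transfer` and `balances` helpers are all invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule: no negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 150), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


first = transfer(conn, 1, 2, 100)   # account 1 has enough funds
second = transfer(conn, 1, 2, 100)  # would overdraw account 1, so it is rolled back
print(first, second, balances(conn))
```

The second transfer credits account 2 before the debit fails the constraint, and the rollback undoes that credit, so the total across both accounts stays the same.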

Types of Database Consistency

There are two main types of consistency in databases: strong consistency and eventual consistency. Let’s break these down.

Strong Consistency
- Strong consistency ensures that all users see the same data at the same time, no matter where they are accessing it from.
- Every transaction is immediately visible to all parts of the system once it’s complete.
- This type of consistency is common in traditional relational databases like MySQL and PostgreSQL, which use ACID transactions.

Example: In an online ticket booking system, once a ticket is sold, it’s immediately marked as unavailable to everyone else.

Eventual Consistency
- Eventual consistency is often used in distributed systems and NoSQL databases.
- It doesn’t guarantee immediate consistency, but over time, all parts of the system will reflect the same data.
- This approach is suitable for scenarios where high availability and partition tolerance are prioritized over immediate consistency (as per the CAP theorem).

Example: In a social media app, when you update your profile picture, it might take a few seconds or minutes for all your friends to see the change, but eventually, they will all see the updated picture.
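A toy simulation can make the profile picture example more tangible. The sketch below is not how any particular database replicates data; the `Primary` and `Replica` classes and the explicit `replicate()` call are assumptions standing in for background replication with a delay.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Primary:
    """Accepts writes immediately and replicates them later."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # writes not yet applied to replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # In a real system this happens in the background, after a delay.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
primary = Primary([replica])

primary.write("avatar", "new.png")
stale = replica.data.get("avatar")   # the replica has not caught up yet
primary.replicate()
fresh = replica.data.get("avatar")   # the replica now matches the primary
print(stale, fresh)
```

A read from the replica between the write and the replication step returns old data, which is exactly the window that eventual consistency allows.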

Differences Between Strong and Eventual Consistency

Feature	Strong Consistency	Eventual Consistency
Guarantee	All users see the same data instantly.	All users see the same data eventually.
Use Case	Critical systems like banking or finance.	High-availability systems like social media.
Latency	Higher latency due to synchronization.	Lower latency with delayed consistency.
Complexity	Simpler to reason about.	More complex to implement in large systems.
Example Systems	Relational databases (MySQL, PostgreSQL).	NoSQL databases (Cassandra, DynamoDB).

Why Does Consistency Matter?

Consistency is crucial for:

Data Integrity: Ensuring that data remains accurate and follows defined rules.
User Trust: Providing users with correct and reliable information.
Business Logic: Preventing errors like duplicate entries, incorrect balances, or invalid relationships between data.

Balancing Consistency with Other Factors

In real-world systems, achieving perfect consistency can sometimes come at the cost of performance or availability. Distributed databases, for instance, often face a trade-off between consistency, availability, and partition tolerance, a concept known as the CAP theorem.

Depending on your application’s requirements, you might prioritize one over the others:

High Consistency: Ideal for financial systems or other critical applications where data accuracy is paramount.
High Availability: Suitable for applications like social networks or e-commerce sites where uptime is more important than immediate consistency.

Conclusion

Database consistency is a foundational concept that ensures the reliability and accuracy of your data. Whether you prioritize strong consistency or eventual consistency depends on your application’s specific needs. By understanding and applying the right strategies, you can design systems that balance consistency with performance and scalability.

Building a simple RAG agent with LlamaIndex

Vivek Alhat — Mon, 30 Sep 2024 14:44:00 +0000

LlamaIndex is a framework for building context-augmented generative AI applications with LLMs.

What is context augmentation?

Context augmentation is a technique where additional relevant information is provided to an LLM to improve its understanding of, and responses to, a given query. It typically involves retrieving external data, such as documents or embeddings, and attaching it to the model's input. The goal is to give the model the context it needs to produce more accurate and nuanced answers. Retrieval-augmented generation (RAG) is the most popular example of context augmentation.

What are agents?

Agents are automated reasoning and decision engines powered by LLMs that use tools to perform research, data extraction, web search, and other tasks. Their use cases range from simple question-answering over your data to deciding on and taking actions in order to complete tasks.

In this post, we'll build a simple RAG agent using LlamaIndex.

Building a RAG agent

Installing dependencies

We'll be using Python to build a simple RAG agent with LlamaIndex. Let's first install the required dependencies:

pip install llama-index python-dotenv

Setting up LLM and loading documents

We'll be using OpenAI's gpt-4o-mini as the LLM. You need to put the API key in an environment variables file (.env). You can read more about setting up a local LLM using LlamaIndex here.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from dotenv import load_dotenv

# Load environment variables (e.g., OPENAI_API_KEY)
load_dotenv()

# Configure OpenAI model
Settings.llm = OpenAI(model="gpt-4o-mini")

# Load documents from the local directory
documents = SimpleDirectoryReader("./data").load_data()

# Create an index from documents for querying
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

First, we configure the LLM by creating an OpenAI instance with the gpt-4o-mini model. You can switch to other available models/LLMs depending on your needs.
Then, we use SimpleDirectoryReader to load documents from the local ./data directory. This reader scans through the directory, reads files, and structures the data for querying.
Next, we create a vector store index from the loaded documents, allowing us to perform efficient vector-based retrieval during query execution.
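To give a feel for what vector-based retrieval means, here is a toy sketch that uses only the standard library. It is not what LlamaIndex does internally: the word-count "embedding" and the `embed`, `cosine`, and `retrieve` helpers are simplifying assumptions, whereas a real index uses a learned embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real systems use a learned embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list, top_k: int = 1) -> list:
    # Rank documents by similarity to the query and keep the best top_k.
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


docs = [
    "Jupiter is the largest planet in the solar system",
    "Bread dough needs time to rise before baking",
]
print(retrieve("which planet is the largest", docs))
```

The query shares several words with the first document and none with the second, so the first document is ranked highest and would be passed to the LLM as context.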

Creating custom functions for agent

Now, let's define some basic functions that the agent can use to perform tasks.

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b

Creating tools for the agent

Next, we'll create tools from the functions and the query engine that we defined earlier, which the agent will use to perform tasks. These tools act as utilities that the agent can leverage when handling different types of queries.

from llama_index.core.tools import FunctionTool, QueryEngineTool

# Wrap functions as tools
add_tool = FunctionTool.from_defaults(fn=add)
multiply_tool = FunctionTool.from_defaults(fn=multiply)

# Create a query engine tool for document retrieval
space_facts_tool = QueryEngineTool.from_defaults(
    query_engine,
    name="space_facts_tool",
    description="A RAG engine with information about fun space facts."
)

The FunctionTool wraps the add and multiply functions and exposes them as tools. The agent can now access these tools to perform calculations.
The QueryEngineTool wraps the query_engine to allow the agent to query and retrieve information from the vector store. We've named it space_facts_tool with a description, indicating that this tool can retrieve information about space facts. You can ingest anything and customize the tool as per the ingested data.

Creating the agent

We will now create the agent using ReActAgent. The agent will be responsible for deciding when to use the tools and how to respond to queries.

from llama_index.core.agent import ReActAgent

# Create the agent with the tools
agent = ReActAgent.from_tools(
    [multiply_tool, add_tool, space_facts_tool], verbose=True
)

This agent uses the ReAct framework, which allows the model to reason and act by utilizing the given tools in a logical sequence. The agent is initialized with the tools we created, and the verbose=True flag will output detailed information on how the agent reasons and executes tasks.

Running the agent

Finally, let's run the agent in an interactive loop where it processes user queries until we exit.

while True:
    query = input("Query: ")

    if query == "/bye":
        exit()

    response = agent.chat(query)
    print(response)
    print("-" * 10)

How does the RAG agent work?

When you ask a question related to the documents you ingested, the space_facts_tool (i.e., the vector store tool) retrieves the relevant information using the query_engine.
When you ask for calculations, the agent uses either add_tool or multiply_tool to perform those tasks.
The agent decides on-the-fly which tool to use based on the user query and provides the output.

Building a simple load balancer in Go

Vivek Alhat — Sat, 07 Sep 2024 12:40:46 +0000

Load balancers are crucial in modern software development. If you've ever wondered how requests are distributed across multiple servers, or why certain websites feel faster even during heavy traffic, the answer often lies in efficient load balancing.

In this post, we'll build a simple application load balancer using the Round Robin algorithm in Go. The aim of this post is to understand how a load balancer works under the hood, step by step.

What is a Load Balancer?

A load balancer is a system that distributes incoming network traffic across multiple servers. It ensures that no single server bears too much load, preventing bottlenecks and improving the overall user experience. Load balancing also ensures that if one server fails, traffic can be automatically re-routed to another available server, reducing the impact of the failure and increasing availability.

Why do we use Load Balancers?

High availability: By distributing traffic, load balancers ensure that even if one server fails, traffic can be routed to other healthy servers, making the application more resilient.
Scalability: Load balancers allow you to scale your system horizontally by adding more servers as traffic increases.
Efficiency: It maximizes resource utilization by ensuring all servers share the workload equally.

Load balancing algorithms

There are different algorithms and strategies to distribute the traffic:

Round Robin: One of the simplest methods available. It distributes requests sequentially among the available servers. Once it reaches the last server, it starts again from the beginning.
Weighted Round Robin: Similar to round robin, except each server is assigned a fixed numerical weight. Servers with a higher weight receive proportionally more requests.
Least Connections: Routes traffic to the server with the fewest active connections.
IP Hashing: Selects the server based on a hash of the client's IP address, so requests from the same client go to the same server.
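
To make the differences between these strategies concrete, here is a minimal sketch written in Python for brevity (the rest of this post uses Go). The server names, weights, and helper functions are hypothetical and are not part of the load balancer we build below.

```python
import hashlib
from itertools import cycle

# Hypothetical backends; names and weights are made up for illustration.
SERVERS = ["server-a", "server-b", "server-c"]
WEIGHTS = {"server-a": 3, "server-b": 1, "server-c": 1}


def weighted_round_robin(servers, weights):
    """Cycle through servers forever, repeating each one according to its weight."""
    expanded = [s for s in servers for _ in range(weights[s])]
    return cycle(expanded)


def least_connections(active_connections):
    """Pick the server that currently has the fewest active connections."""
    return min(active_connections, key=active_connections.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server; the same IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


wrr = weighted_round_robin(SERVERS, WEIGHTS)
first_five = [next(wrr) for _ in range(5)]
print(first_five)  # server-a is picked three times out of every five

print(least_connections({"server-a": 12, "server-b": 4, "server-c": 9}))  # server-b

# The same client IP is always routed to the same server.
print(ip_hash("203.0.113.7", SERVERS) == ip_hash("203.0.113.7", SERVERS))  # True
```

This naive weighted version sends consecutive requests to the heaviest server in a burst; real implementations often interleave the picks more smoothly.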

In this post, we'll focus on implementing a Round Robin load balancer.

What is a Round Robin algorithm?

A round robin algorithm sends each incoming request to the next available server in a circular manner. If server A handles the first request, server B will handle the second, and server C will handle the third. Once all servers have received a request, it starts again from server A.
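
The rotation described above can be traced in a few lines. This is a quick Python sketch used only as executable pseudocode (the actual implementation below is in Go), and the server names are placeholders.

```python
servers = ["A", "B", "C"]  # placeholder server names
current = 0                # counter that only ever increases

order = []
for _ in range(7):
    # The modulo wraps the counter back around to the first server.
    order.append(servers[current % len(servers)])
    current += 1

print(order)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```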

Now, let's jump into the code and build our load balancer!

Step 1: Define the Load Balancer and Server

type LoadBalancer struct {
    Current int
    Mutex   sync.Mutex
}

We'll first define a simple LoadBalancer struct with a Current field to keep track of which server should handle the next request. The Mutex ensures that our code is safe to use concurrently.

Each server we load balance is defined by the Server struct:

type Server struct {
    URL       *url.URL
    IsHealthy bool
    Mutex     sync.Mutex
}

Here, each server has a URL and an IsHealthy flag, which indicates whether the server is available to handle requests.

Step 2: Round Robin Algorithm

The heart of our load balancer is the round robin algorithm. Here's how it works:

func (lb *LoadBalancer) getNextServer(servers []*Server) *Server {
    lb.Mutex.Lock()
    defer lb.Mutex.Unlock()

    for i := 0; i < len(servers); i++ {
        idx := lb.Current % len(servers)
        nextServer := servers[idx]
        lb.Current++

        nextServer.Mutex.Lock()
        isHealthy := nextServer.IsHealthy
        nextServer.Mutex.Unlock()

        if isHealthy {
            return nextServer
        }
    }

    return nil
}

This method loops through the list of servers in a round robin fashion. If the selected server is healthy, it returns that server to handle the incoming request.
We use the load balancer's Mutex to ensure that only one goroutine can read and modify the Current field at a time. This keeps the round robin algorithm correct when multiple requests are being processed concurrently.
Each Server also has its own Mutex. When we check the IsHealthy field, we lock the server's Mutex so that the read cannot overlap with a write from the health check goroutine.
Without locking, another goroutine could change the value while we read it, which is a data race and could result in reading incorrect or inconsistent data.
We unlock the server's Mutex as soon as we have read the IsHealthy value to keep that critical section as small as possible. The load balancer's Mutex is released by the deferred Unlock when getNextServer returns. In this way, we use Mutexes to avoid race conditions.

Step 3: Configuring the Load Balancer

Our configuration is stored in a config.json file, which contains the port, the server URLs, and the health check interval (more on this in the next section).

type Config struct {
    Port                string   `json:"port"`
    HealthCheckInterval string   `json:"healthCheckInterval"`
    Servers             []string `json:"servers"`
}

The configuration file might look like this:

{
  "port": ":8080",
  "healthCheckInterval": "2s",
  "servers": [
    "http://localhost:5001",
    "http://localhost:5002",
    "http://localhost:5003",
    "http://localhost:5004",
    "http://localhost:5005"
  ]
}

Step 4: Health Checks

We want to make sure that the servers are healthy before routing any incoming traffic to them. This is done by sending periodic health checks to each server:

func healthCheck(s *Server, healthCheckInterval time.Duration) {
    for range time.Tick(healthCheckInterval) {
        res, err := http.Head(s.URL.String())
        healthy := err == nil && res.StatusCode == http.StatusOK
        if err == nil {
            // close the body so the underlying connection can be reused
            res.Body.Close()
        }

        s.Mutex.Lock()
        if !healthy {
            fmt.Printf("%s is down\n", s.URL)
        }
        s.IsHealthy = healthy
        s.Mutex.Unlock()
    }
}

Every few seconds (as specified in the config), the load balancer sends a HEAD request to each server to check if it is healthy. If a server is down, the IsHealthy flag is set to false, preventing future traffic from being routed to it.

Step 5: Reverse Proxy

When the load balancer receives a request, it forwards the request to the next available server using a reverse proxy. In Go, the httputil package provides a built-in way to handle reverse proxying, and we will use it in our code through the ReverseProxy method:

func (s *Server) ReverseProxy() *httputil.ReverseProxy {
    return httputil.NewSingleHostReverseProxy(s.URL)
}

What is a Reverse Proxy?

A reverse proxy is a server that sits between a client and one or more backend servers. It receives the client's request, forwards it to one of the backend servers, and then returns the server's response to the client. The client interacts with the proxy, unaware of which specific backend server is handling the request.

In our case, the load balancer acts as a reverse proxy, sitting in front of multiple servers and distributing incoming HTTP requests across them.

Step 6: Handling Requests

When a client makes a request to the load balancer, it selects the next available healthy server using the round robin algorithm implemented in the getNextServer function and proxies the client request to that server. If no healthy server is available, we send a 503 Service Unavailable error to the client.

http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    server := lb.getNextServer(servers)
    if server == nil {
        http.Error(w, "No healthy server available", http.StatusServiceUnavailable)
        return
    }
    w.Header().Add("X-Forwarded-Server", server.URL.String())
    server.ReverseProxy().ServeHTTP(w, r)
})

The ReverseProxy method proxies the request to the actual server, and we also add a custom header X-Forwarded-Server for debugging purposes (though in production, we should avoid exposing internal server details like this).

Step 7: Starting the Load Balancer

Finally, we start the load balancer on the specified port:

log.Println("Starting load balancer on port", config.Port)
err = http.ListenAndServe(config.Port, nil)
if err != nil {
    log.Fatalf("Error starting load balancer: %s\n", err.Error())
}

TL;DR

In this post, we built a basic load balancer from scratch in Go using a round robin algorithm. This is a simple yet effective way to distribute traffic across multiple servers and ensure that your system can handle higher loads efficiently.

There's a lot more to explore, such as adding sophisticated health checks, implementing different load balancing algorithms, or improving fault tolerance. But this basic example can be a solid foundation to build upon.

You can find the source code in this GitHub repo.

Building a tiny vector store from scratch

Vivek Alhat — Mon, 26 Aug 2024 20:10:25 +0000

With the evolving landscape of generative AI, vector databases are playing a crucial role in powering generative AI applications. Many vector databases are currently available, including open source options such as Chroma and Milvus and popular proprietary ones such as Pinecone and SingleStore. You can read the detailed comparison of different vector databases on this site.

But, have you ever wondered how these vector databases work behind the scenes?

A great way to learn something is to understand how things work under the hood. In this article, we will be building a tiny in-memory vector store "Pixie" from scratch using Python with only NumPy as a dependency.

Before diving into the code, let's briefly discuss what a vector store is.

What is a vector store?

A vector store is a database designed to store and retrieve vector embeddings efficiently. These embeddings are numerical representations of data (often text, but also images, audio, etc.) that capture semantic meaning in a high-dimensional space. The key feature of a vector store is its ability to perform efficient similarity searches, finding the most relevant data points based on their vector representations. Vector stores can be used in many tasks such as:

Semantic search
Retrieval augmented generation (RAG)
Recommendation systems

Let's code

In this article, we are going to create a tiny in-memory vector store called "Pixie". While it won't have all the optimizations of a production-grade system, it will demonstrate the core concepts. Pixie will have two main functionalities:

Storing document embeddings
Performing similarity searches

Setting up the vector store

First, we'll create a class called Pixie:

import numpy as np
from sentence_transformers import SentenceTransformer
from helpers import cosine_similarity


class Pixie:
    def __init__(self, embedder) -> None:
        self.store: np.ndarray = None
        self.embedder: SentenceTransformer = embedder

First, we import numpy for efficient numerical operations and storing multi-dimensional arrays.
We also import SentenceTransformer from the sentence_transformers library. We are using SentenceTransformer for embedding generation, but you could use any embedding model that converts text to vectors. In this article, our primary focus is the vector store itself, not embedding generation.
Next, we initialize the Pixie class with an embedder. Embedding generation could live entirely outside the vector store, but for simplicity we pass the embedder in and keep a reference to it inside the class.
self.store will hold our document embeddings as a NumPy array.
self.embedder will hold the embedding model that we'll use to convert documents and queries into vectors.

Ingesting documents

To ingest documents/data in our vector store, we'll implement the from_docs method:

def from_docs(self, docs):
        self.docs = np.array(docs)
        self.store = self.embedder.encode(self.docs)
        return f"Ingested {len(docs)} documents"

This method does a few key things:

It takes a list of documents and stores them as a NumPy array in self.docs.
It uses the embedder model to convert each document into a vector embedding. These embeddings are stored in self.store.
It returns a message confirming how many documents were ingested.

The encode method of our embedder is doing the heavy lifting here, converting each text document into a high-dimensional vector representation.

Performing similarity search

The heart of our vector store is the similarity search function:

def similarity_search(self, query, top_k=3):
        matches = list()
        q_embedding = self.embedder.encode(query)
        top_k_indices = cosine_similarity(self.store, q_embedding, top_k)
        for i in top_k_indices:
            matches.append(self.docs[i])
        return matches

Let's break this down:

We start by creating an empty list called matches to hold the results.
We encode the user query using the same embedder model we used for ingesting the documents. This ensures that the query vector is in the same space as our document vectors.
We call a cosine_similarity function (which we'll define next) to find the most similar documents.
We use the returned indices to fetch the actual documents from self.docs.
Finally, we return the list of matching documents.

Implementing cosine similarity

import numpy as np


def cosine_similarity(store_embeddings, query_embedding, top_k):
    dot_product = np.dot(store_embeddings, query_embedding)
    magnitude_a = np.linalg.norm(store_embeddings, axis=1)
    magnitude_b = np.linalg.norm(query_embedding)

    similarity = dot_product / (magnitude_a * magnitude_b)

    sim = np.argsort(similarity)
    top_k_indices = sim[::-1][:top_k]

    return top_k_indices

This function is doing several important things:

It calculates the cosine similarity using the formula: cos(θ) = (A · B) / (||A|| * ||B||)
First, we calculate the dot product between the query embedding and every document embedding in the store.
Then, we compute the magnitudes (Euclidean norms) of all vectors.
Next, we divide each dot product by the product of the corresponding magnitudes to get one similarity score per document.
Lastly, we sort the similarity scores and return the indices of the top-k most similar documents.

We are using cosine similarity because it measures the angle between vectors, ignoring their magnitudes. This means it can find semantically similar documents regardless of the magnitude of their embeddings.
There are other similarity metrics that you can explore such as:
1. Euclidean distance
2. Dot product similarity

You can read more about cosine similarity here.
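
To see how the three metrics differ, here is a small worked example on made-up 2-D vectors (real embeddings have hundreds of dimensions). The values are chosen purely for illustration.

```python
import numpy as np

# Three made-up 2-D "document" vectors and one query vector.
store = np.array([
    [1.0, 0.0],   # doc 0: points along the x-axis
    [3.0, 3.0],   # doc 1: same direction as the query, but longer
    [0.0, 2.0],   # doc 2: points along the y-axis
])
query = np.array([1.0, 1.0])

# Cosine similarity: dot product divided by the product of the norms.
cosine = store @ query / (np.linalg.norm(store, axis=1) * np.linalg.norm(query))

# Euclidean distance: smaller means closer, and vector length matters.
euclidean = np.linalg.norm(store - query, axis=1)

# Raw dot product: grows with vector length as well as alignment.
dot = store @ query

print(np.round(cosine, 3))     # roughly 0.707, 1.0, 0.707
print(np.round(euclidean, 3))  # roughly 1.0, 2.828, 1.414
print(dot)                     # 1, 6, 2
```

Doc 1 points in exactly the same direction as the query, so cosine similarity ranks it first, while Euclidean distance ranks it last because it is three times longer than the query.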

Piecing everything together

Now that we have built all the pieces, let's understand how they work together:

When we create a Pixie instance, we provide it with an embedding model.
When we ingest documents, we create vector embeddings for each document and store them in self.store.
For a similarity search:
1. We create an embedding for the query.
2. We calculate the cosine similarity between the query embedding and all document embeddings.
3. We return the most similar documents.

All the magic happens inside the cosine similarity calculation. Because the embedding model places texts with similar meaning close together, comparing the angle between vectors rather than their magnitude lets us find semantically similar documents even if they use different words or phrasing.
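
The flow above can be traced end to end without downloading a model. The sketch below swaps the real embedder for a hypothetical bag-of-words ToyEmbedder and uses a simplified TinyStore in place of Pixie; both names, the vocabulary, and the sample sentences are assumptions made for this example.

```python
import numpy as np

# Hypothetical vocabulary for a toy bag-of-words embedder. A real embedding
# model captures meaning; this stand-in only counts words so the example
# runs offline.
VOCAB = ["krell", "fleet", "earth", "peace", "weapon", "signal"]


class ToyEmbedder:
    def encode(self, texts):
        """Return one count vector per text (or a single vector for a string)."""
        single = isinstance(texts, str)
        items = [texts] if single else list(texts)
        vectors = np.array(
            [[text.lower().split().count(word) for word in VOCAB] for text in items],
            dtype=float,
        )
        return vectors[0] if single else vectors


def cosine_top_k(store, query_vec, top_k):
    """Indices of the top_k rows of store most similar to query_vec."""
    # Note: a text with no vocabulary words has a zero norm and would
    # produce NaN here; the sample data below avoids that case.
    norms = np.linalg.norm(store, axis=1) * np.linalg.norm(query_vec)
    similarity = store @ query_vec / norms
    return np.argsort(similarity)[::-1][:top_k]


class TinyStore:
    """Simplified stand-in for Pixie that follows the same steps."""

    def __init__(self, embedder):
        self.embedder = embedder

    def from_docs(self, docs):
        self.docs = list(docs)
        self.store = self.embedder.encode(self.docs)

    def similarity_search(self, query, top_k=1):
        q_embedding = self.embedder.encode(query)
        return [self.docs[i] for i in cosine_top_k(self.store, q_embedding, top_k)]


tiny = TinyStore(ToyEmbedder())
tiny.from_docs([
    "the krell fleet attacked earth",
    "a strange signal reached the fleet",
    "peace returned to earth",
])
print(tiny.store.shape)  # (3, 6): one row per document, one column per word
print(tiny.similarity_search("who sent the signal", top_k=1))
```

Because the toy embedder only counts exact words, it cannot match paraphrases; that ability comes from the real embedding model, not from the store.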

Seeing it in action

Now let's implement a simple RAG system using our Pixie vector store. We'll ingest a short story about a space battle and an alien invasion, and then ask questions about it to see how it generates an answer.

import os
import sys
import warnings

warnings.filterwarnings("ignore")

import ollama
import numpy as np
from sentence_transformers import SentenceTransformer

current_dir = os.path.dirname(os.path.abspath(__file__))
root_dir = os.path.abspath(os.path.join(current_dir, ".."))
sys.path.append(root_dir)

from pixie import Pixie


# creating an instance of a pre-trained embedder model
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# creating an instance of Pixie vector store
pixie = Pixie(embedder)


# generate an answer using llama3 and context docs
def generate_answer(prompt):
    response = ollama.chat(
        model="llama3",
        options={"temperature": 0.7},
        messages=[
            {
                "role": "user",
                "content": prompt,
            },
        ],
    )
    return response["message"]["content"]


with open("example/spacebattle.txt") as f:
    content = f.read()
    # ingesting the data into vector store
    ingested = pixie.from_docs(docs=content.split("\n\n"))
    print(ingested)

# system prompt
PROMPT = """
    User has asked you following question and you need to answer it based on the below provided context. 
If you don't find any answer in the given context then just say 'I don't have answer for that'. 
In the final answer, do not add "according to the context or as per the context". 
You can be creative while using the context to generate the final answer. DO NOT just share the context as it is.

    CONTEXT: {0}
    QUESTION: {1}

    ANSWER HERE:
"""

while True:
    query = input("\nAsk anything: ")
    if len(query) == 0:
        # nothing was typed, so prompt again instead of exiting
        print("Ask a question to continue...")
        continue

    if query == "/bye":
        quit()

    # search similar matches for query in the embedding store
    similarities = pixie.similarity_search(query, top_k=5)
    print(f"query: {query}, top {len(similarities)} matched results:\n")

    print("-" * 5, "Matched Documents Start", "-" * 5)
    for match in similarities:
        print(f"{match}\n")
    print("-" * 5, "Matched Documents End", "-" * 5)

    # join the matched documents into a single context string for the prompt
    context = ",".join(similarities)
    answer = generate_answer(prompt=PROMPT.format(context, query))
    print("\n\nQuestion: {0}\nAnswer: {1}".format(query, answer))

Here is the output:

Ingested 8 documents

Ask anything: What was the invasion about?
query: What was the invasion about?, top 5 matched results:

----- Matched Documents Start -----
Epilogue: A New Dawn
Years passed, and the alliance between humans and Zorani flourished. Together, they rebuilt what had been lost, creating a new era of exploration and cooperation. The memory of the Krell invasion served as a stark reminder of the dangers that lurked in the cosmos, but also of the strength that came from unity. Admiral Selene Cortez retired, her name etched in the annals of history. Her legacy lived on in the new generation of leaders who continued to protect and explore the stars. And so, under the twin banners of Earth and Zorani, the galaxy knew peace—a fragile peace, hard-won and deeply cherished.

Chapter 3: The Invasion
Kael's warning proved true. The Krell arrived in a wave of bio-mechanical ships, each one bristling with organic weaponry and shields that regenerated like living tissue. Their tactics were brutal and efficient. The Titan Fleet, caught off guard, scrambled to mount a defense. Admiral Cortez's voice echoed through the corridors of the Prometheus. "All hands to battle stations! Prepare to engage!" The first clash was catastrophic. The Krell ships, with their organic hulls and adaptive technology, sliced through human defenses like a knife through butter. The outer rim colonies fell one by one, each defeat sending a shockwave of despair through the fleet. Onboard the Prometheus, Kael offered to assist, sharing Zorani technology and knowledge. Reluctantly, Cortez agreed, integrating Kael’s insights into their strategy. New energy weapons were developed, capable of piercing Krell defenses, and adaptive shields were installed to withstand their relentless attacks.

Chapter 5: The Final Battle
Victory on Helios IV was a much-needed morale boost, but the war was far from over. The Krell regrouped, launching a counter-offensive aimed directly at Earth. Every available ship was called back to defend humanity’s homeworld. As the Krell armada approached, Earth’s skies filled with the largest fleet ever assembled. The Prometheus led the charge, flanked by newly built warships and the remaining Zorani vessels that had joined the fight. "This is it," Cortez addressed her crew. "The fate of our species depends on this battle. We hold the line here, or we perish." The space above Earth turned into a maelstrom of fire and metal. Ships collided, energy beams sliced through the void, and explosions lit up the darkness. The Krell, relentless and numerous, seemed unbeatable. In the midst of chaos, Kael revealed a hidden aspect of Zorani technology—a weapon capable of creating a singularity, a black hole that could consume the Krell fleet. It was a desperate measure, one that could destroy both fleets. Admiral Cortez faced an impossible choice. To use the weapon would mean sacrificing the Titan Fleet and potentially Earth itself. But to do nothing would mean certain destruction at the hands of the Krell. "Activate the weapon," she ordered, her voice heavy with resolve. The Prometheus moved into position, its hull battered and scorched. As the singularity weapon charged, the Krell ships converged, sensing the threat. In a blinding burst of light, the weapon fired, tearing the fabric of space and creating a black hole that began to devour everything in its path.

Chapter 1: The Warning
It began with a whisper—a distant signal intercepted by the outermost listening posts of the Titan Fleet. The signal was alien, unlike anything the human race had ever encountered. For centuries, humanity had expanded its reach into the cosmos, colonizing distant planets and establishing trade routes across the galaxy. The Titan Fleet, the pride of Earth's military might, stood as the guardian of these far-flung colonies.Admiral Selene Cortez, a seasoned commander with a reputation for her sharp tactical mind, was the first to analyze the signal. As she sat in her command center aboard the flagship Prometheus, the eerie transmission played on a loop. It was a distress call, but its origin was unknown. The message, when decoded, revealed coordinates on the edge of the Andromeda Sector. "Set a course," Cortez ordered. The fleet moved with precision, a testament to years of training and discipline.

Chapter 4: Turning the Tide
The next battle, over the resource-rich planet of Helios IV, was a turning point. Utilizing the new technology, the Titan Fleet managed to hold their ground. The energy weapons seared through Krell ships, and the adaptive shields absorbed their retaliatory strikes. "Focus fire on the lead ship," Cortez commanded. "We break their formation, we break their spirit." The flagship of the Krell fleet, a massive dreadnought known as Voreth, was targeted. As the Prometheus and its escorts unleashed a barrage, the Krell ship's organic armor struggled to regenerate. In a final, desperate maneuver, Cortez ordered a concentrated strike on Voreth's core. With a blinding flash, the dreadnought exploded, sending a ripple of confusion through the Krell ranks. The humans pressed their advantage, driving the Krell back.
----- Matched Documents End -----


Question: What was the invasion about?
Answer: The Krell invasion was about the Krell arriving in bio-mechanical ships with organic weaponry and shields that regenerated like living tissue, seeking to conquer and destroy humanity.

We have successfully built a tiny in-memory vector store from scratch using Python and NumPy. While it is very basic, it demonstrates core concepts such as vector storage and similarity search. Production-grade vector stores are much more optimized and feature-rich.

GitHub repo: Pixie

Happy coding, and may your vectors always point in the right direction!

Load Balancers in AWS

Vivek Alhat — Sun, 30 Jun 2024 18:12:23 +0000

Load balancers are servers that forward traffic to multiple servers downstream. They are crucial for distributing incoming traffic across different servers, such as EC2 instances, in multiple Availability Zones. This increases the availability of your application. A load balancer ensures that no single server bears too much load, thus enhancing the performance and reliability of your application.

AWS Elastic Load Balancer (ELB) is a managed load balancer. It is integrated with many AWS services including EC2, ECS, Route53, and CloudWatch. While it might be costlier than setting up your own load balancer, the time and effort saved in managing and configuring ELB make it a preferred choice for many.

AWS Elastic Load Balancer offers the following types of managed load balancers:

Classic Load Balancer (old generation)
Application Load Balancer
Network Load Balancer
Gateway Load Balancer

Classic Load Balancer (CLB) comes under the old generation in AWS. It is recommended to use the newer generation of load balancers as they provide more features.

The following are some of the key features of load balancers:

They distribute traffic across multiple downstream instances, ensuring efficient handling of requests.
They provide a single DNS point of access for your application.
They can seamlessly manage failures in downstream instances.
They perform regular health checks on your instances to ensure only healthy instances receive traffic.
They can operate across multiple AZs for high availability.
They can segment public and private traffic.

In this article, we will explore Application Load Balancer (ALB) in detail.

Application Load Balancer (ALB)

An Application Load Balancer (ALB) operates at the application layer (Layer 7) of the OSI model, making it ideal for HTTP/HTTPS traffic. It supports advanced routing mechanisms such as:

Path based routing
Hostname based routing
Query string or header based routing

ALB can route traffic to multiple target groups, whose targets can be EC2 instances, ECS tasks, Lambda functions, or private IP addresses. It supports the HTTP, HTTPS, HTTP/2, and WebSocket protocols. ALB provides a single DNS name that clients can use to access your application, which simplifies DNS management.

Let's create a simple Application Load Balancer in front of two EC2 instances. The purpose of this load balancer will be to distribute traffic across the EC2 instances. To follow the steps below, you need at least two EC2 instances. You can refer to this article to learn about creating a new EC2 instance.

Steps for creating a new Application Load Balancer

On the EC2 homepage, select the Load Balancers option in the menu and click on the Create load balancer option.
Select Application Load Balancer as the type and click Create.
Give the load balancer a name, select Internet-facing as the scheme, and IPv4 as the IP address type.
Select the Availability Zones to which the load balancer will route traffic.
Select or create a security group for the load balancer. You can add inbound rules specific to your use case to allow traffic to the EC2 instances.
Create a new target group to which the load balancer will route the incoming traffic. In this example, we will be routing the traffic via the load balancer to two EC2 instances.
On the Create target group page, select Instances as the target type and add a name for the target group. You can keep the other settings as default.
On the Register targets page, select the EC2 instances to which the load balancer will route the incoming traffic and click on the Include as pending below option. After registering the targets, click on the Create target group option.
Select the newly created target group on the load balancer configuration page under the Listeners and routing section.
Click on the Create load balancer option to create a new load balancer that routes traffic to the selected target group based on the configuration.

After creating the load balancer, any traffic sent to its DNS name will be handled according to the listener rules defined on the load balancer. You can also specify custom rules inside the load balancer. Let's create a new custom rule to handle an error route in the application.

Creating a custom rule in Load Balancer

Select the load balancer and, under the Listeners and rules section, select the default HTTP:80 listener.
Click on the Add rule option and add a name for the custom rule.
In Define rule conditions, add a path-based condition to match the /error path.
Under Define rule actions, select the Return fixed response option and add a response body to be displayed when the /error route is accessed.
Set the rule priority to 1, click Next, and create the custom rule.

After creating a custom rule, if you access the /error path on the Load Balancer's DNS address, you will see the custom error response body as configured.
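
Conceptually, a listener evaluates its rules in priority order and applies the first one whose conditions match, falling back to the default rule. The Python sketch below models that behaviour for the /error rule; it is a toy model with hypothetical names, not the AWS API and not how ALB is implemented internally.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    priority: int
    path_pattern: str  # e.g. "/error"; "*" wildcards are allowed
    action: str        # free-form description of what the rule does


# Hypothetical rules mirroring the walkthrough above.
RULES = [
    Rule(priority=1, path_pattern="/error", action="fixed-response: custom error body"),
]
DEFAULT_ACTION = "forward: target group with two EC2 instances"


def resolve(path, rules=RULES, default=DEFAULT_ACTION):
    """Return the action of the first matching rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(path, rule.path_pattern):
            return rule.action
    return default


print(resolve("/error"))  # fixed-response: custom error body
print(resolve("/"))       # forward: target group with two EC2 instances
```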

In this way, you can create a load balancer and custom rules using AWS Elastic Load Balancer. You can refer to the official user guide to learn more about load balancing in AWS.

Scaling databases with AWS RDS and read replicas

Vivek Alhat — Sun, 30 Jun 2024 07:41:42 +0000

Amazon Web Services (AWS) Relational Database Service (RDS) is a managed service that simplifies the process of configuring, operating, and scaling a relational database in the cloud. AWS RDS is a versatile choice that supports several database engines including PostgreSQL, MySQL, Oracle, MariaDB, Microsoft SQL Server, and Amazon's own Aurora.

Advantages of using RDS over a self hosted database

Managed Service - AWS RDS is a fully managed service, which means that AWS takes care of provisioning, patching, backup, recovery, and even scaling. This allows developers and database administrators to focus on the application development and optimization rather than managing the underlying infrastructure.
Continuous Backups - RDS provides continuous backups based on a defined retention period, so your data can be restored when necessary. This automatic backup process greatly reduces the risk of data loss due to unforeseen events.
Point in Time Restore - With RDS, you can restore your database to any specific point within your backup retention period.
Read Replicas - RDS supports the creation of read-only replicas of the master database. Read replicas are ideal for read-heavy applications: distributing read traffic across them reduces the load on the primary database and improves performance.
Monitoring Dashboards - AWS CloudWatch integration allows you to monitor key metrics such as network traffic, CPU utilization, disk I/O, and memory usage without the need of any third-party tools.

Creating a new database

AWS RDS provides two methods for creating a new database.

Standard create - You set all the configuration options, including those for availability, security, backups, and maintenance. You have full control over how your database is configured.
Easy create - It uses recommended best-practice configurations. Some configurable options can be changed even after the database is created.

Let's create a new PostgreSQL database using the Easy create method.

Select Easy create method for database creation.
Select PostgreSQL as a database engine.
Select your preferred database instance size. You can select either production, dev/test, or free tier if applicable.
Add a name for your database instance and for the master user.
You can manage the credentials in AWS Secrets Manager or manage them yourself.
Click on Create database option to create your new PostgreSQL database.
It takes a couple of minutes to get a new database up and running on AWS cloud.

Auto Scaling RDS

RDS storage auto scaling dynamically increases the storage capacity of your database instance without manual intervention. This feature is particularly useful for applications with unpredictable workloads. You can set a Maximum Storage Threshold, which is the upper limit for database storage, to prevent unbounded scaling.

RDS scales storage automatically when all of the following conditions are met:

Free storage is less than 10% of the allocated storage.
The low-storage condition lasts for at least 5 minutes.
At least 6 hours have passed since the last storage modification.
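
The conditions above can be expressed as a small predicate. This is only an illustrative Python sketch with hypothetical parameter names; RDS makes this decision internally, and no such function exists in the AWS API.

```python
from datetime import datetime, timedelta


def should_autoscale(free_gib, allocated_gib, low_since, last_modified, now, max_gib):
    """Toy check of the three conditions listed above (not an AWS API)."""
    low_storage = free_gib / allocated_gib < 0.10
    low_long_enough = (now - low_since) >= timedelta(minutes=5)
    cooled_down = (now - last_modified) >= timedelta(hours=6)
    below_cap = allocated_gib < max_gib  # respect the maximum storage threshold
    return low_storage and low_long_enough and cooled_down and below_cap


now = datetime(2024, 6, 30, 12, 0)
decision = should_autoscale(
    free_gib=8,
    allocated_gib=100,
    low_since=now - timedelta(minutes=10),
    last_modified=now - timedelta(hours=7),
    now=now,
    max_gib=500,
)
print(decision)  # True: 8% free for 10 minutes, last change 7 hours ago
```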

Read Replica

Read replicas are one of the most powerful features of RDS. They allow you to create up to 15 read-only copies of your primary database. These replicas can be located within the same availability zone (AZ), across different AZs, or even in different regions.

Key benefits of using Read Replicas:

You can offload read traffic to read replicas. This significantly reduces the load on the primary database and enhances overall performance.
Read replicas can be promoted to a standalone database in case of a failure of the primary database. It helps in disaster recovery.
You can scale your read-heavy applications by distributing read traffic across multiple replicas.

Consider the following when creating a replica:

Data replication to read replicas is asynchronous. This means there can be a slight delay, known as replication lag, before changes in the primary database are reflected in the replicas.
Applications must use the specific connection strings of the read replicas to route read traffic appropriately.
Replication within the same region is free, but cross-region replication incurs additional cost.
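
As a rough illustration of the first two points, an application can pick an endpoint per query. The Python sketch below uses made-up endpoint names and a hypothetical endpoint_for helper; real endpoints are listed in the RDS console for each instance.

```python
from itertools import cycle

# Hypothetical endpoints, invented for this example.
PRIMARY = "mydb.example.internal:5432"
REPLICAS = cycle([
    "mydb-replica-1.example.internal:5432",
    "mydb-replica-2.example.internal:5432",
])


def endpoint_for(sql, needs_fresh_data=False):
    """Send plain SELECTs to a replica and everything else to the primary."""
    is_read = sql.lstrip().lower().startswith("select")
    if is_read and not needs_fresh_data:
        return next(REPLICAS)  # rotate across the replicas
    # Writes, and reads that cannot tolerate replication lag, use the primary.
    return PRIMARY


print(endpoint_for("SELECT * FROM orders"))
print(endpoint_for("INSERT INTO orders VALUES (1)"))
print(endpoint_for("SELECT * FROM orders WHERE id = 1", needs_fresh_data=True))
```

The needs_fresh_data flag is one simple way to handle replication lag: a read that must see a write made moments ago goes to the primary instead of a replica.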

Let's create a read replica of the database that we created before.

Select Create read replica option from the actions menu of database homepage.
Select the replica source and add a replica database instance identifier.
You can select the destination region where the replica will be launched.
Configure other settings as required and click on Create read replica option to create a read replica of your primary database instance.

RDS Multi AZ for Disaster Recovery

RDS multi AZ deployments provide high availability and reliability for database instances. In a multi AZ setup, any changes made to the primary database are synchronously replicated to a standby instance in a different availability zone (AZ). This ensures that the standby instance in another AZ is always up-to-date with the primary database.

Key features of multi AZ deployments:

Automatic Failover - In the event of a failure in the primary database, RDS automatically switches to the standby instance in another AZ. This ensures continuous operation with minimal downtime.
High Availability - Multi AZ deployments are designed for high availability and disaster recovery. They are not meant for scaling read operations. However, you can combine a multi AZ setup with read replicas for additional disaster recovery capabilities.
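
The difference between synchronous replication to a standby and asynchronous replication to a read replica can be illustrated with a toy in-memory model. This is only a sketch of the idea and says nothing about how RDS works internally; the `ToyPrimary` class and its method names are made up for illustration.

```typescript
// Toy model: the standby is updated as part of every write (synchronous),
// while the read replica only catches up in a separate step (asynchronous).
class ToyPrimary {
  private log: string[] = [];
  private standbyLog: string[] = [];
  private replicaLog: string[] = [];

  // A write completes only once the standby has the entry too.
  write(entry: string): void {
    this.log.push(entry);
    this.standbyLog.push(entry);
  }

  // The replica receives changes later, which is where replication lag comes from.
  shipToReplica(): void {
    this.replicaLog = this.log.slice();
  }

  // Number of entries the standby is missing.
  standbyLag(): number {
    return this.log.length - this.standbyLog.length;
  }

  // Number of entries the read replica is missing.
  replicaLag(): number {
    return this.log.length - this.replicaLog.length;
  }
}

const db = new ToyPrimary();
db.write("INSERT 1");
db.write("INSERT 2");
const standbyBehind = db.standbyLag(); // 0: the standby never falls behind
const replicaBehind = db.replicaLag(); // 2: the replica has not caught up yet
db.shipToReplica();                    // after this, replicaLag() returns 0
```

In this model a failover to the standby loses no acknowledged writes, whereas promoting the replica before it catches up would.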

You can refer to the official user guide to learn more about AWS RDS.

Beginners guide to AWS EC2

Vivek Alhat — Sun, 31 Mar 2024 11:04:33 +0000

AWS EC2 stands for Elastic Compute Cloud. It is one of the most popular and widely used services offered by AWS. EC2 is a foundational part of the AWS ecosystem. It offers flexible and scalable infrastructure for businesses to run their workloads in the cloud.

With EC2, users can provision virtual servers within minutes, choosing from diverse instance types tailored to specific workload requirements. Before EC2, managing computing resources posed challenges such as upfront investments in hardware and inefficient scaling. With EC2, businesses can provision servers on demand and pay only for what they use, which leads to increased agility and scalability.

EC2 provides the following categories of instance types:

General purpose
Compute optimized
Memory optimized
Storage optimized
Accelerated computing

You can find more details about the different instance types in the official EC2 documentation.

Launching a new instance

To launch a new EC2 instance, go to the EC2 dashboard.

Click the “Launch instances” button.

Give the EC2 instance a name. Under the “Application and OS images (AMI)” section, select any available AMI. I have selected “Amazon Linux” as the AMI for this instance. An AMI, or Amazon Machine Image, is a pre-configured template used to create virtual machines (instances) within EC2. It serves as a blueprint for launching EC2 instances, providing the necessary operating system, software packages, configurations, and even data stored on the instance’s root volume.

Select an appropriate instance type for your use case and workload. For the current instance I am going with the t2.micro instance type. Create a new key pair for login. A key pair is a set of security credentials that you use to prove your identity when connecting to the EC2 instance.

Under the “Network settings”, you can configure a firewall and network settings for the EC2 instance. You can create a security group and specify inbound and outbound traffic rules to restrict access to the EC2 instance. We will discuss more about security groups in later sections.

You can also configure storage for the EC2 instance. AWS offers EBS (Elastic Block Store), an easy-to-use, scalable block storage service that you can use with EC2. For now, I am keeping the default configuration for storage.

Once you are done configuring the instance, click the “Launch instance” button to start the instance. After the instance is successfully launched, you can see its details in the “Instances” section of the EC2 dashboard.
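
For readers who later want to script the same launch, the console choices above map onto fields of the EC2 RunInstances request. The sketch below only builds such a request as a plain object and does not call AWS. The `LaunchRequest` interface is a hand-written subset for illustration, `buildLaunchRequest` is a hypothetical helper, and the AMI ID, key pair name, and security group ID are placeholders.

```typescript
// Hand-written subset of the RunInstances request shape, for illustration only.
interface LaunchRequest {
  ImageId: string;            // the AMI selected in the console
  InstanceType: string;       // e.g. "t2.micro"
  KeyName: string;            // name of the key pair
  SecurityGroupIds: string[]; // firewall rules to attach
  MinCount: number;
  MaxCount: number;
  TagSpecifications: {
    ResourceType: "instance";
    Tags: { Key: string; Value: string }[];
  }[];
}

// Hypothetical helper: collects the console choices into one request object.
function buildLaunchRequest(
  instanceName: string,
  amiId: string,
  keyName: string,
  securityGroupId: string
): LaunchRequest {
  return {
    ImageId: amiId,
    InstanceType: "t2.micro",
    KeyName: keyName,
    SecurityGroupIds: [securityGroupId],
    MinCount: 1, // launch exactly one instance
    MaxCount: 1,
    // The instance name shown in the console is stored as a "Name" tag.
    TagSpecifications: [
      { ResourceType: "instance", Tags: [{ Key: "Name", Value: instanceName }] },
    ],
  };
}

// Placeholder values; substitute real IDs from your own account.
const request = buildLaunchRequest("my-first-instance", "ami-EXAMPLE", "my-key-pair", "sg-EXAMPLE");
```

Storage is left out here, which matches keeping the default storage configuration in the walkthrough.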

Instance Operations

You can stop, terminate, or reboot an EC2 instance from the dashboard by selecting the instance and clicking the “Instance state” option.

Security Groups

A security group acts as a virtual firewall for EC2. Security groups are used to control the inbound and outbound traffic for an EC2 instance.

Inbound rules: these rules define the incoming traffic allowed to reach the EC2 instances. You can configure inbound rules to permit specific IP addresses, ranges, or protocols (such as SSH for remote access or HTTP for web traffic).
Outbound rules: these rules define the traffic allowed to leave the EC2 instances. These rules control the communication initiated by the instances.

A security group can be used with multiple EC2 instances, and it can also reference another security group. Security groups are region/VPC specific, which means that a security group created in ap-south-1 cannot be used in the ap-northeast-3 region.
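
To make the inbound rule idea concrete, here is a small sketch of how a set of allow rules can be evaluated against an incoming connection. It is a simplified model, not AWS code: the `InboundRule` shape and the function names are hypothetical, only IPv4 CIDR sources are handled, and references to other security groups are left out. It relies on the fact that security group rules only allow traffic, so anything that matches no rule is dropped.

```typescript
interface InboundRule {
  protocol: "tcp" | "udp";
  fromPort: number;
  toPort: number;
  cidr: string; // allowed source range, e.g. "203.0.113.0/24"
}

// Convert dotted IPv4 text into an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  return ip
    .split(".")
    .reduce((acc, octet) => ((acc << 8) | parseInt(octet, 10)) >>> 0, 0);
}

// True if the address falls inside the CIDR range.
function cidrContains(cidr: string, ip: string): boolean {
  const [base = "", bitsText = "32"] = cidr.split("/");
  const bits = parseInt(bitsText, 10);
  // A /0 range matches everything; it is handled separately because
  // JavaScript shift counts wrap at 32.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(base) & mask) >>> 0) === ((ipv4ToInt(ip) & mask) >>> 0);
}

// Traffic is allowed only if at least one rule matches it.
function isInboundAllowed(
  rules: InboundRule[],
  protocol: "tcp" | "udp",
  port: number,
  sourceIp: string
): boolean {
  return rules.some(
    (r) =>
      r.protocol === protocol &&
      port >= r.fromPort &&
      port <= r.toPort &&
      cidrContains(r.cidr, sourceIp)
  );
}

const rules: InboundRule[] = [
  { protocol: "tcp", fromPort: 22, toPort: 22, cidr: "203.0.113.0/24" }, // SSH from one range
  { protocol: "tcp", fromPort: 80, toPort: 80, cidr: "0.0.0.0/0" },      // HTTP from anywhere
];

const sshFromOffice = isInboundAllowed(rules, "tcp", 22, "203.0.113.5");   // true
const sshFromElsewhere = isInboundAllowed(rules, "tcp", 22, "198.51.100.7"); // false
```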

Connecting to EC2 instance

Primarily there are two ways of connecting to an EC2 instance.

SSH: You can connect to the instance using SSH. To connect using SSH, you need to prove your identity using the key pair that we created when launching the instance.
EC2 Instance Connect: With this option, you can connect to the instance using a browser-based SSH client.

To view more options for connecting to the instance, click the “Connect” button after selecting the instance from the dashboard page.
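
As a rough sketch of the SSH option, the helper below assembles the arguments for the OpenSSH client from the private key file and the instance address. The `sshArgs` function, the key file name, and the host name are placeholders for illustration. The default user `ec2-user` applies to Amazon Linux AMIs; other AMIs use different user names.

```typescript
// Builds the argument list for the `ssh` command.
// "ec2-user" is the default login user on Amazon Linux AMIs.
function sshArgs(keyFile: string, host: string, user: string = "ec2-user"): string[] {
  return ["-i", keyFile, `${user}@${host}`];
}

// Placeholder key file and public DNS name; use the values for your own instance.
const sshCommand = ["ssh"]
  .concat(sshArgs("my-key-pair.pem", "ec2-203-0-113-25.compute-1.amazonaws.com"))
  .join(" ");
// sshCommand is:
// ssh -i my-key-pair.pem ec2-user@ec2-203-0-113-25.compute-1.amazonaws.com
```

The SSH client refuses private key files that other users can read, so the key file permissions usually need to be restricted before the first connection.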
