DEV Community: nayanraj adhikary

Building Your Own GitHub Copilot: A Step-by-Step Guide to Code Completion Tools

nayanraj adhikary — Sat, 14 Sep 2024 03:30:52 +0000

Ever thought building a code completion tool like GitHub Copilot was complex? Surprisingly, it’s not as hard as it seems!

As an engineer, I’ve always been fascinated by how code completion tools work under the hood. So, I reverse-engineered the process to see if I could build one myself.

Here is the one I built and published myself: LLM-Autocompleter.

With AI-assisted tools becoming the norm in software development, creating your own code completion tool is a great way to learn about Language Server Protocol (LSP), APIs, and integration with advanced models like OpenAI's GPT. Plus, it’s an incredibly rewarding project.

Code completion tools essentially combine a Language Server Protocol (LSP) server with inline code completion mechanisms from platforms like VS Code. In this tutorial, we'll leverage VS Code’s inline completion API and build our own LSP server.

Before we dive in, let's understand what an LSP Server is.

Language Server Protocol (LSP)

An LSP server is a backend service that provides language-specific features to text editors or Integrated Development Environments (IDEs). It acts as a bridge between the editor (the client) and the language-specific tools, delivering features like:

Code completion (suggesting snippets of code as you type),
Go-to definition (navigating to the part of the code where a symbol is defined),
Error checking (highlighting syntax errors in real-time).

The idea behind the Language Server Protocol (LSP) is to standardize how such servers and development tools communicate. This way, a single language server can be reused across multiple development tools. Keep in mind that LSP itself is only a protocol; the server is what implements the features.

By standardizing how these servers communicate with editors through LSP, developers can create language-specific features that work seamlessly across a variety of platforms, like VS Code, Sublime Text, and even Vim.
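
To make this concrete, here is a minimal sketch of the kind of messages an editor and a language server exchange. LSP messages are JSON-RPC 2.0 objects, and `initialize` is the first request defined by the specification. The helper names (`makeRequest`, `makeResponse`) and the sample values are assumptions made up for illustration.

```typescript
// Shapes of the JSON-RPC 2.0 messages that LSP is built on.
interface RequestMessage {
    jsonrpc: "2.0";
    id: number;
    method: string;
    params?: unknown;
}

interface ResponseMessage {
    jsonrpc: "2.0";
    id: number;
    result: unknown;
}

// Hypothetical helper: builds a request the editor (client) would send.
function makeRequest(id: number, method: string, params?: unknown): RequestMessage {
    return { jsonrpc: "2.0", id, method, params };
}

// Hypothetical helper: builds the server's reply. The id must echo the request's id
// so the client can match the answer to the question.
function makeResponse(request: RequestMessage, result: unknown): ResponseMessage {
    return { jsonrpc: "2.0", id: request.id, result };
}

const initializeRequest = makeRequest(1, "initialize", {
    processId: null,
    rootUri: null,
    capabilities: {},
});
const initializeResponse = makeResponse(initializeRequest, { capabilities: {} });

console.log(JSON.stringify(initializeRequest));
console.log(JSON.stringify(initializeResponse));
```

Because both sides only agree on these message shapes, any editor that speaks the protocol can talk to the same server.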

Now that you understand the basics of LSP, let’s dive into building our own code completion tool, step by step.

We’ll begin by using a sample inline completion extension provided by VS Code. You can clone it directly from GitHub:

vscode-sample-inlinecompletion

Now let's set up the LSP server. You can follow the structure below:



.
├── client // Language Client
│   ├── src
│   │   ├── test // End to End tests for Language Client / Server
│   │   └── extension.ts // Language Client entry point
├── package.json // The extension manifest.
└── server // Language Server
    └── src
        └── server.ts // Language Server entry point

For more information, take a look at lsp-sample as well.

Code

I will give you bits of code, and you will have to stitch them together yourself, because I want you to learn along the way. The image below shows what we are going to build.

Let's go to client/src/extension.ts and remove everything from the activate function:



export function activate(context: ExtensionContext) {
}

Let's start the setup.

Create an LSP client and start it:

serverModule: Points to the path of the language server’s main script.
debugOptions: Useful for running the server in debug mode.

extension.ts



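import * as path from "path";
import { ExtensionContext } from "vscode";
import {
    LanguageClient,
    LanguageClientOptions,
    ServerOptions,
    TransportKind,
} from "vscode-languageclient/node";

let client: LanguageClient;

export function activate(context: ExtensionContext) {
    // Absolute path to the compiled language server entry point
    const serverModule = context.asAbsolutePath(
        path.join("server", "out", "server.js")
    );

    // Lets a debugger attach to the server process on port 6009
    const debugOptions = { execArgv: ["--nolazy", "--inspect=6009"] };

    // Communicate with the server over stdio
    const serverOptions: ServerOptions = {
        run: {
            module: serverModule,
            transport: TransportKind.stdio,
        },
        debug: {
            module: serverModule,
            transport: TransportKind.stdio,
            options: debugOptions,
        },
    };

    const clientOptions: LanguageClientOptions = {
        documentSelector: [{ scheme: "file" }],
        // serverConfiguration is your own settings object, defined elsewhere
        initializationOptions: serverConfiguration,
    };

    client = new LanguageClient(
        "LSP Server Name",
        serverOptions,
        clientOptions
    );

    client.start();
}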

Next, receive data on the LSP server. Go to server/src/server.ts.

Some background information:

There are different transports we can use to communicate between the server and the client.
For more information, see microsoft-lsp-docs.

Why stdio? Stdio is one of the most widely supported transports between clients and servers. It allows the LSP server we’re building to work not only in VS Code but also in other editors like Vim and Sublime Text.

server.ts



const methodStore: Record<string, any> = {
    exit,
    initialize,
    shutdown,
};

// Raw bytes received so far. Content-Length counts bytes, not characters,
// so the buffer is kept as a Buffer rather than a string.
let buffer: Buffer = Buffer.alloc(0);

process.stdin.on("data", async (bufferChunk: Buffer) => {
    buffer = Buffer.concat([buffer, bufferChunk]);

    while (true) {
        // The header block ends at the first blank line
        const headerEnd = buffer.indexOf("\r\n\r\n");
        if (headerEnd === -1) break;

        // Check for the Content-Length line
        const header = buffer.subarray(0, headerEnd).toString("ascii");
        const lengthMatch = header.match(/Content-Length: (\d+)/);
        if (!lengthMatch) {
            // Drop a header block we cannot interpret
            buffer = buffer.subarray(headerEnd + 4);
            continue;
        }

        const contentLength = parseInt(lengthMatch[1], 10);
        const messageStart = headerEnd + 4;

        // Wait for more data unless the full message is in the buffer
        if (buffer.length < messageStart + contentLength) break;

        const rawMessage = buffer
            .subarray(messageStart, messageStart + contentLength)
            .toString("utf-8");

        // Consume the message before handling it, so an error (or another
        // "data" event during the await) never reprocesses the same bytes
        buffer = buffer.subarray(messageStart + contentLength);

        try {
            const message = JSON.parse(rawMessage);
            const method = methodStore[message.method];

            if (method) {
                const result = await method(message);

                if (result !== undefined) {
                    respond(message.id, result);
                }
            }
        } catch (error: any) {
            const errorMessage = {
                jsonrpc: "2.0",
                method: "window/showMessage",
                params: {
                    type: 1, // Error type
                    message: `Error processing request: ${error.message}`
                }
            };

            const errorNotification = JSON.stringify(errorMessage);
            const errorNotificationLength = Buffer.byteLength(errorNotification, "utf-8");
            const errorHeader = `Content-Length: ${errorNotificationLength}\r\n\r\n`;

            process.stdout.write(errorHeader + errorNotification);
        }
    }
});
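
The message loop calls a `respond` helper that is not shown in this post. A minimal sketch of what it could look like follows. The names `frameMessage` and `createResponder` are introduced here for illustration only, and your own helper may differ. The detail that matters is that `Content-Length` counts UTF-8 bytes, not characters.

```typescript
// Hypothetical helper: wraps a JSON-RPC payload in the LSP base-protocol header.
function frameMessage(payload: object): string {
    const messageBody = JSON.stringify(payload);
    // Content-Length is the size of the body in UTF-8 bytes, not in characters
    const byteLength = new TextEncoder().encode(messageBody).length;
    return `Content-Length: ${byteLength}\r\n\r\n${messageBody}`;
}

// Hypothetical factory: returns a respond(id, result) function that echoes the
// request id and hands the framed response to the given writer.
function createResponder(write: (chunk: string) => void) {
    return (id: number | string | null, result: unknown): void => {
        write(frameMessage({ jsonrpc: "2.0", id, result }));
    };
}

// In server.ts this could be wired to stdout:
// const respond = createResponder((chunk) => process.stdout.write(chunk));
```

Passing the writer in keeps the framing logic easy to test without touching real stdio.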

initialize.ts



import { InitializeResult, RequestMessage, TextDocumentSyncKind } from "vscode-languageserver";

// Tells the client which features this server supports
export const initialize = (message: RequestMessage): InitializeResult => {
    return {
        capabilities: {
            completionProvider: {
                resolveProvider: true
            },
            textDocumentSync: TextDocumentSyncKind.Incremental,
            codeActionProvider: {
                resolveProvider: true
            }
        },
        serverInfo: {
            name: "LSP-Server",
            version: "1.0.0",
        },
    };
};

exit.ts



export const exit = () => {
    process.exit(0);
};

shutdown.ts



export const shutdown = () => {
    return null;
};

Once the basic functions are in place, you can run VS Code in debugging mode by pressing F5, or follow the debugging-guide.

Now let's add the inline provider and wire up the request and response.

Let's add a new method to the methodStore:

server.ts



const methodStore: Record<string, any> = {
  exit, 
  initialize,
  shutdown,
  "textDocument/generation": generation
};

generation.ts



import OpenAI from "openai";
import { systemPrompt } from "./template";

// Assumption: the API key is provided through the OPENAI_API_KEY environment variable
const openAIClient = new OpenAI();

export const generation = async (message: any) => {
    // Ignore requests that carry no document text
    const text = message?.params?.textDocument?.text as string | undefined;
    if (!text) return {};

    const { line, character } = message.params.position;
    const cursorText = getNewCursorText(text, line, character);

    const response = await getResponseFromOpenAI(cursorText, message.params.fsPath);

    return {
        generatedText: response,
    };
};

// Marks the cursor position in the document with a <CURSOR> placeholder
function getNewCursorText(text: string, line: number, character: number): string {
    const lines = text.split('\n');
    if (line < 0 || line >= lines.length) return text;

    const targetLine = lines[line];
    if (character < 0 || character > targetLine.length) return text;

    lines[line] = targetLine.slice(0, character) + '<CURSOR>' + targetLine.slice(character);
    return lines.join('\n');
}

const getResponseFromOpenAI = async (text: string, fsPath: string): Promise<string> => {
    const message = {
        role: "user",
        content: text,
    };

    const systemMetaData = {
        max_tokens: 128,
        max_context: 1024,
        messages: [],
        fsPath: fsPath,
    };

    const messages = [systemPrompt(systemMetaData), message] as OpenAI.Chat.ChatCompletionMessageParam[];

    const chatCompletion = await openAIClient.chat.completions.create({
        messages: messages,
        model: "gpt-3.5-turbo",
        max_tokens: systemMetaData.max_tokens,
    });

    // Fall back to an empty completion when the model returns nothing
    const generatedResponse = chatCompletion.choices[0]?.message.content;

    return generatedResponse ?? "";
};
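
To see what the model actually receives, here is a small usage sketch of `getNewCursorText` (the function is repeated so the snippet runs on its own). The sample source text and cursor position are made up for illustration.

```typescript
function getNewCursorText(text: string, line: number, character: number): string {
    const lines = text.split("\n");
    if (line < 0 || line >= lines.length) return text;

    const targetLine = lines[line];
    if (character < 0 || character > targetLine.length) return text;

    lines[line] = targetLine.slice(0, character) + "<CURSOR>" + targetLine.slice(character);
    return lines.join("\n");
}

const sampleSource = "function add(a, b) {\n  return \n}";

// The cursor sits at the end of line 1 (zero-based), right after "return "
const marked = getNewCursorText(sampleSource, 1, 9);
console.log(marked);
// function add(a, b) {
//   return <CURSOR>
// }

// Out-of-range positions leave the text untouched
console.log(getNewCursorText(sampleSource, 10, 0) === sampleSource); // true
```

The marked text is what ends up as the user message, so the model knows exactly where the completion belongs.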

template.ts



export interface Parameters {
    max_tokens: number;
    max_context: number;
    messages: any[];
    fsPath: string;
}

// Builds the system message; the file path in META_DATA hints at the language in use
export const systemPrompt = (parameters: Parameters | null) => {
    return {
        "role": "system",
        "content": `
        Instructions:
            - You are an AI programming assistant.
            - Given a piece of code with the cursor location marked by <CURSOR>, replace <CURSOR> with the correct code.
            - First, think step-by-step.
            - Describe your plan for what to build in pseudocode, written out in great detail.
            - Then output the code replacing the <CURSOR>.
            - Ensure that your completion fits within the language context of the provided code snippet.
            - Keep the completion to what is needed: do not write more than 1 or 2 lines unless the <CURSOR> is at the start of a function, class, or control statement (if, switch, for, while).

            Rules:
            - Only respond with code.
            - Only replace <CURSOR>; do not include any previously written code.
            - Never include <CURSOR> in your response.
            - Handle ambiguous cases by providing the most contextually appropriate completion.
            - Be consistent with your responses.
            - You should only generate code in the language specified in the META_DATA.
            - Never mix text with code.
            - Your code should have appropriate spacing.

            META_DATA:
            ${parameters?.fsPath}`
    };
};

Let's now register the inline providers

extension.ts




import { InlineCompletionItem, Position, TextDocument, languages, window, workspace } from "vscode";
import { CancellationToken } from "vscode-languageclient/node";

// EXTENSION_ID, client, and the ExtensionContext import come from the rest of extension.ts

function getConfiguration(configName: string): Record<string, any> | null {
    const config = workspace.getConfiguration(EXTENSION_ID).get<Record<string, any>>(configName);
    // Treat a missing or empty configuration object as "not configured"
    if (config && Object.keys(config).length > 0) {
        return config;
    }
    return null;
}

const inLineCompletionConfig = getConfiguration("inlineCompletionConfiguration");

export function activate(context: ExtensionContext) {
    // OTHER CODE

    languages.registerInlineCompletionItemProvider(
        { pattern: "**" },
        {
            provideInlineCompletionItems: (document: TextDocument, position: Position) => {
                const mode = inLineCompletionConfig?.["mode"] || 'slow';
                return provideInlineCompletionItems(document, position, mode);
            },
        }
    );
}

let lastInlineCompletion = Date.now();
let lastPosition: Position | null = null;
let inlineCompletionRequestCounter = 0;

const provideInlineCompletionItems = async (document: TextDocument, position: Position, mode: 'fast' | 'slow') => {
    const params = {
        textDocument: {
            uri: document.uri.toString(),
            text: document.getText(),
        },
        position: position,
        fsPath: document.uri.fsPath.toString()
    };

    inlineCompletionRequestCounter += 1;
    const localInCompletionRequestCounter = inlineCompletionRequestCounter;
    const timeSinceLastCompletion = (Date.now() - lastInlineCompletion) / 1000;
    // Assumed fallback: 1 completion per second when the setting is missing
    const maxPerSecond = inLineCompletionConfig?.["maxCompletionsPerSecond"] ?? 1;
    const minInterval = mode === 'fast' ? 0 : 1 / maxPerSecond;

    // Throttle: wait until the minimum interval since the last completion has passed
    if (timeSinceLastCompletion < minInterval) {
        await new Promise(r => setTimeout(r, (minInterval - timeSinceLastCompletion) * 1000));
    }

    // Only the most recent request goes to the server; older ones return nothing
    if (inlineCompletionRequestCounter !== localInCompletionRequestCounter) {
        return [];
    }

    lastInlineCompletion = Date.now();

    let cancelRequest = CancellationToken.None;
    if (lastPosition && position.isAfter(lastPosition)) {
        cancelRequest = CancellationToken.Cancelled;
    }
    lastPosition = position;

    try {
        const result = await client.sendRequest<{ generatedText?: string }>(
            "textDocument/generation",
            params,
            cancelRequest
        );

        if (!result?.generatedText) return [];

        // Insert as plain text so "$" in generated code is not read as a snippet placeholder
        return [new InlineCompletionItem(result.generatedText)];
    } catch (error: any) {
        console.error("Error during inline completion request", error);
        // Show the error in the editor itself
        window.showErrorMessage("An error occurred during inline completion: " + error.message);
        return [];
    }
};

This blog provides the foundation you need to build your own code completion tool, but the journey doesn’t end here. I encourage you to experiment, research, and improve upon this code, exploring different features of LSP and AI to tailor the tool to your needs.

I want anyone implementing this to learn, research, and stitch things together on their own.

What You've Learned

Understanding LSP Servers: You’ve learned what an LSP server is, how it powers language-specific tools, and why it’s critical for cross-editor support.
Building VS Code Extensions: You’ve explored how to integrate code completions into VS Code using APIs.
AI-Driven Code Completion: By connecting to OpenAI’s GPT models, you’ve seen how machine learning can enhance developer productivity with intelligent suggestions.

If you have read this far, I would love to know what you learned.

Please hit like if you learned something new from my blog today.

Connect with me on LinkedIn.

Boost Performance: Essential Caching Strategies for Web and Mobile

nayanraj adhikary — Sun, 14 Jul 2024 15:16:29 +0000

Introduction

Caching is a game-changer for enhancing the speed and responsiveness of web and mobile applications. In this blog, we’ll explore essential caching strategies for frontend applications, tackle large data handling, and delve into the intricacies of Backward/Forward (B/F) caching.

Key Caching Strategies for Frontend Applications

Browser Caching

Browser caching leverages the browser's ability to store copies of web assets locally, reducing load times and server requests. Here are some crucial aspects:

Cache-Control: This HTTP header dictates the caching policies. For example, Cache-Control: max-age=3600 tells the browser to cache the resource for 3600 seconds.
Expires: This header specifies an exact expiration date/time for the cached resource. It's often used alongside Cache-Control; when both are present, Cache-Control: max-age takes precedence.
ETag: The ETag header provides a unique identifier for resource versions. When a resource changes, its ETag changes, enabling efficient cache validation.

Cache-Control: public, max-age=86400
Expires: Mon, 21 Oct 2024 07:28:00 GMT
ETag: "33a64df5"
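
How these headers interact on a repeat visit can be sketched in a few lines. The function below is a simplified, hypothetical server-side check (`conditionalGet` and its parameter names do not come from any library). Once the cached copy expires, the browser sends the stored ETag back in the `If-None-Match` request header, and the server answers `304 Not Modified` with no body if the resource is unchanged.

```typescript
interface CachedReply {
    statusCode: 200 | 304;
    headers: Record<string, string>;
    body: string | null;
}

// Simplified validation: compares a single ETag value exactly.
// Real servers also handle lists of ETags and weak validators.
function conditionalGet(
    ifNoneMatch: string | undefined,
    currentETag: string,
    content: string
): CachedReply {
    const headers = { "Cache-Control": "public, max-age=86400", ETag: currentETag };

    if (ifNoneMatch === currentETag) {
        // The browser's copy is still valid, so no body is sent
        return { statusCode: 304, headers, body: null };
    }
    // First visit or changed resource: send the full content
    return { statusCode: 200, headers, body: content };
}

const firstVisit = conditionalGet(undefined, '"33a64df5"', "<html>v1</html>");
const revalidation = conditionalGet('"33a64df5"', '"33a64df5"', "<html>v1</html>");
console.log(firstVisit.statusCode, revalidation.statusCode);
```

The 304 path is where the saving comes from: the response carries headers only, and the browser keeps using its local copy.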

Service Workers

Service Workers are scripts that run in the background, providing advanced caching capabilities. They can intercept network requests and serve cached responses, even allowing offline access.

Cache First: Serve from cache if available; if not, fetch from the network.
Network First: Fetch from the network first; if the network is unavailable, serve from cache.
Stale-While-Revalidate: Serve from cache and simultaneously fetch and update the cache in the background.

self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request).then(response => {
      return response || fetch(event.request);
    })
  );
});
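
The handler above implements the Cache First strategy. Stale-While-Revalidate can be sketched in the same spirit. To keep the example runnable outside a browser, this version uses a plain `Map` in place of the Cache API and a caller-supplied `fetcher` function. Both of these, along with the name `staleWhileRevalidate`, are assumptions for illustration.

```typescript
type Fetcher = (url: string) => Promise<string>;

// Stand-in for the browser's Cache API
const swrCache = new Map<string, string>();

// Returns the cached copy immediately when there is one, and always starts a
// background fetch that refreshes the cache for the next caller.
async function staleWhileRevalidate(url: string, fetcher: Fetcher): Promise<string> {
    const cached = swrCache.get(url);

    const refresh = fetcher(url).then((fresh) => {
        swrCache.set(url, fresh);
        return fresh;
    });

    if (cached !== undefined) {
        // Serve the stale copy now; a failed refresh simply keeps the old entry
        refresh.catch(() => undefined);
        return cached;
    }

    // Nothing cached yet, so the caller has to wait for the network
    return refresh;
}
```

In a real service worker, the same flow would use `caches.open(...)`, `cache.match(...)`, and `cache.put(...)` inside the `fetch` event handler.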

Local Storage and IndexedDB

Local Storage and IndexedDB are browser-based storage solutions for persisting data on the client side.

Local Storage: Ideal for storing small amounts of data as key-value pairs. It is synchronous and has a storage limit of about 5MB.
IndexedDB: Suitable for storing larger amounts of structured data. It supports transactions and complex queries, making it ideal for more substantial and complex data.

Example

Local Storage

localStorage.setItem('key', 'value');
let value = localStorage.getItem('key');

IndexedDB

let request = indexedDB.open('database', 1);
request.onupgradeneeded = event => {
  let db = event.target.result;
  db.createObjectStore('store', { keyPath: 'id' });
};

The browser itself also has built-in caching techniques. Here is one of them.

Deep Dive into Backward/Forward (B/F) Caching

What is B/F Caching?

B/F caching refers to the mechanism where browsers store the state of a web page in the browser's history, enabling users to navigate back and forth without reloading the entire page.

Most browsers support it, and you can explore it from the browser's developer tools (Inspect).

How B/F Caching Works

Page Cache: The browser stores the complete state of the page, including the DOM, JavaScript context, and in-memory data.
BFCache: Modern browsers (like Chrome and Firefox) use BFCache to preserve the page state in memory, which allows instant navigation.

Benefits of B/F Caching

Faster Navigation: Instant page loads when using the browser's back and forward buttons.
Improved User Experience: Seamless transitions enhance the overall user experience.
Reduced Server Load: Fewer requests to the server as the page state is stored and reused.
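
Browsers report whether a page was restored from the back/forward cache through the `persisted` flag on the `pageshow` event. The sketch below keeps the decision in a small function so it can run anywhere. The function name and the returned labels are illustrative assumptions.

```typescript
// Only the field we need from the browser's PageTransitionEvent
interface PageShowLike {
    persisted: boolean;
}

// Decides what happened when a page became visible.
function onPageShow(pageEvent: PageShowLike): "restored-from-bfcache" | "fresh-load" {
    if (pageEvent.persisted) {
        // The page state was restored from memory, so anything
        // time-sensitive (prices, session state) may need a refresh
        return "restored-from-bfcache";
    }
    return "fresh-load";
}

// In the browser this could be wired up as:
// window.addEventListener("pageshow", (e) => console.log(onPageShow(e)));

console.log(onPageShow({ persisted: true }));  // restored-from-bfcache
console.log(onPageShow({ persisted: false })); // fresh-load
```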

Conclusion

Implementing efficient caching strategies can dramatically improve the performance of web and mobile applications. From browser caching and service workers to tackling large data and utilizing B/F caching, these techniques ensure your apps are fast, responsive, and user-friendly. Start leveraging these strategies today to revolutionize your app’s performance!

I hope you learned something new from this blog. Follow me for short, crisp, deep, unique tech blogs. Thanks!

Mastering Caching in Distributed Systems: Strategies for Consistency and Scalability

nayanraj adhikary — Sun, 30 Jun 2024 14:57:57 +0000

Handling Caching in a Distributed System is difficult but not impossible.

This is going to be long, but informative.

I will refer to a distributed system as DS.

For a Basic Understanding of Caching refer to my previous blogs

Let's not waste any more time here and deep dive into it.

What you are going to learn here

Benefits of caching in DS (performance, latency reduction, load balancing)
Handling Consistency in DS
Ensuring Performance in DS
Ensuring Availability in DS
Implementing Caching at Scale
Real-World Examples (Netflix, Facebook, Twitter [X.com])

Benefits of Caching

Performance

Caching significantly enhances the performance of distributed systems by storing frequently accessed data in a faster, more accessible location. This reduces the need to fetch data from slower, more distant data sources, such as databases or external services. The performance benefits include:

Reduced Data Retrieval Time
Decreased Server Load
Improved Throughput

Latency Reduction

Latency refers to the time it takes for a request to travel from the client to the server and back. Caching helps reduce latency in several ways:

Proximity of Data / CDN
Elimination of Redundant Processing
Quick Access to Data

Load Balancing

Load balancing ensures that no single server or node becomes overwhelmed with requests, distributing the load evenly across the system. Caching contributes to effective load balancing by:

Spreading Data Requests
Reducing Hotspots
Distributing Cache Loads

Handling Consistency

Consistency Models

Consistency in distributed systems refers to the degree to which different nodes or clients see the same data at the same time. There are several consistency models to consider:

Strong Consistency : Guarantees that all nodes see the same data simultaneously. This model is easiest to reason about but can be challenging to implement at scale due to performance trade-offs.
Eventual Consistency : Ensures that all nodes will eventually see the same data, but not necessarily at the same time. This model is more performant but can lead to temporary inconsistencies.
Causal Consistency : Ensures that causally related operations are seen by all nodes in the same order. This model strikes a balance between strong and eventual consistency.

Techniques for Maintaining Consistency

To maintain consistency across distributed caches, several techniques can be employed:

Cache Invalidation Strategies : Ensure that outdated or stale data is removed from the cache. Common strategies include time-to-live (TTL), manual invalidation, and automatic invalidation based on data changes.
Write-Through, Write-Behind, and Write-Around Caching : These policies define how and when data is written to the cache and the backing store. Not familiar with these policies? Check out my previous blog.
Distributed Consensus Algorithms : Algorithms like Paxos and Raft help maintain consistency by ensuring that all nodes agree on the order of operations.
Conflict Resolution Techniques : Approaches like last-write-wins or vector clocks can help resolve conflicts when concurrent updates occur.
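
To illustrate the vector clock idea mentioned above, here is a minimal sketch of how two versions of a cached value can be compared. Each node keeps a counter per node id; the type and function names are assumptions made up for this example.

```typescript
type VectorClock = Record<string, number>;
type ClockOrder = "before" | "after" | "equal" | "concurrent";

// Compares two vector clocks entry by entry. A missing entry counts as 0.
function compareClocks(a: VectorClock, b: VectorClock): ClockOrder {
    let aAhead = false;
    let bAhead = false;

    const nodes = Object.keys(a).concat(Object.keys(b));
    for (const node of nodes) {
        const aCount = a[node] ?? 0;
        const bCount = b[node] ?? 0;
        if (aCount > bCount) aAhead = true;
        if (bCount > aCount) bAhead = true;
    }

    // Each side has seen an update the other has not: a true conflict
    if (aAhead && bAhead) return "concurrent";
    if (aAhead) return "after";
    if (bAhead) return "before";
    return "equal";
}

console.log(compareClocks({ n1: 2, n2: 1 }, { n1: 1, n2: 1 })); // after
console.log(compareClocks({ n1: 2 }, { n1: 1, n2: 1 }));        // concurrent
```

Only the "concurrent" case needs a resolution policy such as last-write-wins; in the other cases one version simply supersedes the other.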

Ensuring Performance

Caching Strategies

Cache-Aside : The application checks the cache before fetching data from the source. If the data is not in the cache, it retrieves and stores it there.
Read-Through : The cache itself loads data from the backend store on a cache miss.
Write-Through : Updates go to both the cache and the backend store simultaneously.
Write-Behind : Updates go to the cache immediately, and the backend store is updated asynchronously.
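
Two of these strategies can be sketched in a few lines. In this minimal example, two `Map` objects stand in for the cache and for a slower backing store such as a database. All names and the sample data are assumptions for illustration.

```typescript
// Stand-ins for a slow backing store and a fast cache
const backingStore = new Map<string, string>([["user:1", "Ada"]]);
const cacheLayer = new Map<string, string>();
let storeReads = 0;

// Cache-Aside: the application checks the cache first and fills it on a miss
function readCacheAside(key: string): string | undefined {
    const hit = cacheLayer.get(key);
    if (hit !== undefined) return hit;

    storeReads += 1;
    const value = backingStore.get(key);
    if (value !== undefined) cacheLayer.set(key, value);
    return value;
}

// Write-Through: every update goes to the backing store and the cache together
function writeThrough(key: string, value: string): void {
    backingStore.set(key, value);
    cacheLayer.set(key, value);
}

readCacheAside("user:1"); // miss: reads the store and fills the cache
readCacheAside("user:1"); // hit: served from the cache
writeThrough("user:1", "Grace");
console.log(readCacheAside("user:1"), storeReads);
```

With Write-Through the cache never holds a value the store does not have, at the cost of every write paying for both updates.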

Performance Optimization Techniques

Efficient Cache Eviction Policies : Implementing policies like Least Recently Used (LRU) or Least Frequently Used (LFU) helps manage limited cache space effectively.
Use of In-Memory Caching : In-memory caching solutions like Redis and Memcached offer high-speed data access.
Data Compression : Compressing cached data can save space and reduce I/O times.
Load Balancing and Sharding : Distributing cache data and requests evenly across multiple nodes enhances performance.
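
As an example of an eviction policy, here is a minimal LRU cache sketch. It relies on the fact that a JavaScript `Map` iterates in insertion order, so the first key is always the least recently used one. The class is an illustration that assumes a capacity of at least 1, not a production implementation.

```typescript
class LRUCache<K, V> {
    private entries = new Map<K, V>();
    private capacity: number;

    constructor(capacity: number) {
        this.capacity = capacity;
    }

    get(key: K): V | undefined {
        if (!this.entries.has(key)) return undefined;
        const value = this.entries.get(key) as V;
        // Re-insert the key so it becomes the most recently used
        this.entries.delete(key);
        this.entries.set(key, value);
        return value;
    }

    set(key: K, value: V): void {
        if (this.entries.has(key)) {
            this.entries.delete(key);
        } else if (this.entries.size >= this.capacity) {
            // The first key in insertion order is the least recently used
            const oldest = this.entries.keys().next().value as K;
            this.entries.delete(oldest);
        }
        this.entries.set(key, value);
    }
}

const lru = new LRUCache<string, number>(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a");    // "a" is now the most recently used
lru.set("c", 3); // evicts "b"
console.log(lru.get("b"), lru.get("a"), lru.get("c"));
```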

Latency Reduction

Geographically Distributed Caches : Using Content Delivery Networks (CDNs) to place caches closer to users reduces latency.
Multi-Tiered Caching : Implementing caching at multiple levels (client-side, edge, server-side) optimizes performance.
Prefetching and Cache Warming : Preloading data into the cache based on anticipated demand reduces cache miss rates.

Ensuring Availability

High Availability Techniques

Replication Strategies : Implementing master-slave or multi-master replication ensures data availability during node failures.
Failover Mechanisms : Automatic failover to backup nodes maintains service continuity during failures.
Data Redundancy : Storing multiple copies of data across different nodes increases fault tolerance.

Fault Tolerance

Handling Node Failures : Using techniques like quorum-based approaches ensures system resilience.
Graceful Degradation Strategies : Ensuring that the system continues to function, albeit with reduced performance, during partial failures.

Monitoring and Alerts

Implementing Health Checks : Regular health checks ensure the cache is functioning correctly.
Real-Time Monitoring Tools : Using tools like Prometheus and Grafana for real-time monitoring.
Automated Alerting Systems : Setting up automated alerts for issues like high latency or node failures.

Implementing Caching at Scale

Scaling caching solutions in distributed systems presents several challenges:

Data Distribution and Partitioning : Distributing data across multiple nodes to ensure even load distribution and high availability.
Load Balancing : Ensuring that no single node becomes a bottleneck by evenly distributing requests across the system.

Several techniques and tools can help implement caching at scale:

Sharding : Dividing the dataset into smaller, manageable pieces (shards) that can be distributed across multiple nodes.
Distributed Caching Solutions : Tools like Memcached, Redis, and Apache Ignite provide robust distributed caching capabilities.
Multi-Tiered Caching : Implementing caching at multiple levels (e.g., client-side, edge, server-side) to optimize performance and resource utilization.
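
The simplest form of sharding picks a node by hashing the key. The sketch below uses an FNV-1a style hash and a modulo over the node list. The node names are made up, and the hash treats keys as ASCII, which is an assumption that keeps the example short.

```typescript
// FNV-1a style 32-bit hash (uses the low byte of each character, fine for ASCII keys)
function fnv1a(key: string): number {
    let hash = 0x811c9dc5;
    for (let i = 0; i < key.length; i++) {
        hash ^= key.charCodeAt(i) & 0xff;
        hash = Math.imul(hash, 0x01000193);
    }
    // Convert to an unsigned 32-bit integer
    return hash >>> 0;
}

// Maps a key to one of the cache nodes; the same key always lands on the same node
function pickShard(key: string, shards: string[]): string {
    return shards[fnv1a(key) % shards.length];
}

const cacheNodes = ["cache-0", "cache-1", "cache-2"];
console.log(pickShard("user:42", cacheNodes));
console.log(pickShard("user:42", cacheNodes) === pickShard("user:42", cacheNodes)); // true
```

One caveat with plain modulo sharding is that changing the number of nodes remaps most keys. Consistent hashing is the usual way to limit that reshuffling.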

Real-World Examples

If you have read this far, here are some awesome real-world use cases.

Netflix

Netflix is a prime example of a company that leverages distributed caching to efficiently deliver content to millions of users worldwide. Many optimizations are at work here; some of them are:

Content Delivery Network (CDN) : Netflix uses its own CDN, called Open Connect, to cache video content closer to the users. By deploying servers at ISPs (Internet Service Providers), Netflix reduces latency and bandwidth costs while ensuring high-quality video streaming.
Multi-Tiered Caching : Netflix employs a multi-tiered caching strategy, including client-side caches (on users’ devices), edge caches (within the ISP networks), and regional caches. This layered approach ensures that content is served quickly from the nearest cache, minimizing latency and buffering. This also covers buffered video data.
Personalization and Recommendations : Netflix caches personalized recommendations and metadata about shows and movies. This allows the recommendation engine to quickly provide relevant suggestions without repeatedly querying the backend systems.

Facebook

Facebook uses distributed caching extensively to handle its massive user base and the high volume of interactions on its platform.

Memcached Deployment : Facebook is known for its large-scale deployment of Memcached to cache data retrieved from its databases. This caching layer helps reduce the load on databases, allowing them to scale horizontally and handle more queries efficiently.
TAO (The Associations and Objects) : Facebook developed TAO, a geographically distributed data store that caches and manages the social graph (relationships and interactions between users). TAO ensures that frequently accessed data, such as friend lists and likes, are served quickly, improving the overall user experience.
Edge Caching : To further reduce latency, Facebook employs edge caches that store static content like images, videos, and JavaScript files closer to users. This helps in serving content rapidly, reducing the load on central servers, and improving the site’s responsiveness.

X (formerly known as Twitter)

Twitter faces the challenge of delivering real-time updates to millions of users, which requires efficient caching strategies.

Timeline Caching : Twitter caches timelines (feeds of tweets) to ensure that users see updates quickly. By caching these timelines, Twitter reduces the need to query the database for every user request, significantly improving response times.
Redis for In-Memory Caching : Twitter uses Redis for various caching purposes, including caching user sessions, trending topics, and other frequently accessed data. Redis’s in-memory storage provides fast read and write operations, essential for real-time applications.
CDN for Static Content : Like many other large-scale web services, Twitter uses a CDN to cache static assets, such as images and stylesheets, closer to users. This reduces latency and ensures that content loads quickly.

Lessons Learned and Best Practices

Strategic Placement of Caches : Placing caches at different levels (client-side, edge, server-side) and strategically within the network (e.g., using CDNs) can significantly reduce latency and improve performance.
Efficient Cache Invalidation : Implementing effective cache invalidation strategies is crucial to ensure data consistency. Techniques like TTL, manual invalidation, and automatic invalidation based on data changes are commonly used.
Balancing Consistency and Performance : Understanding the trade-offs between strong consistency and performance is essential. Companies often choose eventual consistency for high-performance use cases while using strong consistency for critical data.
Monitoring and Metrics : Continuous monitoring and metrics collection are vital for understanding cache performance and identifying issues. Tools like Prometheus, Grafana, and custom dashboards are commonly used.
Scalability and Fault Tolerance : Implementing sharding, replication, and failover mechanisms ensures that the caching layer can scale with the system and remain highly available even during failures.

Conclusion

In summary, caching is a powerful tool in distributed systems, but it requires careful consideration of consistency, scalability, and writing policies. By understanding these aspects and implementing best practices, you can design caching solutions that significantly enhance system performance and reliability.

Hope you have learned something from this blog.

Follow me for such interesting content. I keep reading and implementing stuff.

Why CPU was not enough: Need for GPU in the picture of AI

nayanraj adhikary — Mon, 24 Jun 2024 04:04:06 +0000

CPUs are jacks-of-all-trades but masters of none when it comes to parallel processing tasks.

Normal People: GPUs are generally used for video games, right?

Developers: GPUs are generally used in ML, right?

You are all correct. Let's dive deep into GPUs.

GPUs are used in many different use cases:

Graphics Rendering : Rendering high-resolution graphics in video games, simulations, and visual effects. Because nobody wants to play a 3D game that looks like it’s from the ‘90s.
Machine Learning : Training large neural networks for deep learning applications. CPUs trying to handle this is like bringing a knife to a gunfight.
Scientific Computing: Performing large-scale simulations and computations in fields like physics, chemistry, and biology.
Cryptocurrency Mining: Solving complex cryptographic puzzles in mining cryptocurrencies like Bitcoin. CPUs would need a lifetime supply of energy drinks to keep up.
Data Parallelism : Processing large datasets where the same operation needs to be applied to many data points simultaneously. Think of it as the CPU trying to clone itself a thousand times – it's not going to happen.

A lot of questions come up when GPUs enter the picture from the perspective of AI/ML:

What was the CPU not able to handle?
How do the CPU and GPU work hand in hand?

Before we go deep, let's look at some key milestones.

Key milestones in GPU development

1980s : Early graphics accelerators were introduced to offload simple drawing tasks from the CPU.
1990s : The rise of 3D graphics in gaming and professional visualization led to the development of more advanced graphics hardware.
2006 : NVIDIA introduced CUDA (Compute Unified Device Architecture), enabling GPUs to be used for general-purpose computing.

Why was the CPU not enough?

The increasing demand for higher-quality graphics in video games and professional applications exposed the CPU's limitations in handling parallel processing tasks. CPUs are optimized for sequential processing, which is ideal for a wide range of general-purpose computing tasks. However, they struggle with the highly parallel nature of graphics rendering and the massive computational requirements of AI.

Where the CPU fell short:

Parallel Processing : CPUs typically have a few cores optimized for sequential processing, whereas GPUs have thousands of smaller, efficient cores designed for parallel processing.
Task Specialization : CPUs are versatile and can handle a wide range of tasks, but this general-purpose nature limits their efficiency in specialized tasks like rendering graphics or performing large-scale matrix operations.
Performance Bottlenecks : The complex computations required for rendering high-quality graphics and processing large datasets created bottlenecks that CPUs could not efficiently overcome.

GPU and CPU Coordination

Despite their differences, CPUs and GPUs are designed to work together, complementing each other’s strengths. In a typical computing task, the CPU handles the general-purpose processing and decision-making tasks, while the GPU handles the parallelizable tasks.

How they coordinate:

Task Division : The CPU offloads specific computationally intensive tasks to the GPU.
Data Transfer : Data is transferred between the CPU and GPU through a high-speed interface such as PCIe (Peripheral Component Interconnect Express).
Synchronization : Both units work in synchronization, with the CPU often preparing data and instructing the GPU on the operations needed, then processing the results.

GPUs in AI and Large Language Models

In the realm of AI, GPUs have become indispensable. Training AI models, especially large language models (LLMs) like GPT-3 and beyond, involves processing vast amounts of data through complex computations. CPUs simply can't keep up with the demands.

How GPUs are used in AI:

Massive Parallelism : AI training involves performing millions of matrix multiplications simultaneously. GPUs, with their thousands of cores, can handle these operations in parallel, significantly speeding up the training process.
High Throughput : For inference (running the trained model on new data), GPUs provide the high throughput necessary to process large amounts of data quickly.
Energy Efficiency : Despite their power, GPUs can be more energy-efficient than CPUs for specific tasks, making large-scale AI training more feasible.
Optimized Libraries : Libraries like TensorFlow and PyTorch are optimized for GPU acceleration, enabling researchers and engineers to leverage GPU power easily.
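
To make the "Massive Parallelism" point concrete, here is a minimal pure-Python sketch (not GPU code) showing why matrix multiplication parallelizes so well: every output row is an independent computation. The function names are made up for illustration, and the thread pool only demonstrates the structure of the work; CPython threads will not give a real speedup here, whereas a GPU runs these independent pieces on thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_row(row, b):
    """Compute one output row; each cell is an independent dot product."""
    columns = list(zip(*b))  # transpose b so we can iterate over its columns
    return [sum(x * y for x, y in zip(row, col)) for col in columns]


def matmul_sequential(a, b):
    """CPU-style: compute the rows one after another."""
    return [matmul_row(row, b) for row in a]


def matmul_parallel(a, b, workers=4):
    """GPU-style idea: rows do not depend on each other, so hand them to workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: matmul_row(row, b), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_sequential(a, b))  # [[19, 22], [43, 50]]
print(matmul_parallel(a, b))    # same result, computed row by row in parallel
```

Libraries like PyTorch and TensorFlow apply the same idea at a much finer granularity, splitting the work down to individual output cells on the GPU.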

Hardware-Based Accelerators:

TPUs (Tensor Processing Units) : Developed by Google, TPUs are custom-built application-specific integrated circuits (ASICs) designed to accelerate machine learning workloads. They are highly efficient for training and running AI models.
FPGA (Field-Programmable Gate Arrays) : These are reconfigurable hardware devices that can be programmed to perform specific computations efficiently. They offer a balance between flexibility and performance, suitable for specialized AI tasks.

These are just some of them. These accelerators help us train models that would take years on a CPU.

Python is widely recommended because it has a large ecosystem of libraries and a big community. Internally, these libraries use the GPU when they are configured to do so.

If you have read this far, I hope this helped you in some way.

Follow me for more interesting blogs. It helps me stay motivated and create more of them.

Implementing Caching Strategies: Techniques for High-Performance Web Apps

nayanraj adhikary — Sat, 22 Jun 2024 06:42:32 +0000

Caching Strategies? What's that?

Note: in case you haven't read the previous blog, see Deep Dive into Caching: Techniques for High-Performance Web Apps.

Before we go deep, let us understand some common policies here

Write Through: The data is written to the cache and the backing store / DB at the same time.
Write Around: The data is written only to the backing store / DB, not to the cache.
Write Back / Write Behind: The data is written to the cache first, then to the backing store / DB in the background.
Read Through: The data lives in the backing store / DB. The first time it is read, it is loaded into the cache. This makes the first read slower, but subsequent reads are faster.
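
The four policies above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: two plain dictionaries stand in for the cache and the backing store, the class and method names are hypothetical, and the "background" write is modelled as an explicit flush() call rather than a real background worker.

```python
class CachedStore:
    """Toy model: `cache` and `db` are dicts standing in for a real cache and database."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.pending = {}  # write-behind: writes not yet flushed to the db

    def write_through(self, key, value):
        # Write to the cache and the backing store as part of the same operation.
        self.cache[key] = value
        self.db[key] = value

    def write_around(self, key, value):
        # Write only to the backing store; the cache is bypassed.
        self.db[key] = value
        # Assumption for this sketch: drop any older cached copy so reads do not go stale.
        self.cache.pop(key, None)

    def write_behind(self, key, value):
        # Acknowledge after updating the cache; the db write is deferred.
        self.cache[key] = value
        self.pending[key] = value

    def flush(self):
        # Stand-in for the background job that persists deferred writes in a batch.
        self.db.update(self.pending)
        self.pending.clear()

    def read_through(self, key):
        # Serve from the cache; on a miss, load from the db and cache the result.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = CachedStore()
store.write_around("user:1", "Alice")
print("user:1" in store.cache)       # False: the write skipped the cache
print(store.read_through("user:1"))  # Alice (cache miss, loaded from the db)
print("user:1" in store.cache)       # True: later reads are served from the cache
```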

Each of the policies above has advantages and disadvantages.

In the case of a distributed / microservice architecture, the cache itself is distributed based on the scale of the whole system, and other techniques such as sharding are involved. I will be writing about this in another blog.

Problem

Ahh, when to use which write policy?

Let's understand some use cases for each of them

1. Write-Through Policy

Data Consistency : When strong consistency between the cache and the underlying data store is required. Any data written to the cache is immediately available in the backing store.
Simple Implementation : Easy to implement and understand since every write operation is propagated to the underlying data store.
Read-Heavy Workloads : Suitable for scenarios where read operations are more frequent than write operations, as the data in the cache is always consistent with the data store.

Examples:

Session Management : In web applications, session data needs to be consistent and immediately available across multiple nodes.
Configuration Data : Configuration settings that are frequently read but rarely changed.

2. Write Around Policy

Write-Heavy Workloads : Suitable for applications with frequent writes and less frequent reads, reducing the number of write operations to the cache.
Cold Data : Ideal for scenarios where data is not frequently accessed after being written. The cache is not burdened with rarely accessed data.

Examples:

Bulk Data Imports : Applications that periodically import large datasets where the data is not immediately needed for reading.
Logging Systems : Systems that write log data directly to storage but only occasionally read the data for analysis.

3. Write Behind (Write Back) Policy

Performance : Improves write performance by quickly acknowledging write operations and deferring the actual write to the data store.
Batch Processing : Suitable for scenarios where data can be written in batches to the underlying store, reducing the write load.
Data Freshness : Suitable when immediate consistency is not critical, and slight delays in data propagation to the data store are acceptable.

Examples:

User Activity Logging : Applications that log user actions where the logs are periodically flushed to the database.
E-commerce : Shopping cart data that is written to the cache for quick access and periodically synchronized with the database.

4. Read Through

Lazy Loading : Useful for loading data on demand, caching it only when it is actually needed.
Read-Heavy Workloads : Suitable for applications where read operations significantly outnumber write operations, and data needs to be quickly accessible after the first access.

Examples:

Product Catalogs : E-commerce applications where product details are read frequently but updated infrequently.
Content Management Systems (CMS): Systems where articles or media are read frequently after the initial publication.

When to select what

Need Consistency

Write Through : Ensures strong consistency as data is written to both the cache and the store simultaneously.
Write Around : Can lead to stale cache data until the data is read and cached.
Write Behind : Provides eventual consistency with potential lag between cache and store.
Read Through : Ensures data is cached on first access, potentially leading to stale data if not frequently updated.

Need Performance

Write Through : This can be slower for write operations due to double writes (cache and store).
Write Around : Reduces write load on the cache, faster write operations.
Write Behind : Improves write performance, but read operations may suffer if cache and store are not in sync.
Read Through : Fast read operations after initial cache miss, good for read-heavy scenarios.

Need Simplicity

Write Through : Simple to implement and ensures immediate consistency.
Write Around : Simple for write operations but requires cache management for reads.
Write Behind : More complex due to the need for asynchronous write handling and potential consistency issues.
Read Through : Straightforward for reads, requires handling of initial cache misses.

It depends on the use case you are solving.

Now let's go deep into the strategies used while implementing a cache.

Developers: I commonly use the OG, LRU cache most of the time.

LRU (Least Recently Used) is a popular caching strategy, but it's not always the best fit for every use case. There are several alternative caching strategies, each with its own strengths and suitable scenarios.

There are many strategies, each with its own use cases. Here are the main ones:

LRU : Best for scenarios where the most recently accessed items are most likely to be accessed again soon.
LFU : Best when the access frequency is a good predictor of future accesses.
FIFO : Simple, best when the oldest data is the least useful.
Random Replacement (RR) : Simple, good for unpredictable access patterns.
Time-To-Live (TTL) : Best for time-sensitive data that becomes stale after a certain period.
Adaptive Replacement Cache (ARC) : Adapts well to changing access patterns, more complex.
Least Recently/Frequently Used (LRFU) : Balances between recency and frequency, tunable.
Segmented LRU (SLRU) : Splits the cache into probationary and protected segments, so items that are accessed more than once are kept longer.
Most Recently Used (MRU) : Useful in specific scenarios where the most recent data is less useful.
Clock Algorithm : A variant of LRU that approximates its behavior using a circular buffer (clock) and a use bit for each page.
2Queue : Balances recency and frequency with separate queues.
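
As an example of how one of these strategies works, here is a minimal LRU sketch built on Python's collections.OrderedDict. The class name and capacity are chosen for illustration; it is a sketch of the eviction idea, not a production cache (no locking, no TTL).

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, most recently used last

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")             # "a" is now the most recently used key
cache.put("c", 3)          # over capacity, so "b" is evicted
print(list(cache.items))   # ['a', 'c']
```

Swapping the eviction line is enough to try other strategies from the list: for example, evicting without ever calling move_to_end gives FIFO behaviour.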

Some frameworks, such as Django and Spring Boot, ship with built-in caching support.

If you reached here, thanks for reading. I hope this blog helped you learn about caches.

What did we learn

When to use which policy while implementing caching
How to not overload the caches by using various techniques for implementing caches.

Follow for more interesting blogs. It keeps me motivated to write more interesting stuff.

Here is my social - LinkedIn

Deep Dive into Caching: Techniques for High-Performance Web Apps

nayanraj adhikary — Thu, 20 Jun 2024 14:55:27 +0000

Normal Developers: Wait, caching? I know what that is, it's just saving information locally.

I mean you are correct in a way but...

In today's fast-paced digital world, users expect web applications to be fast and responsive. One of the key techniques to achieve high performance in web applications is caching. Caching can drastically reduce load times, decrease server load, and enhance the overall user experience.

Let's go with the basics

Caching

Caching is the process of storing copies of files or data in a temporary storage location so that they can be accessed more quickly. When a request is made, the system first checks the cache; if the requested data is present, it can be served immediately without needing to retrieve it from the original source. This process reduces the time and resources required to deliver the data.

When to Cache?

Data that doesn't change frequently can be cached.

Types of Caching

1. Client-Side Caching:

Browser Caching : Browsers store static assets like HTML, CSS, JavaScript, and images locally. Using cache control headers, you can define how long assets should be cached.

Cache-Control: max-age=3600

Browsers already use multiple techniques to cache. GET responses are cacheable by default, so a browser may store them even without explicit caching headers.

2. Server-Side Caching:

HTTP Caching : Utilize HTTP headers such as ETag, Cache-Control, Expires, and Last-Modified to control caching behavior.
Content Delivery Network (CDN): CDNs cache your content at various geographically distributed servers, reducing latency and improving load times for users around the globe.

Here is a real-world example of a CDN in action:

JioCinema is a streaming platform. Whenever an IPL (Indian Premier League) match is on and the servers are under heavy load, the response for a user's home screen is cached at the CDN and on the client.

Reverse Proxy Cache : Tools like Nginx can act as intermediaries that cache responses from your server, reducing the load on your web server.
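
To show how the ETag and Cache-Control headers mentioned above work together, here is a small framework-free sketch of a conditional GET. The functions make_etag and respond are hypothetical helpers written for this example; a real server or framework would build the response object for you.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the content; any stable fingerprint would do."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict, max_age: int = 3600):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"max-age={max_age}"}
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still valid: send no body at all.
        return 304, headers, b""
    return 200, headers, body


page = b"<h1>Home</h1>"

# First request: the client has nothing cached yet.
status, headers, _ = respond(page, {})
print(status, headers["Cache-Control"])  # 200 max-age=3600

# Revalidation: the client sends back the ETag it stored.
status, _, body = respond(page, {"If-None-Match": headers["ETag"]})
print(status, body)  # 304 b''
```

The 304 response saves bandwidth because the body is not sent again; the client keeps using its stored copy.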

3. Database Caching

Query Caching: Store the results of expensive database queries to speed up subsequent requests. Built-in support varies by database: MySQL had a query cache that was removed in MySQL 8.0, and PostgreSQL caches data pages in its shared buffers rather than query results. Indexes (typically B-trees) are a separate mechanism that speeds up lookups rather than caching results.
Object Caching: Use in-memory data stores like Redis or Memcached to cache objects retrieved from a database, reducing the need to perform repetitive and expensive queries.

4. Application-Level Caching:

Page Caching: Store the entire rendered HTML of pages that don't change frequently.
Fragment/Component Caching: Cache parts of a page (like sidebar widgets) that change less frequently than the main content.
Data Caching: Cache expensive data computations or API calls.

There are many more techniques for implementing caching.

Thanks for reading a simple and short blog about caching in web applications. Follow to learn the real magic of programming and to keep me motivated.

Behind the Scenes with Redis: The Power of RESP Protocol

nayanraj adhikary — Wed, 19 Jun 2024 05:06:29 +0000

Ever wonder how Redis maintains its simplicity and efficiency over the network?

Redis (Remote Dictionary Server) is a widely used in-memory data structure store that can be used as a database, cache, and message broker. One of the key aspects of Redis's efficiency and speed is its communication protocol, RESP (REdis Serialization Protocol).

RESP is how a Redis client communicates with the Redis server. The protocol was designed specifically for Redis, but you can use it in any client-server project.

RESP can serialize different data types including integers, strings, and arrays. It also features an error-specific type. A client sends a request to the Redis server as an array of strings.

RESP versions

Like every protocol, RESP has versions. Here are the important ones:
Redis 2.0 made RESP2 the standard.
Redis 6.0 added opt-in support for RESP3, so Redis 6 and Redis 7 support both RESP2 and RESP3.

RESP Magic

We have many data types that RESP supports. Let's look into some of them

Each Redis data type has its own set and get commands. For more details, you can look in the Redis docs.

Simple String: Simple strings are encoded with a + prefix followed by the string and terminated with \r\n.

+OK\r\n

Errors: Errors are similar to simple strings but start with a - prefix.

-Error message\r\n

Integers: Integers are encoded with a : prefix followed by the integer and terminated with \r\n.

:1000\r\n

Arrays: Arrays are used to transmit multiple RESP types. They start with a * prefix followed by the number of elements in the array, a \r\n, and then the actual elements in RESP format.

*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n

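Each element in this array is a bulk string: a $ prefix, the length of the string in bytes, a \r\n, the string itself, and a final \r\n. Notice that every part of the protocol is terminated by \r\n.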
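
The encodings above are simple enough to write by hand. Here is a minimal sketch of RESP2 encoders in Python; the function names are my own for illustration and are not part of any Redis client library.

```python
CRLF = b"\r\n"


def encode_simple_string(text: str) -> bytes:
    return b"+" + text.encode() + CRLF


def encode_error(message: str) -> bytes:
    return b"-" + message.encode() + CRLF


def encode_integer(number: int) -> bytes:
    return b":" + str(number).encode() + CRLF


def encode_bulk_string(text: str) -> bytes:
    data = text.encode()
    # The length is counted in bytes, not characters.
    return b"$" + str(len(data)).encode() + CRLF + data + CRLF


def encode_array(encoded_items: list) -> bytes:
    # Items must already be RESP-encoded; the array only adds the count header.
    return b"*" + str(len(encoded_items)).encode() + CRLF + b"".join(encoded_items)


print(encode_simple_string("OK"))  # b'+OK\r\n'
print(encode_integer(1000))        # b':1000\r\n'
print(encode_array([encode_bulk_string("foo"), encode_bulk_string("bar")]))
# b'*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n'
```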

Communication Process

When a client sends a command to the Redis server, it uses the RESP protocol to format the command. The server parses the RESP message, executes the command, and sends a RESP-formatted response back to the client.

Example
Let's consider a simple GET command:

*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n

Server response (if the key exists):

$5\r\nvalue\r\n

Server response (if the key does not exist):

$-1\r\n

The client then decodes each reply into a readable value by stripping the type prefix and the \r\n terminators.
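
Decoding a reply is just the reverse: read the first byte to learn the type, then read up to the next \r\n. The parser below is a simplified sketch that covers only the RESP2 types shown in this post and assumes the whole reply is already in memory; parse_reply is a hypothetical name, and real clients also deal with partial reads from the socket.

```python
def parse_reply(data: bytes):
    """Decode one RESP2 value. Returns (value, remaining_bytes)."""
    line, _, rest = data.partition(b"\r\n")
    prefix, payload = line[:1], line[1:]

    if prefix == b"+":                      # simple string
        return payload.decode(), rest
    if prefix == b"-":                      # error reply
        raise RuntimeError(payload.decode())
    if prefix == b":":                      # integer
        return int(payload), rest
    if prefix == b"$":                      # bulk string
        length = int(payload)
        if length == -1:                    # null bulk string: key does not exist
            return None, rest
        # Skip the data plus its trailing \r\n (2 bytes).
        return rest[:length].decode(), rest[length + 2:]
    if prefix == b"*":                      # array
        count = int(payload)
        if count == -1:                     # null array
            return None, rest
        items = []
        for _ in range(count):
            item, rest = parse_reply(rest)
            items.append(item)
        return items, rest
    raise ValueError(f"unknown RESP type: {prefix!r}")


print(parse_reply(b"$5\r\nvalue\r\n")[0])  # value
print(parse_reply(b"$-1\r\n")[0])          # None
print(parse_reply(b"*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n")[0])  # ['GET', 'key']
```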

One example with a data type

Hashes
Hashes are one of the most commonly used data types. They are maps between string fields and string values, which makes them perfect for representing objects.

*4\r\n$4\r\nHSET\r\n$6\r\nmyhash\r\n$4\r\nname\r\n$5\r\nAlice\r\n

Server response

:1\r\n

Getting hash field

Client request:

*3\r\n$4\r\nHGET\r\n$6\r\nmyhash\r\n$4\r\nname\r\n

Server response:

$5\r\nAlice\r\n

Conclusion

RESP plays a crucial role in Redis's performance and efficiency, providing a simple yet powerful way to encode and decode messages between clients and servers. Understanding RESP and how it handles different Redis data types can help you make the most of Redis in your applications. Whether you're dealing with strings, lists, hashes, or sets, RESP ensures that communication is fast, reliable, and easy to work with.

What we learned

How RESP serializes Redis commands over the network and contributes to Redis's performance.

Want to know more about RESP?

Thanks for reading a small and simple blog about RESP. A Like and a Follow would keep me motivated.

Something Crazy about localhost: Unveiling the Inner Workings

nayanraj adhikary — Tue, 18 Jun 2024 05:36:20 +0000

When you type localhost into your browser, you might take for granted that it magically knows to connect to your local computer. However, there's a fascinating mechanism behind how localhost works.

There were a bunch of questions I had in my mind.

How does it map to an IP address?
Does it function similarly to DNS?

Let's dive into the details of localhost

Localhost is a hostname that refers to the current device used to access it. By default, the name resolves to 127.0.0.1 for IPv4 and ::1 for IPv6, both of which point to the loopback network interface.

Wait what's a loopback Network Interface?

The loopback interface is a special network interface that the local computer uses to send network traffic to itself. Think of it as a virtual network device that software on your computer can use to test network applications without needing access to a physical network.

IPv4 -> 127.0.0.1
IPv6 -> ::1
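
To see the loopback interface in action, here is a small sketch that opens a server and a client in the same script and sends a message from one to the other over 127.0.0.1. Port 0 is used so that the operating system picks any free port; nothing leaves your machine.

```python
import socket

# Listen on the IPv4 loopback address; port 0 lets the OS choose a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Connect to ourselves through the loopback interface.
client = socket.create_connection(("127.0.0.1", port))
conn, addr = server.accept()

client.sendall(b"ping")
received = conn.recv(1024)
print(received, addr[0])  # b'ping' 127.0.0.1

for sock in (client, conn, server):
    sock.close()
```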

That's great, but how does the system know that localhost maps to these addresses?

Mapping of Localhost to IP

The location of the mapping file varies based on the operating system.

Mac / Linux -> /etc/hosts
Windows -> C:\Windows\System32\drivers\etc\hosts

Now you can play around with this hosts file and see how it's mapping magically.
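
If you want to inspect the mapping from code, here is a minimal sketch of a hosts file parser. The parse_hosts function and the sample text are made up for illustration; to try it on your real file, pass in the contents of the path for your operating system listed above.

```python
def parse_hosts(text: str) -> dict:
    """Map each hostname to the list of IP addresses listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()             # format: IP followed by one or more names
        for name in names:
            mapping.setdefault(name, []).append(ip)
    return mapping


# Illustrative sample, similar to what a default hosts file often contains.
sample = """
# loopback entries
127.0.0.1   localhost
::1         localhost
"""

print(parse_hosts(sample)["localhost"])  # ['127.0.0.1', '::1']
```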

Does Localhost Work Like DNS?

While both localhost and DNS (Domain Name System) serve the purpose of resolving hostnames to IP addresses, they operate quite differently.

Hosts File: The resolution of localhost is handled locally through the hosts file. When you access localhost, the operating system reads the hosts file and maps localhost to 127.0.0.1 or ::1.

DNS: DNS is a hierarchical and decentralized naming system used for resolving domain names to IP addresses on the Internet. When you type a domain name (e.g., example.com) into your browser, the DNS resolver contacts multiple DNS servers to find the corresponding IP address.

Playing around with localhost using Python

import socket
# Ask the system resolver for every address it knows for 'localhost'.
print(socket.getaddrinfo('localhost', 0))

output:

[(<AddressFamily.AF_INET6: 30>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET6: 30>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_DGRAM: 2>, 17, '', ('127.0.0.1', 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]

Notice that localhost resolves to both IPv4 and IPv6 addresses. Using the IP address directly skips the name lookup, so it can be slightly more efficient. Interesting.

Conclusion

The simplicity and reliability of localhost mask the sophisticated mechanisms working behind the scenes. By mapping to 127.0.0.1 or ::1 through the hosts file, localhost provides an essential tool for developers and system administrators.

What did we learn

Using the IP address directly can be slightly more efficient because it skips the lookup.
The mapping from localhost to its IP addresses lives in the hosts file.
DNS and the local hosts-file mapping work differently.

Thanks for reading my first blog about something we use daily.