<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pratik Shivaraikar</title>
    <description>The latest articles on DEV Community by Pratik Shivaraikar (@pratikms).</description>
    <link>https://dev.to/pratikms</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F152448%2Fca551a3f-7e08-4b5a-89ef-ceadddda84ff.jpeg</url>
      <title>DEV Community: Pratik Shivaraikar</title>
      <link>https://dev.to/pratikms</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pratikms"/>
    <language>en</language>
    <item>
      <title>Demystifying Connection Pools: A Deep Dive</title>
      <dc:creator>Pratik Shivaraikar</dc:creator>
      <pubDate>Tue, 25 Apr 2023 01:30:42 +0000</pubDate>
      <link>https://dev.to/pratikms/demystifying-connection-pools-a-deep-dive-2dbl</link>
      <guid>https://dev.to/pratikms/demystifying-connection-pools-a-deep-dive-2dbl</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--to5Moej0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9ysckrdf8zkzicni8ni6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--to5Moej0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9ysckrdf8zkzicni8ni6.png" alt="Demystifying Connection Pools" width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
Connection pools are a critical aspect of software engineering that allow applications to efficiently manage connections to a database or any other system. If your application requires constant access to a system, establishing a new connection for every request can quickly become resource-intensive, causing your application to slow down or even crash. This is where connection pools come in.&lt;/p&gt;

&lt;p&gt;As engineers, we often don't spend a lot of time thinking about connections. A single connection is typically inexpensive, but as things scale up, the cost of creating and maintaining these connections increases accordingly. This is why I believe understanding the world of connection pooling is important: it enables us to build more performant and reliable applications, especially at scale.&lt;/p&gt;
&lt;h2&gt;
  
  
  Typical connections
&lt;/h2&gt;

&lt;p&gt;Before jumping to connection pooling, let us understand how an application typically connects to a system to perform any operation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The application attempts to open a connection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A network socket is opened to connect the application to the system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Authentication is performed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Operation is performed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connection is closed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Socket is closed.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
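&lt;p&gt;The lifecycle above can be sketched with a stand-in connection type; the names and sleep durations below are purely illustrative, standing in for the socket setup, authentication, and teardown cost paid on every request:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// fakeConn stands in for a real database connection; the sleeps model
// the cost of socket setup, authentication, and teardown on every open.
type fakeConn struct{ id int }

func open(id int) fakeConn {
	time.Sleep(10 * time.Millisecond) // steps 1-3: open socket, connect, authenticate
	return fakeConn{id: id}
}

func (c fakeConn) query() string { // step 4: the actual operation
	return fmt.Sprintf("result from conn %d", c.id)
}

func (c fakeConn) close() {
	time.Sleep(5 * time.Millisecond) // steps 5-6: close connection and socket
}

func main() {
	start := time.Now()
	for _, i := range []int{0, 1, 2} { // every request repeats the full lifecycle
		conn := open(i)
		fmt.Println(conn.query())
		conn.close()
	}
	fmt.Printf("3 requests without pooling took %v\n", time.Since(start))
}
```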

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--16nCjz2J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1682451577199/522e67b7-a91f-4249-b5ec-21f0ac152323.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--16nCjz2J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1682451577199/522e67b7-a91f-4249-b5ec-21f0ac152323.png" alt="Interaction without connection pooling" width="666" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, opening and closing the connection and the network socket is a multi-step process that consumes time and resources on every request. However, not closing the connection, or keeping it idle, also consumes resources. This is why we need connection pooling. While you will mostly see connection pooling used with database systems, the concept extends to any application that communicates with a remote system over a network.&lt;/p&gt;
&lt;h2&gt;
  
  
  What are Connection Pools?
&lt;/h2&gt;

&lt;p&gt;A connection pool is essentially a cache of connections that an application can reuse. Instead of creating a new connection each time the application needs to interact with the system, a connection is borrowed from the pool, and when it's no longer needed, it is returned to the pool to be reused later. This approach ensures that the application always has access to a ready-to-use connection, without the need to create new connections continuously.&lt;/p&gt;

&lt;p&gt;Connection pooling reduces the cost of opening and closing connections by maintaining a pool of open connections that can be passed from one operation to another as needed. This way, we are spared the expense of having to open and close a brand new connection for each operation the system is asked to perform.&lt;/p&gt;
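&lt;p&gt;As an aside, Go's standard &lt;code&gt;database/sql&lt;/code&gt; package already implements exactly this borrow-and-return behaviour behind &lt;code&gt;sql.DB&lt;/code&gt;. A minimal sketch of its pooling knobs follows; the fake in-memory driver is an assumption made purely so the example runs without a real database:&lt;/p&gt;

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"time"
)

// A minimal fake driver so database/sql's built-in pooling can be
// shown without a real database.
type fakeDriver struct{}
type fakeConn struct{}

func (fakeDriver) Open(name string) (driver.Conn, error) { return fakeConn{}, nil }

func (fakeConn) Prepare(query string) (driver.Stmt, error) { return nil, errors.New("not implemented") }
func (fakeConn) Close() error                              { return nil }
func (fakeConn) Begin() (driver.Tx, error)                 { return nil, errors.New("not implemented") }

func main() {
	sql.Register("fake", fakeDriver{})
	db, err := sql.Open("fake", "")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// The pooling behaviour lives behind these settings:
	db.SetMaxOpenConns(5)                   // hard ceiling on open connections
	db.SetMaxIdleConns(2)                   // idle connections kept around for reuse
	db.SetConnMaxLifetime(30 * time.Minute) // recycle connections after this age

	if err := db.Ping(); err != nil { // borrows a connection, then returns it to the pool
		panic(err)
	}
	fmt.Println("open connections:", db.Stats().OpenConnections)
}
```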

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Hin5NS4v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1682452666071/3d11452b-35f1-47ff-a1d9-8af11f240533.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Hin5NS4v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn.hashnode.com/res/hashnode/image/upload/v1682452666071/3d11452b-35f1-47ff-a1d9-8af11f240533.png" alt="Interaction with connection pooling" width="800" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog post, we'll demystify connection pools, explain how they work, how to implement them, and explore some of the common issues associated with connection pools. We'll also discuss connection pooling in the cloud and why it's important for modern-day applications. By the end of this blog post, you should have a good understanding of connection pools and how they can help you build more efficient and robust applications.&lt;/p&gt;
&lt;h2&gt;
  
  
  How Connection Pools Work
&lt;/h2&gt;

&lt;p&gt;The basic principle of connection pooling is to maintain a pool of connections that are ready for use, rather than creating and destroying connections as required. When a client requests a connection from the pool, the connection pool manager checks if there are any available connections in the pool. If an available connection exists, the connection pool manager returns the connection to the client. Otherwise, the connection pool manager creates a new connection, adds it to the pool, and returns the new connection to the client.&lt;/p&gt;

&lt;p&gt;Connection pooling algorithms are used to manage the pool of connections. These algorithms determine when to create new connections and when to reuse existing ones. The most common policies are LRU (Least Recently Used) and FIFO (First In, First Out), the latter of which hands connections out in round-robin fashion.&lt;/p&gt;

&lt;p&gt;In LRU, the connection pool manager keeps track of the time that each connection was last used. When a new connection is required, the connection pool manager selects the least recently used connection from the pool and returns it to the user.&lt;/p&gt;

&lt;p&gt;In FIFO, the connection pool manager manages connections in the order they were added to the pool. When a new connection is required, the connection pool manager selects the connection that has been in the pool the longest and returns it to the user.&lt;/p&gt;
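&lt;p&gt;A toy FIFO pool (string tokens stand in for connections) makes that ordering concrete:&lt;/p&gt;

```go
package main

import "fmt"

// A toy FIFO pool: connections are handed out in the order they were
// added, so the longest-idle connection is always reused first.
type fifoPool struct{ conns []string }

func (p *fifoPool) get() string {
	c := p.conns[0] // the entry that has been in the pool the longest
	p.conns = p.conns[1:]
	return c
}

func (p *fifoPool) put(c string) {
	p.conns = append(p.conns, c) // returned connections go to the back
}

func main() {
	p := fifoPool{conns: []string{"conn-A", "conn-B", "conn-C"}}
	first := p.get() // conn-A: it has been in the pool the longest
	p.put(first)     // returned to the back of the queue
	fmt.Println(first, p.get()) // prints: conn-A conn-B
}
```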

&lt;p&gt;Connection pooling configurations are used to set the parameters for the connection pool. These configurations include settings such as the minimum and maximum number of connections in the pool, the maximum time a connection can be idle before it is closed, and the maximum time a connection can be used before it is returned to the pool.&lt;/p&gt;
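&lt;p&gt;Such a configuration might be captured in a struct like the following sketch; the field names are illustrative and not taken from any particular library:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// PoolConfig gathers the tuning parameters described above. The field
// names are illustrative, not from any particular library.
type PoolConfig struct {
	MinConns    int           // connections kept open even when idle
	MaxConns    int           // hard ceiling on the pool size
	MaxIdleTime time.Duration // an idle connection older than this is closed
	MaxLifetime time.Duration // a connection is recycled after this total age
}

func main() {
	cfg := PoolConfig{MinConns: 2, MaxConns: 10, MaxIdleTime: 5 * time.Minute, MaxLifetime: time.Hour}
	fmt.Printf("pool holds between %d and %d connections\n", cfg.MinConns, cfg.MaxConns)
}
```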

&lt;p&gt;Overall, the basic principles of connection pooling involve creating a pool of database connections, managing the pool using algorithms and configurations, and reusing the connections as required to reduce overhead and improve performance.&lt;/p&gt;
&lt;h2&gt;
  
  
  Implementing Our Own Connection Pool
&lt;/h2&gt;

&lt;p&gt;To implement connection pooling in a specific programming language or framework, developers typically use connection pool libraries or built-in connection pool features. Code snippets and examples for implementing connection pools are often available in library documentation or online resources.&lt;/p&gt;

&lt;p&gt;However, simply wiring an existing library into a dummy application teaches us little. As software engineers, implementing our own connection pool brings real benefits. First, it can significantly improve the performance of our application by reducing the overhead associated with establishing new connections. It can also help prevent connection leaks and other issues that arise from improperly managed connections.&lt;/p&gt;

&lt;p&gt;Moreover, it provides us with fine-grained control over connection creation, usage, and destruction, allowing us to optimize our application's resource utilization. By implementing our own connection pooling, we can gain a deeper understanding of how our application works and thereby improve its scalability and reliability.&lt;/p&gt;
&lt;h3&gt;
  
  
  Building Blocks
&lt;/h3&gt;

&lt;p&gt;For ease of demonstration, we can use an SQLite3 database and implement our own custom pooling for it. I'll be using the Go language here because of its simplicity. You can use any language of your choice.&lt;/p&gt;

&lt;p&gt;To start with, our &lt;code&gt;ConnectionPool&lt;/code&gt; struct will look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type ConnectionPool struct {
    queue chan *sql.DB
    maxSize int
    currentSize int
    lock sync.Mutex
    isNotFull *sync.Cond
    isNotEmpty *sync.Cond
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;ConnectionPool&lt;/code&gt; struct contains the &lt;code&gt;queue&lt;/code&gt;, &lt;code&gt;maxSize&lt;/code&gt;, &lt;code&gt;currentSize&lt;/code&gt;, &lt;code&gt;lock&lt;/code&gt;, &lt;code&gt;isNotFull&lt;/code&gt;, and &lt;code&gt;isNotEmpty&lt;/code&gt; fields. The &lt;code&gt;queue&lt;/code&gt; field is a channel that holds pointers to &lt;code&gt;sql.DB&lt;/code&gt; connections. The &lt;code&gt;sql.DB&lt;/code&gt; type belongs to Go's built-in &lt;code&gt;database/sql&lt;/code&gt; package. The &lt;a href="https://pkg.go.dev/database/sql"&gt;&lt;code&gt;database/sql&lt;/code&gt;&lt;/a&gt; package provides a generic interface around SQL or SQL-like databases, and it is implemented by the &lt;a href="https://pkg.go.dev/github.com/mattn/go-sqlite3"&gt;&lt;code&gt;github.com/mattn/go-sqlite3&lt;/code&gt;&lt;/a&gt; package, which we will be using as an SQLite3 driver.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;maxSize&lt;/code&gt; field represents the maximum number of connections that the pool can have, and the &lt;code&gt;currentSize&lt;/code&gt; field represents the current number of connections in the pool. The &lt;code&gt;lock&lt;/code&gt; field is a &lt;a href="https://en.wikipedia.org/wiki/Lock_(computer_science)"&gt;mutex&lt;/a&gt; that ensures that concurrent access to shared memory is synchronized. The &lt;code&gt;isNotFull&lt;/code&gt; and &lt;code&gt;isNotEmpty&lt;/code&gt; fields are condition variables that allow for efficient waiting and are used to signal when the pool is not full and not empty, respectively.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sync.Cond&lt;/code&gt; is a synchronization primitive in Go that allows multiple goroutines to wait for a shared condition to be satisfied. It is often used in conjunction with a mutex, which provides exclusive access to a shared resource (in this case the &lt;code&gt;queue&lt;/code&gt;), to coordinate the execution of multiple goroutines.&lt;/p&gt;

&lt;p&gt;Channels could also be used for synchronization, but they come with some overhead in terms of memory usage and complexity. In this case, &lt;code&gt;sync.Cond&lt;/code&gt; provides a simpler and more lightweight alternative, allowing efficient signaling of waiting goroutines.&lt;/p&gt;

&lt;p&gt;By using &lt;code&gt;sync.Cond&lt;/code&gt;, the implementation can ensure that goroutines waiting on the condition are woken up only when the condition is actually met, rather than relying on a buffered channel that might hold stale data. This improves overall performance and reduces the likelihood of race conditions or deadlocks.&lt;/p&gt;
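&lt;p&gt;A minimal, self-contained demonstration of this wait/signal dance (separate from the pool itself):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// demo has one goroutine wait on a sync.Cond until an item is available,
// while the caller produces the item and signals the waiter.
func demo() int {
	cond := sync.NewCond(new(sync.Mutex))
	items := 0
	seen := 0

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		cond.L.Lock()
		for items == 0 { // always re-check the condition in a loop
			cond.Wait() // atomically unlocks, sleeps, and relocks on wake-up
		}
		seen = items
		cond.L.Unlock()
	}()

	cond.L.Lock()
	items++ // make the condition true...
	cond.L.Unlock()
	cond.Signal() // ...then wake one waiter (Broadcast would wake all)

	wg.Wait()
	return seen
}

func main() {
	fmt.Println("consumer observed", demo(), "item(s)")
}
```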

&lt;h3&gt;
  
  
  Getting Connection Object from the Pool
&lt;/h3&gt;

&lt;p&gt;Next, we will implement a &lt;code&gt;Get&lt;/code&gt; method which will return a database object from an existing &lt;code&gt;ConnectionPool&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (cp *ConnectionPool) Get() (*sql.DB, error) {
    cp.lock.Lock()
    defer cp.lock.Unlock()

    // If queue is empty, wait
    for cp.currentSize == 0 {
        fmt.Println("Waiting for connection to be added back in the pool")
        cp.isNotEmpty.Wait()
    }

    fmt.Println("Got a connection from the pool")
    db := &amp;lt;-cp.queue
    cp.currentSize--
    cp.isNotFull.Signal()

    err := db.Ping()
    if err != nil {
        db.Close() // discard the dead connection instead of leaking it
        return nil, err
    }

    return db, nil
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function, &lt;code&gt;Get()&lt;/code&gt;, retrieves a connection from the pool. First, it acquires the lock to ensure exclusive access to the shared state of the connection pool. If the pool is currently empty, the function waits until a connection is added back to the pool.&lt;/p&gt;

&lt;p&gt;Once a connection is available, the function dequeues it from the &lt;code&gt;queue&lt;/code&gt;, decrements &lt;code&gt;currentSize&lt;/code&gt;, and signals that the pool is not full. It then checks whether the connection is still valid by calling &lt;code&gt;Ping()&lt;/code&gt;. If the connection is not valid, an error is returned, and the connection is not returned to the caller. If the connection is valid, it is returned to the caller.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding Connection Object to the Pool
&lt;/h3&gt;

&lt;p&gt;Moving on, we add an &lt;code&gt;Add&lt;/code&gt; method whose responsibility will be to add the connection object to the pool once it has been used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (cp *ConnectionPool) Add(db *sql.DB) error {
    if db == nil {
        return errors.New("database not initialized; create a new connection pool first")
    }

    cp.lock.Lock()
    defer cp.lock.Unlock()

    for cp.currentSize == cp.maxSize {
        fmt.Println("Waiting for connection to be released")
        cp.isNotFull.Wait()
    }

    cp.queue &amp;lt;- db
    cp.currentSize++
    cp.isNotEmpty.Signal()

    return nil
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function, &lt;code&gt;Add()&lt;/code&gt;, adds a connection to the pool. It first checks whether the connection is &lt;code&gt;nil&lt;/code&gt; and returns an error if it is. Then, it acquires the lock to ensure exclusive access to the shared state of the connection pool. If the pool is currently full, the function waits until a connection is released from the pool.&lt;/p&gt;

&lt;p&gt;Once there is space in the pool, the function enqueues the connection onto the &lt;code&gt;queue&lt;/code&gt;, increments &lt;code&gt;currentSize&lt;/code&gt;, and signals that the pool is not empty. The function returns &lt;code&gt;nil&lt;/code&gt; to indicate success.&lt;/p&gt;
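&lt;p&gt;The interplay of &lt;code&gt;Get&lt;/code&gt; and &lt;code&gt;Add&lt;/code&gt; can be exercised without a database driver. In this driver-free miniature of the pool, string tokens stand in for &lt;code&gt;*sql.DB&lt;/code&gt; and a slice plays the role of the channel:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// A driver-free miniature of the pool: string tokens stand in for
// *sql.DB, and a slice plays the role of the channel, so the
// borrow/return protocol runs with only the standard library.
type miniPool struct {
	mu         *sync.Mutex
	conns      []string
	maxSize    int
	isNotFull  *sync.Cond
	isNotEmpty *sync.Cond
}

func newMiniPool(maxSize int) *miniPool {
	p := new(miniPool)
	p.mu = new(sync.Mutex)
	p.maxSize = maxSize
	p.isNotFull = sync.NewCond(p.mu)
	p.isNotEmpty = sync.NewCond(p.mu)
	return p
}

// Get blocks while the pool is empty, then hands out the oldest connection.
func (p *miniPool) Get() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.conns) == 0 {
		p.isNotEmpty.Wait()
	}
	c := p.conns[0]
	p.conns = p.conns[1:]
	p.isNotFull.Signal()
	return c
}

// Add blocks while the pool is full, then returns a connection to the back.
func (p *miniPool) Add(c string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.conns) == p.maxSize {
		p.isNotFull.Wait()
	}
	p.conns = append(p.conns, c)
	p.isNotEmpty.Signal()
}

func main() {
	p := newMiniPool(2)
	p.Add("conn-1")
	p.Add("conn-2")

	c := p.Get() // borrows the oldest connection
	fmt.Println("borrowed", c)
	p.Add(c) // and returns it to the back of the queue
	fmt.Println("pool size is now", len(p.conns))
}
```

The same two-condition-variable protocol as the real pool applies here; only the element type and the backing queue differ.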

&lt;h3&gt;
  
  
  Closing the Connection Pool
&lt;/h3&gt;

&lt;p&gt;As the name suggests, we will implement a &lt;code&gt;Close&lt;/code&gt; function responsible for closing all database connections in the pool. It starts by acquiring the lock, then iterates through all the connections in the pool and closes them one by one. After closing each connection, it decrements the &lt;code&gt;currentSize&lt;/code&gt; counter and signals any waiting goroutines that space is now available in the pool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (cp *ConnectionPool) Close() {
    cp.lock.Lock()
    defer cp.lock.Unlock()

    for cp.currentSize &amp;gt; 0 {
        db := &amp;lt;-cp.queue
        db.Close()
        cp.currentSize--
        cp.isNotFull.Signal()
    }

    close(cp.queue)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Initializing the Connection Pool
&lt;/h3&gt;

&lt;p&gt;We will implement a &lt;code&gt;NewConnectionPool&lt;/code&gt; function as a constructor for a new connection pool. It takes the &lt;code&gt;driver&lt;/code&gt;, &lt;code&gt;dataSource&lt;/code&gt;, and &lt;code&gt;maxSize&lt;/code&gt; arguments and returns a pointer to a new &lt;code&gt;ConnectionPool&lt;/code&gt; instance. It first checks if the provided &lt;code&gt;driver&lt;/code&gt; and &lt;code&gt;dataSource&lt;/code&gt; arguments are valid by opening a connection to the database. If the connection is successful, it initializes a new connection pool with the provided &lt;code&gt;maxSize&lt;/code&gt; argument. It then creates a new channel of &lt;code&gt;*sql.DB&lt;/code&gt; objects and pre-populates it with &lt;code&gt;maxSize&lt;/code&gt; database connections by creating a new database connection for each iteration of a loop. Finally, it returns the new &lt;code&gt;ConnectionPool&lt;/code&gt; instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func NewConnectionPool(driver, dataSource string, maxSize int) (*ConnectionPool, error) {

    // Validate driver and data source. sql.Open only checks its
    // arguments, so Ping to force an actual connection
    db, err := sql.Open(driver, dataSource)
    if err != nil {
        return nil, err
    }
    if err := db.Ping(); err != nil {
        db.Close()
        return nil, err
    }
    db.Close() // the validation handle itself is not pooled

    cp := &amp;amp;ConnectionPool{
        queue: make(chan *sql.DB, maxSize),
        maxSize: maxSize,
        currentSize: 0,
    }

    cp.isNotEmpty = sync.NewCond(&amp;amp;cp.lock)
    cp.isNotFull = sync.NewCond(&amp;amp;cp.lock)

    for i := 0; i &amp;lt; maxSize; i++ {
        conn, err := sql.Open(driver, dataSource)
        if err != nil {
            return nil, err
        }
        cp.queue &amp;lt;- conn
        cp.currentSize++
    }

    return cp, nil
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Putting it All Together
&lt;/h3&gt;

&lt;p&gt;This is what our final custom Connection Pool implementation looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package pool

import (
    "database/sql"
    "errors"
    "fmt"
    "sync"

    _ "github.com/mattn/go-sqlite3"
)

type ConnectionPool struct {
    queue       chan *sql.DB
    maxSize     int
    currentSize int
    lock        sync.Mutex
    isNotFull   *sync.Cond
    isNotEmpty  *sync.Cond
}

func (cp *ConnectionPool) Get() (*sql.DB, error) {
    cp.lock.Lock()
    defer cp.lock.Unlock()

    // If queue is empty, wait
    for cp.currentSize == 0 {
        fmt.Println("Waiting for connection to be added back in the pool")
        cp.isNotEmpty.Wait()
    }

    fmt.Println("Got a connection from the pool")
    db := &amp;lt;-cp.queue
    cp.currentSize--
    cp.isNotFull.Signal()

    err := db.Ping()
    if err != nil {
        db.Close() // discard the dead connection instead of leaking it
        return nil, err
    }

    return db, nil
}

func (cp *ConnectionPool) Add(db *sql.DB) error {
    if db == nil {
        return errors.New("database not initialized; create a new connection pool first")
    }

    cp.lock.Lock()
    defer cp.lock.Unlock()

    for cp.currentSize == cp.maxSize {
        fmt.Println("Waiting for connection to be released")
        cp.isNotFull.Wait()
    }

    cp.queue &amp;lt;- db
    cp.currentSize++
    cp.isNotEmpty.Signal()

    return nil
}

func (cp *ConnectionPool) Close() {
    cp.lock.Lock()
    defer cp.lock.Unlock()

    for cp.currentSize &amp;gt; 0 {
        db := &amp;lt;-cp.queue
        db.Close()
        cp.currentSize--
        cp.isNotFull.Signal()
    }

    close(cp.queue)
}

func NewConnectionPool(driver, dataSource string, maxSize int) (*ConnectionPool, error) {

    // Validate driver and data source. sql.Open only checks its
    // arguments, so Ping to force an actual connection
    db, err := sql.Open(driver, dataSource)
    if err != nil {
        return nil, err
    }
    if err := db.Ping(); err != nil {
        db.Close()
        return nil, err
    }
    db.Close() // the validation handle itself is not pooled

    cp := &amp;amp;ConnectionPool{
        queue: make(chan *sql.DB, maxSize),
        maxSize: maxSize,
        currentSize: 0,
    }

    cp.isNotEmpty = sync.NewCond(&amp;amp;cp.lock)
    cp.isNotFull = sync.NewCond(&amp;amp;cp.lock)

    for i := 0; i &amp;lt; maxSize; i++ {
        conn, err := sql.Open(driver, dataSource)
        if err != nil {
            return nil, err
        }
        cp.queue &amp;lt;- conn
        cp.currentSize++
    }

    return cp, nil
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, there are many ways in which this implementation can be improved upon. Most connection pool implementations use a &lt;a href="https://www.oreilly.com/library/view/design-patterns-and/9781786463593/2ff33f7c-aab8-4a4d-bacc-c475c3d1c928.xhtml"&gt;bounded queue&lt;/a&gt; as the underlying data structure, and you can implement your own pool with any variation of it.&lt;/p&gt;

&lt;p&gt;The complete implementation along with its usage is open-sourced &lt;a href="https://github.com/pratikms/dbconnectionpool"&gt;here&lt;/a&gt; in case you wish to play around. I'll suggest running it in debug mode to watch the signaling magic of &lt;code&gt;sync.Cond&lt;/code&gt; unfold.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Connection Pooling Issues
&lt;/h2&gt;

&lt;p&gt;While connection pooling can bring many benefits to an application, it is not without its challenges. Here are some common issues that can arise with connection pooling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Overuse of Connection Pools:&lt;/strong&gt; Connection pools should be used judiciously, as an overuse of pools can result in a decrease in application performance. This is because the connection pool itself can become a bottleneck if too many connections are being opened and closed, causing delays in database transactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pool Size Configuration Errors:&lt;/strong&gt; Connection pool size is an important consideration when implementing connection pooling. If the pool size is too small, there may not be enough connections available to handle peak traffic, resulting in errors or delays. On the other hand, if the pool size is too large, it can lead to unnecessary resource consumption and potential performance issues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connection Leaks:&lt;/strong&gt; Connection leaks occur when a connection is not properly closed and returned to the pool after it has been used. This can lead to resource exhaustion, as unused connections will remain open and tie up valuable resources. Over time, this can result in degraded application performance and, in extreme cases, cause the application to crash.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To avoid these issues, it is important to monitor connection pool usage and performance regularly. Best practices such as setting appropriate pool size, tuning timeout and idle settings, and configuring automatic leak detection and recovery can help minimize the impact of these issues. Additionally, logging and alerting mechanisms can be put in place to help identify and remediate any issues that do occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connection Pooling in Cloud Environments
&lt;/h2&gt;

&lt;p&gt;Connection pooling is an important consideration when designing applications for the cloud. Cloud environments offer several unique challenges, such as elastic scalability and dynamic resource allocation. Connection pooling can help address some of these challenges, but there are additional considerations to take into account.&lt;/p&gt;

&lt;p&gt;In a cloud environment, applications may be running on multiple instances or virtual machines. This means that a single connection pool may not be sufficient to handle the load from all of these instances. Instead, it may be necessary to implement multiple connection pools, each handling a subset of the total workload.&lt;/p&gt;

&lt;p&gt;Another consideration is the dynamic nature of cloud environments. Instances can be added or removed from the environment at any time, which means that the size of the connection pool may need to be adjusted accordingly. This can be achieved through automation tools or by implementing dynamic scaling rules based on metrics such as CPU usage or network traffic.&lt;/p&gt;

&lt;p&gt;Security is also an important consideration when implementing connection pooling in the cloud. In a shared environment, it is important to ensure that connections are secure and cannot be accessed by unauthorized parties. This may involve implementing encryption or access control measures, such as IP filtering.&lt;/p&gt;

&lt;p&gt;Finally, it is important to ensure that connection pooling is properly configured for the specific cloud environment being used. Each cloud provider may have its own specific requirements and recommendations for connection pooling, such as maximum pool size or connection timeouts. It is important to consult the provider's documentation and best practices guides to ensure that connection pooling is properly configured for optimal performance and reliability.&lt;/p&gt;

&lt;p&gt;In summary, connection pooling can be a valuable tool for optimizing performance and managing resources in cloud environments. However, there are additional considerations that must be taken into account to ensure that connection pooling is properly implemented and configured for the specific cloud environment being used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;In conclusion, connection pooling is a crucial concept in modern software development that can help to improve application performance and scalability while reducing resource usage. By caching and reusing database connections, connection pooling can reduce the overhead of creating and destroying connections, leading to faster application response times and increased throughput.&lt;/p&gt;

&lt;p&gt;However, connection pooling is not a silver bullet and must be used carefully and thoughtfully. Common issues such as overuse of connection pools, pool size configuration errors, and connection leaks can cause performance degradation and even application crashes.&lt;/p&gt;

&lt;p&gt;When using connection pooling in cloud environments, additional considerations must be taken into account, such as the network latency between the application and the database, and the dynamic nature of cloud resources.&lt;/p&gt;

&lt;p&gt;To sum up, connection pooling is an important tool for improving database performance in modern software applications. By understanding how connection pooling works, common issues to look out for, and best practices for implementation, software engineers can harness the power of connection pooling to build more performant, scalable, and reliable applications.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Revolutionizing Data Security by Design</title>
      <dc:creator>Pratik Shivaraikar</dc:creator>
      <pubDate>Sun, 16 Aug 2020 16:01:39 +0000</pubDate>
      <link>https://dev.to/dailydotdev/revolutionizing-data-security-by-design-4hhd</link>
      <guid>https://dev.to/dailydotdev/revolutionizing-data-security-by-design-4hhd</guid>
      <description>&lt;p&gt;For decades, we have benefited from modern cryptography to protect our sensitive data during transmission and storage. However, we have never been able to keep the data protected while it is being processed.&lt;/p&gt;

&lt;p&gt;Nearly 4 billion data records were stolen in 2016, each costing the record holder almost $158. Do the simple math, and in 2016 alone attackers amassed a whopping $632 billion. The scale, sophistication, and cost of cyber-attacks escalate every year, and today's technologies will not be able to keep pace with this continued exploitation. In such times, we need encryption technology that can disorient and discourage bad actors.&lt;/p&gt;

&lt;p&gt;For example, many years from now, a fault-tolerant, universal quantum computer with millions of qubits could quickly sift through the probabilities and decrypt even the strongest common encryption, rendering the foundational security methodology we know today obsolete.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Homomorphic Encryption&lt;/strong&gt; comes in. Homomorphic encryption helps us in solving a lot of problems that today's  &lt;a href="https://en.wikipedia.org/wiki/Elliptic-curve_cryptography"&gt;elliptic-curve cryptography (ECC)&lt;/a&gt; algorithms fail to address in our cloud infrastructure security.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shortcomings of today's encryption techniques
&lt;/h2&gt;

&lt;p&gt;When it comes to cloud security, our data is encrypted in two states: during transit and on storage.&lt;/p&gt;

&lt;p&gt;In transit, the encryption techniques that we use today suffer from a problem called TLS/SSL termination. Interestingly, this very problem is proudly marketed as a feature by reverse proxies such as &lt;a href="https://docs.nginx.com/nginx/admin-guide/security-controls/terminating-ssl-http/"&gt;Nginx&lt;/a&gt;, Envoy, etc.&lt;/p&gt;

&lt;p&gt;TLS termination is used by reverse proxies to handle incoming connections: the proxy decrypts the TLS and passes the unencrypted request on to the appropriate server. This is precisely the infrastructure limitation that attackers take advantage of; the whole threat model revolves around exploiting the availability of unencrypted data past the TLS termination phase.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--C_9oVJFm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597583622719/j5Mow3Lc3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--C_9oVJFm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597583622719/j5Mow3Lc3.jpeg" alt="tls-termination.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the case of storage, there are two ways in which we do things today. We either store the data in our databases unencrypted, in plain text, or, in some cases, with some form of encryption. With cloud providers like GCP, AWS, Azure, etc., this encryption is done using a Key Management Service (KMS). Even then, while the data may be stored encrypted, there always comes a time when the application needs to decrypt the data to perform any operation on it.&lt;/p&gt;

&lt;p&gt;Every service that we know, as of today, runs on unencrypted data. The trends that Twitter shows cannot be obtained by operating on encrypted data. The recommendations system on YouTube, the news feed on Facebook, all the predictions of every application that we see out there operate on unencrypted data.&lt;/p&gt;

&lt;p&gt;It is these very shortfalls that Homomorphic encryption aims to address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Homomorphic encryption
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Imagine if you could compute on encrypted data without ever decrypting it.&lt;br&gt;
What would you do?&lt;/p&gt;

&lt;p&gt;― Flavio Bergamaschi&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Lattice-based_cryptography"&gt;Lattice-based cryptography&lt;/a&gt; proves it's superiority as it uses complicated math problems to hide data. By the time computers are strong enough to crack today's encryption, the world can be prepared with lattice cryptography. Lattice cryptography, as of this day, to the best of our knowledge, is quantum resistant. It means that there does not exist any quantum algorithm that can decrypt this type of cryptography. Lattice cryptography is also the basis of  &lt;a href="https://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_Homomorphic_Encryption"&gt;Homomorphic Encryption (FHE)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Homomorphic encryption is the ability to perform arithmetic operations on encrypted data. None of our existing encryption techniques allow us to do that. Because of this ability, we never need to decrypt our data, which quite conveniently addresses the shortcomings of our existing techniques. In transit, the TLS termination problem never occurs, as the reverse proxy need not decrypt the data: it can perform all its operations on the encrypted data itself and make all the necessary decisions without ever terminating the TLS. Even in a persistent store, database queries can be performed on encrypted data.&lt;/p&gt;
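&lt;p&gt;To get a feel for what "computing on encrypted data" means, here is a toy sketch in Python: textbook (unpadded) RSA happens to be &lt;em&gt;multiplicatively&lt;/em&gt; homomorphic, so multiplying two ciphertexts yields a ciphertext of the product. The tiny primes are for illustration only; this is not how production FHE works, and unpadded RSA must never be used for real data:&lt;/p&gt;

```python
# Toy demonstration: unpadded RSA is multiplicatively homomorphic.
# E(m1) * E(m2) mod n decrypts to m1 * m2 mod n, so a party holding
# only ciphertexts can compute a product without ever seeing m1 or m2.
# Tiny primes for illustration only; real RSA needs 2048-bit moduli.

p, q = 1009, 1013             # toy primes
n = p * q
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 42, 17
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply ciphertexts; the result is an encryption of m1 * m2
c_prod = (c1 * c2) % n
product = decrypt(c_prod)
print(product)                # 714, i.e. 42 * 17
```

Libraries like SEAL and the IBM toolkit implement fully homomorphic schemes that support both addition and multiplication with proper security; this toy only shows the algebraic idea.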

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_uLwJ7N2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597587658631/RcF0m9Gih.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_uLwJ7N2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597587658631/RcF0m9Gih.jpeg" alt="fhe.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fully Homomorphic Encryption (FHE) protects us from honest-but-curious threat models. An honest-but-curious (HBC) adversary is a legitimate participant in a communication protocol who will not deviate from the defined protocol but will attempt to learn all possible information from legitimately received messages. To get an idea of what this means, a comparison with today's model helps a great deal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QzlDNQz_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597591541754/6On13kfJ9.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QzlDNQz_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597591541754/6On13kfJ9.jpeg" alt="todays-threat-model.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The way we do things today, Alice encrypts some data and sends it as input to Bob. Bob can decrypt that data, process it, and store it at his end. Just like Alice, Bob can encrypt some data and send it over to Alice, who can decrypt and process it at her end. Such a mechanism protects us against &lt;a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack#:~:text=In%20cryptography%20and%20computer%20security,directly%20communicating%20with%20each%20other."&gt;man-in-the-middle (MITM) attacks&lt;/a&gt;, which is why Eve can't eavesdrop on any communication between Alice and Bob. But Bob, on the other hand, has access to all this unencrypted data. Here, Bob is the honest-but-curious actor.&lt;/p&gt;

&lt;p&gt;For the sake of convenience, we assume Bob to be an honest-but-curious actor without any malicious intent. For threat models where Bob, sitting inside our cloud infrastructure with free access to all this unencrypted data, does have malicious intentions, there are other protocols that we can use in combination with homomorphic encryption.&lt;/p&gt;

&lt;p&gt;Interestingly, with homomorphic encryption, along with protection against eavesdropping and MITM, we get the added protection of not letting Bob sit on a gold mine of unencrypted data, because everything that gets stored is encrypted. This, however, does not take away Bob's ability to operate on the data as he used to. One of the real benefits of homomorphic encryption is that, unlike all the encryption techniques we've seen until now, we need not decrypt the data: we can perform all our operations on the encrypted data itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N-C3zvis--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597591933675/vo5miUiCS.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N-C3zvis--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597591933675/vo5miUiCS.jpeg" alt="fhe-threat-model.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Applications of Homomorphic encryption
&lt;/h3&gt;

&lt;p&gt;Right off the bat, some of the use-cases that we can consider for such an encryption technique are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Oblivious queries.&lt;/strong&gt; Searching without revealing intent. For example, today, while requesting weather info, we need to reveal our location to cloud providers. With homomorphic encryption, since our location too will always be encrypted, we need not reveal it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set intersections.&lt;/strong&gt; Today, to determine an overlap, we need to share both sets completely. Using homomorphic encryption, we can determine the overlap without disclosing the entire sets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extracting value from private data.&lt;/strong&gt; We can run machine learning models, whether traditional, regression, or neural network models, on private data without ever exposing it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure outsourcing.&lt;/strong&gt; Even today, quite a few enterprises maintain on-prem infrastructure due to a lack of trust in cloud providers. Homomorphic encryption, with its data privacy by design, can encourage wider cloud adoption.&lt;/li&gt;
&lt;/ul&gt;
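&lt;p&gt;As a toy illustration of extracting value from private data, the sketch below implements a minimal Paillier cryptosystem, which is &lt;em&gt;additively&lt;/em&gt; homomorphic: a server can total up encrypted values, say salaries, without decrypting any of them. The parameters are deliberately tiny and insecure:&lt;/p&gt;

```python
# Minimal Paillier cryptosystem (additively homomorphic), toy parameters.
# A server can add encrypted values, e.g. salaries, without decrypting them.
import math
import random

p, q = 1019, 1021
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)    # private key (Python 3.9+)
mu = pow(lam, -1, n)            # valid because we pick g = n + 1

def encrypt(m):
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # with g = n + 1, g^m mod n^2 simplifies to 1 + n*m
    return ((1 + n * m) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n

salaries = [30, 45, 25]
ciphertexts = [encrypt(s) for s in salaries]

# Homomorphic addition: multiplying ciphertexts adds the plaintexts
c_sum = 1
for c in ciphertexts:
    c_sum = (c_sum * c) % n2

total = decrypt(c_sum)
print(total)   # 100
```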

&lt;h2&gt;
  
  
  Proof of Concept
&lt;/h2&gt;

&lt;p&gt;Without making this article sound like an ad, let us get our hands dirty and watch how homomorphic encryption can be implemented. Microsoft has the &lt;a href="https://github.com/microsoft/SEAL"&gt;SEAL library&lt;/a&gt;, which supports homomorphic encryption. IBM, too, recently released a &lt;a href="https://github.com/IBM/fhe-toolkit-linux"&gt;Fully Homomorphic Encryption toolkit for Linux&lt;/a&gt;. Since IBM's FHE toolkit is Docker-based, we will be using it for our POC for the sake of simplicity.&lt;/p&gt;

&lt;p&gt;First, we need to clone the  &lt;a href="https://github.com/IBM/fhe-toolkit-linux"&gt;repo&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ git clone https://github.com/IBM/fhe-toolkit-linux.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once cloned, we need to run the &lt;code&gt;FetchDockerImage.sh&lt;/code&gt; shell script, providing the container OS as an argument. For simplicity, we will be using Ubuntu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cd fhe-toolkit-linux
$ ./FetchDockerImage.sh ubuntu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The download and setup of the toolkit will take some time, depending on bandwidth and hardware.&lt;/p&gt;

&lt;p&gt;Next, we need to run the IBMCOM pre-built toolkit from Docker Hub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./RunToolkit.sh -p ubuntu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The output of the above command should be something similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ ./RunToolkit.sh -p ubuntu
WARNING: No swap limit support
INFO:    Using system default persistent storage path...
INFO:    Persistent data storage: "/home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace"
INFO:    CMake: Deleting cached built settings and reconfigure
INFO:    Launching FHE tookit: 


         docker run -d --name fhe-toolkit-ubuntu  -v /home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace:/opt/IBM/FHE-Workspace  -p 8443:8443 ibmcom/fhe-toolkit-ubuntu


8fdcd97b1d203f0e71e4602ce6d24a76cd768c5fc2f8c5ee6b99ed7acb1a7886

CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS                  PORTS                    NAMES
8fdcd97b1d20        ibmcom/fhe-toolkit-ubuntu   "code-server --bind-…"   6 seconds ago       Up Less than a second   0.0.0.0:8443-&amp;gt;8443/tcp   fhe-toolkit-ubuntu

FHE Development is open for business: https://127.0.0.1:8443/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We now have a web server running at &lt;a href="https://127.0.0.1:8443/"&gt;https://127.0.0.1:8443/&lt;/a&gt;. All our next operations will be through the browser.&lt;/p&gt;

&lt;p&gt;On opening the browser and accepting the prompt for the self-signed certificate, the VS Code interface opens in the browser. Soon it will ask us to select a kit; make sure to choose the option that says &lt;em&gt;GCC for x86_64-linux-gnu 9.3.0&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kYbeFZzw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597578399743/QmrL-X0CJ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kYbeFZzw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597578399743/QmrL-X0CJ.png" alt="Select kit.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HESKvwJZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597578438470/bmaGRD4bi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HESKvwJZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597578438470/bmaGRD4bi.png" alt="configure project.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, click &lt;em&gt;Build&lt;/em&gt; in the CMake Tools status bar to build the selected target.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1NLnIZUf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597578627732/76_dmFwo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1NLnIZUf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597578627732/76_dmFwo4.png" alt="build.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you look into the &lt;code&gt;examples/BGV_country_db_lookup&lt;/code&gt; directory, you can find the &lt;code&gt;countries_dataset.csv&lt;/code&gt; file. It is a list of European countries and their capital cities. When we run the toolkit, it uses the &lt;code&gt;BGV_country_db_lookup.cpp&lt;/code&gt; file to encrypt the contents of the CSV. That file also contains code that lets us search on the encrypted data: given a country name as input, it looks through the encrypted list of countries and outputs its matching capital.&lt;/p&gt;

&lt;p&gt;Let's proceed to run the toolkit:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--36LyddWM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597579470004/4I2Pqt-52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--36LyddWM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597579470004/4I2Pqt-52.png" alt="run.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following the text instructions, if we go ahead and enter any country, the toolkit searches the encrypted database and outputs that country's capital.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KBa1J-oB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597580313555/gtrC3-4nO.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBa1J-oB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1597580313555/gtrC3-4nO.png" alt="search.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Though homomorphic encryption is a great and extremely promising technology, is it ready for out-of-the-box use? Absolutely not. This is very much evident from our POC: searching an encrypted database of around 47 entries took almost 2-3 minutes. There is no denying that this is an impressive start, and definitely a step in the right direction, but we still have a long way to go. Having said that, homomorphic encryption can very well be the next big breakthrough in computer science. We can only imagine the endless possibilities when the first FHE-enabled database is implemented, or the first FHE-supported proxy. Nonetheless, we're surely in for some exciting times ahead!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://r.daily.dev/get?r=devto"&gt;daily.dev&lt;/a&gt; delivers the best programming news every new tab. We will rank hundreds of qualified sources for you so that you can hack the future.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://r.daily.dev/get?r=devto"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GnRWXIbg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/b996k4sm4efhietrzups.png" alt="Daily Poster"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>encryption</category>
      <category>homomorphic</category>
      <category>datasecurity</category>
      <category>cloudsecurity</category>
    </item>
    <item>
      <title>Evolution of Microservices</title>
      <dc:creator>Pratik Shivaraikar</dc:creator>
      <pubDate>Mon, 20 Jul 2020 15:20:40 +0000</pubDate>
      <link>https://dev.to/pratikms/evolution-of-microservices-53i8</link>
      <guid>https://dev.to/pratikms/evolution-of-microservices-53i8</guid>
      <description>&lt;p&gt;The central idea behind microservices is that some types of applications become easier to build and maintain when they are broken down into smaller, composable pieces which work together. Each component is continuously developed and separately maintained, and the application is then simply the sum of its constituent components. This is in contrast to a traditional, &lt;em&gt;monolithic&lt;/em&gt; application which is developed all in one piece.&lt;/p&gt;

&lt;p&gt;Applications built as a set of modular components are easier to understand, easier to test, and most importantly easier to maintain over the life of the application. It enables organizations to achieve much higher agility and be able to vastly improve the time it takes to get working improvements to production. This approach has proven to be superior, especially for large enterprise applications which are developed by teams of geographically and culturally diverse developers.&lt;/p&gt;

&lt;p&gt;There are some other benefits as well for a microservice architecture, which include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developer independence&lt;/strong&gt;: Small teams work in parallel and can iterate faster than large teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation and resilience&lt;/strong&gt;: If a component dies, you spin up another instance while the rest of the application continues to function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: Smaller components take up fewer resources and can be scaled to meet increasing demand for that component only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle automation&lt;/strong&gt;: Individual components are easier to fit into continuous delivery pipelines, enabling complex deployment scenarios not possible with monoliths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But how did we reach here? Believe it or not, we've come a long way in the past decade to design microservices the way we do today. To understand why things are done the way they are in the microservice land, it is important to understand how the microservice architecture evolved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Origins
&lt;/h2&gt;

&lt;p&gt;Traditional application design is often called &lt;em&gt;monolithic&lt;/em&gt; because the whole thing is developed in one piece. Even if the logic of the application is modular, it is deployed as a single unit; a Go application, for example, builds into a single executable file. We can imagine this as if all the notes from every subject of a college student were compiled into one long stream.&lt;/p&gt;

&lt;p&gt;This type of code writing and deploying is convenient because it all happens in one spot, but it incurs significant technical debt over time. That’s because successful applications have a tendency of getting bigger and more complex as the product grows, and that makes it harder and harder to run.&lt;/p&gt;

&lt;p&gt;Because these systems were tightly coupled, any change to the code could potentially endanger the performance of the entire application. The functionalities were too interdependent for a new technological age that demanded constant innovation and adaptation.&lt;/p&gt;

&lt;p&gt;Another issue with monolithic architecture was its inability to scale individual functionalities. One crucial aspect of successful businesses is their ability to keep up with consumer demands. Naturally, these demands depend on various factors and fluctuate over time.&lt;/p&gt;

&lt;p&gt;At some point, the product will need to scale only a certain function of its service to respond to a growing number of requests. With monolithic apps, you weren’t able to scale individual elements but rather had to scale the application as a whole.&lt;/p&gt;

&lt;p&gt;Enter microservices. However, the idea of separating applications into smaller parts is not new. There are other programming paradigms which address this same concept, such as Service Oriented Architecture (SOA). However, recent technology advances coupled with an increasing expectation of integrated &lt;em&gt;digital experiences&lt;/em&gt; have given rise to a new breed of development tools and techniques used to meet the needs of modern business applications.&lt;/p&gt;

&lt;p&gt;But this initial microservice / SOA architecture, which simply took monoliths and broke them up into smaller units, had some problems of its own. Once broken down, these microservices needed to communicate among themselves to function. The first natural choice to facilitate this communication was, and in many cases still is, REST APIs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FKDN8LRz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595159335166/8Zyvc0-nk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FKDN8LRz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595159335166/8Zyvc0-nk.jpeg" alt="rest-apis.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This worked to a point. But then, synchronous request-response communication led to tight point-to-point coupling, which brought us all the way back to where we were. It became so tightly coupled that the problem was given its own name: &lt;em&gt;distributed monoliths&lt;/em&gt;. You have microservices in name only, but you still have all the problems of monoliths: coordinating across teams, dealing with big fat releases, and a lot of the fragility that comes along. Some of the problems of such a distributed monolith architecture are:&lt;/p&gt;

&lt;h3&gt;
  
  
  Clients knowing a bit too much
&lt;/h3&gt;

&lt;p&gt;Initially, the clients, which could be a mobile app, a web app, or any client of that sort, used to get a big fat document describing the APIs to be integrated. This resulted in the client knowing more than it was supposed to, which became a bottleneck when it came to changing the microservices: adding a new microservice, or modifying an existing one, forced changes in the client as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Lt1QfNRI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595160554232/PxiH-yZ7g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Lt1QfNRI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595160554232/PxiH-yZ7g.jpeg" alt="clients-know-too-much.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Unavoidable redundancies
&lt;/h3&gt;

&lt;p&gt;When breaking monoliths into smaller units, it becomes tricky to decide who will be responsible for what function of the system. If the system's architecture fails to address these questions properly, some redundancies become unavoidable. For example, if a microservice sends a request to another microservice and it fails to respond, questions like &lt;em&gt;what happens then&lt;/em&gt; suddenly become of paramount importance. The microservice from which the request originated now has to take responsibility for doing something intelligent. This applied to every other microservice in the system, and before we knew it, it became a vicious cycle: every team ended up solving the same common problems. Such problems of shared infrastructure once again led to the same issues we faced with monoliths.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tn5vRYyu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595161238199/-N05-Io7Y.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tn5vRYyu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595161238199/-N05-Io7Y.jpeg" alt="unavoidable-redundancies.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Making changes is risky
&lt;/h3&gt;

&lt;p&gt;Since microservices only communicate with each other through RESTful APIs, a microservice may not always know which other microservices talk to it, so it becomes hard to determine which of them may break if we introduce changes to our microservice. Even with good API contracts such as &lt;a href="https://www.openapis.org/"&gt;OpenAPI&lt;/a&gt;, it is not an easy job: a lot of validation is required across all the microservices involved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zqcy1eWR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595161744214/Ks1HT8fFo.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zqcy1eWR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595161744214/Ks1HT8fFo.jpeg" alt="making-changes-not-easy.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Evolution
&lt;/h2&gt;

&lt;p&gt;Now that we've seen the challenges of the distributed-monolith pattern that dominated the first few years after microservices were introduced as an architectural pattern, we can better understand the problems that were solved, one by one, as the microservice architecture evolved.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Gateways
&lt;/h3&gt;

&lt;p&gt;Clients know a bit too much? Microservices end up with unavoidable redundancies while solving common problems? Enter API gateways. As simple as it sounds, the introduction of an API gateway really does solve a lot of problems. For starters, it frees all microservices from having to worry about authentication, encryption, routing, etc. The client does not have to worry about changes in the microservice land, as it only communicates with the API gateway. This hugely simplifies things on both the client and server sides.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--STvbuA6c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595162360397/RwNvGV_l6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--STvbuA6c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595162360397/RwNvGV_l6.jpeg" alt="api-gateway.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Responsibilities of an API gateway: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt;: Microservices don't have to bear the overhead of authenticating each request again and again, as the API gateway only lets through authenticated requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt;: Since the client only knows about the API gateway, it doesn't need to know the IPs or domains of all the microservices in the system. This also lets microservices change freely, as internal changes are virtually transparent to the client.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting&lt;/strong&gt;: One important advantage of an API gateway is its ability to rate-limit incoming requests. This hugely helps in preventing spam and DoS attacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging and analytics&lt;/strong&gt;: Since all requests go through a single entity, important analytics, such as who is accessing what and which endpoint is used most, can be easily obtained, and a lot of meaningful insights can be derived from them.&lt;/li&gt;
&lt;/ul&gt;
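&lt;p&gt;As a small illustration of one of these responsibilities, the sketch below shows a token-bucket rate limiter of the kind a gateway might run per client; the class and parameter names are hypothetical, not from any particular gateway:&lt;/p&gt;

```python
# A hypothetical token-bucket rate limiter, the kind of per-client check
# an API gateway might run before forwarding a request upstream.
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 2 requests/second with a burst of 3: the 4th back-to-back call is rejected
bucket = TokenBucket(rate=2, capacity=3)
results = [bucket.allow() for _ in range(4)]
print(results)   # [True, True, True, False]
```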

&lt;p&gt;But wait a minute. Let us take a step back and analyse. Doesn't such a pattern resemble one of the most basic problems that any good architecture aims to avoid? No points for guessing the right answer: a &lt;a href="https://en.wikipedia.org/wiki/Single_point_of_failure"&gt;Single Point of Failure (SPOF)&lt;/a&gt;. The API gateway suddenly becomes a big bottleneck and a big engineering dependency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Service mesh
&lt;/h3&gt;

&lt;p&gt;Service mesh has been around for more than a couple of years now. In simple terms, a service mesh can be imagined as a distributed internal API gateway. An API gateway handles what we call north-south traffic: the traffic that flows between external clients and our servers, vertically, from the top down. To remove the SPOF introduced by a single API gateway, we want to take the gateway's responsibilities and apply them to the east-west traffic within our cluster. East-west traffic, in contrast, is the traffic that flows between the servers themselves: it can be imagined as flowing horizontally, between the individual microservices.&lt;/p&gt;

&lt;p&gt;A service mesh uses what is called the sidecar pattern. The sidecar pattern is a single-node pattern made up of two containers. The first is the application container, which contains the core logic for the application; without it, the application would not exist. Alongside the application container runs a sidecar container. The role of the sidecar is to augment and improve the application container, often without the application container's knowledge. In its simplest form, a sidecar can add functionality to a container that might otherwise be difficult to improve.&lt;/p&gt;

&lt;p&gt;The sidecar is usually language-agnostic. There can be sidecars for collecting logs, sidecars for monitoring, and so on. To address the centralization problem caused by a single API gateway, we can attach sidecars to services as proxies. These proxies can carry the intelligence needed to route to other microservices. They can also perform service discovery, so if any microservice's IP changes, the proxy automatically knows about it. Other features, such as rate limiting, also become possible: for example, excess retries can be dropped so that a struggling service isn't drowned by a self-inflicted DoS during unfortunate blips.&lt;/p&gt;
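&lt;p&gt;As a sketch of the retry behaviour described above, here is a hypothetical retry policy with exponential backoff and a capped budget; the function names are illustrative, not from any particular service mesh:&lt;/p&gt;

```python
# A hypothetical retry policy like one a sidecar proxy might enforce:
# a few retries with exponential backoff, then give up rather than
# hammering an already-struggling upstream service.
import time

def call_with_retries(request_fn, max_attempts=3, base_delay=0.01):
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                        # retry budget exhausted
            # Exponential backoff: 0.01s, 0.02s, 0.04s, ...
            time.sleep(base_delay * (2 ** attempt))

# Simulated upstream that fails twice, then succeeds
calls = {"n": 0}
def flaky_upstream():
    calls["n"] += 1
    if calls["n"] >= 3:
        return "ok"
    raise ConnectionError("upstream unavailable")

result = call_with_retries(flaky_upstream)
print(result)   # ok
```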

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UdqYkPR9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595162879208/ApMdj2IW6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UdqYkPR9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595162879208/ApMdj2IW6.jpeg" alt="sidecar-as-proxy.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Event driven
&lt;/h3&gt;

&lt;p&gt;To solve many of the issues that stemmed from using RESTful APIs for communication between microservices, we replace the request-response architecture with an event-driven one. In a request-driven architecture, microservices either tell others what to do (commands) or ask specific questions to get things done (queries), using RESTful APIs. In an event-driven architecture, microservices broadcast all events, and every other microservice can consume them. You can think of events not just as facts but also as triggers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QftOLhWE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595164003492/5q2cO89NU.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QftOLhWE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595164003492/5q2cO89NU.jpeg" alt="request-driven.jpg"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jf_chTT1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595164747402/Ru1VhiDgh.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jf_chTT1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595164747402/Ru1VhiDgh.jpeg" alt="event-driven.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To understand the difference between the two, let us consider an example. Suppose you want to buy an item online, and we have microservices that handle orders, shipping and customers. When a customer places an order, a request is made to the order service. The order service places the order and coordinates with the shipping service to provision shipping of the product. The shipping service, in turn, calls the customer service to fetch customer details. The details returned by the customer service contain, among other things, the address to which the item will be shipped.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TmADsPjy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595165539277/8iE8UyAM0.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TmADsPjy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595165539277/8iE8UyAM0.jpeg" alt="communication-using-rest.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, there are some challenges that need to be solved with such an architectural pattern. What if the shipping service suddenly goes down? How long should the order service keep retrying for? Questions like these, and more, can be solved using the event-driven pattern.&lt;/p&gt;

&lt;p&gt;In an event-driven architecture, when an order is received, the corresponding event, also called a fact, is written to a shared log of events. Every service reads the events being written to this log and acts on the ones relevant to it. For example, if a customer changes their address, the customer service publishes this fact to the event log. The shipping service, seeing a change in address, carries out the necessary actions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--z5Pm2TXc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595168397491/pRPw3r4UO.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--z5Pm2TXc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595168397491/pRPw3r4UO.jpeg" alt="communication-using-events.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the events are persisted in the log, if any service goes down, all it needs to do when it comes back up is read events from the stream in order to catch up on what it missed.&lt;/p&gt;
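&lt;p&gt;A minimal sketch of this catch-up behavior, using an in-memory slice as a stand-in for a real event log such as Kafka (all types and names here are invented for illustration):&lt;/p&gt;

```go
package main

import "fmt"

// Event is a fact appended to the shared log.
type Event struct {
    Kind string
    Data string
}

// Log is a minimal, in-memory stand-in for a durable event log.
type Log struct{ events []Event }

func (l *Log) Append(e Event) { l.events = append(l.events, e) }

// Consumer remembers the offset of the last event it processed, so after
// a crash it can resume from that offset and catch up on missed events.
type Consumer struct {
    offset int
    seen   []Event
}

func (c *Consumer) CatchUp(l *Log) {
    for ; c.offset < len(l.events); c.offset++ {
        c.seen = append(c.seen, l.events[c.offset])
    }
}

func main() {
    log := &Log{}
    shipping := &Consumer{}

    log.Append(Event{"OrderPlaced", "order-1"})
    shipping.CatchUp(log)

    // Shipping "goes down" here; events keep arriving in the log.
    log.Append(Event{"AddressChanged", "customer-7"})
    log.Append(Event{"OrderPlaced", "order-2"})

    // On restart it resumes from its stored offset.
    shipping.CatchUp(log)
    fmt.Println(len(shipping.seen)) // all three events processed exactly once
}
```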

&lt;p&gt;It is important to note that these services are stateful. Every microservice maintains a DB of its own. This need not be a full-blown DB; it can be something as simple as a key-value store. These DBs may or may not contain redundant data, but the bottom line is that every DB holds the information relevant to its microservice. These DBs can also act as local caches, further reducing latency and thereby increasing performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless
&lt;/h3&gt;

&lt;p&gt;Serverless is an architectural pattern where the cloud provider is responsible for executing a piece of code by dynamically allocating the resources for it. This usually results in fewer resources being used to run the code. The code typically runs inside stateless containers that can be triggered by a variety of events including HTTP requests, database events, queuing services, etc. The code sent to the cloud provider for execution is usually in the form of a function, which is why serverless is also referred to as &lt;em&gt;Function-as-a-Service (FaaS)&lt;/em&gt;, as opposed to the traditional &lt;em&gt;Backend-as-a-Service (BaaS)&lt;/em&gt; pattern. Since everything happens on demand, these containers are ephemeral: they are dynamically spun up on receiving an event and conveniently destroyed after having served their purpose. This helps hugely with scaling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--K-v4pJ4v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595169447883/RoxjfvL4C.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--K-v4pJ4v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1595169447883/RoxjfvL4C.jpeg" alt="function-as-a-service.jpg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, there's a catch: one essential thing is missing from this pattern. State! With the other architectural patterns, we discussed how every microservice maintains a database of its own, and how that database doubles as a local cache, reducing latency and increasing performance. In the serverless pattern, we cannot maintain state in our containers because the containers themselves are ephemeral.&lt;/p&gt;
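&lt;p&gt;The difference can be sketched in a few lines. The names below are assumptions for illustration: a local counter vanishes with each ephemeral invocation, while a plain map stands in for the external cloud store we currently have to rely on.&lt;/p&gt;

```go
package main

import "fmt"

// handler models a FaaS-style function: every invocation runs in a fresh,
// ephemeral environment, so this local counter never survives a call.
func handler() int {
    invocations := 0 // reset on every cold start
    invocations++
    return invocations
}

// cloudStore stands in for the external state we are forced to use today
// (a database, object store, etc.), because the container itself keeps none.
var cloudStore = map[string]int{}

func statefulHandler() int {
    cloudStore["invocations"]++
    return cloudStore["invocations"]
}

func main() {
    fmt.Println(handler(), handler())                 // local state is lost each call
    fmt.Println(statefulHandler(), statefulHandler()) // external store persists
}
```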

&lt;h2&gt;
  
  
  Future
&lt;/h2&gt;

&lt;p&gt;Having seen the origins of the SOA pattern and its evolution up to the serverless pattern, we can now look at what the future holds. At the moment, we work around the problem of maintaining state in our serverless functions by using a cloud store. This does the job, but it is not ideal: maintaining a separate cloud store is expensive and introduces unnecessary overhead. We want something closer to the traditional setup, where every microservice maintained a database, and hence state, of its own. Microsoft Azure's &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp"&gt;Durable Functions&lt;/a&gt; have taken a step in this direction, with the aim of solving this problem. Other problems we still need to solve include feeding triggers and data from data stores into functions; various use cases demand this. A unified view of the current state, compiled from the states of all the microservices, could also help us in many ways. These are some of the hardest and most interesting problems in serverless right now, and there is a lot of active research and development going on in this field. There is no doubt that serverless will be a big part of the future.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://blog.pratikms.com/evolution-of-microservices-ckct9cp3r003lx5s13p0v2l3w"&gt;blog.pratikms.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>microservices</category>
      <category>eventdriven</category>
      <category>serverless</category>
      <category>rest</category>
    </item>
    <item>
      <title>Demystifying Containers</title>
      <dc:creator>Pratik Shivaraikar</dc:creator>
      <pubDate>Fri, 19 Jun 2020 18:53:46 +0000</pubDate>
      <link>https://dev.to/pratikms/demystifying-containers-290j</link>
      <guid>https://dev.to/pratikms/demystifying-containers-290j</guid>
      <description>&lt;p&gt;Ever since Docker released it's first version back in 2013, it triggered a major shift in the way the software industry works. "Lightweight VMs" suddenly caught the attention of the world and opened opportunities of unlimited possibilities. Containers provided a way to get a grip on software. You can use Docker Containers to wrap up an application in such a way that its deployment and runtime issues— how to expose it on a network, how to manage its use of storage and memory and I/O, how to control access permissions, etc. — are handled outside of the application itself, and in a way that is consistent across all “containerized” apps.&lt;/p&gt;

&lt;p&gt;Containers offer many other benefits besides handy encapsulation, isolation, portability, and control. Containers are small (megabytes). They start instantly. They have their own built-in mechanisms for versioning and component reuse. They can be easily shared via public or private repositories.&lt;/p&gt;

&lt;p&gt;Today, containers are an essential component of the software development process, and many of us use them on a day-to-day basis. In spite of this, there is still a lot of "magic" involved for many who want to venture into the world of containers, and even today there is a lot of ambiguity about how exactly a container works. Today we will demystify much of that "magic". But before that, I believe it is necessary for us to understand the process of evolution that led to containers.&lt;/p&gt;

&lt;h1&gt;
  
  
  The world before Containers
&lt;/h1&gt;

&lt;p&gt;For many years now, enterprise software has typically been deployed either on “bare metal” (i.e. installed on an operating system that has complete control over the underlying hardware) or in a virtual machine (i.e. installed on an operating system that shares the underlying hardware with other “guest” operating systems). Naturally, installing on bare metal made the software painfully difficult to move around and difficult to update — two constraints that made it hard for IT to respond nimbly to changes in business needs.&lt;/p&gt;

&lt;p&gt;Then virtualization came along. Virtualization platforms (also known as “hypervisors”) allowed multiple virtual machines to share a single physical system, each virtual machine emulating the behavior of an entire system, complete with its own operating system, storage, and I/O, in an isolated fashion. IT could now respond more effectively to changes in business requirements, because VMs could be cloned, copied, migrated, and spun up or down to meet demand or conserve resources.&lt;/p&gt;

&lt;p&gt;Virtual machines also helped cut costs, because more VMs could be consolidated onto fewer physical machines. Legacy systems running older applications could be turned into VMs and physically decommissioned to save even more money.&lt;/p&gt;

&lt;p&gt;But virtual machines still have their share of problems. Virtual machines are large (gigabytes), each one containing a full operating system. Only so many virtualized apps can be consolidated onto a single system. Provisioning a VM still takes a fair amount of time. Finally, the portability of VMs is limited. After a certain point, VMs are not able to deliver the kind of speed, agility, and savings that fast-moving businesses are demanding.&lt;/p&gt;

&lt;h1&gt;
  
  
  Containers
&lt;/h1&gt;

&lt;p&gt;Containers work a little like VMs, but in a far more specific and granular way. They isolate a single application and its dependencies — all of the external software libraries the app requires to run — both from the underlying operating system and from other containers. All of the containerized apps share a single, common operating system, but they are compartmentalized from one another and from the system at large.&lt;/p&gt;

&lt;p&gt;Taking Docker as an example, in the image below you can see that my host OS has a hostname of its own and its own set of running processes. When I run an Ubuntu container, we can see that it has its own hostname and its own set of processes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GD6Tml31--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1592144463096/VcoHpx67d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GD6Tml31--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1592144463096/VcoHpx67d.png" alt="8. docker.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means that our Ubuntu container is running in an isolated environment; the PID of 1 confirms this fact. Similarly, we can provide mounted storage to our container, or allocate it a particular number of processes or a certain amount of RAM to run with. But what exactly is all this? What exactly is process isolation? What is a containerized environment? What do metered resources mean? &lt;/p&gt;

&lt;p&gt;We will try to make sense of all this jargon by replicating the behavior of &lt;code&gt;docker run &amp;lt;image&amp;gt;&lt;/code&gt; as closely as possible. We will be using Go for this purpose. There is no specific reason behind choosing Go; you could pick almost any language — Rust, Python, Node, etc. The only requirement is that the language supports syscalls and namespaces. Picking Go is just a personal preference, and the fact that Docker itself is built in Go also helps my case.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building a container from scratch
&lt;/h1&gt;

&lt;p&gt;As mentioned earlier, we will try to replicate something as close to Docker as possible. Just like &lt;code&gt;docker run &amp;lt;image&amp;gt; cmd args&lt;/code&gt;, we will go for &lt;code&gt;go run main.go cmd args&lt;/code&gt;. To start with, we will proceed with the basic snippet that the Go plugins of most major editors have to offer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

func main() {

}

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now we will add support for execution of basic commands like echo and cat&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func must(err error) {
    // If error exists, panic and exit
    if err != nil {
        panic(err)
    }
}

func run() {
    fmt.Printf("Running %v\n", os.Args[2:])

    // Execute the commands that follow 'go run main.go run'
    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(cmd.Run())
}

func main() {
    // Make sure that the first argument after 'go run main.go' is 'run'
    switch os.Args[1] {
    case "run":
        run()
    default:
        panic("I'm sorry, what?")
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Let's see what that boils down to:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Lk6aEpo---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591528131542/JQBS7gSaK.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Lk6aEpo---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591528131542/JQBS7gSaK.png" alt="1. echo hello world.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we can run simple commands with our script, we will try running a bash shell. Since it can get confusing as we are already in a shell, we will try to run &lt;code&gt;ps&lt;/code&gt; before and after running our script.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hBTFEzNq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591529527051/8oKYyUhOl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hBTFEzNq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591529527051/8oKYyUhOl.png" alt="2. running bash.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is still difficult to say anything. To confirm whether we have isolation like an actual container, let us simply try changing the &lt;code&gt;hostname&lt;/code&gt; from within the &lt;code&gt;bash&lt;/code&gt; shell launched by our script. To modify the &lt;code&gt;hostname&lt;/code&gt;, we need to be root:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D661UBKa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591530062086/TXRWJ6Ork.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D661UBKa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591530062086/TXRWJ6Ork.png" alt="3. changing hostname-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just to summarize, we did the following in the specified order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check the processes running on our host OS, by running the command &lt;code&gt;ps&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Check hostname of our host OS by running the &lt;code&gt;hostname&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;Run our script to launch a &lt;code&gt;bash&lt;/code&gt; shell&lt;/li&gt;
&lt;li&gt;Check the processes running in our launched &lt;code&gt;bash&lt;/code&gt; shell using the &lt;code&gt;ps&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;Check the &lt;code&gt;hostname&lt;/code&gt; from within our launched &lt;code&gt;bash&lt;/code&gt; shell&lt;/li&gt;
&lt;li&gt;Try to modify the &lt;code&gt;hostname&lt;/code&gt; and set it to arbitrary string&lt;/li&gt;
&lt;li&gt;Verify if the &lt;code&gt;hostname&lt;/code&gt; was modified successfully within our launched &lt;code&gt;bash&lt;/code&gt; shell. It indeed did.&lt;/li&gt;
&lt;li&gt;Exit to return to our host OS shell&lt;/li&gt;
&lt;li&gt;Check &lt;code&gt;hostname&lt;/code&gt; in our host OS&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;hostname&lt;/code&gt; change made within the &lt;code&gt;bash&lt;/code&gt; shell launched by our script unfortunately persisted, changing the &lt;code&gt;hostname&lt;/code&gt; of our host OS as well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that we do not have isolation yet. To address this, we need the help of namespaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Namespaces
&lt;/h2&gt;

&lt;p&gt;Namespaces provide the isolation needed to run multiple containers on one machine while giving each what appears to be its own environment. There are six main namespaces. Each can be independently requested, and each amounts to giving a process (and its children) a view of a subset of the resources of the machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  PID
&lt;/h3&gt;

&lt;p&gt;The PID namespace gives a process and its children their own view of a subset of the processes in the system. It is analogous to a mapping table: when a process inside a PID namespace asks the kernel for a list of processes, the kernel looks in the mapping table. If a process exists in the table, the mapped ID is used instead of the real ID; if it doesn't, the kernel pretends it doesn't exist at all. The PID namespace makes the first process created within it PID 1 (by mapping whatever its host PID is to 1), giving the appearance of an isolated process tree in the container. This is a really interesting concept. &lt;/p&gt;
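&lt;p&gt;As a toy model of that mapping table (not the kernel's actual data structure, just an illustration of the lookup behavior):&lt;/p&gt;

```go
package main

import "fmt"

// pidNamespace is a toy model of the kernel's mapping table: host PIDs on
// the left, the PIDs a process inside the namespace sees on the right.
type pidNamespace struct{ mapping map[int]int }

// lookup mimics the kernel answering "what processes exist?" from inside
// the namespace: mapped PIDs are translated, unmapped ones are invisible.
func (ns pidNamespace) lookup(hostPID int) (int, bool) {
    mapped, ok := ns.mapping[hostPID]
    return mapped, ok
}

func main() {
    // The first process created in the namespace gets mapped to PID 1.
    ns := pidNamespace{mapping: map[int]int{4242: 1, 4243: 2}}

    if pid, ok := ns.lookup(4242); ok {
        fmt.Println("container sees PID", pid)
    }
    if _, ok := ns.lookup(1); !ok {
        fmt.Println("host PID 1 is invisible inside the namespace")
    }
}
```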

&lt;h3&gt;
  
  
  MNT
&lt;/h3&gt;

&lt;p&gt;In a way, this one is the most important. The mount namespace gives the processes contained within it their own mount table. This means they can mount and unmount directories without affecting other namespaces, including the host namespace. More importantly, in combination with the &lt;code&gt;pivot_root&lt;/code&gt; syscall, it allows a process to have its own filesystem. This is how we can have a process think it's running on Ubuntu, CentOS, Alpine, etc — by swapping out the filesystem that the container sees.&lt;/p&gt;

&lt;h3&gt;
  
  
  NET
&lt;/h3&gt;

&lt;p&gt;The network namespace gives the processes that use it their own network stack. In general only the main network namespace (the one that the processes that start when you start your computer use) will actually have any real physical network cards attached. But we can create virtual ethernet pairs — linked ethernet cards where one end can be placed in one network namespace and one in another creating a virtual link between the network namespaces. Kind of like having multiple IP stacks talking to each other on one host. With a bit of routing magic this allows each container to talk to the real world while isolating each to its own network stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  UTS
&lt;/h3&gt;

&lt;p&gt;The UTS namespace gives its processes their own view of the system’s hostname and domain name. After entering a UTS namespace, setting the hostname or the domain name will not affect other processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  IPC
&lt;/h3&gt;

&lt;p&gt;The IPC namespace isolates various inter-process communication mechanisms, such as message queues. This particular namespace deserves a blog post of its own; there's more to IPC than I can cover here, which is why I will encourage you to check out the &lt;a href="https://www.man7.org/linux/man-pages/man7/namespaces.7.html"&gt;namespace docs&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h3&gt;
  
  
  USER
&lt;/h3&gt;

&lt;p&gt;The user namespace was the most recently added and is likely the most powerful from a security perspective. It maps the UIDs (and GIDs) a process sees to a different set of UIDs and GIDs on the host. This is extremely useful: using a user namespace, we can map the container's root user ID (i.e. 0) to an arbitrary, unprivileged UID on the host. This means we can let a container think it has root access without actually giving it any privileges in the root namespace. The container is free to run processes as UID 0, which would normally be synonymous with having root permissions, but the kernel is actually mapping that UID under the covers to an unprivileged real UID belonging to the host OS.&lt;/p&gt;
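&lt;p&gt;Here is a toy model of that translation, shaped like a single line of &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/uid_map&lt;/code&gt;. The concrete ranges are assumptions for illustration, not anything the kernel mandates:&lt;/p&gt;

```go
package main

import "fmt"

// uidMapping models one line of a uid_map file: container UIDs in
// [insideStart, insideStart+length) map onto host UIDs in
// [outsideStart, outsideStart+length).
type uidMapping struct {
    insideStart, outsideStart, length int
}

// toHost translates a UID as seen inside the container to the real UID
// the kernel uses on the host; UIDs outside the range have no mapping.
func (m uidMapping) toHost(containerUID int) (int, bool) {
    if containerUID < m.insideStart || containerUID >= m.insideStart+m.length {
        return 0, false
    }
    return m.outsideStart + (containerUID - m.insideStart), true
}

func main() {
    // Map container UIDs 0..65535 onto unprivileged host UIDs 100000..165535.
    m := uidMapping{insideStart: 0, outsideStart: 100000, length: 65536}

    hostUID, _ := m.toHost(0)
    fmt.Println("container root runs as host UID", hostUID) // not host UID 0
}
```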

&lt;p&gt;Most container technologies place a user's process into all of the above namespaces and initialize them to provide a standard environment. This amounts to, for example, creating a virtual network interface in the isolated network namespace of the container, with connectivity to a real network on the host. In our case, to satisfy our immediate requirement, we will add the UTS namespace to our script so that we can modify the hostname.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func run() {
    // Stuff that we previously went over

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS,
    }

    must(cmd.Run())
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Running it, returns:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ni4QNkdi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591532363023/lFEM94IUc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ni4QNkdi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591532363023/lFEM94IUc.png" alt="3. changing hostname-2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Awesome! We can now modify the hostname in our container-like environment without changing the host environment. But, if we observe closely, our process IDs within the container are still the same, and we're able to see the processes running on our host OS even from within our container. To fix this, we need to use the PID namespace. As discussed above, the PID namespace will give us process isolation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func run() {
    // Stuff that we previously went over

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
    }

    must(cmd.Run())
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;However, unlike the case of the UTS namespace, simply adding the PID namespace like this won't help. We will have to create another copy of our process so that it can run as PID 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func run() {
    cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
    }

    must(cmd.Run())
}

func child() {
    fmt.Printf("Running %v as PID %d\n", os.Args[2:], os.Getpid())

    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(cmd.Run())
}

func main() {
    switch os.Args[1] {
    case "run":
        run()
    case "child":
        child()
    default:
        panic("I'm sorry, what?")
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What we're doing here is this: whenever we run &lt;code&gt;go run main.go run bash&lt;/code&gt;, our &lt;code&gt;main()&lt;/code&gt; function is called. Since the value of &lt;code&gt;os.Args[1]&lt;/code&gt; is 'run' at this point, it calls our &lt;code&gt;run()&lt;/code&gt; function. Within &lt;code&gt;run()&lt;/code&gt;, we use &lt;code&gt;/proc/self/exe&lt;/code&gt; to create a copy of our current process, invoking it again with the string 'child' prepended to the rest of the arguments that &lt;code&gt;run()&lt;/code&gt; received. When we do this, our &lt;code&gt;main()&lt;/code&gt; function is invoked again, the difference being that the value of &lt;code&gt;os.Args[1]&lt;/code&gt; is 'child' this time. From there on, the rest of the script executes as we saw before.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LT6MtEaE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591534948440/y2aXbM48j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LT6MtEaE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591534948440/y2aXbM48j.png" alt="4. ps-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, even after doing all this, the results we get are not that different. To understand why, we need to know what goes on behind the scenes when we run the &lt;code&gt;ps&lt;/code&gt; command. It turns out that &lt;code&gt;ps&lt;/code&gt; looks at the &lt;code&gt;/proc&lt;/code&gt; directory to find out which processes are currently running on the host. Let us observe the contents of the &lt;code&gt;/proc&lt;/code&gt; directory from our host and also from our container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7s5sMMRo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591802829525/9nyKi5-_d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7s5sMMRo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591802829525/9nyKi5-_d.png" alt="4. ls proc - 1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, the contents of the &lt;code&gt;/proc&lt;/code&gt; directory observed from the host and from the container are one and the same. To overcome this, we want the &lt;code&gt;ps&lt;/code&gt; inside our container to look at a &lt;code&gt;/proc&lt;/code&gt; directory of its own. In other words, we need to provide our container with its own filesystem. This brings us to an important concept of containers: layered filesystems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layered Filesystems
&lt;/h2&gt;

&lt;p&gt;Layered filesystems are how we can efficiently move whole machine images around. They're the reason the ship floats and does not sink. At a basic level, layered filesystems amount to optimizing the step of creating a copy of the root filesystem for each container. There are numerous ways of doing this. &lt;a href="https://btrfs.wiki.kernel.org/"&gt;Btrfs&lt;/a&gt; uses copy-on-write (COW) at the filesystem layer. &lt;a href="https://en.wikipedia.org/wiki/Aufs"&gt;Aufs&lt;/a&gt; uses "union mounts". Since there are so many ways to achieve this step, we will just use something horribly simple: we'll make a full copy of the filesystem. It's slow, but it works.&lt;/p&gt;

&lt;p&gt;To do this, I have a copy of the Lubuntu filesystem placed at the path specified below. As shown in the screenshot, I have &lt;code&gt;touch&lt;/code&gt;ed two marker files: HOST_FS in the root of the host, and CONTAINER_FS within the copy of our Lubuntu FS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--E0wipFJ_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591804027767/_6RRfBLsv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--E0wipFJ_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591804027767/_6RRfBLsv.png" alt="4. file system.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now have to point our container at this filesystem and ask it to change its root to this copied filesystem. We also have to ask the container to change its working directory to &lt;code&gt;/&lt;/code&gt; once it's launched.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func child() {
    // Stuff that we previously went over

    must(syscall.Chroot("/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"))
    must(syscall.Chdir("/"))
    must(cmd.Run())
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Running this we get our intended FS. We can confirm it as we can see CONTAINER_FS, the file that we created in our container:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pZKi3dir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591804634404/3ME3ipp1P.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pZKi3dir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591804634404/3ME3ipp1P.png" alt="4. ls proc - 2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, once again, in spite of all these efforts, &lt;code&gt;ps&lt;/code&gt; still remains a problem.&lt;/p&gt;

&lt;p&gt;This is because, while we provided a new filesystem for our container using &lt;code&gt;chroot&lt;/code&gt;, we forgot that &lt;code&gt;/proc&lt;/code&gt; is, in itself, a special type of virtual filesystem, sometimes referred to as a process information pseudo-filesystem. It doesn't contain 'real' files but runtime system information: system memory, mounted devices, hardware configuration, etc. For this reason it can be regarded as a control and information center for the kernel. In fact, quite a lot of system utilities are simply calls to files in this directory; for example, &lt;code&gt;lsmod&lt;/code&gt; is the same as &lt;code&gt;cat /proc/modules&lt;/code&gt;. By altering files located in this directory you can even read or change kernel parameters (&lt;code&gt;sysctl&lt;/code&gt; values) while the system is running.&lt;/p&gt;

&lt;p&gt;Hence, we need to mount &lt;code&gt;/proc&lt;/code&gt; for our &lt;code&gt;ps&lt;/code&gt; command to be able to work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func child() {
    // Stuff that we previously went over

    must(syscall.Chroot("/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"))
    must(syscall.Chdir("/"))
    // Parameters to this syscall.Mount() are:
    // source FS, target FS, type of the FS, flags and data to be written in the FS
    must(syscall.Mount("proc", "proc", "proc", 0, ""))

    must(cmd.Run())

    // Very important to unmount in the end before exiting
    must(syscall.Unmount("/proc", 0))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You can think of &lt;code&gt;syscall.Mount()&lt;/code&gt; and &lt;code&gt;syscall.Unmount()&lt;/code&gt; as the functions that are called when you plug in and safely remove a pen drive. By the same analogy, we &lt;code&gt;mount&lt;/code&gt; and &lt;code&gt;unmount&lt;/code&gt; our &lt;code&gt;/proc&lt;/code&gt; filesystem in our container.&lt;/p&gt;

&lt;p&gt;Now if we run &lt;code&gt;ps&lt;/code&gt; from our container:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8zLlXado--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591807744347/sNB8IALVQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8zLlXado--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591807744347/sNB8IALVQ.png" alt="4. ps-2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There! After all these efforts, we finally have PID 1! We have achieved process isolation. We can confirm that our &lt;code&gt;/proc&lt;/code&gt; filesystem has been mounted by running &lt;code&gt;ls /proc&lt;/code&gt;, which lists the current process information of our container. &lt;/p&gt;

&lt;p&gt;One small thing we still need to check is the mount points of &lt;code&gt;proc&lt;/code&gt;. First, we run &lt;code&gt;mount | grep proc&lt;/code&gt; from our host OS. We then launch our container and run the same command again. Finally, with the container still running, we run &lt;code&gt;mount | grep proc&lt;/code&gt; from the host once more to check the mount points of &lt;code&gt;proc&lt;/code&gt; while the container is up.&lt;/p&gt;
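&lt;p&gt;If you prefer to script this check rather than eyeball the output of &lt;code&gt;mount&lt;/code&gt;, here is a small Go sketch (a hypothetical helper of mine, not part of the container code) that filters a &lt;code&gt;/proc/mounts&lt;/code&gt;-style table down to &lt;code&gt;proc&lt;/code&gt; entries, roughly what &lt;code&gt;mount | grep proc&lt;/code&gt; does:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

// procMountLines filters a /proc/mounts-style table down to entries whose
// filesystem type (the third column) is "proc".
func procMountLines(mounts string) []string {
	var out []string
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		// Every well-formed /proc/mounts line has six columns:
		// device, mount point, fstype, options, dump, pass.
		if len(fields) != 6 {
			continue
		}
		if fields[2] == "proc" {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// On a Linux host, /proc/self/mounts lists this process's mount table.
	data, err := ioutil.ReadFile("/proc/self/mounts")
	if err != nil {
		panic(err)
	}
	for _, line := range procMountLines(string(data)) {
		fmt.Println(line)
	}
}
```

&lt;p&gt;Run from the host while the container is up, this surfaces the same leak we are about to observe.&lt;/p&gt;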

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5TyOEzma--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591812483241/K-EK6CS_f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5TyOEzma--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591812483241/K-EK6CS_f.png" alt="5. mount proc-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, if we run &lt;code&gt;mount | grep proc&lt;/code&gt; from our host OS with our container running, the host OS can see where &lt;code&gt;proc&lt;/code&gt; is mounted in our container. This should not be the case: ideally, the insides of our containers should be as invisible to the host OS as possible. To fix this, all we need to do is add the mount (MNT) namespace to our clone flags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func run() {
    // Stuff we previously went over

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        Unshareflags: syscall.CLONE_NEWNS,
    }

    must(cmd.Run())
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now if we observe the mount points from our host OS with the container running, we get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YsybEfiR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591812662401/iNjq-bnDN.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YsybEfiR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591812662401/iNjq-bnDN.png" alt="5. mount proc-2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There! With this, we can now say that we have a truly isolated environment. Just so that there is a better distinction between our host and our containerized environments, we can also assign our container some arbitrary hostname:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt; func child() {
    // Stuff that we previously went over

    must(syscall.Sethostname([]byte("container")))
    must(syscall.Chroot("/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"))
    must(syscall.Chdir("/"))
    must(syscall.Mount("proc", "proc", "proc", 0, ""))

    must(cmd.Run())

    must(syscall.Unmount("/proc", 0))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Running it gives:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3eAAbIbV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591885392043/rSya4T7qC.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3eAAbIbV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591885392043/rSya4T7qC.png" alt="6. hostname change.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives us a fully running, fully functioning container!&lt;/p&gt;

&lt;p&gt;There is, however, one more important concept which we haven't yet covered. While namespaces provide isolation, and layered filesystems provide a root filesystem for our container, we need cgroups to control how much of the host's resources our container is allowed to consume.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cgroups
&lt;/h2&gt;

&lt;p&gt;Cgroups, also known as control groups (and originally called process containers), are perhaps one of Google's most prominent contributions to the software world. Fundamentally, cgroups collect a set of process or task IDs together and apply limits to them. Where namespaces isolate a process, cgroups limit and account for the resources a group of processes may use.&lt;/p&gt;

&lt;p&gt;Just like &lt;code&gt;/proc&lt;/code&gt;, cgroups too are exposed by the kernel as a special filesystem that we can mount. We add a process or thread to a cgroup by simply writing its ID to the &lt;code&gt;tasks&lt;/code&gt; (or &lt;code&gt;cgroup.procs&lt;/code&gt;) file, and we read and configure various limits by editing files in that directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func cg() {
    // Location of the Cgroups filesystem
    cgroups := "/sys/fs/cgroup/"
    pids := filepath.Join(cgroups, "pids")

    // Creating a directory named 'pratikms' inside '/sys/fs/cgroup/pids'
    // We will use this directory to configure various parameters for resource sharing by our container
    err := os.Mkdir(filepath.Join(pids, "pratikms"), 0755)
    if err != nil &amp;amp;&amp;amp; !os.IsExist(err) {
        panic(err)
    }

    // Allow a maximum of 20 processes to be run in our container
    must(ioutil.WriteFile(filepath.Join(pids, "pratikms/pids.max"), []byte("20"), 0700))

    // Remove the new cgroup after container exits
    must(ioutil.WriteFile(filepath.Join(pids, "pratikms/notify_on_release"), []byte("1"), 0700))

    // Add our current PID to cgroup processes
    must(ioutil.WriteFile(filepath.Join(pids, "pratikms/cgroup.procs"), []byte(strconv.Itoa(os.Getpid())), 0700))
}

func child() {
    fmt.Printf("Running %v as PID %d\n", os.Args[2:], os.Getpid())

    // Invoke cgroups
    cg()

    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    // Stuff that we previously went over

    must(syscall.Unmount("/proc", 0))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;On running our container, we can see the directory 'pratikms' created inside &lt;code&gt;/sys/fs/cgroup/pids&lt;/code&gt; from our host. It contains all the files necessary to control resource limits for our container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--joeIsksY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591892753704/DXfJqy6x9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--joeIsksY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591892753704/DXfJqy6x9.png" alt="7. cgroups-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we &lt;code&gt;cat pids.max&lt;/code&gt; from our host, we can see that our container is limited to running a maximum of 20 processes at a time. If we &lt;code&gt;cat pids.current&lt;/code&gt;, we can see the number of processes currently running in our container. Now we need to test the resource limit that we applied to our container.&lt;/p&gt;
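&lt;p&gt;As an aside, this host-side check can be scripted too. Here is a small, hypothetical Go helper (not part of the container code above) for interpreting a &lt;code&gt;pids.max&lt;/code&gt; value; the kernel uses the literal string &lt;code&gt;max&lt;/code&gt; in that file to mean unlimited:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePidsMax interprets the contents of a cgroup pids.max file.
// The literal string "max" means the cgroup is unlimited; anything
// else is a plain decimal process count, e.g. the "20" we wrote above.
func parsePidsMax(raw string) (limit int, unlimited bool) {
	s := strings.TrimSpace(raw)
	if s == "max" {
		return 0, true
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		panic(err)
	}
	return n, false
}

func main() {
	// In practice you would read this from
	// /sys/fs/cgroup/pids/pratikms/pids.max on the host.
	limit, unlimited := parsePidsMax("20\n")
	fmt.Println(limit, unlimited) // prints "20 false"
}
```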

&lt;h2&gt;
  
  
  :() { : | : &amp;amp; }; :
&lt;/h2&gt;

&lt;p&gt;No, this is not a typo. Neither did you read it wrong. It's a fork bomb. A fork bomb is a denial-of-service attack wherein a process continuously replicates itself to deplete available system resources, slowing down or crashing the system through resource starvation. To make more sense of it, you can replace the &lt;code&gt;:&lt;/code&gt; in it with any name. For example, &lt;code&gt;:() { : | : &amp;amp; }; :&lt;/code&gt; can also be written as &lt;code&gt;forkBomb() { forkBomb | forkBomb &amp;amp; }; forkBomb&lt;/code&gt;. This declares a function &lt;code&gt;forkBomb()&lt;/code&gt; whose body recursively calls itself with &lt;code&gt;forkBomb | forkBomb&lt;/code&gt; and runs the call in the background using &lt;code&gt;&amp;amp;&lt;/code&gt;; finally, we invoke it with &lt;code&gt;forkBomb&lt;/code&gt;. While this works, a fork bomb is conventionally written as &lt;code&gt;:() { : | : &amp;amp; }; :&lt;/code&gt;, and that is what we will proceed with:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LsebyS4G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591892775962/7nfc4cbwY.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LsebyS4G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591892775962/7nfc4cbwY.png" alt="7. cgroups-2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, the number of processes running within our container was 6. After we triggered the fork bomb, the number of running processes increased to 20 and remained stable there. We can confirm the forks by observing the output of &lt;code&gt;ps fax&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YPM4rRj---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591892789018/iWNfP3xKU.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YPM4rRj---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn.hashnode.com/res/hashnode/image/upload/v1591892789018/iWNfP3xKU.png" alt="7. cgroups-3.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Putting it all together
&lt;/h1&gt;

&lt;p&gt;So here it is, a super super simple container, in less than 100 lines of code. Obviously this is intentionally simple. If you use it in production, you are crazy and, more importantly, on your own. But I think seeing something simple and hacky gives us a really useful picture of what’s going on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "os/exec"
    "path/filepath"
    "strconv"
    "syscall"
)

func must(err error) {
    if err != nil {
        panic(err)
    }
}

func cg() {
    cgroups := "/sys/fs/cgroup/"
    pids := filepath.Join(cgroups, "pids")
    err := os.Mkdir(filepath.Join(pids, "pratikms"), 0755)
    if err != nil &amp;amp;&amp;amp; !os.IsExist(err) {
        panic(err)
    }
    must(ioutil.WriteFile(filepath.Join(pids, "pratikms/pids.max"), []byte("20"), 0700))
    // Remove the new cgroup after container exits
    must(ioutil.WriteFile(filepath.Join(pids, "pratikms/notify_on_release"), []byte("1"), 0700))
    must(ioutil.WriteFile(filepath.Join(pids, "pratikms/cgroup.procs"), []byte(strconv.Itoa(os.Getpid())), 0700))
}

func child() {
    fmt.Printf("Running %v as PID %d\n", os.Args[2:], os.Getpid())

    cg()

    cmd := exec.Command(os.Args[2], os.Args[3:]...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    must(syscall.Sethostname([]byte("container")))
    must(syscall.Chroot("/home/lubuntu/Projects/make-sense-of-containers/lubuntu-fs"))
    must(syscall.Chdir("/"))
    must(syscall.Mount("proc", "proc", "proc", 0, ""))

    must(cmd.Run())

    must(syscall.Unmount("/proc", 0))
}

func run() {
    cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    cmd.SysProcAttr = &amp;amp;syscall.SysProcAttr{
        Cloneflags:   syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        Unshareflags: syscall.CLONE_NEWNS,
    }

    must(cmd.Run())
}

func main() {
    switch os.Args[1] {
    case "run":
        run()
    case "child":
        child()
    default:
        panic("I'm sorry, what?")
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Again, as stated before, this is in no way production-ready code. It has some hard-coded values in it, for example the path to the root filesystem and the hostname of the container. If you wish to play around with the code, you can get it from my &lt;a href="https://github.com/pratikms/making-sense-of-containers"&gt;GitHub repo&lt;/a&gt;. At the same time, I do believe this is a wonderful exercise for understanding what goes on behind the scenes when we run that &lt;code&gt;docker run &amp;lt;image&amp;gt;&lt;/code&gt; command in our terminal. It introduces us to some of the important OS concepts that containers leverage: namespaces, layered filesystems, cgroups, and so on. Containers are important, and their prevalence in the job market is incredible. With Cloud, Docker and Kubernetes becoming more linked every day, that demand will only grow. Going forward, it is imperative to understand the inner workings of a container, and this was my small attempt at helping with that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This post first appeared at &lt;a href="https://blog.pratikms.com/demystifying-containers-ckbjqjqa202640ns1vo369ufu"&gt;https://blog.pratikms.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>docker</category>
      <category>containers</category>
      <category>kubernetes</category>
      <category>go</category>
    </item>
    <item>
      <title>You Don't Know Deno?</title>
      <dc:creator>Pratik Shivaraikar</dc:creator>
      <pubDate>Fri, 22 May 2020 16:05:14 +0000</pubDate>
      <link>https://dev.to/pratikms/you-don-t-know-deno-5b57</link>
      <guid>https://dev.to/pratikms/you-don-t-know-deno-5b57</guid>
      <description>&lt;p&gt;When  &lt;a href="https://en.wikipedia.org/wiki/Brendan_Eich"&gt;Brendan Eich&lt;/a&gt;, during his time at  &lt;a href="https://en.wikipedia.org/wiki/Netscape"&gt;Netscape&lt;/a&gt; created JavaScript in 1995, I doubt that he seldom had any idea of what the language will grow out to be in the coming future. When Netscape partnered with Sun to take on their competitor Microsoft, Brendan Eich decided to surf the tidal wave of hype surrounding Java. He found this reason compelling enough to rename Mocha - the language that he created to turn the web into a full-blown application platform - to JavaScript. He envisioned JavaScript to be marketed as a companion language to Java, in the same was as Visual Basic was to C++. So the name was a straightforward marketing ploy to gain acceptance.&lt;/p&gt;

&lt;p&gt;By the 2000s, when  &lt;a href="https://en.wikipedia.org/wiki/Douglas_Crockford"&gt;Douglas Crockford&lt;/a&gt; invented the JSON data format using a subset of JavaScript syntax, a critical mass of developers emerged who started viewing JavaScript as a serious language. However, some early design choices, like automatic semicolon insertion (ASI), the event loop, the lack of classes, unusual prototypal inheritance, and type coercion, gave developers tools to laugh at the language and to ridicule those who were using it. This cycle still continues.&lt;/p&gt;

&lt;p&gt;It was only a few years later, with "Web 2.0" applications such as Flickr and Gmail, that the world realized what a modern experience on the web could be like. Thanks also to a healthy, still-ongoing competition between browsers to offer users a better experience and better performance, JavaScript engines started becoming considerably better. Development teams behind major browsers worked hard to offer better support for JavaScript and find ways to make it run faster. This triggered significant improvements in a particular JavaScript engine called V8 (also known as Chrome V8, for being the open-source JavaScript engine of The Chromium Project).&lt;/p&gt;

&lt;p&gt;It was in 2009 that  &lt;a href="https://en.wikipedia.org/wiki/Ryan_Dahl"&gt;Ryan Dahl&lt;/a&gt; paid special attention to this V8 engine to create  &lt;a href="https://nodejs.org/en/"&gt;Node.js&lt;/a&gt;. His initial focus was heavily on building event-driven HTTP servers, whose main aim is resolving the  &lt;a href="https://en.wikipedia.org/wiki/C10k_problem"&gt;C10k problem&lt;/a&gt;. Simply put, the event-driven architecture provides relatively better performance while consuming fewer resources at the same time. It achieves this by avoiding spawning additional threads, and with them the overhead of thread context-switching; instead, it uses a single process to handle every event on a callback. This attempt of Ryan Dahl's turned out to be crucial for the popularity that server-side JavaScript enjoys today.&lt;/p&gt;

&lt;p&gt;Node.js, since then, has proved to be a very successful software platform. People have found it useful for building web development tooling, building standalone web servers, and for a myriad of other use-cases. Node, however, was designed in 2009, when JavaScript was a much different language. Out of necessity, Node had to invent concepts which were later taken up by the standards organizations and added to the language differently. Having said that, there have also been a few design decisions that Node suffers from. These design mistakes compelled Ryan to step down from the Node.js project. He has since been working on another runtime which aims at solving these issues:  &lt;a href="https://deno.land/"&gt;Deno&lt;/a&gt;. In this blog post, we will look at the two major runtimes that enable server-side JavaScript: Node.js and Deno. We will have a look at the problems with Node, and how Deno aims at resolving them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design mistakes in Node
&lt;/h2&gt;

&lt;p&gt;A lot of the discussion that is about to follow is inspired by a &lt;a href="https://www.youtube.com/watch?v=M3BM9TB-8yA"&gt;talk&lt;/a&gt; that Ryan Dahl delivered at JSConf, in which he discusses the problems that Node has. This doesn't mean that all Node projects should be abandoned this very instant. It is important to note that Node is not going anywhere and that it is here to stay. Node's inherent problems exist because the JavaScript available at the time of its design was not nearly as rich as today's, and because features and functionalities later added on top of Node turned it into a huge monolith, thereby making things hard to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Event-emitters
&lt;/h3&gt;

&lt;p&gt;At the time Node was designed, JavaScript did not have the concept of Promises or async / await. Node's early promises promised to do some work and then had separate callbacks that would be executed for success and failure, as well as for handling timeouts; another way to think of them was as emitters that could emit only two events, success and error. Node's lasting counterpart to promises became the EventEmitter, which important APIs are based around, namely sockets and HTTP. Async / await was later introduced as, more or less, syntactic sugar over Promises. When implemented the right way, Promises are a great boon for the event-driven architecture.&lt;/p&gt;

&lt;p&gt;Node's EventEmitter-based implementation, though, has a problem: a lack of 'back-pressure'. Take a TCP socket, for example. The socket emits "data" events when it receives incoming packets. These "data" callbacks are emitted in an unconstrained manner, flooding the process with events. Because Node keeps accepting new data events, the underlying TCP socket exerts no proper back-pressure, so the remote sender has no idea the server is overloaded and continues to send data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;The V8 engine, by itself, is a very good security sandbox. However, Node failed to capitalize on this. In its earlier days, there was no way of telling what a package could do with the underlying file system unless someone really looked into its code. Trust came purely from community usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build system
&lt;/h3&gt;

&lt;p&gt;Build systems are very difficult and very important at the same time. Node uses  &lt;a href="https://gyp.gsrc.io/"&gt;GYP&lt;/a&gt;  as its build system. GYP is intended to support large projects that need to be built on multiple platforms (e.g., Mac, Windows, Linux), where it is important that the project can be built using the IDEs popular on each platform, as if the project were a "native" one. If a Node module links to a C library, GYP is used to compile that C library and link it to Node. GYP was what Chrome used at the time Node was designed. Chrome eventually, for various reasons, abandoned GYP for  &lt;a href="https://chromium.googlesource.com/chromium/src/tools/gn/+/48062805e19b4697c5fbd926dc649c78b6aaa138/README.md"&gt;GN&lt;/a&gt;, leaving Node as the sole GYP user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Node modules
&lt;/h3&gt;

&lt;p&gt;When &lt;a href="https://nodejs.org/en/blog/npm/npm-1-0-released/"&gt;npm version 1 was released&lt;/a&gt;  by  &lt;a href="https://www.linkedin.com/in/isaacschlueter"&gt;Isaac Schlueter&lt;/a&gt;, it soon became the de facto standard. It solved problems like ' &lt;a href="https://www.reddit.com/r/ProgrammerHumor/comments/75txp4/nodejs_dependency_hell_visualized_for_the_first/"&gt;dependency hell&lt;/a&gt; ': before npm, trying to install two versions of a package within the same folder would break the app. Thanks to npm, dependencies are stored within the node_modules folder. But an unintended side-effect is that every project now has a 'node_modules' directory in it, which drives up disk-space consumption. It also adds overhead to the  &lt;a href="https://www.typescriptlang.org/docs/handbook/module-resolution.html"&gt;Module Resolution Algorithm&lt;/a&gt;: Node first looks in the local folders, then in the project's node_modules, and failing that, in the global node_modules. More complexity is added by the fact that module specifiers carry no file extensions, so the module loader has to query the file system at multiple locations, guessing what the user intended.&lt;/p&gt;

&lt;p&gt;Having said all this, it is important to mention that there are no inherent breaking faults in Node. Node.js is a time-tested and proven runtime. It recently  &lt;a href="https://nodejs.dev/a-brief-history-of-nodejs"&gt;completed ten years&lt;/a&gt; of existence. The awesome community has been instrumental in the humongous success that Node enjoys today, and npm is one of the biggest package repositories ever. But as a developer who cannot unsee the bugs that he himself introduced into the system, Ryan couldn't help but move on to a different endeavor. The above reasons motivated him to work on  &lt;a href="https://deno.land/"&gt;Deno&lt;/a&gt;: a secure runtime for JavaScript and TypeScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deno
&lt;/h2&gt;

&lt;p&gt;The name Deno is actually an anagram of Node. It is best described by its own website:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Deno is a simple, modern and secure runtime for JavaScript and TypeScript that uses V8 and is built in Rust.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are a lot of things to pay attention to in this simple description. Let's go over them one-by-one:&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;Security is one of the biggest USPs of Deno. Deno aims to mimic the browser: just as in any browser, the JavaScript running in it has no access to the underlying file system, etc., by default. Deno, in the same way, provides a secure sandbox for JavaScript to run in. By default, JavaScript running within the runtime has no permissions; the user has to explicitly grant each individual permission their app requires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Module system
&lt;/h3&gt;

&lt;p&gt;At the moment, there is no package.json in Deno, nor is there any intention to introduce anything like it anytime soon. Imports will always be via relative or absolute URLs only. At the time of this writing, Deno does not support npm packages. During the early stages of its design, it was made clear that there were no plans to support Node modules due to the complexities involved. There have been  &lt;a href="https://github.com/denoland/deno/issues/2644"&gt;some discussions&lt;/a&gt;  about this, but they have not arrived at any conclusion yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  TypeScript Support
&lt;/h3&gt;

&lt;p&gt;Deno's standard modules are all written in TypeScript, and the TypeScript compiler is compiled directly into Deno. Initially, this caused the startup time to be almost a minute, but the problem was quickly addressed thanks to  &lt;a href="https://v8.dev/blog/custom-startup-snapshots"&gt;V8 snapshots&lt;/a&gt;, which greatly brought down startup times and let the TypeScript compiler start scripts very quickly. TypeScript is treated as a first-class language: users can directly import TypeScript code (with the .ts extension) and have it run immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rust
&lt;/h3&gt;

&lt;p&gt;In its early days, Deno was prototyped in Go. Now, however, for various reasons, Deno has been converted into a solid Rust project. Unlike Node, Deno is not a huge monolith, but rather a collection of Rust  &lt;a href="https://crates.io/"&gt;crates&lt;/a&gt;. This was done to facilitate opt-in functionality: users who do not want the entire Deno executable packaged as one can instead pick a selection of its modules and build their own executables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;p&gt;It should be noted that Deno is not a fork of Node. While Node is over a decade old, Deno has been in development for only the past two years. At the time of this writing,  &lt;a href="https://github.com/denoland/deno/releases"&gt;Deno v1.0.0 was released only a few days ago, on the 13th of May, 2020&lt;/a&gt;. Deno may not be suitable for many use-cases today, as it still has some limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;at this moment, Deno is not compatible with Node (npm) packages&lt;/li&gt;
&lt;li&gt;accessing native functionality beyond what Deno provides is difficult; its plugins / extensions system is still very nascent&lt;/li&gt;
&lt;li&gt;the TypeScript compiler may prove to be a bottleneck in some cases; plans are in place to port TSC to Rust&lt;/li&gt;
&lt;li&gt;HTTP server performance is nearly on par with that of Node (25k requests served by Deno vs 34k served by Node for a hello-world application)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The history of JavaScript has been long and full of bumps. Today, it is one of the most popular and fastest-growing languages, and the community is as active as ever. Node.js, V8 and other projects have taken JavaScript to places it was never designed for. With Deno, another important chapter is being written in the history of JavaScript. As of now, in my view, Deno cannot be looked at as a replacement for Node. It can definitely be considered an alternative, but even for that we may have to wait for some future releases of Deno to mature. Having said that, this is a great time to be alive as a JavaScript developer. With the ecosystem thriving, a JavaScript developer today can work at any layer of the system, be it front-end, back-end, database, etc. With the release of Deno, we can easily bet on runtimes enabling JavaScript to run on servers for many years to come.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This blog first appeared on: &lt;a href="https://blog.pratikms.com/you-dont-know-deno-ckae99d0i003ljus1i0xfbkql"&gt;https://blog.pratikms.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>javascript</category>
      <category>typescript</category>
      <category>node</category>
      <category>deno</category>
    </item>
  </channel>
</rss>
