ButterflyI8
The Secret Behind Supporting Massive Traffic with WAF: SafeLine's Utilization of Nginx

Web Application Firewall (WAF), as a product designed for securing HTTP/HTTPS traffic, essentially conducts malicious signature recognition, blocks attack traffic, and allows filtered, secure traffic to proceed to upstream business servers. WAF typically supports various deployment modes, among which the reverse proxy mode remains a popular choice.


In this mode, WAF receives HTTP requests from the Internet, inspects the traffic, and then forwards the requests to the upstream business servers. At this point, WAF presents itself as a Web server to clients, shielding them from the true existence of backend servers. Given that all HTTP traffic is first routed through WAF for processing, it imposes significant demands on WAF's HTTP traffic handling performance.

Both the reverse proxy and transparent proxy modes employed by SafeLine (a WAF solution) leverage Nginx for HTTP traffic proxying and forwarding. This article will provide a brief introduction to Nginx and SafeLine's utilization of Nginx.

What is Nginx?

As described on the official Nginx website:

nginx [engine x] is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server.

Nginx offers functionalities such as HTTP server/HTTP reverse proxy, mail proxy, and generic TCP/UDP proxy. Initially developed by Igor Sysoev, Nginx has since been acquired by F5 Networks, which has also introduced a commercial version of Nginx. Despite the existence of the commercial version, the open-source version of Nginx remains active and vibrant. According to the latest report by Netcraft, Nginx continues to hold the largest market share in the web server space.

As an outstanding and high-performance web server, Nginx boasts numerous remarkable design features in its code architecture. In the following, we will briefly introduce these designs.

Secrets Behind Nginx's High Performance

1. Separation of Master and Worker Processes

Nginx employs a multi-process model, specifically "single master + multiple workers". The specific roles of the master and worker processes are outlined below:

The master process does not handle network events or execute business logic. Its primary responsibility is managing the worker processes: restarting the service, performing seamless upgrades, rotating log files, and applying configuration changes on the fly. The master process controls the worker processes through signals.

The worker processes are where the actual business requests are processed. Each worker process runs in a single-threaded environment, and the number of worker processes can be customized. In production environments, it is common to configure the number of worker processes equal to the number of CPU cores and bind them to specific cores, maximizing the utilization of multi-core CPUs and minimizing the overhead of process switching.
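In nginx.conf, this sizing and pinning is commonly expressed with two stock directives (`worker_cpu_affinity auto` requires Nginx 1.9.10+ and is supported on Linux and FreeBSD):

```nginx
worker_processes auto;       # spawn one worker per CPU core
worker_cpu_affinity auto;    # bind each worker to its own core
```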

In the networking domain, there are often concepts of "control plane" and "data plane." Based on the above description of Nginx's process model, the master process serves as Nginx's "control plane," focusing solely on managing the workers without handling specific business requests. When a worker process exits abnormally, the master process promptly creates a new worker process, minimizing the impact on service availability.

On the other hand, the worker processes act as Nginx's "data plane," responsible for processing actual business requests. With each worker process running in a single-threaded mode, Nginx avoids the overhead of intra-process multi-threading synchronization. Additionally, a single HTTP request is processed by one worker process throughout its lifecycle, eliminating the need for communication and data synchronization between different worker processes.

It is evident that Nginx's "master/worker" process model forms the cornerstone of its high reliability and performance.

2. Event-Driven Architecture

A high-performance web server typically needs to handle various events such as network, disk, and timer events, with network I/O events being the most crucial among them. In traditional network programming, some servers employ a "one thread per connection" or "one process per connection" model for network I/O, where the processes or threads act as consumers of network events. In contrast, Nginx adopts a fully event-driven architecture for handling its operations, characterized by the following features:

  1. Event Collection and Dispatch: Nginx's event-driven framework is responsible for gathering and distributing events such as network and timer events.

  2. Event Registration: Specific business modules, as event consumers, need to register their interest in particular types of events beforehand.

  3. Event Handling: When an event occurs (e.g., a readable/writable event on a network connection), the event-driven framework invokes the corresponding event consumer, allowing the business module to execute.

  4. Optimal Multiplexing Interface: Nginx's event-driven architecture selects the best multiplexing interface available on the operating system. For instance, on Linux, it defaults to using epoll.

  5. Performance Boost and Non-Blocking Requirements: This event-driven approach significantly enhances network performance and throughput. However, it imposes stringent requirements on business modules, which must execute without blocking, as blocking behavior can delay the response to other events, ultimately reducing the overall throughput of the worker process.

3. Excellent Modular Design

In Nginx, almost everything is a module. Nginx defines a highly abstract interface for "modules," which all modules adhere to through the "ngx_module_t" interface design specification. Additionally, all modules are organized in a hierarchical and categorical manner. The following provides a brief introduction to Nginx's modular design:

  • The Nginx framework directly defines "core modules" and "configuration modules." The "configuration modules" implement the basic parsing functionality for Nginx's configuration items, while the "core modules" serve as the foundation of the Nginx framework. The Nginx framework code interacts directly only with the "core modules," and other modules do not communicate directly with the framework.

  • The "ngx_http_module" is a "core module" in Nginx that realizes HTTP-related functionalities. It defines a new module type, namely, the HTTP module type. All "HTTP modules" must follow the "ngx_http_module_t" interface specification.

  • Among all "HTTP modules," the "ngx_http_core_module" stands out as a unique module responsible for implementing the most fundamental logic of HTTP functionalities.

Nginx's excellent modular design ensures that even with its complexity, it maintains a clear code structure. It also allows developers to extend Nginx by developing their own modules based on the module interfaces, providing flexibility and extensibility.

4. Other Outstanding Designs

As a paradigm of high-performance servers, Nginx boasts numerous remarkable designs. Here are a few briefly listed:

Nginx is developed in C. Since the C standard library does not provide general-purpose container data structures, Nginx implements its own, such as double-ended queues, red-black trees, and hash tables.

Nginx implements a memory pool to manage memory resources efficiently, while also reducing the burden of manual memory management on developers.

Nginx provides the "ngx_buf_t" data structure for efficient buffer operations.

Nginx offers various load-balancing algorithms to distribute requests among a group of upstream servers, including round-robin, hashing, and more.
...

SafeLine T1K Module

Since Nginx is itself an outstanding reverse proxy server, SafeLine uses it as the WAF's traffic forwarding engine, providing the reverse proxy function for HTTP traffic. However, a WAF's core functionality extends beyond mere traffic forwarding; it primarily involves traffic inspection. SafeLine has developed a proprietary HTTP module within Nginx, named the t1k module, which sends received traffic to SafeLine's inspection engine. Based on the engine's response, the t1k module decides whether to return a 403 directly to block the request or to continue forwarding it.


Implementing the aforementioned functionality within Nginx is not complicated. The primary working principle of SafeLine's t1k module is as follows:

  • During the access phase of Nginx's HTTP request processing flow, the module performs request inspection: it generates a subrequest, which is redirected to a special internal location, typically named "@safeline".

  • The handler for this special location, when processing the subrequest, uses Nginx's upstream mechanism to establish a connection with SafeLine's inspection engine and sends it the content to be inspected.

  • The results of the inspection returned by the engine are parsed, and the subrequest processing concludes.

  • The original request resumes its execution flow, and t1k decides based on the inspection results whether to return a 403 error page to block the request or to allow the request to proceed to the next phase for processing.

  • Response inspection is achieved through Nginx's HTTP body filter mechanism.
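The overall shape of this flow resembles what Nginx's stock auth_request module does: an internal subrequest is proxied to a side service whose verdict gates the main request. A hypothetical configuration sketch follows; the directive `inspect_request` and the addresses are invented for illustration and are not the t1k module's real syntax:

```nginx
http {
    upstream detector {
        server 127.0.0.1:8000;       # hypothetical inspection engine address
    }
    upstream backend {
        server 192.168.1.10:8080;    # hypothetical business server
    }

    server {
        listen 80;

        location / {
            inspect_request @safeline;   # invented directive: run the
                                         # inspection subrequest first
            proxy_pass http://backend;   # forward only if the verdict allows
        }

        location @safeline {
            proxy_pass http://detector;  # subrequest goes to the engine
        }
    }
}
```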

Moreover, thanks to Nginx's high degree of customizability and dynamic module loading capabilities, the t1k module can also be embedded into existing Nginx/OpenResty servers (including API gateway products such as Kong and APISIX, which are based on OpenResty), which is our "embedded deployment mode."

SafeLine Website
