WebClient timeout and connection pool strategy

1. WebClient timeout

Let's look at the code below. There are many timeout options.



import java.time.Duration;

import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import io.netty.handler.timeout.WriteTimeoutHandler;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;

new ReactorClientHttpConnector(
    reactorResourceFactory,
    httpClient -> httpClient
        // maximum time to establish the TCP connection
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
        .doOnConnected(connection -> connection
            .addHandlerLast(new ReadTimeoutHandler(5))    // seconds between read operations
            .addHandlerLast(new WriteTimeoutHandler(5)))  // seconds between write operations
        .responseTimeout(Duration.ofSeconds(5)) // added since version 0.9.11
);



ChannelOption.CONNECT_TIMEOUT_MILLIS is the maximum time to wait for establishing a connection with the server, and it applies at the HttpClient level.

responseTimeout is a timeout purely for HTTP request/response time.

Using ReadTimeoutHandler/WriteTimeoutHandler as a substitute for responseTimeout is not appropriate.
cf) responseTimeout was added starting from version 0.9.11, so alternative timeouts had to be utilized prior to that.

In particular, since these handlers apply at the TCP level, they also take effect during the TLS handshake. Because of the encryption operations involved, the handshake can take longer than a typical HTTP response, so the timeout would have to be set higher than one intended for pure HTTP responses.

Furthermore, ReadTimeoutHandler/WriteTimeoutHandler operate even when no HTTP request is in flight. For example, they can close a connection sitting in the connection pool, even though another request might have reused it shortly afterwards.

In summary, ReadTimeoutHandler is not directly related to HTTP. It should be understood as a standard Netty handler that checks the time between different read operations.

The timeout() method of Reactive Streams is also insufficient as a responseTimeout. It covers not only the time the client takes to receive a response, but also obtaining a connection from the connection pool and creating new connections within the reactive stream (including the TLS handshake).

Therefore, when timeout() is used this way, it must always be set greater than the connection timeout plus the time to obtain a connection from the connection pool.

By contrast, responseTimeout limits only the pure HTTP request/response time; it does not include the time spent closing idle connections or establishing new ones.
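
To illustrate the difference, here is a minimal sketch. The webClient variable is assumed to be built with the connector above, and /orders is a hypothetical endpoint: responseTimeout bounds only the request/response exchange, while the Reactive Streams timeout() wraps the whole pipeline and is therefore set comfortably larger.

Mono<String> body = webClient.get()
    .uri("/orders") // hypothetical endpoint
    .retrieve()
    .bodyToMono(String.class)
    // overall guard: also covers connection acquisition and the TLS handshake,
    // so it must exceed connect timeout + pool acquire time + responseTimeout
    .timeout(Duration.ofSeconds(15));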

Reference: https://github.com/reactor/reactor-netty/issues/1159

2. WebClient connection pool



ConnectionProvider provider = ConnectionProvider.builder("ybs-pool")
    .maxConnections(500)
    .pendingAcquireTimeout(Duration.ofMillis(0))
    .pendingAcquireMaxCount(-1)
    .maxIdleTime(Duration.ofMillis(8000L))
    .maxLifeTime(Duration.ofMillis(8000L))
    .build();



maxLifeTime : "The maximum lifetime of a connection that can exist in the connection pool
maxIdleTime : The duration for which idle connections are maintained in the connection pool
pendingAcquireMaxCount : The maximum time to wait for obtaining a connection from the connection pool (inserting -1 means no timeout)
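
As a rough sketch of how such a provider is typically wired into a WebClient (the base URL and variable names are assumptions; the 5-second responseTimeout mirrors the snippet in section 1):

HttpClient httpClient = HttpClient.create(provider)
    .responseTimeout(Duration.ofSeconds(5));

WebClient webClient = WebClient.builder()
    .clientConnector(new ReactorClientHttpConnector(httpClient))
    .baseUrl("http://example.com") // hypothetical target
    .build();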

Caution: depending on the target Tomcat server's connectionTimeout and keepAliveTimeout settings, the server may close the connection on its side.

Therefore, it's advisable to set the maxIdleTime on the client side with the Tomcat keepAliveTimeout in mind.

In other words, it is better for the maxIdleTime to be shorter than the keepAliveTimeout. Otherwise, Reactor Netty can receive a close event at any time 'between obtaining a connection from the pool and sending the actual request'.

To address this issue, a LIFO (Last In, First Out) strategy, which always uses the most recently used connection, can be employed.
Reference: https://github.com/reactor/reactor-netty/issues/1092#issuecomment-648651826

Many issues arise from connections being closed because of timeouts, which is why the pool's leasing strategy was made configurable. FIFO is the default, and LIFO was added starting from version 0.9.5.

cf) FIFO stands for First In, First Out, a common example being a queue. LIFO stands for Last In, First Out, with a stack being an example.
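
As a minimal sketch, assuming reactor-netty 0.9.5 or later, the leasing strategy can be switched to LIFO on the ConnectionProvider builder:

ConnectionProvider provider = ConnectionProvider.builder("ybs-pool")
    .maxConnections(500)
    .maxIdleTime(Duration.ofMillis(8000L))
    .maxLifeTime(Duration.ofMillis(8000L))
    .lifo() // acquire the most recently released connection first (FIFO is the default)
    .build();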

LIFO + max idle timeout operates as follows.

  • When a connection is acquired, the most recently used one is taken.
  • If it has reached the max idle timeout, this connection is closed, and since it was the most recently used one, all the other remaining (not active) connections in the pool will be closed as well. A new connection is created and used for the request.
  • If the connection was closed by the remote peer (between acquisition and actual use), we receive a 'connection reset by peer' and the request is retried. Since this connection was the most recently used one and was closed by the remote peer, all the other remaining (not active) connections in the pool will be closed as well, so a new connection is created and used for the second attempt.

[Figure: LIFO connection pool behavior with max idle timeout]

FIFO always obtains the oldest connection. Therefore, just because the oldest connection has reached the max idle time does not mean the next connection will also hit the max idle time.

3. reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response

I'll introduce a case where a PrematureCloseException occurred.

The structure involved three servers: A -> B -> C

  • A Server (making polling requests every 3 seconds)
  • B Server (making requests to C Server using WebClient)
  • C Server (configured with an Nginx keepalive timeout of 3 seconds)

In this setup, if the request from B to C takes slightly longer than 3 seconds, C Server times out and terminates the connection (sending an RST), resulting in a PrematureCloseException error.

This issue was resolved by changing the keepAlive setting of reactor.netty.http.client.HttpClient to false.
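
For reference, a minimal sketch of that workaround (the variable names are mine); disabling keep-alive means every request pays for a new connection:

HttpClient httpClient = HttpClient.create()
    .keepAlive(false); // do not reuse connections; each request opens a fresh one

WebClient webClient = WebClient.builder()
    .clientConnector(new ReactorClientHttpConnector(httpClient))
    .build();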

The PrematureCloseException occurs due to the TCP RST flag. Here's more detail about the TCP RST flag:

RST : The Reset flag indicates that the connection should be reset, and must be sent if a segment is received which is apparently not for the current connection. On receipt of a segment with the RST bit set, the receiving station will immediately abort the connection. Where a connection is aborted, all data in transit is considered lost, and all buffers allocated to that connection are released.

There are various scenarios where a TCP RST segment can occur, meaning the causes of a PrematureCloseException can be diverse.

Scenario 1: Sending Duplicate SYNs

[Figure: sequence diagram for Scenario 1 (duplicate SYN)]

  • TCP A sends a SYN with a sequence number of 200, but the transmission of the segment is delayed.
  • TCP B receives the delayed SYN segment with sequence number 90 (line 3).
  • Since it is the first SYN TCP B has received, it sends an ACK, indicating that it will use an initial sequence number of 500. TCP A, however, detects that the ACK field sent by TCP B is incorrect, believes its own SYN segment never reached the destination, and sends a Reset to refuse the segment (line 5).
  • To make the reset segment acceptable, TCP A uses a sequence number of 91 in it (line 5).
  • Upon receiving the segment with the Reset flag set, TCP B re-enters the LISTEN state (line 5).
  • TCP B then responds with an ACK in the normal manner using its new sequence number (line 7).
  • TCP A confirms that the connection has been established with the ISN (Initial Sequence Number) in line 8.

Scenario 2: When one side crashes and is restarted (error recovery mechanism)

[Figure: sequence diagram for Scenario 2 (crash and restart)]

  • TCP A and TCP B are operating normally when TCP B sends a segment, but then a crash occurs at TCP A.
  • TCP A closes and sends a SYN again for a three-way handshake.
  • TCP B believes it is synchronized (thinks the connection is still established). Thus, upon receiving the SYN segment, it checks the sequence number and realizes there is a problem (line 3).
  • TCP B sends back an ACK indicating it was expecting sequence number 150 (line 4).
  • TCP A thinks the received segment does not match what it sent (as it sent a SYN for the first step of the three-way handshake) and sends a Reset segment (line 5).
  • TCP B is disrupted and closes.
  • TCP A can now attempt to connect again with a three-way handshake (line 7).

Scenario 3: Receiving data for a connection that no longer exists

[Figure: sequence diagram for Scenario 3]

  • TCP A experiences a crash.
  • However, TCP B, not recognizing the crash event at TCP A, sends a segment containing data.
  • Since TCP A is unaware of such a connection's existence upon receiving this segment, it responds with a Reset.

Scenario 4: An old duplicate SYN while both sides are in the LISTEN state

[Figure: sequence diagram for Scenario 4]

  • Scenario 4 starts with both parties in the LISTEN state.
  • At this point, the duplicate SYN issue seen in Scenario 1 occurs, and TCP A sends a Reset.
  • TCP B then returns to the LISTEN state.
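
As an aside, the 'connection reset by peer' symptom behind these scenarios is easy to reproduce with plain java.net sockets. The sketch below is a self-contained toy, unrelated to WebClient: it makes one side abort the connection with an RST by enabling SO_LINGER with a linger time of 0 before closing, and the reader then typically fails with a connection-reset exception.

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class RstDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {

            // SO_LINGER with a linger time of 0 makes close() abort the connection,
            // sending an RST instead of the normal FIN sequence
            accepted.setSoLinger(true, 0);
            accepted.close();

            Thread.sleep(100); // give the RST time to arrive

            try {
                client.getInputStream().read();
            } catch (IOException e) {
                System.out.println(e); // typically java.net.SocketException: Connection reset
            }
        }
    }
}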

Reference: TCP/IP: The Ultimate Protocol Guide
Reference: Effective TCP/IP Programming
