NGINX TCP Health Checks
Nginx Plus and Nginx Open Source can continually test TCP upstream servers, avoid servers that have failed, and gracefully add recovered servers back into the load-balanced group.
The TCP health checks described here assume that the following configuration is already in place:
1. An upstream group of TCP servers is configured in the stream context.
2. A server that passes TCP connections to the server group is configured.
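As a minimal sketch of the two prerequisites above, the configuration might look like the following. The group name stream_backend matches the name used later in this article; the host names and port numbers are placeholders chosen for illustration, not values from a real deployment.

```nginx
stream {
    # 1. Upstream group of TCP servers (host names and ports are placeholders)
    upstream stream_backend {
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }

    # 2. Server that passes TCP connections to the upstream group
    server {
        listen     12345;
        proxy_pass stream_backend;
    }
}
```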
Passive TCP Health Checks
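If an attempt to connect to an upstream server times out or results in an error, Nginx Plus or Nginx Open Source can mark the server as unavailable and stop sending connections to it for a defined amount of time. To define the conditions under which Nginx considers an upstream server unavailable, set two parameters on the server directive. The fail_timeout parameter sets the time window within which a given number of connection attempts must fail for the server to be considered unavailable, and also how long the server then stays marked as unavailable. The max_fails parameter sets the number of failed attempts within that window after which the server is considered unavailable.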
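A passive health check is set through the max_fails and fail_timeout parameters of the server directive (by default, 1 attempt and 10 seconds). The sketch below assumes the stream_backend group and placeholder host names; the values 2 and 30s are illustrative only. With these settings, a server that fails 2 connection attempts within 30 seconds is marked as unavailable for the next 30 seconds.

```nginx
upstream stream_backend {
    # Mark a server unavailable for 30s after 2 failed attempts within 30s
    # (host names, ports, and values are illustrative assumptions)
    server backend1.example.com:12345 max_fails=2 fail_timeout=30s;
    server backend2.example.com:12345 max_fails=2 fail_timeout=30s;
}
```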
Server Slow Start
A recently recovered upstream server can easily be overwhelmed by connections, which may cause it to be marked as unavailable again. Slow start lets an upstream server gradually recover its weight from zero to its nominal value after it has recovered or become available. This is done with the slow_start parameter of the upstream server directive, which is available in Nginx Plus.
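As an illustration, slow start could be enabled for one server in the group as sketched below. The 30-second ramp-up time and the host names are assumptions made for the example.

```nginx
upstream stream_backend {
    # After recovery, backend1 ramps up from zero to its full weight over 30s
    # (the 30s value is an illustrative assumption)
    server backend1.example.com:12345 slow_start=30s;
    server backend2.example.com:12345;
}
```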
Active TCP Health Checks
Health checks can be configured to test a wide range of failure types. For example, Nginx Plus can continually check upstream servers for responsiveness and avoid servers that have failed.
Nginx Plus sends special health check requests to each upstream server and tests for a response that satisfies certain conditions. If a connection to the server can't be established, the health check fails, and the server is considered unhealthy.
Nginx Plus does not proxy client connections to unhealthy servers. If several health checks are configured for an upstream group, the failure of any one of them is enough for the corresponding server to be considered unhealthy.
To enable active health checks:
1. Specify a shared memory zone, a special area where the Nginx Plus worker processes share state information about counters and connections. Add the zone directive to the upstream server group and specify the zone name (for example, stream_backend) and the amount of memory (for example, 64 KB).
2. Enable active health checks for the upstream group with the health_check directive.
3. If required, reduce the timeout that applies to health check connections with the health_check_timeout directive. For health checks, this directive overrides the proxy_timeout value, because the timeout for health checks needs to be significantly shorter.
4. By default, Nginx Plus sends health check messages to the port specified by the server directive in the upstream block. To override the port, specify the port parameter of the health_check directive.
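Taken together, the four steps above might produce a configuration like the following sketch. The zone name stream_backend and its 64 KB size come from step 1; the host names, the listen port, the 5-second timeout, and the health check port 8080 are assumptions made for illustration.

```nginx
stream {
    upstream stream_backend {
        # Step 1: shared memory zone for state shared between workers
        zone stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
    }

    server {
        listen     12345;
        proxy_pass stream_backend;

        # Steps 2 and 4: enable active checks, sent to port 8080
        # instead of the port given in the server directives
        health_check port=8080;

        # Step 3: timeout used for health checks instead of proxy_timeout
        health_check_timeout 5s;
    }
}
```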
Fine-Tuning TCP Health Checks
By default, Nginx Plus tries to connect to each server in an upstream server group every 5 seconds. If the connection cannot be established, Nginx Plus considers the health check failed, marks the server as unhealthy, and stops forwarding client connections to it.
To change the default behavior, add parameters to the health_check directive:
interval: how often, in seconds, Nginx Plus sends health check requests. The default is 5 seconds.
passes: the number of consecutive health checks the server must pass to be considered healthy. The default is 1.
fails: the number of consecutive health checks the server must fail to be considered unhealthy. The default is 1.
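The three parameters can be combined in a single health_check directive, as in this sketch. The listen port is a placeholder, and the stream_backend group is assumed to be defined with a shared memory zone as described earlier.

```nginx
server {
    listen       12345;
    proxy_pass   stream_backend;

    # Check every 10s; 3 consecutive failures mark a server unhealthy,
    # 2 consecutive passes mark it healthy again
    health_check interval=10 passes=2 fails=3;
}
```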
With health_check interval=10 passes=2 fails=3, the time between TCP health checks is increased to 10 seconds. The server is considered unhealthy after three consecutive failed health checks, and it needs to pass two consecutive checks to be considered healthy again.