NGINX TCP Health Checks

Nginx Plus and Nginx open source can continually test TCP upstream servers, avoid servers that have failed, and gracefully add recovered servers back into the load-balanced group.

Before configuring TCP health checks, make sure the following is in place:

1. An upstream group of TCP servers is configured in the stream context, for example:

stream {
    #...
    upstream stream_backend {
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }
    #...
}

2. A server that passes TCP connections to the upstream group is configured:

stream {
    #...
    server {
        listen     12345;
        proxy_pass stream_backend;
    }
    #...
}
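The two fragments above belong to the same top-level stream block. As a minimal sketch, assuming an nginx.conf with no other stream configuration and a build that includes the stream module, the combined prerequisite configuration might look like this:

```nginx
# Top-level context in nginx.conf, at the same level as the http block
stream {
    # Group of TCP servers to load-balance across
    upstream stream_backend {
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }

    # Virtual server that accepts TCP connections and proxies them to the group
    server {
        listen     12345;
        proxy_pass stream_backend;
    }
}
```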

Passive TCP Health Checks

If an attempt to connect to an upstream server times out or results in an error, Nginx Plus or Nginx open source can mark the server as unavailable and stop sending connections to it for a defined amount of time. To define the conditions under which Nginx considers an upstream server unavailable, add the following parameters to the server directive:

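fail_timeout: the time window within which the specified number of connection attempts must fail for the server to be considered unavailable, and also the length of time the server is then considered unavailable. The default is 10 seconds.

max_fails: the number of failed attempts within fail_timeout after which the server is considered unavailable. The default is 1.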

upstream stream_backend {
    server backend1.example.com:12345 weight=5;
    server backend2.example.com:12345 max_fails=2 fail_timeout=30s;
    server backend3.example.com:12346 max_conns=3;
}
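To make the effect of these parameters more concrete, the following sketch contrasts the default behavior with a more tolerant setting and with passive checks disabled. The specific values (3 failures, 60 seconds) are illustrative assumptions, not recommendations:

```nginx
upstream stream_backend {
    # Same as the defaults: 1 failed attempt within 10 seconds
    # makes the server unavailable for 10 seconds
    server backend1.example.com:12345 max_fails=1 fail_timeout=10s;

    # More tolerant (illustrative values): 3 failed attempts within 60 seconds
    # make the server unavailable for 60 seconds
    server backend2.example.com:12345 max_fails=3 fail_timeout=60s;

    # max_fails=0 disables the accounting of failed attempts for this server
    server backend3.example.com:12345 max_fails=0;
}
```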

Server Slow Start

A recently recovered upstream server can be easily overwhelmed by connections, which may cause the server to be marked as unavailable again. Slow start allows an upstream server to gradually recover its weight from zero to its nominal value after it has recovered or become available. This is done with the slow_start parameter of the upstream server directive (available in Nginx Plus):

upstream backend {
    # In the stream context every server address must include a port
    server backend1.example.com:12345 slow_start=30s;
    server backend2.example.com:12345;
    server 192.0.0.1:12345 backup;
}

Active TCP Health Checks

Health checks can be configured to test a wide range of failure types. For example, Nginx Plus can continually check upstream servers for responsiveness and avoid servers that have failed.

Nginx Plus sends special health check requests to each upstream server and tests for a response that satisfies certain conditions. If a connection to the server can't be established, the health check fails, and the server is considered unhealthy.

Nginx Plus does not proxy connections of the clients to unhealthy servers. If several health checks are configured for an upstream group, the failure of any health check is enough to consider the corresponding server unhealthy.

To enable active health checks:

1. Specify a shared memory zone, a special area where the Nginx Plus worker processes share state information about connections and counters. Add the zone directive to the upstream server group and define the zone name (here, stream_backend) and the amount of memory (64 KB).

stream {
    #...
    upstream stream_backend {
        zone   stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }
    #...
}

2. Enable active health checks for the upstream group by adding the health_check directive to the server block that proxies connections to it.

stream {
    #...
    server {
        listen        12345;
        proxy_pass    stream_backend;
        health_check;
        #...
    }
}

3. If required, shorten the timeout applied to health checks with the health_check_timeout directive. This directive overrides the proxy_timeout value (10 minutes by default) for health checks, because health checks need a significantly shorter timeout than regular proxied connections.

stream {
    #...
    server {
        listen               12345;
        proxy_pass           stream_backend;
        health_check;
        health_check_timeout 5s;
    }
}

4. By default, Nginx Plus sends health check messages to the port specified by the server directive in the upstream block. To override the port, set the port parameter of the health_check directive.

stream {
    #...
    server {
        listen               12345;
        proxy_pass           stream_backend;
        health_check         port=12346;
        health_check_timeout 5s;
    }
}
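The steps above only test whether a connection can be established. To test for "a response that satisfies certain conditions", Nginx Plus also supports a match block with send and expect directives, referenced through the match parameter of health_check. The following is a minimal sketch, assuming the upstream servers speak HTTP; the block name http_ok and the request string are hypothetical and chosen only for illustration:

```nginx
stream {
    upstream stream_backend {
        zone   stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }

    # Hypothetical test named "http_ok": send a minimal HTTP request
    # and require "200 OK" (case-insensitive regex) in the response
    match http_ok {
        send   "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n";
        expect ~* "200 OK";
    }

    server {
        listen       12345;
        proxy_pass   stream_backend;
        # The health check passes only if the match conditions are satisfied
        health_check match=http_ok;
    }
}
```

For a protocol other than HTTP, the send and expect strings would need to be replaced with data that the upstream service actually understands.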

Fine-Tuning TCP Health Checks

By default, Nginx Plus tries to connect to each server in an upstream group every 5 seconds. If the connection cannot be established, Nginx Plus considers the health check failed, marks the server as unhealthy, and stops forwarding client connections to it.

To change the default behavior, add parameters to the health_check directive:

interval: how often Nginx Plus sends health check requests, in seconds (default is 5 seconds).

passes: the number of consecutive health checks the server must pass to be considered healthy. The default value is 1.

fails: the number of consecutive health checks the server must fail to be considered unhealthy. The default value is 1.

stream {
    #...
    server {
        listen       12345;
        proxy_pass   stream_backend;
        health_check interval=10 passes=2 fails=3;
    }
    #...
}

In the above example, the time between TCP health checks is increased to 10 seconds. The server is considered unhealthy after three consecutive failed health checks, and it must pass two consecutive checks to be considered healthy again.
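Putting the pieces together, a complete configuration for tuned active health checks might look like the sketch below. It combines the shared memory zone, the tuned health_check parameters, and a health check timeout. The timing notes in the comments are rough estimates derived from the parameter values, not measured behavior:

```nginx
stream {
    upstream stream_backend {
        # Shared memory zone, required for active health checks
        zone   stream_backend 64k;
        server backend1.example.com:12345;
        server backend2.example.com:12345;
        server backend3.example.com:12345;
    }

    server {
        listen       12345;
        proxy_pass   stream_backend;

        # One check every 10 seconds:
        #   fails=3  -> a failed server is marked unhealthy after roughly
        #               3 x 10 = 30 seconds at most
        #   passes=2 -> a recovered server receives traffic again after
        #               2 consecutive passes, roughly 10 to 20 seconds
        health_check interval=10 passes=2 fails=3;

        # Give up on an individual health check operation after 5 seconds
        health_check_timeout 5s;
    }
}
```

A shorter interval detects failures sooner at the cost of more health check traffic to each upstream server.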
