Notice
This guide applies only if the number of Gateway nodes is within the limit your license allows, yet you still see the telltale signs below.
Telltale Signs
Gateway Logs like this:
time="Nov 30 20:22:54" level=info msg="Registering gateway node with Dashboard" prefix=dashboard
time="Nov 30 20:22:54" level=error msg="Response failed with code 404; retrying in 5s" prefix=dashboard
time="Nov 30 20:22:55" level=warning msg="DRL not ready, skipping this notification"
Dashboard Logs like this:
time="Feb 08 14:34:12" level=info msg="Got configuration for nodeID: ae553da9-873a-4f7f-6feb-5dea6eb2a260" prefix=pub-sub
time="Feb 08 14:34:14" level=error msg="No nodes available"
time="Feb 08 14:34:20" level=info msg="Client added" clientsNum=1 orgID=5fbbb5ee57055ed03d10f7de prefix=ui-notifications
time="Feb 08 14:34:24" level=error msg="No nodes available"
time="Feb 08 14:34:29" level=error msg="No nodes available"
Gateway(s) unable to proxy requests
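To check a captured log for these signs, a minimal sketch along the following lines may help. It only counts lines containing the two error messages quoted above. The function name `count_telltale_lines` and the result keys are illustrative, not part of any Tyk tooling, and the exact message text may differ between versions.

```python
# Message fragments copied from the sample logs above.
GATEWAY_404 = 'msg="Response failed with code 404'
DASHBOARD_NO_NODES = 'msg="No nodes available"'


def count_telltale_lines(log_text: str) -> dict:
    """Count error-level lines that contain the two messages shown above."""
    counts = {"gateway_404": 0, "dashboard_no_nodes": 0}
    for line in log_text.splitlines():
        if "level=error" not in line:
            continue  # both telltale messages are logged at error level
        if GATEWAY_404 in line:
            counts["gateway_404"] += 1
        if DASHBOARD_NO_NODES in line:
            counts["dashboard_no_nodes"] += 1
    return counts
```

A non-zero count in both Gateway and Dashboard logs, combined with a node count within the license limit, suggests this guide is relevant.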
Common Scenarios Around Occurrence
- A Dashboard license that allows only a few nodes (e.g. a 2-node license)
- Fairly frequent gateway (pod) crashes or restarts
Fix
- Stop the Dashboard
- Search for and delete all tyk-node* keys in the Redis instance
E.g.:
127.0.0.1:6379> keys tyk-node*
1) "tyk-node-prefix1fcededb-864a-4b5d-44b3-ca5c131a6e56"
2) "tyk-nodeid-prefixnode-ids"
3) "tyk-node-prefix22460d13-ead3-4969-5ba7-2463f17ed467"
127.0.0.1:6379> del tyk-nodeid-prefixnode-ids tyk-node-prefix1fcededb-864a-4b5d-44b3-ca5c131a6e56 tyk-node-prefix22460d13-ead3-4969-5ba7-2463f17ed467
(integer) 3
- Start the Dashboard
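The delete step can also be scripted. The sketch below assumes the third-party redis-py client and a single standalone Redis instance at the address shown in the prompt above. The helper names `delete_node_keys` and `run_against_local_redis` are illustrative. It uses `scan_iter` rather than `KEYS` so that a large keyspace is walked incrementally. As in the manual steps, the Dashboard should be stopped before running it.

```python
def delete_node_keys(client, pattern="tyk-node*"):
    """Delete every key matching `pattern` and return how many were removed.

    `client` is any object exposing redis-py style `scan_iter` and `delete`.
    """
    keys = list(client.scan_iter(match=pattern))
    if not keys:
        return 0  # DEL with no arguments is an error, so return early
    return client.delete(*keys)


def run_against_local_redis():
    """Example wiring; host and port are assumptions taken from the prompt above."""
    import redis  # third-party package: pip install redis

    client = redis.Redis(host="127.0.0.1", port=6379)
    print("deleted", delete_node_keys(client), "keys")
```

Calling `run_against_local_redis()` performs the same deletion as the `del` command in the example, without having to list each key by hand.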
Why it Happens
The tyk-nodeid-prefixnode-ids key in Redis keeps a record of nodes and the Dashboard assigns these to Gateways on their startup/registration. When a Gateway is stopped, it sends a de-register signal to the Dashboard and frees up the node it previously used.
You would typically see such logs in the gateway when it is shutting down:
time="Jul 19 10:43:19" level=info msg="Stop signal received." prefix=main
time="Jul 19 10:43:19" level=info msg="Stopping heartbeat..." prefix=main
time="Jul 19 10:43:21" level=info msg="Stopped Heartbeat" prefix=dashboard
time="Jul 19 10:43:21" level=info msg=De-registered. prefix=dashboard
time="Jul 19 10:43:21" level=info msg=Terminating. prefix=main
In the event of an unexpected crash, a gateway might not get to de-register itself properly, and its node ID remains 'in use'. If this occurs frequently enough, the Dashboard can reach a point where it considers all nodes used up and responds with a 404 to any gateway that tries to register and obtain a node ID from the license.
Depending on the number of nodes a license allows, the chances of this happening range from very unlikely (e.g. a 100-node license) to very likely (e.g. a 2-node license). Even with a large license, though, it remains possible if the environment is unstable enough.
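The mechanism can be illustrated with a toy model. This is a sketch of the idea only, not the Dashboard's actual implementation. The `NodePool` class, the gateway names, and the 200 success code are all assumptions made for illustration. Only the 404 response comes from the logs above. The model also assumes that a restarted gateway registers as a new node rather than reclaiming its old slot.

```python
class NodePool:
    """Toy model of a fixed pool of licensed node slots (not Dashboard code)."""

    def __init__(self, licensed_nodes: int):
        self.licensed_nodes = licensed_nodes
        self.in_use = set()

    def register(self, gateway_id: str) -> int:
        """Return 200 if a slot is free, 404 if the pool looks exhausted."""
        if len(self.in_use) >= self.licensed_nodes:
            return 404
        self.in_use.add(gateway_id)
        return 200

    def deregister(self, gateway_id: str) -> None:
        """Clean shutdown: the gateway frees its slot."""
        self.in_use.discard(gateway_id)


pool = NodePool(licensed_nodes=2)
pool.register("gw-a")          # slot 1 taken
pool.register("gw-b")          # slot 2 taken
pool.deregister("gw-a")        # clean stop frees slot 1
pool.register("gw-a-restart")  # succeeds, slot 1 reused
# gw-b crashes here: deregister is never called, so its slot stays "in use".
status = pool.register("gw-b-restart")
print(status)  # 404: both slots are held, one by a gateway that no longer exists
```

With a 2-node license a single unclean crash is enough to block registration in this model, whereas a 100-node pool would need many leaked slots before the same thing happened.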
If you run into this in your environment, the steps detailed in Fix above should resolve it quickly, because the Dashboard is forced to start with a clean slate of node records. It is safe to remove the tyk-node* keys from Redis, as they are regenerated when the Dashboard starts.
Prevention
- Proper management and allocation of resources
- Periodic cleanup of tyk-node* keys in Redis