Description
In some recent Gravity releases, a cluster may enter a “degraded” state after a node is removed, reporting the following error that mentions the removed node:
overlay packet loss for node <node-ip> is higher than the allowed threshold of 20.00%: 100.00%
This happens if the overlay checker detects a networking issue while the node is being removed. The warning then persists indefinitely, even after the node has been removed.
The following GitHub ticket describes the issue in more detail: https://github.com/gravitational/gravity/issues/1403.
Affected versions
The following versions may experience this issue:
- 5.5.40-5.5.41
- 6.1.21-6.1.22
- 6.3.10-6.3.13
- 7.0.1-7.0.3
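As a quick way to check a version against the ranges above, the following is a minimal sketch. The function name `is_affected_version` is hypothetical, and the version string is assumed to be supplied by the operator in plain `major.minor.patch` form; anything else is reported as unparsable rather than guessed at.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Gravity version falls into one of the affected
# ranges listed above. The function name is hypothetical.
#
# Exit status: 0 = affected, 1 = not affected, 2 = unparsable version string.
is_affected_version() {
    local version="$1"
    local major minor patch

    # Split "major.minor.patch" on dots.
    IFS=. read -r major minor patch <<< "$version"

    # Reject anything whose patch component is not a plain number
    # (for example pre-release suffixes), instead of guessing.
    case "$patch" in
        ''|*[!0-9]*) return 2 ;;
    esac

    # Compare the patch number against the affected range of each release line.
    case "$major.$minor" in
        5.5) [ "$patch" -ge 40 ] && [ "$patch" -le 41 ] ;;
        6.1) [ "$patch" -ge 21 ] && [ "$patch" -le 22 ] ;;
        6.3) [ "$patch" -ge 10 ] && [ "$patch" -le 13 ] ;;
        7.0) [ "$patch" -ge 1 ] && [ "$patch" -le 3 ] ;;
        *)   return 1 ;;
    esac
}
```

For example, `is_affected_version 6.3.12 && echo "affected"` prints `affected`, while the same call with `6.3.14` prints nothing.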
Workaround
Recreating the “nethealth” pods in the monitoring namespace will clear the bogus warnings:
kubectl -nmonitoring delete pods -lk8s-app=nethealth
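The workaround can also be wrapped in a small helper that waits for the replacement pods to come up. This is only a sketch: the function name `recreate_nethealth_pods` and the 120-second timeout are illustrative assumptions, and it assumes the pods are recreated automatically by their controller after deletion.

```shell
#!/usr/bin/env bash
# Sketch: apply the workaround and wait for the replacement pods.
# The function name and the 120s timeout are illustrative assumptions.
recreate_nethealth_pods() {
    # Delete the existing nethealth pods (same command as the workaround above).
    kubectl -n monitoring delete pods -l k8s-app=nethealth || return 1

    # Wait until the replacement pods report Ready. If this runs before the
    # replacements have been created, kubectl may report that no matching
    # resources were found; in that case, rerun the function's wait step.
    kubectl -n monitoring wait --for=condition=Ready pods \
        -l k8s-app=nethealth --timeout=120s || return 1

    # List the recreated pods for a quick visual check.
    kubectl -n monitoring get pods -l k8s-app=nethealth
}
```

After calling `recreate_nethealth_pods`, running `gravity status` should show whether the cluster has left the degraded state.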