We have a cluster (really just one node) that is using 5.5.19, and we’re noticing the following behavior.
On cluster creation (i.e. when installing), the node had two DNS servers defined in /etc/resolv.conf, let’s call these dns1.company.com and dns2.company.com. These DNS servers got encoded in /etc/coredns/coredns.conf in the Planet container, as well as in the coredns
ConfigMap for the CoreDNS pods in Kubernetes itself.
At some later point in time, this node had its DNS servers changed, and so now in /etc/resolv.conf on the node, it just has dns3.company.com as the only server, and the other two have actually been decommissioned.
We see a few issues on the cluster:
- Inside the Planet container, /etc/resolv.conf is unchanged (it still has 127.0.0.2, dns1.company.com, dns2.company.com).
- Inside the Planet container, /etc/coredns/coredns.conf is unchanged (i.e. it still refers to dns1.company.com and dns2.company.com), so even when resolving against 127.0.0.2 inside the container it doesn’t work.
- Inside the coredns ConfigMap, it doesn’t change, so it still refers to dns1.company.com and dns2.company.com
We tried rebooting the machine, restarting the Gravity service using systemctl
, but no luck. We had to manually go and edit these configurations to get things to work.
To be specific, the following things start failing:
- Fetching images from Docker failed since Docker inside the Planet container could not resolve the Docker registry URL at registry.docker.com (or whatever the specific DNS name is).
- Communication from inside the cluster to external services (like RDS) broke, since the DNS entry could not be resolved.
My expectation was that if the DNS resolution configuration on the host changed that Gravity would update it, or at least provide a way for me to update it so I didn’t have to go and edit all these files manually (I missed editing /etc/coredns/coredns.conf on the first try, for example).
Is this expected? Is it a bug? Anything we can do to make this easier?