Moving from github at request.
https://github.com/gravitational/gravity/issues/353
Install repeatedly fails and hangs on “site-app-post-install” job.
Version: 5.5.3 (issue was not present on 5.3.5 - it looks to be an issue with switch to coredns)
Environment: 3 RHEL 7.6 EC2 VMs on AWS
I have created a duplicate of the “site-app-post-install” with changing the gravity site status command so that I have access to the container.
I have also added the upstream dns from AWS onto --dns-zone on install.
In that container the when doing a nslookup on the service that is called via gravity site status it is not resolved.
Address 1: 10.100.14.135 kube-dns.kube-system.svc.cluster.local
/ # nslookup gravity-site.kube-system.svc.cluster.local
Server: 10.100.14.135
Address 1: 10.100.14.135 ip-10-100-14-135.eu-central-1.compute.internal
nslookup: can't resolve 'gravity-site.kube-system.svc.cluster.local'
the /etc/resolv.conf of the job container is
/ # cat /etc/resolv.conf
nameserver 10.100.14.135
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
(note) 10.100.14.135 is the cluster ip of the kube-dns service.
When I exec into the “gravity-site” container I am able to lookup gravity-site.kube-system.svc.cluster.local.
All of the coredns daemonset containers are running. I turned on logs on coredns settings and services from the monitoring namespace are reaching the coredns, but the requests from the “site-app-post-install” do not reach the coredns logs.