Platform <= 5.5.0 unavailable?

Today I tried to build a package using tele 5.4.7 with the manifest specifying runtime version 5.4.7. The last time I tried this (~10 days ago) these versions worked flawlessly. I went ahead and tried the version I started with (5.2.7) and had the same issue. Any of the 5.5.x releases work fine.

We’re aiming to use 5.5.x, but I wanted to compare some differences with older versions. We’re running nginx-ingress as a DaemonSet with host networking enabled so that ports 80 & 443 are bound on the node itself. This worked on 5.4.7 (~10 days ago), but since we began testing with 5.5.x the service is only accessible from within a gravity shell.

Original Error: *trace.NotFoundError package( not found
Stack Trace:
    /gopath/src/ main.main
    /go/src/runtime/proc.go:207 runtime.main
    /go/src/runtime/asm_amd64.s:2362 runtime.goexit
User Message: package( not found

Related to the attempts at using hostPort/hostNetwork: ports 80/443 are definitely bound on the host with runtime 5.5.16 (even though the manifest specifies 5.5.5 in systemOptions.runtime.version), but requests to them time out.

Ah, building in docker before 5.5 doesn’t work? I swapped back to a VM and it’s finding the earlier versions now.

edit: looks like the issue may have been related to docker volumes. I removed a local volume used for --state-dir and builds of older versions are working in docker again. I’m confused as to why, though, as I nuked the cache folder several times.

Hi James!

I think it may have been related to the issue with tele build that was fixed in version 5.5.2 - before that, if you specified --state-dir, tele would assume it to be the “package source” and would attempt to find the runtime in that state dir. The workaround in earlier versions is to explicitly set the --repository flag, e.g. --repository=s3:// (if you’re using OSS). But as I mentioned, the issue has been fixed since 5.5.2.
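For illustration, a sketch of the pre-5.5.2 workaround. The manifest name and state-dir path are placeholders, and the s3:// repository address is elided here as it is in the thread; the block skips gracefully when tele isn’t installed:

```shell
# Pre-5.5.2 workaround sketch: when --state-dir is set, also pass --repository
# so tele resolves the runtime remotely instead of treating the local state
# dir as the package source. Paths below are illustrative placeholders.
STATE_DIR=/tmp/tele-state
if command -v tele >/dev/null 2>&1; then
    tele build app.yaml \
        --state-dir="$STATE_DIR" \
        --repository=s3:// \
        || echo "tele build failed (expected outside a real build environment)"
else
    echo "tele not installed; command shown for reference only"
fi
```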

Regarding the service issues - first of all, there’s no runtime 5.5.16 - the latest is 5.5.5 at the moment - so that’s kind of weird 🙂 Have you tried narrowing down the issue? I.e., what exactly is failing - DNS resolution, overlay networking, etc.? Are all pods running in the cluster?
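One concrete way to start that triage, as a sketch (assumes kubectl is on the PATH inside a gravity shell; the dnstest pod name is arbitrary, and the block skips gracefully where kubectl is unavailable):

```shell
# Triage sketch: check pod health first, then in-cluster DNS resolution.
# Run inside a gravity shell on the node.
if command -v kubectl >/dev/null 2>&1; then
    kubectl get pods --all-namespaces        # are all pods Running/Ready?
    kubectl run dnstest -i --rm --image=busybox --restart=Never \
        -- nslookup kubernetes.default       # does cluster DNS resolve?
else
    echo "kubectl not found; commands shown for reference only"
fi
TRIAGE_SKETCH=1
```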


Ah, that’s good to know! Setting the --repository flag does indeed fix builds in docker for the older versions. Yes, we are currently using the OSS release.

I may be confusing “runtime version” with the tags on the planet images:

ExecStart=/usr/bin/gravity package command start musinghamilton1004/planet-config-17231730musinghamilton1004:5.2.22-11108

I don’t think that’s an issue here, just a misunderstanding on my end of what “runtime version” actually means.

Rolling back to 5.2.10 was a step in trying to narrow down the issue I’m seeing. Everything is running fine, but the hostPort setup we are attempting to use is having issues.

Our goal here is to simplify single-node air-gapped installations by providing an installation of nginx-ingress configured as a DaemonSet using hostPorts to remove the need for an additional reverse proxy/metal LoadBalancer to access our system.
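For reference, a minimal sketch of what such a DaemonSet looks like. This is illustrative only - the names, namespace, and image tag are placeholders, not our actual manifest:

```yaml
# Illustrative only: nginx-ingress as a DaemonSet binding 80/443 on every
# node via hostPort, so no external LoadBalancer/reverse proxy is needed.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      # alternatively: hostNetwork: true, which binds in the host namespace
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1
        ports:
        - containerPort: 80
          hostPort: 80
        - containerPort: 443
          hostPort: 443
```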

From the node itself (outside of a gravity shell) I can access the nginx-ingress service over the assigned NodePorts without issue. The hostPorts (80 and 443) time out, though they do appear to be bound according to netstat -tlpn. From inside a gravity shell, however, the hostPorts (80 and 443) respond perfectly fine.
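The check above can be sketched like this (run on the node, outside the gravity shell; 127.0.0.1 and the 3-second timeout are arbitrary choices). A socket that shows as bound but still times out usually points at filtering or routing between the host namespace and the overlay, not at the binding itself:

```shell
# Confirm 80/443 are bound, then see whether port 80 actually answers.
# ss is the modern equivalent of netstat -tlpn; falls back to 0 if absent.
BOUND=$(ss -tln 2>/dev/null | awk '$4 ~ /:(80|443)$/' | wc -l)
STATUS=$(curl -s -o /dev/null -m 3 -w '%{http_code}' http://127.0.0.1/ 2>/dev/null) \
    || STATUS=no-response
echo "listening sockets on 80/443: $BOUND; port 80 check: $STATUS"
```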

I definitely had this working at one point (on 5.2.10) but a fresh build with that version exhibited the same issue. I’m currently spooling up a new test-box to see if a clean slate helps, but I expect it won’t.

I might have figured out my issue… currently we aren’t using an install hook to set up the application. When doing a gravity leave --force with the application still installed, I think the routing may remain in a broken state. Another install of the platform + our application on the same node then exhibits the issue. If I make sure to uninstall our application before destroying the cluster, a fresh install works just fine!
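The teardown order that avoided the stale state for me, as a sketch (the manifest directory is a placeholder for however your app is deployed; the block skips gracefully where the tools aren’t installed). Deleting the app first lets kubelet tear down the hostPort plumbing before the node leaves the cluster:

```shell
# Teardown order sketch: uninstall the application, then dismantle the node.
if command -v kubectl >/dev/null 2>&1 && command -v gravity >/dev/null 2>&1; then
    kubectl delete -f our-app-manifests/   # 1. remove the app (placeholder path)
    gravity leave --force                  # 2. then tear down the cluster node
else
    echo "kubectl/gravity not available; commands shown for reference only"
fi
TEARDOWN_SKETCH=1
```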

Thanks for the response nonetheless. The solution for docker builds was useful, but now that I know how it’s breaking we can stick to 5.5.x!