Are Teleport client timeout settings configurable?

We have some servers with poor connections back to our Teleport Auth server due to the nature of our environments.
Here is one example we encountered recently:

Jan 10 16:04:50 eagle-hamilton teleport[22568]: ERRO [AUTH] Failed to dial auth server :3080: dial tcp: lookup : Temporary failure in name resolution. auth/clt.go:150
Jan 10 16:04:56 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: Get https://:3080/v1/webapi/find: EOF. time/sleep.go:149
Jan 10 16:14:50 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: EOF. time/sleep.go:149
Jan 10 16:40:11 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: EOF. time/sleep.go:149
Jan 10 18:04:21 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: EOF. time/sleep.go:149
Jan 10 18:05:45 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: ssh: handshake failed: read tcp :53956->:3080: i/o timeout. time/sleep.go:149
Jan 10 18:06:56 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: ssh: handshake failed: read tcp :54394->:3080: i/o timeout. time/sleep.go:149
Jan 10 18:25:11 eagle-hamilton teleport[22568]: ERRO [PROC:1] Node failed to establish connection to cluster: EOF. time/sleep.go:149
Jan 11 02:16:32 eagle-hamilton systemd[1]: Stopping Teleport SSH Service…
Jan 11 02:16:32 eagle-hamilton systemd[1]: Stopped Teleport SSH Service.
Jan 11 02:16:32 eagle-hamilton systemd[1]: Started Teleport SSH Service.
Jan 11 02:16:57 eagle-hamilton teleport[16442]: ERRO [PROC:1] Node failed to establish connection to cluster: Get https://:3080/v1/webapi/find: net/http: TLS handshake timeout. time/sleep.go:149

Would it be possible for us to configure a larger timeout value?
And when a node fails to connect back to the auth server, how does it actually retry?

Thanks for any help on this!

There are currently no exposed settings for configuring the length of Teleport’s timeout when joining a cluster. The default timeout is 30 seconds, which is set here.

If a connection attempt fails, the node will keep retrying after each timeout until a connection is established.

Hi @gus,

Thank you for the confirmation.
A 30s timeout should be enough, but we are not sure why the TLS connections keep failing. Meanwhile, operations such as uploading to and downloading from Google Drive work fine.
We will investigate further on our side…

Cheers.

Are there any proxies or firewalls between the servers or anything that would cause interference with the traffic? From the logs, it seems like the connection is failing to complete, possibly due to an incomplete TLS handshake or similar. What is the latency between the servers?

These servers are on maritime vessels where internet access is only available via satellite, so latency, bandwidth, and even connectivity are quite unstable from time to time. We noticed that one of our servers consistently fails to connect back to the main cluster.

We are looking at packet loss of up to 50% depending on weather conditions, ping times of 1000+ ms, and bandwidth limited to 2 Mbps.

We are still trying out Teleport on more servers to get a better view of its performance as we set up more of them.

That’s a really interesting use case for Teleport! I’ve opened an issue to track the request to allow Teleport’s default timeout to be configurable. I can’t promise we’ll get to it any time soon, unfortunately.

In the meantime, as a workaround, you could check out the Teleport code, increase the default timeout, and then compile Teleport yourself (with make release). You can also use make -C build.assets release to build in a Docker container if desired.