tailscale loopback: kai-server host userspace cannot reach its own ts-proxy at 100.115.195.2:8428 #89
Originally filed by @coilysiren on 2026-04-28T09:23:32Z - https://github.com/coilysiren/infrastructure/issues/71
The eco-server systemd unit on kai-server cannot reach the vmsingle ts-proxy at `100.115.195.2:8428` despite kai-server itself being a tailnet member. Confirmed via in-process synchronous HTTP probe at startup.

Probe target was the exact URL configured for `OtlpMetricsEndpoint`. Same URL responds 200 from a laptop on the tailnet. So the failure is specific to host-userspace traffic going to a ts-proxy peer that lives on the same physical host as the originating tailscale daemon. Tailscale subnet routing back to a local-host peer doesn't appear to work from outside-cluster userspace.
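For anyone reproducing this outside the plugin, here is a minimal Python stand-in for that kind of synchronous HTTP probe. It is a sketch only: the real probe runs in-process inside eco-telemetry, and the function name `probe` and the 5-second default timeout are assumptions for illustration.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Synchronously GET `url` and return a one-line result string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"OK status={resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the network path itself works.
        return f"HTTP error status={exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP response at all: refused, timed out, unreachable, or DNS failure.
        return f"FAILED {type(exc).__name__}: {exc}"
```

Calling `probe(...)` with the URL configured for `OtlpMetricsEndpoint`, once from the kai-server host and once from another tailnet machine, should make the asymmetry visible: a `FAILED` line on the host and an `OK` or `HTTP error` line elsewhere.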
This explains everything in coilysiren/eco-telemetry#5: the OTLP exporter was firing on schedule, the SDK was generating valid metric payloads (eco_players_online + the System.Runtime fan), the URL construction was correct, vmsingle was reachable from the rest of the network. The TCP packets just weren't arriving.
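The issue does not record whether the connection attempts timed out or were actively refused. A TCP-level check along these lines could tell the two apart. This is a hedged sketch: `tcp_check` and its 3-second default timeout are hypothetical, not part of eco-telemetry.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a single TCP connect and name the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A reset came back: packets reach a host, but nothing listens on that port.
        return "refused"
    except socket.timeout:
        # Nothing came back within the timeout: packets were dropped or misrouted.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "no route to host" or "network unreachable".
        return f"error: {exc}"
```

Run from the kai-server host as `tcp_check("100.115.195.2", 8428)`, a `timeout` result would be consistent with packets never arriving, while `refused` would point at the listener rather than the route.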
Reachable paths from eco-server's userspace:
Not reachable:
Suggested fixes, ranked:
1. `vmsingle-host-service.yml` with `type: NodePort` (or extend the existing tailscale ClusterIP service). Eco-server points at `http://localhost:<nodeport>/opentelemetry/api/v1/push`. No tailscale traversal at all - same-host traffic goes through k3s's iptables rules.
2. `tailscale set --advertise-routes` configuration that's missing the cluster CIDR, or a sysctl thing. Dependent on Tailscale's behavior with self-hosted ts-proxies.

For tonight's session I've kept the failing endpoint configured because flipping to a different one needs a cluster-side change that's destructive enough to wait on a clear-headed pass. Diagnostic surface added in eco-telemetry will keep working once a reachable endpoint is configured.
What ships in eco-telemetry from tonight (closes the upstream half of #5):
- `IModKitPlugin` declaration. Eco's scanner now sees the plugin. (Bug behind everything else.)
- `EmitConsoleAlongsideOtlp` config flag - mirror metrics to console for debugging.
- `OtlpMetricExporter` + `PeriodicExportingMetricReader` construction. The `AddOtlpExporter` helper was silently no-op'ing in our setup; manual wiring is what put the OTLP reader in the pipeline at all.
- `OTEL_DIAGNOSTICS.json` self-diagnostics enabled by `install-eco-mod.sh`.
- `Logs/EcoTelemetry/smoke-probe.txt` (this is how I caught the loopback issue).