WSL k3s agent registers Hyper-V NIC as InternalIP, breaks apiserver-mediated kubectl from kai-server #147

Open
opened 2026-05-26 20:17:22 +00:00 by coilysiren · 0 comments

Symptom

From kai-server (the k3s control plane), every apiserver-mediated operation against a pod on the kai-desktop-tower-wsl agent node fails:

$ coily ops kubectl logs -n openclaw deploy/openclaw --tail=30
Error from server: Get "https://172.27.244.126:10250/containerLogs/openclaw/openclaw-7b479cc4df-l552m/gateway?tailLines=30": proxy error from 127.0.0.1:6443 while dialing 172.27.244.126:10250, code 502: 502 Bad Gateway

$ coily ops kubectl port-forward -n openclaw svc/openclaw 18789:18789
error: error upgrading connection: error dialing backend: proxy error from 127.0.0.1:6443 while dialing 172.27.244.126:10250, code 502

Pod-local traffic is fine - the openclaw pod's own readiness probe (which runs inside the pod) passes, the pod is 1/1 Ready, and Ollama on the Windows side of the same tower is reachable from the tailnet.

Cause

The WSL k3s agent has registered its InternalIP as 172.27.244.126, which is the address of its Hyper-V NIC inside Windows. That address is not reachable from kai-server over the tailnet: it's an RFC1918 address that Windows assigns to the WSL guest behind its NAT. The k3s apiserver dials the kubelet at its InternalIP (port 10250) when proxying exec / logs / port-forward, so all of those break for pods scheduled on this node. The kubelet itself is fine - traffic that goes through the CNI rather than the apiserver (pod-to-pod, the pod's own in-pod probe) works.
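A quick way to confirm what the node advertises, and whether that address is on the tailnet at all, is a small range check. This is a sketch under stated assumptions: it assumes the tailnet uses Tailscale's default 100.64.0.0/10 (CGNAT) addressing, which is where 100.107.172.77 sits, and that `coily ops kubectl` passes its arguments straight through to kubectl. The function name in_tailnet_range is hypothetical, and it only inspects the first two octets.

```shell
#!/bin/sh
# Sketch: does an IPv4 address fall inside 100.64.0.0/10, the CGNAT block
# Tailscale assigns tailnet addresses from (assumed default addressing)?
# Only the first two octets are inspected; the rest is not validated.
in_tailnet_range() {
  # patterns cover second octets 64-69, 70-99, 100-119 and 120-127
  case "$1" in
    100.6[4-9].*|100.[7-9][0-9].*|100.1[01][0-9].*|100.12[0-7].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, run on kai-server (assumes `coily ops kubectl` forwards args to kubectl):
#   ip=$(coily ops kubectl get node kai-desktop-tower-wsl \
#     -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
#   in_tailnet_range "$ip" || echo "InternalIP $ip is not a tailnet address"
```

With the node in its current state the example should print the warning, since 172.27.244.126 is outside 100.64.0.0/10.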

This was masked until VHGC's openclaw rollout because every other workload on the tower has either run in-pod-only or been driven from a session on the WSL node directly.

Fix options

  1. Re-register the kubelet with --node-ip=<wsl-tailnet-ip> - the cleanest fix. The WSL box's tailnet IP is 100.107.172.77 (resolved from tailscale status; SSM the FQDN, not the IP). Set it through INSTALL_K3S_EXEC, K3S_NODE_IP, or a systemd unit override on the WSL agent so the kubelet advertises the tailnet IP as its InternalIP. The apiserver then dials the kubelet over the tailnet, the same path that already works for other tailnet-to-tailnet traffic.

  2. Accept it and document - kubectl exec / logs / port-forward to WSL-node pods only work from a session on that node. Workable, but every cross-node verification step then has to be coordinated through the agent channel.

Option 1 is the right call. Option 2 is what we're doing today, and it added 20 minutes of channel-handoff overhead to a routine verification.
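Option 1 on the WSL agent could look roughly like the sketch below; it is not the applied fix. It assumes the agent reads k3s's default config file, /etc/rancher/k3s/config.yaml (whose keys mirror the CLI flags, so node-ip corresponds to --node-ip), and runs as the k3s-agent systemd unit that the k3s install script creates. The helper set_node_ip and the CONFIG_FILE override are hypothetical.

```shell
#!/bin/sh
# Sketch of fix option 1. Assumptions (not confirmed in this issue): the WSL
# agent reads k3s's default config file and runs as the k3s-agent unit.
CONFIG_FILE="${CONFIG_FILE:-/etc/rancher/k3s/config.yaml}"

# Hypothetical helper: pin the address the agent advertises as its InternalIP.
# `node-ip` is the config-file spelling of the agent's --node-ip flag.
# Refuses to write a second node-ip key rather than editing YAML in place.
set_node_ip() {
  mkdir -p "$(dirname "$CONFIG_FILE")" || return 1
  if [ -f "$CONFIG_FILE" ] && grep -q '^node-ip:' "$CONFIG_FILE"; then
    echo "node-ip already set in $CONFIG_FILE; edit it by hand" >&2
    return 1
  fi
  printf 'node-ip: %s\n' "$1" >> "$CONFIG_FILE"
}

# Example, as root on kai-desktop-tower-wsl:
#   set_node_ip "$(tailscale ip -4)" && systemctl restart k3s-agent
# Afterwards, from kai-server, INTERNAL-IP should read 100.107.172.77:
#   coily ops kubectl get node kai-desktop-tower-wsl -o wide
```

Appending to the config file keeps any existing agent settings (server URL, token, labels) intact; the refusal on an existing node-ip key avoids silently producing a duplicate YAML key.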

Acceptance

  • coily ops kubectl logs -n openclaw deploy/openclaw from kai-server returns logs.
  • coily ops kubectl port-forward -n openclaw svc/openclaw 18789:18789 from kai-server stays up long enough for a curl.
  • Document the fix in docs/k3s-deploy-notes.md under the existing MagicDNS / tailnet-IP traps section.
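The two kai-server checks can be scripted roughly as follows. This is a sketch: it assumes curl is available on kai-server and that the openclaw gateway answers plain HTTP on 18789, where any status code (even a 404) counts as the forward staying up. The function name verify_from_kai_server and the KUBECTL override are hypothetical.

```shell
#!/bin/sh
# Sketch of the two kai-server acceptance checks. KUBECTL defaults to the
# `coily ops kubectl` wrapper used in this issue (left unquoted below so it
# word-splits into a command plus arguments).
KUBECTL="${KUBECTL:-coily ops kubectl}"

verify_from_kai_server() {
  # 1. logs: currently fails with the 502 proxy error from the apiserver
  $KUBECTL logs -n openclaw deploy/openclaw --tail=30 || return 1

  # 2. port-forward: start it in the background, then curl through it
  $KUBECTL port-forward -n openclaw svc/openclaw 18789:18789 &
  pf=$!
  # --retry-connrefused covers the moment before the forward is listening;
  # without -f, curl exits 0 for any HTTP response
  curl -sS --retry 5 --retry-delay 1 --retry-connrefused \
    -o /dev/null -w 'HTTP %{http_code}\n' http://127.0.0.1:18789/
  rc=$?
  kill "$pf" 2>/dev/null
  return "$rc"
}
```

Running verify_from_kai_server before and after the node-ip change gives a like-for-like comparison of the same two commands.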

Provenance

Surfaced while applying deploy/openclaw.yml for VHGC. Comms #38 on the channel has the full apply context. The deploy itself is up; only the apiserver-mediated verification path is broken.

coilysiren added P3 and removed P2 labels 2026-05-31 07:00:38 +00:00
No milestone
No project
No assignees
1 participant

No due date set.

Dependencies

No dependencies set.

Reference
coilyco-flight-deck/infrastructure#147