Stretch Kubernetes cluster with K3S and Netbird
- tags
- #Kubernetes #Wireguard #Netbird #Netmaker #Tailscale #K3s #Raspberry #Pi #Rpi
- published
- reading time
- 7 minutes
So I want to have my self-hosted Kubernetes cluster. I want to have more than one node with as little budget as possible. I want to have nodes in different geographical locations. I don’t want to expose my nodes to the Big Scary Internet if possible.
These requirements sound reasonable (especially when you have some spare Raspberry Pis that can be put in different apartments). The described cluster architecture is known as a stretched Kubernetes cluster. Let’s build it!
Decision process
Network
Our nodes, spread across different locations, need to talk to each other. We have a couple of options to achieve that:
- WireGuard seems to be a natural choice; there are a few caveats though (see the sketch at the end of this section):
  - each node will need to have a public IP
  - they would have to expose at least one UDP port
  - adding new nodes comes with some overhead, because public keys would need to be shared between all nodes
- NetBird is an open source overlay network that uses WireGuard under the hood. It has many nice features, but the one we care about most here is the ability to automatically create point-to-point WireGuard tunnels between the nodes
- Tailscale is probably the most popular WireGuard VPN nowadays. Some of its components are open source. Feature-wise it’s very similar to NetBird.
Let’s go with NetBird. Tailscale has pretty good support and various use cases are well documented; that’s not yet the case with NetBird, which is the one I happen to use. Fun fact: NetBird uses kernel WireGuard, while Tailscale uses userspace WireGuard.
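To illustrate the overhead of plain WireGuard mentioned above: every time a new node joins, its public key and endpoint have to be distributed to every existing peer (and vice versa). A minimal sketch of that manual step – interface name, VPN IP and hostname below are just placeholders:
wg genkey | tee privatekey | wg pubkey > publickey
# ...and then on EVERY existing node:
sudo wg set wg0 peer <NEW_NODE_PUBLIC_KEY> allowed-ips 10.10.0.5/32 endpoint new-node.example.com:51820
NetBird does this key and endpoint exchange for us automatically.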
Kubernetes distro
We want something lightweight, and because of the limited budget we want to self-host all components. On top of that it must support the ARM architecture – we have some spare Raspberry Pis, remember?
Let’s go with K3S – it seems to be more popular nowadays.
Entrypoint
Because we don’t want to expose too much to the internet, let’s have a public-facing server hosted on some cloud provider. There are a couple of budget options to choose from.
Let’s go with Hetzner – an ARM instance with 2 vCPUs, 4 GB of RAM and an IPv4 address currently costs $4.59 monthly for a European location. Not bad! We also get 40 GB of NVMe SSD and 20 TB of outgoing traffic.
Implementation
OK – for simplicity let’s cover a scenario where we have one VPS on Hetzner and one Raspberry Pi. We can SSH to both of them and run commands as root. Both of our machines run Debian or a Debian-derived distro (e.g. Ubuntu). On top of that we have an account on https://app.netbird.io/
Please make sure that your Hetzner machine is behind a firewall and only the necessary ports are open. Exposing the Kubernetes control plane to the Big Scary Internet is a no-no.
On the Hetzner firewall I recommend opening ICMP (for pings) and 51820/udp. The latter is not required in order to connect via SSH, but I discovered that when it’s not open:
- I see this in /var/log/netbird/client.log on my Raspberry Pi:
client/internal/peer/conn.go:533: send offer to peer
client/internal/peer/conn.go:261: OnRemoteAnswer, status ICE: Disconnected, status relay: Connected
client/internal/peer/handshaker.go:91: received connection confirmation, running version 0.38.2 and with remote WireGuard listen port 51820
client/internal/peer/handshaker.go:79: wait for remote offer confirmation
client/internal/peer/conn.go:533: send offer to peer
client/internal/peer/conn.go:261: OnRemoteAnswer, status ICE: Disconnected, status relay: Connected
client/internal/peer/handshaker.go:91: received connection confirmation, running version 0.38.2 and with remote WireGuard listen port 51820
client/internal/peer/handshaker.go:79: wait for remote offer confirmation
client/internal/peer/conn.go:533: send offer to peer
- which I correlated with ugly K3S behavior that was exhausting CPU resources on Hetzner and producing these logs in sudo journalctl -xeu k3s-agent.service on the Raspberry Pi:
W0321 20:18:13.728945 10565 transport.go:356] Unable to cancel request for *otelhttp.Transport
E0321 20:18:13.729070 10565 controller.go:195] "Failed to update lease" err="Put \"https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/debian?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
W0321 20:18:23.729523 10565 transport.go:356] Unable to cancel request for *otelhttp.Transport
E0321 20:18:23.729601 10565 controller.go:195] "Failed to update lease" err="Put \"https://127.0.0.1:6444/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/debian?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
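If you manage the Hetzner firewall with the hcloud CLI, the rules could look roughly like this (a sketch, assuming the CLI is installed and configured; the firewall name is made up):
# SSH, ping and WireGuard – everything else stays closed
hcloud firewall create --name k3s-fw
hcloud firewall add-rule k3s-fw --direction in --protocol tcp --port 22 --source-ips 0.0.0.0/0 --source-ips ::/0
hcloud firewall add-rule k3s-fw --direction in --protocol icmp --source-ips 0.0.0.0/0 --source-ips ::/0
hcloud firewall add-rule k3s-fw --direction in --protocol udp --port 51820 --source-ips 0.0.0.0/0 --source-ips ::/0
hcloud firewall apply-to-resource k3s-fw --type server --server htz-euc-fsn1-bastion-1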
Setup NetBird
Let’s generate NetBird setup keys – we will need them to join our VPN mesh. For more info check https://docs.netbird.io/how-to/register-machines-using-setup-keys
OK, let’s install and set up NetBird on both machines:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg -y
curl -sSL https://pkgs.netbird.io/debian/public.key | sudo gpg --dearmor --output /usr/share/keyrings/netbird-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/netbird-archive-keyring.gpg] https://pkgs.netbird.io/debian stable main' | sudo tee /etc/apt/sources.list.d/netbird.list
sudo apt-get update
sudo apt-get install netbird
sudo netbird up --setup-key <SETUP KEY>
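After netbird up succeeds on a machine, we can check that it joined the mesh and note its VPN IP address – we will need those IPs in a moment:
sudo netbird status
# the NetBird interface (wt0 by default) carries the VPN IP of this machine
ip -4 addr show wt0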
Setup K3S
Time for our Kubernetes cluster. There is a dedicated doc for that, but I had a hard time setting up a working cluster with it. Luckily this blogpost by Alex Feiszli turned out to work with NetBird as well, after some slight modifications.
Let’s install the K3S server on our Hetzner machine first:
NETBIRD_IFACE=wt0
NETBIRD_SERVER_IP=100.123.177.254
NETBIRD_CIDR=100.123.0.0/16
DEFAULT_NO_PROXY='127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16'
curl -sfL https://get.k3s.io |
NO_PROXY="127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,${NETBIRD_CIDR}" \
INSTALL_K3S_EXEC="server --node-ip ${NETBIRD_SERVER_IP} --node-external-ip ${NETBIRD_SERVER_IP} --flannel-iface ${NETBIRD_IFACE} --disable=traefik" \
sh -
where:
- NETBIRD_IFACE is taken from ip a
- NETBIRD_SERVER_IP is the VPN IP address of our Hetzner server
- NETBIRD_CIDR is a “catch all” for every possible VPN IP
- DEFAULT_NO_PROXY is the value from https://docs.k3s.io/advanced#configuring-an-http-proxy – it turns out that NO_PROXY needs to be extended with the VPN CIDR, otherwise pods won’t be able to reach pods on a different node
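As far as I can tell, the install script picks NO_PROXY up from the shell and writes it into the systemd environment file of the service, so it’s worth double-checking if pod-to-pod traffic misbehaves later:
cat /etc/systemd/system/k3s.service.env
# if you edit that file afterwards, restart the service
sudo systemctl restart k3s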
Let’s check the result of sudo kubectl get nodes – the status should be “Ready”:
NAME STATUS ROLES AGE VERSION
htz-euc-fsn1-bastion-1 Ready control-plane,master 122m v1.31.6+k3s1
We can also check if all pods are running with sudo kubectl get pods --all-namespaces
Let’s now get a Node Token that will be needed to add more nodes into our cluster:
sudo cat /var/lib/rancher/k3s/server/node-token
OK, let’s join our Raspberry Pi to the cluster. We will need to set up the K3S agent:
SERVER_NODE_TOKEN=...
NETBIRD_IFACE=wt0
NETBIRD_CIDR=100.123.0.0/16
DEFAULT_NO_PROXY='127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16'
NETBIRD_SERVER_IP=100.123.177.254
NETBIRD_AGENT_IP=100.123.205.52
curl -sfL https://get.k3s.io |
NO_PROXY="${DEFAULT_NO_PROXY},${NETBIRD_CIDR}" \
INSTALL_K3S_EXEC="agent --server https://${NETBIRD_SERVER_IP}:6443 --token ${SERVER_NODE_TOKEN} --node-ip ${NETBIRD_AGENT_IP} --node-external-ip ${NETBIRD_AGENT_IP} --flannel-iface ${NETBIRD_IFACE}" \
sh -
where:
- SERVER_NODE_TOKEN is the value from /var/lib/rancher/k3s/server/node-token that we read on the Hetzner machine
- NETBIRD_AGENT_IP is the VPN IP address of our Raspberry Pi
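On the Raspberry Pi we can watch the agent come up and register itself before switching back to the server:
sudo systemctl status k3s-agent.service
sudo journalctl -u k3s-agent.service -f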
Let’s get back to our Hetzner machine and run sudo kubectl get nodes. We should see two nodes now:
NAME STATUS ROLES AGE VERSION
htz-euc-fsn1-bastion-1 Ready control-plane,master 132m v1.31.6+k3s1
local-rpi Ready <none> 129m v1.31.6+k3s1
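Since the nodes sit in different physical locations, it may also be worth tagging them with the standard topology labels (the zone names below are just examples) – workloads can later use them for spreading or pinning:
sudo kubectl label node htz-euc-fsn1-bastion-1 topology.kubernetes.io/zone=hetzner-fsn1
sudo kubectl label node local-rpi topology.kubernetes.io/zone=home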
Test
Now we need to make sure there are no networking issues between the nodes. I will reuse the steps from the previously mentioned blogpost by Alex, but another valuable test is to use Echo-Server.
So on our Hetzner machine let’s run:
echo '
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pingtest
  namespace: pingtest
spec:
  selector:
    matchLabels:
      app: pingtest
  replicas: 2
  template:
    metadata:
      labels:
        app: pingtest
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - pingtest
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sh", "-ec", "sleep 1000"]
' > pingtest.yaml
sudo kubectl create namespace pingtest
sudo kubectl apply -f pingtest.yaml
Once we see that pods are deployed with sudo kubectl get pods -n pingtest -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pingtest-c867f8fcd-prm87 1/1 Running 0 15s 10.42.1.2 local-rpi <none> <none>
pingtest-c867f8fcd-vlnh2 1/1 Running 0 15s 10.42.0.5 htz-euc-fsn1-bastion-1 <none> <none>
we can check that there are no connectivity issues between them:
sudo kubectl exec -ti pingtest-c867f8fcd-prm87 -n pingtest -- ping -c 3 10.42.1.2
sudo kubectl exec -ti pingtest-c867f8fcd-prm87 -n pingtest -- ping -c 3 10.42.0.5
sudo kubectl exec -ti pingtest-c867f8fcd-vlnh2 -n pingtest -- ping -c 3 10.42.1.2
sudo kubectl exec -ti pingtest-c867f8fcd-vlnh2 -n pingtest -- ping -c 3 10.42.0.5
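If all four pings succeed, pod-to-pod traffic works both within a node and across the VPN. We can clean up the test resources afterwards:
sudo kubectl delete namespace pingtest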
Next steps
You can now host some services on your own Kubernetes cluster. If you’re interested in a simple setup I really recommend https://paulbutler.org/2024/the-haters-guide-to-kubernetes/ – there is no need to overcomplicate things but, as usual, YMMV.
I think running Caddy outside of the cluster (for certificate automation and ingress) works perfectly.
If you feel adventurous you can make your K3S cluster HA.
For further reading I also recommend https://rpi4cluster.com/
Summary
We’ve created our own, self-hosted Kubernetes cluster. We’ve spread the nodes between cloud and on-prem locations. We’ve leveraged a WireGuard-based VPN for connectivity and increased security.