Kubernetes the self-hosted single node way
For more than ten years, I have run a self-hosted server providing common services including mail, DNS and HTTP.
One of the first architectures of the server was amazing because it was a FreeBSD server with jails on top of ZFS. It was the same idea as cgroups, Docker and overlayfs before they even existed!
This setup was very fun to maintain, but I had less and less time for it, so in 2012 I switched the server to Debian GNU/Linux, first with LXC and then with qemu/kvm.
Having virtual servers in KVM is fine from a security point of view, but it also uses a lot of memory: my 16 GB were almost fully used and I ended up thinking “on which existing VM will I put this new service?”, which is bad…
Also, the configurations of the different servers were very complex, even when managed by SaltStack, because the mix of Jinja and YAML was hard to read and to write. I was wasting time trying to understand setups I had done several years before.
On the other hand, I became more and more familiar with Docker, so I decided to give Kubernetes a try. To be clear, Kubernetes requires solid Docker and automation skills and has a pretty steep learning curve of its own. That being said, the gains are:
- Configuration is pure YAML, centralized in Kubernetes manifests stored in a git repository.
- My Dockerfiles are quite generic and simple to maintain.
- I rebuild images to get security updates and re-deploy every day.
- I have a simple and common way to get logs, inspect configuration and spawn a shell in a container (a few example commands follow this list).
- It's very easy for me to rebuild a new cluster in case of moving to a new physical server or for disaster recovery.
Multi-node Kubernetes clusters require a reliable distributed storage system, which is quite complex for self-hosting. Since I only have one physical server, I decided to install a single-node cluster with local storage in a VM.
Installing
There’s a tons of way for installing kubernetes, including dedicated installers like kubespray or kops targeting various usages and cloud providers.
For my single-node cluster, I decided to go with kubeadm.
I started an 8 GB Debian stretch VM with no swap, set GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1" in /etc/default/grub, then ran update-grub && reboot.
Then I installed Docker and Kubernetes from the official repositories:
$ sudo apt-get update && sudo apt-get -y install apt-transport-https
$ wget https://download.docker.com/linux/debian/gpg -O - | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
$ echo "deb [arch=amd64] https://download.docker.com/linux/debian stretch stable" | sudo tee /etc/apt/sources.list.d/docker.list
$ wget https://packages.cloud.google.com/apt/doc/apt-key.gpg -O - | sudo apt-key --keyring /etc/apt/trusted.gpg.d/kubernetes.gpg add -
$ echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update && sudo apt-get -y install docker-ce kubectl kubelet kubeadm kubernetes-cni
# It's important to not upgrade these automatically: upgrading kubernetes
# requires some additional commands, so hold the packages.
$ sudo apt-mark hold kubectl kubelet kubeadm
Then I bootstrapped the cluster with kubeadm:
sudo kubeadm init --service-dns-domain k.in.philpep.org --pod-network-cidr 10.42.0.0/16 --service-cidr 10.96.0.0/12
Here each of my pods will have a domain name <pod name>.<namespace>.k.in.philpep.org inside the 10.42.0.0/16 address range. The service CIDR is used by Service resources. It's important to use CIDRs that are not already used in your private networks, otherwise those networks will be unreachable from the cluster…
After kubeadm init, you can control your cluster with kubectl:
$ mkdir ~/.kube
$ sudo cat /etc/kubernetes/admin.conf | cat > ~/.kube/config
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k2 NotReady master 4m17s v1.14.1
At this point, it's important to know that kubelet is a systemd service which bootstraps the cluster using static manifests from /etc/kubernetes/manifests/ and whose config is in /var/lib/kubelet/config.yaml; checking the kubelet logs with journalctl might help.
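For instance, on a stock kubeadm install you can inspect it like this (you should see the four control-plane static manifests):
$ systemctl status kubelet
$ journalctl -u kubelet -f
$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml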
Since we’re single node, your master node will also be a worker node, you
will have to mark your node as schedulable for pods by running kubectl taint nodes --all node-role.kubernetes.io/master-
Then you have to install a network plugin, which handles networking and policies in your cluster. I used Calico, but I have heard that Weave Net might be simpler.
$ wget https://docs.projectcalico.org/v3.7/manifests/calico.yaml
# Then I modified the manifest with:
# CALICO_IPV4POOL_CIDR set to "10.42.0.0/16"
# CALICO_IPV4POOL_IPIP set to "Never"
# veth_mtu set to "1500"
$ kubectl apply -f calico.yaml
$ watch -n 5 'kubectl get pods --all-namespaces'
Wait for all pods to be running, and your node should then be marked as “Ready”:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k2 Ready master 32m v1.14.1
Installing an ingress controller and cert-manager
Ingress is an awesome Kubernetes resource that provides automatic configuration of an HTTP/HTTPS frontend on top of your Services.
I used ingress-nginx, but I have heard that Traefik is also a good choice here.
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/baremetal/service-nodeport.yaml
$ kubectl -n ingress-nginx get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx NodePort 10.111.205.201 <none> 80:32231/TCP,443:31334/TCP 37s
Here TCP node port 32231 is forwarded to port 80 of the ingress controller inside Kubernetes.
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k2 Ready master 41m v1.14.1 10.0.2.15 <none> Debian stretch 4.9.0-9-amd64 docker://18.9.5
$ curl -I http://10.0.2.15:32231
HTTP/1.1 404 Not Found
Server: nginx/1.15.10
Date: Sun, 05 May 2019 17:26:57 GMT
Content-Type: text/html
Content-Length: 154
Connection: keep-alive
So for my server, I just added a rule on the VM host forwarding the HTTP port to the VM on port 32231 and the HTTPS port to the VM on port 31334.
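As a rough sketch, and assuming a bridged or routed VM with an external interface named eth0 (adjust to your own setup), this can be done with two DNAT rules:
$ sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 10.0.2.15:32231
$ sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 10.0.2.15:31334
# depending on your firewall policy, matching FORWARD accept rules may be needed too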
cert-manager is also an awesome project, providing Let's Encrypt certificates automatically just by adding some annotations to the Ingress resource. I won't go deep into the configuration, but just know that everything you can do with Let's Encrypt is handled here: HTTP or DNS ACME challenges, automatic renewal, etc.
For instance, you can create a “Certificate” resource requesting a wildcard certificate with DNS validation, then configure ingress-nginx to use this certificate with --default-ssl-certificate=<namespace>/<secret>. Then you just have to declare the TLS hosts in the Ingress resource and TLS is available immediately \o/
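To illustrate, an Ingress relying on that default certificate could look roughly like this (hostname and service name are hypothetical, and the apiVersion matches a 1.14-era cluster; since no secretName is given under tls, ingress-nginx falls back to the certificate passed via --default-ssl-certificate):
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: blog
spec:
  tls:
  - hosts:
    - blog.example.org
  rules:
  - host: blog.example.org
    http:
      paths:
      - backend:
          serviceName: blog
          servicePort: 80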
Persistent storage
Here I just use a single local PersistentVolume and PersistentVolumeClaim, and mount each service's directory in its pod using subPath. A local StorageClass that runs a mkdir in some data directory upon PersistentVolumeClaim creation would probably be better.
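As a minimal sketch of the single-volume approach (the /data path, the 100Gi size and the resource names are illustrative; only the node name k2 comes from my setup):
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-data
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local
  local:
    path: /data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local
  resources:
    requests:
      storage: 100Gi
Each pod then references claimName: local-data in its volumes and mounts its own directory with a distinct subPath.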
Logs
I tried to use the common EFK stack (Elasticsearch, Fluentd and Kibana), but it turned out the logging system actually used more resources (disk, CPU and RAM) than all my services… I think having a reliable logging infrastructure with EFK requires strong configuration skills in both Fluentd and Elasticsearch indexes, or just bigger servers.
Since I already have an rsyslog server, I just wrote some configuration to read logs from /var/log/containers/, extract the namespace and pod from the filename and send the logs to the server. Here's my /etc/rsyslog.d/kubernetes.conf:
$MaxMessageSize 16k
module(load="imfile" mode="inotify")
module(load="mmjsonparse")
input(type="imfile" file="/var/log/containers/*.log"
      tag="kubernetes" addmetadata="on" reopenOnTruncate="on" ruleset="remoteLog")
template(name="kubernetes" type="list") {
    constant(value="k2 kubernetes ")
    property(name="!k8s_namespace")
    constant(value="/")
    property(name="!k8s_pod")
    constant(value=" ")
    property(name="!log" droplastlf="on")
    constant(value="\n")
}
ruleset(name="remoteLog") {
    if $msg startswith "{" then {
        action(type="mmjsonparse" cookie="")
    }
    # filenames look like /var/log/containers/<pod>_<namespace>_<container>-<id>.log
    set $!k8s = field($!metadata!filename, "/", 5);
    set $!k8s_namespace = field($!k8s, "_", 2);
    set $!k8s_pod = field($!k8s, "_", 1);
    action(type="omfwd" target="192.168.62.54" port="514" protocol="udp" template="kubernetes")
    stop
}
Network policies
I usually want to limit what network access an attacker gains by executing code through a service with a security bug. For this, I think it's simpler to limit outgoing access from the process rather than limiting incoming traffic on all other services.
To do this with Kubernetes, I use NetworkPolicy resources.
For example:
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-dns
spec:
  podSelector:
    matchLabels:
      egress-dns: "true"
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    - podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-https
spec:
  podSelector:
    matchLabels:
      egress-https: "true"
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 443
Here I define a policy egress-dns which only allows DNS resolution against the cluster's DNS server, and a policy egress-https which allows outgoing HTTPS traffic to the internet. Then I just have to add a label egress-dns and/or egress-https set to "true" on each pod depending on what access it needs, since network policies only take effect on a pod once at least one policy selects it.
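For instance, a Deployment that needs both DNS resolution and outgoing HTTPS would carry the two labels in its pod template (a hypothetical sketch, names and image are made up):
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-service
spec:
  selector:
    matchLabels:
      app: some-service
  template:
    metadata:
      labels:
        app: some-service
        egress-dns: "true"
        egress-https: "true"
    spec:
      containers:
      - name: some-service
        image: registry.example.org/some-service:latest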
Upgrades
Handling upgrades with kubeadm is quite simple as long as you read the upgrade notes carefully. For the latest releases I just ran:
$ export v=1.14.1
$ apt-get update
$ apt-get install kubeadm=$v-00 kubelet=$v-00
$ kubeadm upgrade plan
[...]
$ kubeadm upgrade apply v$v
And it worked just fine.
But keep in mind that Kubernetes releases often, and so do ingress-nginx, Calico and cert-manager. You will have to update them as well.
Conclusion
I’m quite happy with my kubernetes migration, adding new services, accessing logs and debugging is more simple than before and I have a better control of what’s running on my servers.
But this still requires a quite complex stack, I have my own docker registry, images are built by jenkins and I rebuild them upon security upgrades of the underlying OS distro, I explain this in my previous post.
I also wrote my own tool to deploy latest docker image builds, this is called Imago.
So, in definitive, kubernetes is complex, but running servers is complex too, and I think kubernetes offer a proper solution to manage servers, services and processes.