For more than ten years, I have been running a self-hosted server providing common services, including mail, DNS and HTTP.

FreeBSD jails

One of the first architectures of the server was amazing: it was a FreeBSD server with jails on top of zfs. It was the same idea as cgroups, docker and overlayfs, before they even existed!

This setup was very fun to maintain, but I had less and less time for it, so in 2012 I switched the server to Debian GNU/Linux, first with LXC and then with qemu/kvm.

Running virtual servers in kvm is fine from a security point of view, but it also uses a lot of memory. My 16GB of memory were almost fully used, and I ended up asking myself "on which existing VM should I put this new service?", which is bad...

Also, the configuration of the different servers was very complex, even when managed by saltstack, because the mix of jinja and yaml was hard to read and to write. I was wasting time trying to understand a setup I had built several years earlier.

Kubernetes

Meanwhile, I became more and more familiar with docker, so I decided to give kubernetes a try. To be clear, kubernetes requires strong docker and automation skills and has a pretty steep learning curve of its own. That being said, the gains are:

  • Configuration is pure yaml, centralized in kubernetes manifests stored in a git repository.
  • My dockerfiles are quite generic and simple to maintain.
  • I rebuild images to get security updates and redeploy every day.
  • I have a simple and common way to get logs, inspect configuration and spawn a shell in a container.
  • It's very easy for me to rebuild a new cluster when moving to another physical server or for disaster recovery.

Multi-node kubernetes clusters require a reliable distributed storage system, which is quite complex to self-host. Since I only have one physical server, I decided to install a single-node cluster with local storage in a VM.

Installing

There are tons of ways to install kubernetes, including dedicated installers like kubespray or kops that target various use cases and cloud providers.

For my single-node cluster, I decided to go with kubeadm.

I started an 8GB debian stretch VM with no swap (by default, kubelet refuses to start when swap is enabled), set GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1" in /etc/default/grub, then ran update-grub && reboot.
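If you provision such VMs often, you can script the GRUB edit. Here is a minimal sketch that assumes a Debian-style /etc/default/grub where GRUB_CMDLINE_LINUX is a single-line assignment. set_grub_cmdline is a hypothetical helper. It takes the file path as an argument, so you can try it on a copy first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: set GRUB_CMDLINE_LINUX in a grub defaults file.
# Assumes a single-line assignment, as in a stock Debian install.
set -euo pipefail

set_grub_cmdline() {
  local file="$1" args="$2"
  if grep -q '^GRUB_CMDLINE_LINUX=' "$file"; then
    # Anchored on "GRUB_CMDLINE_LINUX=", so GRUB_CMDLINE_LINUX_DEFAULT is left alone.
    sed -i "s|^GRUB_CMDLINE_LINUX=.*|GRUB_CMDLINE_LINUX=\"${args}\"|" "$file"
  else
    # No existing assignment: append one.
    printf 'GRUB_CMDLINE_LINUX="%s"\n' "$args" >> "$file"
  fi
}

# Usage, as root on the VM:
#   set_grub_cmdline /etc/default/grub "cgroup_enable=memory swapaccount=1"
#   update-grub && reboot
# After the reboot, /proc/cmdline should contain swapaccount=1
# and "swapon --show" should print nothing.
```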

Then I installed docker and kubernetes from official repositories:

$ sudo apt-get update && sudo apt-get -y install apt-transport-https
$ wget https://download.docker.com/linux/debian/gpg -O - | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
$ echo "deb [arch=amd64] https://download.docker.com/linux/debian stretch stable" | sudo tee /etc/apt/sources.list.d/docker.list
$ wget https://packages.cloud.google.com/apt/doc/apt-key.gpg -O - | sudo apt-key --keyring /etc/apt/trusted.gpg.d/kubernetes.gpg add -
$ echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update && sudo apt-get -y install docker-ce kubectl kubelet kubeadm kubernetes-cni
# It's important not to upgrade these automatically: upgrading kubernetes
# requires some additional commands, so hold the packages.
$ sudo apt-mark hold kubectl kubelet kubeadm

Then I bootstrapped the cluster with kubeadm:

$ sudo kubeadm init --service-dns-domain k.in.philpep.org --pod-network-cidr 10.42.0.0/16 --service-cidr 10.96.0.0/12

Here, Services get DNS names of the form <service>.<namespace>.svc.k.in.philpep.org. Pods get their IP addresses from the 10.42.0.0/16 range, and Service resources get their virtual IPs from the 10.96.0.0/12 service CIDR.

It's important to pick CIDRs that don't overlap any of your existing private networks. Otherwise, those networks will be unreachable from the cluster...
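Overlaps are easy to miss by eye, especially with short prefixes like /12. Below is a small pure-bash check; ip_to_int and cidr_overlap are hypothetical helper names. It relies on the fact that two CIDR blocks overlap exactly when their network addresses agree on the bits of the shorter prefix:

```bash
#!/usr/bin/env bash
# Sketch: check whether two IPv4 CIDR ranges overlap before choosing
# --pod-network-cidr / --service-cidr. Pure bash, no external tools.
set -euo pipefail

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_overlap NET1/LEN1 NET2/LEN2 -> exit status 0 if the ranges share an address
cidr_overlap() {
  local n1="${1%/*}" l1="${1#*/}" n2="${2%/*}" l2="${2#*/}"
  # Compare both network addresses on the bits of the shorter prefix.
  local len=$(( l1 < l2 ? l1 : l2 ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$n1") & mask ) == ( $(ip_to_int "$n2") & mask ) ))
}
```

For instance, cidr_overlap 10.96.0.0/12 10.100.0.0/16 succeeds, because 10.96.0.0/12 spans 10.96.0.0 to 10.111.255.255. By contrast, 10.42.0.0/16 and 10.96.0.0/12 don't overlap. Check each LAN or VPN range against both the pod CIDR and the service CIDR.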

After kubeadm init you can control your cluster with kubectl:

$ mkdir -p ~/.kube
$ sudo cat /etc/kubernetes/admin.conf > ~/.kube/config
$ kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
k2     NotReady   master   4m17s   v1.14.1

At this point, it's important to know that kubelet is a systemd service. It bootstraps the control plane from the static manifests in /etc/kubernetes/manifests/, and its configuration is in /var/lib/kubelet/config.yaml. Checking the kubelet logs with journalctl -u kubelet -f might help.

Since the cluster has a single node, the master node must also act as a worker. To let regular pods be scheduled on it, remove the master taint by running kubectl taint nodes --all node-role.kubernetes.io/master-

Then you have to install a network plugin, which handles pod networking and network policies in your cluster. The node stays NotReady until one is installed.

I used calico, but I've heard that weavenet might be simpler.

$ wget https://docs.projectcalico.org/v3.7/manifests/calico.yaml
# Then I modified the manifest with:
# CALICO_IPV4POOL_CIDR set to "10.42.0.0/16"
# CALICO_IPV4POOL_IPIP set to "Never"
# veth_mtu set to "1500"
$ kubectl apply -f calico.yaml
$ watch -n 5 'kubectl get pods --all-namespaces'

Wait until all pods are running. Your node should then be marked as "Ready":

$ kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
k2     Ready      master   32m     v1.14.1
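You can also script the three calico.yaml edits from the comments of the calico step. This sketch assumes GNU sed and the layout of the v3.7 manifest, where each environment variable is a "- name:" line directly followed by its "value:" line, and veth_mtu is a one-line ConfigMap key. Newer manifests may ship CALICO_IPV4POOL_CIDR commented out, so check the result before applying. patch_calico is a hypothetical name:

```bash
#!/usr/bin/env bash
# Sketch: apply the calico.yaml tweaks with GNU sed.
# Assumes each env var is a "- name: X" line immediately followed by its
# "value: ..." line, and that veth_mtu is a one-line ConfigMap entry.
set -euo pipefail

patch_calico() {
  local manifest="$1" pod_cidr="$2"
  # "n" moves to the line after the matching name, where the value is rewritten.
  sed -i \
    -e "/name: CALICO_IPV4POOL_CIDR/{n;s|value: \".*\"|value: \"${pod_cidr}\"|;}" \
    -e '/name: CALICO_IPV4POOL_IPIP/{n;s|value: ".*"|value: "Never"|;}' \
    -e 's|veth_mtu: ".*"|veth_mtu: "1500"|' \
    "$manifest"
}

# Usage:
#   patch_calico calico.yaml 10.42.0.0/16
#   grep -A1 -e CALICO_IPV4POOL_CIDR -e CALICO_IPV4POOL_IPIP calico.yaml
#   kubectl apply -f calico.yaml
```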

Installing an ingress controller and cert-manager

Ingress is an awesome kubernetes resource that automatically configures an HTTP/HTTPS frontend on top of your Services.

I used ingress-nginx, but I have heard that traefik is also a good choice here.

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/baremetal/service-nodeport.yaml
$ kubectl -n ingress-nginx get svc
NAME            TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   NodePort   10.111.205.201   <none>        80:32231/TCP,443:31334/TCP   37s

Here, TCP node port 32231 is forwarded to port 80 of our ingress controller inside kubernetes, and node port 31334 to port 443.

$ kubectl get nodes -o wide
NAME   STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
k2     Ready    master   41m   v1.14.1   10.0.2.15     <none>        Debian stretch   4.9.0-9-amd64    docker://18.9.5
$ curl -I http://10.0.2.15:32231
HTTP/1.1 404 Not Found
Server: nginx/1.15.10
Date: Sun, 05 May 2019 17:26:57 GMT
Content-Type: text/html
Content-Length: 154
Connection: keep-alive

So on my server, I just added rules on the VM host that forward the HTTP port to the VM on port 32231 and the HTTPS port to the VM on port 31334.
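The post doesn't say how these host rules are implemented. On a Linux host, one common option is an iptables DNAT rule per port, and the sketch below assumes that setup. The hypothetical helpers nodeport and dnat_rules read the PORT(S) column printed by kubectl -n ingress-nginx get svc. They only print the rules, so you can review them first. The sketch also assumes IP forwarding is enabled on the host and the FORWARD chain accepts the traffic. In practice, you would also restrict the rules to the external interface with -i.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: turn the PORT(S) column of the ingress-nginx NodePort
# Service (e.g. "80:32231/TCP,443:31334/TCP") into host-side DNAT rules.
# The ports can also be read with:
#   kubectl -n ingress-nginx get svc ingress-nginx -o jsonpath='{.spec.ports[*].nodePort}'
set -euo pipefail

# nodeport PORTS SERVICE_PORT -> prints the node port mapped to SERVICE_PORT
nodeport() {
  local ports="$1" want="$2" entry
  local IFS=','
  for entry in $ports; do
    # entry looks like "80:32231/TCP"
    if [[ "${entry%%:*}" == "$want" ]]; then
      local np="${entry#*:}"
      echo "${np%%/*}"
      return 0
    fi
  done
  return 1
}

# dnat_rules VM_IP PORTS -> prints one PREROUTING DNAT rule per public port
dnat_rules() {
  local vm_ip="$1" ports="$2" port np
  for port in 80 443; do
    np="$(nodeport "$ports" "$port")"
    echo "iptables -t nat -A PREROUTING -p tcp --dport ${port} -j DNAT --to-destination ${vm_ip}:${np}"
  done
}

# Usage: dnat_rules 10.0.2.15 "80:32231/TCP,443:31334/TCP"
```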

cert-manager is another awesome project: it provides letsencrypt certificates automatically when you add an annotation to an Ingress resource. I won't go deep into its configuration, but everything you can do with letsencrypt is handled: HTTP or DNS ACME challenges, automatic renewal, etc.

For instance, you can create a "Certificate" resource requesting a wildcard certificate with DNS validation. You then configure ingress-nginx to use it as its default certificate with --default-ssl-certificate=<namespace>/<secret>. After that, you just have to declare TLS hosts in your Ingress resources, and TLS is available immediately \o/
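As an illustration of that last step, here is an Ingress whose tls section lists its hosts but has no secretName. In that case, ingress-nginx serves the certificate given by --default-ssl-certificate. This sketch uses the current networking.k8s.io/v1 Ingress API rather than the v1beta1 API of the kubernetes 1.14 era. The namespace web, the host blog.example.org, the Service blog and the IngressClass nginx are placeholders:

```sh
# Hypothetical example: expose Service "blog" on blog.example.org over HTTPS
# using the ingress controller's default (wildcard) certificate.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog
  namespace: web
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - blog.example.org   # no secretName: the default certificate is used
  rules:
    - host: blog.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blog
                port:
                  number: 80
EOF
```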

Persistent storage

Here, I just used a single local PersistentVolume and PersistentVolumeClaim, and mounted each service's directory into its pod using subPath.

A local StorageClass that runs a mkdir in some data directory whenever a PersistentVolumeClaim is created would probably be better.
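You can approximate that idea by hand: for each new claim, create a directory under the data directory, plus a PersistentVolume pre-bound to that claim through claimRef. This is a hypothetical helper, not a real StorageClass. A StorageClass needs a provisioner controller watching claims, and Rancher's local-path-provisioner is an existing project built on the same idea. The base directory, the naming scheme and the use of hostPath are assumptions that suit a single-node cluster:

```bash
#!/usr/bin/env bash
# Hypothetical "mkdir provisioner": create a per-claim directory and print a
# PersistentVolume pre-bound to that claim (pipe the output to kubectl apply -f -).
# hostPath is only reasonable on a single-node cluster like this one.
set -euo pipefail

provision_dir() {
  local base="$1" ns="$2" pvc="$3" size="$4"
  local dir="${base}/${ns}/${pvc}"
  mkdir -p "$dir"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${ns}-${pvc}
spec:
  capacity:
    storage: ${size}
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: ${ns}
    name: ${pvc}
  hostPath:
    path: ${dir}
EOF
}

# Usage (on the node): provision_dir /srv/k8s-data web blog-data 5Gi | kubectl apply -f -
```

The matching PersistentVolumeClaim must fit the requested size and access mode. It should also set storageClassName: "" so that no default StorageClass interferes with the binding.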

Logs

I tried the common EFK stack (Elasticsearch, Fluentd and Kibana), but it turned out that the logging system used more resources (disk, CPU and RAM) than all my services combined... I think a reliable logging infrastructure with EFK requires solid configuration skills in both fluentd and elasticsearch index management, or just bigger servers.

Since I already have an rsyslog server, I just wrote some configuration that reads logs from /var/log/containers/, extracts the namespace and pod name from each filename and sends the logs to that server. Here's my /etc/rsyslog.d/kubernetes.conf:

$MaxMessageSize 16k
module(load="imfile" mode="inotify")
module(load="mmjsonparse")
input(type="imfile" file="/var/log/containers/*.log"
      tag="kubernetes" addmetadata="on" reopenOnTruncate="on" ruleset="remoteLog")
template(name="kubernetes" type="list") {
  constant(value="k2 kubernetes ")
  property(name="!k8s_namespace")
  constant(value="/")
  property(name="!k8s_pod")
  constant(value=" ")
  property(name="!log" droplastlf="on")
  constant(value="\n")
}
ruleset(name="remoteLog") {
  if $msg startswith "{" then {
    action(type="mmjsonparse" cookie="")
  }
  set $!k8s = field($!metadata!filename, "/", 5);
  set $!k8s_namespace = field($!k8s, "_", 2);
  set $!k8s_pod = field($!k8s, "_", 1);
  action(type="omfwd" target="192.168.62.54" port="514" protocol="udp" template="kubernetes")
  stop
}
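The field() calls rely on the kubelet naming convention for these files: /var/log/containers/<pod>_<namespace>_<container>-<container id>.log. The fifth "/"-separated field is the file name, and its first and second "_"-separated fields are the pod and the namespace. Here is the same extraction in plain bash, handy for checking which pod a file belongs to; k8s_source is a hypothetical name:

```bash
#!/usr/bin/env bash
# Mirror of the rsyslog field() extraction: print "<namespace>/<pod>" for a
# file in /var/log/containers/ named <pod>_<namespace>_<container>-<id>.log.
set -euo pipefail

k8s_source() {
  local base pod ns rest
  base="${1##*/}"                  # same as field(filename, "/", 5) for these paths
  IFS=_ read -r pod ns rest <<< "$base"
  echo "${ns}/${pod}"
}

# Usage: for f in /var/log/containers/*.log; do k8s_source "$f"; done | sort -u
```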

Network policies

I usually want to limit the network access available to an attacker who manages to execute code through a security bug in a service. For this, I think it's simpler to restrict outgoing traffic from each process than to restrict incoming traffic to every other service.

To do this with kubernetes, I used NetworkPolicy resources.

For example:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-dns
spec:
  podSelector:
    matchLabels:
      egress-dns: "true"
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the same element are ANDed:
        # kube-dns pods in the kube-system namespace (which must carry the
        # label name=kube-system).
        - namespaceSelector:
            matchLabels:
              name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-https
spec:
  podSelector:
    matchLabels:
      egress-https: "true"
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443

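Here I define two policies. egress-dns only allows DNS resolution against the cluster DNS server. egress-https allows outgoing HTTPS traffic to the internet, but not to private address ranges.

Then I just have to add the label egress-dns and/or egress-https, set to "true", to each pod, depending on the access it needs. This works because a pod becomes isolated for egress as soon as at least one NetworkPolicy with the Egress policy type selects it. From then on, only the traffic allowed by the union of the policies selecting it is permitted.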
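One consequence of that rule is that a pod carrying neither label is selected by no policy, so it keeps unrestricted egress. If every pod should start from "no outgoing traffic", you can add a default-deny egress policy in each namespace. NetworkPolicy objects are namespaced, so the two policies above must also exist in every namespace that uses the labels. A sketch, where the namespace web is a placeholder:

```sh
# Select every pod in the namespace for egress, with no egress rule:
# only traffic allowed by another policy (egress-dns, egress-https) remains.
kubectl -n web apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
EOF
```

Namespaces hosting components that need other outgoing traffic, such as ingress-nginx reaching its backends or cert-manager reaching the API server and letsencrypt, would need extra allow rules before such a policy is applied there.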

Upgrades

Handling upgrades with kubeadm is quite simple, as long as you read the upgrade notes carefully. For the latest releases, I just ran:

$ export v=1.14.1
$ apt-get update
$ apt-get install kubeadm=$v-00 kubelet=$v-00
$ kubeadm upgrade plan
[...]
$ kubeadm upgrade apply v$v

And it worked just fine.
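One thing the kubeadm upgrade documentation stresses is that skipping minor versions is not supported: going from 1.12 to 1.14 means upgrading to 1.13 first. You can run a small guard like the hypothetical can_upgrade function below before kubeadm upgrade apply. It only compares version strings and doesn't replace kubeadm upgrade plan:

```bash
#!/usr/bin/env bash
# Hypothetical guard: kubeadm upgrades must not skip a minor version.
set -euo pipefail

# can_upgrade CURRENT TARGET   (versions like "1.13.5" or "v1.14.1")
# Succeeds if TARGET has the same major version and the same or next minor.
can_upgrade() {
  local cmaj cmin tmaj tmin rest
  IFS=. read -r cmaj cmin rest <<< "${1#v}"
  IFS=. read -r tmaj tmin rest <<< "${2#v}"
  [[ "$cmaj" == "$tmaj" ]] && (( tmin == cmin || tmin == cmin + 1 ))
}

# Usage, with $v set as above:
#   current="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
#   can_upgrade "$current" "v$v" || echo "upgrade one minor version at a time" >&2
```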

But keep in mind that kubernetes releases often, and so do ingress-nginx, calico and cert-manager, so you will have to update them as well.

Conclusion

I'm quite happy with my kubernetes migration. Adding new services, accessing logs and debugging are simpler than before, and I have better control over what's running on my servers.

But this still requires a fairly complex stack. I run my own docker registry, images are built by jenkins, and I rebuild them whenever the underlying OS distribution gets security updates, as I explained in my previous post.

I also wrote my own tool, called Imago, to deploy the latest docker image builds.

So, in the end, kubernetes is complex, but running servers is complex too, and I think kubernetes offers a proper solution for managing servers, services and processes.