Taming Kubernetes Complexity: Reusable Manifests with Kustomize

This was originally going to be a heading in my previous post, but I realized that app deployment and configuration deserved a post of its own. Deploying containers into Kubernetes is quite straightforward. Almost every self-hosted app has instructions for deploying either to Kubernetes (whether directly via yaml files or through a tool like Helm) or with a Docker Compose file. If there aren't instructions for Kubernetes, the container definitions in docker-compose.yml can be rewritten as a Pod specification and then wrapped in a Deployment or StatefulSet, or left as a bare Pod. Deploying multiple containers in a consistent and coherent manner adds a layer of complexity, since you want to ensure all your containers use the right annotations, keep their persistent data in the right StorageClass, and define their services and load balancing rules in the same way. ...
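As a rough illustration of the consistency problem Kustomize solves, here is a minimal kustomization.yaml sketch. The resource filenames and the annotation are hypothetical placeholders, not the layout from the post itself:

```yaml
# kustomization.yaml - a minimal sketch; the filenames and annotation
# below are hypothetical, not the post's actual repo layout.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Applied uniformly to every rendered manifest, so the "same
# annotations everywhere" problem is solved in one place.
commonAnnotations:
  homelab.example/owner: jonathan

resources:
  - deployment.yaml
  - service.yaml
  - pvc.yaml
```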

July 28, 2025 · 9 min · 1848 words · Jonathan

Kubernetes Homelab Rescue: Troubleshooting with AI (and the Lessons Learned)

Recently I woke up to a fun set of issues with my homelab. In an effort to make more use of LLMs I turned to Claude for troubleshooting assistance, which did help, but which also reminded me once again of the risks of following AI instructions without enough knowledge to judge which suggestions are risky. The VPS in my Kubernetes cluster, jlpks8888, had powered itself off in the early morning without warning or explanation. Trying to power it back on through the admin console had no effect, so a ticket was opened. Looking at my cluster status to ensure everything had failed over, I found a single failing pod: diun. It had failed over to arthur, but the PVC appeared to be failing to mount, so it kept crashing. Diun being in a failed state was not an issue at that moment; it runs on a schedule, so any missed container version updates would be caught on the next run. The fact that any pod was in a failed state was what caught my eye. The machine had some pending updates, so I ran them and rebooted; I probably would have done so in the next 24-36 hours anyway. Blackstaff, my Kubernetes master, had somehow lost track of the Tailscale DNS server, so it couldn't resolve any of my other machines; sudo systemctl restart tailscaled fixed that. Then arthur finished rebooting and I was confronted with every Longhorn-related pod stuck in CrashLoopBackOff. Time to wake up fully and actually figure out what was going on. ...

July 14, 2025 · 8 min · 1538 words · Jonathan

Homelab Kubernetes Automation: Why I Chose K3s

Now that I've covered how I ensure my homelab stays connected and gets configured, I can finally start talking about application workload orchestration. This will be spread over multiple posts to allow for more depth. Kubernetes offers a convenient, albeit complex, platform for running containerized applications. Deploying a Kubernetes cluster with a cloud provider (Azure, Google, Amazon, etc.) is fairly straightforward, since the provider takes care of configuring the underlying compute, network, and storage. Deploying a full Kubernetes cluster manually can be significantly more complex, since you need to ensure every required component is configured yourself. A team of coworkers tried to do so 5-6 years ago and ran into numerous issues due to unclear documentation and unsupported configurations. Many of the improvements to Kubernetes in the intervening years have made things more straightforward, but it is still more complex than I would like for a homelab. ...

July 7, 2025 · 5 min · 1012 words · Jonathan

Why Programmatic Configuration Matters: From UptimeKuma to Gatus

I've been in the process of upgrading my homelab for the last few months and will be writing a more in-depth series of posts about it in the near future. When I first deployed my new homelab (see my previous post on why I had to redeploy) I went with Uptime Kuma for status monitoring. I quite liked the UI for the statuses themselves, as well as the dashboards I could create. Unfortunately, all the persistent data for these status checks was lost during the rebuild, and when looking for a more programmatic way to re-create the checks I found out that there is no official API for doing so. I could have installed the Unofficial UptimeKuma API Wrapper and used Python to create all the monitor endpoints, but then I would have needed to keep making those updates each time I added a new application. Instead I settled on Gatus by browsing the awesome-status-pages list and then using a combination of Claude Sonnet, Reddit, and Google to determine a path forward. ...
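For flavor, this is roughly what a declarative Gatus check looks like, a minimal sketch with a hypothetical endpoint name and URL rather than my actual config:

```yaml
# Gatus endpoint check - a minimal sketch; the name and URL are
# hypothetical placeholders, not my actual monitored services.
endpoints:
  - name: example-app
    group: homelab
    url: "https://app.example.com/health"
    interval: 60s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 500"
```

Because checks live in a file like this rather than a database, rebuilding the monitoring stack is just a matter of re-applying the config.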

April 27, 2025 · 3 min · 448 words · Jonathan

Homelab Disaster Recovery: When Borg Backups Meet Longhorn Volumes

A few weeks ago my Kubernetes-based homelab suffered a catastrophic failure. Internal routing no longer worked, nor did DNS resolution for in-cluster services. Since I had almost everything defined in yaml files and all my persistent data stored in Longhorn volumes, which were themselves backed up, I felt safe destroying and rebuilding the cluster. Unfortunately, during the rebuild the Longhorn volumes turned out either to be missing (only 9 of roughly 17 were found) or to be orphans that refused to attach to Kubernetes. I was able to find the remaining volumes in a backup and started restoring, but when I tried to mount them individually using a Longhorn Docker container, the contents were corrupt. Working through my Borg backups of the volumes only turned up more corruption, except for my PGDump backups, which luckily contained the vast majority of the data I would hate to lose. ...
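The reason the database dumps survived is that a scheduled logical dump is independent of the volume layer, so block-level corruption doesn't touch it. Below is a minimal sketch of that kind of scheduled pg_dump job; the schedule, image, host, secret, and PVC names are all hypothetical, not my actual setup:

```yaml
# Hypothetical nightly pg_dump CronJob sketch; every name here
# (host, secret, PVC) is a placeholder, not my real configuration.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pgdump-backup
spec:
  schedule: "0 3 * * *"  # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pgdump
              image: postgres:16
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials  # hypothetical secret
                      key: password
              command: ["/bin/sh", "-c"]
              # Logical dump of every database to a date-stamped file
              args:
                - pg_dumpall -h postgres -U postgres > /backups/$(date +%F).sql
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: pgdump-backups  # hypothetical PVC
```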

April 18, 2025 · 5 min · 929 words · Jonathan