Kubernetes Homelab Rescue: Troubleshooting with AI (and the Lessons Learned)

Recently I woke up to a fun set of issues with my homelab. In an effort to make more use of LLMs I turned to Claude for troubleshooting assistance. It did help, but it also reminded me once again of the risk of following AI instructions without the knowledge to tell which suggestions are dangerous. The VPS in my Kubernetes cluster, jlpks8888, had powered itself off in the early morning without any warning or apparent reason. Trying to power it back on through the admin console had no effect, so I opened a ticket. Looking at my cluster status to ensure everything had failed over, I had a single failing pod, diun. It had failed over to Arthur, but it looked as though the PVC was failing to mount, so it kept crashing. Diun being in a failed state was not an issue right at that moment; it acts on a schedule, so any missed container version updates would be caught on the next run. The fact that any pod was in a failed state was what caught my eye. The machine had some pending updates so I ran them and rebooted; I probably would have done so in the next 24-36 hours anyway. Blackstaff, my Kubernetes master, had somehow lost track of the Tailscale DNS server so it couldn’t resolve any of my other machines. Running sudo systemctl restart tailscaled fixed this issue. Arthur finished rebooting and I was confronted with every single Longhorn-related pod stuck in CrashLoopBackOff. Time to wake up fully and actually figure out what is going on. ...

July 14, 2025 · 8 min · 1538 words · Jonathan

Homelab Kubernetes Automation: Why I Chose K3s

Now that I’ve covered how I ensure my homelab stays connected and gets configured, I can finally start talking about application workload orchestration. This will be spread over multiple posts to allow for more depth. Kubernetes offers a convenient, albeit complex, platform for running containerized applications. Deploying a Kubernetes cluster in a cloud provider (Azure, Google, Amazon, etc.) is fairly straightforward since they take care of ensuring the underlying compute, network and storage are configured. Deploying a full Kubernetes cluster manually can be significantly more complex since you need to ensure every required component is configured. A team of coworkers tried to do so 5-6 years ago and ran into numerous issues due to unclear documentation or unsupported configurations. Many of the improvements to Kubernetes in the intervening years have made things more straightforward, but it is still more complex than I would like for a homelab. ...

July 7, 2025 · 5 min · 1012 words · Jonathan

Managing My Homelab: How I Use Salt for Customization and Automation

Given the choice I would treat all my homelab servers as cattle; however, there are components that do need bespoke configuration. My main home server acts as my Kubernetes Master, my on-site backup server and my Salt master. My Arch Linux home server acts as a package mirror for my desktop and laptop while also building my AUR packages to ensure they stay up to date and consistent between machines. On the other hand, most of the configuration is nearly identical across my homelab machines. I want the same CLI tools available, similar backup configurations, and the assurance that any new host can be added to Kubernetes without issue if applicable. Between these common configurations and my preference for Programmatic Configuration I knew I needed some sort of Configuration Management tool. Even my pet servers become more cattle-like since their configuration is easily reproduced on a new machine in case of a failure. ...

June 9, 2025 · 7 min · 1315 words · Jonathan

Secure Homelab Connectivity: How Headscale Handles my Needs

As I mentioned previously, I have a mixture of local and remote machines in my homelab as well as a laptop and phone that can and will connect to my various systems and applications. This means I need some sort of VPN or SD-WAN to ensure that the traffic remains encrypted and to provide a reliable way to connect without having to expose everything to the public internet. ...

June 1, 2025 · 5 min · 912 words · Jonathan

What I host: Comentario - Self-Hosted Comments

When I started writing my blog I realized that I wanted to add comments. I considered using something like Disqus but was concerned by the number of ads I have heard they serve when you aren’t using an ad-blocker. I use an ad-blocker so I hadn’t seen that directly (at least not in recent memory) but given the choice I’d rather limit the intrusiveness of ads. With that in mind I looked for self-hosted comment systems to see what options there were. ...

May 18, 2025 · 3 min · 555 words · Jonathan

Homelab Hosting Introduction

For me, a homelab only truly comes to life once you start hosting applications on it. My previous homelab only ran a handful of applications and I basically ignored it. This incarnation already hosts several times as many and I make use of them almost every day. Setting up a hosting environment, be it Kubernetes, Proxmox, a Cloud environment such as Azure, AWS or GCP, or any other variety of VPS, PaaS or IaaS offerings is a good training exercise, but in my opinion one of the main points of a homelab is to host your own copies of applications that would otherwise require you to trust and pay others to do so. ...

May 14, 2025 · 2 min · 314 words · Jonathan

Rebuilding and Expanding: A New Homelab, A New Approach

This post kicks off an ongoing series exploring the components and applications that make up my newly rebuilt homelab. A homelab is a great resource for honing DevOps and Infrastructure skills. It can be as simple as a temporary environment stood up using Docker Compose to see how some self-hosted applications work or as intricate as a whole home datacenter rack or cloud environment running everything you can think of. Just as important as the skills you gain from running a homelab is the control it gives you. If you don’t like an update to an application you can switch to a new one; you aren’t caught by the vendor because they have your data. For the last few years I’d had a single desktop server running Nomad and Consul and a handful of applications. I hadn’t used Nomad in any practical way but was curious about what it offered compared to Kubernetes. For the most part it worked, but almost every time the server rebooted I would have to go and restart several of the applications to get them to properly detect one another. There were also multiple leftover configurations from experiments that hadn’t worked out, making it harder to update or change anything. Finally I decided it was time to rebuild, to create a homelab that could be rebuilt programmatically when faced with a disaster. The rebuild wasn’t a single, clean sweep but evolved as I realized I wanted, or was missing, something. The changes were incorporated while ensuring I would be able to redeploy and learn as I went. ...

May 4, 2025 · 5 min · 921 words · Jonathan

How I Build my Blog: Hugo and Git Hooks

A bit over a week ago my friend Sam restarted his blog with a post about How [he] built this site. I’d been planning to write my own version of that post in the future; however, I decided to do so sooner since our approaches have noticeable differences. I’m using Hugo as a static site generator for the content so that rebuilding the blog is as easy as deleting the public/ folder then recreating it using the hugo command. I’m not going to go into the full details of my blogging workflow at this time beyond saying I write the posts in Emacs Org-Mode and use ox-hugo to generate the appropriate markdown files for the posts themselves. My Techdocs site uses the exact same process, just with a different theme that is better designed to handle documentation. ...

May 4, 2025 · 4 min · 690 words · Jonathan

Why Programmatic Configuration Matters: From UptimeKuma to Gatus

I’ve been in the process of upgrading my homelab for the last few months. I will be writing a more in-depth series of posts about it in the near future. When I first deployed my new homelab (see my previous post on why I had to redeploy) I went with Uptime Kuma for status monitoring. I quite liked the UI for the statuses themselves as well as the dashboards I could create. Unfortunately all the persistent storage data for these status checks was lost during the rebuild, and when looking for a more programmatic way to re-create the checks I found out that there is no official API to do so. Now I could install the Unofficial UptimeKuma API Wrapper and use Python to create all monitor endpoints, but then I would need to ensure I kept making these updates each time I added a new application. Instead I was able to settle on Gatus by browsing the list of awesome-status-pages and then using a combination of Claude Sonnet, Reddit and Google to determine a path forward. ...

April 27, 2025 · 3 min · 448 words · Jonathan

Homelab Disaster Recovery: When Borg Backups Meet Longhorn Volumes

A few weeks ago my Kubernetes-based homelab suffered a catastrophic failure. The internal routing no longer worked, nor did any DNS resolution for in-cluster services. Since I had almost everything defined in YAML files and had all my persistent data stored in Longhorn Volumes, which were themselves backed up, I felt safe destroying and rebuilding the cluster. Unfortunately, while doing so the Longhorn volumes turned out to either be missing (only 9 of roughly 17 were found) or be orphans that refused to attach to Kubernetes. I was able to find the remaining volumes in a backup so I started restoring them, but when I tried to mount them individually using a Longhorn Docker container the contents were corrupt. Working through my Borg backups of the volumes only found more corruption, except for my PGDump backups, which luckily contained the vast majority of the data I would hate to lose. ...

April 18, 2025 · 5 min · 929 words · Jonathan