Hubris Part 1: Arch Linux, an encrypted root partition, and the failure to start Load Kernel Modules

This is a story about hubris. It is a story about free software, and about troubleshooting, and about finding things out on your own, but mostly it is about my own unbearable ego standing in the way of success and my habit of doing things the hard way out of pride and spite. I made a series of mistakes in this story, and I hope I will not be judged too harshly for them. Please consider it a cautionary tale, and may the lessons I learned not be too painful when you learn them yourself.

A bit of history… I discovered Arch Linux in the early 2010s. I had just built my first computer, and when it came time to purchase an OS, the cumulative price tag made my decision for me. “Which one’s free?” I can’t remember if my first distro was Arch or if I was pulled there by its selective gravity: the gravity that draws the proud, the curious, the people who are willing to spend hours doing something the hard way because it’s fun. To put it less delicately, it draws people with too much free time that they are willing to spend in varying states of frustration for the bragging rights. I had my fun back then, trapped in an unemployed twilight zone. After a while I knew my way around Linux pretty well, I thought. I had fought those good fights, staying up all night because Portal 2 had just come out and dammit, I was going to make it work with WINE or else. A million other petty projects came along. I wanted my cute evolving pets from a long-defunct virtual pet site, really just a series of illustrations that you caused to advance by clicking them every day, to be able to hover on my desktop in all their transparent glory. Just a few though, just my favorites. There was no API per se, but I think curl and conky were involved. I was pretty proud of myself.

Little did I know how far I had to fall. Employment was inevitable. Years of onsite tech support, crawling under desks and plugging things in, troubleshooting Outlook, reassuring irate customers that no, I don’t think a male tech would be better for this, before swiftly fixing their issue. Years of standing in tiny network closets for hours on the phone with the ISP, years of soul-crushing labor composed of mundane tasks like “my monitor isn’t working” (plug it in?) “I want to print to that other printer, the one with color, but not that one, because for that one I have to walk across the room, so this one that’s wireless and not on the network like the other printers, because our last IT person didn’t give a f***” (…ok, give me a few minutes)

When I got off work I didn’t even want to look at a computer. My last Linux experience was a tragic one, early on. Exhausted from work I tried to do a long-delayed kernel update. It was a Toshiba laptop. It was dying. I was sitting on my small apartment balcony trying to decompress and relax after work and had not brought a charger. It died while trying to change something critical. With it died my last enthusiasm for troubleshooting anything outside of work. I half-heartedly grabbed my charger and rebooted, the dark knowledge of what I would see resting heavily in my soul. I knew there were ways to fix this, the easiest of which would just be a reinstall. I just…didn’t…care. (Later, due to careless storage under the bed, I stepped on the screen one day, and that was that.)

Fast forward to the turbulent years of the pandemic. We started working mostly remotely, except for the occasional demanding customer that wanted in-person service, or the switch replacement (try doing that remotely), or even that server without an iDRAC that just needed someone to drive across town to push the power button. We began to hear the phrase “boots on the ground” more frequently. Then, I escaped. I got a new fully remote job with a decent pay increase. The new job was revitalizing. My enthusiasm for tech, already beginning to bounce back, accelerated. I was trapped in the house, but more free than I had ever been. No commute, no interacting with customers, a job that was interesting and challenging, and energy in my free time to experiment, tinker, code, and learn for fun. I splurged on a new PC build, and life was beautiful. The first distro I found when I came back to the FOSS world was Manjaro. “Ooooo, this is perfect!” I thought. Easy to use and set up, but familiar Arch under the hood. I used it as my personal desktop for half a year, but soon I felt the call of my old friend. I had begun to want more advanced things in my OS, more challenge. My tinkering often brought me to ArchWiki, the AUR, and the associated forums, and nostalgia overtook me. This was /home/.

This is where the story really begins. I decided to install Arch Linux. I skimmed the installation guide, which looked familiar enough that I assumed it hadn’t changed much over the years, so I didn’t read it in depth. OOF. I did a bit more research and found Archinstall (https://python-archinstall.readthedocs.io/en/latest/index.html), a new automated Arch installer written in Python (OMG I love Python). The coolness factor and the desire to get into my new system as soon as possible made the decision for me: I’d try this thing out. I read through the docs.

MISTAKE #1: DISK ENCRYPTION

The Archinstall docs have this to say about disk encryption:

Disk encryption - Selecting a disk encryption password enables disk encryption for the OS partition. Note: This step is highly recommended for most users, skipping this step comes with some risk and you are obligated to read up on why you would want to skip encryption before deciding to opt-out.

I naively thought encrypting the root partition would be a good security-minded decision with no foreseeable repercussions and no need for more research. (OOF) I had Bitlockered my fair share of Windows drives; the users were annoyed at a pre-boot encryption password in addition to the normal login, but I am used to trading convenience for security so I didn’t think twice.

The first time I tried to install, I did my first really stupid thing. I somehow encrypted both the boot partition and the root partition. I banged my head against this issue for a few hours, trying to fix it without a reinstall. I don’t know if anyone else struggles with this, but a reinstall seems like the easy way out sometimes; how am I supposed to learn anything if I am ignoring the problem and starting from scratch? After some frustration, I walked away from the computer for a bit and had an insight. After hours of repeatedly searching variations of error messages and phrases that didn’t quite describe the problem, I slowed down for a second and thought through it. The boot partition is automatically mounted within the root partition.

Root: /
Boot: /boot

However, you need to boot before you can do anything else. Including decrypt the root partition. Chicken, meet egg. (Much later, I found some GRUB documentation that actually showed how to use an encrypted boot partition, but this was not the way.) At this point, I had no choice but to re-partition the drive and reinstall.

My next reinstall went flawlessly. Encrypted root, unencrypted boot and home. I settled into my new desktop, with xfce4 because I was fond of it in Manjaro. I happily used the system for months, getting settled in with a series of projects, including sorting through everything I had pulled from Google Takeout and getting the metadata organized (a custom bash script using exiftool for photos, MusicBrainz Picard for music), customizing and theming everything I could, getting my Yubikey set up for PAM authentication, and more. I got a VPS, put Apache on it, and rio.pink was born (the night I went live, a major Cloudflare outage occurred. Doja Cat would be proud; I assume I broke the internet (WARNING, VERY NSFW VIDEO, BUT IT’S DOJA SO IT’S WORTH IT IF YOU’RE BRAVE)). “what a time to be aliiive…”

I went through a few kernel upgrades without incident. v5.18.4, v5.18.5, v5.18.6, so far so good. When 5.18.7 was released I thought it was business as usual. I shut the computer down after the upgrade and went to bed. The next day was fortunately a Saturday. I would be spending my entire day dealing with the consequences of my actions, but at least I wouldn’t have to work at the same time.

When I turned on the computer the next morning, I entered my password to decrypt the drive, and…it didn’t boot. I saw this lovely error message:

[FAILED] Failed to start Load Kernel Modules.

Again, hubris struck. I had a perfectly intact recent timeshift snapshot I could have restored from, and it would all have been fine. Not my style. I frantically searched for a solution. I booted from a live USB to assess the situation. I learned about arch-chroot. This seemed perfect. It…didn’t work. Because of (drumroll) the encrypted disk. After some experimentation I found a manual way to chroot after decrypting the drive. I had to repeat this process so many times as I tried things (booted from disk, back to USB, back to disk) that I took notes on the steps. I hope this helps someone:

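Boot from USB
# Connect to wifi ("station ..." and "exit" are typed at the iwctl prompt)
iwctl
station wlan0 connect [your wifi SSID]
exit
# Unlock the encrypted root partition and mount it
mkdir /mnt/chroot
cryptsetup open /dev/[your encrypted partition] root
mount /dev/mapper/root /mnt/chroot

# Mount the kernel filesystems the chroot needs
cd /mnt/chroot
mount -t proc /proc proc/
mount -t sysfs /sys sys/
mount --rbind /dev dev/
mount --rbind /run run/

# Copy DNS settings so the network works inside the chroot, then enter it
cp /etc/resolv.conf etc/resolv.conf
chroot . /bin/bash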

I learned some tricks:

  • journalctl results are hard to read through, and I wanted to pinpoint the logs from the first problematic boot. journalctl --list-boots let me find the boot with the right timestamp and check it for more detailed errors.

  • I wanted to check for errors with systemctl status systemd-modules-load.service, but systemctl does not work in a chroot. systemd-nspawn lets you run your system in a container with full systemd functionality.
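For anyone following along, here is a rough sketch of those two tricks as shell functions, run from the live USB. The function names are made up for illustration. It assumes the decrypted root is mounted at /mnt/chroot and that the installed system keeps a persistent journal under var/log/journal; adjust ROOT if yours lives somewhere else.

```shell
# Root of the installed system as mounted from the live USB (assumed path).
ROOT="${ROOT:-/mnt/chroot}"

# List every boot recorded in the installed system's journal,
# with an offset and timestamps for each one.
list_boots() {
    journalctl --directory="$ROOT/var/log/journal" --list-boots
}

# Show only messages of priority "err" or worse for a single boot.
# $1 is an offset from the list above: 0 is the most recent recorded
# boot, -1 the one before it, and so on.
boot_errors() {
    journalctl --directory="$ROOT/var/log/journal" --boot="$1" --priority=err --no-pager
}

# Boot the installed system inside a container so systemctl works in it.
boot_in_container() {
    systemd-nspawn --directory="$ROOT" --boot
}
```

Typical use would be list_boots first to find the boot with the right timestamp, then something like boot_errors -1 to read just the errors from that boot.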

Other things I learned were which packages were needed to reinstall the kernel (linux linux-headers mkinitcpio kmod) and how to use mkinitcpio and grub-mkconfig.

Finally, I figured out the issue (partly). Apparently the boot partition wasn’t mounted when I upgraded the kernel. I added this to my chroot procedure above:

mount /dev/[your boot partition] /mnt/chroot/boot

Ran my pacman reinstall, mkinitcpio, and grub-mkconfig, and rebooted. Finally, after wayyy too many hours, I rebooted back into my Arch installation. OOF.

Time passed, as it tends to do. Before I knew it, it was time for v5.18.8. I checked /boot/. It had stuff in it (a kernel or whatever IDK) so I assumed the partition was mounted this time. I crossed my fingers and did a pacman -Syu. Guess what happened on reboot?

[FAILED] Failed to start Load Kernel Modules.

Yup. The problem I “fixed” was not so fixed. Unfortunately, I didn’t have time to dig deeper. I followed my trusty chroot procedure from above and did all the things (pacman -S linux linux-headers mkinitcpio kmod, mkinitcpio -p linux, grub-mkconfig -o /boot/grub/grub.cfg, reboot) and all was well.
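For anyone copying this down, the ritual can be sketched as one function to run as root inside the chroot. The commands are the ones listed above; the function name and the mountpoint guard are illustrative additions, not part of the original notes. The guard is there because a populated /boot directory alone does not tell you which filesystem those files are on.

```shell
# Sketch of the repair sequence, to be run as root inside the chroot.
reinstall_kernel() {
    # Stop unless a filesystem is really mounted at /boot.
    if ! mountpoint -q /boot; then
        echo "/boot is not a mount point; mount the boot partition first" >&2
        return 1
    fi

    # Reinstall the kernel and friends, rebuild the initramfs,
    # then regenerate the GRUB config. Each step runs only if the
    # previous one succeeded.
    pacman -S linux linux-headers mkinitcpio kmod &&
        mkinitcpio -p linux &&
        grub-mkconfig -o /boot/grub/grub.cfg
}
```

After reinstall_kernel returns successfully, exit the chroot and reboot.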

I made a note to look into why it happened but then went on with my life. I had other things to do.

Hey, look, it’s 5.18.9-o’clock! Yup. Hello Load Kernel Modules my old friend. I did my ritual to satisfy the gods, made another note to REALLY figure out why this was happening, and went on with my life.

All things must continue. Seeds grow into plants, flowers bloom and then die, leaves turn brown and wilt, and winter comes. And lo, 5.18.10 was released, and Load Kernel Modules darkness fell over the earth. I was fed up. I wasn’t going to take it any more. I was going to figure out what was happening once and for all.

(like and subscribe for more MISTAKES! Rio out.)

blogbit~

2022-07-11

pinkblog


