My First Kernel Module: A Debugging Nightmare
by Ryan Eberhardt
Nov 18, 2020
This is the story of the time I wrote some code, deployed it to production, and ended up bricking the server it was running on by frying the kernel.
This post is about the perils of concurrency and race conditions. My code was nearly correct, but two major synchronization bugs ultimately killed it.
This is a really long post that gets into the weeds at times, but I have tried to write it so that you can jump into any section and still learn something from it: