Why, you DirtyFrag!

Hot on the heels of Copy Fail, here comes another Linux kernel root privilege escalation vulnerability. As always, this is not a slam on the kernel developers, as OSdev is hard and often a thankless task. And Windows has its fair share of equivalent bugs, so these kinds of holes are not a Linux-specific issue. But two points spring to mind:

One is that this is a really neat exploit. You basically trick the kernel into letting you scribble over in-memory copies of normally read-only files so that, for all intents and purposes, you can modify them and bend them to your will. If those files can grant you complete control of the machine, then you have a root privilege escalation vulnerability.

The other point is that this was supposed to be disclosed on May 12, 2026. It was discovered by Hyunwoo Kim and reported privately so that the kernel developers and Linux distributions could have their patches ready in time for the disclosure.

However, at least one person saw the patch in the open-source kernel tree, figured it was closing a root privilege escalation vulnerability, and blogged about it, not knowing there was an embargo on disclosure. That led Kim to publicly document the technique, now known as DirtyFrag, ahead of the intended disclosure date.

To me, that's the headache with open source: the preparation for disclosure is at times out in the open, so while we benefit from the transparency, security bugs can effectively be disclosed while the patch is still being developed and distributed. That is a pain for those managing the security of shared and public-facing systems. I'm personally all for open source, but this is an interesting trade-off to consider.

I was the first to report the CPU-level Meltdown vulnerability in January 2018 because, over the Christmas period leading up to that, I saw open-source kernel developers reworking MMU-level code that ordinarily doesn't need to be touched. That churn revealed there was a processor-level side-channel vulnerability that could leak secrets and other data from memory that wasn't supposed to be accessible. And that's the kind of thing that is tough to hide in open source.

Patches for DirtyFrag are making their way to users; update your systems, or use short-term mitigations.
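If you want a quick, read-only look at whether the kernel modules behind these two code paths are even loaded on a box, here's a minimal Python sketch. It assumes the relevant loadable modules are named esp4, esp6, and rxrpc; that list is my assumption, not something taken from the advisory. A module missing from /proc/modules may still be built into the kernel or auto-loaded on demand, so treat the result as a hint, not a verdict on whether you're patched.

```python
from pathlib import Path

# Assumption: loadable module names for the xfrm-ESP (IPv4/IPv6) and RxRPC paths.
WATCHED_MODULES = ("esp4", "esp6", "rxrpc")


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return module names from text in /proc/modules format.

    Each non-empty line begins with the module name, followed by its size,
    reference count, dependents, state, and load address.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def watched_and_loaded(proc_modules_text: str) -> list[str]:
    """Return watched modules that are currently loaded, in watch-list order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in WATCHED_MODULES if name in loaded]


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        # Read-only: this only lists modules; it changes nothing on the system.
        hits = watched_and_loaded(proc.read_text())
        print("Watched modules currently loaded:", ", ".join(hits) or "none")
    else:
        print("/proc/modules not found; is this a Linux host?")
```

Parsing is kept separate from file I/O so the logic can be checked against captured /proc/modules output from other machines.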

IMHO...

One final thing before my analysis of the bugs: check out Zero Day Clock. According to its data, and assuming that data is correct, the time from disclosure of a computer security vulnerability to its exploitation has plummeted. There are lots of reasons why, from autonomous AI to supply-chain attacks to financial incentives.

This collapse in disclosure-to-exploitation time represents a fundamental shift in, for want of a better term, the adversarial lifecycle. When I spotted the early signs of Meltdown, it took a human eye to notice unusual churn in low-level code and connect the dots. While there are still folks today who scrutinize source code commits for such things, as we saw with DirtyFrag, that manual intuition is being augmented or replaced by automated pipelines that treat every public commit as a blueprint for an exploit.

We have moved from a world where defenders had weeks to coordinate, to one where the window of exposure is often measured in hours from the moment a fix hits a public repository. In the open-source model, the very act of healing the code now serves as a flare for those looking to wound its users, creating a race where the defender must be perfect, but the attacker only needs to be fast.

My technical summary

DirtyFrag is a pair of Linux kernel security vulnerabilities that can be exploited together, one serving as a fallback for the other, to achieve root privilege escalation. Someone can use this technique to go from a normal user to root and take over the system.

This is done by abusing zero-copy networking paths to modify the page cache of read-only files in RAM. The two distinct bugs are a 4-byte arbitrary write in the xfrm-ESP subsystem (CVE-2026-43284) and an 8-byte brute-forced write in the RxRPC stack (CVE-2026-43500).

The core primitive

In exploiting each bug, an attacker uses the splice system call to plant a reference to a page-cache page, belonging to a file the attacker has only read access to, into the fragment array (frags) of a nonlinear socket buffer (sk_buff). No data is copied; the sk_buff points at the cached file page itself.

When the kernel later processes this sk_buff through a path that performs in-place cryptographic operations (where the source and destination buffers are identical), it writes directly onto the page-cache page backing the file. This bypasses the usual copy-on-write (CoW) protections because the kernel wrongly assumes the buffer is private, or fails to check whether the underlying fragment is shared with the page cache.

The overall result is that an attacker can deterministically overwrite bytes in read-only files with attacker-chosen data. For example, if the target file is /etc/passwd, an attacker can craft a root-level (UID 0) account entry with an empty password, then run the su command as an unprivileged user to log in as that account and gain root access.

Exploit step-by-step: xfrm-ESP (CVE-2026-43284)

This primary method of exploitation provides a deterministic 4-byte arbitrary write primitive but requires CAP_NET_ADMIN (typically obtained via a user namespace).

  1. The attacker creates a new user and network namespace (CLONE_NEWUSER | CLONE_NEWNET) to gain the necessary privileges for XFRM state management.

  2. An XFRM state is registered using the authencesn algorithm, allowing the attacker to specify a 32-bit value in the Extended Sequence Number (ESN) field via the XFRMA_REPLAY_ESN_VAL attribute. Although the high-order ESN bits are normally only fed into the integrity check rather than carried in the packet, they are processed here in a way that lets this 32-bit value serve as a 4-byte arbitrary write payload.

  3. The attacker uses vmsplice to load an ESP header into a pipe, followed by splice to add 16 bytes of a target file to the same pipe by reference rather than by copy.

  4. The pipe is spliced into a UDP socket; over loopback, the kernel receives the packet and routes it to esp_input.

  5. In esp_input, the kernel checks if the skb is cloned or has a frag_list. If it is a simple nonlinear skb with fragments but no frag_list, it jumps to the skip_cow label, leaving the attacker-pinned page cache page in the fragment list.

  6. The crypto_authenc_esn_decrypt function prepares for decryption by moving high-order sequence bits, performing a scatterwalk_map_and_copy to the destination buffer. Since the source and destination are identical, the attacker-controlled ESN value is written directly into the target file's page cache.

  7. By repeating this process across different offsets, the attacker can overwrite arbitrary bytes in a target file. Because the ESP primitive is deterministic, it can assemble larger payloads across many offsets more easily than the RxRPC variant, so it is frequently used to overwrite setuid-root binaries (like /usr/bin/su or pkexec) with a small ELF stager to gain immediate root access.

Exploit step-by-step: RxRPC (CVE-2026-43500)

This secondary vulnerability acts as a fallback for the primary ESP exploit, providing an 8-byte write primitive that does not require namespace privileges but involves brute-forcing a cryptographic key.

  1. In user space, the attacker calculates a session key K such that fcrypt_decrypt(Ciphertext, K) yields a desired plaintext, for example an entry in /etc/passwd. fcrypt is a DES-based 64-bit block cipher used by the Andrew File System's Rx security layer, and its 56-bit keys can be brute-forced relatively quickly on modern hardware or via precomputed tables, making this a viable "namespace-less" path.

  2. Using the add_key system call, the attacker registers an RxRPC token containing the chosen session key.

  3. Similar to the ESP variant, splice() is used to plant 8 bytes of the target file into an sk_buff fragment.

  4. An AF_RXRPC client initiates a call, and a fake UDP server responds with a CHALLENGE to establish a security context using the attacker's key K.

  5. The attacker responds with a DATA packet; when the client calls recvmsg, the kernel reaches rxkad_verify_packet_1, which performs in-place pcbc(fcrypt) decryption on the first 8 bytes of the payload.

  6. Because the fragment points to the page cache and the skb is not properly isolated, the decryption result is written back into the file page in RAM.

  7. Since a single 8-byte write may not be enough to reach the next delimiter (such as a field separator in /etc/passwd), the exploit performs multiple writes using "last-write-wins" logic, accounting for bytes changed by earlier writes during the brute-force step.
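The "last-write-wins" idea is easier to see with a toy model. This harmless Python sketch applies a sequence of 8-byte writes to an in-memory bytearray, in order, so overlapping writes simply clobber earlier ones. It touches no kernel interface, socket, or file; the apply_writes helper and the buffer contents are made up for illustration.

```python
# Toy model only: simulates ordered, overlapping writes on a user-space buffer.
WRITE_WIDTH = 8  # bytes per simulated write, mirroring the RxRPC primitive's width


def apply_writes(buffer: bytes, writes: list[tuple[int, bytes]]) -> bytes:
    """Apply (offset, chunk) writes in order; a later write overwrites any bytes it overlaps."""
    out = bytearray(buffer)
    for offset, chunk in writes:
        if len(chunk) != WRITE_WIDTH:
            raise ValueError(f"each write must be exactly {WRITE_WIDTH} bytes")
        if offset < 0 or offset + WRITE_WIDTH > len(out):
            raise ValueError("write falls outside the buffer")
        out[offset:offset + WRITE_WIDTH] = chunk
    return bytes(out)


if __name__ == "__main__":
    original = b"A" * 16
    result = apply_writes(original, [(0, b"x" * 8), (4, b"y" * 8)])
    print(result)  # b'xxxxyyyyyyyyAAAA'
```

Writing 8 bytes at offset 0 and then 8 bytes at offset 4 leaves only the first four bytes of the first write in place: from each earlier write, only the bytes that no later write touches survive in the final buffer.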

Paired for reliability

The two bugs are paired to maximize Linux distribution coverage. The ESP variant is preferred for its deterministic nature and ability to write large payloads (like an ELF binary) 4 bytes at a time. If user namespaces are restricted (for example, by AppArmor on Ubuntu), the exploit falls back to the RxRPC variant. While RxRPC is limited by the brute-force cost and the rarity of the rxrpc.ko module on some enterprise builds, it is often present and accessible on desktop distributions where namespaces are hardened.

Implementing client-side RAG

After integrating the Chrome Prompt API with diodeaign.org to provide client-side AI site search, I quickly ran into a brick wall: the 6,144-token context window (at the time of writing) of the on-device Gemini Nano model. While 6K tokens might sound generous for a quick chat, it's nowhere near enough to hold the site's entire archive for the model to interrogate.

If I just dumped every log entry into the combined user and system prompt, the model would simply choke, or the most important bits would be truncated.

The solution is a client-side Retrieval-Augmented Generation (RAG) architecture. Instead of giving the AI everything when you're trying to search the site's archives, I only give it the snippets, or "chunks", that are actually relevant to your query.

The indexing pipeline

The first step happens at build time. I updated the build.py script to process every Markdown file and "chunk" the text into 5,000-character segments. I also made sure to inject each text's metadata, such as the byline and summary, into each chunk so the search engine knows who wrote what and why.

These chunks are exported into a search-index.json file, which is fetched by the browser when the "Lab AI" initializes.
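As a rough idea of what that build-time step could look like, here's a Python sketch (build.py is Python, though this is not its actual code). Only the 5,000-character segment size and the search-index.json filename come from the description above; the field names, the metadata header format, and the chunk_document and write_index helpers are hypothetical.

```python
import json

CHUNK_CHARS = 5000  # segment size for the body text


def chunk_document(title: str, byline: str, summary: str, body: str,
                   size: int = CHUNK_CHARS) -> list[dict]:
    """Split body into fixed-size segments and prefix each with the document's metadata."""
    header = f"Title: {title}\nBy: {byline}\nSummary: {summary}\n\n"
    chunks = []
    for index, start in enumerate(range(0, len(body), size)):
        chunks.append({
            "id": f"{title}#{index}",  # hypothetical ID scheme: title plus chunk number
            "title": title,
            "text": header + body[start:start + size],
        })
    return chunks


def write_index(chunks: list[dict], path: str = "search-index.json") -> None:
    """Serialize all chunks into the JSON file the browser fetches at startup."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(chunks, fh, ensure_ascii=False)
```

Here the 5,000-character limit applies to the body text, so each chunk ends up slightly longer once the metadata header is prepended. Splitting on paragraph boundaries instead of fixed offsets would avoid cutting sentences in half.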

Retrieval: Orama and the fail-safe fallback

For the retrieval engine, I'm using Orama, a lightweight, WASM-based search engine that runs entirely in the browser. It handles the heavy lifting of indexing the site's content on the fly and performing ranked keyword searches.

However, natural language is messy. I found that a strict search engine sometimes penalizes conversational queries. To combat this, I implemented a two-stage retrieval pipeline:

  1. As the primary search, Orama attempts a ranked search across all chunks.
  2. As a keyword fallback, if Orama returns zero hits, the system triggers a fail-safe manual scan. It breaks the query into individual keywords and ranks every chunk by how many unique terms it matches.

This ensures that if you ask about a specific data point, such as "Which riots did I cover while working in the media?", the system will find it even if the rest of the query is noisy.
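The keyword fallback can be sketched in a few lines of Python. The site's real version runs in the browser alongside Orama, so this only illustrates the ranking rule, assuming a simple lowercase tokenizer and a small hypothetical stopword list; keywords and fallback_rank are made-up names.

```python
import re

# Hypothetical stopword list; a real one would be longer and tuned to the archive.
STOPWORDS = {"a", "an", "the", "did", "i", "in", "of", "to", "which", "while", "what", "how"}


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def fallback_rank(query: str, chunks: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` chunks, ordered by how many unique query terms each contains."""
    terms = keywords(query)
    scored = []
    for position, chunk in enumerate(chunks):
        score = len(terms & keywords(chunk))
        if score > 0:
            # Negative score sorts best first; position breaks ties by original order.
            scored.append((-score, position, chunk))
    scored.sort()
    return [chunk for _, _, chunk in scored[:limit]]
```

There is no stemming here, so "covering" does not count as a match for "cover"; whether that matters depends on how the archive is written.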

Staying within the token budget

Once the relevant chunks are found, I have to ensure they actually fit in the 6,144-token window. The system now performs a token budget check before every inference:

  • It uses session.countPromptTokens() to measure the size of the combined prompt.
  • The maximum number of tokens is defined by session.maxTokens, which is used as the limit.
  • If the count exceeds that limit (with a safety margin), it iteratively drops the least relevant chunks from the context until the prompt fits.
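The budget loop is easy to sketch. In this Python version, count_tokens stands in for session.countPromptTokens() and max_tokens for session.maxTokens; the 10% safety margin, the prompt layout, the build_prompt and fit_to_budget names, and the assumption that chunks arrive most-relevant first are all mine. In the browser the token count may come back as a promise, so a real loop would await each measurement.

```python
from typing import Callable


def build_prompt(system: str, chunks: list[str], query: str) -> str:
    """Combine the system prompt, retrieved context, and the user's question."""
    context = "\n\n".join(chunks)
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {query}"


def fit_to_budget(system: str, chunks: list[str], query: str,
                  count_tokens: Callable[[str], int], max_tokens: int,
                  margin: float = 0.10) -> tuple[str, list[str]]:
    """Drop chunks from the end (least relevant) until the prompt fits under the limit."""
    limit = int(max_tokens * (1 - margin))  # leave headroom below the hard maximum
    kept = list(chunks)
    prompt = build_prompt(system, kept, query)
    while kept and count_tokens(prompt) > limit:
        kept.pop()  # discard the least relevant remaining chunk
        prompt = build_prompt(system, kept, query)
    return prompt, kept
```

If even the bare system prompt and question exceed the limit, the function returns with no chunks kept, and the caller has to decide whether to refuse or shorten the question.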

Citing sources

Finally, I tuned the system prompt to make the AI more academic. Instead of vague summaries, it's now instructed to cite its sources directly, such as: "According to the [Title] log..." This makes the answers feel less like a hallucination and more like a genuine research assistant tapping into the lab's archives.
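One way to give the model something concrete to cite is to label each retrieved chunk with its log title. This Python sketch assumes chunks carry hypothetical "title" and "text" fields, and the instruction wording in CITATION_RULE is illustrative rather than the site's actual system prompt.

```python
# Illustrative instruction text; the real system prompt wording may differ.
CITATION_RULE = ('Answer only from the logs below. Cite each fact with its source, '
                 'for example: "According to the [Title] log...".')


def format_context(chunks: list[dict]) -> str:
    """Label every chunk with its title in brackets so the model has a name to cite."""
    sections = [f"[{c['title']}]\n{c['text']}" for c in chunks]
    return CITATION_RULE + "\n\n" + "\n\n".join(sections)
```

Putting the title directly above each excerpt keeps the citation target next to the text it describes, which small models tend to follow more reliably than a separate source list.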

The result is a search experience that's private, kinda fast on my laptop, and hopefully knows what it's talking about.

Integrating the Chrome Prompt API

For fun, I implemented a client-side AI-powered site search feature on diodeaign.org called "ask the lab". It uses the experimental Prompt API built natively into Chrome, starting with Chrome Canary. You ask the website a question using the text box on the homepage, and it uses an on-device large language model (currently Gemini Nano, where available) to answer using the website's static archives. That eliminates cloud latency and privacy concerns, as nothing is sent to any server for processing.

The feature works by combining the archives with the user's query and a system prompt, then sending that payload to the on-device model and displaying the answer in the page.

The primary challenge was managing the lifecycle of the 4GB neural weight component (OptimizationGuideOnDeviceModel) tied to the browser's profile directory. I wanted to avoid making people download the model over and over, so I implemented logic to persist the model across sessions. Initial implementations ran afoul of both the API namespace changing overnight (from window.LanguageModel to window.ai.languageModel) and silent schema changes that caused the browser's ML execution service to fatally hang during initialization.

The final architecture uses a graceful fallback approach:

  1. Probe the API namespace to check that the device supports the local model at all. There are hardware requirements, and if these aren't met, the query box never appears.
  2. Intercept the downloading status to provide user feedback while Chrome silently fetches the 4GB of weights in the background.
  3. Manage the initial "cold start" latency (telling the visitor the page is hydrating the weights from disk to RAM) when the AI is first invoked.
  4. Stream the output chunks directly into the UI alongside a dynamic DOM collapse, creating a chat experience that's all too familiar these days.

If you have Chrome or Chrome Canary suitably configured, you can test it by clicking the search box on the homepage. If your browser or system doesn't support the API yet, the site looks exactly as it did before, without the search box.

As I said, this was for fun, to get a handle on the new API and add a bit more functionality to the site for free.

Migrating Diosix to Zig

I've decided to move the Diosix RISC-V hypervisor codebase from Rust to Zig. While I appreciate Rust's safety, and it is a fantastic language, I find Zig's approach to low-level systems programming more productive. This isn't a critique of Rust’s capabilities, but rather a preference for a different development flow. Zig lets me write code the way I think about the overall design, giving me mechanisms for safety without the restrictiveness of a mandatory borrow checker.

You can find this ongoing Zig work in the zig branch of the Diosix git repository.

The road ahead for Diosix is focused on demonstrating its architecture as a mesh of autonomous gossiping nodes.

The next phase of the project will, hopefully, see Diosix evolve into a self-organizing mesh, where localized reasoning within a sovereign root VM allows it and its peer VMs to negotiate resources and maintain resilience across the mesh without needing a central controller. This shift toward decentralized intent is going to be an interesting challenge to pull off, and I'm curious to see how this new environment handles the stress of 1,000 or more VMs.

I'll be pushing these experiments out as they stabilize; I'm wondering what others might build on top of a system that manages itself.