DEV Community: Vivian Voss

The Renewal You Did Not Survive: How an Acquisition Turned VMware Ownership into Tenancy

Vivian Voss — Fri, 22 May 2026 06:58:01 +0000

In the Net, Episode 04

You bought VMware once, outright, the way you buy a tool. You paid for a perpetual licence, you owned it, and it kept running whether or not you ever spoke to the vendor again. In November 2023 a 69 billion US dollar (around 64 billion euro) acquisition closed, and over the following eighteen months the thing you owned was quietly, methodically converted into a thing you rent. This is the fourth episode of In the Net, a series on the documented mechanics of vendor lock-in. The premise has not changed. Every platform tells you how to come in. The architecture, and increasingly the contract, tells you whether you can leave, and on whose terms.

This episode is slightly different from the first three. Adobe, LinkedIn and AWS each built their lock-in into the product. VMware's lock-in was built into a transaction. The product barely changed; the ownership of it did, and the terms followed.

The Promise

VMware made the hypervisor boring, and that was the whole point. A hypervisor is the layer that lets one physical server pretend to be many; VMware's ESXi and vSphere did this so reliably that an entire generation of data centres was built on the assumption that it simply worked. vMotion moved a running virtual machine from one host to another without dropping a packet. Distributed Resource Scheduler balanced load without anyone watching. For roughly two decades, vSphere was the quiet floor the enterprise data centre stood on.

Crucially, it was sold as a perpetual licence. You paid once for a version, and you owned the right to run it indefinitely. Support and upgrades were a separate, optional subscription. The distinction mattered: the software you ran was yours, and the relationship with the vendor was something you chose to maintain, not something you were compelled to renew. That distinction is the thing this episode is about, because it is the thing that was removed.

The Hooks

Broadcom announced its intent to acquire VMware in May 2022 and completed the acquisition on 22 November 2023, after some eighteen months of regulatory review by the US FTC, the UK Competition and Markets Authority and the European Commission. The deal was valued at around 69 billion US dollars, including roughly 8 billion in assumed debt. VMware was delisted from the New York Stock Exchange. None of the customers running vSphere were consulted, which is unremarkable: that is how acquisitions work. What followed is the part worth documenting.

Perpetual licences ended. Broadcom moved VMware to a subscription-only model. Existing perpetual licences do not technically expire, but they no longer receive support or security updates, and at renewal the customer is moved onto an annual subscription. In practical terms, for any estate that needs patched, supported software, ownership became tenancy.

The free tier was removed, then quietly returned. The free edition of ESXi, long the entry point for labs, edge nodes and small deployments, was withdrawn in early 2024. In April 2025 it returned, as ESXi 8.0 Update 3e, without a press release: the news appeared in the product release notes, which described it as an entry-level hypervisor for non-production use, with no Broadcom support, and unable to connect to vCenter. A capability removed with a transition plan and reinstated in a footnote tells you something about who the communication is for.

Per-CPU pricing became per-core, sold in bundles. Licensing shifted from a per-socket model to a per-core model, and the products were repackaged into a small number of bundles: VMware vSphere Foundation (VVF) and the larger VMware Cloud Foundation (VCF). These bundles include components such as NSX (networking) and vSAN (storage) whether or not the customer runs them. A customer who wanted vSphere and nothing else now buys a suite, and the suite is priced per core.

Each of these, on its own, is a defensible commercial decision. Taken together, over eighteen months, they convert a one-time purchase into a recurring obligation whose size the vendor sets.

The Standing

VMware entered this period holding roughly 70 per cent of the server-virtualisation market on Gartner's figures for 2024. Gartner has since projected that share falling to around 40 per cent by 2029, a consequence the firm attributes directly to the post-acquisition strategy. A vendor does not shed thirty points of a market it dominates because the technology got worse; it sheds them because the terms changed faster than the customers could be locked down.

The clearest signal came from the distribution channel rather than the customers. In December 2024, Ingram Micro, one of the largest technology distributors in the world, announced it would end its Broadcom and VMware distribution relationship in select regions from early January 2025. Distributors do not walk away from a 70-per-cent-market-share product over a disagreement about colour schemes. When the people whose business is selling the product decide it is not worth selling, the contract on offer was not a friendly one.

How the User Is Treated

The dignity dimension this series tracks has, in VMware's case, an unusually clean piece of evidence: a court filing.

In August 2024, AT&T filed suit against Broadcom in the New York State Supreme Court. AT&T held perpetual VMware licences and, critically, had signed an amended support agreement with pre-acquisition VMware in 2022, running through 8 September 2024, with an option to renew for two further years. After the acquisition, AT&T alleged, Broadcom declined to honour that renewal option unless AT&T purchased VMware's new subscription bundles under a three-year commitment. AT&T sought injunctive relief; by October 2024 the parties were reported to be moving towards a settlement.

Set aside the legal merits, which the settlement leaves unresolved in public. The structural fact is the one that matters for this series. AT&T is one of the largest telecommunications companies on earth, with a procurement department the size of a mid-tier vendor, and it concluded that the only way to obtain support it believed it had already contracted for was to sue. The 72-core minimum, discussed below, asks small clusters to pay for cores they cannot use. Between those two points sits every customer too large to ignore the bill and too small to file in the New York State Supreme Court.

The Exit That Isn't

The exit problem here is not technical in the way AWS' was; a virtual machine is far more portable than an IAM policy. The exit problem is contractual and informational, and the clearest illustration is the 72-core episode.

In March 2025, reports emerged that Broadcom would raise the minimum core count per VMware order from 16 to 72, effective 10 April 2025. For a customer with a 32-core cluster, this meant buying 72 cores and paying for 40 they could not use: a 125 per cent increase for no additional capability. After a sharp industry backlash, the requirement was withdrawn, with distributors confirming the minimum stayed at 16. When asked, a Broadcom spokesperson stated that the company had "never announced a price change".
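The arithmetic behind that example is easy to check. What follows is a minimal sketch, not a pricing tool: the function names are illustrative, and it assumes the bill scales linearly with billed cores, which a real quote may not.

```python
def billed_cores(actual_cores: int, minimum_order: int) -> int:
    """Cores the customer pays for: whichever is larger, the cluster or the order minimum."""
    return max(actual_cores, minimum_order)


def increase_percent(actual_cores: int, old_minimum: int, new_minimum: int) -> float:
    """Percentage rise in billed cores when the order minimum changes."""
    before = billed_cores(actual_cores, old_minimum)
    after = billed_cores(actual_cores, new_minimum)
    return (after - before) / before * 100


if __name__ == "__main__":
    # The 32-core cluster from the text, with the minimum raised from 16 to 72.
    print(billed_cores(32, 72) - 32)     # unused cores paid for: 40
    print(increase_percent(32, 16, 72))  # 125.0
```

A cluster already at or above 72 cores sees no change at all under this rule, which is why the burden fell on small estates.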

This is worth sitting with, because it is the exit problem in miniature. A pricing rule that can appear, reshape a customer's renewal maths for several weeks, and then be denied as never having existed is a pricing rule the customer cannot plan against, cannot cite in a negotiation, and cannot appeal. The lock-in is not only that migration takes time; it is that the terms you are migrating away from will not hold still long enough to be measured.

This is lock-in by design, in the precise sense this series uses the phrase. Not a boardroom conspiracy to trap users, but an arrangement in which the architecture and the contract together produce the outcome that customers stay and pay even when, on their own honest accounting, they would rather not. Ownership was converted to tenancy by a transaction the customer was never party to, and the terms of the tenancy are set, revised and occasionally un-set by the landlord.

The Price

The price of staying is the renewal, and the renewals have been steep. Reported increases vary enormously by customer, tier and how much grandfathering survived the transition: figures from 150 per cent to several hundred per cent are common in trade reporting, with some large or specialised estates citing four-figure percentage increases. These are reported numbers from customer accounts and analyst write-ups rather than published list prices, and should be read as a range, not a constant; the precise figure depends on the bundle, the core count and the negotiation.

The price of leaving is a hypervisor migration, which is real work but bounded work. A virtual machine has a portable representation: the Open Virtualization Format (OVF) for the metadata, and disk images that convert cleanly to qcow2 for KVM-based platforms. The migration cost is in testing, in re-tooling the operational layer (backup, monitoring, automation that assumed vCenter), and in retraining. It is measured in months for a large estate, not years, and it is a one-time cost against a recurring one. That asymmetry, a bounded one-time exit cost against an unbounded recurring stay cost set by the vendor, is exactly the calculation that shifts when the renewal letter arrives.
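The asymmetry can be put as a break-even calculation. This is a sketch under loud assumptions: every figure in it is hypothetical and invented for illustration, and it treats both annual bills as constant, which is precisely what the renewal history says the stay cost is not.

```python
def years_to_break_even(exit_cost: float, annual_stay: float, annual_alternative: float) -> float:
    """Years after which a one-time migration cost is repaid by the lower annual bill."""
    saving = annual_stay - annual_alternative
    if saving <= 0:
        raise ValueError("no annual saving, so the migration never pays for itself")
    return exit_cost / saving


if __name__ == "__main__":
    # Hypothetical figures only; none of these come from a real quote.
    print(years_to_break_even(exit_cost=300_000, annual_stay=250_000, annual_alternative=50_000))  # 1.5
```

If the stay cost rises at each renewal, the real break-even arrives sooner than this constant-cost estimate suggests.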

The Escape Route

The escape route from VMware is unusually mature, for a reason worth naming: the alternatives reached production quality at almost exactly the moment the pricing crisis hit. The timing was not planned by anyone, but it changed the negotiation.

Proxmox VE is the headline alternative. It is an open-source virtualisation platform built on KVM and QEMU for full virtual machines and LXC for containers, with clustering, live migration, software-defined storage (including Ceph) and a web interface. It is free to use with no feature gates; a support subscription is optional and starts at around 115 euro per CPU socket per year, rising through tiers to enterprise support. For the small and mid-sized estates that felt the 72-core maths most acutely, the cost delta against a VVF or VCF renewal is not subtle.

XCP-ng with Xen Orchestra is the Xen-based alternative, with a strong following in environments that prefer the Xen hypervisor and want a polished management layer. Nutanix AHV is the hyperconverged option for customers who want an integrated appliance-style platform and are willing to pay for it. Microsoft Hyper-V and Azure Stack HCI exist for Windows-centric shops, with their own lock-in characteristics to weigh. OpenStack remains the option at genuine cloud scale, with genuine operational weight to match.

And the FreeBSD stack is the quietly elegant choice for those who want their virtualisation layer to be a small, auditable part of a coherent operating system rather than a product in its own right: bhyve for virtual machines, jails for containers, ZFS for storage, all in the base system and none of it for sale to an acquirer. This is the point worth holding onto, because it is the structural answer to the whole episode: a perpetual licence can be stripped of support and security updates at the next renewal, but a base-system component is not a product anyone can buy out from under you.

For those who miss the point-and-click of a Proxmox dashboard, Sylve puts bhyve and jails under one Proxmox-style web interface, written in Go and SvelteKit, with clustering and scheduled backups via zelta. Sylve made its first release (v0.1.0) in early 2026 and sits at v0.2.x at the time of writing; it requires FreeBSD 15.0 or later, and it is honestly young: not yet a drop-in for an enterprise estate that needs support contracts and proven HA at scale, but already productive for homelabs and FreeBSD-first shops whose operators are comfortable on the v0.x curve. The substance is the stack, which is mature; Sylve is the convenience layer, which is new. It will not suit every estate; it suits the ones whose operators value being able to read the whole stack.

The migration path itself is the open one: export from VMware to OVF, convert disks to qcow2, import to the target. The Proxmox project ships a VMware importer that automates the bulk of this for the common case. The work is in the operational tail, not the disk images.
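The disk-conversion step of that path can be sketched with qemu-img, the standard QEMU disk tool. This is an illustration rather than a migration procedure: the directory layout and function names are assumptions, it runs as a dry run by default, and it does not touch the OVF metadata or the import into the target platform.

```python
import subprocess
from pathlib import Path


def convert_command(vmdk: Path, out_dir: Path) -> list[str]:
    """Build the qemu-img invocation that converts one VMDK disk to qcow2."""
    target = out_dir / (vmdk.stem + ".qcow2")
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2", str(vmdk), str(target)]


def convert_all(export_dir: Path, out_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Plan (and optionally run) the conversion of every VMDK in an export directory."""
    commands = [convert_command(disk, out_dir) for disk in sorted(export_dir.glob("*.vmdk"))]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for command in commands:
            # check=True stops at the first failed conversion instead of carrying on.
            subprocess.run(command, check=True)
    return commands
```

Returning the planned commands before running anything keeps the step reviewable, which suits a migration where the disk images are the easy part.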

Coda

VMware did not get worse. vSphere still moves a live machine without dropping a packet, and vMotion is still a small marvel of engineering. The product is not the issue. The issue is that the product was sold as something you own, and an acquisition you had no part in converted it into something you rent, with the rent set by a landlord who reserves the right to revise the terms and then deny having done so.

The lesson generalises well past virtualisation. A perpetual licence is a promise about the future, and a promise about the future is only as durable as the entity that made it. The moment a layer of your architecture stops being replaceable, its price stops being negotiable. The defence is not loyalty to a vendor or hostility to one; it is keeping the replaceable layers replaceable, so that the renewal letter is an invitation rather than a summons.

You owned a hypervisor. You were handed a subscription. The hypervisor, it turns out, was never the expensive part.

Read the full article on vivianvoss.net →

Causa GitHub, or: Your Editor Extensions Run as You

Vivian Voss — Thu, 21 May 2026 06:13:38 +0000

Wire Fire — Episode 02

On 18 May 2026 an attacker published a poisoned version of a popular Visual Studio Code extension. It was live for roughly eleven minutes. That was long enough to reach a GitHub employee's laptop, and from there to exfiltrate around 3,800 of GitHub's own internal source-code repositories. GitHub confirmed the breach on 20 May. This is the situation, what it means, and what to do about it.

The Breach

The timeline is short and worth stating precisely.

On 18 May, a trojanised build of Nx Console (the VS Code extension for the Nx build system, published under the identifier nrwl.angular-console) appeared on the Visual Studio Code Marketplace as version 18.95.0. It was live for approximately eleven minutes before being pulled. Eleven minutes is the number to sit with: it is the same lesson as the axios incident in Wire Fire Episode 01, where three hours of a poisoned package tagged "latest" was more than enough. A short window is not a small window when the install is automatic and the reach is global.

On 19 May, GitHub detected the intrusion. On 20 May, GitHub confirmed publicly that an employee's device had been compromised through the extension, and that the attacker had used that access to clone internal repositories. GitHub stated it had isolated the device, removed the extension, and rotated credentials within hours of detection.

The attacker is TeamPCP, tracked by Google Threat Intelligence as UNC6780, a group that specialises in supply-chain attacks against open-source security utilities and developer tooling, and that has been active across npm, PyPI and PHP package ecosystems earlier in 2026. They have claimed responsibility on underground forums and are reportedly asking more than 50,000 US dollars for the stolen material.

The Scope

Around 3,800 internal repositories were exfiltrated. GitHub reports, as of the confirmation, no evidence that customer data, enterprise accounts or user repositories were affected. That assessment may change as the investigation continues; treat it as the current state, not the final word.

The exposure surface is wider than GitHub, and that is the part worth attention. The Visual Studio Code Marketplace serves the most widely used code editor in the world. Installation is one click. There is no enforced provenance: the Marketplace does not require that the publisher prove control of the upstream project, and a name collision or a compromised publisher account can place a poisoned build in front of millions of developers in the time it takes to click "Install".

The Mechanism

Here is the part that should change behaviour, stated in plain terms.

A VS Code extension runs with the full privileges of the developer who installed it. There is no sandbox between the extension and the rest of your machine. When you open a workspace (a folder, a project) the extension's activation code runs immediately and automatically. From that moment the extension can read any file your user account can read, run any command your user account can run, and reach any credential, token or SSH key sitting on your machine.

The poisoned Nx Console build did precisely this. On activation it fetched an obfuscated payload from an external server and executed it. The payload harvested credentials and environment secrets. On an ordinary developer's machine that is bad. On a GitHub employee's machine, the harvested access was enough to clone internal repositories that the employee could legitimately reach.

Note what did not happen. No firewall was breached from outside. No server was exploited. No password was brute-forced. The attacker did not break in; the attacker was invited in, by an automatic activation of code the developer chose to trust with a single click. The front door was never touched.

The Exposure

If you installed Nx Console (nrwl.angular-console) around 18 May 2026, and specifically version 18.95.0, you should assume the machine is compromised. Rotate every credential reachable from that machine: cloud tokens, registry tokens, SSH keys, API keys, anything in your environment or your credential store. Revoke and reissue, do not merely change. Check for unexpected outbound network connections and review recent repository access.

For everyone, whether or not you touched this specific extension, the operative steps are the same and they generalise:

Pin extension versions where your editor allows it, and disable automatic updates for extensions. Auto-update is the mechanism that turns one poisoned build into thousands of compromised machines before anyone notices.
Audit your installed extensions and their publishers. Remove the ones you do not use. Every extension is attack surface that runs as you.
Treat a new editor extension with the same suspicion you would give a new dependency in your code. You would (one hopes) read about a new npm package before adding it to production. The extension has more privilege and runs sooner.
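The audit step can start from the editor's own command line. The sketch below assumes the VS Code CLI is on the PATH as code and relies on its --list-extensions --show-versions output, one publisher.name@version entry per line. The known-bad set holds only the single build named in this article, and the function names are illustrative.

```python
import subprocess

# Builds to flag. The one entry is the version named in the text above.
KNOWN_BAD = {("nrwl.angular-console", "18.95.0")}


def parse_extensions(listing: str) -> list[tuple[str, str]]:
    """Parse 'publisher.name@version' lines into (extension id, version) pairs."""
    pairs = []
    for line in listing.splitlines():
        line = line.strip()
        if "@" not in line:
            continue  # skip blank lines and anything that is not an extension entry
        ext_id, version = line.rsplit("@", 1)
        pairs.append((ext_id.lower(), version))
    return pairs


def flag_known_bad(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the installed extensions that match a known-bad build exactly."""
    return [pair for pair in pairs if pair in KNOWN_BAD]


def installed_extensions() -> list[tuple[str, str]]:
    """Ask the VS Code CLI for installed extensions and their versions."""
    result = subprocess.run(
        ["code", "--list-extensions", "--show-versions"],
        capture_output=True, text=True, check=True,
    )
    return parse_extensions(result.stdout)
```

A clean result here says only that the flagged build is not installed now; it says nothing about whether it was installed and has since been removed, so the credential rotation advice stands regardless.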

On FreeBSD, the structural fix has a name. Capsicum is a capability-mode sandboxing framework: a process can drop into capability mode and then operate only on the file descriptors and resources it was explicitly handed, with no ambient authority to open new files, make new connections, or reach the wider system. An editor built on that model could run an extension in a box that holds only what the extension actually needs. Editors are not built that way yet, on any platform. The capability is in the kernel; the application has not asked for it.

The Pattern

The editor is now part of the supply chain, and that is the structural news under the GitHub headline.

For two years the industry has been learning, expensively, that npm install runs arbitrary code with your privileges, that the registry is an attack surface, that a dependency you did not write and cannot read is running on your machine. Wire Fire Episode 01 covered the npm side of this in detail. The lesson is now well established for package registries.

It is exactly as true for editor extensions, and almost nobody treats it that way. The extension marketplace is a package registry by another name: arbitrary code, published by parties you have not vetted, installed with one click, activated automatically, running with your full privileges. It has all the supply-chain risk of npm and less of the scrutiny, because the install does not feel like adding a dependency. It feels like configuring your editor.

For a decision-maker, the translation is direct: editor extensions are production dependencies and belong in your software-supply-chain policy. If your organisation reviews and pins npm packages but lets developers install any VS Code extension with one click, you have secured the front door and left every window open. GitHub, of all organisations, demonstrated the cost this week. The marketplace is a registry now. It is not being watched like one.

The home of the world's source code was read through a plugin to the world's most popular editor. Both, as it happens, belong to the same company. The call came from inside the toolchain.

By Vivian Voss, System Architect and Software Developer. Follow me on LinkedIn for daily technical writing.

The Unit That Crossed a Boundary: Mars Climate Orbiter, 1999

Vivian Voss — Wed, 20 May 2026 07:52:59 +0000

Tales from the Bare Metal — Episode 04

23 September 1999. The Mars Climate Orbiter fires its main engine to enter orbit around Mars, passes behind the planet as planned, and is never heard from again. The spacecraft cost 193 million dollars to build, part of a 327.6 million dollar mission. It travelled 670 million kilometres across nine months of deep space, and it was lost to a number with no unit written on it.

This is the cleanest example in the engineering record of why a quantity is not the same thing as a number.

The Incident

The Mars Climate Orbiter launched on 11 December 1998. Its job was to enter a stable orbit around Mars and study the planet's atmosphere, also serving as a communications relay for the Mars Polar Lander that would follow.

The mission proceeded normally for nine months. On 23 September 1999, the orbiter executed its Mars Orbit Insertion burn: a planned firing of the main engine to slow the craft enough for Mars' gravity to capture it. The burn was designed to place the orbiter at a closest approach of 226 km above the Martian surface, comfortably above the atmosphere.

The craft passed behind Mars, as expected, and signal was lost, as expected. It was never reacquired. Post-failure reconstruction showed that the trajectory had brought the orbiter to approximately 57 km above the surface, deep within the atmosphere. A spacecraft built for the vacuum of orbit does not survive atmospheric entry at orbital speed. It either burned up or was torn apart; either way, it was gone.

The Mars Climate Orbiter Mishap Investigation Board released its Phase I report on 10 November 1999. The root cause was stated plainly, and it was not a hardware fault, not a navigation error in the usual sense, not a launch problem. It was a unit mismatch at a software interface.

The Diagnosis

The trajectory of a spacecraft is adjusted over its journey by small thruster firings, among them the Angular Momentum Desaturation manoeuvres. Each firing produces an impulse, and the magnitude of that impulse must be fed into the navigation software so the predicted trajectory stays accurate.

Two organisations wrote the two halves of this loop. Lockheed Martin, in Colorado, built the spacecraft and the ground software that calculated the impulse from each thruster firing. NASA's Jet Propulsion Laboratory, in California, ran the navigation software that consumed those impulse figures and computed the resulting trajectory.

Lockheed Martin's software produced the impulse in pound-force seconds. This is an imperial unit: a pound-force is the force exerted by standard gravity on a one-pound mass, and a pound-force second is that force applied for one second.

JPL's navigation software expected the impulse in newton-seconds. This is the metric (SI) unit: a newton is the force that accelerates one kilogram at one metre per second squared, and a newton-second is that force applied for one second.

One pound-force second equals 4.45 newton-seconds. The two numbers describe the same physical impulse, but the number representing it differs by a factor of 4.45 depending on which unit you mean.

The interface specification between the two systems required metric units. Lockheed Martin's software, for the specific file in question, produced imperial. JPL's software read the imperial numbers as though they were metric, and so every impulse was interpreted as 4.45 times smaller than it actually was. The trajectory corrections were therefore systematically wrong, in the same direction, for nine months. The error accumulated until the predicted 226 km insertion altitude was, in reality, 57 km.
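The misreading is small enough to reproduce in a few lines. This is a sketch with a made-up impulse value; only the conversion factor is real, and the names are illustrative.

```python
LBF_S_TO_N_S = 4.4482216152605  # newton-seconds in one pound-force second


def to_newton_seconds(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


if __name__ == "__main__":
    produced_lbf_s = 10.0                      # hypothetical value written by the producer
    true_n_s = to_newton_seconds(produced_lbf_s)
    misread_n_s = produced_lbf_s               # same digits, assumed to be newton-seconds already
    print(round(true_n_s / misread_n_s, 2))    # 4.45
```

Nothing in the bare float distinguishes the two readings, which is the whole problem: the consumer's assumption is the only place the unit exists.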

The single most important sentence in the whole account is this: the number was correct. The software did not miscalculate. The bits that crossed the interface were the right bits. What was missing was the label that said what those bits meant, and each side filled in the missing label with its own assumption.

The Context

A unit mismatch sounds like the kind of thing that should be caught in an afternoon. The interesting question is not how the mistake was made, but how it survived nine months and 670 million kilometres without being caught. Three conditions allowed it.

First, the interface specification existed and was clear: it required metric. The imperial output was a deviation from spec. But nothing enforced the specification at runtime. The specification was a document, not a check. A document does not stop a wrong number; it only assigns blame after the wrong number has done its work.

Second, the discrepancy was visible before the loss. Navigators at JPL noticed during the cruise that the trajectory was not behaving quite as the models predicted; small corrections were needed more often than expected. The signal was there in the data, months before arrival. It was discussed informally but never escalated into a formal anomaly investigation, partly because each individual correction was within tolerance and the cumulative drift looked like ordinary navigation noise until it was too late.

Third, there was no end-to-end test. No test ran the Lockheed Martin impulse calculation and the JPL trajectory calculation against the same manoeuvre and compared the result to an independent reference. Each system was tested in isolation and behaved correctly in isolation. The fault lived only in the handoff between them, which is precisely the region that isolated unit tests do not cover.

These three conditions recur in almost every interface failure. The spec is a document, not a check; the warning signal is visible but below the escalation threshold; the test coverage stops at the boundary rather than crossing it.

The Principle

Science settled the question of units a century ago. The Système International (SI) is the one coherent measurement system the entire scientific and engineering world shares, precisely so that a number computed by one team means the same thing to another. The Mars Climate Orbiter was lost because one half of the programme had not honoured that settlement: it still computed in pound-force seconds, an imperial unit with no place in serious engineering work.

The lesson is direct. In scientific and engineering contexts, measure in metric. There is no defence for imperial units in work where a factor of 4.45 decides whether a spacecraft enters orbit or enters the ground. Every serious laboratory, the United States included, standardised on SI for exactly this reason, and the orbiter is the canonical demonstration of what the alternative costs. Imperial units are a regional convention for everyday life; they are not a tool for computing trajectories, and the moment a programme treats them as one, it has built a 4.45 into its maths and dared the universe to find it.

There is a second guard worth stating, because metric alone is not sufficient. Two systems both using metric can still fail if one means seconds and the other milliseconds, if one means metres and the other kilometres, if one means bytes and the other kibibytes. So the unit must also travel with the number, enforced where the two systems meet:

Types the compiler checks. Rust newtypes let you define a NewtonSeconds(f64) distinct from a PoundForceSeconds(f64); the compiler refuses to pass one where the other is expected. F# has units of measure built into the type system: a value typed float<N*s> cannot be assigned to a float<lbf*s> without an explicit conversion. The mistake becomes a compile error, which is the cheapest place a mistake can possibly be caught.
Fields the parser demands. A data interchange format that requires the unit as a mandatory field (not an optional comment) makes the bare number unrepresentable. You cannot serialise the quantity without stating what it is.
Names the variable carries. The oldest discipline, available in any language: never write timeout = 30; write timeout_ms = 30 or timeout = Duration::from_secs(30). The unit lives in the name where the next reader cannot miss it.
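The first two guards can be sketched in Python, with one plain caveat: Python does not reject a wrong type at compile time the way Rust or F# does, so the check below happens at runtime (a static checker such as mypy can catch the same mistake earlier). The class names, the record layout and the unit strings are all assumptions made for the example.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # newton-seconds in one pound-force second


@dataclass(frozen=True)
class NewtonSeconds:
    value: float


@dataclass(frozen=True)
class PoundForceSeconds:
    value: float

    def to_newton_seconds(self) -> NewtonSeconds:
        """The only way across the unit boundary is an explicit conversion."""
        return NewtonSeconds(self.value * LBF_S_TO_N_S)


def apply_impulse(impulse: NewtonSeconds) -> float:
    """Hypothetical consumer: accepts only NewtonSeconds, returns the raw N*s value."""
    if not isinstance(impulse, NewtonSeconds):
        raise TypeError(f"expected NewtonSeconds, got {type(impulse).__name__}")
    return impulse.value


def parse_impulse(record: dict) -> NewtonSeconds:
    """Parse a record whose 'unit' field is mandatory; a bare number is rejected."""
    if "unit" not in record:
        raise ValueError("impulse record has no 'unit' field")
    value = float(record["value"])
    if record["unit"] == "N*s":
        return NewtonSeconds(value)
    if record["unit"] == "lbf*s":
        return PoundForceSeconds(value).to_newton_seconds()
    raise ValueError(f"unknown unit: {record['unit']!r}")
```

Passing a PoundForceSeconds to apply_impulse fails loudly instead of being read as a smaller newton-second figure, and a record without a unit never becomes a quantity at all.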

The first rule needs no tooling: in science, measure in metric. The second rule catches what the first cannot: name the unit at every boundary regardless. The orbiter needed both, and had neither.

On FreeBSD this discipline is visible in the small, ordinary places. dd bs=1M states the block size with its unit; the bare number would be ambiguous. sysctl values are documented with their units, and the tunables in loader.conf name their dimensions. The convention across the base system is that a number with a physical meaning is rarely written naked. This is not glamorous and it has prevented an enormous number of quiet disasters.

The boundary is where the unit must be loudest, because the boundary is exactly where two assumptions meet and discover they disagree.

Where It Travels

The Mars Climate Orbiter is a spacecraft, which makes the failure feel exotic. It is not exotic. The same failure ships in ordinary software every day:

Every API that passes a duration as a bare integer. Is 30 seconds or milliseconds? A call such as setTimeout(30) does not say, and the two readings differ by a factor of a thousand.
Every configuration value that takes a size without a suffix. Is cache_size = 1000 bytes, kilobytes, or entries?
Every function signature with a parameter named distance, interval, rate or size and no unit anywhere in the type or the name.
Every CSV handed from one team's export to another team's import, where column 7 is "amount" and nobody agreed whether it is gross or net, dollars or cents.
Every number that means one thing on the left of an interface and another thing on the right, because the interface carried the number faithfully and the meaning not at all.
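Two of the cases above, the bare duration and the size without a suffix, have cheap fixes in most languages. A minimal Python sketch follows; set_timeout and parse_size are hypothetical functions written for this example, and the accepted suffix table is an assumption, not any real configuration format.

```python
from datetime import timedelta

# Accepted size suffixes: decimal (SI) and binary (IEC) side by side.
_SIZE_UNITS = {
    "B": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}


def set_timeout(timeout: timedelta) -> float:
    """Hypothetical API: takes a timedelta, so the caller has to state the unit."""
    if not isinstance(timeout, timedelta):
        raise TypeError("timeout must be a datetime.timedelta, not a bare number")
    return timeout.total_seconds()


def parse_size(text: str) -> int:
    """Parse sizes written as '<integer> <unit>'; a number without a unit is an error."""
    parts = text.split()
    if len(parts) != 2 or parts[1] not in _SIZE_UNITS:
        raise ValueError(f"size needs an explicit unit, got {text!r}")
    return int(parts[0]) * _SIZE_UNITS[parts[1]]
```

With these signatures, set_timeout(timedelta(seconds=30)) and set_timeout(timedelta(milliseconds=30)) cannot be confused at the call site, and parse_size("1000") is refused rather than guessed at.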

The Same Mistake, Smaller: Gigabyte and Gibibyte

The unit mismatch does not even need two systems as different as imperial and metric. It hides inside a single system, in the gap between decimal and binary.

A gigabyte, by the SI definition, is 1,000,000,000 bytes; "giga" is the standard decimal prefix for a billion, the same prefix as in gigahertz or gigawatt. A gibibyte, defined by the IEC in 1998, is 1,073,741,824 bytes: two to the power of thirty, the nearest binary round number. The two differ by about 7.4%, and the gap widens with every step up the ladder: terabyte versus tebibyte is about 10%, petabyte versus pebibyte about 12.6%.

The disk industry sells in decimal. A "1 TB" drive holds 1,000,000,000,000 bytes, exactly as labelled. Many operating systems, Windows most stubbornly, then display that capacity in binary while still calling the result "GB". The drive shows up as roughly 931 "GB", the customer concludes they have been short-changed by seven per cent, and a support ticket is born. Nobody has lied; two definitions of the same word have simply met at a boundary without introducing themselves.

It is the Mars Climate Orbiter with the stakes turned down from a spacecraft to a slightly disappointing disk. The IEC defined a complete binary ladder in 1998, parallel to the decimal one at every rung: kibibyte (KiB) beside kilobyte (kB), mebibyte (MiB) beside megabyte (MB), gibibyte (GiB) beside gigabyte (GB), and on through tebibyte (TiB), pebibyte (PiB) and beyond.

The recommendation applies to the entire scale, and it is the same discipline as everywhere else in this story. If you mean the binary value, write KiB, MiB, GiB, TiB, which on a computer is what you mean most of the time. If you mean the decimal value, write kB, MB, GB, TB. Never write one while meaning the other. The standard is correct, unambiguous, and has been available for over a quarter of a century. The only thing still keeping GB where GiB belongs is habit, and habit is precisely what put pound-force seconds where newton-seconds were expected.

A 193-million-dollar spacecraft was lost because one team did its sums in imperial units. The number was right. The unit was from the wrong century.

The lesson costs nothing to apply. In science and engineering, measure in metric; the rest of the world, and most of science within the imperial holdouts themselves, settled this long ago for exactly the reason the orbiter demonstrates. And where two systems must still meet, name the unit of every number that crosses between them, so that the compiler or the variable name holds what no shared assumption ever safely will.
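One way to let the type carry the unit is sketched below. It is an illustration under stated assumptions, not a reconstruction of any flight software: the class names and the apply_impulse consumer are hypothetical, and only the conversion factor (one pound-force is 4.4482216152605 newtons) is a fixed fact.

```python
from dataclasses import dataclass

LBF_TO_NEWTON = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class NewtonSeconds:
    value: float


@dataclass(frozen=True)
class PoundForceSeconds:
    value: float

    def to_si(self) -> NewtonSeconds:
        """Convert explicitly; the unit changes only where someone wrote it down."""
        return NewtonSeconds(self.value * LBF_TO_NEWTON)


def apply_impulse(impulse: NewtonSeconds) -> float:
    """Hypothetical consumer that accepts SI impulse and nothing else."""
    if not isinstance(impulse, NewtonSeconds):
        raise TypeError(f"expected NewtonSeconds, got {type(impulse).__name__}")
    return impulse.value


reading = PoundForceSeconds(10.0)
print(apply_impulse(reading.to_si()))  # converted at the boundary, by name
```

A static type checker would reject apply_impulse(reading) before the program runs; the isinstance check is only a runtime backstop for callers that skip the checker.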

Read the full article on vivianvoss.net →

By Vivian Voss, System Architect and Software Developer. Follow me on LinkedIn for daily technical writing.

grep: An Hour at Bell Labs in 1973

Vivian Voss — Tue, 19 May 2026 07:29:50 +0000

Technical Beauty — Episode 36

A development server holds a mystery. Someone deployed something, the logs hold the truth. One types grep -i timeout /var/log/messages and three lines admit what happened. The command was unremarkable. The thing that made it possible has been answering that question since November 1973, and the architecture inside has held that whole time.

The Hour

The story is well-known in Unix circles but bears retelling because it remains the cleanest example of what a productive afternoon at Bell Labs in 1973 looked like.

Doug McIlroy, who ran the research department at Bell Labs and is the man behind the Unix pipeline as a concept, came to Ken Thompson with a request. Lee E. McMahon, also at Bell Labs, wanted to analyse the text of the Federalist Papers by pattern. The need was specific and mundane: search a body of text for occurrences of certain words and word-classes. The only tool available was ed's g/re/p command (global, regular expression, print), which printed every line of a file matching a regex. It worked, but only inside ed.

McIlroy asked Thompson to extract the command into a standalone tool. Thompson, by Brian Kernighan's later telling (in "The Unix Programming Environment" and elsewhere), disappeared into his office for about an hour and came back with grep. The name was the ed command turned into a single word. The interface was the same regex language that ed users already typed daily.

That hour produced a tool that, fifty-three years later, ships in the base system of every Unix-derived operating system in production. It is also, on any reasonable telling, the tool that taught a generation of working programmers what regular expressions were for.

The Surface

The surface of grep is famously austere.

grep pattern files

One pattern, n files, print the matching lines. The pattern is a regular expression in the grep dialect (POSIX BRE by default, POSIX ERE with -E, fixed strings with -F, Perl-compatible with -P where the implementation includes it). Three flags carry the bulk of daily use:

grep ERROR /var/log/messages              # case-sensitive, basic regex
grep -i timeout /var/log/messages         # case-insensitive
grep -v INFO /var/log/messages            # invert: print non-matching
grep -r "TODO" src/                       # recurse into directories
ps aux | grep -v grep | grep nginx        # composed through a pipe

The output is text on stdout: matching lines, one per line, optionally preceded by filename (-H) and line number (-n). The compositional fit with pipes is what made grep central to Unix from its first day. grep | awk | sort | uniq is not a script someone wrote; it is a sentence in the language Unix happens to be.
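The contract (one pattern, a stream of lines, matching lines out) is small enough to model in a few lines. The sketch below is a toy under stated assumptions: it uses Python's re dialect rather than POSIX BRE, and the grep function and the sample log are invented for illustration.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str], ignore_case: bool = False,
         invert: bool = False) -> Iterator[str]:
    """Yield the lines that match (or, with invert, the lines that do not)."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for line in lines:
        matched = regex.search(line) is not None
        if matched != invert:
            yield line


log = [
    "INFO service started",
    "ERROR connection Timeout after 30s",
    "INFO heartbeat",
]
print(list(grep("timeout", log, ignore_case=True)))  # like grep -i timeout
print(list(grep("INFO", log, invert=True)))          # like grep -v INFO
```

Because the function consumes and produces plain iterables of lines, calls can be nested the way the shell chains commands through a pipe.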

Over the years the variant tools merged. egrep (extended regex with +, ?, alternation) became grep -E. fgrep (fixed strings, no regex parsing) became grep -F. pcregrep (Perl-compatible regex with lookahead and named groups) remains a separate tool shipped with the PCRE library, but its dialect is available as grep -P on implementations that include it. One tool, four dialects, single calling convention.

The Layers Underneath

The implementation is the part that does not look modest once you look at it.

Naïve regex matching is slow. The straightforward approach (backtracking through the pattern tree at every position in the input) has a worst-case runtime that grows exponentially with the length of the input for certain patterns. Production grep cannot afford that.
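The blow-up is easy to reproduce with a toy matcher. The sketch below is not any real grep's code: it is a deliberately naïve backtracking matcher that supports only literal characters and the ? operator, with a call counter added so the growth can be seen. The test pattern (n optional a's followed by n literal a's, matched against n a's) is a well-known pathological case.

```python
def backtrack_match(pattern: str, text: str, counter: list) -> bool:
    """Full-match text against pattern by naive backtracking; counts every call."""
    counter[0] += 1
    if not pattern:
        return not text
    if len(pattern) >= 2 and pattern[1] == "?":
        # Greedy: first try to consume the optional character, then try skipping it.
        if text and text[0] == pattern[0] and backtrack_match(pattern[2:], text[1:], counter):
            return True
        return backtrack_match(pattern[2:], text, counter)
    if text and text[0] == pattern[0]:
        return backtrack_match(pattern[1:], text[1:], counter)
    return False


def steps(n: int) -> int:
    """Calls needed to match 'a?' * n + 'a' * n against 'a' * n."""
    counter = [0]
    assert backtrack_match("a?" * n + "a" * n, "a" * n, counter)
    return counter[0]


for n in (4, 8, 12, 14):
    print(n, steps(n))
```

Each additional optional character roughly doubles the work, because the matcher retries every combination of taken and skipped options before it finds the one that succeeds.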

The interior of GNU grep, the most widely deployed implementation, was carried for many years by Mike Haertel (afaik the main author from the late 1980s through the 1990s and into the 2000s). His "why GNU grep is fast" note, posted to the FreeBSD mailing list in August 2010, is one of the cleanest pieces of performance-engineering writing one will encounter. He states the principle: "The key to making programs fast is to make them do practically nothing."

The mechanics he applied:

Boyer-Moore for fixed strings. When the pattern contains no regex metacharacters, GNU grep uses the Boyer-Moore algorithm (Robert Boyer and J Strother Moore, 1977). Boyer-Moore is sub-linear on average: it skips characters in the input it has already determined cannot start a match. On a typical search, this is the single largest performance win.
Two-way string matching for harder cases. When fixed-string matching does not apply but the pattern has a fixed prefix or suffix, a two-way string matcher (Crochemore and Perrin, 1991) handles the harder cases without falling back to general regex.
Thompson's NFA construction. For full regex, GNU grep falls back to the NFA-construction algorithm Thompson himself published in 1968 (CACM, "Regular Expression Search Algorithm"). The NFA is built once and run as a deterministic state machine over the input, avoiding the exponential blow-up that naïve backtracking allows.
Raw input, and mmap where it helped. GNU grep reads with raw system calls into a large buffer rather than through stdio, and searches that buffer in place instead of splitting it into lines first. In Haertel's day it also mmapped the input file, so that the kernel paged data in on demand with no userland copy; later releases dropped mmap in favour of plain read().
Avoid the input where possible. GNU grep does not look at every byte. The Boyer-Moore skip table lets it advance by up to the pattern length on a mismatch; on a typical English-text input with a four- or five-character pattern, the inner loop touches a small fraction of the file.
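The skipping idea can be sketched with the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. This is an illustration of the principle, not GNU grep's implementation; the function name and the sample input are made up.

```python
def horspool_search(haystack: bytes, needle: bytes):
    """Return (index of first match or -1, number of window alignments tried)."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0, 0
    # For each byte of the needle except its last position: how far the window
    # may slide when that byte is seen under the window's final position.
    shift = {byte: m - 1 - i for i, byte in enumerate(needle[:-1])}
    pos = 0
    alignments = 0
    while pos <= n - m:
        alignments += 1
        if haystack[pos:pos + m] == needle:
            return pos, alignments
        # A byte that never occurs in the needle allows a full-length skip.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1, alignments


text = b"x" * 1000 + b"timeout"
index, tried = horspool_search(text, b"timeout")
print(f"match at {index}, {tried} alignments for {len(text)} bytes")
```

On input that rarely contains the needle's bytes, the window jumps by nearly the full pattern length each time, which is why longer patterns tend to search faster than shorter ones.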

The result is a tool that, on modern commodity hardware, scans several gigabytes of plain text per second. The 1973 surface and the late-1980s interior happen to fit together such that the workflow Thompson designed in an hour is still the workflow one wants in 2026.

On FreeBSD

FreeBSD has carried bsdgrep in its source tree since around 2010 (afaik), a BSD-licensed reimplementation that went on to replace the GNU-licensed version as the default grep in the base system (with 13.0-RELEASE, afaik). The reason was licence: the FreeBSD project preferred to keep its base userland under BSD licences for the same reasons of coexistence that have applied to FreeBSD's licence philosophy since the project's earliest days.

bsdgrep is a leaner implementation than GNU grep. It does not aim to match GNU grep's last-mile performance optimisations on every input class; it aims for correctness, POSIX compliance, and small footprint. For the typical workloads on a FreeBSD system (log scanning, build-system uses, ports infrastructure), the difference is invisible. For the workload of scanning a multi-gigabyte corpus repeatedly, GNU grep is one pkg install gnugrep away.

The FreeBSD path to grep is therefore: /usr/bin/grep is bsdgrep, available without installing anything, on every FreeBSD system from a minimal install upward. OpenBSD and NetBSD follow the same pattern. On Linux, GNU grep is the default; /bin/grep and /usr/bin/grep point to it on the major distributions.

The Lineage

The shape — pattern, file, matching lines on stdout — has stayed identical for fifty-three years. The modern descendants reproduce it exactly.

ack arrived in 2005, written by Andy Lester (afaik) in Perl. Its primary differences from grep were ergonomic: by default it skipped version-control directories, build artefacts and binary files, and it searched recursively. The interface was grep's.

ag, the silver searcher, arrived in 2011 from Geoff Greer (afaik), written in C. ag's primary differences from ack were performance: ack was Perl and slow; ag was C and fast. The interface was still grep's.

ripgrep arrived in 2016 from Andrew Gallant (BurntSushi on GitHub), written in Rust. ripgrep honours .gitignore by default, uses SIMD instructions for fast searching, and is competitive with or faster than GNU grep on most workloads. The interface was, once again, still grep's.

The pattern across these descendants is worth naming. Each was written by a developer who looked at grep and concluded that it was the right shape, that the only thing worth changing was the implementation. Thompson's original choice (the regex grammar from ed, the line-oriented output, the pipe-as-default-output) survived three implementation rewrites across nearly two decades without anyone proposing a different interface. That is what an interface looks like when its designer was paying attention.

A Note on Regular Expressions

Stephen Cole Kleene formalised regular sets in 1956 as a mathematical object: the sets of strings that a finite-state machine can recognise. The construction was not intended for programmers; it was a piece of theoretical computer science.

Thompson, in 1968, wrote the first practical implementation in his CACM paper. He showed that an NFA could be built directly from a regex syntax, that the NFA could be simulated efficiently, and that this gave a fast pattern-matching procedure suitable for an editor. The QED editor he was working on at the time was the first user of this construction; ed inherited it; grep extracted it.
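The core of the simulation idea is to track the set of all states the automaton could be in, advancing the whole set one input character at a time. The sketch below is a heavily simplified illustration, not Thompson's published construction: it supports only literal characters, the . wildcard, and the ? and * operators on single characters, and all names are invented.

```python
def tokenize(pattern: str):
    """Split a pattern into (symbol, quantifier) pairs; quantifier is '', '?' or '*'."""
    tokens, i = [], 0
    while i < len(pattern):
        quant = ""
        if i + 1 < len(pattern) and pattern[i + 1] in "?*":
            quant = pattern[i + 1]
            i += 1
        tokens.append((pattern[i - 1] if quant else pattern[i], quant))
        i += 1
    return tokens


def closure(tokens, states):
    """Add every state reachable without input, by skipping optional tokens."""
    result, stack = set(), list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(tokens) and tokens[s][1] in ("?", "*"):
            stack.append(s + 1)
    return result


def nfa_fullmatch(pattern: str, text: str) -> bool:
    """State s means: tokens before s are matched. Each character is read once."""
    tokens = tokenize(pattern)
    current = closure(tokens, {0})
    for ch in text:
        following = set()
        for s in current:
            if s == len(tokens):
                continue
            symbol, quant = tokens[s]
            if symbol in (".", ch):
                following.add(s if quant == "*" else s + 1)
        current = closure(tokens, following)
        if not current:
            return False
    return len(tokens) in current


n = 25
print(nfa_fullmatch("a?" * n + "a" * n, "a" * n))
print(nfa_fullmatch("ab*c", "abbbc"), nfa_fullmatch("ab*c", "abd"))
```

The work per input character is bounded by the number of states, so the running time grows with pattern length times input length and never with the number of possible paths.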

The pedagogical consequence has been quietly enormous. A working programmer in 2026 who knows regex learnt it because grep was the first place they typed one. The pattern language travelled from ed to grep to sed to awk to Perl to JavaScript to every modern programming environment. Kleene's mathematical object became a working tool because Thompson saw, in 1968, what one could do with it, and because grep, in 1973, made it the default search interface for the working programmer.

The Aphorism

A man wrote, in about an hour, a tool that taught a generation of programmers what regular expressions were for. In the years on either side he also co-created Unix, designed B (the direct ancestor of Dennis Ritchie's C), and co-designed UTF-8 with Rob Pike. One rather suspects he was paying attention to which tools the world would still need fifty years later.

The Federalist Papers got their pattern analysis. The world got grep. Lee McMahon, on the available record, has been somewhat under-credited; one rather hopes he was pleased.

DTrace vs eBPF: The Twelve-Year Reconstruction

Vivian Voss — Mon, 18 May 2026 07:11:29 +0000

The Unix Way — Episode 17

A production server is slow. The senior engineer wants to know which syscall is blocking, on which thread, for how long, without restarting the service. The honest answer in 2026 is: that question has had a clean answer on Solaris since 2005, on FreeBSD since 2009, and on Linux since 2018. The fifteen-year gap between DTrace's first working implementation in 2003 and its Linux equivalent is the story.

FreeBSD: DTrace (In Base Since 2009)

Bryan Cantrill, Mike Shapiro and Adam Leventhal designed DTrace at Sun Microsystems beginning in 2001 and shipped a working implementation in November 2003. Their USENIX 2004 paper, Dynamic Instrumentation of Production Systems, is still the canonical reference. Solaris 10 made DTrace generally available in January 2005.

The principle was rather radical for its time. Existing tracing tools (truss, strace, ptrace) interposed on the traced process; they either stopped it at each syscall or used signals to interrupt it, and either way the act of measurement disturbed what was being measured. DTrace took the opposite approach: instrument the kernel itself with passive probes, fire the probes only when armed, and never block the path of the traced code. The cost of an unarmed probe was zero; the cost of an armed probe was a small constant.

To make this safe in production, the team did something rather elegant: they made the DTrace script language deliberately Turing-incomplete. No unbounded loops, no recursion, no dynamic memory allocation in the probe action. The compiler could prove that every probe terminated within a bounded time and consumed a bounded amount of memory before loading it into the kernel. A misused script could not crash production; it would be rejected at compile time or terminate cleanly at the bound.

DTrace was released under the Common Development and Distribution License (CDDL) as part of OpenSolaris in 2005. John Birrell of the FreeBSD project ported DTrace to FreeBSD, where it landed in 7.1-RELEASE on 6 January 2009. The port covered the kernel provider, the userland USDT (User-Statically-Defined-Tracing) probes, and the full D script language. On a modern FreeBSD host, the number of installed probes runs into the tens of thousands; one enumerates them with dtrace -l.

A complete one-liner that counts open syscalls by process:

dtrace -n 'syscall::open*:entry { @[execname] = count(); }'

Press Control-C and DTrace prints the aggregation: one count per process name. The traced processes did not pause; the production load did not skip; the open call observed was the actual open call, not a replay.
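What the aggregation in that one-liner computes can be modelled in ordinary code. The sketch below is only an analogy: the event list is a hypothetical stand-in for probe firings, and nothing here talks to a kernel.

```python
from collections import Counter

# Hypothetical stream of (execname, syscall) pairs standing in for probe firings.
events = [
    ("nginx", "openat"),
    ("sshd", "read"),
    ("nginx", "open"),
    ("cron", "openat"),
]

counts = Counter()
for execname, syscall in events:
    if syscall.startswith("open"):  # the probe description syscall::open*:entry
        counts[execname] += 1       # the action @[execname] = count();

# Print keys in ascending order of count.
for execname, total in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{execname:10} {total}")
```

The difference in the real tool is where this runs: the counting happens inside the kernel at the probe site, and only the finished table crosses into userland.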

Linux: The Licence Wall

The Linux kernel could not adopt DTrace upstream. The structural cause is worth naming clearly, because the popular framing reverses it.

The CDDL is a file-level weak copyleft licence: it requires that CDDL files remain CDDL, but it does not demand that other files in the same project be relicensed. CDDL accepts coexistence with files under any other licence. The GPL is strong copyleft: any work that combines with GPL code must, as a whole, be distributed under the GPL. It tolerates permissive neighbours such as BSD- or MIT-licensed files, because their terms can be carried inside a GPL whole, but it does not accept coexistence with any licence that keeps conditions of its own, and the CDDL does.

The two are not symmetric. The CDDL would have accepted Linux as a neighbour without conditions. The GPL cannot accept a copyleft neighbour unless that neighbour relicenses itself to GPL. The block was not in the CDDL's behaviour; it was in the GPL's design. One could equally describe the GPL as a licence that absorbs every other licence it touches.

This was not a hypothetical. Sun (and, for years afterwards, Oracle) made clear that DTrace would not be relicensed to GPL. The Linux community concluded that integrating DTrace into the mainline kernel was not legally possible unless the combined work was distributed under the GPL, which the CDDL code could not be. Oracle has maintained a loadable kernel module port for some years, but in its original form it required a CDDL kernel module to be loaded into a GPL kernel, a configuration that most distributions declined to ship out of caution about the GPL's reach. Oracle did eventually relicense its Linux DTrace kernel code under the GPL in 2018 (afaik), by which time eBPF was already in the mainline kernel.

The Linux answer was therefore not a port. The Linux answer was a rebuild, starting from a substrate that had existed since 1992.

Linux: eBPF, Built From BPF

Steven McCanne and Van Jacobson at Lawrence Berkeley Laboratory introduced BPF, the Berkeley Packet Filter, in 1992 (afaik; the design was first described at USENIX in 1993). BPF was originally a small in-kernel virtual machine used by tcpdump to filter packets without copying every packet up to userland. The kernel ran a small bytecode program against each packet header; only matching packets were forwarded. The design was simple, fast and provably safe: the bytecode allowed only forward jumps, so a filter could never loop and always finished in a bounded number of steps.

For two decades BPF remained a packet-filter substrate. Then, in 2013-2014, Alexei Starovoitov (then at PLUMgrid, now Meta) and Daniel Borkmann (then at Cisco, now Isovalent) rewrote it. The result was eBPF: a more general virtual machine with a register-based instruction set, broader data types, kernel-side maps for state, and verifier guarantees borrowed in spirit from DTrace. The eBPF rework was accepted by David Miller into the networking tree in March 2014 and shipped in Linux 3.18 on 7 December 2014.

The verifier is the safety-critical piece. Before loading, the kernel walks every possible path of the eBPF program and proves that the program terminates, that all memory accesses are bounded, that all pointers are valid for their context, that no loop runs unboundedly. The proof is, on its terms, more elaborate than DTrace's Turing-incomplete language because eBPF chose a more powerful instruction set. The result is the same in production: a misused script cannot crash the kernel.
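The principle of proving termination before loading can be shown with a toy. The sketch below is emphatically not the kernel's verifier, which tracks register state along every path; it applies the much simpler rule of forward-only jumps to a made-up four-opcode instruction set, which is already enough to guarantee that a program ends.

```python
from typing import List, Tuple

Instr = Tuple[str, int]  # (opcode, operand); a hypothetical instruction set


def verify(program: List[Instr]) -> Tuple[bool, str]:
    """Accept only programs that provably terminate within len(program) steps."""
    n = len(program)
    if n == 0:
        return False, "empty program"
    for pc, (op, arg) in enumerate(program):
        if op in ("jmp", "jeq"):
            # Jump targets are relative to the next instruction.
            if arg < 0:
                return False, f"backward jump at {pc}"
            if pc + 1 + arg >= n:
                return False, f"jump past the end at {pc}"
        elif op not in ("ld", "add", "ret"):
            return False, f"unknown opcode {op!r} at {pc}"
    if program[-1][0] != "ret":
        return False, "execution can fall off the end"
    return True, "ok"


good = [("ld", 0), ("jeq", 1), ("add", 1), ("ret", 0)]
looping = [("ld", 0), ("add", 1), ("jmp", -2), ("ret", 0)]
print(verify(good))
print(verify(looping))
```

Because every accepted instruction moves the program counter strictly forward and the last instruction returns, no accepted program can run for more steps than it has instructions. The rejection happens at load time, before anything executes.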

User-facing tracing tools came in the years that followed.

The BPF Compiler Collection (BCC), led by Brenden Blanco at IO Visor with substantial contributions from Brendan Gregg and many others, arrived in 2015. BCC let one write tracing scripts in Python with embedded C, which BCC compiled to eBPF bytecode in user space (through LLVM) and then loaded into the kernel. The ergonomics were better than raw eBPF assembly; they were still not DTrace.

bpftrace was the closer match. Alastair Robertson (then a researcher) wrote the initial implementation; Brendan Gregg announced it in October 2018 as "DTrace 2.0 for Linux". The bpftrace script language deliberately echoes the D language: probe specification, action block, aggregations, histograms, stack traces. A bpftrace one-liner counting open syscalls by process:

bpftrace -e 'tracepoint:syscalls:sys_enter_open* { @[comm] = count(); }'

The same answer as the DTrace example above. The tool name and the probe syntax are different; the shape is identical.

The Bridge: Brendan Gregg

The biographical detail is worth naming. Brendan Gregg had been one of the most prolific DTrace authors at Sun and Joyent. He co-wrote DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD with Jim Mauro in 2011. He moved to Netflix in 2014, where the production fleet runs Linux. He spent the following decade rebuilding, on Linux, the observability he had taken for granted on Solaris. His 2019 book BPF Performance Tools: Linux System and Application Observability is the spiritual sequel to the DTrace book; the same author, the same shape of question, two ecosystems, fifteen years apart.

His public reflection has been measured. In 2018, on announcing bpftrace, he wrote that the goal was not to compete with DTrace but to deliver, on Linux, the production-tracing experience that DTrace had defined. The Linux engineering community has not generally disputed the framing.

The Point

The Unix way is to expose the system to the operator and trust them to ask honest questions. DTrace was the canonical answer in 2003. The shape was: instrument every probe, run on demand, prove safety at load, return aggregated results.

Linux shipped the equivalent shape in 2018, fifteen years later. The engineering on Linux was not harder than the engineering on Solaris in 2003; the verifier work was more elaborate because eBPF chose a more powerful instruction set, but the safety guarantee is comparable. The journey was longer for a reason that had nothing to do with engineering: the original work was released under a licence the Linux kernel could not accept, the rebuild had to start from a different substrate, and the user-facing tools had to be written essentially from scratch.

Two practical takeaways for a working engineer in 2026:

First, on FreeBSD the tracing answer has been settled for the better part of two decades. dtrace -l enumerates the probes; the D language is small enough to learn in a weekend; the production safety is, on the architectural record, the strongest of any tracing system in active use.

Second, on Linux the answer has finally settled too. bpftrace is the closest match to D in syntax and intent; BCC remains useful for richer scripts; perf, ftrace and the underlying tracepoints continue to coexist for narrower use cases. The shape of "ask the kernel anything in production" is now a settled Linux capability, and Brendan Gregg has, between two books, written most of the documentation one will need.

The longer point is not which tracer is better. The longer point is that an architectural idea, however clearly demonstrated, can be delayed by fifteen years on a neighbouring system if the receiving licence refuses coexistence with the giving one. The shape was always the same. The journey was a great deal longer.

Why We Containerise Everything

Vivian Voss — Sun, 17 May 2026 08:14:06 +0000

On Second Thought — Episode 08

A new service is started. The README is written, then the Dockerfile. Within the hour the team is discussing the registry, the orchestrator, the sidecar and the helm chart. Nobody quite remembers when this became the second decision after the first commit. The container is the unit; everything else is paperwork.

This essay is about how the second decision became the first one.

The Axiom

The reflex is universal. Every new service ships in its own container. "Where does this run?" has one answer now: in its container. The container is no longer a deployment artefact but the unit of thought. Architecture diagrams are drawn in rectangles labelled with image names. Job postings list Docker and Kubernetes alongside the programming language, as if they were peers of equal weight.

The reflex is so well-trained that the question "could this run as a single process on a host" sounds almost rude. It sounds, to a sprint-shaped team in 2026, the way "could we just use a CSV file" sounds to a database team. Technically defensible, professionally suspect.

That feeling is the topic.

The Origin

Three currents converged. None of them, on its own, demanded that everything be containerised. Their convergence did.

The Isolation Current

The isolation question was answered, rather elegantly, in 1999. Poul-Henning Kamp, working for R&D Associates in Denmark, needed safe multi-tenancy on a single FreeBSD host. The traditional Unix answer was chroot, which constrains the filesystem view but not much else: a determined process inside a chroot can still see other processes, bind privileged ports, manipulate the network stack, and find its way out through any number of documented escapes. Kamp wrote Jails: kernel-native isolation that confined not only the filesystem view but the process namespace, the network stack, the user IDs, and the system calls available. He published the paper "Jails: Confining the omnipotent root" at SANE 2000 and shipped the implementation in FreeBSD 4.0 in March of the same year.

The design has aged extraordinarily well. No daemon, no image layers, no registry, no orchestrator. The kernel handles isolation directly. A jail starts in milliseconds because it is, fundamentally, a process. A FreeBSD host can comfortably run a thousand jails on four gigabytes of RAM, because the additional memory per jail is essentially zero: they share the base system.

The pattern was reproduced, less coherently. Solaris Zones (2004) followed similar principles, with a more elaborate administrative wrapping. Linux gained cgroups and most of its namespaces between 2006 and 2008 (the mount namespace dates from 2002; the cgroup and time namespaces arrived years later), but the design choice was the opposite of Jails: rather than a single coherent abstraction, Linux split isolation into eight separate namespace types (mount, UTS, IPC, PID, network, user, cgroup, time), each introduced in a different kernel release, each requiring the caller to compose them correctly. The user namespace in particular has produced a long sequence of CVEs, several of them giving container escape to unprivileged users, because the abstraction crosses the kernel privilege boundary in ways the original FreeBSD design carefully avoided. LXC (2008) composed the namespaces into something resembling Jails; Docker (2013) wrapped LXC, then replaced it with libcontainer, and gave the resulting pattern a brand, a CLI, an image format, a registry, and the marketing budget to make all of it the new normal. The technique was well over a decade old by the time the world learnt to spell its name, and considerably more careful in the original.

This is the first current. The Jails version of it would have produced a useful tool. The Linux version of it, less coherent, more error-prone, dependent on a daemon, would have produced something more or less the same, given enough time to absorb the operational lessons. Neither alone would have produced "containerise everything".

The Org-Shape Current

The second current came from a different room entirely. Scrum was codified between 1995 (Sutherland and Schwaber's OOPSLA paper) and 2001 (the Agile Manifesto). It split the organisation into sprint-shaped teams: own backlog, own velocity, own deploy. The unit of organisational existence became the team that fits in a standup.

Melvin Conway, in 1968, had described the consequence in advance: any organisation that designs a system will produce a design whose structure is a copy of the organisation's communication structure. Sprint-shaped teams want service-shaped architecture. Each team needs its own deployable, its own release cadence, its own runtime, its own database, because anything shared becomes a synchronisation point and synchronisation points slow down sprints.

The container became the natural envelope. One service per team. One team per container. The Dockerfile became the contract between team and platform. Conway's law, originally a description of how organisations leak into their architecture, became a contract: the architecture is the org chart, rendered in YAML.

This is the second current. It does not require containers, technically. It requires a deployment boundary that matches the team boundary. Containers happened to be the available boundary.

The Runtime Current

The third current came from a third room. Node.js was released in 2009 by Ryan Dahl. The runtime was single-threaded by design: one event loop, asynchronous I/O, no shared-memory concurrency. For the use cases Dahl had in mind, that was a feature: server-side I/O without the threading complexity of Apache or Java.

Multi-core hardware did not fit the runtime. A single Node.js process on an eight-core box could, at best, keep one of those cores busy with JavaScript. The fix was not to redesign the runtime. The fix was to start more processes. Cluster mode in Node.js, then process supervisors, then containers, then container orchestrators. The entire cloud-native scheduling stack absorbed, in part, a workaround for a runtime that did not know what to do with a second core.

The same logic applied to Python (the GIL), to Ruby (likewise), and to a generation of interpreted languages whose concurrency story ended at the process boundary. The container became the only practical unit of multi-core deployment for languages designed before multi-core was assumed.

This is the third current. It does not, on its own, require containerisation. It requires multiple processes. Containers happened to be how teams already did that.

The Convergence

Three currents, one architecture. Each could have produced something modest on its own: a useful isolation tool, a team-deployment boundary, a multi-process runtime strategy. Together, they produced the default that "everything ships in a container" and the secondary default that "everything needs an orchestrator to schedule the containers". By the time anyone thought to question the convergence, the convergence had become the architecture, the curriculum, the conference circuit, and the hiring funnel.

The honest description of what the three currents produced, taken together, is a stack of workarounds presented as architecture. The isolation current produced a useful primitive, then Linux reproduced it less carefully, then Docker wrapped the result in a daemon nobody asked for. The org-shape current produced a deployment-boundary requirement that pre-existing tooling happened to satisfy, badly. The runtime current produced a multi-process strategy because the runtime could not be fixed in the place that needed fixing. Each layer answers the layer beneath it, and each layer needs the next layer to answer back. None of the three currents, alone, demanded a stack of any height. Together, they make a workaround feel like architecture.

The Cost

The bill arrives in four layers, each justified by the previous.

Image Bloat

A minimal Node.js container image, based on Alpine Linux, is around 150 megabytes. The convenient default, node:22 without the -alpine qualifier, exceeds a gigabyte. Most of the gigabyte is build tools the application will not need at runtime: compilers, package managers, debug symbols, full GNU userland. Every unused package is, in security terms, an unpatched CVE waiting for someone to notice.

The application itself is a footnote in its own deployment. A 50-megabyte Node.js service ships in a 1-gigabyte image, which is then pushed to a registry, pulled to nodes, cached, layered, garbage-collected and replicated. The application is one to five per cent of the bytes moved.

The Supply Chain

The gigabyte travels with its npm tree. Every container that runs Node.js carries hundreds of transitive dependencies that the runtime, by design, loads at full privilege at start-up. The container's promise of reproducibility delivers, with the same fidelity, a reproducible attack surface.

In spring 2026 this stopped being theoretical. A six-week wave of supply-chain compromises ran through npm: the axios incident in March, attributed to Sapphire Sleet (a North Korean cluster) at roughly 100 million weekly downloads carrying a RAT through a plain-crypto-js postinstall; the SAP-namespace mini-Shai-Hulud incident in late April; the TanStack mini-Shai-Hulud in mid-May, reaching 84 versions in the first wave and spreading within 48 hours to 172 packages and 403 versions across npm and PyPI, totalling around 518 million cumulative downloads. In May, vx-underground reported the full Shai-Hulud source code in public circulation, lowering the cost of the next wave to the cost of one fork.

Every container pulled in that window inherited the blast radius. The Dockerfile's FROM node:22 is an instruction to absorb whichever supply-chain state happens to be current at build time. The reproducibility model contains no signal for whether that state is healthy. A Helm chart applied in good faith on a Wednesday morning in May 2026 could, in principle, ship a credential-stealer into production by Wednesday afternoon, and the deployment record would describe the operation as nominal.

The image is not the problem on its own; the image plus a runtime that loads its dependencies at start-up plus a public registry with no enforced provenance is the problem. The container's contribution is to make the problem deployable.

Daemon Overhead

Docker requires a persistent daemon, dockerd, which mediates between the container CLI and the kernel isolation primitives. The daemon is not free. A documented case on Docker's own forums shows dockerd consuming over five gigabytes of virtual memory while supervising 183 containers. The runtime that supervises the containers competes with the containers for the same memory budget.

The deeper architectural point is that the daemon is structurally unnecessary. FreeBSD Jails do not have a daemon. The kernel handles isolation directly; the CLI talks to the kernel; there is nothing in between to keep alive. Linux namespaces likewise do not strictly require a daemon: Podman demonstrated that a daemonless, rootless container runtime is possible while remaining Docker-compatible. The daemon is a historical accident of how Docker happened to be built, not a requirement of the isolation it provides.

Network Tax

A function call within a single process costs roughly one microsecond. A network call between two services in the same datacentre, even on a fast network, costs one to five milliseconds: a factor of one thousand to five thousand more expensive. The cost is mostly fixed: DNS resolution, TCP handshake, TLS, JSON serialisation and parsing.

Chain ten services in a typical microservices request graph and the request has paid roughly thirty milliseconds of pure infrastructure latency (ten hops at about three milliseconds each) before any business logic executes. At ten thousand requests per second, benchmarks show the microservices version of an application incurring around 140% higher p99 latency, 300% higher memory consumption, and 260% higher network I/O than the equivalent monolith. The cost is paid in resources, then again in operational complexity, then again in the engineering hours spent diagnosing tail-latency spikes.
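The arithmetic behind the latency figure is short enough to write down. The sketch below uses only the numbers quoted in this section (one microsecond per in-process call, one to five milliseconds per network call, ten hops); the names are assumptions made for the example.

```python
FUNCTION_CALL_US = 1.0          # in-process call, microseconds
NETWORK_CALL_MS = (1.0, 5.0)    # same-datacentre call, milliseconds (low, high)


def chain_latency_ms(hops, per_hop_ms):
    """Latency of a sequential chain of network hops, in milliseconds."""
    return hops * per_hop_ms


low = chain_latency_ms(10, NETWORK_CALL_MS[0])
high = chain_latency_ms(10, NETWORK_CALL_MS[1])
midpoint = chain_latency_ms(10, sum(NETWORK_CALL_MS) / 2)

# How many in-process calls fit into the cheapest network call.
ratio = NETWORK_CALL_MS[0] * 1000 / FUNCTION_CALL_US

print(low, midpoint, high)  # 10.0 30.0 50.0
print(ratio)                # 1000.0
```

The model assumes the hops are sequential. Calls issued in parallel shorten the wall-clock figure, though not the total work done.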

The Fix for the Fix

The latency the architecture introduced required service meshes to manage. Istio became the most prominent example: a control plane plus sidecar proxies that handle routing, retries, circuit breaking, observability and authentication between services. Istio's sidecar adds approximately 2.5 milliseconds of latency at p90, costs around 0.20 vCPU and 60 megabytes of RAM per proxy. Ten services with two proxies per hop consume four vCPU and over a gigabyte of memory for routing alone, paid before the application does anything.
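The four-vCPU figure follows directly from the per-proxy numbers. As a hedged sketch using only the values quoted above (the function and parameter names are assumptions):

```python
def sidecar_overhead(hops, proxies_per_hop, vcpu_per_proxy=0.20, ram_mb_per_proxy=60):
    """Total proxy count, vCPU and RAM (MB) consumed by sidecars alone."""
    proxies = hops * proxies_per_hop
    return proxies, proxies * vcpu_per_proxy, proxies * ram_mb_per_proxy


proxies, vcpu, ram_mb = sidecar_overhead(hops=10, proxies_per_hop=2)

print(proxies)          # 20
print(round(vcpu, 2))   # 4.0
print(ram_mb)           # 1200
```

Twenty proxies at 60 megabytes each come to 1,200 megabytes, which is the "over a gigabyte" in the text. The per-proxy p90 latency is deliberately left out of the sum, since percentiles do not add linearly across hops.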

Istio itself, telling on the architecture, quietly merged its own control plane from microservices back into a single binary, istiod, in 2020 (Istio 1.5). The microservices framework concluded that microservices were the wrong shape for its own implementation. The lesson did not generalise.

The Structural Cost

The deepest cost is not bytes or milliseconds. It is that sprint-shaped teams produce service-shaped architecture and never see another shape. The container makes the org chart load-bearing. Reorganising the team now requires reorganising the system. Conway's law, originally a description, became a contract.

When the team is split, the service is split. When the service is split, the runtime is split. When the runtime is split, the orchestrator becomes necessary. When the orchestrator becomes necessary, the platform team becomes necessary. When the platform team becomes necessary, the abstraction over the orchestrator becomes necessary, which is what the next Helm chart will manage. The architecture is no longer designed; it is grown, in the shape of the standups.

The Question

The Retreat, Documented

Before the alternatives, the retreat. Documented, and rarely cited.

Amazon Prime Video published, in 2023, an account of moving their video monitoring service from a distributed microservices architecture, deployed on AWS Lambda and Step Functions, back to a monolithic application running on EC2. The reported cost reduction was approximately ninety per cent. The article was written by the engineers who did the work and approved by Amazon. The Hacker News thread that followed contained the predictable defences ("they were using Lambda wrong", "they should have used containers properly"), which rather illustrated the point: the alternatives to the distributed default were treated as misconfiguration of the default, not as legitimate alternatives.

Segment, the customer data platform now part of Twilio, consolidated 140 microservices into a single monolith in 2017. Their engineering blog described the operational burden of the distributed version as untenable: test runs went from minutes to milliseconds, deployments simplified, on-call load dropped.

Istio, as noted, consolidated its own control plane. The microservices framework reached the conclusion that for its own implementation, a single binary was better.

37signals (Basecamp, HEY) left AWS entirely in 2023, returning to on-premises hardware running their own stack. Reported savings: around seven million dollars over five years, with hardware recouped inside a year. The architecture stayed in containers but left the orchestrator behind: a small fleet of dedicated servers running Docker, no Kubernetes, no cloud control plane.

The pattern is consistent. Teams reach the operational and financial ceiling of the distributed default and quietly walk back. The walk-back is presented in each case as an exception, not as evidence about the default.

The Alternatives

The alternatives have been quietly working all along.

On FreeBSD: Jails provide kernel-native process isolation with no daemon, no image layers, no registry, no orchestrator. They start in milliseconds because they are processes. Capsicum (FreeBSD 9.0, 2012) provides capability-based isolation at the process boundary: a process can voluntarily restrict its own access to specific file descriptors, syscalls, and resources, granted where it earns its keep. The combination is what process isolation looks like when designed by people who think about the kernel for a living.

On OpenBSD: pledge (5.9, 2016) lets a process declare which syscall categories it will use, and the kernel kills it if it tries to use any others. unveil (6.4, 2018) lets a process declare which filesystem paths it can see, narrowing the filesystem view to exactly the paths the application needs. The pair is what process-level least-privilege looks like when implemented by people who consider security a property of the operating system rather than a layer above it.

On the runtime side: Go and Rust use all CPU cores by default, ship as single static binaries of one to fifteen megabytes, and require no orchestrator to find a second thread. A Go binary running on FreeBSD inside a jail is, structurally, what containerised microservices were trying to be: isolated, lightweight, fast to start, simple to deploy.

None of these is new. In most cases, none requires migration to a different operating system: pledge and unveil are OpenBSD-specific, but the principle of capability-based process isolation has Linux equivalents (seccomp, Landlock). The point is not that one stack is correct. The point is that the convergence of three currents produced a default that none of the three currents, individually, demanded.

The default arrived, in practice, on a marketing budget. Docker (the company) raised over 270 million dollars in venture funding. The Cloud Native Computing Foundation, the body that stewards Kubernetes, lists more than 200 corporate members, several of them with platform businesses whose strategy depends on the container being the unit of deployment. The conferences, the certifications, the cloud-native landscape diagram with its hundreds of competing logos: each is a piece of advertisement that pays for itself by reinforcing the architecture. The technical case for containerising everything has rarely been the loudest case in the room. The case won the room on advertisement, not on parts removed.

The alternatives quoted above did not arrive on similar budgets. FreeBSD's marketing department is, generously, the FreeBSD Foundation's annual report. The OpenBSD project's marketing department is Theo de Raadt's mailing-list voice. Go and Rust each had corporate sponsors, but neither sponsor sold a cloud orchestrator that depended on the runtime being too weak to find its own cores. The alternatives are not better-marketed; they are differently funded, which is most of the explanation for why they sound, in 2026, like nostalgia.

Conway's law can be read forwards or backwards. Forwards: organisational structure shapes architecture. Backwards: architecture shapes which organisational structures remain possible. A sprint-shaped team will produce service-shaped architecture; service-shaped architecture, once built, will tend to enforce sprint-shaped teams. The cycle is reinforcing.

The honest question is not which orchestrator. It is whether the isolation question, the runtime question and the team question were ever the same question. They became the same question because the same default answered all three. The default arrived without a vote.

What if isolation were a property of a process, granted where it earns its keep, rather than a shipping container we wrap around everything we no longer wish to think about? What if the runtime knew its own cores? What if the team shape did not need to be the system shape?

The Dockerfile is so reflexive that the alternatives feel like nostalgia. Twenty-five years of working alternatives suggest the nostalgia is misplaced. They are simply the path not taken, still running, still patient, still answering the question the convergence forgot was three.

Read the full article on vivianvoss.net →

The Screenshot Diary: Microsoft Recall, the Vault, and the Wall Next to It

Vivian Voss — Sat, 16 May 2026 08:11:51 +0000

Not in the Brief, Episode 03

Open Windows 11 on a Copilot+ PC. Navigate to Settings, Privacy & security, Recall & snapshots. The switch is there. The feature is opt-in today. It was not opt-in when it was first announced in May 2024, and the first version stored its snapshot database mostly in cleartext on disk. That part of the history is not in the brief any more; it is part of the architecture's biography.

This is the third episode of Not in the Brief. The first two looked at the browser layer: Chrome's local AI model that any web page can call without a permission prompt, and Edge's password vault that decrypts itself eagerly at launch and leaves the cleartext in memory for the session. The third looks at the operating-system layer, where Microsoft built a feature that records the screen continuously, indexes the recording with a local AI model, and offers natural-language search across the user's past activity.

The feature is called Recall. The architecture is genuinely interesting. The history of how it arrived at the current architecture is more interesting still.

The Feature

Recall takes snapshots of whatever is on the screen at regular intervals, stores them encrypted on the local disk, and applies a local AI model to extract text (OCR) and semantic embeddings from each snapshot. The user can then open Recall and type a natural-language query: "find the page about cloud egress costs I read last Tuesday", "show me the email about the meeting on Friday", "what was that diagram I saw three weeks ago". Recall returns matching snapshots from the user's own past activity, with the timeline scrubbable in either direction.

The capability is, technically, a real piece of work. On-device OCR over a continuous screen capture, with semantic embedding indexing fast enough for natural-language query, on the user's laptop without a cloud round-trip, is the kind of thing that would have been a research paper five years ago. The neural-processing-unit (NPU) hardware on Copilot+ PCs makes it tractable. NPUs rated at 40 trillion operations per second or higher are the hardware threshold; the major laptop manufacturers (Microsoft, Dell, HP, ASUS, Lenovo, Samsung) ship Copilot+ models across most of their current portfolios.

What the feature produces, when enabled, is a continuous catalogue of what appeared on the screen. Every email read, every document edited, every page browsed, every video frame paused at, every chat window left open, every banking session, every medical-record viewer, every private message: indexed and queryable in natural language. The catalogue is held on the user's own disk, by software the user did not write, and addressed by a query interface the user did not specify.

The question of what the feature does is therefore the easy question. The question of how it was introduced, and what the architecture says about Microsoft's threat model, is the part of the brief that needed a second draft.

The Introduction

The feature was announced at Microsoft Build in May 2024. The original design was:

Default on. Recall would activate on every Copilot+ PC at first run. No setup prompt, no consent dialog. The user would receive a notification informing them the feature was active.
Storage in cleartext, mostly. Snapshots and the searchable index database were stored on the local disk in a SQLite database that was, in the words of the security researchers who looked at it, "mostly cleartext". The OCR text was unencrypted; the snapshots themselves were not deeply protected against an attacker with access to the user account.
No additional authentication for access. Once a user was logged in, any process running in the user's context could read the Recall database.

The security community, predictably, did not love this. Researcher Alexander Hagenah, based in Zürich, published a proof-of-concept tool called TotalRecall in June 2024 that extracted the Recall database trivially, ran SQL queries on the OCR text, and demonstrated that an attacker who could exfiltrate a small SQLite file from a target machine could have a complete searchable history of the target's recent screen activity. The story moved from the security press to the mainstream press within days.

Microsoft, to its credit, did not double down on the first design. The feature was withdrawn from the Copilot+ launch in June 2024. The company announced a redesign through summer 2024, missed an October 2024 release window, and shipped a redesigned preview to Windows Insiders in November 2024. The redesigned feature reached general availability on Copilot+ PCs in April 2025.

The redesign is, by the standards of what the original feature was, a real change in architecture. It is the second draft, and the second draft is the one that ships. The first draft is the one that survives in the architecture's biography, however, because the decision to ship a continuous screen-capture catalogue by default with mostly-cleartext storage was a deliberate decision by a large and competent engineering organisation. The second draft did not unmake the first decision. It rewrote the storage layer.

The Mechanics

The post-redesign architecture splits cleanly into two halves: the vault, and what happens after the vault has been opened. Both halves are important.

The Vault

The architecture, as documented by Microsoft on the Windows Experience Blog and Microsoft Learn, rests on four pillars:

Virtualisation-Based Security (VBS) Enclave. The components of Recall that handle decryption operate inside a VBS Enclave: a virtualised execution context that the rest of Windows, including the kernel running outside the enclave, cannot directly observe or modify. Encryption keys for Recall snapshots are generated, stored and used exclusively inside the enclave.
AES-256-GCM encryption at rest. Snapshots and the vector index are encrypted using AES-256 in GCM mode, with per-record keys. The Snapshot Store on disk is, at any given moment when the user is not actively using Recall, fully encrypted.
Trusted Platform Module (TPM) binding. The key material is sealed against the device's TPM. Removing the disk from the machine and reading it on a different machine does not yield decryptable snapshots; the TPM is required to release the unlocking material.
Windows Hello Enhanced Sign-In Security. To open the Recall timeline and search the snapshots, the user must authenticate with Windows Hello: a biometric (fingerprint or face) or, where biometric is not available, a PIN. The authentication unlocks the enclave's ability to decrypt for the duration of an authorised session, which times out and must be reauthorised.

Inside that envelope, the vault is solid. The keys are unreachable from outside the enclave. The data on disk is unreadable without the TPM. The unlock requires a biometric or PIN. The original cleartext-database criticism is, on the current architecture, no longer accurate. Microsoft did the engineering work and the engineering work is good.

That is the part of the story that ends with "the vault is solid".

The Wall

The other part of the story is what happens after authentication. The Recall timeline, once the user has authenticated with Windows Hello and the enclave has decrypted the relevant snapshots, has to actually show the snapshots on the screen. Showing the snapshots requires that decrypted pixel data, decrypted OCR text and decrypted metadata leave the enclave and enter ordinary Windows processes that render UI to the user.

The process that does the rendering is called AIXHost.exe. It runs in the user's session at ordinary user privilege. It is not inside the VBS Enclave. The decrypted snapshot content lives inside its address space for as long as the timeline is open.

In March 2026, Alexander Hagenah, the same researcher who built the 2024 TotalRecall tool, published a successor called TotalRecall Reloaded. The new tool:

Runs as an ordinary user, with no administrative privileges, no kernel exploit, and no privilege escalation.
Does not attempt to break the VBS Enclave or extract any keys from it.
Does not bypass Windows Hello in any meaningful sense; the tool requires that the legitimate user has already authenticated.
Injects into AIXHost.exe, the Recall timeline-rendering process, after the legitimate user has authenticated with Windows Hello.
Reads decrypted screenshots, OCR text, and metadata directly from AIXHost.exe's address space, for as long as the timeline is open.

In Hagenah's summary: "The vault door is titanium. The wall next to it is drywall."

Microsoft's response, attributed to David Weston, Corporate Vice President for Microsoft Security, was published in The Verge and other outlets in April 2026. The position is consistent: the demonstrated access pattern is "consistent with intended protections and existing controls, and does not represent a bypass of a security boundary or unauthorised access to data". In other words, the architecture is functioning as designed; the decrypted-after-Hello content is, by design, accessible to processes running in the authenticated user's session; Microsoft is not classifying this as a vulnerability.

That is, on its own terms, an internally coherent position. The enclave is for key material and at-rest encryption; once the user has authorised access, the data is by design in the user's session and shares the user's threat surface. The position is defensible.

It is also incomplete in the way that the entire architecture, taken as a system, has been incomplete since the beginning: the design is correct from the inside out, and silent on the brief.

A Note on the Other Side of the Stack

Recall is a Windows feature. On a Unix-style operating system such as FreeBSD or any Linux distribution, the equivalent does not exist; there is no system service that continuously captures the screen, indexes it with a local AI model, and exposes a natural-language search interface across the user's past activity. The capability is not there because nobody on the kernel or base-system side built it. A screen recorder can be installed (pkg install ffmpeg on FreeBSD will give the user the building blocks), but installing it is an explicit act, the daemon is named, and the captured content sits in a file the user named in a directory the user chose. The architectural difference matters for the awareness question: on a Unix system, the absence of the capability is the default; on a Copilot+ Windows machine, the presence of the capability is the default and the user's choice is whether to switch it on.

The Risk

The architectural risk has two layers. The first is the post-authentication accessibility problem demonstrated by TotalRecall Reloaded: a user who is running malicious software (an installed application that turned hostile, a compromised browser extension, an unfortunate click on a phishing link) and who authenticates to Recall is, on the current architecture, exposing decrypted snapshot content to that software. This is, on Microsoft's stated threat model, working as intended. It is also, on any reasonable user expectation, not the trade-off the user signed up for when they clicked "Yes, save snapshots".

The second layer is structural, and it is the more important one.

What Recall Actually Catalogues

The feature, where enabled, produces a third-party catalogue of every second on the user's own machine. Three properties of that catalogue are worth stating in full:

Continuous capture. The snapshot interval is short and the trigger is activity-based. In practical use, every page that stays on the screen long enough to read is captured. Every document edited is captured at intermediate states. Every video paused is captured at the paused frame. The catalogue is not a list of files the user chose to save; it is a record of what was visually present on the screen, taken at machine cadence.
Software the user did not write. The catalogue is queried by a local AI model, addressed by a user interface, returned through a system process, and exposed to APIs that other Microsoft components (and, under enterprise deployment, other vendors) can extend. The user does not own the index format, the query interface, or the integration surface. The user owns the disk the encrypted snapshots sit on.
No context distinction. The capture does not distinguish between a banking session and a holiday photo. It does not distinguish between a private message and a public web page. It does not distinguish between a medical-record viewer and a recipe site. Exclusions are available (per-app, per-website) and the user can configure them, but the default is capture-everything-that-is-not-explicitly-excluded. The architectural decision is to capture by default and require the user to enumerate exceptions.

The opt-in switch is the consent record. The presence of the switch, on every Copilot+ machine, is the default that survived public review. Consider what that means: the engineering case for the feature, after extensive public scrutiny, was strong enough that the company concluded the right answer was to ship the feature opt-in on every supported machine rather than not ship it at all. The history is in the brief. The decision is in the brief. The user, in 2026, encounters the result of that decision as a Settings switch.

What the User Actually Signed Up For

Consider a user who buys a Copilot+ laptop in 2026. They unbox it, run the Windows setup, and at some point during setup they see a Recall dialog. The dialog says, in essence, "Recall captures snapshots of your screen so you can search later. Do you want to enable it?" The default option is "Not now". The user clicks "Not now" or "Yes". The dialog goes away.

The user has now made one of two choices: opt in to Recall, or not. Both choices are documented and have a consent record. Neither choice involved being told that:

The opt-in dialog defaults to disabled because the first design did not default to disabled, and the first design's default was changed only after public outcry.
The Recall switch is present on the machine whether or not the user has heard of the feature; the existence of the option is itself the default that survived public review.
The post-authentication threat model accepts that decrypted content lives in user-session processes that can be read by other user-session code. This is by design; the boundary ends at Windows Hello.
What is captured is everything visually present on the screen, by default, indexed by software the user did not write, in a format the user does not own.

None of those facts are hidden. Microsoft has published all of them. They are in the brief, if one is willing to read Microsoft Learn end to end. They are not in the brief in the practical sense: the user, on installing the operating system, was not given a chance to weigh them against the convenience benefit.

The architecture is what it is. The point of awareness is to know that it is what it is.

How to See It

The verification path is straightforward on any current Copilot+ PC.

Settings, Privacy & security, Recall & snapshots. The master switch lives here. If the toggle is off, no snapshots are being saved and the database is empty. If the toggle is on, snapshots are being captured at the configured interval (by default, every few seconds while activity is detected). The same page exposes the storage cap (how much disk Recall is allowed to use) and the retention horizon (how far back snapshots are kept).

The Recall app, in-app Settings, exclusions. Per-app and per-website exclusions are configured here. Browsers in private/incognito mode are automatically excluded from snapshots, by design. Passwords and credit-card fields are likewise automatically masked. Other categories of sensitive content (medical record viewers, banking apps, internal corporate tools) the user must exclude manually.

Group Policy, on Pro/Enterprise editions. Open Group Policy Editor (gpedit.msc). Navigate to User Configuration, Administrative Templates, Windows Components, Windows AI. The policy "Turn off saving snapshots for Windows" disables the snapshot-saving capability across the user's session, irrespective of the user's per-machine choice. The corresponding registry value is HKCU\Software\Policies\Microsoft\Windows\WindowsAI\DisableAIDataAnalysis (DWORD, value 1).
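The registry value named above can also be read programmatically. The following is a minimal sketch using Python's standard winreg module; the helper names are assumptions, and on anything other than Windows the reader simply reports that nothing is configured.

```python
import sys

# Key path and value name as given in the text, relative to HKEY_CURRENT_USER.
POLICY_KEY = r"Software\Policies\Microsoft\Windows\WindowsAI"
POLICY_VALUE = "DisableAIDataAnalysis"


def read_policy_value():
    """Return the DWORD policy value, or None if absent or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, POLICY_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, POLICY_VALUE)
    except FileNotFoundError:
        # Either the key or the value does not exist: policy not configured.
        return None
    return value if value_type == winreg.REG_DWORD else None


def describe_policy(value):
    """Translate the raw value into a plain statement."""
    if value is None:
        return "policy not configured"
    if value == 1:
        return "snapshot saving disabled by policy"
    return "snapshot saving not disabled by policy"


print(describe_policy(read_policy_value()))
```

An unconfigured policy is not the same thing as a disabled feature. It means only that the decision has been left to the Settings switch.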

Intune / MDM, on managed devices. The WindowsAI configuration service provider exposes the same setting under ./User/Vendor/MSFT/Policy/Config/WindowsAI/DisableAIDataAnalysis. Managed devices have Recall disabled and the feature removed by default; the policy is the explicit setting if a managed device administrator wants to confirm the disabled state.

Forensic confirmation. If the master switch is off and the policy is enabled, the Recall database directory (typically under %LOCALAPPDATA%\CoreAIPlatform.00\UKP) should be empty or absent. If snapshots have been captured, the directory contains encrypted files; their content cannot be read without the user's Windows Hello authentication.
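The directory check can be scripted. A minimal sketch, assuming the typical location quoted above (the function name is hypothetical, and the path may differ between builds):

```python
import os
from pathlib import Path


def snapshot_store_state(directory):
    """Classify a directory as 'absent', 'empty' or 'populated'."""
    path = Path(directory)
    if not path.is_dir():
        return "absent"
    return "populated" if any(path.iterdir()) else "empty"


# Typical location per the text. %LOCALAPPDATA% expands only on Windows;
# elsewhere the literal path will not exist and the result is 'absent'.
default_dir = os.path.expandvars(r"%LOCALAPPDATA%\CoreAIPlatform.00\UKP")

print(snapshot_store_state(default_dir))
```

The check looks only at whether files exist. It reads nothing inside them, and on the current architecture it could not.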

The honest qualification: even with Recall disabled, a Copilot+ PC has the capability to enable it at any time. The hardware is on the machine. The software is on the machine. The decision is the user's, or the device administrator's, or in some configurations, anyone who can persuade either of them to enable the toggle.

The Pattern Across the Series

Three episodes in, the pattern is starting to come into focus. The browser layer (Chrome with local AI, Edge with cleartext password vault) and the operating-system layer (Recall) all share a structural quality: a feature was added to the user's machine, the user's consent was obtained or assumed at a level that does not match the depth of the feature, and the architectural boundary that the vendor describes as sufficient does not match the boundary the user reasonably assumed.

In all three cases the company is, by its own terms, correct. Chrome's local model is local; it does not exfiltrate. Edge's password vault is, by Microsoft's threat model, secure against attackers who do not already have administrative access. Recall's enclave is, by Microsoft's threat model, secure against attackers who cannot defeat Windows Hello. In all three cases the user, by their own reasonable terms, was not told that this was the trade-off the vendor was making.

The series is not arguing that the features should not exist. The features are real, useful, and in some respects represent genuine engineering progress. The series is arguing that the consent record and the architecture should match the brief, and that where they do not match, the user should be told.

The looking is the entire point. Once a user has looked, the choice becomes theirs.

Closing

The diary is on every Copilot+ PC shipped today. On most of them, the cover is closed and the diary is empty; the user has chosen not to open it. On a growing number of them, the user has clicked "Yes, save snapshots" without quite understanding that this opens a continuously updating searchable record of everything that has appeared on their screen, kept under a vault door that is genuinely strong and a wall next to the vault that is, on Hagenah's evidence, drywall.

That is an architecture. It is also, by every available record, an opt-in architecture, achieved through public pressure rather than by original design. The history is part of the brief now. The looking, as the series keeps saying, is not difficult. It just has to start.


By Vivian Voss, System Architect and Software Developer. Follow me on LinkedIn for daily technical writing.

The Architecture You Did Not Design: How AWS' Real Lock-In Lives in IAM, Not Egress

Vivian Voss — Fri, 15 May 2026 09:19:13 +0000

In the Net, Episode 03

In March 2024 AWS announced that it would waive data-egress fees for customers wishing to leave. The press release was elegant, the wording generous, the timing precise: less than two months after the EU Data Act came into force, with its Article 25 obligations on cloud switching, and rather earlier than the moment in January 2027 when the same regulation will prohibit switching charges altogether. Two years on, the egress bill is no longer the largest cost of leaving AWS. The egress bill, in fact, is not even the main reason customers do not leave. The architecture is.

This is the third episode of In the Net: a series on the documented mechanics of vendor lock-in. The premise has not changed. Every platform tells you how to come in. The architecture tells you whether you can leave, what it does with what you build inside it, and how much of what you built belongs to you when you wish to walk out.

The Promise

AWS opened to the public in 2006 with what was, at the time, an unusual proposition. Stop owning racks. Stop running data centres. Rent capacity, and pay for what you use. Two decades later that promise has been kept on its own terms. Startups have shipped products without ever owning a server. Established firms have moved workloads off depreciating hardware on predictable cycles. The cloud has been, by any honest reading, the most productive infrastructure shift of a generation, and AWS has led most of it.

This matters. Lock-in stories are most useful when they begin with the promise that was real, because the architecture which produces the lock-in is not the architecture which produces the value. The value is real. The architecture, taken as a whole, also keeps the customer in a way that is increasingly difficult to characterise as a free choice.

The Hooks

The lock-in lives in three layers. The first is widely discussed. The second is rarely discussed. The third is almost never discussed in the right terms.

The egress layer

In March 2024, AWS published a blog post titled "Free data transfer out to internet when moving out of AWS". The programme is real. The conditions are also real. To qualify, a customer must hold an account in good standing, must have more than 100 GB of data stored in the account, must be moving all of their data off AWS, and must complete the move within 60 days; requests are reviewed at account level, and AWS reserves the right to apply additional scrutiny if the same account applies multiple times.

The European Union's Data Act entered into force on 11 January 2024, became applicable on 12 September 2025, and includes in Article 25 the most far-reaching cloud-switching obligations any major jurisdiction has yet legislated. By 12 January 2027, switching charges of any kind, including data egress charges levied during a switch, will be prohibited for in-scope providers. AWS' programme arrived in the window before the regulator did, with conditions the regulator will not, in fact, permit when the relevant article reaches full force. This is not an accusation of insincerity. It is an observation of timing.

The egress layer is the layer the industry has talked about for fifteen years, the layer Cloudflare campaigned against in 2021, the layer regulators eventually moved on. It is, on the evidence, also the easiest layer to mitigate. The cost of moving 50 TB out of AWS at standard rates is around €4,300 (US$5,000-ish, depending on region and class of transfer); the cost of moving 50 TB across a slow internet pipe is the duration of a few weekends. The egress layer is not, and never was, the reason large customers stay.
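The order of magnitude is easy to check with tiered pricing. The sketch below is an illustration under stated assumptions: the tier sizes and per-gigabyte rates are assumed values approximating a commonly published US-region schedule, a terabyte is taken as 1,000 GB, and the monthly free allowance is ignored. Current rates should be checked before relying on the number.

```python
# Assumed tiers: (tier size in GB, USD per GB). Illustrative values only.
ASSUMED_TIERS = [
    (10_000, 0.09),
    (40_000, 0.085),
    (100_000, 0.07),
    (float("inf"), 0.05),
]


def egress_cost_usd(gigabytes, tiers=ASSUMED_TIERS):
    """Cost of transferring `gigabytes` out under a tiered per-GB schedule."""
    cost = 0.0
    remaining = gigabytes
    for tier_size, rate in tiers:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


print(round(egress_cost_usd(50_000)))  # 4300
```

Under these assumptions 50 TB comes to a few thousand dollars, the same range as the figure in the text. That is small next to the engineering cost of the layers that follow.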

The runtime layer

AWS' managed services are the next layer down, and the lock-in here is structural rather than fiscal.

Amazon Aurora is documented as "PostgreSQL-compatible" and "MySQL-compatible". On the wire and at the SQL surface, this is true for the overwhelming majority of standard operations. Beneath the wire, Aurora is its own database. The storage layer is not PostgreSQL's; it is AWS' six-way replicated, log-structured shared-storage fabric. Aurora's Babelfish module accepts Microsoft SQL Server's T-SQL on top of the Aurora engine. Aurora Machine Learning calls SageMaker and Bedrock directly from SQL. Aurora Limitless Database introduces horizontal scaling semantics that have no PostgreSQL equivalent. None of these features ports off Aurora; each of them, once adopted in a production schema, becomes a one-way commitment.

Amazon DynamoDB has no on-prem equivalent. It is sold as a managed NoSQL database, but it is, more precisely, an API and a billing model wrapped around a proprietary key-value store with proprietary indexing semantics, proprietary stream semantics, and proprietary integration with Lambda, EventBridge, S3 and CloudWatch. The closest open-source replacements (Apache Cassandra, ScyllaDB, MongoDB) require non-trivial schema translation, and each has a meaningfully different consistency, availability and operational model. There are migration paths; there is no drop-in equivalent.

AWS Lambda is wired into the ecosystem at the point of event delivery. Lambda functions consume from EventBridge, S3 events, DynamoDB streams, SQS queues, SNS topics, Kinesis streams; they emit to CloudWatch logs and CloudWatch metrics; they are observed by X-Ray. Each of these dependencies is a service-specific protocol with no portable replacement that ships in the box. OpenTelemetry, Prometheus and Grafana exist and work; they are not, however, the path of least resistance inside AWS, and adopting them in addition to the AWS-native instrumentation is a deliberate engineering choice that adds cost in the short term and pays back only at migration time.

The runtime layer is the layer where the lock-in compounds. Each AWS-specific decision is locally rational. The cumulative effect, at the scale of a production estate, is that the workload is no longer a "PostgreSQL workload" or a "Linux workload"; it is an "AWS workload", and the noun matters.

The identity layer

The third layer is the layer most engineering leads underestimate, and it is, on this analysis, the most expensive layer to migrate.

AWS Identity and Access Management is two distinct things. At the level of resources, it is a policy language: a JSON document grammar that grants and denies actions on Amazon Resource Names (ARNs) under specified conditions. At the level of organisations, it is an account model: a hierarchy of accounts, organisational units, service control policies and trust relationships that, taken together, constitute the security perimeter of every workload running on AWS.
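To make the policy-language half concrete, here is a minimal sketch of a single policy statement and of the identifiers in it that only AWS can interpret. The bucket name and organisation ID are hypothetical, and the helper function is an illustration written for this article rather than part of any AWS SDK.

```python
import json

# Hypothetical policy: the bucket name and organisation ID are invented for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::example-reports-bucket/*"],
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},
        }
    ],
}


def aws_specific_tokens(doc: dict) -> list[str]:
    """Collect the actions, ARNs and condition keys that have no meaning outside AWS."""
    tokens: list[str] = []
    for statement in doc["Statement"]:
        tokens += statement.get("Action", [])
        tokens += statement.get("Resource", [])
        for operator_block in statement.get("Condition", {}).values():
            tokens += list(operator_block.keys())
    return tokens


print(json.dumps(policy, indent=2))
print(aws_specific_tokens(policy))
```

Every token the helper returns would need a hand-written equivalent in the target system. A real estate carries thousands of such statements, and the intent behind them is rarely recorded anywhere else.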

Neither half is portable. The policy language is AWS-specific. ARNs are AWS-specific. The account hierarchy is AWS-specific. KMS keys, the cryptographic substrate that secures most of what an enterprise stores on AWS, never leave the service in plaintext (by AWS' own KMS documentation); they cannot be exported, only used through API calls. KMS keys are region-bound by default; multi-Region keys can be replicated into other AWS regions, but no key can be shared with another provider. Re-encrypting a large estate with new keys held by a different provider, while keeping data continuously available, is not a 90-day operation.

IAM Identity Center, AWS' successor to AWS SSO, adds another layer: permission sets, assigned through the AWS organisation hierarchy, are translated at session time into IAM roles inside individual accounts. The permission set is the abstraction; the role is the artefact. Migrating off Identity Center means reconstructing the permission set semantics in a different identity system (Keycloak, Zitadel, Authentik, or a commercial product) and then re-grounding every workload's authorisation against the new system. The permission model is not a thousand lines of JSON; it is the encoded security history of an organisation, often built across several years and several reorganisations, and rarely documented outside the JSON itself.

A senior architect priced the migration in EC2 hours. The actual migration is in the permission model, and that took five years to build. The bill for moving compute is the bill the FinOps team will quote. The bill for moving identity is the bill the security team will quote, much later, and quietly.

The Standing

AWS holds approximately 30 per cent of the global cloud infrastructure market in Q1 2026, ahead of Microsoft Azure (around 21 per cent) and Google Cloud (around 13 per cent), according to Synergy Research Group; aggregated estimates from the same period place the Big Three together at around 65 per cent of the market, with the global cloud-infrastructure spend running at about $129 billion for the quarter and a year-on-year growth rate of 35 per cent driven largely by AI workloads. The market is, on any reasonable description, an oligopoly. AWS is the senior partner in that oligopoly.

This matters for the same reason that Adobe's 80 per cent of the creative-software market matters in Episode 01, and the same reason that LinkedIn's billion-plus users matter in Episode 02. The contract is offered from a position. The position determines what kind of contract is offered, and how much leverage the customer has to negotiate any of it. A startup signing an AWS Enterprise Agreement is not negotiating peer-to-peer with the platform that decides whether its product can run.

There is a second observation worth making. The European Union has, in the same eighteen-month window, designated Microsoft, Alphabet, Apple, Amazon, Meta and ByteDance as Digital Markets Act gatekeepers (September 2023, with full obligations from March 2024). Amazon is on the list. AWS' core services, however, are not in scope of the DMA's gatekeeper obligations; the DMA addresses Amazon's marketplace, not Amazon's cloud. The cloud was instead addressed, with some lag, by the Data Act, which applies to a far broader set of providers and is not enforced through the same designation mechanism. The architecture of the cloud sits in a regulatory space that the EU has, in essence, conceded to a sector-specific instrument rather than the DMA's gatekeeper apparatus.

The practical consequence is that AWS' cloud services are obligated to meet switching standards from January 2027, but are not obligated to meet the DMA's interoperability or data-portability standards in the same way that, say, Microsoft Windows or Apple iOS now are. The lock-in mechanisms described above continue under the Data Act regime; the Data Act provides switching rights, not interoperability rights. The customer can leave, eventually, after a notice period and a transitional period; the customer cannot, however, demand that AWS' API surface be replicated on a competitor's infrastructure.

How the User Is Treated

The Würde-Verhältnis, the dignity dimension this series tracks, has a quieter shape on AWS than it had on Adobe or LinkedIn. AWS does not, in the main, scrape its customers' workloads for AI training. AWS does not, by default, repurpose customer data. The AWS Customer Agreement is, by the standards of the platforms this series has previously examined, restrained.

What it does instead is shape the entire interaction around the assumption that the customer has chosen, and continues to choose, the AWS architecture. The Free Data Transfer Out For Leaving programme is structured as an exception to the default, granted at AWS' discretion, with conditions ("more than 100 GB", "all data", "90 days", "account-level review", "additional scrutiny on repeated applications") that read more like a creditor's conditions on a workout than a vendor's facilitation of a switch. The customer who wishes to leave is, by the structure of the programme, treated as a customer asking for a concession.

The Data Act, when it reaches full force in January 2027, will remove this framing. Until then, the framing is the architecture.

The Exit That Isn't

A customer can leave AWS. By the time this section ends, the customer will have understood several things they did not understand at the start.

The customer can move the data. Free Data Transfer Out For Leaving will, in most cases, be granted. The data will arrive on the receiving infrastructure, in a reasonable amount of time, at no charge. This is the easy part.

The customer cannot move the permission model. IAM policies are not portable; ARNs do not resolve outside AWS; KMS keys do not export; Identity Center permission sets have no direct equivalent on competitor platforms. Reconstructing the authorisation surface on a new identity provider is a multi-month engineering exercise that touches every workload.

The customer cannot move the managed-services semantics. Aurora's AWS-specific features must be rewritten or removed. DynamoDB schemas must be re-modelled for Cassandra, ScyllaDB or MongoDB. Lambda functions must be re-hosted, often as containers, with their EventBridge, SQS and SNS dependencies replaced by Kafka, NATS or a workflow engine. CloudWatch monitoring must be re-built on OpenTelemetry, Prometheus and Grafana. Each of these is tractable; in aggregate, on a 50-application estate, the work is 8 to 12 months and €1M to €3M of effort by independent industry averages.

This is lock-in by design. Not in the sense that someone in a boardroom said "let us trap our users". In the sense that the architecture, taken as a whole, produces the outcome that customers do not leave even when the cost of staying has, on their own honest accounting, exceeded the value of staying. That is what an architecture is: the outcome the structure makes likely, regardless of intent.

The Price

The price of staying is the AWS bill, the trajectory of which is documented quarterly. Global cloud-infrastructure spend grew 35 per cent year-on-year in Q1 2026, and on any given account the customer's contribution to that growth tracks broadly with its adoption of AWS-specific services. The price of staying is also the opportunity cost of the architecture choices the customer would have made on a more portable substrate.

The price of leaving is the migration. By industry averages, a 50-application enterprise migration runs around €1.2M and 8 to 12 months; large waves with 100-plus applications run €1M to €3M and span longer. Optimised post-migration run-rates are 20 to 35 per cent lower than the pre-migration cloud bill; FinOps reports also document persistent 20 to 30 per cent waste in unoptimised AWS estates.

37signals, the maker of Basecamp and HEY, published a detailed account of its AWS departure across 2023 and 2024. The company's annual AWS spend, on its own figures, was around $3.2 million in 2022; by 2024, running on-prem, the figure had fallen to "well under a million", and the company reported approximately $2 million in ongoing annual savings. Hardware spend, around $700,000 on Dell servers and around $1.5 million on Pure Storage for 18 petabytes, was recouped inside one year (per The Register and DCD reporting, 2024–2025). This is one case; it is also a public, documented, mid-scale case, and the kind of evidence that shifts the Würde-Verhältnis balance when it accumulates.

Twenty-one per cent of cloud workloads have been repatriated, according to the Flexera 2025 State of the Cloud report. This is not a wholesale reversal of cloud adoption; it is a documented re-weighting of workloads back to private or hybrid infrastructure where the economics, latency, or data-gravity warrant it. The repatriation trend is the empirical answer to the question of whether the AWS architecture can be left. It can, for the customers who choose to.

The Escape Route

The escape route from AWS is not a single product. It is an architectural posture: build identity, policy, and the data layer on portable substrates from the beginning, and treat each managed AWS service as a deliberate, named, scoped exception to portability.

Identity, portable from day one. Keycloak is the established open-source identity provider, Java-based, with SAML, OIDC and SCIM out of the box; it is heavy to operate and widely deployed. Authentik is a more recent, lighter alternative, with a cleaner administrative interface. Zitadel is a Go-based, event-sourced alternative built for multi-tenant SaaS. All three speak the standards a modern enterprise identity provider must speak (OIDC, SAML 2.0, SCIM); all three can sit in front of AWS IAM and any other cloud's IAM, so that the customer's permission model is, from the start, defined in a system the customer owns.

Policy as code, vendor-neutral. Open Policy Agent (OPA), a CNCF graduated project, is a general-purpose policy engine with a declarative policy language (Rego) and integration with Kubernetes, service meshes, application admission controllers and infrastructure provisioning. Crossplane, also CNCF, is a Kubernetes-native control plane for provisioning cloud resources across multiple providers through a consistent declarative interface. OPA plus Crossplane is the closest the open ecosystem has come to a portable replacement for AWS-specific IaC plus IAM, and the combination scales to the kind of estate where the question is meaningful.

EU IaaS for sovereignty. Hetzner, OVHcloud, Scaleway and IONOS are the established EU-native infrastructure providers with no US parent and no CLOUD Act exposure. An independent Callista benchmark in February 2026 found Hetzner delivering approximately 14.3 times AWS' value-per-compute-unit and Scaleway approximately 4.8 times; Scaleway publishes free egress where AWS bills it. OVHcloud has the widest service range and the largest European data-centre footprint. None of these providers matches AWS' managed-services depth; all of them match AWS for compute, storage, networking, and the basic primitives an honest production workload needs.

Open-source data services. PostgreSQL replaces Aurora on the relational side, with the engine actively maintained by the community and used at multi-terabyte scale in production. Apache Cassandra and ScyllaDB replace DynamoDB on the wide-column side. MongoDB replaces it on the document side. MinIO and Garage replace S3 on the object-storage side, with the S3 API as a de-facto portability surface that ironically AWS itself does not control. PostgreSQL plus one of the wide-column or document stores covers most production data-layer workloads with no managed-service dependency on any single cloud.

Repatriation as a documented option. 37signals' case is not a recommendation that every workload return on-prem. It is evidence that the option exists and produces measurable results when chosen with engineering and operational discipline. The Flexera figure of 21 per cent repatriated workloads is the broader signal: cloud is the right answer for a great many workloads, and not the right answer for all of them.

Coda

The bill arrives every month. The data the customer has uploaded is portable as of March 2024, and will be free to move from January 2027. The architecture the customer has built inside AWS, taken whole, is not.

The cloud opened in 2006 with a promise to remove the racks. It removed the racks. It built, in their place, a permission model, a data layer and an identity layer that constitute the customer's actual production architecture. The architecture is, on this evidence, the larger part of what the customer pays for. The architecture is also the part the customer cannot fully take with them.

You can take your data with you. The architecture stays behind. The architecture is, on this evidence, the more expensive part.


By Vivian Voss, System Architect and Software Developer. Follow me on LinkedIn for daily technical writing.

npm Is on Fire: Why the Architecture Is the Product

Vivian Voss — Thu, 14 May 2026 09:06:46 +0000

Wire Fire: Episode 01

The Permanent State

npm (the open registry that nearly every JavaScript project on Earth depends on) has been under permanent attack for years. This is not a recent shift in adversary attention. It is a slow, observed, well-documented escalation that the ecosystem has not architecturally answered.

The headline number: in 2025 alone, 454,648 malicious packages were published to the npm registry. Over 99 percent of all open-source malware now targets npm. The remaining 1 percent covers every other registry combined (PyPI, RubyGems, Maven Central, NuGet, Cargo, Composer).

If you have ever installed a JavaScript dependency, you have participated in an ecosystem whose security model is, in the most polite possible terms, an act of structural optimism.

This post is a Wire Fire sitrep, the first episode of a new series for active security incidents. It covers the six weeks between 31 March and 14 May 2026, and places that evidence inside the larger structural story it belongs to.

Six Weeks of Wire

Four named incidents hit the registry in this window. Each is documented, each is attributable, and each demonstrates the same failure: there is no brake in the pipeline, and there never was one.

31 March 2026: the axios compromise

A North Korean state-sponsored group, tracked by Microsoft as Sapphire Sleet (Google Mandiant identifies the cluster as UNC1069), published two malicious versions of axios to the npm registry: 1.14.1 and 0.30.4. Both were tagged "latest", the default release channel that the majority of consumers follow without thinking.

The mechanism was indirect. The malicious versions declared a poisoned dependency, plain-crypto-js@4.2.1, whose postinstall lifecycle hook downloaded and executed a cross-platform remote-access trojan on the host machine. macOS, Windows and Linux were all targeted by the same payload, fetched at install time from attacker-controlled infrastructure.

axios receives roughly 100 million weekly downloads. The malicious versions were live for approximately three hours before being pulled. Three hours of "latest", for a library at that scale, is enough to compromise a meaningful percentage of every CI build, every developer workstation, and every container build that touched a fresh npm install in that window.

CISA issued a cybersecurity alert on 20 April 2026. Microsoft published a detailed analysis on 1 April 2026.

29 April 2026: Mini Shai-Hulud hits the SAP namespace

A worm variant tracked as "Mini Shai-Hulud" (a smaller, more targeted descendant of the original Shai-Hulud worm campaigns of 2025) infected four SAP-related npm packages. The payload was credential-harvesting: tokens, environment variables, SSH keys, anything reachable from the compromised install host. The packages were used in SAP-adjacent JavaScript tooling, narrow in audience but deep in privilege.

11 May 2026: TanStack and the OIDC chain

At 19:20 UTC on Monday 11 May, 84 malicious versions of 42 packages from the @tanstack/* namespace were published in a six-minute window. TanStack is a popular JavaScript library suite; @tanstack/react-router alone has roughly 12 million weekly downloads.

Within 48 hours the campaign expanded to 172 packages and 403 versions, across both npm and the Python registry PyPI. Cumulative downloads of the compromised packages amounted to roughly 518 million over the affected version windows. Other namespaces hit in the same wave: @uipath/*, @mistralai/mistralai, OpenSearch, Guardrails AI.

The technical mechanism is worth looking at closely, because it explains how a single compromise becomes mass-published code.

12 May 2026: the worm goes public

On Tuesday 12 May, security research collective vx-underground reported that the full Shai-Hulud worm source code had been published openly. The attack toolchain (cache poisoning, OIDC extraction, provenance-attested publishing under stolen identities) is no longer the exclusive property of any single threat actor. It is off the shelf.

This is the watershed of the six weeks. Before 12 May, the worm was attributable to specific groups (TeamPCP, with documented overlap to nation-state actors). After 12 May, anyone with a misconfigured CI/CD pipeline and the will to use it can deploy the same toolchain against any package with similar exposure.

Shai-Hulud, Generation Four

The name is a Dune reference. Frank Herbert's giant sandworms, the predators of the desert planet Arrakis, are called by the Fremen "Shai-Hulud" ("the Old Man of the Desert"). The first npm worm to bear the name appeared in September 2025; it has hit the ecosystem in four waves to date.

Wave	Date	Scope
1	September 2025	~500 packages compromised in the first self-replicating wave
2	November 2025	"Shai-Hulud 2.0"; expanded scope, ~25,000 GitHub repositories affected
3	December 2025	Modified payload test, narrower scope
4	April–May 2026	Mini Shai-Hulud variant, SAP namespace, TanStack wave, source-code release

Each wave taught the attackers something the previous wave did not. By wave four, the operation could publish provenance-attested packages from stolen identities, scale across CI/CD environments in seconds, and survive into legitimate downstream builds via cache poisoning.

The waves are no longer waves. They are weather.

The Mechanism in Plain Words

Modern software is built on trust between strangers. When you install a JavaScript package, you do not install one thing; you install whatever that package declares as its dependencies, plus whatever those declare, plus whatever those declare. The average modern JavaScript project ends up with over a thousand transitive contributors no one on the team has ever met.

Every install runs whatever code the package author wrote, with the user's full operating-system permissions. The npm postinstall lifecycle hook is, by design, a script that runs arbitrary code on the install host immediately after a package is unpacked. It exists for legitimate reasons (compiling native extensions, fetching platform-specific binaries), and for that reason it cannot simply be removed without breaking large parts of the ecosystem.
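To see how much of an installed tree relies on that hook, a short audit script is enough. The following is a minimal sketch, assuming a conventional node_modules layout; the function name is invented for this illustration, and the scan reads manifests only and does not execute anything.

```python
import json
from pathlib import Path

# Lifecycle scripts that npm runs automatically during an install.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(node_modules: Path) -> dict[str, dict[str, str]]:
    """Map package name -> install-time scripts declared in its package.json."""
    findings: dict[str, dict[str, str]] = {}
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skipped in this sketch
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts") or {}
        if not isinstance(scripts, dict):
            continue
        hooks = {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}
        if hooks:
            findings[data.get("name", str(manifest.parent))] = hooks
    return findings
```

Calling find_install_hooks(Path("node_modules")) in a project directory lists every package that would run code at install time. The list is usually short, which is what makes an allow-list approach practical.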

This is the structural attack surface. Compromise one developer account anywhere in the dependency tree, and the worm can:

Steal credentials from every machine that runs the next install.
Use those credentials to publish new poisoned versions under the stolen identity.
Propagate to every project that depends on those packages.
Move on.

The 11 May TanStack wave automated this entire cycle. The technical chain, for specialists, exploited three weaknesses in GitHub's standard build service that combine to devastating effect:

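# Sketch of the misconfiguration class the worm exploited
on:
  pull_request_target:            # runs in the base repository's context, secrets included
    types: [opened, synchronize]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Checks out the untrusted pull-request head inside the trusted context
          ref: ${{ github.event.pull_request.head.sha }}
      - run: pnpm install         # installs whatever the pull request declares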

The pull_request_target event executes the workflow in the base repository's security context, including its secrets, even when the pull request originates from a fork. Combined with the GitHub Actions cache, which is shared across the fork-base trust boundary, a poisoned pnpm store survives into the legitimate build of the next merge. The OIDC token the runner uses for npm publishing sits in process memory long enough to be extracted at runtime.

Combine the three weaknesses and you can publish provenance-attested packages from a stolen identity, on someone else's CI infrastructure. Provenance attestation, the supply-chain integrity feature npm introduced to address exactly this class of attack, gets converted into the attack vector itself.
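A repository owner can look for this class of misconfiguration with a few lines of script. What follows is a deliberately crude heuristic, not a substitute for a real workflow linter: it works on raw text rather than parsed YAML, the function name is invented for this sketch, and it will miss variants that reach the pull-request head by other means.

```python
import re

# Matches a checkout "ref:" that points at the pull request's head branch or commit.
HEAD_CHECKOUT = re.compile(
    r"ref:\s*\$\{\{\s*github\.event\.pull_request\.head\.(?:ref|sha)\s*\}\}"
)


def looks_like_pwn_request(workflow_text: str) -> bool:
    """True when a workflow uses pull_request_target AND checks out PR head code."""
    runs_in_base_context = re.search(r"\bpull_request_target\b", workflow_text) is not None
    checks_out_pr_head = HEAD_CHECKOUT.search(workflow_text) is not None
    return runs_in_base_context and checks_out_pr_head


SAMPLE = """\
on:
  pull_request_target:
    types: [opened, synchronize]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: pnpm install
"""

print(looks_like_pwn_request(SAMPLE))
```

Either condition alone can be legitimate. It is the combination, trusted context plus untrusted code, that deserves a human review before the next merge.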

What This Means If You Are Using npm

The response depends on your role.

If you are a developer

If you or your team installed any of the following packages since 31 March 2026:

axios versions 1.14.1 or 0.30.4
Any package in @tanstack/*
Any package in @uipath/*
@mistralai/mistralai
Any SAP-related npm package

then assume the install host is compromised. Rotate every credential reachable from that machine: cloud API tokens, SSH keys, npm publish tokens, GitHub PATs, browser-stored secrets, the lot. Downgrade the affected packages to a known-clean version. On FreeBSD, check VuXML for the current advisory status of the ports you maintain.

For new installs, in order of impact:

Run npm config set ignore-scripts true to apply the setting to all of your installs. This stops postinstall and other lifecycle hooks from running arbitrary code at install time. It will break a small number of packages that genuinely need install-time compilation; those can be opted back in case by case.
Pin transitive dependencies via a lockfile and review changes to that lockfile in pull requests as if they were code, because they are.
Run untrusted installs in isolation. On FreeBSD, a jail is the right primitive. On other systems, an unprivileged container or a virtual machine is the equivalent.
Treat any package with more than a hundred transitive maintainers as if you were accepting code from a hundred strangers, because you are.
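Reviewing a lockfile as code is easier when the diff is summarised per package. The sketch below assumes the package-lock.json layout used by lockfileVersion 2 and 3, where a top-level "packages" map is keyed by install path; the function names are invented for this illustration.

```python
import json


def load_lock(path: str) -> dict:
    """Read a package-lock.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def lockfile_changes(old_lock: dict, new_lock: dict) -> list[str]:
    """Describe packages added, removed or changed between two lockfiles."""
    old_pkgs = old_lock.get("packages", {})
    new_pkgs = new_lock.get("packages", {})
    report = []
    for path in sorted(set(old_pkgs) | set(new_pkgs)):
        if path == "":
            continue  # the root project entry, not a dependency
        before, after = old_pkgs.get(path), new_pkgs.get(path)
        if before is None:
            report.append(f"ADDED   {path} {after.get('version')}")
        elif after is None:
            report.append(f"REMOVED {path} {before.get('version')}")
        elif (before.get("version"), before.get("integrity")) != (
            after.get("version"),
            after.get("integrity"),
        ):
            report.append(f"CHANGED {path} {before.get('version')} -> {after.get('version')}")
    return report
```

Run against the lockfile before and after a pull request, a new transitive dependency appears as its own ADDED line rather than being buried in a few hundred lines of JSON. An unexpected addition riding along with a routine version bump is the pattern the axios incident followed.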

If you are responsible for operations or security

The lockfile is now an audit artefact. Diff it on every change. Build environments that touch npm should be ephemeral, isolated, and incapable of reaching production credentials. The OIDC token model needs adjustment: scope publish tokens to specific packages, rotate aggressively, and never let an interactive CI step coexist with publish credentials in the same job.

Add pkg audit (FreeBSD) or the equivalent vulnerability-feed query for your platform to the CI pipeline. A daily diff against the appropriate VuXML / OSV / GHSA feed is a small habit with disproportionate value.
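For platforms without a pkg audit equivalent, the OSV feed can be queried per package and version. This is a minimal sketch against the public OSV query endpoint, using only the standard library; the helper names are invented for this illustration, and error handling, batching and rate limiting are left out.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query_payload(name: str, version: str, ecosystem: str = "npm") -> dict:
    """Build the request body for a single package-and-version lookup."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def known_advisory_ids(name: str, version: str, ecosystem: str = "npm",
                       timeout: float = 10.0) -> list[str]:
    """Return the advisory IDs the feed reports for this exact version."""
    body = json.dumps(osv_query_payload(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return [entry["id"] for entry in result.get("vulns", [])]
```

A CI job could call known_advisory_ids for each entry in the lockfile and fail the build on a non-empty answer. Whether a given incident has reached the feed yet is a separate question, so an empty answer is not proof that a version is clean.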

If you are deciding on a stack

This is the harder section to write without sounding partisan, so the structural verdict will do the work. npm is not safe by default. The architecture is the product. Every install runs arbitrary code with user permissions. The registry signs nothing the consumer is required to check. Five years of public incidents and a public worm have not changed any of this, because changing it would require breaking the ecosystem the product is built on.

Budget for that. Either pay the runtime cost of isolation, the ongoing cost of audit, and the human cost of credential rotation hygiene; or pick a registry that takes its security architecture seriously by design. There are no other choices that survive contact with the four waves of evidence above.

A FreeBSD Sidebar

FreeBSD's pkg system answers a different question than npm does. Where npm optimises for distribution speed and contributor accessibility, pkg optimises for review and auditability.

Each FreeBSD port has a named human maintainer. Every change to a port is reviewed by a committer before merge. Every binary package is built reproducibly from a signed Makefile recipe in the Ports tree. The pkg client verifies a signature on the package set it downloads. The whole pipeline is slow on purpose and boring on purpose, and that is what a supply chain looks like when the design assumes some packages will, eventually, try to bite.

For developers who must use npm (for example, working on a JavaScript-heavy codebase on a FreeBSD workstation), three layers are available without leaving the base system:

Jails isolate npm install processes from the host filesystem and from other jails. A development jail per project caps the blast radius of any one compromised install. A simple base recipe:

   pkg install jq node npm
   service jail enable
   # /etc/jail.conf entry for a per-project jail

Capsicum provides capability-mode sandboxing at the syscall level, available to processes that opt in. Wrapping a high-risk install in a capsicum-restricted shell is a meaningful additional layer.
VuXML is FreeBSD's per-port vulnerability database, queryable via pkg audit. Ports affected by upstream advisories surface immediately:

   pkg audit -F

None of these replace the structural problem npm itself presents. They reduce the blast radius of the consequences when (not if) the next wave lands.

The Architecture Is the Product

npm was designed for trust by default. Anyone can publish a package. Anyone who installs that package runs whatever code the author wrote. Anyone who depends on that package, transitively or directly, inherits that risk. The registry signs nothing the user must check. Provenance attestation is an opt-in feature that, in the 11 May wave, was used by the attackers themselves to make poisoned packages look more trustworthy than the clean ones.

Each of these decisions was individually defensible in 2010, when the ecosystem was small enough that maintainer reputation could carry the weight. Each was a deliberate trade-off in favour of distribution speed. The combined result is, in 2026, the largest unsigned code-execution surface ever assembled.

For any environment that takes its own security seriously, npm is simply not fit for purpose. The architecture is the product. One phished maintainer, one over-permissioned token, one approved bot pull request: each is enough, the mistake travels at machine speed, and there is no brake in the pipeline. There never was one.

This week's wave was a campaign. The fire underneath has been the weather for years, and the forecast does not change.

Sources

CISA Cybersecurity Advisories: Supply Chain Compromise Impacts Axios npm (20 April 2026): cisa.gov/news-events/alerts/2026/04/20/supply-chain-compromise-impacts-axios-node-package-manager
Microsoft Security Blog: Mitigating the Axios npm Supply Chain Compromise (1 April 2026): microsoft.com/security/blog/2026/04/01/mitigating-the-axios-npm-supply-chain-compromise
Google Cloud Threat Intelligence (Mandiant): UNC1069 activity clusters
Palo Alto Unit 42: npm Supply Chain Attack Monitoring (ongoing): unit42.paloaltonetworks.com/monitoring-npm-supply-chain-attacks
Wiz: Mini Shai-Hulud TanStack Compromise (12 May 2026): wiz.io/blog/mini-shai-hulud-strikes-again-tanstack-more-npm-packages-compromised
Snyk: TanStack npm Compromise Postmortem: snyk.io/blog/tanstack-npm-packages-compromised
StepSecurity: Mini Shai-Hulud Self-Spreading Supply Chain Attack (12 May 2026): stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem
Aikido: TanStack Compromise Technical Breakdown: aikido.dev/blog/mini-shai-hulud-is-back-tanstack-compromised
vx-underground: Shai-Hulud source-code disclosure (12 May 2026)
Sonatype State of the Software Supply Chain Report 2025 (454,648 malicious npm packages, >99% of OSS malware on npm)
FreeBSD VuXML: vuxml.freebsd.org/freebsd/index.xml


The Regex That Ran Unbounded

Vivian Voss — Wed, 13 May 2026 08:42:59 +0000

Tales from the Bare Metal — Episode 03

« Thou shalt not let a regex run unbounded anywhere! »

13:42 UTC on Tuesday, 2 July 2019. A Cloudflare engineer deploys a single new managed WAF rule. Within seconds, every Cloudflare server in the world is at 100% CPU and incoming HTTP traffic stops moving. Twenty-seven minutes later, with traffic restored, the engineers will find that the rule was a regex of forty-five characters.

This is the story of how a regular expression that compiles cleanly, passes its tests, deploys via a globally distributed key-value store in seconds, and is enabled in simulation mode rather than blocking mode, takes down a sizeable fraction of the public internet for the duration of a coffee break. The interesting part is not the regex. The interesting part is the architecture that gave the regex unrestricted access to every CPU core on the network.

The Incident, in UTC

13:42 Engineer deploys the new managed WAF rule. Automatic deployment via Quicksilver, Cloudflare's distributed key-value store, propagates the change worldwide in seconds.
13:45 First PagerDuty alert: synthetic WAF test fails.
14:00 WAF identified as the root cause of the global CPU saturation.
14:02 A "global terminate" of the WAF Managed Rulesets is proposed.
14:07 Global terminate executed.
14:09 Traffic and CPU return to normal worldwide.
14:52 WAF re-enabled globally after the offending rule is rolled back and the change is tested.

Twenty-seven minutes from rule push to traffic restoration. A further forty-three minutes before the WAF Managed Rulesets are fully re-enabled.

The Diagnosis

The offending rule was a managed XSS-detection regular expression in Cloudflare's WAF. Its full form is a forty-five-character pattern; the catastrophic part is one sub-expression:

(?:.*=.*)

Reduced to its essence:

.*.*=.*

Two greedy quantifiers (.*) are followed by a literal (=) and then by a third greedy quantifier. When the engine matches this against a long input that does not contain =, it has to try every possible way of splitting the input between the first two .* quantifiers before concluding failure. For an input of length n, that is O(n²) attempts. Each further unbounded quantifier in front of the literal raises the exponent, and nested quantifiers such as (a+)+ make the worst case exponential. This is the classic shape of catastrophic backtracking, and the attack that exploits it is known as ReDoS (regular expression denial of service).
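The quadratic growth can be made visible with a toy matcher. This is a minimal sketch, assuming a hand-written backtracking engine rather than PCRE itself: the pattern is a list of tokens, the match is anchored at position 0, and count_steps is a hypothetical helper that counts match attempts. It shows the shape of the cost, not PCRE's actual numbers.

```python
def count_steps(tokens, text):
    """Naive backtracking match anchored at position 0.

    tokens is a list where ".*" means "any run of characters" and any
    other entry is a single literal character. Returns (matched, steps).
    """
    steps = 0

    def match(ti, pos):
        nonlocal steps
        steps += 1
        if ti == len(tokens):
            return True
        tok = tokens[ti]
        if tok == ".*":
            # Greedy: try the longest run first, then back off one by one.
            for end in range(len(text), pos - 1, -1):
                if match(ti + 1, end):
                    return True
            return False
        # Literal: must match the current character exactly.
        if pos < len(text) and text[pos] == tok:
            return match(ti + 1, pos + 1)
        return False

    return match(0, 0), steps


PATTERN = [".*", ".*", "=", ".*"]  # the reduced form .*.*=.*

if __name__ == "__main__":
    for n in (100, 200, 400):
        matched, steps = count_steps(PATTERN, "x" * n)
        print(n, matched, steps)
```

In this sketch, doubling the length of an input without = roughly quadruples the step count, while an input that does contain = matches after a handful of attempts.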

Cloudflare's WAF was implemented in Lua, running inside nginx (the OpenResty stack), with PCRE (Perl Compatible Regular Expressions) doing the actual regex matching. PCRE evaluates patterns by backtracking and has no built-in runtime budget; once a match attempt begins, it runs until it succeeds or exhausts the search space.

On a single host, a single pathological input produces a single CPU-bound match attempt. Multiplied by every HTTP request that hits the WAF, multiplied by every CPU core on every Cloudflare edge server worldwide, this is exactly the global saturation that took the network down.

The Context

Three systemic conditions shaped the afternoon.

The author was solving for coverage, not performance. The rule was new; the goal was to catch a wider class of XSS payloads. The test harness verified that the rule matched the patterns it was supposed to match and did not match the patterns it was supposed to ignore. CPU-time profiling on pathological inputs was not part of the harness, and Cloudflare's postmortem notes explicitly that previous WAF builds had not logged any CPU regression of the kind that would have flagged the new rule. The rule looked fine to every gatekeeper between author and edge.

The engine had no runaway protection. PCRE-the-engine has no built-in CPU budget; this is a property of the algorithm class (backtracking matchers cannot bound their own runtime in general). Cloudflare had previously had a CPU-usage protection mechanism in the WAF, but it had been removed for unrelated reasons and was never reinstated. The combination of an unbounded engine and a removed wrapper-level guard meant the runtime budget was, in practice, infinite.

The deployment path had no staged rollout for managed rules. Quicksilver pushes changes globally in seconds. This is a feature for emergency kill switches: when something has to be turned off everywhere at once, seconds matter. For ordinary rule changes, it is a feature that converts a local bug into a global outage. There was no canary stage, no per-data-centre soak time, no automated rollback on CPU regression. And the dashboard and API that customers and operators would have used to disable the rule manually ran on the same Cloudflare edge network; once the edge was saturated, the control plane was unreachable too.

The pattern across the three is familiar: each system was reasonable on its own. Quicksilver was excellent for what it was designed for. PCRE was an industry-standard regex engine. The XSS rule was a reasonable response to a known attack class. The failure mode lived in the spaces between them, where no single component owned the question "what is the worst case here?"

The Principle

The unixoid response to "this code path could run forever" is to give it a budget that the kernel enforces. The detailed answer has three parts.

Use an engine with runtime guarantees. The linear-time regex engines (Google's re2 and the Rust regex crate are the two production-grade examples) both reject at compile time the constructs they cannot match in linear time, backreferences and lookaround among them, and both run any pattern they do accept in time proportional to the input. Cloudflare's preventive measures, announced in the postmortem, include switching the WAF from PCRE to either re2 or Rust regex. The underlying automaton-based algorithm was published by Ken Thompson in 1968 and has been available in production form since RE2 was open-sourced in 2010. It is not new technology; it is technology that has not yet displaced its older sibling everywhere it should.
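The idea behind the linear-time class can be sketched in a few lines. This is a toy, not re2 or the Rust crate: it assumes a pattern made only of ".*" tokens and single literal characters, and simulate is a hypothetical helper. Instead of trying one path and backing up, it carries the set of all pattern positions that are still alive and advances that set once per input character.

```python
def simulate(tokens, text):
    """Match the whole of text by tracking the set of live pattern positions.

    tokens is a list where ".*" means "any run of characters" and any
    other entry is a single literal character. Returns (matched, steps),
    where steps counts (position, character) pairs examined.
    """

    def closure(states):
        # A ".*" may match the empty string, so position s implies s + 1.
        out = set(states)
        stack = list(states)
        while stack:
            s = stack.pop()
            if s < len(tokens) and tokens[s] == ".*" and s + 1 not in out:
                out.add(s + 1)
                stack.append(s + 1)
        return out

    steps = 0
    current = closure({0})
    for ch in text:
        nxt = set()
        for s in current:
            steps += 1
            if s == len(tokens):
                continue              # already past the end: dies on more input
            if tokens[s] == ".*":
                nxt.add(s)            # ".*" consumes the character and stays
            elif tokens[s] == ch:
                nxt.add(s + 1)        # literal consumes the character
        current = closure(nxt)
        if not current:
            break
    return len(tokens) in current, steps


PATTERN = [".*", ".*", "=", ".*"]  # the reduced form .*.*=.*

if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, simulate(PATTERN, "x" * n))
```

The set can never hold more entries than the pattern has positions, so the work per character is bounded by the pattern size and the total is proportional to the input. For this pattern the sketch examines three positions per character on an input without =.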

Where the engine cannot be changed, cap it from outside. On FreeBSD, rctl(8) enforces resource limits on processes, jails and users: CPU time, memory, file descriptors, and several others. A regex worker run inside a jail with rctl -a jail:waf:cputime:sigkill=1/process cannot use more than one second of CPU per process before the kernel kills it; rctl counts cputime in whole seconds and offers signal actions rather than deny for that resource. On any Unix, ulimit -t in the parent shell gives a comparable per-process cap, and a separate worker process per request with a kill-after timeout covers budgets below one second. The principle is older than software: the operator pays for an unbounded computation; the kernel is the only place that can refuse on the operator's behalf.
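The worker-per-request variant can be sketched with the standard library alone. This is an illustration under stated assumptions, not how Cloudflare or any WAF does it: bounded_search and the exit-code convention are hypothetical, the resource module is Unix-only, and RLIMIT_CPU is counted in whole seconds, much like ulimit -t. The child caps its own CPU time through the kernel, and the parent adds a wall-clock timeout as a backstop.

```python
import subprocess
import sys

# Child script: cap its own CPU time, then run the match.
# Exit code 0 = match, 3 = no match; anything else means the worker died.
_WORKER = """
import re, resource, sys
cpu_s = int(sys.argv[3])
resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))
sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 3)
"""


def bounded_search(pattern, text, cpu_s=1, wall_s=5.0):
    """Return True or False for match or no match, None if the worker was stopped."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _WORKER, pattern, text, str(cpu_s)],
            timeout=wall_s,                 # backstop: run() kills the child on expiry
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode == 0:
        return True
    if proc.returncode == 3:
        return False
    return None                             # killed by the kernel, or failed otherwise


if __name__ == "__main__":
    print(bounded_search("=", "a=b"))
    print(bounded_search(r"(a+)+$", "a" * 64 + "b"))
```

Spawning an interpreter per match is far too expensive for a request hot path; the point of the sketch is only that the refusal comes from outside the regex engine, so a pathological pattern costs one bounded worker rather than the host.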

Treat global propagation as the kill switch, not the default. Quicksilver propagating worldwide in seconds is the right tool for "turn this off everywhere" or "patch the active CVE". It is the wrong default for "deploy a new rule": a staged rollout costs minutes per region, and in return the blast radius of a bug is capped at the regions already reached. Cloudflare's SOP changes announced after the outage include a staged rollout path for managed rules while retaining the emergency global lane. Two paths, two purposes, two SLAs.
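The slow lane can be sketched as a loop over stages with a CPU gate after each one. Everything here is hypothetical: staged_rollout, the stage names, the callables and the 80% threshold are illustrative assumptions, not Cloudflare's procedure. A real rollout would also soak each stage for some minutes before reading the metric.

```python
def staged_rollout(stages, apply_rule, cpu_load, rollback, max_cpu=0.80):
    """Apply a rule stage by stage; undo everything on a CPU regression.

    stages     -- ordered stage names, smallest blast radius first
    apply_rule -- callable(stage) that enables the rule in that stage
    cpu_load   -- callable(stage) returning CPU utilisation in 0.0..1.0
    rollback   -- callable(stage) that disables the rule in that stage
    Returns (succeeded, stages_reached).
    """
    reached = []
    for stage in stages:
        apply_rule(stage)
        reached.append(stage)
        if cpu_load(stage) > max_cpu:
            # Regression: undo in reverse order and stop before later stages.
            for done in reversed(reached):
                rollback(done)
            return False, reached
    return True, reached


if __name__ == "__main__":
    loads = {"canary": 0.20, "region-a": 0.95, "region-b": 0.30}
    print(staged_rollout(list(loads), print, loads.get, print))
```

In the usage example the rule never reaches region-b: the gate trips in region-a, both reached stages are rolled back, and the remaining stages are untouched.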

The FreeBSD context is worth pausing on. The BSD camp has been arguing for runtime-bounded execution at the kernel boundary for decades: rctl for CPU and memory caps, jails for blast-radius containment, Capsicum for capability-level sandboxing, the Unix-traditional discipline of one job per process with a clean exit path. None of this needed to be invented in 2019. It needed to be applied at the point where the WAF was making decisions in the request hot path. The Cloudflare team has applied the equivalent answers since; the lesson, for the rest of the industry, is that the equivalent answers exist and have been quietly proven for a long time.

Where It Travels

Cloudflare runs the WAF, but the pattern is everywhere a regex meets user input:

Every Web Application Firewall. Cloudflare, AWS WAF, F5, ModSecurity, custom-built rules.
Every intrusion detection or prevention system. Suricata, Zeek, Snort all run rule sets containing regex against packet payloads or extracted strings.
Every centralised logging stack. Splunk, Elastic, Datadog and similar systems regex inbound log lines for parsing and routing. A pathological message can melt the parsing tier.
Every JSON or GraphQL schema validator that supports pattern keywords. A schema author can write a ReDoS into an API definition without realising it.
Every text-classification or content-moderation pipeline that runs regex on user-submitted content as a first-pass filter.
Every Python re application on a streaming source, where one bad input can stall the whole stream.
Every dashboard that runs on the platform it monitors. This is not the regex lesson; it is its travelling companion. A status page, a kill-switch console or a deployment dashboard that depends on the same infrastructure it controls will be unreachable exactly when it is needed.

The engine did exactly what the regex asked for. The architecture decided what exactly meant.


tcpdump

Vivian Voss — Tue, 12 May 2026 06:44:40 +0000

Technical Beauty — Episode 35

You have typed tcpdump -ni em0 'tcp port 443' at three in the morning, with a customer on the phone, and watched the lines scroll past in a small green miracle. The command was unremarkable. The thing that made it possible to type at three in the morning has been quietly doing the work for thirty-seven years.

The point of this episode is not that tcpdump is a useful command, which one rather hopes the reader already knew. The point is that tcpdump is a thin tool sitting on a thoughtfully layered architecture, and the layering is the technical beauty. The thing the network engineer types is the surface. The thing that makes the surface work has not needed to be replaced since 1992.

A Short Origin Story

tcpdump was written in 1988 by Van Jacobson, Craig Leres and Steven McCanne at the Network Research Group of Lawrence Berkeley Laboratory. This is the Van Jacobson who, that same year, published "Congestion Avoidance and Control" at SIGCOMM '88, the paper that fixed the Internet by introducing slow start, fast retransmit, and the congestion-window heuristics that keep TCP from collapsing under load. tcpdump came out of the same research programme: in order to understand what TCP was actually doing on the wire, the researchers needed a tool that could watch it, and the existing tools (Sun's NIT among them) were not up to the task.

The first version of tcpdump borrowed code from Sun's etherfind and was rewritten over time by McCanne, who eventually replaced the underlying packet-capture mechanism with something new. That something new became the BSD Packet Filter.

The BSD Packet Filter

In December 1992, McCanne and Jacobson wrote a paper titled "The BSD Packet Filter: A New Architecture for User-level Packet Capture". It was presented at the USENIX Winter 1993 Conference in San Diego, where it won the Best Student Paper award. It is, by any measure, one of the most quietly consequential systems papers of the 1990s.

The problem the paper addressed was simple: packet capture had been done in userspace, with the kernel copying every packet across the system-call boundary and userspace then filtering it. On busy networks, this was crushingly expensive: most of the packets would be discarded by the userspace filter anyway, and the kernel had paid the cost of copying them in vain. The previous architecture, Sun's Network Interface Tap (NIT), used a stack-based filter expression that the kernel evaluated, but the evaluator was slow.

BPF replaced the stack-based evaluator with a register-based pseudo-machine: a tiny instruction set running on a few virtual registers, optimised for predicate evaluation. A tcpdump filter expression like tcp port 443 is compiled by libpcap, in userspace, into a short BPF program. The program is uploaded to the kernel once, at the start of the capture session, and the kernel then runs it on every packet arriving on the chosen interface. Packets that pass the filter are copied to userspace; packets that fail are dropped on the floor. The paper measured BPF as up to twenty times faster than its predecessor's filter, and up to a hundred times faster than NIT overall on the same hardware.

The architecture is unusual for a few reasons. The pseudo-machine is small, simple and verifiable: a kernel can inspect a BPF program before running it and reject anything that loops or reaches outside its bounds. The filter compiler lives in userspace, where it can be replaced or improved without touching the kernel. The boundary between userspace and kernel runs through the filter, not around it, which keeps both halves small.
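A stripped-down model of the pseudo-machine shows how little there is to it. This is a sketch, not the kernel's implementation: it supports only four of the classic BPF instructions, uses symbolic names instead of numeric opcodes, and validate and run are hypothetical helpers. The program is a hand-written filter for "IPv4 and TCP" on Ethernet; 262144 is assumed as the snapshot length an accepting filter returns.

```python
import struct

# Instructions are (op, k, jt, jf). jt and jf are forward offsets relative
# to the next instruction, as in classic BPF.
IP_AND_TCP = [
    ("ldh", 12, 0, 0),       # A = 16-bit EtherType at byte 12
    ("jeq", 0x0800, 0, 3),   # IPv4? fall through, else jump to "ret 0"
    ("ldb", 23, 0, 0),       # A = IPv4 protocol field (14 + 9)
    ("jeq", 6, 0, 1),        # TCP? fall through, else jump to "ret 0"
    ("ret", 262144, 0, 0),   # accept: capture up to 262144 bytes
    ("ret", 0, 0, 0),        # reject: capture nothing
]


def validate(prog):
    """Reject programs that could loop or run off the end."""
    if not prog or prog[-1][0] != "ret":
        return False
    for pc, (op, k, jt, jf) in enumerate(prog):
        if op not in ("ldh", "ldb", "jeq", "ret") or k < 0:
            return False
        # Jumps only go forward and must land inside the program.
        if op == "jeq" and (jt < 0 or jf < 0 or pc + 1 + max(jt, jf) >= len(prog)):
            return False
    return True


def run(prog, packet):
    """Return the number of bytes to capture; 0 means drop the packet."""
    if not validate(prog):
        raise ValueError("invalid filter program")
    a = 0    # the accumulator register
    pc = 0
    while True:
        op, k, jt, jf = prog[pc]
        pc += 1
        if op == "ldh":
            if k + 2 > len(packet):
                return 0              # out-of-bounds load rejects the packet
            a = struct.unpack_from("!H", packet, k)[0]
        elif op == "ldb":
            if k + 1 > len(packet):
                return 0
            a = packet[k]
        elif op == "jeq":
            pc += jt if a == k else jf
        else:                         # "ret"
            return k
```

Because every jump moves forward and the last instruction is a return, a validated program finishes in at most as many steps as it has instructions, which is what lets a kernel accept it from userspace.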

libpcap, the Library That Ate the World

Around the same time, the LBL group extracted the packet-capture interface from tcpdump into a separate library: libpcap. The library handles the platform-specific business of opening a capture device, compiling filter expressions into BPF programs, reading packets back from the kernel, and parsing the capture-file format. On FreeBSD, OpenBSD, NetBSD and macOS it uses the BPF device directly. On Linux it uses PF_PACKET sockets with classic BPF filtering, presenting the same API to userland.

The decision to factor libpcap out of tcpdump is the move that made the rest of the story possible. By 1995 or so, every serious network tool that wanted to capture packets was using libpcap instead of writing its own capture code. The list grew steadily: tshark and Wireshark, Zeek (formerly Bro, originally from LBL too), snort and suricata for intrusion detection, nmap for port scanning, ngrep and dsniff, the entire pcap-based observability ecosystem from dumpcap to commercial network forensics tools.

The library's pcap capture-file format became the de-facto standard for sharing packet captures between tools. A .pcap file recorded by tcpdump on a FreeBSD machine in 1996 can still be opened by Wireshark on a Linux laptop in 2026. Few file formats from the early 1990s have aged as well.
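Part of the reason the format has aged well is that there is very little of it: a 24-byte file header followed by a 16-byte header in front of each packet. The sketch below writes and reads the classic microsecond-resolution variant. write_pcap and read_pcap are hypothetical helpers, and the sketch ignores the nanosecond magic number and the newer pcapng format.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


def write_pcap(packets, snaplen=262144):
    """packets: iterable of (ts_sec, ts_usec, data). Returns the file bytes."""
    # File header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype.
    out = [struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET)]
    for ts_sec, ts_usec, data in packets:
        captured = data[:snaplen]
        # Record header: timestamp, captured length, original length.
        out.append(struct.pack("<IIII", ts_sec, ts_usec, len(captured), len(data)))
        out.append(captured)
    return b"".join(out)


def read_pcap(blob):
    """Return (header_dict, [(ts_sec, ts_usec, data), ...])."""
    magic = struct.unpack_from("<I", blob, 0)[0]
    if magic == PCAP_MAGIC:
        endian = "<"                  # written on a little-endian host
    elif magic == 0xD4C3B2A1:
        endian = ">"                  # written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")
    _, major, minor, _, _, snaplen, linktype = struct.unpack_from(endian + "IHHiIII", blob, 0)
    offset = 24
    packets = []
    while offset + 16 <= len(blob):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", blob, offset)
        offset += 16
        packets.append((ts_sec, ts_usec, blob[offset:offset + incl_len]))
        offset += incl_len
    header = {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}
    return header, packets
```

The magic number doubles as a byte-order mark, which is how a capture written on one architecture opens unchanged on another.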

On FreeBSD

tcpdump and libpcap live in the FreeBSD base system; the kernel provides BPF under /dev/bpfN. No ports, no packages, no plugin model. Capturing packets on a fresh FreeBSD installation is as immediate as tcpdump -ni em0. The same is true on OpenBSD and NetBSD.

On a FreeBSD host, the relationship between the tool, the library and the kernel device is openly inspectable. The userland source for tcpdump lives under /usr/src/contrib/tcpdump. libpcap lives under /usr/src/contrib/libpcap. The BPF kernel device is in sys/net/bpf.c. The whole stack, from the filter language the operator types to the in-kernel pseudo-machine that evaluates it, is one repository away from any administrator who wants to understand what is happening on their machine.

The Linux Branch

Linux uses a different capture mechanism (PF_PACKET sockets rather than a BPF device), but the filter language is classic BPF: the same register-based pseudo-machine McCanne and Jacobson published in 1993, structurally borrowed and explicitly credited as such in the kernel documentation as "Linux Socket Filtering". libpcap on Linux compiles filter expressions to classic BPF programs, just as it does on the BSDs; the kernel runs them on the capture path.

In 2014, Alexei Starovoitov extended this mechanism into what is now called eBPF: a wider register file, new instructions, a verifier in the kernel, and the ability to attach BPF programs to kernel events well beyond packet capture (system calls, tracepoints, kprobes). eBPF has since grown into a general kernel-extension mechanism: bcc, bpftrace, the entire Cilium networking stack, much of modern container observability, security policies, scheduler hooks. Packet filtering, where the story began, is now one of many uses. The lineage still runs back to the same 1993 paper.

A small but important detail: libpcap on Linux does not currently generate eBPF code. It generates classic BPF and lets the kernel's classic-to-eBPF translator handle the conversion. The eBPF runtime hosts the filter; classic BPF is still the language libpcap speaks.

It is worth pausing on this. A two-page architecture decision made by two researchers at LBL in 1992, originally to make tcpdump fast enough to capture packets on a busy Ethernet, is the foundation of more or less every modern observability story being written in 2026. The thing the engineer types still looks the same. The cleverness underneath has been quietly extended and re-extended for three decades.

The Point

The technical beauty of tcpdump is not the tool. The tool is small. The technical beauty is the way the responsibilities are split.

The user types a filter expression and reads a one-line summary per packet, because that is what the operator's brain can hold at three in the morning. The library compiles the expression and parses the bytes, because that is the bit where the format details live. The kernel runs the filter and drops everything else, because that is the bit where performance matters. Each layer does the one thing it is in the best position to do, and the boundaries between them are clean enough that the layers can be replaced independently. eBPF extended the kernel runtime on Linux; the others stayed where they were.

A man who had also given the world TCP congestion control wrote, with two LBL colleagues, the tool every network engineer reaches for first and the architecture every serious network tool sits on. One rather suspects he knew what the kernel was actually for.


netstat vs ss

Vivian Voss — Mon, 11 May 2026 10:40:47 +0000

The Unix Way — Episode 16

On a busy Linux load balancer one types netstat -anp and the prompt is in no hurry to come back. On FreeBSD the same workload returns at once. Both tools speak Unix text. Both pipe into grep, awk and sort without fuss. This is not a story about Linux being slow. It is a story about where the Unix text interface lives (in the tool, or in a file the tool must read), and what that choice costs a decade later.

McIlroy's principle, the one that gets quoted at every Unix lecture, says: text streams are the universal interface. The Unix Way is to build tools that answer questions in text, and let pipelines compose them on the fly. The interface lives in the tool's mouth: you ask netstat, it speaks; you pipe its answer into grep or awk, that tool speaks; you compose. The text exists where one program hands its output to another or to the human at the keyboard, and not before.

FreeBSD took that principle and built netstat accordingly. The tool asks the kernel through sysctl, formats one answer at its output, pipes onward. KISS in the classical sense. Linux took the same principle and added something McIlroy never asked for: a file that looks like a text interface and isn't quite one. /proc/net/tcp is a kernel-rendered ASCII dump. It is readable, in the sense that one can cat it. It is not askable: there is no way to query it for "sockets on port 443" or "all UDP listeners". One can only read it whole. So the tool behind it still does all the work, plus the parsing of someone else's text on the way in.

A Short History of netstat

netstat was part of the 4.2BSD networking release of August 1983. The CSRG team at Berkeley, working with the new TCP/IP stack that Bill Joy and Sam Leffler had brought together, needed a tool that could ask the kernel about its sockets, its routing table, its interfaces and its protocol statistics. The result was netstat(1). The command-line vocabulary it established (-a for all sockets, -n for numeric output, -r for the routing table, -i for interfaces, -s for protocol statistics) became part of the shared idiom of every Unix administrator for the next several decades.

For most of its history, netstat was a thin Unix tool: ask the kernel, format the answer, send it to standard output. On BSD it asked through sysctl(3) and kvm(3): interfaces designed to be queried by a tool. On early Linux it asked through /proc/net: files designed to be read whole, in ASCII, by humans and tools alike. The format on the screen looked the same. The shape of the interface underneath did not. The first is a tool asking another tool a question. The second is a tool reading a file that someone else has already laid out and which one cannot, in any meaningful sense, ask.

FreeBSD: One Tool, Still in Base

On FreeBSD in 2026, the answer to "how do I list all the established TCP connections on this host" is the same as the answer in 1996, in 2006, and in 2016:

netstat -an

To get a specific protocol family:

netstat -anp tcp
netstat -anp udp
netstat -an -f unix

For the routing table, the same convention:

netstat -rn

For protocol statistics, the same again:

netstat -s
netstat -s -p tcp

The tool is part of the base system. The man page lives at /usr/share/man/man1/netstat.1.gz. The source lives under /usr/src/usr.bin/netstat/. It is updated as part of the FreeBSD release cycle, by the people who also maintain the kernel that produces the data.

The reason it is still fast on a busy server is the interface underneath. The FreeBSD kernel exposes its in-kernel protocol control blocks through the sysctl tree:

net.inet.tcp.pcblist (TCP PCBs)
net.inet.udp.pcblist (UDP PCBs)
net.inet.raw.pcblist (raw sockets)
net.local.stream.pcblist (Unix-domain stream)
net.local.dgram.pcblist (Unix-domain datagram)
net.local.seqpacket.pcblist (Unix-domain seqpacket)

A single sysctl call returns a structured array of records. The userland netstat formats them into columns, applies the requested filters, and exits. The interface here is netstat itself: one asks netstat, netstat speaks Unix text back, and that text pipes into grep, awk, sort or any other tool one cares to compose it with. There is no intermediate file pretending to be a tool. The cost is linear in the number of sockets, with a small constant; on a server with sixty thousand connections the command still returns in well under a second.

For users who prefer a more focused view of sockets without the routing-and-statistics surface area of netstat, FreeBSD also ships sockstat(1) in base:

sockstat -4l       # IPv4 listeners
sockstat -p 443    # everything on port 443

sockstat reads the same sysctl tree. It is a complementary tool, not a replacement, and it has been in the base system since FreeBSD 3.1 in 1999.

Linux: netstat, Still There, No Longer Maintained

On a Linux system, netstat is also still there, in most cases. It lives in the net-tools package, alongside ifconfig, arp, route and a handful of other tools that the BSD-trained administrator has known since the late 1980s. The Linux versions were written in the early 1990s by Fred N. van Kempen and others, modelled on the BSD originals but implemented from scratch against the Linux /proc filesystem rather than against sysctl and kvm.

For most of Linux's history, this worked. /proc/net/tcp contains one line per active TCP socket, in a well-defined ASCII format. The Linux netstat opens that file, reads the lines, parses each one, looks up process information through /proc/<pid>/fd/, and prints the result. On a small or medium server with a few hundred sockets, the cost was invisible.

The deeper issue is the shape of the interaction, not the byte count. /proc/net/tcp is a file. One can read it. One cannot ask it anything. If one wants only sockets on port 443, one has to read the whole file and filter in userspace. If one wants only established sockets, same thing. If one wants only sockets owned by a particular process, one has to read /proc/net/tcp whole and walk /proc/<pid>/fd/ whole, matching inodes by hand. The Unix Way says "compose tools by piping text"; /proc/net/tcp is not a tool, it is the text pretending to be one, and the actual work still happens in the tool behind it.
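What "read it whole and filter in userspace" means in practice can be sketched as follows. The parser is an illustration, not net-tools code: parse_proc_net_tcp is a hypothetical helper, the sample text is made up, only two state codes are named, and the address decoding assumes a little-endian host, where the kernel prints IPv4 addresses byte-reversed in hex.

```python
import socket
import struct

TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}   # subset, for illustration

SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
   1: 0A00000A:D431 0101A8C0:01BB 01 00000000:00000000 00:00000000 00000000  1000        0 12346
"""


def _addr(hexpair):
    """Decode 'AABBCCDD:PPPP' into (dotted_ip, port), little-endian host assumed."""
    ip_hex, port_hex = hexpair.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_proc_net_tcp(text):
    """Parse every socket line; there is no way to ask for only some of them."""
    rows = []
    for line in text.splitlines()[1:]:        # skip the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        rows.append({
            "local": _addr(fields[1]),
            "remote": _addr(fields[2]),
            "state": TCP_STATES.get(fields[3], fields[3]),
            "inode": int(fields[9]),
        })
    return rows


if __name__ == "__main__":
    rows = parse_proc_net_tcp(SAMPLE)
    # The filter runs only after the whole file has been read and parsed.
    print([r for r in rows if r["state"] == "ESTABLISHED" and r["remote"][1] == 443])
```

The filter on the last line touches two rows here and sixty thousand on a busy proxy; the parsing cost is paid for every socket regardless of how many are wanted.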

On a busy server the pretence becomes expensive. A reverse proxy or a load balancer in 2010 might have sixty thousand simultaneous connections, perhaps far more. Listing them with netstat -anp meant reading sixty thousand lines from /proc/net/tcp, parsing them all, walking the /proc/<pid>/fd/ tree, and filtering in userspace. On the same server, netstat could take ten or fifteen seconds and burn a noticeable amount of CPU during that time. None of that cost was paid by the kernel; all of it was paid by the tool, because the file in front of the tool was not askable.

The net-tools package itself has not had a new release since 2011. The maintainer publicly attempted to deprecate it in 2009 (the LWN article "Moving on from net-tools" documents the conversation). Most major distributions now recommend iproute2 as the replacement, and several no longer install net-tools by default. The Linux netstat is still there, it still works for small workloads, and it still does the wrong thing on busy ones. The wrong thing is not the tool's parsing; it is being asked to do that parsing because the thing in front of it was a file rather than a queryable interface.

Linux: ss, the Unix Pattern Restored

The replacement is ss(8), short for socket statistics. It was written by Alexey Kuznetsov, the same author who wrote much of the Linux QoS code and started the iproute2 package. ss does not read /proc/net/tcp. It asks netlink instead, and netlink, crucially, is askable.

netlink is a Linux kernel-to-userspace IPC mechanism, also Kuznetsov's work, designed for structured queries against kernel subsystems. In 2005, Linux 2.6.14 added a netlink family called NETLINK_INET_DIAG, which exposes IPv4 and IPv6 socket information as a queryable interface. In 2012, Linux 3.3 generalised it to NETLINK_SOCK_DIAG, adding Unix-domain socket support. The userland tool opens one netlink socket, sends one request specifying exactly what it wants, and receives back only the matching sockets as a stream of inet_diag_msg or unix_diag_msg records. ss formats those records into Unix text, prints them, and exits. The interface here is ss itself: one asks ss, ss speaks, and the answer pipes onward like any other Unix tool's output.
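The "one request specifying exactly what it wants" is a small binary message. The sketch below only builds that message; it does not open a netlink socket or parse replies. The constants and field layout are given as recalled from the Linux UAPI headers and sock_diag(7) and should be checked against them, and build_inet_diag_request is a hypothetical helper. It asks for TCP sockets in the states named by a bitmask; port filters such as dport = :443 travel as a separate bytecode attribute that is not shown here.

```python
import socket
import struct

# Values as recalled from the Linux UAPI headers; treat them as assumptions.
NETLINK_SOCK_DIAG = 4        # protocol number for the AF_NETLINK socket
SOCK_DIAG_BY_FAMILY = 20     # nlmsg_type for a sock_diag query
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300           # NLM_F_ROOT | NLM_F_MATCH
TCP_ESTABLISHED = 1          # state number; the request takes a bitmask


def build_inet_diag_request(states_mask, family=socket.AF_INET, seq=1):
    """Build nlmsghdr + inet_diag_req_v2 asking for TCP sockets in given states."""
    # inet_diag_req_v2: family, protocol, ext, pad, states
    body = struct.pack("=BBBxI", family, socket.IPPROTO_TCP, 0, states_mask)
    # inet_diag_sockid, left empty: sport, dport, src[4], dst[4], if, cookie[2]
    body += struct.pack("=HH16s16sI8s", 0, 0, b"", b"", 0, b"")
    # nlmsghdr: total length, type, flags, sequence number, port id
    header = struct.pack(
        "=IHHII",
        16 + len(body),
        SOCK_DIAG_BY_FAMILY,
        NLM_F_REQUEST | NLM_F_DUMP,
        seq,
        0,
    )
    return header + body


if __name__ == "__main__":
    request = build_inet_diag_request(1 << TCP_ESTABLISHED)
    print(len(request), request.hex())
```

The whole question fits in 72 bytes, and the state mask is applied before any record crosses into userspace, which is the difference from reading a text dump whole.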

The performance difference is not a constant factor. On a server with a few hundred sockets it is barely measurable; on a server with sixty thousand it is the difference between a snappy tool and a slow one. The structured query also allows filtering in the kernel rather than in userspace: ss -tan state established '( dport = :443 )' translates the bracketed expression into a kernel-side filter, and only the matching records come back. Both gains follow from the same source: the kernel offers a queryable interface, and ss is free to be the Unix text tool one talks to. The middleman that pretended to be one is gone.

In day-to-day use:

ss -tan                     # all TCP, numeric
ss -tanp                    # add process info (needs root)
ss -s                       # summary by protocol
ss -tan state established   # only established TCP
ss -lntp                    # listening TCP with processes

The vocabulary is similar enough to netstat that a long-time BSD administrator can guess most of it on first encounter. The behaviour is fast enough that it survives in production.

The Architectural Point

The Unix Way puts the text interface in the tool's mouth. netstat IS the interface: one asks it, it speaks, the answer pipes into grep, awk, sort, less and whatever else one cares to compose. That is McIlroy's text-streams principle in its proper place. KISS in the classical sense: one tool, one question, one answer.

/proc/net/tcp is a different idea altogether. It is a file shaped like text. One can cat it, one can squint at it, one can grep it in a pinch. It is not a tool one can ask: there is no syntax for "established sockets on port 443 owned by nginx". So the tool behind it has to do that work: reading the whole file, parsing every line, filtering in userspace, walking another /proc tree, and only then formatting the result for the human. The Unix-ness has been pushed one layer out, and the work in front of it has not gone anywhere.

This is not a criticism of /proc as a debugging surface. Being able to grep /proc/net/tcp for one weird socket without writing a tool is a real benefit, and it is the reason /proc is still there in 2026. The trouble is that the same surface was also asked to serve as the production data feed, and a fixed text dump is the wrong material for that role. It can do either job on a small host. It cannot do both well at scale.

ss and the netlink sock_diag family fixed the production case by adding a second interface that one can actually ask. The old surface stays for cat and grep. The new one took over the actual workload. The cost of the transition was a decade of slow netstat on production servers, a long mailing-list conversation, and a generation of administrators learning a new tool name. The shape of the fix, twenty-two years on, looked exactly like sysctl.

FreeBSD never had to choose, because FreeBSD never put a non-askable file between its tools and the kernel in the first place. The tool from 1983 still answers the question on the workloads of 2026. The Linux ecosystem did the right thing by adding ss. The fact that it had to is the architectural lesson.

The Unix text interface is a tool one can question, not a file one can read. netstat and sockstat are the interface; /proc/net/tcp only ever looked like one. KISS, properly applied, knows the difference.
