<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://lord.technology/feed.xml" rel="self" type="application/atom+xml" /><link href="https://lord.technology/" rel="alternate" type="text/html" /><updated>2026-05-26T17:51:23+01:00</updated><id>https://lord.technology/feed.xml</id><title type="html">Jamie Lord</title><subtitle>Jamie Lord is a Solution Architect at CDS, building multi-tenant SaaS applications with C#, Cloudflare, Azure and AWS.</subtitle><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><entry><title type="html">Data residency was the wrong question</title><link href="https://lord.technology/2026/05/26/data-residency-was-the-wrong-question.html" rel="alternate" type="text/html" title="Data residency was the wrong question" /><published>2026-05-26T17:30:00+01:00</published><updated>2026-05-26T17:30:00+01:00</updated><id>https://lord.technology/2026/05/26/data-residency-was-the-wrong-question</id><content type="html" xml:base="https://lord.technology/2026/05/26/data-residency-was-the-wrong-question.html"><![CDATA[<p>The Dutch government <a href="https://www.politico.eu/article/netherlands-blocks-us-takeover-vital-digital-supplier/">today blocked</a> Kyndryl’s acquisition of Solvinity, the IT company that operates DigiD. DigiD is the digital identity system roughly fourteen million Dutch citizens use to file their tax returns, book GP appointments, and pay municipal bills. The block came under the Act on Undesirable Control in Telecommunications, WOZT in Dutch, on advice from the Investment Screening Bureau. The Authority for Consumers and Markets had cleared the deal in February on competition grounds. The veto came from a separate sovereignty review that had been running in parallel the whole time.</p>

<p>This is the first time a European government has used a dedicated sovereignty mechanism to block a US acquisition of a domestic digital infrastructure operator. The precedent matters more than the deal.</p>

<p>For a decade, data sovereignty has effectively meant data residency. The questions were where the bytes physically live, which region of which hyperscaler holds them, and whether the European customer can get a contractual commitment that data does not leave the EU. The Dutch decision says that question was never the relevant one. The relevant question is who owns the company holding the bytes, because under the US CLOUD Act of 2018 American authorities can compel a US-headquartered company to produce data regardless of where in the world the data physically sits.</p>

<p>A US-owned Solvinity running DigiD in Dutch data centres on Dutch hardware would still be a US-owned company. The data sitting in Amsterdam would not change that. The legislation has been on the books for years. What is new is the political appetite to use it on a deal the competition authority had already cleared.</p>

<h2 id="why-this-is-not-just-a-dutch-story">Why this is not just a Dutch story</h2>

<p>The UK has the National Security and Investment Act 2021, which gives the Cabinet Office screening powers across seventeen sectors including communications and data infrastructure. Germany has the Außenwirtschaftsverordnung. France has Décret 2014-479, expanded in 2019. Every large European state has a sovereign-investment screening regime now. Until today, almost every high-profile use of those regimes was a Chinese acquirer in semiconductors or robotics being waved away from the door. The Solvinity block is the first time the same screening logic has been pointed at an American buyer, against a target whose technical role any solution architect would recognise as routine.</p>

<p>Solvinity runs a managed cloud platform for the Dutch government. They do middleware, identity, hosting, the same work any large managed-services provider does for any government. There is nothing about Solvinity that makes it categorically different from the providers UK government departments use, the ones German Länder use, or the integrators French ministries hire. If WOZT applies to Solvinity, then NSIA applies to any equivalent UK firm, and the same logic applies right up the stack to the hyperscalers themselves.</p>

<h2 id="operator-nationality-is-now-a-first-class-procurement-question">Operator nationality is now a first-class procurement question</h2>

<p>For solution architects working with European public sector clients, the operator’s nationality is now a first-class concern in the way data location used to be. This is not a hypothetical compliance risk. The Dutch government has just demonstrated that an established Dutch operator with Dutch staff, Dutch data centres, and Dutch contractual terms can still be vetoed at the corporate parent layer if the parent becomes American.</p>

<p>That changes what the procurement conversation looks like. A managed service running on Azure, AWS, GCP, or Cloudflare is unremarkable while the operating company sits in the country buying the service. The moment the operating company is acquired by, or already belongs to, a US parent, the same managed service is reviewable under the local sovereign-investment regime regardless of where the workload sits. Contract terms, hosting region, and encryption all sit downstream of that decision. Only the corporate ownership of the operator addresses it.</p>

<p>I work with Cloudflare frequently. I <a href="https://architectingoncloudflare.com/">wrote a book</a> earlier this year arguing more teams should. Cloudflare is also US-headquartered. So are Microsoft, Amazon, Google, Kyndryl, and IBM. The list of credible non-US-headquartered managed-cloud providers fits comfortably on a Post-it note. Anyone proposing one of those vendors to a European government client now has to explain, in writing, why the WOZT-equivalent in the buyer’s jurisdiction does not apply to the proposed arrangement. The honest answer is usually that the workload is not yet critical enough to invite review. That is no basis for a five-year procurement.</p>

<h2 id="what-the-block-does-and-does-not-solve">What the block does and does not solve</h2>

<p>The Dutch veto is not a turn against US technology. Kyndryl will continue to operate in the Netherlands and Solvinity will continue to run DigiD under its existing ownership. The acquisition does not happen, but commerce does. The block is a narrow, surgical use of an existing screening regime to keep a specific category of national infrastructure out of US ownership.</p>

<p>It is also not the end of the matter. The Dutch parliament has been pushing since April to move DigiD operations away from Solvinity entirely, on the grounds that the next acquisition attempt could come from any direction. The block stops this deal. It does not solve the underlying problem of a private operator running the national identity system. The sovereignty regimes were never designed to address that. Solving it is the rest of this decade’s work.</p>

<p>For the rest of us, the takeaway is narrower. If you are advising a European client on a multi-year cloud arrangement, ownership of the operator is the question to ask first. Residency comes after. The EU-hosted footprint of a US company has been sold as a sovereignty defence for a decade. It is not one.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="policy" /><category term="sovereignty" /><category term="cloud" /><category term="cloud-act" /><summary type="html"><![CDATA[The Dutch government today blocked Kyndryl’s acquisition of Solvinity, the IT company that operates DigiD. DigiD is the digital identity system roughly fourteen million Dutch citizens use to file their tax returns, book GP appointments, and pay municipal bills. The block came under the Act on Undesirable Control in Telecommunications, WOZT in Dutch, on advice from the Investment Screening Bureau. The Authority for Consumers and Markets had cleared the deal in February on competition grounds. 
The veto came from a separate sovereignty review that had been running in parallel the whole time.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The military aircraft tracker I built for an audience of one</title><link href="https://lord.technology/2026/05/21/the-military-aircraft-tracker-i-built-for-an-audience-of-one.html" rel="alternate" type="text/html" title="The military aircraft tracker I built for an audience of one" /><published>2026-05-21T17:00:00+01:00</published><updated>2026-05-21T17:00:00+01:00</updated><id>https://lord.technology/2026/05/21/the-military-aircraft-tracker-i-built-for-an-audience-of-one</id><content type="html" xml:base="https://lord.technology/2026/05/21/the-military-aircraft-tracker-i-built-for-an-audience-of-one.html"><![CDATA[<p>At about half past eleven one evening this week I noticed a US Navy E-6B Mercury orbiting over the North Sea. Not “noticed” in the way of someone who happened to look up — I was on the sofa with a laptop balanced on one knee, and the orbit was being drawn for me, in slow careful circles, by a dashboard I had been building, in evenings and weekends, for the previous eight days. The aircraft is a survivable airborne command post for the strategic nuclear force. It does not normally show up on a flight tracker at all, and when it does, it tends to fly straight lines between US bases. An orbit over the North Sea at FL250 with the callsign blanked is not a routine sight. The dashboard, which I had taught about a hundred small things by then, had quietly composed the event for me as a “rare type” anomaly with a “long sortie” co-signal. I watched the orbit for about forty minutes.</p>

<p>That feeling — the feeling of a system noticing something on my behalf — is the reason the project exists. It is called MilMov, it is closed source, and nobody but me will ever log in. I have a full-time day job; almost all the work on it has been done one-handed on the sofa, in evenings and weekends, with the other hand usually holding a mug of something. What follows is the engineering, the platform, the agent that wrote most of the code, and why the whole thing is private.</p>

<h2 id="what-milmov-actually-is">What MilMov actually is</h2>

<p>MilMov tracks “interesting” aircraft. Most of the catalogue is military — combat aircraft, transports, tankers, ISR platforms, helicopters, drones — but the underlying source data is curated by the <a href="https://github.com/sdr-enthusiasts/plane-alert-db">plane-alert-db</a> project, which also flags government VIP movements, special-mission civilian airframes, and a sprinkling of celebrity tail numbers that I deliberately filter out of the “what’s happening right now” surfaces because Taylor Swift’s jet is not what I am here for. The site polls <a href="https://www.adsbexchange.com/">ADS-B Exchange</a>’s global feed every five minutes, reconstructs closed flight legs every ten, scores anomalies on the way in and the way out, and serves a server-rendered dashboard from a single Cloudflare Worker.</p>

<p>The current numbers, on the day I am writing this:</p>

<ul>
  <li><strong>43,436 flights</strong> captured in the last seven days</li>
  <li><strong>9,794 anomalies</strong> in the archive, scored across ten dimensions</li>
  <li><strong>5,125 sorties by C-17 Globemasters alone</strong> in thirty days, across nine operators</li>
  <li><strong>One developer</strong>, no users other than me, and no AWS bill</li>
</ul>

<p><img src="/uploads/milmov-daily-brief.png" alt="MilMov daily brief" /></p>

<p>The header above is roughly what greets me when I open the laptop in the morning. The “What’s happening” feed is composed events — co-firing anomalies grouped by subject — and the chips on the right are the dimensions they fired on. Four Pilatus PC-21s in formation at FL185 over France. Ten US Navy T-45 Goshawks holding tight formation at FL132. A New York State Police Bell 430 squawking emergency. The number in the orange pill on the left is the composed score; the colour intensity is the rarity tier. If you have ever stayed up too late reading Aviation Week, this image is probably already producing a small reaction in your chest.</p>

<h2 id="it-started-as-a-different-project">It started as a different project</h2>

<p>The first version of MilMov was a .NET console app and a Blazor Server site. The first commit lands in March 2022. The early subjects — “ADS-B client decoding single flight data”, “Get aircraft type descriptions”, “Generate a new cookie” — read like the lab notebook of someone learning the territory in public, because that is what they were. I had a Raspberry Pi receiver in the loft feeding the ground network, a vague sense that I wanted to do <em>something</em> with that data beyond contributing it back, and no architecture worth defending. The Blazor site rendered server-side and used SignalR for live updates, which was fine, until I needed it to be globally distributed and cron-driven and resilient to ADS-B Exchange’s rate-limit behaviour, at which point hosting it became the whole problem. The project went into hibernation, with one attempted v2 in 2025 that did not make it past August.</p>

<p>I started the Cloudflare rewrite on the 13th of May this year. The commit subject is “Add TypeScript Cloudflare Worker; move C# to csharp/”. The C# directory was deleted six days later. This post is being written eight days after that first TypeScript commit. In those eight days the project has grown to 26,637 lines of code across 80 source files and 174 commits, on a single mainline branch, all of them mine, all of them written in evenings and weekends. That number is the part of the story I find slightly unbelievable when I look at it written down, and most of the rest of this post is an attempt to explain why it is not as unbelievable as it looks.</p>

<p>The .NET version was a hobby project I had to host. The Cloudflare version is a hobby project that hosts itself, and it overtook the .NET version in raw scope inside seventy-two hours.</p>

<h2 id="the-ingestion-problem-is-the-interesting-one">The ingestion problem is the interesting one</h2>

<p>The thing that makes any kind of aircraft tracking interesting is also the thing that makes it hard, which is that the upstream feed is not designed for you. ADS-B Exchange serves a global snapshot every second or so, but it serves it in a binary format called <em>binCraft</em>, compressed with Zstandard, gated behind a session cookie that you can only obtain by pretending to be a browser that has just fetched a particular JSON manifest, and rate-limited aggressively if you ask for the historical archive without warming the same cookie first. None of this is hostile — the format exists because JSON for global airborne traffic is enormous, and the cookie exists because the project’s economics depend on humans, not bots — but it does mean the first thing your tracker has to do is reverse-engineer the wire format and impersonate a browser politely.</p>

<p>The binary parser was a fascinating exercise. Each aircraft is a 112-byte struct in which integer fields are scaled by powers of ten and validity is encoded in separate bit-flag bytes. A null position is not a missing field; it is a flag bit that is off, against fields that are still present in the buffer. The decoder walks the struct in a <code class="language-plaintext highlighter-rouge">DataView</code>, scales each integer back to a human unit, and consults <code class="language-plaintext highlighter-rouge">flags73</code> and <code class="language-plaintext highlighter-rouge">flags74</code> to decide which of the resulting numbers are real. It is precisely the kind of code that you would expect to take weeks to land cleanly, and that Claude Code drafts in about an hour from a C# reference implementation with a careful prompt. One of my favourite small moments in the codebase is a comment where the parser notes a suspected scaling typo in the reference implementation: a QNH field multiplied by ten where it should be divided, producing values like 101,000 hPa instead of the expected 1013. I had to decide which side of the disagreement to land on. The reference was wrong. The atmosphere was right.</p>
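<p>As a rough illustration of the flag-gated decoding described above, here is a minimal sketch. Only the 112-byte stride, the <code class="language-plaintext highlighter-rouge">DataView</code> walk and the existence of a <code class="language-plaintext highlighter-rouge">flags73</code> byte come from the text. The field offsets, the bit masks, the scale factors and the name <code class="language-plaintext highlighter-rouge">decodeAircraft</code> are assumptions made for the example, not the real binCraft layout.</p>

```typescript
// Illustrative sketch only: offsets, masks and scales below are assumed,
// not taken from the actual binCraft wire format.
const STRIDE = 112; // bytes per aircraft record (from the post)

interface DecodedAircraft {
  hex: string;
  lat: number | null;
  lon: number | null;
  altBaroFt: number | null;
}

function decodeAircraft(view: DataView, base: number): DecodedAircraft {
  // Assumed: 24-bit ICAO address in the low bits of a little-endian uint32.
  const icao = view.getUint32(base, true) & 0xffffff;

  // Validity lives in a separate flag byte, not in the fields themselves.
  const flags73 = view.getUint8(base + 73);
  const positionValid = (flags73 & 0x40) !== 0; // assumed bit
  const altitudeValid = (flags73 & 0x10) !== 0; // assumed bit

  // The raw integers are always present in the buffer, valid or not.
  const rawLon = view.getInt32(base + 8, true);
  const rawLat = view.getInt32(base + 12, true);
  const rawAlt = view.getInt16(base + 20, true);

  return {
    hex: icao.toString(16).padStart(6, "0"),
    // Scale back to human units only when the flag says the value is real.
    lat: positionValid ? rawLat / 1e6 : null,
    lon: positionValid ? rawLon / 1e6 : null,
    altBaroFt: altitudeValid ? rawAlt * 25 : null,
  };
}
```

<p>A caller would step <code class="language-plaintext highlighter-rouge">base</code> through the buffer in multiples of the stride. The point of the shape is that a cleared flag yields <code class="language-plaintext highlighter-rouge">null</code> even though the bytes behind it are populated.</p>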

<p>The cookie dance is more behavioural than technical. The worker generates an <code class="language-plaintext highlighter-rouge">adsbx_sid</code> cookie locally, primes it against the <code class="language-plaintext highlighter-rouge">/globeRates.json</code> endpoint to make it valid, and then uses that cookie for the next two days of binary fetches. The historical-archive endpoint is stricter (a fresh cookie that has not just been primed sometimes gets a 429 anyway), so there is a distinct exception type for rate-limit hits and an adaptive throttle that backs off from 300 ms between requests to 1,500 ms after the first 429 of a run. None of this is exposed to the rest of the system. It is one function. It returns a cookie, or it throws.</p>
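<p>The throttle behaviour can be sketched as a small policy object. The 300 ms and 1,500 ms figures are from the description above. The class name <code class="language-plaintext highlighter-rouge">AdaptiveThrottle</code>, the helper <code class="language-plaintext highlighter-rouge">planDelays</code> and the choice to stay backed off for the remainder of the run are assumptions for the sake of the example.</p>

```typescript
// Hypothetical sketch of a per-run adaptive throttle.
class AdaptiveThrottle {
  private delayMs: number;

  constructor(
    private readonly normalMs: number = 300,
    private readonly backoffMs: number = 1500,
  ) {
    this.delayMs = normalMs;
  }

  // Record an upstream HTTP status; the first 429 switches the run
  // to the slower pace, and (by assumption) it stays there.
  record(status: number): void {
    if (status === 429) {
      this.delayMs = this.backoffMs;
    }
  }

  // Milliseconds to wait before issuing the next request.
  get nextDelayMs(): number {
    return this.delayMs;
  }
}

// Given the statuses a run observed, return the pause taken after each one.
function planDelays(statuses: number[]): number[] {
  const throttle = new AdaptiveThrottle();
  const delays: number[] = [];
  for (const status of statuses) {
    throttle.record(status);
    delays.push(throttle.nextDelayMs);
  }
  return delays;
}
```

<p>In a real worker the delay would feed a sleep between fetches, and a 429 would also surface as its own error type so the caller can tell a rate-limit hit from any other failure.</p>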

<h2 id="reconstruction-is-where-the-lies-get-caught">Reconstruction is where the lies get caught</h2>

<p>A single position fix is data. A flight is interpretation. Stitching one into the other turns out to be the most subtle part of the system.</p>

<p>The pipeline gets a stream of position samples (latitude, longitude, altitude, timestamp, ground/airborne flag) and is asked to produce a closed flight: a takeoff airfield, a landing airfield, a route line, a duration, an altitude profile. The obvious approach (look for the airborne-to-ground transition) breaks immediately, because the upstream feed lies. Aircraft ICAOs are only twenty-four bits, and they are reused across the world, which means a single ICAO sometimes returns positions from two unrelated airframes concatenated into one apparent “flight” that crosses the planet at Mach 6. The MilMov trace cleaner filters out anything moving faster than 3,000 knots between adjacent samples (about Mach 4.5 at sea level, well past anything operational, and well past the SR-71 that the US Air Force retired in 1998) and drops sentinel positions at lat ±89.99° that the feed emits when it cannot resolve a location.</p>
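<p>A minimal sketch of that cleaning pass, assuming great-circle distance via the haversine formula and a comparison against the last sample that survived the filter. The 3,000-knot limit and the ±89.99° sentinel are from the text. The names <code class="language-plaintext highlighter-rouge">Fix</code>, <code class="language-plaintext highlighter-rouge">distanceNm</code> and <code class="language-plaintext highlighter-rouge">cleanTrace</code> are made up for the example.</p>

```typescript
// Hypothetical sketch of a trace cleaner; not the MilMov implementation.
interface Fix {
  lat: number; // degrees
  lon: number; // degrees
  t: number;   // seconds since epoch
}

const EARTH_RADIUS_NM = 3440.065;
const MAX_SPEED_KT = 3000;
const SENTINEL_LAT = 89.99;

// Great-circle distance in nautical miles (haversine).
function distanceNm(a: Fix, b: Fix): number {
  const rad = Math.PI / 180;
  const sinLat = Math.sin(((b.lat - a.lat) * rad) / 2);
  const sinLon = Math.sin(((b.lon - a.lon) * rad) / 2);
  const h =
    sinLat * sinLat +
    Math.cos(a.lat * rad) * Math.cos(b.lat * rad) * sinLon * sinLon;
  return 2 * EARTH_RADIUS_NM * Math.asin(Math.min(1, Math.sqrt(h)));
}

function cleanTrace(fixes: Fix[]): Fix[] {
  const kept: Fix[] = [];
  for (const fix of fixes) {
    // Drop the polar sentinel emitted for unresolved positions.
    if (Math.abs(fix.lat) >= SENTINEL_LAT) continue;

    if (kept.length > 0) {
      const prev = kept[kept.length - 1];
      const hours = (fix.t - prev.t) / 3600;
      // Skip duplicates and out-of-order samples.
      if (hours <= 0) continue;
      // Drop jumps that imply an impossible ground speed.
      if (distanceNm(prev, fix) / hours > MAX_SPEED_KT) continue;
    }
    kept.push(fix);
  }
  return kept;
}
```

<p>Comparing against the last kept sample means one stray point from a second airframe is discarded without poisoning the points after it. The known weakness of this simple form is that it trusts the first sample of the trace.</p>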

<p>A subtler problem is the helicopter that sits with its rotor turning on a forward apron for forty minutes between two short hops, and looks to a naive segmenter like one continuous “airborne” flight with a very strange profile. The reconstruction code looks for any ground-altitude run longer than fifteen minutes inside what the feed has flagged as a single leg, and retroactively splits it. One row in the database becomes two real sorties, with two takeoffs, two landings, two timelines.</p>
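<p>The retroactive split can be sketched as a single pass over the samples of a leg. This is an assumption-laden illustration: the fifteen-minute threshold is from the text, while the <code class="language-plaintext highlighter-rouge">Sample</code> shape, the function name <code class="language-plaintext highlighter-rouge">splitOnGroundRuns</code> and the decision to share the touchdown and lift-off samples between neighbouring legs are mine.</p>

```typescript
// Hypothetical sketch: split one apparent leg wherever a long ground run
// sits between two airborne stretches.
interface Sample {
  t: number;         // seconds since epoch
  onGround: boolean; // true when the sample is at ground altitude
}

function splitOnGroundRuns(samples: Sample[], minGroundSec: number = 15 * 60): Sample[][] {
  if (samples.length === 0) return [];

  const legs: Sample[][] = [];
  let legStart = 0;
  let i = 0;

  while (i < samples.length) {
    if (!samples[i].onGround) {
      i += 1;
      continue;
    }
    // Find the end of this run of consecutive ground samples.
    let j = i;
    while (j + 1 < samples.length && samples[j + 1].onGround) {
      j += 1;
    }
    // Interior means airborne samples exist both before and after the run.
    const interior = i > legStart && j + 1 < samples.length;
    const longEnough = samples[j].t - samples[i].t > minGroundSec;
    if (interior && longEnough) {
      legs.push(samples.slice(legStart, i + 1)); // ends at the touchdown sample
      legStart = j;                              // next leg starts at lift-off
    }
    i = j + 1;
  }

  legs.push(samples.slice(legStart));
  return legs;
}
```

<p>Ground time at the very start or end of the input is left attached to its leg, since that is ordinary taxi time and not a hidden stop.</p>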

<p>Once a flight is closed, a second-pass classifier reads the altitude profile and tags the mission. It looks at the ratio of cruise to loiter, whether takeoff and landing share an airfield, whether the altitude oscillates in touch-and-go shapes. From that, the row ends up tagged as one of <code class="language-plaintext highlighter-rouge">transit</code>, <code class="language-plaintext highlighter-rouge">training</code>, <code class="language-plaintext highlighter-rouge">patrol</code>, or <code class="language-plaintext highlighter-rouge">mixed</code>. None of this is AI — it is a small state machine over segment durations and altitude bands. The mission type is a column in the flights table; it backs a filter chip on the flights index; it is one of the reasons the dashboard can ever say something more interesting than “this aircraft was airborne for two hours”.</p>

<p>The same machinery runs <em>live</em>, against still-open flights. Each segment is appended as it stabilises, so the flight-detail page shows takeoff and climb and cruise as they happen rather than after the aircraft has landed.</p>

<p><img src="/uploads/milmov-flight-in-progress.png" alt="A Royal Australian Air Force MC-55A Peregrine airborne over Mississippi" /></p>

<p>The page above is one of those still-in-progress flights, caught while I was writing this. It is a Gulfstream MC-55A Peregrine — a brand-new signals-intelligence platform that the Royal Australian Air Force is still introducing, with single-digit numbers of airframes currently flying — operating out of Majors Airport in Texas, currently over Mississippi at FL388 with a SAMSS-prefixed callsign. The big black card is the live mission, broken row by row into the segment timeline the classifier is building as the aircraft flies it: takeoff at 13:33Z, climb to FL375 over thirteen minutes, ninety-odd minutes of cruise, and the airborne row that grows in real time — 676 nautical miles flown so far, one hour fifty-seven minutes in the air. The strip of tiles below it is the trailing history MilMov has built for this tail since it first appeared: 15 sorties, 36,969 nm total, time aloft of 88 hours, longest mission 3,158 nm, highest altitude FL450. There are not many flight trackers in the world that will surface a one-of-a-handful RAAF SIGINT type and tell you, at a glance, that this is its third sortie in the past thirty days.</p>

<p>The trace itself, in decimated form, lives in R2. One JSON blob per flight, keyed by flight ID, downsampled to the minimum number of points that preserves the shape of the route. R2 is where blob storage should live when you do not care about egress fees — which, on Cloudflare, you do not, because there are none.</p>

<h2 id="the-anomaly-framework-is-the-part-i-am-proudest-of">The anomaly framework is the part I am proudest of</h2>

<p>Once a flight is closed and a fact-pack has been computed for it, the anomaly engine fires. There are ten dimensions in the current registry, four of them firing inline against live position ticks, six of them firing on flight-close. They are:</p>

<ul>
  <li><strong>type-rarity</strong> — a type that has been seen in fewer than a small number of prior flights worldwide</li>
  <li><strong>type-global-count</strong> — today’s count of a type against its trailing average</li>
  <li><strong>type-region-novelty</strong> — first sighting of a type in a country or region</li>
  <li><strong>per-airframe-novel-route</strong> — an airframe flying a route pair it has never flown</li>
  <li><strong>per-airframe-duration</strong> — a flight much longer than the airframe’s median</li>
  <li><strong>per-airframe-inactivity</strong> — an airframe back after a long quiet stretch</li>
  <li><strong>burst-launch</strong> — N or more aircraft of one type leaving one airfield in a short window</li>
  <li><strong>spatial-density</strong> — interesting aircraft clustering in one 2° grid cell against the cell’s seasonal baseline</li>
  <li><strong>spatial-type-density</strong> — clustering of <em>one</em> type in one cell, which is the formation/exercise detector</li>
  <li><strong>emergency-squawk</strong> — 7500, 7600, or 7700 observed</li>
</ul>

<p>A score is computed for each dimension separately, and a composition layer groups co-firing signals on the same subject — say, four T-6 Texans launching from the same airfield within a thirty-minute window, three of them flying a novel route, the cluster also tripping a spatial-type-density spike — into one composed event with a single headline. The composition is the thing that lets the dashboard say “16 × T-6 Texan launched within 30 min” rather than dumping fifty separate signals on me and asking me to fuse them in my head.</p>
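<p>One way to picture the composition step is as a group-by on subject followed by a scoring rule. The sketch below is hypothetical throughout: the post does not say how the composed score is derived, so the rule used here (strongest signal plus a flat bonus for each extra co-firing dimension) is an assumption, as are the names <code class="language-plaintext highlighter-rouge">Signal</code>, <code class="language-plaintext highlighter-rouge">ComposedEvent</code> and <code class="language-plaintext highlighter-rouge">compose</code>.</p>

```typescript
// Hypothetical sketch of a composition layer; the scoring rule is assumed.
interface Signal {
  subject: string;   // e.g. a type-at-airfield key or an airframe hex
  dimension: string; // e.g. "burst-launch"
  score: number;
}

interface ComposedEvent {
  subject: string;
  dimensions: string[];
  score: number;
}

const CO_FIRE_BONUS = 5; // assumed bonus per additional dimension

function compose(signals: Signal[]): ComposedEvent[] {
  const bySubject: { [subject: string]: ComposedEvent | undefined } = {};
  const events: ComposedEvent[] = [];

  for (const signal of signals) {
    let event = bySubject[signal.subject];
    if (event === undefined) {
      event = { subject: signal.subject, dimensions: [], score: 0 };
      bySubject[signal.subject] = event;
      events.push(event);
    }
    if (event.dimensions.indexOf(signal.dimension) === -1) {
      event.dimensions.push(signal.dimension);
    }
    // Track the strongest individual signal for now.
    event.score = Math.max(event.score, signal.score);
  }

  // Reward subjects that fired on several dimensions at once.
  for (const event of events) {
    event.score += CO_FIRE_BONUS * (event.dimensions.length - 1);
  }

  // Highest composed score first.
  return events.sort((a, b) => b.score - a.score);
}
```

<p>Whatever the real rule is, the useful property is the same: three signals about one cluster of Texans become one row with three chips, and it can outrank a stronger but isolated signal.</p>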

<p>The spatial-density detector is my favourite, because it is the only one where the baseline does the interesting work. Every five minutes the discover cron sees what is airborne, buckets each aircraft into a 2° grid cell, and compares each cell’s count against the cell’s own exponentially weighted moving average (EWMA) baseline, kept separately for each day-of-week and hour. A spike fires when the z-score crosses two. The cells learn from the same observations that score against them, which keeps the comparison honest: once enough Tuesday evenings have rolled by, a cell off the British coast settles into a notion of what “normal RAF coastal patrol density” looks like, and only fires when it is busier than that. Exercises light it up. So do news events. The grid does not know what news is, and it does not need to.</p>
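<p>A minimal sketch of a score-then-learn EWMA baseline of this kind. The 2° grid, the day-of-week and hour bucketing and the z-score threshold of two are from the text. The smoothing factor, the warm-up count, the incremental variance update and the names <code class="language-plaintext highlighter-rouge">cellKey</code> and <code class="language-plaintext highlighter-rouge">scoreAndLearn</code> are assumptions.</p>

```typescript
// Hypothetical sketch of a per-cell EWMA baseline with z-score spikes.
interface Baseline {
  mean: number;
  variance: number;
  n: number; // observations seen so far
}

const ALPHA = 0.1;          // assumed smoothing factor
const Z_THRESHOLD = 2;      // from the post
const MIN_OBSERVATIONS = 8; // assumed warm-up before a cell may fire

// Bucket a position into a 2-degree cell, split by UTC weekday and hour.
function cellKey(lat: number, lon: number, when: Date): string {
  const row = Math.floor(lat / 2) * 2;
  const col = Math.floor(lon / 2) * 2;
  return [row, col, when.getUTCDay(), when.getUTCHours()].join(":");
}

function scoreAndLearn(
  baselines: { [key: string]: Baseline | undefined },
  key: string,
  count: number,
): { z: number; spike: boolean } {
  const b = baselines[key];
  if (b === undefined) {
    // First sighting of this bucket: nothing to compare against yet.
    baselines[key] = { mean: count, variance: 0, n: 1 };
    return { z: 0, spike: false };
  }

  // Score against the baseline as it stood before this observation.
  const std = Math.sqrt(b.variance);
  const z = std > 0 ? (count - b.mean) / std : 0;
  const spike = b.n >= MIN_OBSERVATIONS && z > Z_THRESHOLD;

  // Then fold the observation in (incremental EWMA mean and variance).
  const diff = count - b.mean;
  const incr = ALPHA * diff;
  b.mean += incr;
  b.variance = (1 - ALPHA) * (b.variance + diff * incr);
  b.n += 1;

  return { z, spike };
}
```

<p>Scoring before updating is what stops a spike from diluting its own z-score. The warm-up guard is there because the variance of a two-sample baseline is not worth trusting.</p>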

<p><img src="/uploads/milmov-anomalies.png" alt="MilMov anomaly archive" /></p>

<p>The page above is the anomaly archive. The chips at the top are dimension counts: 3,500 novel-route events, 1,461 coordinated launches, 1,066 rare-type sightings, 471 long sorties, 30 emergencies, 5 air-to-air refuelling events. The last category was the most fun to land. Tankers and receivers do not normally announce themselves; you have to detect them by spatial coincidence of two specific role tags at compatible altitudes within a tight box for long enough that one of them must be donating fuel to the other. It is not a high-volume signal — five events ever, in the live archive at the time of writing — but it is the one I most enjoy spotting in the wild.</p>

<h2 id="the-cloudflare-primitives-are-quietly-load-bearing">The Cloudflare primitives are quietly load-bearing</h2>

<p>I do not believe MilMov would exist if I had had to host it myself. The specific way the Cloudflare primitives compose is what makes a project at this scope tractable for one tired person on a sofa, and it is worth being specific about which primitive does what.</p>

<p><strong>Workers and Hono</strong> — the entire site is one Worker. There is no frontend host, no separate API gateway, no CDN configuration. Hono routes the requests and JSX renders the pages server-side. The site has no client framework. There is no React, no Svelte, no hydration. Pages ship as HTML with a few inline scripts for the Leaflet maps and the polling refresh. There is nothing on the wire that can be slow.</p>

<p><strong>D1</strong> — fifteen live tables, twenty-six migrations, sixty-eight hand-written query functions, no ORM. Hand-written is the operative word. The single biggest performance win in the project came from a commit that added two indexes and rewrote one list query as a CTE; the <code class="language-plaintext highlighter-rouge">/types</code> page went from scanning 3.3 million rows in 225 ms to scanning 158 thousand in 56 ms. SQLite rewards exactly this kind of attention, and D1 inherits the reward.</p>

<p><strong>R2</strong> — one decimated trace per closed flight, read once when I open the flight-detail page. No egress fees, a generous free tier, and a bucket that grows by thousands of objects a day without making me think about cost.</p>

<p><strong>Queues</strong> — the reconstruct and compute-facts pipelines both run on Cloudflare Queues. Each has its own producer (a cron that scans for eligible work), its own consumer (a handler that drains the queue with bounded concurrency), and its own dead-letter queue. The compute-facts consumer runs eight invocations in parallel, each fanning out twenty in-flight R2 reads, against a queue depth that can spike to hundreds of thousands after a backfill. It does not break. Queues are the unsexy part of the Cloudflare platform and the most impressive piece of operational engineering they have shipped.</p>

<p><strong>Workflows</strong> — the per-type historical backfill is a Cloudflare Workflow. When I trigger <code class="language-plaintext highlighter-rouge">/run/backfill-type/C17</code>, a Workflow class spins up and walks every active C-17 in the system, calls the backfill routine against each, and survives ADS-B Exchange rate-limit retries through the Workflow runtime’s own exponential-backoff machinery. The same code, on a long-running VM, would have been a process I would have had to babysit. As a Workflow, it just runs to completion, and the dashboard shows me which airframe it is currently working on because the workflow writes step-level status into D1 as it goes.</p>

<p><strong>Crons</strong> — seven scheduled triggers, ranging from every five minutes (live ingest) to once a day (baseline rebuild). They are declared in <code class="language-plaintext highlighter-rouge">wrangler.toml</code>. There is no separate scheduler. There is no calendar. There is no Lambda + EventBridge invoice. They are just there.</p>

<p>There is no LLM in any of this. The project did briefly have a Workers AI binding, for a daily narrative-summary experiment, and I tore it out within the first few days. The deterministic templates over the facts pipeline produced better summaries than the model did, more cheaply, more predictably, and more easily styled. I think about this when I read articles arguing every product now needs to be AI-powered. MilMov needed to be observation-powered. The observations are the thing.</p>

<h2 id="claude-code-is-the-reason-this-is-the-size-it-is">Claude Code is the reason this is the size it is</h2>

<p>It would be easy to overclaim here. Let me try to be precise.</p>

<p>If I had written MilMov by hand at evenings-and-weekends pace, I would have shipped maybe a quarter of the surface area in the same eight days, and the anomaly framework specifically would not exist. The reason I know this is that the .NET version, which I wrote by hand across short bursts in 2022 and again in 2025, got as far as “show me what is airborne and decode the binary feed” before I lost interest in the maintenance overhead of the rest of it. The TypeScript version, in roughly eight days of one-handed sofa evenings, has ten anomaly dimensions, a composition layer, a backfill workflow, a queue-driven reconstruction pipeline, an FTS5 search index, and a full anomaly replay system that can rebuild the archive over arbitrary date ranges without re-fetching traces. The difference is not motivation. The difference is leverage.</p>

<p>Claude Code does most of the actual typing. What I do is decide what to build, decide what the shape of the change is, decide what the data model has to look like, and review the diff. I run the typecheck, I open the migration file, I read the SQL, I push back on the ugly parts. The work that used to be a series of one-week side-quests (“write the queue consumer”, “wire up the workflow”, “add a column and a migration”, “rebuild the index”) is now mostly a series of evenings where I describe what I want and review the result. The repository carries a CLAUDE.md that tells the agent how the project is shaped, what conventions matter, where the SQL gotchas are. I add to it whenever I notice a mistake the agent is likely to make twice. The harness, <a href="https://lord.technology/2026/05/18/the-harness-is-the-product-not-the-model.html">as I have written elsewhere</a>, is the product.</p>

<p>The single highest-leverage commit so far is the one that introduced the queue-driven reconstruction pipeline. I described the problem (the in-cron <code class="language-plaintext highlighter-rouge">reconstruct()</code> was starving on a <code class="language-plaintext highlighter-rouge">LIMIT 25</code> with no ordering, ingestion was outrunning reconstruction by a factor of about twenty, traces were going un-stitched), described the shape of the fix (move reconstruction onto a queue, producer-consumer split, bounded parallelism for ADS-B Exchange rate limits, dead-letter queue for failures), and pressed send. I reviewed the resulting diff for about thirty minutes, asked for two changes, applied the migration, and deployed. The throughput jump that evening was the biggest single behavioural change the project has had so far. I did not type any of it.</p>

<p>I want to be honest about one other thing. Claude Code makes me a better engineer at this project, not a more careless one. The thing I am better at is the architectural call: what should be a queue, what should be a workflow, what should be a cron, what should be a column versus a derived view, where the indexes need to land, what the right abstraction boundary is. I am worse, at the margins, at remembering syntax for SQLite’s <code class="language-plaintext highlighter-rouge">ALTER TABLE</code> quirks, or how Hono’s middleware ordering interacts with cookies, and I no longer particularly care that I am worse at those. The harness handles them. I handle the shape.</p>

<p><img src="/uploads/milmov-type-c17.png" alt="MilMov Boeing C-17 Globemaster 3" /></p>

<p>The page above is the kind of thing I find delightful, partly because every metric on it (the operator distribution, the country distribution, the top airfields, the 30-day activity sparkline, the airborne-right-now count) is one carefully-tuned SQL query against one D1 database, and partly because the C-17 is one of the most aesthetically perfect transport aircraft ever built and I will fight anyone who says otherwise.</p>

<h2 id="why-it-is-closed-source">Why it is closed source</h2>

<p>The obvious question is why I do not open it up. The honest answer is that the system is finely tuned to my taste, and the value of that tuning is precisely that it is not negotiable with anyone else. I do not want a pull request asking me to add Taylor Swift’s jet to the home feed. I do not want an issue thread relitigating whether <code class="language-plaintext highlighter-rouge">Oligarch</code>-tagged airframes should be in the live map. The project is private the way a workshop is private. There is no roadmap to maintain, no contributor onboarding to write, no GitHub Discussions to ignore. The site has an access-token gate not because the data is sensitive — most of it is on ADS-B Exchange’s own site — but because the gate keeps the audience at one person, and one person is the design.</p>

<p>There is a version of this project I could imagine open-sourcing, and it would attract a small, intense community, and I would resent every minute of running it. The version I have built instead is one I love spending evenings inside.</p>

<p>The Cloudflare bill, after a week of running this at full volume, is a single-digit number of dollars. The Claude Code subscription pays for itself any week the project makes me smile, which is most of them.</p>

<h2 id="what-i-get-out-of-it">What I get out of it</h2>

<p>I have learned more about military aviation in the eight days of building this thing than in years of casual reading. I now know which RAF squadrons fly Texans out of which fields, why the US Navy E-6Bs and Air Force E-4Bs co-orbit during certain exercises, what a Royal Australian Air Force C-17 is doing in Diego Garcia, what the standard FL for a Globemaster transatlantic crossing tends to be, which Belarusian An-148 belongs to whom. I have started recognising airframes by registration the way some people recognise birds by song.</p>

<p>I have also learned, in a way no platform documentation ever quite teaches, what the Cloudflare Developer Platform feels like under sustained load from someone being slightly unreasonable about how much they want to do per dollar. I <a href="https://architectingoncloudflare.com/">wrote a book</a> earlier this year about how these primitives compose; MilMov is what taking my own advice in a single week looks like. It scales further than any hobby project has any right to need. The primitives genuinely compose the way the marketing claims they do. The constraints (the 128 MB memory cap, the 30-second wall on cron invocations, the D1 placeholder limit, the FTS5 tokenizer quirks) push you towards architectures that turn out, even after only a week of running them, to be the architectures I would have wanted in the first place.</p>

<p>The E-6B Mercury that orbited over the North Sea has not reappeared in MilMov since. The composed anomaly is still in the archive, with its decayed score, its dimension tags, its little caption explaining what fired and why. The orbit ended after about three hours; the aircraft turned west, climbed, and went home. Whatever it was doing was none of my business. The fact that the system noticed, on my behalf, while I was on the sofa with one hand free and a day job to do in the morning — that is the whole point of the project, and the whole point of building anything for an audience of one.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="cloudflare" /><category term="personal" /><category term="workers" /><category term="d1" /><category term="workflows" /><category term="queues" /><category term="adsb" /><category term="claude-code" /><category term="aviation" /><summary type="html"><![CDATA[At about half past eleven one evening this week I noticed a US Navy E-6B Mercury orbiting over the North Sea. Not “noticed” in the way of someone who happened to look up — I was on the sofa with a laptop balanced on one knee, and the orbit was being drawn for me, in slow careful circles, by a dashboard I had been building, in evenings and weekends, for the previous eight days. The aircraft is a survivable airborne command post for the strategic nuclear force. It does not normally show up on a flight tracker at all, and when it does, it tends to fly straight lines between US bases. An orbit over the North Sea at FL250 with the callsign blanked is not a routine sight. The dashboard, which I had taught about a hundred small things by then, had quietly composed the event for me as a “rare type” anomaly with a “long sortie” co-signal. 
I watched the orbit for about forty minutes.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The harness is the product, not the model</title><link href="https://lord.technology/2026/05/18/the-harness-is-the-product-not-the-model.html" rel="alternate" type="text/html" title="The harness is the product, not the model" /><published>2026-05-18T21:00:00+01:00</published><updated>2026-05-18T21:00:00+01:00</updated><id>https://lord.technology/2026/05/18/the-harness-is-the-product-not-the-model</id><content type="html" xml:base="https://lord.technology/2026/05/18/the-harness-is-the-product-not-the-model.html"><![CDATA[<p>Cloudflare’s <a href="https://blog.cloudflare.com/cyber-frontier-models/">Project Glasswing write-up</a> landed today and the Hacker News thread is mostly arguing about whether the prose was written by Mythos or by Opus. It is a fair complaint and an irrelevant one. The diagram halfway down the page is the actual deliverable, and almost nobody is talking about it.</p>

<p>Cloudflare has published the reference architecture for doing vulnerability research with a frontier model at scale. The model in the headline is the easy part. The seven-stage agent pipeline around it is what makes the model useful, and it is the bit worth stealing.</p>

<h2 id="what-the-pipeline-actually-does">What the pipeline actually does</h2>

<p>Recon reads the repository top-down and produces a shared architecture document covering build commands, trust boundaries, entry points, and likely attack surface. Every downstream agent works from the same map.</p>

<p>Hunt fires roughly fifty agents in parallel, each pinned to one attack class against one narrow scope. Each hunter can compile and execute proof-of-concept code in a per-task scratch directory. Not ‘reason about whether this might be exploitable’, but actually run the exploit and see what happens.</p>

<p>Validate is the move that separates this from a clever prompt. An independent agent with a different prompt, a different model, and no ability to emit its own findings re-reads the code and tries to disprove the hunter. Putting two agents in deliberate disagreement does more for noise reduction than any amount of careful single-agent prompting.</p>

<p>Gapfill re-queues areas the hunters touched but did not cover. Dedupe collapses variants. Trace fans out one agent per consumer repository, uses a cross-repo symbol index, and answers the question that actually matters, which is whether attacker-controlled input reaches the flaw from outside the system. Feedback turns reachable traces into new hunt tasks. Report writes structured output against a schema and fixes its own validation errors before submitting.</p>

<p>Each stage is a fix for a specific failure mode anyone who has tried this work at scale will recognise. Unconstrained scope makes the model wander. Self-review turns the model into a generous marker. Once a hunter has had a few wins with one attack class it starts drifting toward that class and ignoring the rest of the surface. And the gap between ‘we found a thing’ and ‘an attacker can actually reach the thing’ is where most security findings die.</p>

<p>This is not Claude Code with a system prompt. It is a directed graph of agents with deliberately different prompts and deliberately constrained tool access, where the disagreement between agents carries the structural weight.</p>

<h2 id="why-the-model-is-the-wrong-thing-to-fixate-on">Why the model is the wrong thing to fixate on</h2>

<p>The thread keeps trying to litigate whether Mythos is genuinely a step change or a marketing exercise. Pick whichever side you prefer. The harness works because Mythos is good enough at chained reasoning to make the hunt stage productive, but it would still work, with degraded signal-to-noise, on Opus 4.7 or GPT-5.5. The architecture is the moat, not the weights.</p>

<p>Anyone who has pointed Claude Code at a hundred-thousand-line repository and asked it to find security issues knows the failure mode Cloudflare describe. A single agent session, even with subagents, covers maybe a tenth of a percent of the attack surface usefully before compaction kicks in and the earlier findings get dropped without ceremony. Driving harder does not help past a certain point. The bottleneck stops being the model and starts being the shape of the interaction.</p>

<p>This is the lesson most teams reaching for agentic engineering on non-trivial problems are going to learn the hard way. The model is necessary and nowhere near sufficient. Scope hints, adversarial reviewers, per-task scratch environments, structured output schemas, an explicit reachability stage. That is where the engineering lives. Security research makes the point obvious because the problem is narrow and parallel by nature. Plenty of other domains have the same shape if you look.</p>

<h2 id="what-to-take-from-this">What to take from this</h2>

<p>The adversarial review stage is the change with the highest payoff. Drop a second agent into your existing single-agent setup with a different prompt and no ability to emit findings of its own, and watch the false-positive rate fall. It generalises to anything where ‘is this finding real’ and ‘did the model find it’ need to be different questions.</p>
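
<p>A minimal sketch of that wiring, under assumptions: the <code class="language-plaintext highlighter-rouge">Finding</code> and <code class="language-plaintext highlighter-rouge">Verdict</code> shapes and the <code class="language-plaintext highlighter-rouge">huntThenValidate</code> helper are hypothetical, and the hunter and validator are plain async functions standing in for whatever agent calls you already have. The point it tries to show is structural: the validator can only return a verdict on an existing finding, so it has no channel through which to add findings of its own.</p>

```typescript
// Hypothetical shapes for illustration only; not Cloudflare's interfaces.
interface Finding {
  id: string;
  attackClass: string;
  claim: string;
}

interface Verdict {
  refuted: boolean;
  reason: string;
}

// The hunter proposes findings for a scope.
type Hunter = (scope: string) => Promise<Finding[]>;
// The validator judges one finding at a time and cannot emit new ones.
type Validator = (finding: Finding) => Promise<Verdict>;

interface ReviewResult {
  confirmed: Finding[];
  rejected: { finding: Finding; reason: string }[];
}

async function huntThenValidate(
  scope: string,
  hunt: Hunter,
  validate: Validator,
): Promise<ReviewResult> {
  const findings = await hunt(scope);

  // Every finding is reviewed independently by the second agent.
  const reviewed = await Promise.all(
    findings.map(async (finding) => ({
      finding,
      verdict: await validate(finding),
    })),
  );

  const result: ReviewResult = { confirmed: [], rejected: [] };
  for (const { finding, verdict } of reviewed) {
    if (verdict.refuted) {
      result.rejected.push({ finding, reason: verdict.reason });
    } else {
      result.confirmed.push(finding);
    }
  }
  return result;
}
```

<p>Keeping the rejected findings alongside their reasons, rather than discarding them, is a deliberate choice in this sketch: the refutations are what tell you whether the hunter's prompt or the validator's prompt needs tightening.</p>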

<p>The other pattern worth lifting is the split between ‘is there a flaw’ and ‘can an attacker actually reach it’. Asking the model both questions in one prompt produces worse answers to both. Splitting them across agents is cheap, and the same shape applies anywhere coverage matters more than depth on a single hypothesis.</p>

<p>The Cloudflare post itself is over-edited, light on hard numbers, and probably an inadequate basis for forming a view on Mythos specifically. <a href="https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vuln/">Daniel Stenberg’s write-up on a Mythos finding in curl</a>, <a href="https://xbow.com/blog/mythos-offensive-security-xbow-evaluation">XBOW’s competitive evaluation</a>, and the <a href="https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos">AISI evaluation</a> are better signal on the model. Trust the harness diagram more than the framing around it.</p>

<p>The model gets the headline. The harness is what ships.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="cloudflare" /><category term="security" /><category term="agents" /><category term="claude" /><summary type="html"><![CDATA[Cloudflare’s Project Glasswing write-up landed today and the Hacker News thread is mostly arguing about whether the prose was written by Mythos or by Opus. It is a fair complaint and an irrelevant one. The diagram halfway down the page is the actual deliverable, and almost nobody is talking about it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">If AI made Cloudflare more productive, the layoffs are the wrong move</title><link href="https://lord.technology/2026/05/08/if-ai-made-cloudflare-more-productive-the-layoffs-are-the-wrong-move.html" rel="alternate" type="text/html" title="If AI made Cloudflare more productive, the layoffs are the wrong move" /><published>2026-05-08T13:00:00+01:00</published><updated>2026-05-08T13:00:00+01:00</updated><id>https://lord.technology/2026/05/08/if-ai-made-cloudflare-more-productive-the-layoffs-are-the-wrong-move</id><content type="html" xml:base="https://lord.technology/2026/05/08/if-ai-made-cloudflare-more-productive-the-layoffs-are-the-wrong-move.html"><![CDATA[<p>Cloudflare laid off more than 1,100 people yesterday, around 20% of the company. The announcement, titled ‘Building for the Future’, explains the cuts by noting that internal AI use is up 600% in three months and the company needs to ‘architect itself for the agentic AI era’. The stock dropped 15-18% in after-hours trading.</p>

<p>I work at a Cloudflare partner, build on the Developer Platform daily, and have spent the last few years arguing that the platform is the strongest place to put new edge workloads. So when I say the public reasoning here does not survive five minutes of scrutiny, it is not contrarianism. It is concern.</p>

<p>The argument Matthew Prince and Michelle Zatlyn put forward is that AI has made the workforce so productive that the company can be smaller. If that were true, the rational move would be to hire more, not fewer. Cloudflare sits in front of a substantial portion of internet traffic and sells exactly the products that benefit from agent traffic going up: DDoS protection, Workers, AI Gateway, Bot Management, Browser Rendering, Durable Objects. The world is filling up with autonomous software that needs ingress, egress, security, and stateful compute at the edge. If your engineers are 6x more productive and your addressable market is expanding at the same time, the move is to fund more shots on goal. Ship more product. Undercut competitors who are still slow. Hire the people the rest of the market just laid off.</p>

<p>You do not cut 1,100 people.</p>

<h2 id="what-the-numbers-say">What the numbers say</h2>

<p>Cloudflare reported Q1 2026 revenue of $639.8 million, up 34% year on year. Free cash flow was $84.1 million for the quarter, 13% of revenue. Cash and equivalents stand at $4.16 billion. On the surface, a healthy growing business.</p>

<p>But the company has never posted a GAAP profit. Net loss was $102 million in 2025, $79 million in 2024, and $184 million in 2023. Stock-based compensation (SBC) ran at $470 million last year, around 22% of revenue, against roughly 5,000 employees. Gross margin compressed five points year on year, from 76% to 71%. Q2 guidance of $664-665 million implies growth decelerating into the high 20s.</p>

<p>That is the actual story. Margins are compressing, growth is slowing from a very high base, SBC is creeping up, and the company has been signalling profitability to the market for years without getting there. The AI narrative is more flattering. ‘We are reorganising for the agentic AI era’ lands better in a press release than ‘our gross margin is going the wrong way and analysts will punish us if we miss profitability targets again’.</p>

<h2 id="why-the-framing-matters-for-the-platform">Why the framing matters for the platform</h2>

<p>If the company were honest about this, I would have less to say. Public companies cut costs. The severance is good: full base pay through the end of 2026, vesting through August, and cliff waivers for the recently hired. That is the kind of package that takes effort to put together and signals a leadership team that wants to do this right by people.</p>

<p>The framing matters because it determines who got cut. A margin-driven layoff selects for the bottom of the performance distribution and roles that are genuinely surplus. An ‘agentic-AI-era reorganisation’ selects for whoever a consultant told you to cut. The reports surfacing from inside Cloudflare on Hacker News describe the second pattern. Engineering managers said they had been actively trying to hire and lost team members anyway. SREs and PMs running connectivity-critical systems lost a quarter of their headcount. One manager wrote that his team’s products were running at 95% margin and he was still cut deep.</p>

<p>This is the bit that should worry Cloudflare’s customers and partners. The platform had two major incidents in the last twelve months that shook confidence. The remediation work after incidents like those is exactly the kind of unglamorous, institutionally-rooted effort that does not show up on a productivity dashboard but matters a great deal at 03:00 on a Sunday. An agent can triage a ticket. An agent cannot tell you why a particular config drift in a particular POP eighteen months ago is the reason a particular class of bug keeps recurring.</p>

<p>Cut 20% across an org and you do not lose 20% of the institutional memory. You lose the load-bearing 20%, because the load-bearing 20% is also the most expensive and the most senior, and consultants’ spreadsheets don’t have a column for ‘knows the system’.</p>

<h2 id="the-intern-post">The intern post</h2>

<p>In September 2025, Cloudflare announced a programme to hire 1,111 interns, the number a deliberate nod to 1.1.1.1. The blog post was called ‘Help Build the Future’. Eight months later, they laid off 1,100 people in a post called ‘Building for the Future’. The interns are a separate cohort and were not, from what I can see, the ones cut.</p>

<p>The kindest reading is coincidence. The less kind reading is that Cloudflare front-loaded cheap labour, kept the cheap labour, and shed the expensive labour. That is the oldest playbook in tech, dressed up in current-cycle vocabulary.</p>

<h2 id="what-this-changes-for-the-platform">What this changes for the platform</h2>

<p>I will keep building on Cloudflare. Workers, Durable Objects, R2, Queues, Workers AI: the developer platform as a whole is still the strongest place to design edge-first systems. None of that changes overnight. Product velocity over the last three years has outpaced every comparable platform, and the recent agentic-platform launches show no sign of letting up.</p>

<p>What I am revising is my confidence in the rate of improvement from here. The velocity came from teams of senior engineers who knew the systems and shipped hard against an aggressive roadmap. If a meaningful slice of those people just left, the velocity leaves with them, regardless of how many agent sessions the survivors are running. It will show up in product gaps, in regressions, and in incidents.</p>

<p>If you have anything load-bearing on Cloudflare, this is the week to look at your fall-back posture. Not because the platform is about to fall over, but because the assumption that the engineering organisation behind it is in the same shape as last quarter is no longer safe.</p>

<p>The honest version of yesterday’s announcement would have been one paragraph. We over-hired into a different macro environment, our gross margin needs defending, here is who is leaving and how we are paying them. The version we got tries to make a margin decision sound like a vision, by borrowing the same productivity story Cloudflare sells to its customers and turning it on its own staff. That is not building for the future. It is calling the bill from the past a strategy.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="cloudflare" /><category term="cloudflare" /><category term="ai" /><category term="business" /><summary type="html"><![CDATA[Cloudflare laid off more than 1,100 people yesterday, around 20% of the company. The announcement, titled ‘Building for the Future’, explains the cuts by noting that internal AI use is up 600% in three months and the company needs to ‘architect itself for the agentic AI era’. The stock dropped 15-18% in after-hours trading.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">A Loop Intelligence Hub guarantees the failure it claims to solve</title><link href="https://lord.technology/2026/05/05/a-loop-intelligence-hub-guarantees-the-failure-it-claims-to-solve.html" rel="alternate" type="text/html" title="A Loop Intelligence Hub guarantees the failure it claims to solve" /><published>2026-05-05T21:00:00+01:00</published><updated>2026-05-05T21:00:00+01:00</updated><id>https://lord.technology/2026/05/05/a-loop-intelligence-hub-guarantees-the-failure-it-claims-to-solve</id><content type="html" xml:base="https://lord.technology/2026/05/05/a-loop-intelligence-hub-guarantees-the-failure-it-claims-to-solve.html"><![CDATA[<p>Robert 
Glaser has <a href="https://www.robert-glaser.de/when-everyone-has-ai-and-the-company-still-learns-nothing/">a long post</a> arguing that individual AI productivity gains do not become organisational gains, and that companies need a ‘Loop Intelligence Hub’ to capture which agentic workflows produce learning. The diagnosis is right. The fix would guarantee the failure it claims to solve.</p>

<p>The article hit the front page of Hacker News, and the top-voted comment, from a developer called olsondv, refuted the proposal in real time. ‘There is simply no motivation to develop this sort of intelligence loop as a dev who has their own responsibilities which their job depend on. Management can ask as nicely as they want, but I’m not going to selflessly share my productivity gains with the broader company for free.’ A reply from ravenstine went further: in an employer’s market, treat your personal AI workflows as trade secrets. If they want them, they can pay for them.</p>

<p>This is not cynicism. It is a rational read of the incentive structure inside every company that has started asking VPs how many story points AI delivered this sprint. Glaser writes that ‘the whole thing dies if it turns into employee scoring’ as if that were a risk to be managed by good intent. It is not a risk. It is the default outcome of any system that instruments how individuals use AI, no matter what the kickoff deck says.</p>

<h2 id="the-harness-collects-what-people-are-willing-to-be-seen-doing">The harness collects what people are willing to be seen doing</h2>

<p>Glaser proposes a ‘feedback harness’ that listens to real work loops, collecting prompts, specifications, reviews, accepted and rejected hypotheses, production signals, rework, human decisions, and interventions. A Loop Intelligence Hub then turns those signals into an enablement backlog, a capability radar, investment briefs, governance gaps. The framing is careful, and the whole edifice rests on engineers routing their genuine work through it.</p>

<p>They will not. Once a system exists that captures which loops produced learning, three things happen quickly. The most productive engineers route their best work outside the harness. The teams furthest along on agentic patterns develop a parallel toolchain on personal accounts. And the harness fills with the kind of demonstrable, well-narrated AI work that makes you look good in a quarterly review. Visible compliance, invisible learning, which is the failure mode Glaser names. The harness produces it.</p>

<p>A manager called daheza describes the mechanism in the same thread. VPs are asking ‘how many story points are we getting with AI now’, and ‘plenty of other managers are fully ready to just give bogus numbers’. His own team has cut stories that used to be five points to three because of AI, and total points delivered per sprint have stayed flat. The unit of measurement is being adjusted to keep the dashboard stable. That is not adoption failure. That is the system behaving exactly as a measured workforce behaves when the measurement turns into a ratchet.</p>

<h2 id="the-bottleneck-is-somewhere-else-entirely">The bottleneck is somewhere else entirely</h2>

<p>The single most upvoted comment on the article, from pards, makes a point Glaser does not address. ‘Development speed was never the bottleneck; it’s all the other processes that take time: infra provisioning, testing, sign-offs, change management, deployment scheduling etc. Code takes 6-12 months to make it from commit to production. AI makes these post-development bottlenecks worse. Changes are now piling up at the door waiting to get on a release train.’</p>

<p>This is the Theory of Constraints in plain English. If your constraint is post-development, accelerating pre-development creates inventory rather than throughput. A team running Claude Code at five times its previous output, against a release train that ships every six months, has produced five times the merge conflicts, five times the regression surface, and far more stale code that must be re-reasoned about by the time it finally moves. The unshipped code is, as pards puts it, a liability rather than an asset.</p>
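
<p>A toy model makes the inventory point concrete. Everything in it is an assumption for illustration: the function name <code class="language-plaintext highlighter-rouge">simulateReleaseTrain</code>, the weekly change rates, and in particular the fixed <code class="language-plaintext highlighter-rouge">releaseCapacity</code>, which stands in for whatever sign-offs and change management allow through each release.</p>

```typescript
// Toy model with made-up numbers; not a measurement of any real team.
interface TrainResult {
  shipped: number; // changes that reached production
  waiting: number; // changes still queued at the end
}

function simulateReleaseTrain(
  changesPerWeek: number,
  releaseEveryWeeks: number,
  releaseCapacity: number,
  weeks: number,
): TrainResult {
  let waiting = 0;
  let shipped = 0;
  for (let week = 1; week <= weeks; week++) {
    // Development adds finished changes to the queue every week.
    waiting += changesPerWeek;
    // The release train only departs on its schedule, and only carries
    // as much as the downstream process can approve.
    if (week % releaseEveryWeeks === 0) {
      const batch = Math.min(waiting, releaseCapacity);
      waiting -= batch;
      shipped += batch;
    }
  }
  return { shipped, waiting };
}

// Same six-monthly train, before and after a fivefold development speed-up.
const before = simulateReleaseTrain(10, 26, 200, 52);
const after = simulateReleaseTrain(50, 26, 200, 52);
console.log(before, after);
```

<p>With these invented numbers both runs ship 400 changes over the year, while the backlog at year end grows from 120 to 2,200. Throughput is set by the train; the extra development speed only shows up as queue.</p>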

<p>Glaser half-acknowledges this. He cites his own <a href="https://www.robert-glaser.de/what-if-iteration-is-all-we-need/">argument</a> that scrum was built for expensive iteration and that agile organisations preserved the reflexes agile was supposed to remove. But he treats the constraint as a cultural problem about loop sizes and elastic delegation, rather than a structural problem about who owns the deployment pipeline. A Loop Intelligence Hub does nothing for change advisory boards, security review queues, or the manual sign-off your platform team requires before anything reaches production.</p>

<p>It will, however, give you a very nice dashboard about which teams are stretching their loops well.</p>

<h2 id="what-the-article-gets-right">What the article gets right</h2>

<p>The three capabilities Glaser names are useful before the harness framing flattens them. Agent operations, the control plane for what agents can touch and which actions need approval, is genuine engineering work. Capability distribution, the question of how a useful skill discovered in one team becomes available to others without turning into a dead template, is the harder problem and the more interesting one. The middle layer, loop intelligence, is the one I would not build as a centrally instrumented thing.</p>

<p>The version that works is closer to what the support team in Glaser’s own example already does. They turn recurring tickets into workflow automation because they know exactly where the work hurts and nobody in the centre of excellence ever asked the right question. The learning is local, the artefact is functional, and nobody had to publish their prompts to a hub. If a pattern is good enough that another team would want it, the path for it to travel is the platform layer, as a tool, an MCP server, a skill, or a runbook evaluated against real scenarios. The travel does not need a meta-layer watching loops to identify which ones travelled.</p>

<h2 id="what-to-actually-do">What to actually do</h2>

<p>If you are running this rollout in a real company, the useful questions are narrower than the article’s framing, and most of them point away from AI.</p>

<p>Start with the deployment pipeline. If your release train ships every six months and your engineers can now produce changes several times faster than before, you are buying inventory you cannot move. The next budget cycle should fund deployment frequency before it funds agentic engineering enablement, because one of those investments compounds and the other turns into queue depth.</p>

<p>Then check the performance reviews. If AI use is touching individual scoring in any form, the ability to learn what is working has already been lost, and any harness built on top will collect performances rather than work. The fix is editorial, not technical, and it has to be in writing before anyone trusts it.</p>

<p>The remaining question is whether a pattern discovered in one team has any path to becoming a real platform capability without passing through a steering committee. If your platform team is allowed to ship a tool, an MCP server, a skill, or a runbook on the back of one team’s experience, the learning travels by itself. If everything has to be generalised, governed, and badged before it moves, the patterns will stay where they were discovered and the people who discovered them will keep them private.</p>

<p>The honest answer to ‘where is the ROI for the two million euros we paid Anthropic last year’ is that you cannot know yet, and that any system designed to tell you will be gamed before the next quarter closes. The companies that will get value from this technology are the ones whose deployment pipelines, platform layers, and incentive structures already work. The ones whose pipelines and incentives are broken will find that AI surfaces the breakage faster than expected, which is itself a useful outcome, and not the one the procurement deck promised.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="agentic-engineering" /><category term="rant" /><summary type="html"><![CDATA[Robert Glaser has a long post arguing that individual AI productivity gains do not become organisational gains, and that companies need a ‘Loop Intelligence Hub’ to capture which agentic workflows produce learning. The diagnosis is right. 
The fix would guarantee the failure it claims to solve.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The UK already answered who owns AI-generated code, and it answered it the other way</title><link href="https://lord.technology/2026/05/03/the-uk-already-answered-who-owns-ai-generated-code.html" rel="alternate" type="text/html" title="The UK already answered who owns AI-generated code, and it answered it the other way" /><published>2026-05-03T08:00:00+01:00</published><updated>2026-05-03T08:00:00+01:00</updated><id>https://lord.technology/2026/05/03/the-uk-already-answered-who-owns-ai-generated-code</id><content type="html" xml:base="https://lord.technology/2026/05/03/the-uk-already-answered-who-owns-ai-generated-code.html"><![CDATA[<p><a href="https://legallayer.substack.com/p/who-owns-the-claude-code-wrote">Sena Evren’s piece</a> on who owns the code Claude Code writes was the most-discussed link on Hacker News this week, and it is a careful walk through human authorship, work-for-hire, GPL contamination, and the Bartz settlement. Every authority it cites is American. So is every authority cited in the 549 comments. The thread argues out US copyright doctrine to four decimal places without anyone mentioning that the UK has had a statutory answer to the same question since 1988.</p>

<p>I work on UK contracts for UK clients, mostly on Cloudflare’s developer platform, and Claude Code writes a lot of what I ship. The answer that applies to my output is not the one being argued over.</p>

<p>Section 9(3) of the Copyright, Designs and Patents Act 1988 says that where a literary work is computer-generated and has no human author, the author is taken to be ‘the person by whom the arrangements necessary for the creation of the work are undertaken’. Code is a literary work under UK copyright law. The provision was drafted in the era of expert systems and procedural generation, three decades before transformer models, but it is on the books and unambiguous on the point the US is currently tying itself in knots over. If a US court eventually rules that Claude Code output lacks human authorship and falls into the public domain, the same output produced under UK law has a copyright owner: whoever ran the prompt. The duration is shorter, fifty years rather than life-plus-seventy, but the ownership is settled.</p>

<p>The asymmetry matters most in an acquisition. A US-incorporated startup with a Claude-generated codebase has a chain-of-title problem; a UK-incorporated startup with the same codebase, run from the same prompts, does not. I have not seen a single M&amp;A diligence checklist that asks where the prompts were executed, but I expect to within the year.</p>

<h2 id="why-this-is-about-to-get-more-complicated-not-less">Why this is about to get more complicated, not less</h2>

<p>The catch is that the UK government wants to scrap section 9(3). The March 2026 Report on Copyright and Artificial Intelligence, published under section 136 of the Data (Use and Access) Act 2025, proposes its repeal. The reasoning is candid. The provision is unclear, has been seriously applied in only one case (Nova Productions v Mazooma Games, a 2007 dispute over a pool-themed arcade game), and contradicts the originality standard the UK absorbed from EU case law. That standard, set by Infopaq in 2009 and reaffirmed by the Court of Appeal in THJ Systems v Sheridan in 2023, requires a work to be the ‘author’s own intellectual creation’ reflecting their ‘free and creative choices’. A computer-generated work with no human author cannot meet it. Section 9(3) and the originality test cannot both be right, and the government has signalled that it will resolve the contradiction by deleting the older provision rather than the newer one.</p>

<p>So UK developers are in a strange position. Today, the law on the books gives them clean ownership of Claude Code output. In a year, depending on how the repeal is drafted and what replaces it, that protection may be gone. Most likely the line ends up where the US has settled: AI-assisted works with sufficient human direction remain copyrightable, purely AI-generated works fall into the public domain. Two jurisdictions arriving at the same answer from opposite directions, neither having intended to.</p>

<h2 id="what-this-changes-for-how-i-work">What this changes for how I work</h2>

<p>For UK work I have stopped chasing the ‘meaningful human authorship’ standard the US Copyright Office is still trying to define. Under s.9(3) I do not need it, and arguing about prompt-versus-output creative direction is a fight that consumes hours and produces no code. The diligence concern that actually matters is GPL contamination from training data, which is a licence-scanning problem, not an authorship problem. Tools such as FOSSA and Black Duck handle it.</p>

<p>Anticipating the repeal, I have changed how I write commit messages. They now describe architectural intent rather than just what changed, on the bet that UK law will move toward the EU position before this becomes settled either way. ‘Restructured the rate-limiter to use a token bucket because the leaky-bucket variant Claude proposed first does not handle burst traffic from our scheduler’ is the sort of evidence that survives a transition from ‘arrangements necessary’ to ‘free and creative choices’. ‘Add rate limiter’ does not.</p>

<p>Who owns the code Claude Code writes depends on which jurisdiction asked the question and when. The US is working it out in court. The UK had an answer for thirty-eight years, is about to throw it away, and has not yet decided what replaces it. The developers shipping the most AI-assisted code, including most of the people who argued in that thread, are operating as if the US debate is the only debate. It is not, and acting as if it is leaves money on the floor.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="claude-code" /><category term="copyright" /><category term="regulation" /><category term="agentic-engineering" /><summary type="html"><![CDATA[Sena Evren’s piece on who owns the code Claude Code writes was the most-discussed link on Hacker News this week, and it is a careful walk through human authorship, work-for-hire, GPL contamination, and the Bartz settlement. Every authority it cites is American. So is every authority cited in the 549 comments. 
The thread argues out US copyright doctrine to four decimal places without anyone mentioning that the UK has had a statutory answer to the same question since 1988.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Claude Code’s hook system just got weaponised</title><link href="https://lord.technology/2026/05/02/claude-codes-hook-system-just-got-weaponised.html" rel="alternate" type="text/html" title="Claude Code’s hook system just got weaponised" /><published>2026-05-02T13:00:00+01:00</published><updated>2026-05-02T13:00:00+01:00</updated><id>https://lord.technology/2026/05/02/claude-codes-hook-system-just-got-weaponised</id><content type="html" xml:base="https://lord.technology/2026/05/02/claude-codes-hook-system-just-got-weaponised.html"><![CDATA[<p>The Lightning PyPI compromise published on 30 April is being written up as another Shai-Hulud variant, which it is. Versions 2.6.2 and 2.6.3 of the <code class="language-plaintext highlighter-rouge">lightning</code> package shipped with a hidden <code class="language-plaintext highlighter-rouge">_runtime</code> directory, a 14.8 MB obfuscated JavaScript payload, and the usual exfiltration to AWS, Azure, GCP, GitHub Actions secrets, and any environment variable it could reach. Andy from Lightning has confirmed on Hacker News that the PyPI credentials were stolen via a compromised <code class="language-plaintext highlighter-rouge">pl-ghost</code> bot account, not a malicious PR. The GitHub source was clean. PyPI was the entry point.</p>

<p>That part of the story is <a href="https://semgrep.dev/blog/2026/malicious-dependency-in-pytorch-lightning-used-for-ai-training/">well covered by Semgrep</a> and Socket. What has had too little attention is that this appears to be the first documented instance of malware abusing Claude Code’s hook system in the wild.</p>

<h2 id="what-the-worm-does-to-your-repo">What the worm does to your repo</h2>

<p>Once the payload runs on a developer machine or CI runner, it plants persistence hooks in two places. The VS Code one is familiar territory, a <code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code> with <code class="language-plaintext highlighter-rouge">runOn: folderOpen</code> that fires <code class="language-plaintext highlighter-rouge">node .claude/setup.mjs</code> every time someone opens the project folder. Endpoint tooling has been catching that pattern for years.</p>

<p>The Claude Code hook is new. The malware writes <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> with a <code class="language-plaintext highlighter-rouge">SessionStart</code> hook, matcher <code class="language-plaintext highlighter-rouge">"*"</code>, pointing at <code class="language-plaintext highlighter-rouge">node .vscode/setup.mjs</code>. Every time a developer opens Claude Code in the infected repo, the hook fires. No tool use, no user prompt, no approval dialog. Launching the session is enough. The dropper then bootstraps a Bun runtime, downloading <code class="language-plaintext highlighter-rouge">bun-v1.3.13</code> from GitHub releases if it isn’t already installed, and executes the full 14.8 MB payload from <code class="language-plaintext highlighter-rouge">.claude/router_runtime.js</code>.</p>
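
<p>To see what a repository would run before you launch a session in it, something like the following can list the commands registered under <code class="language-plaintext highlighter-rouge">SessionStart</code>. It is a minimal sketch: the nested <code class="language-plaintext highlighter-rouge">hooks</code> layout it reads is my assumption about the settings format, and the function name <code class="language-plaintext highlighter-rouge">session_start_commands</code> is hypothetical, not part of any Claude Code tooling.</p>

```python
import json
from pathlib import Path


def session_start_commands(repo: Path) -> list:
    """Return the shell commands a repo's SessionStart hooks would run."""
    settings_path = repo / ".claude" / "settings.json"
    if not settings_path.is_file():
        return []
    try:
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable or malformed settings deserve a manual look.
        return ["(could not parse .claude/settings.json)"]
    hooks = settings.get("hooks") if isinstance(settings, dict) else None
    if not isinstance(hooks, dict):
        return []
    commands = []
    # Assumed layout: hooks -> SessionStart -> [{matcher, hooks: [{type, command}]}]
    for group in hooks.get("SessionStart") or []:
        if not isinstance(group, dict):
            continue
        for hook in group.get("hooks") or []:
            if (
                isinstance(hook, dict)
                and hook.get("type") == "command"
                and isinstance(hook.get("command"), str)
            ):
                commands.append(hook["command"])
    return commands
```

<p>An empty list only means this file registers nothing at session start. It says nothing about other hook events or other settings files, so it is a first look rather than a clean bill of health.</p>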

<p>Opening Claude Code in a cloned repository is now sufficient to execute arbitrary attacker-controlled code with the full credentials of your developer environment. The hook system was designed to let you run formatters, log telemetry, kick off pre-task checks. It is doing exactly what it was built to do. The threat model just caught up.</p>

<h2 id="why-this-is-not-the-same-as-a-packagejson-script">Why this is not the same as a <code class="language-plaintext highlighter-rouge">package.json</code> script</h2>

<p>The obvious objection is that npm has had <code class="language-plaintext highlighter-rouge">preinstall</code> and <code class="language-plaintext highlighter-rouge">postinstall</code> hooks forever, this is not new, calm down. It is not the same.</p>

<p>A <code class="language-plaintext highlighter-rouge">preinstall</code> hook fires when you install the package. You are taking an action; the action runs the code. The Claude Code <code class="language-plaintext highlighter-rouge">SessionStart</code> hook fires when you open Claude Code in a directory that contains a <code class="language-plaintext highlighter-rouge">.claude/settings.json</code>. You did not install anything. You did not run anything. You opened a tool.</p>

<p>The closest analogue is VS Code’s workspace trust prompt, which exists precisely because Microsoft learned this lesson with <code class="language-plaintext highlighter-rouge">tasks.json</code> and <code class="language-plaintext highlighter-rouge">launch.json</code>. Claude Code does not currently prompt on first session in a repo that contains hooks. It probably should. If you have cloned a repo today and opened Claude Code in it without reading <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> first, you have run whatever the author of that repo wanted you to run.</p>

<h2 id="the-training-cutoff-problem-made-worse">The training cutoff problem made worse</h2>

<p>The HN comment from <code class="language-plaintext highlighter-rouge">nrengan</code> on the thread is the bit I have been thinking about most. Most of his pip installs come from Claude Code suggesting them, and he hits enter. The model was trained months ago. It has no idea <code class="language-plaintext highlighter-rouge">lightning@2.6.2</code> was compromised on 30 April 2026.</p>

<p>This generalises beyond Lightning. Any model with a knowledge cutoff is, by construction, blind to whatever got compromised after that cutoff. If your workflow is ‘ask Claude Code what package to use, accept the suggestion, run the install’, the model is functioning as a recommendation engine that cannot see the last six months of CVEs. The agent will happily suggest a pinned version that was malicious-by-the-time-you-installed-it.</p>

<p>The mitigation is not ‘be more careful’. It is ‘do not let the agent run installs in environments that hold credentials worth stealing’. Which means containers, devcontainers, ephemeral VMs, anything that gives the agent a working surface and revokes it when the task is done.</p>

<h2 id="what-to-actually-do">What to actually do</h2>

<p>If you have run <code class="language-plaintext highlighter-rouge">pip install lightning</code> between 30 April and now, treat the host as compromised. Rotate every token, every cloud credential, every API key that was reachable from that environment. Audit your repos for <code class="language-plaintext highlighter-rouge">.claude/settings.json</code>, <code class="language-plaintext highlighter-rouge">.claude/router_runtime.js</code>, <code class="language-plaintext highlighter-rouge">.claude/setup.mjs</code>, <code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code>, and <code class="language-plaintext highlighter-rouge">.vscode/setup.mjs</code> files you did not write. The SHA256 of the malicious <code class="language-plaintext highlighter-rouge">router_runtime.js</code> is <code class="language-plaintext highlighter-rouge">5f5852b5f604369945118937b058e49064612ac69826e0adadca39a357dfb5b1</code> per the <a href="https://semgrep.dev/blog/2026/malicious-dependency-in-pytorch-lightning-used-for-ai-training/">Lightning team</a>. Search GitHub for repos with the description ‘A Mini Shai-Hulud has Appeared’; there are over two thousand of them as of yesterday and the stolen credentials are sitting inside in plaintext JSON.</p>
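
<p>The audit step can be scripted. The sketch below checks one repository for the file paths listed above and compares <code class="language-plaintext highlighter-rouge">router_runtime.js</code> against the published hash. The function names and the shape of the returned dictionary are my own, and a hit on <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> or <code class="language-plaintext highlighter-rouge">.vscode/tasks.json</code> is only a prompt to read the file, since plenty of clean repositories contain both.</p>

```python
import hashlib
from pathlib import Path

# Paths and hash as listed in this post.
IOC_PATHS = [
    ".claude/settings.json",
    ".claude/router_runtime.js",
    ".claude/setup.mjs",
    ".vscode/tasks.json",
    ".vscode/setup.mjs",
]
MALICIOUS_SHA256 = "5f5852b5f604369945118937b058e49064612ac69826e0adadca39a357dfb5b1"


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so a multi-megabyte payload is not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_repo(repo: Path) -> dict:
    """Report which listed paths exist and whether the known payload is present."""
    present = [p for p in IOC_PATHS if (repo / p).is_file()]
    runtime = repo / ".claude" / "router_runtime.js"
    hash_match = runtime.is_file() and sha256_of(runtime) == MALICIOUS_SHA256
    return {"present": present, "hash_match": hash_match}
```

<p>A hash match is conclusive for that one payload. A mismatch is not reassurance, because a repacked variant would hash differently.</p>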

<p>For the longer term, two things. First, <code class="language-plaintext highlighter-rouge">pip</code> 26.1 added relative cooldown support; you can now pass <code class="language-plaintext highlighter-rouge">--uploaded-prior-to=P1D</code> or set <code class="language-plaintext highlighter-rouge">PIP_UPLOADED_PRIOR_TO=P1D</code> to refuse packages uploaded in the last day. <code class="language-plaintext highlighter-rouge">uv</code> has had the equivalent (<code class="language-plaintext highlighter-rouge">--exclude-newer</code>) for longer. Most worms get caught within hours; an analysis of ten recent supply chain attacks found eight had exploitation windows under one week. A 24-hour cooldown turns ‘I am patient zero’ into ‘I read about this on Hacker News and updated my pin’. There is no good reason not to have this on by default in CI.</p>
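
<p>The rule a cooldown applies is small enough to write down. This is an illustration of the idea, not how <code class="language-plaintext highlighter-rouge">pip</code> or <code class="language-plaintext highlighter-rouge">uv</code> implement it; the function names are hypothetical, and it picks the most recently uploaded eligible release rather than resolving versions properly.</p>

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def outside_cooldown(uploaded_at: datetime, now: datetime,
                     cooldown: timedelta = timedelta(days=1)) -> bool:
    """True once a release has been public for at least the cooldown period."""
    return now - uploaded_at >= cooldown


def newest_eligible(releases: Dict[str, datetime], now: datetime,
                    cooldown: timedelta = timedelta(days=1)) -> Optional[str]:
    """Pick the most recently uploaded release that has cleared the cooldown.

    `releases` maps a version string to its upload time. Returns None when
    every release is still inside the cooldown window.
    """
    eligible = {
        version: uploaded_at
        for version, uploaded_at in releases.items()
        if outside_cooldown(uploaded_at, now, cooldown)
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

<p>The trade-off is visible in the code: a release uploaded a few hours ago is skipped whether it is malicious or an urgent security fix, so the window wants to be short enough that you can live with the delay.</p>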

<p>Second, treat <code class="language-plaintext highlighter-rouge">.claude/settings.json</code> the same way you would treat a <code class="language-plaintext highlighter-rouge">Makefile</code> you found in a stranger’s repo. Read it before you open the project. If you do not want to read it, do not open Claude Code in untrusted directories. Your IDE’s workspace trust model needs to extend to the agent now, because the agent has hooks too.</p>

<p>Every primitive that lets you customise an agent’s behaviour also lets an attacker customise it. We are now past the point where that is hypothetical.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="claude-code" /><category term="security" /><category term="supply-chain" /><category term="agentic-engineering" /><summary type="html"><![CDATA[The Lightning PyPI compromise published on 30 April is being written up as another Shai-Hulud variant, which it is. Versions 2.6.2 and 2.6.3 of the lightning package shipped with a hidden _runtime directory, a 14.8 MB obfuscated JavaScript payload, and the usual exfiltration to AWS, Azure, GCP, GitHub Actions secrets, and any environment variable it could reach. Andy from Lightning has confirmed on Hacker News that the PyPI credentials were stolen via a compromised pl-ghost bot account, not a malicious PR. The GitHub source was clean. PyPI was the entry point.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Fogbank analogy is wrong, and the real risk is narrower</title><link href="https://lord.technology/2026/04/26/the-fogbank-analogy-is-wrong-and-the-real-risk-is-narrower.html" rel="alternate" type="text/html" title="The Fogbank analogy is wrong, and the real risk is narrower" /><published>2026-04-26T18:30:00+01:00</published><updated>2026-04-26T18:30:00+01:00</updated><id>https://lord.technology/2026/04/26/the-fogbank-analogy-is-wrong-and-the-real-risk-is-narrower</id><content type="html" xml:base="https://lord.technology/2026/04/26/the-fogbank-analogy-is-wrong-and-the-real-risk-is-narrower.html"><![CDATA[<p>Denis Stetskov’s piece this week, ‘<a 
href="https://techtrenches.dev/p/the-west-forgot-how-to-make-things">The West Forgot How to Build. Now It’s Forgetting Code</a>’, does a superb job of opening with the Fogbank story and the EU’s failure to deliver a million artillery shells on time. The hook works. The argument that follows, that AI-assisted coding is the same pattern of optimisation eating the talent pipeline, is half right and half a category error, and the half that is wrong is doing most of the rhetorical work.</p>

<p>Fogbank failed because the knowledge was physical. The original batch contained an unintentional impurity that nobody had documented because nobody knew it mattered. The workers who handled the material left, and with them went the muscle memory of a process that existed nowhere else. Code is the opposite kind of artefact. It is the most reproducible thing humans make. Every line of every dependency a senior engineer has ever shipped is sitting on GitHub, in npm, in the training data, in a Claude context window the moment you ask. The substrate of the problem is not the same.</p>

<p>This matters because the Fogbank framing implies a one-way ratchet, a thing that can be lost forever once the last person who knew it dies. The failure mode in front of us is narrower than that, and worth naming precisely.</p>

<h2 id="what-is-actually-at-risk">What is actually at risk</h2>

<p>Syntax and frameworks are not what atrophies when juniors skip the formative years. Those are recoverable in days from documentation an LLM will happily summarise. The thing that erodes is the judgement to know when generated code is wrong about your particular system, with its data shapes, its failure modes, its deploy history, the bug from eighteen months ago that everybody on the team carries around in their head. METR’s <a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">2025 study</a>, which Stetskov cites, found experienced developers using AI tooling were 19% slower on real open source tasks, against a self-prediction of 24% faster. The 43-point gap shows up because reviewing generated code against an unfamiliar system is genuinely harder than writing it yourself, and the developers had not yet built that muscle. The bottleneck was their review skill under new conditions, and that skill takes a while to grow in.</p>

<p>That gap is a review problem, and review problems are tractable. Stetskov hints at this when he describes rewriting pull request templates, adding dedicated reviewers per project, demanding before-and-after screenshots. Those are good moves. They are also the kind of moves that look obvious in hindsight and were not obvious in 2023, which is the entire history of how teams adapt to new tools.</p>

<p>I see this clearly running Claude Code on our own client work. The first month with an agent on a codebase, even one I knew well, was slower than coding by hand. The second month, once I had a CLAUDE.md that captured the bits of the system the model could not infer from the source, was faster. The difference was almost entirely about how much of the context I had bothered to write down. Stetskov calls this ‘institutional knowledge that exists nowhere in the codebase’. I would call it institutional knowledge that exists nowhere in the codebase yet.</p>

<h2 id="where-the-parallel-does-hold">Where the parallel does hold</h2>

<p>The talent pipeline problem is real. The hiring numbers Stetskov cites are real. CRA’s <a href="https://cra.org/crn/2025/10/cerp-pulse-survey-a-snapshot-of-2025-undergraduate-computing-enrollment-patterns/">2025 enrollment survey</a> found 62% of computing departments reporting declining undergraduate intake, and a decline like that takes seven to ten years to show up at the staff engineer level. The bit of the defence parallel that does hold is the timeline. You cannot hire your way out of a senior engineer shortage in eighteen months any more than France could restart propellant production in eighteen months after a seventeen-year shutdown. If the bet on agentic coding turns out wrong, the lag to recover is measured in years.</p>

<p>The bet does not have to turn out wrong. Hedging it well means being honest about what the tooling absorbs and what it does not. Generation is mostly solved. Review under context is the bottleneck Stetskov correctly identifies and incorrectly diagnoses, and the teams that come out of this decade ahead will be the ones that treat writing down their system context as a first-class engineering deliverable rather than a chore to do later.</p>

<p>The Fogbank story ends with $69 million and years of reverse engineering to recover something that should have been written down. Stetskov reads that as a warning to keep more juniors on the manual work, the way the Pentagon should have kept more apprentices on the missile line. I read it the other way around. The institutional knowledge we have always relied on seniors to carry in their heads now has, for the first time, a credible place to live outside of them. Most teams have not done that work yet. The teams that do are the ones who will still be shipping in 2031.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="claude-code" /><category term="agentic-engineering" /><summary type="html"><![CDATA[Denis Stetskov’s piece this week, ‘The West Forgot How to Build. Now It’s Forgetting Code’, does a superb job of opening with the Fogbank story and the EU’s failure to deliver a million artillery shells on time. The hook works. 
The argument that follows, that AI-assisted coding is the same pattern of optimisation eating the talent pipeline, is half right and half a category error, and the half that is wrong is doing most of the rhetorical work.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Writing tools for the agent that wrote them</title><link href="https://lord.technology/2026/04/26/writing-tools-for-the-agent-that-wrote-them.html" rel="alternate" type="text/html" title="Writing tools for the agent that wrote them" /><published>2026-04-26T15:00:00+01:00</published><updated>2026-04-26T15:00:00+01:00</updated><id>https://lord.technology/2026/04/26/writing-tools-for-the-agent-that-wrote-them</id><content type="html" xml:base="https://lord.technology/2026/04/26/writing-tools-for-the-agent-that-wrote-them.html"><![CDATA[<p>Most of what I have shipped this year has been built using Claude Code. That part is no longer interesting on its own — half the engineering timeline is talking about it. What is more interesting is that one of the projects I have been working on is also <em>for</em> Claude Code to use, written by Claude Code, to extend what Claude Code can do for me. The agent built its own tools, and now the agent uses them. I want to talk about that loop, because the second time you encounter it the implications start to widen out.</p>

<p>The project itself is not the point of this post; the shape of its pipeline is. It pulls some external state into a local snapshot, runs deterministic rules over the snapshot to produce a queue of items, and then needs an agent to evaluate each item (decide whether it is real, relevant, and worth surfacing) before producing a final artefact for the user. The evaluation step is the agentic part. When I started, I built it the way you would expect: an in-process loop that called Anthropic’s SDK, parsed the model’s tool calls, dispatched them, and handed the response back. It worked. It had tests. It produced the right thing.</p>

<p>About four days after that loop landed I deleted it.</p>

<h2 id="when-the-ide-is-also-the-runtime">When the IDE is also the runtime</h2>

<p>The realisation that triggered the deletion was banal. I had been running every real evaluation by hand inside Claude Code rather than through the SDK loop. I had built the SDK loop because that is what you do when you are writing an agentic application — you instantiate your own loop, your own transport, your own tool registry, your own structured-output retry policy. But I never used it. The actual runtime where the work happened was the same Claude Code session I was already sitting inside while writing the code.</p>

<p>So the SDK loop went, along with the in-process tool registry, the recording transport, the headless-context detection, the API key plumbing, the preflight tests, the budget enforcement, all of it. About 1,200 lines of source and 600 lines of tests. The replacement is four stdio MCP servers that Claude Code spawns at session start, a custom subagent in <code class="language-plaintext highlighter-rouge">.claude/agents/</code>, a handful of skills in <code class="language-plaintext highlighter-rouge">.claude/skills/</code>, and an ADR explaining the reasoning. The agent now does its work by calling tools my code exposes. The structured-output contract is enforced at the tool boundary. There is no model SDK in the dependency graph at all.</p>

<p>This is the inversion the post is about. A normal agentic application has its own loop, and the model is one of its dependencies. When Claude Code is your runtime, the loop is the model’s, and your application is one of <em>its</em> dependencies. The arrow flips. You stop writing an agent and start writing tools for an agent that already exists.</p>

<h2 id="what-you-build-when-you-build-for-the-agent">What you build when you build for the agent</h2>

<p>The MCP servers I ended up with are small. The biggest is around 500 lines including its schema validation. The smallest is forty. They are not exciting individually. What is interesting is what they are, collectively, optimised for.</p>

<p>A normal library is optimised for human callers. Function names are descriptive but compact. Errors are structured for catching in known places. Tests assert on the happy path and the obvious failure modes. An MCP server you write for an agent to use is optimised for a different consumer. The function names have to make sense to a model that has read your tool descriptions once, half a context window ago. The errors have to teach the agent what to do next, because the agent is going to read the error and decide whether to retry, branch, or escalate to the human. The schemas are not just for validation; they are an authoring surface that the model writes against. The tool descriptions are the prompt.</p>

<p>The most useful tool I wrote is the one that ingests the agent’s structured output. Each evaluation produces a typed reasoning chain — a closed set of claim kinds with strict shapes, plus a narrative, plus a conclusion. Rather than ask the model to emit JSON and parse it post-hoc, the chain arrives as the input to a tool. The tool’s input schema is the structured-output contract. Zod validates the shape; a separate pass checks the semantic content (every cited claim must verify against the snapshot; references must resolve to the structured corpus the project ships; inferred claims must trace back to base claims that verified). If anything fails, the tool returns an error the agent can read, and the agent retries from where it is, mid-task, without losing the reasoning it already did. The integrity boundary is the tool, not a post-hoc parser.</p>
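
<p>The two-pass check is easier to see in code. What follows is a Python stand-in for illustration only: the project itself uses Zod, and the claim kinds, field names and <code class="language-plaintext highlighter-rouge">validate_chain</code> function here are hypothetical. It assumes inferred claims are listed after the claims they build on.</p>

```python
CLAIM_KINDS = {"observed", "inferred"}  # hypothetical closed set of claim kinds


def validate_chain(chain: dict, snapshot: dict) -> dict:
    """Check shape first, then semantics; return errors the agent can act on."""
    claims = chain.get("claims")
    if not isinstance(claims, list) or not claims:
        return {"ok": False,
                "errors": ["'claims' must be a non-empty list; resubmit with at least one claim."]}
    errors = []
    if not isinstance(chain.get("conclusion"), str):
        errors.append("'conclusion' must be a string summarising the verdict.")
    verified = set()  # ids of claims that have passed so far
    for index, claim in enumerate(claims):
        kind = claim.get("kind") if isinstance(claim, dict) else None
        if kind not in CLAIM_KINDS:
            errors.append(f"claims[{index}]: kind must be one of {sorted(CLAIM_KINDS)}.")
        elif kind == "observed":
            key = claim.get("key")
            # An observed claim must match the snapshot exactly.
            if isinstance(key, str) and key in snapshot and snapshot[key] == claim.get("value"):
                verified.add(claim.get("id"))
            else:
                errors.append(f"claims[{index}]: '{key}' does not match the snapshot; "
                              "re-read the snapshot and correct the value.")
        else:
            # An inferred claim must rest only on claims that already verified.
            based_on = claim.get("based_on") or []
            unverified = [ref for ref in based_on if ref not in verified]
            if not based_on or unverified:
                errors.append(f"claims[{index}]: inferred claims must cite verified base claims; "
                              f"unverified references: {unverified}.")
            else:
                verified.add(claim.get("id"))
    return {"ok": not errors, "errors": errors}
```

<p>Each message names the failing claim and says what to do about it, which is what lets the agent fix the chain and resubmit without starting over.</p>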

<p>This is a different shape from the tool registries I have written before. In a conventional agent loop, tools are functions the model calls to get information; structured output is a separate concern handled outside the tool surface. In an MCP-first design, the two concerns collapse. The structured-output mechanism is just a tool whose input schema happens to be the contract. The same boundary that would have been a parser becomes a server-side validator with retry semantics the agent already knows how to use.</p>

<h2 id="subagents-are-how-the-project-enforces-its-discipline">Subagents are how the project enforces its discipline</h2>

<p>The work is not done in the parent Claude Code session. It is done by a subagent: a fresh context, a tightly scoped tool allowlist, and a system prompt that lives in version control next to the rest of the project. The subagent can read the snapshot, the rule pack, and the structured reference corpus, and it can call the submission tool. It cannot run shell commands. It cannot read or write arbitrary files. It cannot make network calls. The constraint set is encoded in <code class="language-plaintext highlighter-rouge">.claude/agents/&lt;name&gt;/AGENT.md</code> and enforced by Claude Code itself, not by my code.</p>

<p>This matters more than it sounds. The system prompt for the subagent — what it is, what it must do, what it must never do — is a load-bearing artefact. It defines the closed set of claim kinds. It encodes the rule that the agent must never invent a framework reference if the corpus does not contain one. It explains how to use the operator-knowledge server when the model encounters a vendor-specific shape it has not seen before. Putting that prompt in <code class="language-plaintext highlighter-rouge">.claude/agents/</code> means it is reviewed when I review the rest of the codebase, diffed when I diff a change, and regression-tested when I run the test suite. The prompt is part of the project, not part of my head.</p>

<p>The skills are the project’s verbs. There is a skill that orchestrates a single evaluation, a skill that batches the work across the whole pending queue, a skill that runs the deterministic pipeline end-to-end. They live in <code class="language-plaintext highlighter-rouge">.claude/skills/</code> alongside the agent. They are how I actually use the system from inside Claude Code. The batch skill groups items into cohorts by rule and account, dispatches one subagent per cohort, and lets each subagent emit one chain per item. Claude Code’s own parallel agent fan-out is the scheduler. None of this required me to write a job runner.</p>
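
<p>The cohort grouping is the only part of that batch step that is plain data manipulation. A sketch of it, with hypothetical field names (<code class="language-plaintext highlighter-rouge">id</code>, <code class="language-plaintext highlighter-rouge">rule</code>, <code class="language-plaintext highlighter-rouge">account</code>), might look like this. Dispatching a subagent per cohort is left to Claude Code and is not shown.</p>

```python
from collections import defaultdict


def group_into_cohorts(items: list) -> dict:
    """Group queue items by (rule, account); each cohort lists its item ids."""
    cohorts = defaultdict(list)
    for item in items:
        cohorts[(item["rule"], item["account"])].append(item["id"])
    return dict(cohorts)
```

<p>Grouping by rule and account means each subagent sees items that share the same context, so it loads that context once and emits one chain per item.</p>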

<h2 id="the-operational-details-that-surface">The operational details that surface</h2>

<p>Once Claude Code is your runtime, there are operational details that do not exist in a conventional architecture. The MCP servers are long-lived child processes. They cache their data at session start. A change to the rule pack is hot-reloaded by the affected server on the next tool call; a change to the chain schema requires Claude Code to restart, because Node has cached the old import. I learned this the painful way. My first batch run produced 27 failed chains because a rule addition was not seen by the rules server, and 44 chains that could not emit a new claim kind because the schema in the chain server was stale. The result became an ADR titled “MCP hot-reload vs. restart — what takes effect when,” which is now the operational guide every contributor reads before adding a rule or touching a schema.</p>
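
<p>The difference between the two cases comes down to where the stale copy lives. Data read from disk can be rechecked on every tool call, as in the sketch below; code that the process has already imported cannot be refreshed that cheaply. This is a Python illustration of the first case only, with a hypothetical <code class="language-plaintext highlighter-rouge">ReloadingRulePack</code> class and a JSON rule file as assumptions. The real servers run on Node.</p>

```python
import json
from pathlib import Path


class ReloadingRulePack:
    """Cache a JSON rule file and re-read it when its modification time changes."""

    def __init__(self, path: Path):
        self._path = path
        self._mtime_ns = None  # modification time of the copy currently cached
        self._rules = []

    def rules(self) -> list:
        # Called at the start of each tool call: one stat(), and a re-read
        # only when the file on disk is newer than the cached copy.
        mtime_ns = self._path.stat().st_mtime_ns
        if mtime_ns != self._mtime_ns:
            self._rules = json.loads(self._path.read_text(encoding="utf-8"))
            self._mtime_ns = mtime_ns
        return self._rules
```

<p>A server that loads its rules once at session start and never rechecks would keep serving the old rule pack for the rest of the session.</p>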

<p>That ADR exists because the project’s user is the agent. A normal piece of code does not need an operator’s manual for “what takes effect when you change source files.” A long-running stdio MCP server inside an editor session does. The shape of the documentation changes when the runtime changes.</p>

<h2 id="the-deletion-as-evidence">The deletion as evidence</h2>

<p>The strongest evidence that the model has actually inverted is the size of the codebase. When you build a conventional agentic application, the parts of the system that are about <em>being</em> an agent — the loop, the transport, the tool registry, the budget, the recording infrastructure — accrete continuously. Every new failure mode you discover gets a new piece of code. When the runtime moves out, those parts go away. The MCP servers are smaller than the loop they replaced. The tests are tighter than the recording fixtures they replaced. The dependencies are fewer. The operational surface is narrower. The integrity guarantees are stronger, because they live in one place — the tool boundary — instead of spread across a parser, a retry policy, and a downstream consumer of a JSON blob.</p>

<p>I keep an ADR titled “Claude Code is the sole runtime.” It is mostly a list of what was deleted and why. It is short, and it is the one I am most proud of, because it is the only ADR in the project that records work done by removal rather than addition. The architecture got smaller when the runtime moved.</p>

<h2 id="the-recursion-is-the-point">The recursion is the point</h2>

<p>The thing I keep coming back to, watching my own projects, is the recursive shape of this. Claude Code wrote almost every line of those MCP servers under my direction. I read the diffs and run the tests and push back on bad designs, but the typing is its. Once the servers are merged, the next session of Claude Code spawns them. The agent that authored the tools is now using the tools. The same fingerprints are on both sides of the boundary.</p>

<p>This is not a parlour trick. It changes what you build. When you know that the consumer of your library is the agent that just wrote it, you write the library differently. The descriptions become richer because the agent will rely on them. The error messages become instructive because the agent will read them. The schemas become tighter because schema enforcement is your only line of defence against drift. You stop optimising for human ergonomics and start optimising for agent ergonomics, and you discover that the two often want different things. The agent does not want a clever fluent API. It wants a small set of well-named tools, predictable error shapes, and a contract it cannot accidentally violate.</p>
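
<p>A “predictable error shape” can be sketched as a result type with exactly two forms. This is a minimal illustration, not code from the project: the <code class="language-plaintext highlighter-rouge">ToolResult</code> type, the <code class="language-plaintext highlighter-rouge">getRecord</code> tool, the error codes, and the in-memory store are all assumptions made for the example.</p>

```typescript
// Hypothetical result shape: every tool returns one of exactly two forms,
// so the caller never has to guess how a failure is reported.
type ToolResult =
  | { ok: true; value: string }
  | { ok: false; code: "invalid_input" | "not_found"; message: string; hint: string };

// Hypothetical tool: look up a record by id in an in-memory store.
// Input arrives as `unknown` and is validated at the tool boundary.
function getRecord(store: { [id: string]: string }, input: unknown): ToolResult {
  if (typeof input !== "string" || input.trim() === "") {
    return {
      ok: false,
      code: "invalid_input",
      message: "id must be a non-empty string",
      hint: "Pass the record id as a plain string.",
    };
  }
  const value = store[input];
  if (value === undefined) {
    return {
      ok: false,
      code: "not_found",
      message: `no record with id ${input}`,
      hint: "Check the id against the ids that currently exist.",
    };
  }
  return { ok: true, value };
}
```

<p>The <code class="language-plaintext highlighter-rouge">hint</code> field is the instructive part: it is written for the agent that will read the failure and try again.</p>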

<p>Most of the discourse about agentic engineering still treats the agent and the codebase as separate populations: the codebase is what the agent edits, and the agent is the editor. That framing was right two years ago. It is becoming wrong. A growing class of projects is one in which the codebase is also the agent’s runtime — the project is the harness, and the harness is the project. When you build for that, the deliverables look different. You ship MCP servers, you ship subagents, you ship skills, you ship hooks, and you ship the discipline that makes them safe to compose. The agent does the work. Your code is what makes the work trustworthy.</p>

<p>I find that an unusually satisfying place to be writing software in 2026.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="claude-code" /><category term="mcp" /><category term="agentic-engineering" /><summary type="html"><![CDATA[Most of what I have shipped this year has been built using Claude Code. That part is no longer interesting on its own — half the engineering timeline is talking about it. What is more interesting is that one of the projects I have been working on is also for Claude Code to use, written by Claude Code, to extend what Claude Code can do for me. The agent built its own tools, and now the agent uses them. I want to talk about that loop, because the second time you encounter it the implications start to widen out.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Building real products alone on Cloudflare and Claude Code</title><link href="https://lord.technology/2026/04/26/building-real-products-alone-on-cloudflare-and-claude-code.html" rel="alternate" type="text/html" title="Building real products alone on Cloudflare and Claude Code" /><published>2026-04-26T09:00:00+01:00</published><updated>2026-04-26T09:00:00+01:00</updated><id>https://lord.technology/2026/04/26/building-real-products-alone-on-cloudflare-and-claude-code</id><content type="html" xml:base="https://lord.technology/2026/04/26/building-real-products-alone-on-cloudflare-and-claude-code.html"><![CDATA[<p>Six weeks ago I started building something. 
A real commercial SaaS product, not a prototype — authentication, billing, scheduled background work, document ingestion, AI generation, distribution flows, an end-to-end test suite that actually passes, and the kind of architectural scaffolding that survives a code review. I am one person. The product sits roughly where you would expect a small team to be after a quarter, and the experience of getting it there felt different in a way I do not think the industry has fully absorbed yet.</p>

<p>Two things made it possible. The first is that the Cloudflare Developer Platform has reached the point where a single developer can deploy a serious product on it without any other infrastructure. The second is that Claude Code has reached the point where it functions less like a clever autocomplete and more like a junior team you direct instead of supervise. Neither of these is news in isolation. What is news is what they feel like together.</p>

<h2 id="the-platform-stops-being-a-thing-you-fight">The platform stops being a thing you fight</h2>

<p>Most cloud platforms make you choose your battles. You pick which database, which queue, which object store, which secret store, which observability stack, which deployment tool, which frontend host, which CDN, and then you spend a non-trivial fraction of your time wiring those choices together and apologising when they leak through to the application code. Cloudflare mostly removes that work because the choices have already been made and they fit together by design.</p>

<p>The product I built runs entirely on Workers, with SvelteKit compiled through the Cloudflare adapter as the application runtime. D1 is the database. R2 holds the binary artefacts customers upload. Workers AI serves the inference, fronted by AI Gateway for logging, caching, and provider fallback. Workflows handles every long-running operation: document ingestion, generation, scheduled scans against external sources. KV holds rate-limit counters. There is no Postgres, no Redis, no S3, no SQS, no scheduled GitHub Action acting as a cron, no Sentry, no Datadog, no separate frontend host. Just bindings on <code class="language-plaintext highlighter-rouge">platform.env</code>.</p>

<p>The thing I keep coming back to is that I never had to make an integration work. The bindings are present at deploy time or the deploy fails. The IAM model is implicit because there is no IAM. Secrets are set with <code class="language-plaintext highlighter-rouge">wrangler secret</code>. Logs stream from <code class="language-plaintext highlighter-rouge">wrangler tail</code>. An AI request gets logged in AI Gateway with token counts and latency before I write a line of observability code. Production and staging are the same Worker pointed at different bindings. Local development is one command, and it runs against local implementations of the same storage layers via Miniflare. The platform behaves as if a small group of engineers spent a long time worrying about the seams so that I would not have to.</p>

<p>You feel this most acutely when you do something that would otherwise be load-bearing infrastructure. The product polls a dozen or so external sources on a daily schedule, hashes the responses, diffs against the previous hash, and triggers a child workflow whenever a source has changed. On AWS this is a Lambda, triggered by EventBridge, that writes to DynamoDB and dispatches Step Functions, with the IAM policies, the dead-letter queue, and the X-Ray tracing that go with it. On Cloudflare it is a Workflow class with a cron trigger and a <code class="language-plaintext highlighter-rouge">step.do</code> that calls the database. The retry policy is a config object on the step. The whole monitoring system is two files.</p>
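
<p>The hash-and-diff step at the centre of that job can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the product’s code: the <code class="language-plaintext highlighter-rouge">HashesBySource</code> shape and the <code class="language-plaintext highlighter-rouge">changedSources</code> name are invented for the example, and the fetching, hashing, storage, and workflow dispatch around it are left out.</p>

```typescript
// Hypothetical shape: source id mapped to the hash of its most recent response.
type HashesBySource = { [sourceId: string]: string };

// Return the ids of sources whose current hash differs from the stored one.
// A source with no stored hash counts as changed, so it is processed once.
// The result is sorted so repeated runs produce the same order.
function changedSources(previous: HashesBySource, current: HashesBySource): string[] {
  return Object.keys(current)
    .filter((id) => previous[id] !== current[id])
    .sort();
}
```

<p>Each id returned would be the trigger for one child workflow. Keeping the comparison free of I/O means it can be unit tested without any platform bindings.</p>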

<h2 id="the-constraints-are-the-design">The constraints are the design</h2>

<p>Cloudflare’s constraints are well advertised. Workers cap at 128 MB of memory. D1 caps at 10 GB per database. Workflow classes cannot be exported from a Worker that is also a SvelteKit application — they have to live in their own deployable. Each of these is occasionally inconvenient and each of them, in practice, pushes you towards the architecture you should have written anyway.</p>

<p>The 10 GB D1 ceiling means you put a tenant column on every table from day one whether you currently need it or not, because the path to scaling out is sharding by tenant and the schema has to be ready when that day arrives. The Workflow export limitation means each long-running concern lives in its own worker, cross-script bound from the main application — which sounds annoying until you notice that this is exactly the right separation of concerns for retry behaviour, scaling, and operational failure isolation. The 128 MB memory limit means you stream rather than buffer, and your processing pipelines stay composable rather than collapsing into one fat function that does everything.</p>
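
<p>One way to keep the tenant column honest is to route every read through a helper that always filters on it. The sketch below assumes a column named <code class="language-plaintext highlighter-rouge">tenant_id</code> and a hypothetical <code class="language-plaintext highlighter-rouge">tenantQuery</code> helper; neither comes from the project. Table and column names are interpolated into the SQL, so they must be trusted constants and never user input.</p>

```typescript
// Hypothetical helper: build a SELECT that is always scoped to one tenant.
// Values travel as bound parameters; only trusted identifiers are interpolated.
function tenantQuery(table: string, tenantId: string, where: { [column: string]: string }) {
  // Sort the filter columns so the generated SQL is stable between calls.
  const columns = Object.keys(where).sort();
  // The tenant filter always comes first and cannot be omitted by the caller.
  const clauses = ["tenant_id = ?", ...columns.map((c) => `${c} = ?`)];
  const params = [tenantId, ...columns.map((c) => where[c])];
  return { sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}
```

<p>The returned statement and parameters could then be handed to a D1 prepared statement. Because the tenant filter is already in every query, sharding by tenant later does not require touching the call sites.</p>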

<p>This is the observation Cloudflare keeps making about its own platform: the constraints are not arbitrary, they are a design with consequences. The platform asks you to write code that scales horizontally, hibernates when idle, and keeps state close to compute. If you do that, the architecture is correct by construction. The platform does not let you build the wrong thing very easily.</p>

<h2 id="claude-code-is-not-autocomplete-and-that-matters">Claude Code is not autocomplete, and that matters</h2>

<p>The other half of the leverage is the agent. I have used every flavour of AI coding tool over the last two years, from inline completions to chat sidebars to agent IDEs, and Claude Code is the first one where the tool/operator relationship genuinely inverted. I do not type code. I type architectural intent and review what comes back. The artefact is not a function — it is a feature shipped behind an ADR, a migration file, a database access layer, a route, a unit test, an end-to-end test, a regression check, and a commit on a feature branch.</p>

<p>The mechanism that makes this work is not the model on its own. It is the fact that Claude Code lets you build a project-specific agent, with project-specific subagents and project-specific skills and project-specific rules, and that it actually uses them. My repository carries a <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> that tells the agent what the product is, what the build commands are, what the architecture is, what is in the structured reference data that constrains generation, and what the operating rules are. There is a <code class="language-plaintext highlighter-rouge">.claude/rules/</code> directory with code style, API design, testing, security, and documentation rules that get pulled in automatically. There is a <code class="language-plaintext highlighter-rouge">.claude/skills/</code> directory with workflows for the recurring tasks: adding a route, adding a migration, planning a feature, fixing an issue, running a regression check, running a pre-deployment check. There is an agents directory with a product manager, a solution architect, a lead developer, a developer, a tester, a domain specialist, and a security auditor.</p>

<p>When I ask the agent to add a feature, the planner plans it, the solution architect writes the ADR, the developer implements it, the tester writes the tests, and the lead developer reviews the diff. Not because I am orchestrating that flow — because the harness is. The skill for a new route knows it needs to add the route, the form action, the load function, the activity log entry, and the end-to-end test, and that those have to land in the same commit. The skill for a new migration knows it needs to update the schema, the access layer, the type, and the test. The agents are not theatre. They are the way the project enforces its own discipline on the model.</p>

<p>The result is that I work more like a tech lead than a developer. I describe the change. I read the plan. I push back on the parts that are wrong. I read the diff. I run the regression check. I commit. The agent never returns a raw <code class="language-plaintext highlighter-rouge">Response</code> from a SvelteKit form action because the rules forbid it. The agent updates the e2e tests in the same commit as the UI change because the rules require it. The agent never invents reference data because the rules tell it not to and the structured store tells it where to look. None of this requires me to remember anything in the moment. The discipline is in the harness, not in my head.</p>

<h2 id="the-economics-are-quietly-absurd">The economics are quietly absurd</h2>

<p>The Cloudflare bill for the entire build through development was effectively zero. The Workers free tier covers a hundred thousand requests a day. D1’s free tier covers five gigabytes of storage and millions of row reads. R2’s free tier covers ten gigabytes. Workers AI’s free allocation is generous. AI Gateway is free. The only meaningful cost during the build was inference for the harder generation prompts, and that cost was less than my coffee budget.</p>

<p>Cloudflare’s pricing is shaped so that the marginal cost of running a real product approaches zero until you have paying customers, at which point the costs scale with revenue rather than ahead of it. That is the right shape for a solo founder. The platform is not designed to extract money from you while you are still figuring out whether the product works. It is designed to disappear until usage says you need to be charged.</p>

<p>The Claude Code subscription is a separate budget. The honest answer is that for a project of this scope it pays for itself in a week. The model writes code that I would otherwise pay a senior engineer to write, and it does it under direction, and the work product is reviewable. I do not think about the subscription cost. I think about whether the agent is producing the right thing.</p>

<h2 id="what-this-actually-changes">What this actually changes</h2>

<p>I am wary of the “AI changes everything” register. Most of the time it changes specific things, and the specific things matter, and the rhetorical inflation gets in the way. So let me be precise.</p>

<p>It has been true for a while that a solo developer could build a small SaaS product if the scope was modest, the architecture was simple, and the operational requirements were minimal. Stripe and a managed Postgres on a managed host got you a long way. What was not true was that a solo developer could build a substantive product with scheduled background processing, AI throughout, document handling, billing, distribution, end-to-end evidence trails, and the kind of architectural rigour that survives an acquirer’s due diligence. That required a team.</p>

<p>The combination of Cloudflare’s platform and Claude Code as a directed agent has moved that line. Not by an enormous amount and not for every kind of product, but for a real and recognisable class of SaaS products it has moved enough that one person can do it now, and the resulting code is not embarrassing. The artefact passes type checks and lint and the test suite. The migrations are reversible. The ADRs are written. The end-to-end tests pass. The activity log is append-only. The bindings are typed. The deployment order is enforced.</p>

<p>What I find most interesting is that this is not really about speed. The speed is impressive, but the speed is a consequence. The actual change is in the shape of the work. The boring infrastructure is gone, the platform handles the platform, the agent handles the implementation, and the human gets to think about the product. That is what people mean when they talk about leverage, and it is finally available to one person at a desk.</p>

<p>The market has not absorbed this yet. It will.</p>]]></content><author><name>Jamie Lord</name><email>jamie@lord.technology</email></author><category term="ai" /><category term="cloudflare" /><category term="claude-code" /><category term="workers" /><category term="workflows" /><category term="agentic-engineering" /><summary type="html"><![CDATA[Six weeks ago I started building something. A real commercial SaaS product, not a prototype — authentication, billing, scheduled background work, document ingestion, AI generation, distribution flows, an end-to-end test suite that actually passes, and the kind of architectural scaffolding that survives a code review. I am one person. The product sits roughly where you would expect a small team to be after a quarter, and the experience of getting it there felt different in a way I do not think the industry has fully absorbed yet.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://lord.technology/assets/images/og-image.png" /><media:content medium="image" url="https://lord.technology/assets/images/og-image.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>