“There is now ample evidence, collected over the last few years, that AI systems are unpredictable and difficult to control.” That’s Dario Amodei in January, writing about the technology his company sells.
Compare with what’s on your LinkedIn timeline this week. Here’s the script: Schema markup ensures AI engines parse your content. The first sentence of every section must be the answer. Optimize for chunk-level retrieval. There’s a 13% citation lift available if you do X, a 2.8x conversion improvement if you do Y.
It’s one of the cleanest patterns going right now, and the industry has elected not to notice. The people closest to these systems are increasingly cautious about claims of control. The people furthest from them are increasingly certain they know how the systems work, certain they’ve cracked it. That gradient runs the wrong way.
What The People Who Built It Actually Say
Anthropic published its main interpretability research post in May 2024. It opens:
“We mostly treat AI models as a black box: something goes in and a response comes out, and it’s not clear why the model gave that particular response instead of another.”
Anthropic, writing about its own model, two years ago.
Things haven’t gotten more confident since. Neel Nanda, who runs Google DeepMind’s mechanistic interpretability team, gave an interview to 80,000 Hours in September 2025 in which the headline finding was that the most ambitious version of mech interp is probably dead. He doesn’t see a realistic world where the discipline delivers “the kind of robust guarantees that some people want from interpretability.” Worth re-reading. The person whose job is to read AI minds is publicly conceding that the project, as originally conceived, won’t get there.
At NeurIPS 2024, Ilya Sutskever, co-founder of Safe Superintelligence and formerly chief scientist at OpenAI, accepted his Test of Time award and used the platform to say something the room wasn’t expecting from him:
“The more it reasons, the more unpredictable it becomes.”
Sutskever’s career is essentially the scaling hypothesis with a face on it. Hearing him say the next phase produces less predictable outputs is itself an admission.
Now scroll back to your timeline.
The gradient is Dunning-Kruger redrawn at an industry scale: Mt. Stupid with a pricing page, and the valley of calibration where the actual work happens. Image Credit: Pedro Dias
What The People Selling It Actually Say
A practitioner posts a four-pillar framework for “Technical GEO.” A consultant guarantees inclusion in AI Overviews. An agency markets a 13% lift in citation likelihood, derived from data the agency itself produced about the agency’s own prescriptions. A widely shared post promises that maintaining a 300-character paragraph limit dictates how a vector database chunks your content. A vendor claims a 78% “share of model.” A senior figure in your inbox describes a 2.8x improvement in conversion from being cited in SGE. The vocabulary is deterministic: “ensures,” “guarantees,” “dictates,” percentages precise to the decimal, frameworks confidently named. None of it sounds anything like the language the people who built these systems use when describing how the systems behave.
This is the part I keep getting stuck on. The consultants are confident about the tactics they’ve measured against themselves. Run the same playbook on a few clients, watch some metric move, call it evidence. No control groups, no pre-registered hypotheses, no measurement of what the tactic is actually claimed to change. Those are the bar a real test has to clear; everything else is confirmation in costume. The problem is the confidence level, which is wrong by an order of magnitude regardless of whether the underlying tactic does anything. The same model that Anthropic publicly says it cannot fully account for is being…