The Opus 4.7 Moment, and the Mythos Shadow
A working timeline of two parallel stories — a flagship model release that drew immediate backlash, and a "too dangerous to publish" sibling whose hype is now being audited in public.
§ 01 Why this episode is harder than it looks
The public discourse has fused three distinct critiques of Opus 4.7 into a single "Claude is getting nerfed" narrative. Pulling them apart is the whole editorial opportunity here — each has a different evidence base, a different moral weight, and a different human-implications tail.
Critique A — Quality regression ("nerfing"). Users reported Claude Code getting worse across March and April. Anthropic's April 23 post-mortem attributes this to three separate unintentional changes, not deliberate throttling.
Critique B — Adaptive Reasoning removes user control. Opus 4.7 decides for itself how hard to think about a prompt. The old manual "Extended Thinking" toggle is gone in the chat app. This is real, and it's not quite what users are describing when they say "it routes to a dumber model" — it's the same model choosing how much compute to spend on you, without showing its work.
Critique C — The Acceptable Use Classifier as gatekeeper. A separate safeguard system pre-screens prompts for cyber-misuse risk before the model ever sees them. Its false-positive rate has surged since Opus 4.7 shipped with tighter cyber-guardrails as the testbed for Mythos-class safety.
All three are happening at once. The podcast's job isn't to adjudicate — it's to help listeners see the three stories, and notice what each one reveals about living with systems that are increasingly making decisions on our behalf.
§ 02 The Timeline
[Opus] Opus 4.6 ships
The prior flagship. Establishes the baseline users will soon accuse of "silent regression." A detail that matters later: Claude Code defaults to high reasoning effort.
2026 · [Opus] Silent default change: high → medium
Anthropic lowers Claude Code's default reasoning effort to reduce UI-freeze complaints at high effort. The change is disclosed via in-product dialog but most users never notice it. This is the first of three ingredients in what users will soon experience as degradation.
2026 · [Opus] First public complaints about 4.6 quality
Scattered reports on X, Reddit, and GitHub. Hard to distinguish from normal variance. An AMD senior director's GitHub post will later crystallize the frustration: "Claude has regressed to the point it cannot be trusted to perform complex engineering."
2026 · [Opus] The thinking-cache bug ships
A caching optimization meant to run once on idle sessions instead runs every turn, silently discarding Claude's prior reasoning. Users experience it as forgetfulness, repetition, and strange tool choices. Usage limits also drain faster than expected because cache misses compound. This is the second ingredient.
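The failure mode described here — a cleanup pass intended only for idle sessions firing on every turn instead — can be sketched in a few lines. Everything below (class names, the idle threshold, the structure) is a hypothetical illustration of the described mechanism, not Anthropic's code:

```python
import time

IDLE_THRESHOLD_S = 300  # assumed: eviction was meant only for sessions idle this long

class Session:
    def __init__(self):
        self.reasoning_cache = []   # prior turns' cached reasoning
        self.last_active = time.time()

    def evict_if_idle(self):
        # Intended behavior: drop cached reasoning only for long-idle sessions.
        if time.time() - self.last_active > IDLE_THRESHOLD_S:
            self.reasoning_cache.clear()

    def evict_buggy(self):
        # The bug as described: the idle check is effectively skipped, so the
        # cache is cleared on every turn and prior reasoning is discarded.
        self.reasoning_cache.clear()

    def turn(self, prompt, buggy=False):
        (self.evict_buggy if buggy else self.evict_if_idle)()
        response = f"answer({prompt}, context_turns={len(self.reasoning_cache)})"
        self.reasoning_cache.append(response)
        self.last_active = time.time()
        return response

s = Session()
for p in ["a", "b", "c"]:
    s.turn(p)
print(len(s.reasoning_cache))  # healthy session: 3 cached turns

b = Session()
for p in ["a", "b", "c"]:
    b.turn(p, buggy=True)
print(len(b.reasoning_cache))  # buggy session: cache never grows past 1
```

The sketch also shows why limits drained faster: every turn after the first is a cache miss, so work that should have been reused is recomputed and billed again.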
2026 · [Mythos] Mythos Preview + Project Glasswing announced
Anthropic announces a new frontier model — "strikingly capable at computer security tasks" — and simultaneously announces it won't be released publicly. Instead, 12 launch partners (AWS, Apple, Google, JPMorgan, Microsoft, Nvidia, Cisco, CrowdStrike, Linux Foundation, Broadcom, Palo Alto Networks) plus ~40 additional orgs get access through Project Glasswing, with $100M in usage credits committed.
Claims: Mythos found "thousands" of zero-day vulnerabilities across every major OS and browser. Autonomously discovered and exploited a 17-year-old FreeBSD NFS vulnerability (CVE-2026-4747). 72.4% full-code-execution rate on a 250-trial exploit benchmark.
2026 · [Mythos] Schneier publishes early skepticism
Bruce Schneier's blog: "This is very much a PR play by Anthropic — and it worked." Notes that the announcement led to Treasury and White House meetings. OpenAI quickly follows with its own "too dangerous to release" cybersecurity model announcement.
2026 · [Both] Opus 4.7 launches — the convergence day
Anthropic releases Opus 4.7. The blog post explicitly positions it as a testbed for Mythos-class safety infrastructure: "Opus 4.7 is the first such model... We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models."
Anthropic publicly concedes that Opus 4.7 does not match Mythos performance. This is unusual: a flagship launch that names a stronger, withheld sibling.
Ships simultaneously: new xhigh effort level, task budgets, 1M context at standard pricing, a high-res vision ceiling (3.75MP), a new tokenizer (1–1.35× token count vs. 4.6), and a Cyber Verification Program for security researchers who need the guardrails relaxed.
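One practical consequence of the tokenizer change is easy to miss: the same text can now cost up to 1.35× as many tokens, which quietly shrinks the effective capacity of the 1M window. A back-of-envelope calculation (the 1–1.35× range is from the release notes above; the rest is arithmetic):

```python
CONTEXT_WINDOW = 1_000_000     # new 1M-token window at standard pricing
RATIO_RANGE = (1.0, 1.35)      # new tokenizer vs. 4.6, per the release notes

for ratio in RATIO_RANGE:
    # A document that took N tokens under the 4.6 tokenizer now takes
    # N * ratio tokens, so the window holds 1M / ratio old-token equivalents.
    effective = CONTEXT_WINDOW / ratio
    print(f"ratio {ratio:.2f}: window holds ~{effective:,.0f} old-token equivalents")
```

At the worst case of the stated range, the "1M context" holds roughly 741k tokens' worth of 4.6-era text — and usage limits metered in new tokens drain correspondingly faster.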
2026 · [Opus] Adaptive Reasoning becomes mandatory in chat
The manual Extended Thinking toggle is removed from the Claude web and desktop apps. The model now decides for itself whether a given prompt warrants deep reasoning. Developers in Claude Code can still set explicit effort; end users in the chat UI cannot override.
This is distinct from "model routing." Opus 4.7 is not dispatching your prompt to a weaker sibling — it's the same model choosing how many thinking tokens to spend. But the experience is similar: users feel like responses are inconsistent, and there's no visible dial to turn.
2026 · [Opus] A stray system-prompt line lands in Claude Code
A verbosity-reduction instruction ("keep text between tool calls to ≤25 words, final responses to ≤100 words unless needed") ships alongside 4.7. It affects Sonnet 4.6, Opus 4.6, and Opus 4.7. Intended to be cosmetic; actually suppresses problem-solving on hard tasks. This is the third ingredient.
2026 · [Opus] The backlash crystallizes
A Reddit post titled "Opus 4.7 is not an upgrade but a serious regression" clears 2,300 upvotes in 48 hours. An X post claiming no improvement over 4.6 gets 14,000 likes. The MRCR long-context-retrieval benchmark regression is documented publicly. Narrative hardens: Anthropic is quietly degrading the product to redirect compute to Mythos.
2026 · [Opus] Verbosity-reduction prompt reverted
v2.1.116 ships. The third of the three quality issues is resolved. Root cause is still not public.
2026 · [Mythos] The "nothingburger" counter-narrative lands
The Register publishes a deep-dive concluding Mythos may be substantially overhyped. Key findings from independent analysts:
• Mozilla CTO Bobby Holley: "We also haven't seen any bugs that couldn't have been found by an elite human researcher."
• VulnCheck researcher Patrick Garrity: maybe 40 confirmed high-severity finds, not thousands.
• The headline 72.4% exploit rate drops to 4.4% when the top two cherry-picked bugs are removed (from Anthropic's own system card, Fig 3.3.3.B).
• A Linux kernel bug Anthropic cited as proof was actually first found by Opus 4.6 — the publicly available model.
• Separately: unauthorized users accessed Mythos by guessing the URL, via data from the Mercor / LiteLLM supply-chain breach.
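The 72.4% → 4.4% collapse implies the benchmark's successes were heavily concentrated in the two removed bugs. A quick sanity check makes the concentration concrete (the 250 trials and both percentages come from the reporting above; the simplifying assumption that nearly all removed trials were successes is ours, chosen as the reading most consistent with both numbers):

```python
TRIALS = 250
HEADLINE_RATE = 0.724    # full-code-execution rate across all trials
ADJUSTED_RATE = 0.044    # rate with the top two bugs' trials removed

successes = round(TRIALS * HEADLINE_RATE)   # ≈ 181 successful trials

# Assume the removed trials were (almost) all successes, and solve for how
# many trials n the top two bugs accounted for:
#   (successes - n) / (TRIALS - n) = ADJUSTED_RATE
n = (successes - ADJUSTED_RATE * TRIALS) / (1 - ADJUSTED_RATE)
print(f"total successes: {successes}")
print(f"implied trials spent on the top two bugs: ~{n:.0f} of {TRIALS}")
```

Under that assumption, roughly 178 of the 250 trials targeted just two bugs — which is exactly the critics' point about how the headline number was assembled.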
2026 · [Both] The reckoning day
Morning: Anthropic publishes "An update on recent Claude Code quality reports." Three separate issues disclosed (the two covered above + the verbosity prompt). API was unaffected throughout. Usage limits reset for all subscribers as a good-faith gesture.
Afternoon: The Register publishes "Claude Opus 4.7 has turned into an overzealous query cop." AUP (Acceptable Use Policy) classifier false positives have surged from ~5/month to 30+ in April alone. Notable example: Golden G. Richard III, director of the LSU Cyber Center, blocked from having Claude proofread a textbook lab because it contained crypto exercises. Another user blocked because a PDF they uploaded contained stream-encoded characters that decoded to "CHARACTER OR FOR DONKEY UNDERNEATH" — a Shrek toy ad.
The Cyber Verification Program exemption works in claude.ai but reportedly doesn't propagate to the Claude Code API, leaving approved security researchers stuck.
§ 03 Three Stories, Separately
For the show, it's worth naming these distinctly. Each has a different human-implications tail. Each maps to a different futures-oriented question worth sitting with.
The "nerfing" narrative
• What users experienced: Claude Code getting measurably worse across March and April.
• What Anthropic says happened: three unrelated mistakes — a default-effort change, a caching bug, a verbosity prompt — that each affected a different slice of traffic and compounded in users' perception.
• What critics point out: the company didn't reproduce the issue internally for weeks, and the opacity between "we pushed an experiment" and "your tool got worse" is structural, not incidental.
Adaptive Reasoning as the disappearing dial
Opus 4.7 removed a user-facing control. In the chat app, you can no longer tell the model to "think harder." It decides. Anthropic argues this is better on average. Users argue their judgment about task complexity doesn't match the model's, and that workflows built on predictable reasoning depth are now unpredictable.
This isn't routing between models (the common misconception). It's the model self-allocating its own attention. That's arguably the more interesting futures question: what does it mean to interact with a system that triages you?
The AUP classifier as gatekeeper
This is the one that most resembles what's often meant by "gatekeeping" — a separate safety system that inspects prompts before the model answers, and refuses anything it flags as potential cyber-misuse. Opus 4.7 is explicitly the testbed for the stricter version of this classifier, which Anthropic plans to iterate on before eventually releasing Mythos-class models more widely.
In practice, false positives hit cybersecurity educators and researchers — exactly the people whose legitimate work most resembles the behavior being guarded against. A textbook lab. A PDF of a toy advertisement. Russian-language prompts. The exemption program exists but hasn't fully propagated to the API.
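There's a structural reason the flags keep landing on legitimate work: when genuine misuse is rare among security-flavored prompts, even an accurate classifier's flags are mostly false positives. The numbers below are illustrative assumptions, not Anthropic figures — the point is the shape of the arithmetic, not the specific values:

```python
# Illustrative assumptions (NOT Anthropic's figures):
P_MISUSE = 0.0001        # 1 in 10,000 security-flavored prompts is genuine misuse
SENSITIVITY = 0.99       # classifier catches 99% of genuine misuse
FALSE_POS_RATE = 0.001   # flags 0.1% of legitimate prompts

# Bayes: probability that a flagged prompt is actually misuse
p_flag = SENSITIVITY * P_MISUSE + FALSE_POS_RATE * (1 - P_MISUSE)
p_misuse_given_flag = SENSITIVITY * P_MISUSE / p_flag
print(f"share of flags that are real misuse: {p_misuse_given_flag:.1%}")
```

Even with these generous assumptions, only about 9% of flags are real misuse; the other ~91% are textbook labs, toy ads, and researchers. Tightening the classifier without changing the base rate mostly manufactures more of the latter.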
The Mythos hype cycle in real time
You're watching a capability claim get audited by independent researchers within two weeks of announcement. That's new. The skeptics (Devansh, Davi Ottenheimer, Mozilla's CTO, VulnCheck, Aisle's replication study) aren't saying Mythos is bad — they're saying the framing of "too dangerous to release" is doing rhetorical work that the system card itself doesn't support.
Companion to this: Mythos got leaked via URL-guessing through a contractor-staffing supply-chain breach. The "it's contained" part of "too dangerous to release but contained" got tested inside of two weeks.
"So far we've found no category or complexity of vulnerability that humans can find that this model can't. We also haven't seen any bugs that couldn't have been found by an elite human researcher." — Bobby Holley, Mozilla CTO, after Mythos found 271 flaws in Firefox 150
§ 04 Futures-Oriented Questions Worth Sitting With
These are not for resolution on the show. They're invitations — the kind of questions Modem Futura is built to hold open.
What happens to trust when we can't tell whether a system is broken, policy-changed, or just thinking differently about us today?
Three months of quality complaints, and the answer turned out to be three unrelated mistakes — but the narrative of deliberate "nerfing" was already doing its work. What does good-faith diagnosis look like when the system is opaque by design?
What gets lost when the user can no longer decide how much thinking their question deserves?
Adaptive Reasoning might be better on average. But "better on average" is not the same as "right for you." The disappearance of the Extended Thinking toggle is a small moment of autonomy evaporation that will keep happening across interfaces we use daily.
When safeguards fail predictably on legitimate work, what kind of two-tier system are we building?
A cybersecurity professor at LSU cannot proofread his own textbook without applying for an exemption that may not propagate to the tool he uses. Which professions get the friction? Which ones don't?
What's the half-life of "too dangerous to release"?
GPT-2 was too dangerous to release. Now we chuckle about it. Mythos was too dangerous to release, and two weeks later a supply-chain breach let unauthorized users guess the URL. The claim is doing work — but what work, for whom?
When independent researchers can audit a frontier model's claims inside 14 days, who sets the pace of public understanding — the lab, or the replication community?
The Mythos system card's own Figure 3.3.3.B is what undid the headline number. That's a new kind of accountability infrastructure forming in real time.
§ 05 Futures Improv seeds (for Sean, not for Andrew to see yet)
Not a finalized set — just seeds that emerge naturally from this material. Can flesh these out in a separate pass if useful:
- The dial disappears. Every AI system you use in 2030 decides for itself how much attention your request deserves. The effort level you experience is based on a profile of you. What changes about how humans ask for things?
- The two-tier AI economy. Verified cybersecurity researchers, verified medical professionals, verified journalists — everyone else gets the safety-filtered version. What professions become "credentialed" that weren't before?
- Reverse-transparency. A lab releases a model and says "it's too dangerous." Independent researchers reply with replication studies inside 72 hours. Capability claims become claims-to-be-audited by default. Who becomes the referee?
- The model audits the audit. In five years, the model running your workflow also audits the safety classifier that's gate-keeping its own outputs. What happens to the chain of accountability?
§ 06 Sources
- Anthropic — Introducing Claude Opus 4.7
- Anthropic — An update on recent Claude Code quality reports
- Anthropic — What's new in Claude Opus 4.7
- Anthropic — Project Glasswing
- Anthropic Frontier Red Team — Claude Mythos Preview (technical)
- Axios — Anthropic releases Claude Opus 4.7, concedes it trails unreleased Mythos
- CNBC — Anthropic releases Claude Opus 4.7, a less risky model than Mythos
- TechCrunch — Anthropic debuts preview of powerful new AI model Mythos
- The Register — Anthropic Mythos shaping up as nothingburger
- The Register — Claude Opus 4.7 has turned into an overzealous query cop
- Schneier on Security — On Anthropic's Mythos Preview and Project Glasswing
- Foreign Policy — Anthropic's Claude Mythos Preview Changes Cyber Calculus
- Xlork Blog — What's New and Why Developers Are Frustrated
- MindStudio — Opus 4.7 Review: What Regressed
- flyingpenguin — The Boy That Cried Mythos
- Boing Boing — Mythos accessed by guessing the URL
- Picus Security — The Glasswing Paradox