Incident Report for Anthropic

  • Do they credit your account if you were impacted? Or is it just "sorry 'bout 'dat month of trash"?

    Unfortunate timing, as I am rooting for Anthropic as the underdog, but I feel compelled to use whatever works best. Since mid-August I've demoted Claude to only putting out fires on UIs and am getting amazing results with GPT-5 for everything else. Given the nonstop capacity warnings in the Codex CLI, I might not be the only one.

  • My guess would be that they tried to save money with speculative decoding and the thresholds in their verification stage were too loose.

    As someone who has implemented this myself, I know it’s pretty easy to make innocent mistakes there. And the only symptom is a tiny distortion of the output distribution, which only really becomes visible after analysing thousands of tokens. I would assume all providers are using speculative decoding by now, because it’s the only way to get good inference speed at scale.

    As a quick recap: you train a small draft model to cheaply propose the easy tokens, like filler words, and the big model then verifies several of them in a single forward pass instead of generating them one at a time. That way an otherwise serial decoder can emit multiple tokens per invocation, easily doubling throughput (rough sketch of the verification step at the end of this comment).

    And the fact that they need lots of user tokens to verify that it works correctly would nicely explain why it took them a while to find and fix the issue.
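
    To make that verification step concrete, here is a minimal Python sketch of the standard speculative-decoding accept/reject rule. It is a toy illustration with made-up distributions and function names, not anything Anthropic has described: the exact rule accepts a drafted token with probability min(1, p_target/p_draft) and resamples from the residual distribution on rejection, and the hypothetical `slack` parameter shows how a "too loose" acceptance test would quietly skew the output distribution.

    ```python
    # Toy sketch of speculative decoding's verification step (illustrative only,
    # not any provider's real code). draft_probs/target_probs are per-position
    # categorical distributions from the small draft model and the big target model.
    import numpy as np

    rng = np.random.default_rng(0)

    def verify_drafted_tokens(draft_tokens, draft_probs, target_probs, slack=0.0):
        """Exact rule: accept token t with prob min(1, p_target(t) / p_draft(t));
        on the first rejection, resample from the residual max(0, target - draft)
        and discard everything drafted after it. A nonzero `slack` (a "loose
        threshold") over-accepts draft tokens and subtly distorts the output
        distribution. (The full algorithm also samples one bonus token from the
        target model when every draft is accepted; omitted here for brevity.)"""
        out = []
        for i, tok in enumerate(draft_tokens):
            accept_p = min(1.0, target_probs[i][tok] / draft_probs[i][tok] + slack)
            if rng.random() < accept_p:
                out.append(tok)                      # token matches the target distribution
            else:
                residual = np.clip(target_probs[i] - draft_probs[i], 0.0, None)
                residual /= residual.sum()
                out.append(int(rng.choice(len(residual), p=residual)))
                break                                # later drafted tokens are invalid
        return out

    # Toy usage: 3 drafted positions over a 5-token vocabulary.
    V, K = 5, 3
    draft_probs  = rng.dirichlet(np.ones(V), size=K)
    target_probs = rng.dirichlet(np.ones(V), size=K)
    draft_tokens = [int(rng.choice(V, p=p)) for p in draft_probs]
    print(verify_drafted_tokens(draft_tokens, draft_probs, target_probs))
    ```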

  •   "we often make changes intended to improve the efficiency and throughput of our models.." 
    https://status.anthropic.com/incidents/h26lykctfnsz

    I thought Anthropic said they never mess with their models like this? Now they do it often?

  • I think this is directly related to https://x.com/sama/status/1965110064215458055

    And I think the degradation was 100% on purpose: Claude Code got so popular that they either ran out of capacity or were losing money too fast.

    But now that people are fleeing to Codex, which improved so much in the meantime, they had to act.

  • So they aren’t saying what the bug was that caused this issue? I'd love a more detailed explanation: what could possibly cause the model degradation, apart from routing queries to the wrong model?

  • This is why it is hard to commit to a subscription or take a dependency on them if they degrade the service willy-nilly. Bait-and-switch tactics.

    In Cursor I am seeing varying delays on On-Demand Usage after exhausting my points. Some days it works well; other days it just inserts a 30-second wait on every message. What am I paying for? You never know when you buy.

  • The model providers should analyse the tone of the instructions.

    Before I finally gave up on Claude Code, I noticed that I got more aggressive towards it the more stupid it got, because I could not believe how dumb it had become.

    And I am sure I was not the only one.

  • There are loads of people who just tried Claude, left unimpressed, and moved on to something else. They will never know about this regression.

    And this bad memory might stick for a while.

  • You're absolutely right! The degraded model quality finally pushed me to stop paying for the max plan. Still on the Pro for now.

  • Anthropic can't seem to get a win lately. The Claude Code hangover, their lunch getting eaten by Chinese OSS models on the value side and by GPT-5 High on the quality side.

    I may be in a minority but I am still quite bullish on them as a company. Even with GPT-5 out they still seem to have a monopoly on taste - Claude is easily the most "human" of the frontier models. Despite lagging in features compared to ChatGPT Web, I mostly ask Claude day-to-day kinds of questions. It's good at inferring my intent and feels more like a real conversation partner. Very interested to see their next release.

  • My biggest concern now is this: if the issue they have is as vague as “reports of degraded quality”, how do they even approach fixing it? And what measurable criteria will they use to declare it fixed? Would they take a vibes-check opinion poll?

    Curious why they can’t run some benchmarks against the model itself (if they suspect the issue is with the model) or some agentic coding benchmarks against Claude Code (if the issue might be with the scaffolding, prompts, etc.).

  • Here’s a report: Claude Code (the software) is getting worse by the day.

    Removing the displayed token consumption rates (which let you see when tokens were actually being sent/received!), sometimes hiding the compaction percentage, the incredible lag on ESC interruption in long-running sessions, the now-broken clearing of the context window content on Task tool usage…

    Who the fuck is working on this software and do they actually use it themselves?

    Maybe the quality of Claude Code on any given day is indicative of whether their models are degraded …

  • What kind of incident report is this? “It’s a bug, we fixed it!” - Anthropic

  • How do people even identify degraded output when the output for the same input can change so much each time it's submitted?

  • Anthropic only has _one_ product that people want: Claude Code. Everything else about their offerings sucks compared to the competition:

    - shitty voice to text (why not just use Whisper at this point?)

    - clunky website

    - no image/video generation models

    - DeepResearch sucks big time

    - "Extended Thinking" doesn't seem to do much "thinking". I get the same results without it.

    - API too expensive for what it is.

    - No open-weight model to boost their reputation. Literally every other player has released an open model at this point.

  • One of the many reasons why any advice du jour on "just use this methodology to make agentic coding produce amazing results" is utter crap.

  • This RCA is too vague: ‘a bug’

    I want to know how I could have been impacted.

  • One man’s bug is another man’s load balancing experiment.

  • Opus has been utter garbage for the last month or so.

  • >Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

    Sure. I give it a few hours until the prolific promoters start to parrot this apologia.

    Don't forget: the black box nature of these hosted services means there's no way to audit for changes to quantization and model re-routing, nor any way to tell what you're actually getting during these "demand" periods.