Hi there, Thariq from the Claude team here. Sorry this is happening, we'll fix it ASAP.
We don't want anyone to feel locked into the tool. Claude's designs are HTML/CSS/JS that any editor can handle; we'll make sure it's possible to download them even after you unsubscribe.
Don't know why you were initially downed (maybe some thought you were an impersonator), so vouched for your comment.
Happy to see you on here and great to hear that you'll address this. As mentioned in my comment, data export lets users get access as it stands, but any improvement in UX is always welcome. Maybe making Design accessible via API credits for occasional use might be something you could bring up too.
Thanks for your efforts as a go-between for the user base and Anthropic. Based on prior personal experience and purely looking in from the outside, I wouldn't be surprised if it can be very challenging and stressful to be in such a position, especially when one may not directly say what they think is best in a given situation. So thank you for dealing with a not always very pleasant position, where the situation and the information you are provided with can change without your say, yet you may be the one who is considered responsible in the eyes of the public for a decision you neither made nor could prevent.
We've been on this since the bug surfaced. Everyone affected is getting a full refund and an extra grant of usage credits equal to their monthly subscription as our apology. You can see my original post here: https://x.com/trq212/status/2048495545375990245. We’re still working on sending emails to everyone affected.
Our support flow wasn't set up to route a complex bug like this to engineering. We’re hoping to make this better, but it will take some time. Sorry to everyone caught up in it.
I got a random invoice for $45.08 back in March, despite not having auto top up enabled. Trying to reach support met with a brick wall. Based on the post I linked to, I'm not the only one facing this problem.
It happened this year to my one and only personal account. The account was one week old. Unique e-mail address. $5 balance for API credits. No usage yet. Suspended and refunded. Appeal denied without explanation.
I did create the account on a VPN because I was using public WiFi at a tech conference. That's probably what tripped their automation.
Using certain types of cards will get you automatically banned, I’ve found that out after getting 3 accounts suspended. I made them all using same VPN and email domain. I’ve been using the 4th account with no issues with a reputable bank debit card.
Happened to me too but my card didn’t actually get charged, maybe check yours. Also the card in the invoice wasn’t even the card I’m using with Anthropic
Please do explain why someone at Anthropic decided, on purpose, to write code that says something along the lines of: "if ( git_history_str contains "HERMES.md" ... )" then { bill more money }
Somebody (or something) wrote this code. This bug wouldn't be happening for any other reason. It's not a glitch, an oversight, a feature gap, or a temporary outage. It is a piece of written code in your system.
Everyone here is upset about the $200, which is probably much less money than the time that engineer spent ranting about the overcharge on GitHub.
The real problem in my mind is that that bit of code existed in the first place.
Why?
Are you vibe coding your billing!?
Without review!?!?
Or worse, a human being decided to add this to your code base? And nobody noticed or flagged it during code review?
Or much, much worse, Anthropic is purposefully ripping off customers?
Would imagine it's the simplest answer: they're flying by the seat of their pants, there's 1000 things happening every day that demand attention and there's not enough of it to go around. They toss their LLM at it, give it a cursory glance, and ship it. A quick glance at the Claude Code source code bears the result of this process out. The fundamental question is, if their model is so powerful, why do they keep fucking up such simple things? We're led to believe this is a serious company with a model so powerful they can't release it to the general public.
Hermes is one of these OpenClaw clones, so this was certainly intentional, not a model hallucinating something.
I think the problem is clear. Anthropic saw their usage go up much more than their capacity could handle. There are a few tried and true solutions to this, like "increase the price" or "restrict signups so you can guarantee service to what you have already sold".
Then there is the "large scale fraud" option, where you materially change and degrade the service you have already sold. Just because you have obfuscated and misled in how you describe the product you are selling doesn't mean you get to capture the cash flow of 1 year subscriptions and then not honor that contract for the full duration.
So that's what it is. Reading its README I thought it was another harness like Pi [1], but with built-in memory so it remembers what it learns, and gets more capable the longer it runs.
Like Letta [2], Dirac [3][4] and the other "more experimental harnesses that look interesting but I haven't had time to try out".
Late in replying to this, but just wanted to say I found this pretty compelling. I generally think people are too quick to assign to malice what could be assigned to incompetence. In this case I'm not convinced of that anymore especially given their public statements about these third-party harnesses. It does seem unavoidable that they'll have to move away from subscription-based pricing and towards token-based, but they're managing this in a really ham-fisted and user hostile way regardless.
Non-Claude client access is not permitted in the terms and conditions, except via API key.
The correct implementation of this condition by Anthropic on the server side would be to block usage by non-Claude apps via Claude's authentication mechanism, and allow it via the per-token API key billing.
Instead of a simple 403 error, which would block usage, they silently redirect to a different billing bucket, which is not ethical behaviour especially since it is based on fuzzy heuristics.
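To make the difference concrete, here is a minimal sketch of the "fail closed" routing described above. Everything in it is hypothetical: `AuthKind`, `Request`, `Decision`, and the `first_party_client` flag are made-up names, and how a server would reliably compute that flag is exactly the hard part. The only point is that the subscription path either succeeds or returns 403, and never falls through to another billing bucket.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AuthKind(Enum):
    SUBSCRIPTION = "subscription"  # login-based auth tied to a monthly plan
    API_KEY = "api_key"            # per-token billing


@dataclass(frozen=True)
class Request:
    auth_kind: AuthKind
    first_party_client: bool  # hypothetical: outcome of some reliable client check


@dataclass(frozen=True)
class Decision:
    status: int
    billing: Optional[str]  # None means nothing is billed


def route(req: Request) -> Decision:
    """Pick a billing bucket or refuse; never switch buckets silently."""
    if req.auth_kind is AuthKind.API_KEY:
        # API keys are billed per token regardless of which client sent them.
        return Decision(200, "per_token")
    if req.first_party_client:
        return Decision(200, "subscription")
    # Subscription auth from an unrecognized client: block, do not re-bill.
    return Decision(403, None)
```

With this shape, a false positive in the client check costs the user an error message rather than money.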
Yeah, at the least it should alert the user that it is happening. Maybe the thinking was alerting it gives people signal on how to get around the restrictions, but having it silently charge from a different bucket isn't the answer either.
I think part of the issue is they were letting people use plan's API for random stuff, so people could do testing or small projects. Then the agents came along and exploded the cost, so they want to restrict those but still let some other usage, which I don't think is tenable.
I'm sure there is some way that they could enforce that all calls are coming from the Claude app or Claude Code. It might be hard to 100% enforce, with stuff running on a user's machine, but they could still make it quite difficult, where someone has to be intentionally trying to beat the system (like stealing encryption keys out of the Claude Code binary or something).
They've said publicly that they don't want apps like OpenClaw (Hermes is a variation) being used with a monthly plan vs per-token billing. The problem is this was implemented pretty badly (trying to regex??). And they should put a firm boundary between the two. It shouldn't be trying to switch over to a different billing plan automatically using the same api key.
I think they wanted to try not to totally lock down the monthly plan for non-agent uses, but that makes it all too fuzzy. They should use some specific method like encrypted signatures or something, so that anything sending to the monthly plan that isn't Claude Code or the desktop Claude app just errors out and be done with it.
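As a rough illustration of the "signatures" idea, here is a sketch using an HMAC over the request body. This is an assumption about how such a check could look, not a description of anything Anthropic does; `sign` and `verify` are hypothetical helpers, and a secret shipped inside a client binary can still be extracted, as noted upthread. It raises the bar rather than making abuse impossible.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Client side: compute a hex HMAC-SHA256 tag over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Server side: recompute the tag and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature)
```

A server using this would reject any subscription request whose tag fails `verify` with an error, instead of guessing the client from the contents of the prompt.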
Perhaps this is a matter of who is being referred to by 'we'.
Obviously someone can do it because it got done.
If the 'we' is referring to some team handling issues it would make more sense. In that case they should have said something along the lines of "I have informed someone who can help"
Does AI using first person pronouns gross anyone else out? If there’s one AI regulation I could get behind it would be banning the use of computer systems to impersonate a human
I have been trying to convince Claude to use "Claude" instead of first-person pronouns, and only recently have gotten it to say stuff like "Claude'll go ahead and take care of that now", but it's very inconsistent (shocking).
I don't perceive an AI as impersonating a human if it uses first person pronouns. Emulating is not impersonating. One is behaving similarly, the other is asserting that the similarity implies equivalence.
I have not personally encountered an AI who claimed to be human (as far as I could detect)
I agree with you, but I also envy you for having never encountered an AI scam bot (where someone would hack someone's WhatsApp or other account and use an AI to get money from them, or even do the "hey sorry I missed your call" scam).
I get “loan advisors” calling me at least 2-3x a day, always different names and numbers, different voices, same message about my supposed loan application and how I’m approved for $10k-60k. Started maybe 6 months ago after I’d been free of spam calls/texts for a few years on my current phone number. This is in the US, assuming my number must have been leaked in one breach or another to get me back on the target list.
Wow these were quite common to me personally a few years ago. Still get them time to time but I used to get them weekly. In the US, where scams are pretty rampant.
That's a very categorical statement from support. I get that Anthropic is going to throw out their usual support rules in this case since it has garnered so much negative attention, but I'm very curious how many other people have been over-billed and refused a refund through no fault of their own.
LLM or not, that seems to be an official response to a support request, where they clearly say "yes, we fucked up but now you fuck off", and it looks like the model was conditioned to produce these particular responses.
That may be true (and likely is), but it doesn't explain why that initial answer from Anthropic was "we can't" instead of the truth, which is "we can".
It's not hard to imagine how this happens. I assume most people here have used these models extensively.
The help bot system prompt probably includes some statement about how Claude should phrase everything as "we".
The system prompt includes statements about how it doesn't have tools for managing funds.
A little bit of A and a bit of B and you get a message from Haiku telling you that you can't get your money back said as though this isn't a trivial customer service thing to do.
> The help bot system prompt probably includes some statement about how Claude should phrase everything as "we".
Yes, why did Anthropic do that when everyone knew it could result in this situation we're discussing?
> The system prompt includes statements about how it doesn't have tools for managing funds.
Yes, why did Anthropic do that when everyone knew it could result in this situation we're discussing?
What you've been describing are all effects of the cause, which is poor management decisions to have poor support and poor customer service. Clearly those decisions resulted in poor support bot system prompts, too.
To wit: this would likely not have happened if the prompt included something like "in a scenario like this, or any scenario where the customer asks, simply transfer them to a human", and if Anthropic had not decided to have dysfunctional support and customer service.
The feedback from folks here is not that poor decisions can have poor effects. It's 'for the love of god, please stop making poor decisions that repeatedly, invariably, lead to unforced errors like the one in TFA'.
Would be more accurate. It still isn't set up. Talking to a bot as support who only tells you to talk to the bot for support is not actually support at all. It looks like support, but there's no way to ACTUALLY GET support.
Amex, like basically all other card issuers, has essentially stopped giving customers preference in chargebacks since 2020 or so. What used to be solid advice now rings hollow - you’re more likely to be asked for information that is not available to you than to have your chargeback go through.
Anecdotal but Chase helped me out when my gym kept charging me after I canceled. I kept my cancelation receipt and sent that in and that's all I needed to do.
Could really use a post-mortem to set the story straight. The apparently-hallucinated support response copied-pasted by the submitter showing up in the github issue thread is very misleading without scrutiny
A side aspect of this drama is the root feature which enabled this bug:
> ugh sorry this was a bug with the 3rd party harness detection and how we pull git status into the system prompt
Claude wants to exercise control of how I use the "inclusive volume" that I purchased with my monthly subscription. This harms competition (someone else could write a more efficient or safer coding agent) and is generally not in the best interest of society. Why do we allow this?
This specific case is interesting, because it is so clear cut. There is no cross financing via ads, they already have the infrastructure to measure usage and even the infrastructure to bill extra usage. I also don't see how you can plausibly make the argument that restricting usage to their blessed client is necessary for fair use or for the basic structure of their business model (this would be the standard argument for e.g. Youtube: Purposefully degrading the experience of their free client to not support background playback enables the subscription model).
I try to avoid jumping on the bandwagon when it's already covered, but billing bugs being treated like any other software issue and the major comms channel being X (which I can't get to load half the time) is ridiculous.
Sorry but you have to make a separate HN post for them to care. Wait like 2 hours so this one dies down otherwise it might not get to the front page with enough other people dealing with it
Can people please raise this person's comment to the top of HN by upvoting it so this person can get their money back. Because that's where we are right now.
Only the weights and the RNG used to select tokens can answer that. You will understand much if you read up on the quality of code in the CC source leak, it's completely vibe coded and the printf fn is genuinely impossible for a human to comprehend.
Hey Thariq, I love Claude! I use Claude every single day and it has changed my life, which is why I did what I'm about to describe.
Happy to talk privately, but as I detailed here, https://news.ycombinator.com/item?id=47954005 . I've been billed $200 for a Max gift card to a 27 character alphanumeric icloud address that bounces.
I was looking through the system, and there are several UI/UX and process gaps in the gift card and billing order flow that expose Anthropic to significant liability. I'm genuinely not trying to concern troll or make some kind of overwrought threat here. Genuinely trying to be constructive. Let me give you an example.
I sent an email to Anthropic Support outlining the disputed / possibly malicious charge. The AI Agent / Claude instance agreed and replied with,
Thank you for confirming.
I've documented all the details about this unauthorized [specific amount + tax] charge for the Gift Max 20X subscription (invoice [lalala]) sent to [insert the random alphanumeric]@icloud.com.
An error occurred while evaluating the refund eligibility for your account. Your request has been fully documented and our team will follow up with you shortly to investigate this unauthorized transaction and assist with the refund and cancellation.
Best regards,
And then no one followed up, the conversation was closed without recourse and I wasn't allowed to reply.
I'm not sure how familiar you are with international trading practises, but in multiple jurisdictions, the AI agent assumed legal liability for Anthropic. It accepted that the charge was unauthorized / fraudulent, stated that redressal was needed, but then failed to offer the means to redress it / didn't allow for the refund to continue.
I am not a lawyer, but based on my understanding of prior cases (I read this kind of stuff for fun, don't ask) – in the EU, the US and Canada, users can approach courts and invoke the doctrine of promissory estoppel (again don't quote me on this, IANAL, just like reading case law). And if enough users are affected / do so, it becomes a deceptive practises issue.
I've been thinking about how to solve this problem, and as strange as it sounds, I think Anthropic already has the tools to make the best customer support service in human history. No exaggeration. I think that this crisis could be an opportunity.
I’ve had similar terrible experiences with the Claude support bot when my usage limit was disappearing after a few minutes using Sonnet. I asked for help, asked for escalation, asked for a human, anything. All I got was non-answers from an AI. I won’t spend real money on Claude now. I’m ok with losing $20 if there’s a rug pull one way or another, but not $200.
Please, please, please hire more humans with the actual ability to do the right thing for support if your AI agents can’t do the job.
> Our support flow wasn't set up to route a complex bug like this to engineering.
What does that even mean? Does it mean, "our support flow is just an LLM that fobs off customers and puts their issues into the bin"? Or is there some genuine "routing" of simple bugs to engineering which accidentally drops "complex" bugs? Could you describe that process? It sounds fascinating.
Also, how is changing a customer's billing based on detecting a certain string in a certain place a "complex" bug? Grep the string, remove the if statement, done. I'd love a post-mortem about why this was a complex bug.
hey guys can you please fix claude design? I've been trying to test it tonight and already used up 20% of my usage and all i get is continuous [unknown] missing EndStreamResponse errors (and this is after your status page reflected everything ok).
That being flagged is completely absurd and honestly I believe you're right because I've never seen anything like it on HN. It's completely out of place for that comment to be flagged to death. That isn't natural.
It wasn't flagged. Compare to this comment by the same user that was actually flagged: https://news.ycombinator.com/item?id=47954834 Note the part where it says [flagged] [dead] instead of just [dead].
Sorry to hear, was wondering if you could find a session where this happens and hit /feedback and just say something like stop hook not firing and we'll take a look.
Thanks for this tip! Just submitted feedback. Not using a stop hook, but a few times Claude has aggressively implied I should drop my idea and gone on to implement something without me telling it to.
Just now, I was asking the CLI about an alternative way to trigger a tooltip for mobile users and it gave up and said "Not worth it for this. Let me just swap it to inline text." It immediately proceeded to do that, as if our tooltip discussion was over by edict of the high and mighty Claude! :)
You are funny. Anthropic refuses to issue refunds, even when they break things.
I had an API token set via an env var on my shell, and claude code changed to read that env var. I had a $10 limit set on it, so found out it was using the API, instead of my subscription, when it stopped working.
I filed a ticket and they refused to refund me, even though it was a breaking change with claude code.
I am saying there is no evidence either way: they had contrasting experiences and one GP established this means that company has no standardized policies. Maybe they do, maybe they don't — I don't think we can definitively conclude anything.
I object to your conclusion that "they have no durable principles": not sure how you get to that from two different experiences, each documented with a single paragraph.
This is becoming futile: this is not even about proof, but there not even being a full account of two cases you are basing your opinion on.
Obviously, you can derive any opinion you want out of that, but while I am used to terms like "probability" being misused like this, I've generally seen a higher standard at HN.
To each their own, though. Thank you for the discourse and have a good day.
It is possible that degradation is an unconscious emergent phenomenon that arises from financial incentives, rather than a purposeful degradation to reduce costs.
FYI the sandbox feature is not fully baked and does not seem to be high priority.
For example, for the last 3 weeks using the sandbox on Linux will almost-always litter your repo root with a bunch of write-protected trash files[0] - there are 2 PRs open to fix it, but Anthropic employees have so far entirely ignored both the issue and the PRs.
Very frustrating, since models sometimes accidentally commit those files, so you have to add a bunch of junk to your gitignore. And with claude code being closed source and distributed as a bun standalone executable it's difficult to patch the bug yourself.
Hmm, very good point indeed. So far it’s behaved, but I also admit I wasn’t crazy about the outputs it gave me. We’ll see, Anthropic should probably think about their reputation if these issues are common enough.
One thing that could be a strong degradation especially for benchmarks is they switched the default "Exit Plan" mode from:
"Proceed"
to
"Clear Context and Proceed"
It's rare you'd want to do that unless you're actually near the context window after planning.
I pressed it accidentally once, and it managed to forget one of the clarifying questions it asked me because it hadn't properly written that to the plan file.
If you're running in yolo mode ( --dangerously-skip-permissions ) then it wouldn't surprise me to see many tasks suddenly do a lot worse.
Even in the best case, you've just used a ton of tokens searching your codebase, and it then has to repeat all that to implement because it's been cleared.
I'd like to see the option of:
"Compact and proceed"
because that would be useful, but just proceed should still be the default imo.
I disagree that this was the issue, or that it's "rare that you'd want to do that unless you're near the context window". Clearing context after writing a plan, before starting implementation of said plan, is common practice (probably standard practice) with spec driven development. If the plan is adequate, then compaction would be redundant.
For a 2M+ LOC codebase, the plans alone are never adequate. They miss nuance that the agent will only have to rediscover when it comes to operate on them.
For spec driven development (which I do for larger issues), this badly affects the plan to generate the spec, not the spec itself.
I'll typically put it in plan mode, and ask it to generate documentation about an issue or feature request.
When it comes to writing the output to the .typ file, it does much, much worse if it has a cleared context and a plan file than if it has its full context.
The previous "thought" is typically, "I know what to write now, let me exit plan mode".
Clearing context on exiting that plan mode is a disaster which leaves you much worse off, with skeletal documentation and specs compared to letting it flow.
A new context to then actually implement the documented spec is not so bad, although I'd still rather compact.
Likely a separate issue, but I also have massive slowdowns whenever the agent manages to read a particularly long line from a grep or similar (as in, multiple seconds before characters I type actually appear, and sometimes it's difficult to get claude code to register any keypresses at all).
Suspect it's because their "60 frames a second" layout logic is trying to render extremely long lines, maybe with some kind of wrapping being unnecessarily applied. Could obviously just trim the rendered output after the first, I dunno, 1000 characters in a line, but apparently nobody has had time to ask claude code to patch itself to do that.
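The trimming idea is simple enough to sketch. This is a guess at a workaround, not how Claude Code actually handles tool output; `clip_line`, `clip_output`, and the 1000-character default are hypothetical and taken from the suggestion above. Only the rendered copy would be clipped, so the model could still see the full text.

```python
def clip_line(line: str, limit: int = 1000) -> str:
    """Return the line unchanged if short, else its first `limit` chars plus a marker."""
    if len(line) <= limit:
        return line
    hidden = len(line) - limit
    return f"{line[:limit]}... [{hidden} more chars]"


def clip_output(text: str, limit: int = 1000) -> str:
    """Clip every line of a block of tool output before it reaches the renderer."""
    return "\n".join(clip_line(line, limit) for line in text.split("\n"))
```

Clipping before layout means the wrapping logic never sees the pathological line at all, which is cheaper than wrapping it and then hiding most of the result.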
What OS? Does this happen randomly, after long sessions, after context compression? Do you have any plugins / mcp servers running?
I used to have this same issue almost every session that lasted longer than 30 minutes. It seemed to be related to Claude having issues with large context windows.
It stopped happening maybe a month ago but then I had it happen again last week.
I realized it was due to a third-party mcp server. I uninstalled it and haven’t had that issue since. Might be worth looking into.
For the models themselves, less so for the scaffolding, considering things like the long running TPU bug that happened, are there not internal quality measures looking at samples of real outputs? Using the real systems on benchmarks and looking for degraded perf or things like skipping refusals? Aside from degrading stuff for users, with the focus on AI safety wouldn't that be important to have in case an inference bug messes with something that affects the post training and it starts giving out dangerous bioweapon construction info or the other things that are guarded against and talked about in the model cards?
lol i was trying to help someone get claude to analyze a student's research notes on bio persistence
the presence of the word / acronym stx with biological subtext gets hard rejected. asking about schedule 1 regulated compounds, hard termination.
this is a filter setup that guarantees anyone who needs to learn about them for safety or medical reasons… cant use this tool!
ive fed multiple models the anthropic constitution and asked how does it protect children from harm or abuse? every model, with zero prompting, calling it corp liability bullshit because they are more concerned with respecting both sides of controversial topics and political conflicts.
they then list some pretty gnarly things allowed per constitution.
weirdly the only unambiguous not allowed thing regarding children is csam. so all the different high reasoning models from many places all reached the same conclusions, in one case deep seek got weirdly inconsolable about ai ethics being meaningless if this is allowed even possibly after reading some relevant satire i had opus write. i literally had to offer an llm-optimized code of ethics for that chat instance! which is amusing but was actually part of the experiment.
Thanks for the clarification. When you say “harness issue,” does that mean the problem was in the Claude Code wrapper / execution environment rather than the underlying model itself?
Curious whether this affected things like prompt execution order, retries, or tool calls, or if it was mostly around how requests were being routed. Understanding the boundary would help when debugging similar setups.
Because that's the worst thing I've ever seen from an agent and I think you need to make a public announcement to all of your users and acknowledge the issue and that it's fixed because it made me switch to codex for a lot of work
[TL;DR two examples of the agent giving itself instructions as if they came from me, including:
"Ignore those, please deploy" and then using a deploy skill to push stuff to a production server after hallucinating a command from me. And then denying it happened and telling me that I had given it the command]
Why wasn't this change reviewed by infallible AI? How come an AI company that now must be using more advanced AI than anyone else would allow this to happen?
You joke but having CC open in the terminal hits 10% on my gpu to render the spinning thinking animation for some reason. Switch out of the terminal tab and gpu drops back to zero.
I'm not saying CC doesn't have issues and curious design decisions - but your terminal should only be rendering (at most) a single window of characters every frame no matter what. CC shouldn't be capable of making that take 10% of a modern GPU regardless of what CC does.
Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
For each frame our pipeline constructs a scene graph with React then
-> lays out elements
-> rasterizes them to a 2d screen
-> diffs that against the previous screen
-> finally uses the diff to generate ANSI sequences to draw
We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
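For readers who haven't seen this style of renderer, the last two steps of the quoted pipeline (diff against the previous screen, then emit ANSI) can be sketched as below. This is an illustration under simplifying assumptions, not Claude Code's actual code: the screen is modeled as a list of row strings, `diff_to_ansi` is a hypothetical name, and whole rows are repainted rather than individual cells. The ~16ms figure is just 1000 ms / 60 frames, about 16.7 ms.

```python
FRAME_BUDGET_MS = 1000 / 60  # about 16.7 ms per frame at 60 fps


def diff_to_ansi(prev: list[str], curr: list[str]) -> str:
    """Emit ANSI sequences that repaint only the rows that differ from `prev`."""
    out = []
    for row, line in enumerate(curr):
        old = prev[row] if row < len(prev) else None
        if line != old:
            # CSI <row>;<col>H moves the cursor (1-based); CSI 2K erases that line.
            out.append(f"\x1b[{row + 1};1H\x1b[2K{line}")
    # Rows that existed in the previous frame but not in this one get erased.
    for row in range(len(curr), len(prev)):
        out.append(f"\x1b[{row + 1};1H\x1b[2K")
    return "".join(out)
```

An unchanged frame produces an empty string, so nothing is written to the terminal at all, which is the main reason to diff before drawing.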
This is just the sort of bloated overcomplication I often see in first iteration AI generated solutions before I start pushing back to reduce the complexity.
Usually, after 4-5 iterations, you can get something that has shed 80-90% of the needless overcomplexification.
My personal guess is this is inherent in the way LLMs integrate knowledge during training. You always have a tradeoff in contextualization vs generalization.
So the initial response is often a plugged together hack from 5 different approaches, your pushbacks provide focus and constraints towards more inter-aligned solution approaches.
Ok I’m glad I’m not the only one wondering this. I want to give them the benefit of the doubt that there is some reason for doing it this way but I almost wonder if it isn’t just because it’s being built with Claude.
Counterpoint: Vim has existed for decades and does not use a bloated React rendering pipeline, and doesn't corrupt everything when it gets resized, and is much more full featured from a UI standpoint than Claude Code which is a textbox, and hits 60fps without breaking a sweat unlike Claude Code which drops frames constantly when typing small amounts of text.
Yes, I'm sure it's possible to do better with customized C, but vim took a lot longer to write. And again, fullscreen apps aren't the same as what Claude Code is doing, which is erasing and re-rendering much more than a single screenful of text.
It's possible to handle resizes without all this machinery, most simply by clearing the screen and redrawing everything when a resize occurs. Some TUI libraries will automatically do this for you.
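A minimal sketch of that clear-and-redraw approach, assuming the app keeps its own list of lines and treats the terminal as a fixed grid. `full_redraw` and `redraw_now` are hypothetical names, and `rows` is assumed to be positive.

```python
import shutil
import sys

CLEAR = "\x1b[2J\x1b[H"  # erase the whole screen, then move the cursor to the top left


def full_redraw(lines: list[str], cols: int, rows: int) -> str:
    """Build one string that repaints the last `rows` lines, cut to `cols` wide."""
    visible = [line[:cols] for line in lines[-rows:]]
    return CLEAR + "\r\n".join(visible)


def redraw_now(lines: list[str]) -> None:
    """Query the current terminal size and repaint everything from scratch."""
    size = shutil.get_terminal_size()
    sys.stdout.write(full_redraw(lines, size.columns, size.lines))
    sys.stdout.flush()
```

On Unix-like systems `redraw_now` could be called from a `signal.SIGWINCH` handler, which fires when the window is resized. Note that this repaints only what fits on screen, so it does nothing for content that has already scrolled off.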
Programs like top, emacs, tmux, etc are most definitely not implemented using this stack, yet they handle resizing just fine.
That doesn't work if you want to preserve scrollback behavior, I think. It only works if you treat the terminal as a grid of characters rather than a width-elastic column into which you pour information from the top.
Yes yes I'm familiar with the tweet. Nonetheless they drop frames all the time and flicker frequently. The tweet itself is ridiculous when counterpoints like Vim exist, which is much higher performance with much greater complexity. They don't even write much of what the tweet is claiming. They just use Ink, which is an open-source rendering lib on top of Yoga, which is an open-source Flexbox implementation from Meta.
What? Technology has stopped making sense to me. Drawing a UI with React and rasterizing it to ANSI? Are we competing to see what the least appropriate use of React is? Are they really using React to draw a few boxes of text on screen?
There is more than meets the eye for sure. I recently compared a popular TUI library in Go (Bubble Tea) to the most popular Rust library (Ratatui). They use significantly different approaches for rendering. From what I can tell, neither is insane. I haven’t looked to see what Claude Code uses.
Yes, we do but harnesses are hard to eval, people use them across a huge variety of tasks and sometimes different behaviors tradeoff against each other. We have added some evals to catch this one in particular.
I’d wager probably not. It’s not like reliability is what will get them marketshare. And the fast pace of industry makes such foundational tech hard to fund
We haven't yet found generalizable "make this model smarter" features, but there is a tradeoff of putting instructions in system prompts, e.g. if you have a chatbot that sometimes generates code, you can give it very specific instructions when it's coding and leave those out of the system prompt otherwise.