Hacker News | rudedogg's comments

> This is more like if Google took action against Thunderbird and open-source email clients

No, because in those cases you're still a user of gmail. When you tell people your email address, or send people email, and it contains "@gmail.com", you're still implicitly advertising for Google. From Google's perspective that's still worth the few KB per day of bandwidth and 1GB storage (which the vast majority of people never use the entirety of, anyway) they're giving away.

But when you use gmail accounts as file storage, you're both a higher-cost user and also doing nothing to further Google's ecosystem (since the email address itself is probably not being used for genuine messaging at all).


And here, you're still using Claude Opus, and when people ask what you used, you'd say the OpenCode client with Claude (like the Thunderbird client with Gmail).

As analogies go it's pretty close.


There is nothing about Claude Code that prevents you from using it for non-coding use cases. Nothing that happens in OpenCode, or any harness for that matter, is hidden from Anthropic. Nor does OpenCode enable some nefarious use case that Claude Code does not.

The difference is not like the difference between Gmail and GmailFS, as you seem to be suggesting. A more accurate comparison would be curl or HTTPie vs. Postman.


It's not analogous at all because Google intentionally provided interfaces for those clients and even instructions for using them.

An analogous situation would be if someone reverse engineered the Google Maps API and provided their own app that showed maps using the Google Maps data.


And if Google Maps charged per tile viewed, so the user pays the same amount regardless of which maps client they used, would your opinion hold?

I get that it’s a ToS violation, but I’m saying it shouldn’t be. They’re trying to make the harness the moat because they all have no moat.


> And if Google Maps charged per tile viewed, so the user pays the same amount regardless of which maps client they used, would your opinion hold?

Yes. Why wouldn't it hold?

Anthropic has a pay-per-token API. You can use OpenCode with it.

Maybe my consistency comes from having worked with contracts and agreements in the real world, where the end user doesn't get to pick and choose which terms they want to abide by.

When you sign up to use a service, you're not signing up to use it however you would like, on your own terms. You're paying for a service that they offer. They are not obligated to continue offering it to you if you try to use it a different way.


Anthropic has no issue with OpenCode using Anthropic's API, which does charge per token.

Google explicitly allows third party email clients to work with Gmail, so no that hypothetical does not apply to this situation at all.

My point is that model providers are just a compute service, and should have no say in what sends or displays the data. Especially when they bill only on the quantity of data.

They are basically a utility.


They have an API for exactly that. You can use it.

They offer a separate plan with discounts for use with their tools. You can choose to take advantage of those discounts with the monthly fee, within the domain where that applies. You cannot, however, demand that the discount apply to anything you want.

You can argue about what you want it to be all day long, but when you go to the subscription page and choose what to purchase it's very clear what you're getting.

> They are basically a utility

Utilities like my electric company also have different plans for different uses. I cannot, for example, sign up for a residential plan and then try to connect it to my commercial business, even though I'm consuming power from them either way.

Utilities do not work like that. They do have contractual agreements about how you can use the resources provided.


> And on top of it, if you develop for native macOS, there's no official tooling for visual verification. It's like 95% of development is web and LLM providers care only about that.

I think this is built into the latest Xcode, IIRC.


I think the real divide is over quality and standards.

We all have different thresholds for what is acceptable, and our roles as engineers typically reflect that preference. I can grind on a single piece of code for hours, iterating over and over until I like the way it works, the parameter names, etc.

Other people do not see the value in that whatsoever, and something that works is good enough. We both are valuable in different ways.

Also, there's the pace of advancement of the models. Many people formed their opinions last year, and the landscape has changed a lot. There's also some effort required in honing your skill with them. The "default" output is average quality, but with some coaxing, higher-quality output is easily attained.

I’m happy people are skeptical though, there are a lot of things that do require deep thought, connecting ideas in new ways, etc., and LLMs aren’t good at that in my experience.


> I think the real divide is over quality and standards.

I think there are multiple dimensions that people fall on regarding the issue and it's leading to a divide based on where everyone falls on those dimensions.

Quality and standards are probably in there, but I think risk tolerance/aversion could be behind some of how you look at quality and standards. If you're high on risk-taking, you might be more likely to forgo verifying all LLM-generated code, whereas if you're very risk-averse, you're going to want to go over every line of code to make sure it works just right, for fear of anything blowing up.

Desire for control is probably related, too. If you desire more control in how something is achieved, you probably aren't going to like a machine doing a lot of the thinking for you.


This. My aversion to LLMs is much more that I have low risk tolerance and the tails of the distribution are not well-known at this point. I'm more than happy to let others step on the land mines for me and see if there's better understanding in a year or two.

I think there is more to it than that.

I am a high quality/craftsmanship person. I like coding and puzzling. I am highly skilled in functional leaning object oriented deconstruction and systems design. I'm also pretty risk averse.

I also have always believed that you should always be "sharpening your axe". For things like Java development, or anywhere I couldn't use a concise syntax, I would make extensive use of dynamic templating in my IDE. Want a builder pattern? Bam, auto-generated.

Now when LLMs came out they really took this to another level. I'm still working on the problems.. even when I'm not writing the lines of code. I'm decomposing the problems.. I'm looking at (or now debating with the AI) what is the best algorithm for something.

It is incredibly powerful.. and I still care about the structure.. I still care about the "flow" of the code.. how the seams line up. I still care about how extensible and flexible it is for extension (based on where I think the business or problem is going).

At the same time.. I can definitely tell you, I don't like migrating projects from TensorFlow vX to TensorFlow vY.


> I'm looking at (or now debating with the AI) what is the best algorithm for something.

That line always makes me laugh. There are only two points to an algorithm: domain correctness and technical performance. For the first, you need to step out of the code. And for the second, you need proofs. Not sure what there is to debate.


Not true. There is also cost, in money or opportunity. Correctness or performance isn't binary -- four or five nines, six or seven digits of decimal precision, just to name a few. That drives a lot of discussion.

There may be other considerations as well -- licensing terms, resources, etc.


I think it's a little bit more complicated.

I, for example, would claim to be rather risk-tolerant, but I (typically) don't like AI-generated code.

The solution to the paradox this creates if one considers the model of your post is simple:

- I deeply love highly elegant code, which the AI models do not generate.

- I cannot stand people (and AIs) bullshitting me; this makes me furious. I thus have an insanely low tolerance for conmen (and conwomen and conAIs).


I think this is a false dichotomy because which approach is acceptable depends heavily on context, and good engineers recognize this and are capable of adapting.

Sometimes you need something to be extremely robust and fool-proof, and iterating for hours/days/weeks and even months might make sense. Things that are related to security or money are good examples.

Other times, it's much more preferable to put something in front of users that works so that they start getting value from it quickly and provide feedback that can inform the iterative improvements.

And sometimes you don't need to iterate at all. Good enough is good enough. Ship it and forget about it.

I don't buy that AI users favor any particular approach. You can use AI to ship fast, or you can use it to test, critique, refactor and optimize your code to hell and back until it meets the required quality and standards.


Yes, it is a false dichotomy but describes a useful spectrum. People fall on different parts of the spectrum and it varies between situations and over time as well. It can remind one that it is normal to feel different from other people and different from what one felt yesterday.

> Also, theres the pace of advancement of the models. Many people formed their opinions last year, and the landscape has changed a lot.

People have been saying this every year for the last 3 years. It hasn't been true before, and it isn't true now. The models haven't actually gotten smarter, they still don't actually understand a thing, and they still routinely make basic syntax and logic errors. Yes, even (insert your model of choice here).

The truth is that there just isn't any juice to squeeze in this tech. There are a lot of people eagerly trying to get on board the hype train, but the tech doesn't work and there's no sign in sight that it ever will.


Maybe I'm solving different problems to you, but I don't think I've seen a single "idiot moment" from Claude Code this entire week. I've had to massage things to get them more aligned with how I want things, but I don't recall any basic syntax or logic errors.

With the better harness in Claude Code and the >4.5 models, and a somewhat thought-out workflow, we've definitely arrived at a point where I find it very helpful. The less you rely on one-shotting, and the more you give meaningful context and a well-defined, testable goal, the better it is. It honestly does make me worry how much better it can get, and whether some percentage of devs will become obsolete. It requires less hand-holding than many people I've worked with, and the results come out 100x faster.

I saw a few (Claude Sonnet 4.6), easily fixed. The biggest difference I've noticed is that when you say it has screwed up, it is much less likely to go down a hallucination path and can be dragged back.

Having said that, I’ve changed the way I work too: more focused chunks of work with tight descriptions and sample data and it’s like having a 2nd brain.


Very good way to describe it. I am enjoying Opus a lot.

I swear some people are using some other tech than I'm using the past few months. Where I work, Claude Code is developing major changes to our very large code base (many repos, millions upon millions of lines of really important code) and pushing to prod regularly. Even the most bearish of engineers are now using it to ship important code daily. It still has issues and you have to know how to use it, but it is a shocking productivity increase (although Amdahl's Law applies for software engineering, too. Coding is only a relatively small percentage of what is done)

> I swear some people are using some other tech than I'm using the past few months.

I'm curious about this discrepancy too. I assume that you're being facetious and the discrepancy is with people's perceptions of AI’s capabilities or usefulness or whatever subjective metric. Some, myself included, seem to perceive it as basically useless, while others, yourself included, seem to imply that it's at a level where it genuinely replaces competent coders.

If the discrepancy were small, it could just be chalked up to the metric being subjective. But it seems to be like night and day. A difference of orders of magnitude. I wanna know what's going on there.


Did you lay off some engineers or keep them but make the software better?

All I know is it feels very different using it now than it did a year ago. I was struggling to get it to do anything too useful back then, just asking it for a small function here or there, and often not being totally satisfied with the results.

Now I can ask an agent to code a full feature and it has been handling it more often than not, often getting almost all of the way there with just a few paragraphs of description.


And yet I just eliminated three months (easily) of tech debt on our billing system in the past two weeks.

Thanks for mentioning, but I did update it to High Effort when I got the notification

I've been thinking this too. I frequently do deep research on some systems programming technique, ask it to generate a .md for it, and then I use that in later sessions with Claude Code "look at the research I collected in {*-research}.md and help me explore ways to apply it to {thing}".

At the research step it frequently (always?) uses memory to direct/scope the research to what I typically work on, but I think that kind of pigeonholes the model and what it explores. And the memory doesn't quite capture all the areas I'm interested in, or want to directly apply the research to.
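The two-step workflow is easy to script around. A minimal Python sketch, with an illustrative helper name, file-naming convention, and prompt wording (none of this is a built-in Claude Code feature, just string plumbing around the idea):

```python
# Sketch of the research-then-apply workflow: deep-research output is saved
# as "*-research.md" notes, then a later coding session is pointed at them
# by name. Everything here (names, wording) is illustrative.
from pathlib import Path
import tempfile

def build_prompt(notes_dir, technique, target):
    """Compose a follow-up session prompt that references saved research."""
    notes = sorted(p.name for p in Path(notes_dir).glob("*-research.md"))
    if not notes:
        raise FileNotFoundError(f"no *-research.md notes found in {notes_dir}")
    return (
        f"Look at the research I collected in {', '.join(notes)} "
        f"and help me explore ways to apply {technique} to {target}."
    )

# Tiny demo with a throwaway notes directory.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "io_uring-research.md").write_text("notes...")
    prompt = build_prompt(d, "io_uring batching", "the event loop")
    print(prompt)
```

Pinning the later session to explicit note files like this also sidesteps the memory-scoping problem above, since the research context is chosen by hand rather than pulled from whatever the memory feature recorded.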

And regarding the crap in memories, I found the same. Mine at work mentioned I'm an expert at a business domain I have almost zero experience with.

I feel like the companies building this stuff accept a lot of "slop" in their approach, and just can't see past building things by slopping stuff into prompts. I wish they'd explore more rigid approaches. Yes, I understand "the bitter lesson" but it seems obvious to me some traditional approaches would yield better results for the foreseeable future. Less magic (which is just running things through the cheapest model they have and dumping it in every chat). It seems like poison.

Related: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...

Also, agent skills are usually pure slop. If you look through https://skills.sh on a framework/topic you're knowledgeable in you'll be a bit disheartened. This stuff was pioneered by people who move fast, but I think it's now time to try and push for quality and care in the approach since these have gotten good enough to contribute to more than prototype work.


It's not much but I was planning to cancel my Anthropic subscription to try Codex over the weekend, but I'll skip that. I don't want to support a company with someone like this at the top. Massive donations to the administration, sneaky backdoor deals. No thanks, fuck you.


> And if China gets AI, they're more than likely to use it to further raise people out of poverty and automate away more menial jobs without making those displaced workers homeless.

Your comment is very optimistic. But the quoted part reminded me of something I heard (again) about China using slave labor in their lithium mines:

https://www.state.gov/forced-labor-in-chinas-xinjiang-region...


I made a Zig agent skill yesterday if interested: https://github.com/rudedogg/zig-skills/

Claude getting the ArrayList API wrong every time was a major reason why

It’s AI-generated, but should help. I need to test and review it more (I noticed it mentions async, which isn’t in 0.15.x :| )


The linked blog post about making this is an excellent read.


Thanks! I think I spent as much time writing the post as I did making the skill, so I’m happy someone got some value out of it.


Fighting fire with fire


A little bit! I wrote a long blog post about how I made it. I think the strategy of having an LLM look at individual std modules one by one makes it pretty accurate. Not perfect, but better than I expected.


From what I’ve read there’s a pretty sizable performance gap between SQLite and pglite (with SQLite being much faster).

I’m excited to see things improve though. Having a more traditional database, with more features and less historical weirdness on the client would be really cool.

Edit: https://pglite.dev/benchmarks actually not looking too bad.. I might have something new to try!


This is kind of where I'm at.

I don't think everything is for certain though. I think it's 50/50 on whether Anthropic/whoever figures out how to turn them into more than a boilerplate generator.

The imprecision of LLMs is real, and a serious problem. And I think a lot of the engineering improvements (little s-curve gains or whatever) have caused more and more of these. Every step or improvement has some randomness/lossiness attached to it.

Context too small?:

- No worries, we'll compact (information loss)

- No problem, we'll fire off a bunch of agents each with their own little context window and small task to combat this. (You're trusting the coordinator to do this perfectly, and cutting the sub-agent off from the whole picture)

All of this is causing bugs/issues?:

- No worries, we'll have a review agent scan over the changes (They have the same issues though, not the full context, etc.)
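To make the information loss in the first step concrete, here's a toy sketch of compaction, with hypothetical names (a real agent would have a model summarize the dropped messages; this stub just counts them, which makes the loss explicit):

```python
# Toy sketch of context "compaction": when the transcript exceeds a budget,
# older messages are replaced by a short summary stub. Whatever the summary
# leaves out is gone for good. All names here are illustrative, not any
# vendor's actual implementation.

def compact(messages, budget, keep_recent=2):
    """Return a transcript no longer than `budget` entries."""
    if len(messages) <= budget:
        return messages
    dropped = messages[:-keep_recent]
    recent = messages[-keep_recent:]
    # A real agent would ask a model to summarize `dropped`; the stub below
    # just records how much was thrown away.
    summary = f"[summary of {len(dropped)} earlier messages]"
    return [summary] + recent

history = [f"msg {i}" for i in range(10)]
compacted = compact(history, budget=5, keep_recent=2)
print(compacted)  # the first 8 messages survive only as a one-line stub
```

Sub-agents have the mirror-image problem: instead of lossy history, each one starts from a deliberately narrow slice of context chosen by the coordinator.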

Right now I think it's a fair opinion to say LLMs are poison and I don't want them to touch my codebase, because they produce more output than I can handle, and the mistakes they make are so subtle that I can't reliably catch them.

It's also fair to say that you don't care, and your work allows enough bugs/imprecision that you accept the risks. I do think there's a bit of an experience divide here, where people more experienced have been down the path of a codebase degrading until it's just too much to salvage – so I think that's part of why you see so much pushback. Others have worked in different environments, or projects of smaller scales where they haven't been bit by that before. But it's very easy to get to that place with SOTA LLMs today.

There's also the whole cost component to this. I think I disagree with the author about the value provided today. If costs were 5x what they are now, it would be hard for me to decide whether they were worth it. For prototypes, yes. But for serious work, where I need things to work right and be reasonably bug-free, I don't know if the value works out.

I think everyone is right that we don't have the right architecture, and we're trying to fix layers of slop/imprecision by slapping on more layers of slop. Some of these issues/limitations seem fundamental and I don't know if little gains are going to change things much, but I'm really not sure and don't think I trust anyone working on the problem enough to tell me what the answer is. I guess we'll see in the next 6-12 months.


> I do think there's a bit of an experience divide here, where people more experienced have been down the path of a codebase degrading until it's just too much to salvage – so I think that's part of why you see so much pushback.

When I look back over my career to date there are so many examples of nightmare degraded codebases that I would love to have hit with a bunch of coding agents.

I remember the pain of upgrading a poorly-tested codebase from Python 2 to Python 3 - months of work that only happened because one brave engineer pulled a skunkworks project on it.

One of my favorite things about working with coding agents is that my tolerance for poorly tested, badly structured code has gone way down. I used to have to take on technical debt because I couldn't schedule the time to pay it down. Now I can use agents to eliminate that almost as soon as I spot it.


I've used Claude Code to do the same (a large refactor). It has worked fairly well, but it tends to introduce really subtle changes in behaviour (almost always negative) which are very difficult to identify. Even worse, if you use it to fix those issues, it can get stuck in a loop of constantly reintroducing slightly different issues, leading to fixing things over and over again.

Overall I like using it still but I can also see my mental model of the codebase has significantly degraded which means I am no longer as effective in stopping it from doing silly things. That in itself is a serious problem I think.


Yes, if you don't stay on top of things and rule with an iron fist, you will take on tons of hidden tech debt using even Opus 4.5. But if you manage to review carefully and intercede often, it absolutely is an insane multiplier, especially in unfamiliar domains.


LLM is like a chef that cooks amazing meals in no time, but his meals often contain small pieces of broken glass.

