3uler · 2026-05-25T10:34:45 1779705285

Golang is an amazing runtime with a bad language, one that conflates simple with easy. I view it the same way I view Java: a fine choice for a corporation, but nothing to love. Although Java’s gotten a lot better lately.

DeathArrow · 2026-05-25T10:36:18 1779705378

For me, having the proper tool for the job trumps loving it. I don't have to love the language; I have to love the process and the end result.

3uler · 2026-05-24T15:14:26 1779635666

Opencode has really bad cache stability issues that they seem uninterested in fixing at the moment.

dathery · 2026-05-24T16:44:35 1779641075

The OpenCode devs talk about this on Twitter a lot, e.g. https://xcancel.com/thdxr/status/2048268697790300343

> tool call pruning breaks cache and people will tell you this is horrible and expensive

> except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend

> even this is needs to be analyzed further, it's just not simple

> for openai data it's inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on

> but the net $ saved is only 5%

> kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!

There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can't find it.

This can also be disabled in the config: https://opencode.ai/docs/config/#compaction

soerxpso · 2026-05-24T19:36:56 1779651416

My understanding of caching with most models/providers is that a prefix substring of the context has to be reused for a cache hit, but not necessarily the whole entire context window. So if you prune tool calls from the history, you're going to get one cache miss on the newly-pruned history, and then you're going to be getting cache hits on every subsequent turn, with a lower number of input tokens. If you prune subsequent tool calls after that, you would still get a cache hit for the already-pruned portion of the context, just not the full context.
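A rough sketch of that reasoning, assuming the provider reuses the longest shared prefix between consecutive requests. The message names are made up, and each list item stands in for a block of tokens:

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading items that both sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical conversation history; each item stands in for a block of tokens.
turn1 = ["system", "user1", "tool_call1", "tool_result1", "assistant1"]
turn2 = turn1 + ["user2", "assistant2"]

# Pruning tool_result1 rewrites the history from that position onward.
pruned = [m for m in turn2 if m != "tool_result1"] + ["user3"]
after_prune = pruned + ["assistant3", "user4"]

# Append only: everything sent in turn1 is reusable.
print(common_prefix_len(turn1, turn2))
# Right after pruning: only the part before the removed item is reusable.
print(common_prefix_len(turn2, pruned))
# Following turn: the whole pruned history is reusable again.
print(common_prefix_len(pruned, after_prune))
```

Real providers match on tokens (often in fixed-size chunks) rather than whole messages, so treat this as an illustration of the shape of the effect, not of any provider's accounting.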

__natty__ · 2026-05-24T20:05:05 1779653105

So it makes sense to first send the stable prompt, reasoning and file contents, then the tool call summary, with the actual tool calls at the very end?

leemoore · 2026-05-24T23:26:09 1779665169

The way you do this (and the way opencode does it) is you do most of your pruning in more recent history. Last I looked at opencode, they start pruning tool call results after 2 full agentic turns. So you probably don't get quite as good cache hits for the most recent 1-5% of your turns, but after that everything else caches fine and those tool calls that likely aren't relevant to your session anymore are gone.

awoimbee · 2026-05-25T07:43:39 1779695019

You didn't quote the interesting part:

> our implementation is it only prunes calls from > 3 user messages ago, if context is > 40K, and only if there's at least 20K tokens to be removed

Seems reasonable to me and explains why I can have long sessions (way longer than with zed agents) while still hitting cache. Opencode is just missing per-provider TTL.
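The three conditions in that quote can be sketched as a small decision function. This is not OpenCode's code: the `ToolResult` type, the field names, and the function name are hypothetical, and only the thresholds come from the quote.

```python
from dataclasses import dataclass

# Thresholds taken from the quoted description; everything else is assumed.
MIN_AGE_USER_MESSAGES = 3
MIN_CONTEXT_TOKENS = 40_000
MIN_PRUNABLE_TOKENS = 20_000


@dataclass
class ToolResult:
    user_messages_ago: int  # user messages sent since this tool call
    tokens: int             # size of the tool call result


def select_prunable(results: list[ToolResult], context_tokens: int) -> list[ToolResult]:
    """Return the tool results to prune, or an empty list if pruning is not worth it."""
    if context_tokens <= MIN_CONTEXT_TOKENS:
        return []  # context is still small, leave it alone
    old = [r for r in results if r.user_messages_ago > MIN_AGE_USER_MESSAGES]
    if sum(r.tokens for r in old) < MIN_PRUNABLE_TOKENS:
        return []  # too little to gain for the cost of a cache miss
    return old


results = [ToolResult(5, 15_000), ToolResult(4, 10_000), ToolResult(1, 30_000)]
print(len(select_prunable(results, context_tokens=60_000)))
```

The minimum-savings check is what keeps small prunes from invalidating the cached prefix for little benefit.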

arthurcolle · 2026-05-25T09:03:01 1779699781

I found that keeping current context utilization at 18% of total context length was best for minimizing spend, across all models with 400k context length or more

hirako2000 · 2026-05-24T19:10:14 1779649814

They are, going by empirical evidence on my side, because attention is sparse across the context. For performance reasons, a model doesn't truly treat a million tokens the way it treats a fraction of that count.

huqedato · 2026-05-24T16:25:09 1779639909

I can't confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we've never run into such 'cache stability issues'.

embedding-shape · 2026-05-24T15:35:09 1779636909

That'd be really easy to spot and also fix, most likely. Is there an open issue you could point us to? It must surely have been reported already.

nolok · 2026-05-24T16:17:03 1779639423

> That'd be really easy to spot and also fix, most likely

Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

criemen · 2026-05-24T16:27:04 1779640024

> Ah, reminds me of good old "There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

You quip, but LLM KV caching (from the harness side) is quite easy: You get a cache hit on stable prompt prefixes, period. That means you want to keep the prefix stable, and only append at the end of the conversation. Made up example: Don't put the git branch name into the system prompt part (that comes first), as whenever the branch name changes, that'd trigger a cache invalidation of the entire prompt.

Getting this right basically requires some care not to accidentally modify the prefix, and some design around communicating the things that can change (user configuration, working dir, git information, ...).
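To make the made-up git branch example concrete, here is a minimal sketch that compares how much of the previous prompt stays reusable under the two layouts. The prompt format and function names are invented for illustration, and it counts characters where a real cache would count tokens:

```python
import os.path

SYSTEM = "SYSTEM: You are a coding agent."


def prompt_with_branch_in_system(branch: str, messages: list[str]) -> str:
    # Volatile value near the start: a change invalidates everything after it.
    return "\n".join([f"{SYSTEM} Current branch: {branch}"] + messages)


def prompt_with_branch_appended(messages: list[str]) -> str:
    # Stable system prompt; environment changes are appended as ordinary messages.
    return "\n".join([SYSTEM] + messages)


def reused_chars(old: str, new: str) -> int:
    """Length of the shared prefix, a stand-in for the cacheable part."""
    return len(os.path.commonprefix([old, new]))


msgs = ["USER: fix the bug", "ASSISTANT: done"]

old_a = prompt_with_branch_in_system("main", msgs)
new_a = prompt_with_branch_in_system("feature", msgs + ["USER: next task"])

old_b = prompt_with_branch_appended(["ENV: branch=main"] + msgs)
new_b = prompt_with_branch_appended(
    ["ENV: branch=main"] + msgs + ["ENV: branch=feature", "USER: next task"]
)

print(reused_chars(old_a, new_a), "of", len(old_a))  # reuse stops at the branch name
print(reused_chars(old_b, new_b), "of", len(old_b))  # the entire old prompt is reused
```

In the second layout the old prompt is a strict prefix of the new one, which is the property an append-only harness is trying to preserve.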

franknord23 · 2026-05-24T17:58:59 1779645539

That sounds like the experience of writing Containerfiles; since steps are cached you want to pull the thing you are iterating on as far down as possible.

gopher_space · 2026-05-24T19:26:39 1779650799

All of this work has been done before in different contexts. Memory management with bigger blocks and weaker definitions that change whenever some grad student gets a bright idea.

vidarh · 2026-05-25T07:49:13 1779695353

100%. Since you mention memory management: Generational GC is pretty much the same idea: Keep the stuff that's least likely to change an important property (liveness) together.

Conceptually the underlying general idea is to sort things based on stability if you can avoid recomputing properties of the stable part.

verdverm · 2026-05-25T17:10:42 1779729042

It's even closer to prefix matching on super long strings by chunk

krzyk · 2026-05-24T16:22:10 1779639730

Opencode (and other coding agents) have hundreds of open issues reported. It is quite discouraging when they are not being closed/fixed.

verdverm · 2026-05-25T17:11:56 1779729116

These projects have also been the recipients of PR spam; there are lots of duplicates and unconfirmed reports in there from less technical people and clawd operators.

3uler · 2026-05-25T10:53:16 1779706396

https://github.com/anomalyco/opencode/pull/14743

polski-g · 2026-05-26T09:12:05 1779786725

Yeah, that's frustrating. Every time they pop up on my Twitter timeline I shame them into addressing their issue backlog.

They're aware of the backlog's length and they're "looking into a solution".

estebarb · 2026-05-24T23:25:31 1779665131

I'm not sure that is really the case, or relevant in practice. I have been using OpenCode with DeepSeek lately (regular coding). For instance, today I got 120 million input tokens hitting cache, vs just 2.59 million missing cache.

ctxc · 2026-05-25T03:19:04 1779679144

Reads like a LOT of tokens to me. What does your usage/workflow look like? I'm v curious because although I do use Claude Code, my token counts aren't nearly as high.

I want to know if I'm missing something cool!

mordae · 2026-05-25T08:23:30 1779697410

Not OP, but I routinely load 150k tokens into context. A full sub-package to work on, select other files in the monorepo, e.g. front-end visualization and back-end data loader. Then work some 150k tokens, then start again.

At the end, cache hit rate is like 99.5% if Novita is not having issues.

For official DeepSeek API, 99.9% or something.

Custom harness that never compacts or otherwise doctors the history.

ctxc · 2026-05-25T12:18:34 1779711514

Those numbers make sense to me...120 million input tokens is like 120 sessions of hitting the full context limit, which seems like a lot to me though

magicalhippo · 2026-05-25T05:41:53 1779687713

What I noticed when using OpenCode with llama.cpp was that the default host RAM prompt cache size in llama.cpp was way too small for, say, a 128k context with Qwen3.6 27B.

The default is just 8GB and a full 128k context for the dense model can take most of that. So then comes an agent and causes eviction and subsequent cache miss.

Bumped the cache size (--cram IIRC) up to 48GB and had much better results.

metalspot · 2026-05-24T18:40:51 1779648051

I am getting 98.6% cache hit ratio on deepseek-v4-flash with opencode

bobkb · 2026-05-24T19:17:03 1779650223

That’s impressive!

On sheer performance, is it comparable to Opus?

stavros · 2026-05-24T22:41:41 1779662501

Here are my stats (from DeepSeek directly, with a script I wrote). The prices are what equivalent Sonnet usage would have cost, the actual amount I paid was $10. On performance, DeepSeek V4 Pro is comparable to Sonnet for me.

    ./cost.py amount-2026-5.csv 0.3 3.75 15
    input_cache_hit_tokens: 472,971,520 tokens -> $141.8915
    input_cache_miss_tokens: 13,299,013 tokens -> $49.8713
    output_tokens: 3,334,962 tokens -> $50.0244
    cache hit rate: 97.27% (472,971,520/486,270,533)
    cache miss rate: 2.73% (13,299,013/486,270,533)
    total: $241.7872

All of this usage was with an OpenCode subagent exclusively.
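For anyone wanting to reproduce the arithmetic, here is a minimal sketch. It is not the actual `cost.py`: it assumes the three arguments are USD prices per million tokens for cache hits, cache misses, and output, which is consistent with the figures shown. The function name is made up.

```python
def cost_report(hit_tokens: int, miss_tokens: int, output_tokens: int,
                hit_price: float, miss_price: float, output_price: float) -> dict:
    """Price token counts at the given USD-per-million-token rates (assumed units)."""
    per_million = 1_000_000
    hit_cost = hit_tokens * hit_price / per_million
    miss_cost = miss_tokens * miss_price / per_million
    output_cost = output_tokens * output_price / per_million
    total_input = hit_tokens + miss_tokens
    return {
        "hit_cost": hit_cost,
        "miss_cost": miss_cost,
        "output_cost": output_cost,
        "hit_rate_pct": 100 * hit_tokens / total_input,
        "miss_rate_pct": 100 * miss_tokens / total_input,
        "total": hit_cost + miss_cost + output_cost,
    }


# Token counts and prices copied from the output above.
report = cost_report(472_971_520, 13_299_013, 3_334_962, 0.3, 3.75, 15)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the costs, rates, and total match the numbers in the output above after rounding.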

upcoming-sesame · 2026-05-24T19:44:30 1779651870

Out of curiosity, how do you measure cache hit rate in opencode?

malikNF · 2026-05-24T20:40:54 1779655254

opencode stats

lugu · 2026-05-24T22:17:44 1779661064

So the calculation is:

Total input tokens = input + cache read + cache write

Cache hit rate = cache read / total input tokens

That is 71% in my very limited use of opencode.
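The calculation above as a small helper, in case it is useful. The argument names are assumptions about how the three counters are labelled, and the sample numbers are invented to land on 71%:

```python
def cache_hit_rate(input_tokens: int, cache_read: int, cache_write: int) -> float:
    """Cache reads as a share of all input tokens (uncached + read + written)."""
    total_input = input_tokens + cache_read + cache_write
    if total_input == 0:
        return 0.0  # no usage yet, avoid dividing by zero
    return cache_read / total_input


# Hypothetical counters, not real opencode output.
rate = cache_hit_rate(input_tokens=2_000, cache_read=7_100, cache_write=900)
print(f"{rate:.0%}")
```

Counting cache writes in the denominator means a session with many first-time writes will show a lower rate than one measured as reads / (reads + misses).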

hackernows_test · 2026-05-24T22:53:48 1779663228

The first

verdverm · 2026-05-25T17:07:40 1779728860

There are some that are specific to certain models like qwen/gemma

I switched to vLLM and those went away. Need to look at my opencode config and adjust some others based on things I see here

3uler · 2026-05-23T11:10:38 1779534638

Also, when most of your income comes from your wealth, your income tax rate is effectively 0%…

So complaining about having to contribute to the society that gave the conditions for your vast wealth is going to get you 0 sympathy

3uler · 2026-05-22T16:22:56 1779466976

But if you look at the Node compliance tests, Deno has better compliance nowadays…

CharlesW · 2026-05-22T17:28:41 1779470921

Insanely better, at 76% Node compliance in Deno 2.8.

Bun 1.3.14 is at just 40.6% with the same compliance test.

https://node-test-viewer.deno.dev/

garbagepatch · 2026-05-22T19:34:39 1779478479

I guess Bun had the better marketing then. I liked how every new feature came with a benchmark against the previous version and node. See this for example: https://xcancel.com/bunjavascript/status/2048228152397459590

I'd love to see a site comparing the 3 of them in a similar way.

3uler · 2026-05-21T20:00:06 1779393606

What do you mean? The whole point of Ruby on Rails is the rails way? Also the problems you are describing are not new and the community settled on adding some sort of service layer

https://shopify.engineering/shopify-monolith http://sporto.github.com/blog/2012/11/15/a-pattern-for-servi...

3uler · 2026-05-12T04:21:52 1778559712

I’ve always found Ruby to be way more readable; what keeps me using Python is that the depth of its libraries is unmatched.

So unless you’re into burning tokens having AI generate untested libraries, I’d stick to using the most idiomatic tool for the problem you are tackling.

irjustin · 2026-05-12T04:39:37 1778560777

So, it's really interesting. We've started moving away from Python libs because 25% of OSS is out of date and another % needs custom tweaks to the software to help our use cases. In both scenarios it means our own fork.

And honestly it's not burning that many tokens if you've got an existing example lib to point to.

3uler · 2026-05-12T04:15:18 1778559318

Tbh that is some engineering teams I’ve worked on…

3uler · 2026-04-24T06:17:51 1777011471

These models are open and there are tons of western providers offering it at comparable rates.

3uler · 2026-04-22T04:58:43 1776833923

I cannot find a description of how it works on the site, magic hands daemons!

Cool story, but what runs when?

woud420 · 2026-04-22T14:43:59 1776869039

tl;dr: https://charlielabs.ai/how-it-works/ and https://docs.charlielabs.ai/daemons

To get started, look at https://docs.charlielabs.ai/installation, but essentially:

1) sign up with your GitHub login

2) install the CharlieCreates GitHub App on the repos you want Charlie to work in

3) create an issue, tag @CharlieHelps to help you create your first daemon!

the daemon will run on the charlie runtime and follow the watch / schedule conditions you set in your DAEMON.md file

full disclosure: i am part of the engineering team behind this.

3uler · 2026-04-13T05:10:45 1776057045

Fix them if needed; the OP’s point is that for a lot of applications it is not needed.

For most cases you will still be comfortably in the JVM/golang performance window.

Rust is a great language, but fighting the borrow checker sucks; don’t do it if you don’t need to.