I was a NNW user for years and it's why I eventually built my own news reader. NNW had a lot of great features and I wanted to mostly keep them. You might find that NewsBlur takes a similar path but with a different set of opinions.
> I think most of these writeups are packaging familiar engineering moves into LLM-shaped language.
They are, and that's deliberate.
Something I'm finding neat about working with coding agents is that most of the techniques that get better results out of agents are techniques that work for larger teams of humans too.
If you've already got great habits around automated testing, documentation, linting, red/green TDD, code review, clean atomic commits etc - you're going to get much better results out of coding agents as well.
My devious plan here is to teach people good software engineering while tricking them into thinking the book is about AI.
> And there are going to be problems that come from using vast quantities of AI on a code base, especially of the form "created so much code my AI couldn't handle it anymore and neither could any of the humans involved". There's going to need to be a discussion on techniques on how to handle this. There's going to be characteristic problems and solutions.
That's essentially the thing we are calling "cognitive debt".
Yeah, it's hard to even get started until we can go three months without a significant improvement in the AIs. Today's characteristic failures may not be 2027's characteristic failures. Example: today I'm complaining that the AIs tend not to abstract as often as I'd like, but it's not hard to imagine that flipping, so that they're all architecture astronauts instead.
Maintaining a high-quality requirements / specification document for large features prior to implementation, and then referencing it in "plan mode" prompts, feels like consensus best practice at this stage.
However a thing I'm finding quite valuable in my own workflows, but haven't seen much discussion of, is spending meaningful time with AI doing meta-planning of that document. For example, I'll spend many sessions partnered with AI just iterating on the draft document, asking it to think through details, play contrarian, surface alternatives, poke holes, identify points of confusion, etc. It's been so helpful for rapidly exploring a design space, and I frequently find it makes suggestions that are genuinely surprising or change my perspective about what we should build.
I feel like I know we're "done" when I thoroughly understand it, a fresh AI instance seems to really understand it (as evaluated by interrogating it), and neither of us can find anything meaningful to improve. At that point we move to implementation, and the actual code writing falls out pretty seamlessly. Plus, there's a high quality requirements document as a long-lived artifact.
Obviously this is a heavyweight process, but it's well suited to my domain and work.
ETA one additional practice: if the agent gets confused during implementation or otherwise, I find it's almost always due to a latent confusion about the requirements. Ask the agent why it did a thing, figure out how to clarify in the requirements, and try again from the top rather than putting effort into steering the current session.
I'm not sure I agree with this. I don't think there needs to be a whole spec & documentation process before plan mode.
There is alternative thought leadership that the waterfall approach for building out projects is not the right Agentic pattern[1].
Planning itself can be such an intensive process where you're designing and figuring out the specs on the fly in a focused manner for the thing the agent will actually develop next. Not sure how useful it is to go beyond this in terms of specs that live outside of the Agentic loop for what should be developed now and next.
I've evolved my own process, originally from plain Claude Code to Claude Code with heavy spec-integrated capabilities. However, that became a burden for me: a lot of contextual drift in those documents, plus self-managing and orchestrating Claude Code over them. I've since reoriented to base Claude Code, with the effort going into ad-hoc planning sessions instead. Sometimes the plans will revolve around specific GitHub issues or feature requests in the ticketing system, but that's about it.
I think there's a Danger Zone when planning is light-weight and iterative, and code is cheap, but reviewing code is expensive: it leads to a kind of local hill-climbing.
Suppose you iterate through many sessions of lightweight planning, implementation, and code review. It _feels_ high velocity, you're cranking through the feature, but you've also invested a lot of your time and energy (planning isn't free, and code review and fit-for-purpose checks, in particular, are expensive). As often happens -- with or without AI -- you get towards the end and realize: there might have been a fundamentally better approach to take.
The tradeoff of that apparent velocity is that _now_ course correction is much more challenging.
Those ephemeral plans are now gone. The effort you put into providing context within those plans is gone.
You have a locally optimal solution, but you don't have a great way of expressing how to start over from scratch pointed in a slightly different direction.
I think that part can be really valuable, because given a sufficiently specific arrow, the AI can just rip.
Whether it's worth the effort, I suppose, depends on how high-conviction you are on your original chosen approach.
It does two things that are very important: 1) reviewing should not be done last, but during the process, and 2) plans should result in verifiable specs, preferably in natural language, so you avoid locking yourself into specific implementation details (the "how") too early.
It's the number of active parameters for a Mixture of Experts (misleading name IMO) model.
Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point parameters - very roughly 35GB of data at 8-bit quantization - which are all loaded into memory at once.
But... on any given pass through the model, only 3 billion of those parameters are "active" - i.e. have matrix arithmetic applied to them.
This speeds up inference considerably because the computer has to do fewer operations for each token that is processed. It still needs the full amount of memory though, as the 3B active parameters it uses are likely different on every iteration.
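The routing idea can be sketched in a few lines. This is a toy illustration of top-k expert gating, not the actual Qwen architecture; the function and parameter names (`moe_layer`, `gate_w`, `experts`, `k`) are all hypothetical:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Toy Mixture-of-Experts layer: route one token to its top-k experts.

    x: (hidden,) activation vector for a single token
    gate_w: (num_experts, hidden) router weights
    experts: list of (hidden, hidden) weight matrices, one per expert
    """
    scores = gate_w @ x                    # router logit per expert
    top = np.argsort(scores)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over only the chosen k
    # Only the selected experts' matrices are touched this pass;
    # the other experts' parameters sit idle in memory.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))
```

The key point is in the last line: the arithmetic only involves `k` of the expert matrices, even though all of them are loaded, which is why active-parameter count drives compute cost while total-parameter count drives memory cost.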
It will benefit from the full amount of memory for sure, but AIUI if you use system memory and mmap for your experts you can execute the model with only enough memory for the active parameters - it's just unbearably slow, since it has to swap in new experts for every token. So the more memory you have in excess of that, the more inactive but often-used experts can be kept in RAM for better performance.
Agreed, but for a dense model you'd have to stream the whole model for every token, whereas with MoE there's at least the possibility that some experts may be "cold" for any given request and not be streamed in or cached. This will probably become more likely as models get even sparser. (The "it's unusable" judgment is correct if you're considering close-to-minimum requirements, but for just getting a model to fit, caching "almost all of it" in RAM may be an excellent choice.)
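To make the tradeoff concrete, here's back-of-envelope arithmetic for the ceiling and floor, assuming roughly 1 byte per parameter (8-bit quantization) and the 35B-total / 3B-active split discussed above. These are illustrative numbers only; a real runtime also needs memory for the KV cache and activations:

```python
# Rough memory bounds for a 35B-total / 3B-active MoE model.
total_params = 35e9
active_params = 3e9
bytes_per_param = 1  # assumed 8-bit quantization

# Ceiling: keep every expert resident in RAM.
full_gb = total_params * bytes_per_param / 1e9

# Floor: mmap the weights and hold only the active experts,
# swapping new experts in from disk on (nearly) every token.
minimum_gb = active_params * bytes_per_param / 1e9

print(f"full: {full_gb:.0f} GB, floor: {minimum_gb:.0f} GB")
```

Anywhere between the floor and the ceiling, extra RAM acts as a cache for the most frequently routed experts, which is why performance degrades gradually rather than falling off a cliff.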
I use the term "harness" for those - or just "coding agent". I think orchestrator is more appropriate for systems that try to coordinate multiple agents running at the same time.
This terminology is still very much undefined though, so my version may not be the winning definition.
Yesterday I test ran Qwen3.5-35B-A3B on my MBP M3 Pro with 36GB via LM Studio and OpenCode. I didn’t have it write code, but instead had it use Rodney (thanks for making it btw!) to take screenshots and write documentation using them. Overall I was pretty impressed at how well it handled the harness and completed the task locally. In the past I would’ve had Haiku do this, but I might switch to doing it locally from now on.
I suppose this shows my laziness because I'm sure you have written extensively about it, but what orchestrator (like opencode) do you use with local models?
I've used opencode, and the remote free models they default to aren't awful but are definitely not on par with Gemini CLI or Claude. I'm really interested in trying to find a way to chain multiple high-end consumer Nvidia cards into a local alternative to the big labs' offerings.
Kimi K2.5 is pretty good, you can use it on OpenRouter. Fireworks is a good provider, they were giving free access to the model on OpenCode when it first released.
When you say you use local model in OpenCode, do you mean through the ollama backend? Last time I tried it with various models, I got issues where the model was calling tools in the wrong format.
That's exactly why I'm asking! I'm still mystified about whether I can use ollama at all. I'm hopeful that the flexibility might become interesting at some point.