Hacker News | throwaway7783's comments

But doesn't your counterpoint assume all products that failed were not good enough?

Can you share your local LLM setup?

It's just my desktop running Krasis: 96GB DDR5, 16GB NVIDIA 5060. I'm running qwen-3.5-80B in int4 mode.

It uses about 45GB of system RAM and 11GB of GPU RAM.


I'm in the same camp. Do you use any specific extensions? Especially for OLAP and time series (partitioned tables + related extensions work fine, but curious if you use anything else)

The native extensions are fine, but I haven't had a good experience with any third-party extensions. So far I've tried Timescale, pg_lake, Citus, and pgvectorscale. They look very appealing, but it's usually a trap, as you can't get the value without using the vendor's cloud offerings.

I think if you grow enough to look for these extensions, it's usually better to bet on purpose-specific tooling. For example, I use DuckDB/Iceberg combination extensively for columnar data and connect DuckDB to PG when I need it.


Fair enough. How do you do BM25?

From experience, I'd suggest using ClickHouse beyond a few billion rows of timeseries data in Postgres.

Nice thing about our use case is that it's not strictly analytics, but looking at the most recent raw data. ClickHouse is definitely the powerhouse for analytics.

ClickHouse is fine for looking at recent data (with simple / efficient TTL). I'd still (probably) use Postgres for smaller scale things however.

I follow the same process. I have a design in mind for the problem at hand, but I don't reveal it to Codex. I go back and forth a bit to see if its proposals are better than mine. I go back and forth on tradeoffs of various approaches. And then I ask it to compare its proposals with mine. I "win" most of the time but there are many times where it shows me a better or simpler approach, or makes me rethink the solution altogether.

Once this is done, the mechanical coding parts are mostly routine (for Codex).


I really like this pattern and use it often, this 'not showing my cards'. The second I hint to the LLM what I prefer, it becomes sycophantic and invents nonsense about why my preferred solution is better.

I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their preferred option first. Not showing my cards has been very useful when thinking through a problem with LLMs.


LLMs flip positions when users push back ~70% of the time, even when they were right. RLHF optimizes for approval, not correctness.

> LLMs flip positions when users push back

Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.


I have noticed this as well, but I think it's somewhat a good thing. I know what I want for my application more than Claude does for example, especially when it comes to what's in production.

An example from earlier: Claude strongly suggested a migration that would run a full vacuum on Postgres. However, in production this would lock tables, which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.

Another example: I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, that is very easy to use when you have dozens or even a hundred or more accounts on there, and that is also efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However, this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption. It accepted that and suggested, as a middle ground, that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disk. This is the right thing IMHO.
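
For anyone curious what the core of a strictly local TOTP app looks like, here is a minimal sketch using only the Python standard library. It follows RFC 6238 with HMAC-SHA1 and 30-second steps; the function names are my own and not from the app described above. The `seconds_until_rollover` helper is one assumed way to stay cheap when left open: sleep until the next step boundary instead of polling.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def seconds_until_rollover(now: Optional[float] = None, step: int = 30) -> float:
    """How long the current code stays valid; a UI can sleep this long before refreshing."""
    if now is None:
        now = time.time()
    return step - (now % step)
```

Real authenticator secrets are usually distributed base32-encoded, so an actual app would decode them with `base64.b32decode` before calling `totp`.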


It's a good thing except when it's not. The problem is the AI does not understand when to use which approach.

Contrast this with a human. We generally understand when the other person knows what they're doing and we should just listen, and when the other person is asking for an honest opinion and wants a push back if necessary.


Skills help there.

I have a linus-reviewer skill that focuses on architectural integrity, no BS, etc., modeled on Torvalds' code preferences.

And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.

They have different prios, but they both push back on feedback, till you convince them.


Care to share the skill behind the Linus reviewer? I tend to ask it to do that but leave it up to the LLM to decide what that means. Interested to see any specifics you might have included there if it’s ok to share.

Sure.

Would be interested in the experience others may have; it took me weeks of iterations to get reviews in a format and utility I liked.

https://gist.github.com/enricopolanski/2bde8619f53307c9bcd5e...


I agree completely. Skills definitely keep it in line and sticking to the script. Thanks for sharing the skills you use, I’ll definitely take a look.

I almost always end with something like “, but I am not sure, evaluate,” or similar, and avoid ever stating a preference.

I don't think that "fixes" the problem, but it does seem to help. I also have found adding "please feel free to ask questions" seems to help it stop from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.

They do still attenuate their latent space on prior conversation turns as authority. That is why I like pure design/review sessions and pure coding sessions, often at the same time. I can often keep design and review in the critic and reviewer role without it becoming a sycophant. The coding agent just picks up dispatches and works with very little opinion at all.

Tangentially related, but I’ve been using Claude to practice interviewing on system design problems, and it’s actually pretty great. But even when it likes my answers it always finds something, however small, to push on. Once it actually was completely wrong and admitted it after I got it to realize. So maybe you have to prime it to be contrary and not agree with everything you say; putting it in the role of a tough interviewer seems to do this implicitly.

Take a look at hellointerview.com; their model is very stubborn, similar to some interviewers who refuse to acknowledge even valid solutions that differ from the canon.

No affiliation.


It's actually a reasonable way to think about alignment. Sometimes you want the agent to just listen to you and sometimes you want the agent to think critically.

I think about this line a lot. For example, sometimes you'll have a typo in something you want the agent to do. LLMs typically will correct that typo silently and implement the actually intended thing. But if you said, "no, I want the thing I typed," I think everyone's expectation is that it says, "ok done."

I've found that leaving clues in the system prompt / exchange that are open to critique largely mitigates sycophancy with most recent models.

As engineers we're trained to represent our positions strongly. Strong opinions loosely held, etc. When you speak authoritatively to a person, "I think we should do x...", the person understands that that's just your opinion and has the autonomy to push back.

An LLM IMO _shouldn't_ have that same kind of autonomy by default, and it should be RLHF'ed out.


Interesting thing about sycophancy is it’s asymmetric. If an LLM is used to train an LLM, it may not have the same level of aggressiveness that humans do when pushing back on a trainee. Human pushback has specific patterns which we might be able to compensate for due to this asymmetry.

Obviously this is just my experience. Claude Code pushes back much harder than Codex.

I have the totally opposite experience.

Same. Alternatively (or in addition), I sometimes present my preferred idea as being a "bad/naive/stupid option" (or a suggestion from someone who can't be trusted) to see how it holds up when the sycophancy is pointed against it. As expected, the LLM will usually say "yeah it's bad!" and give plausible-sounding reasons for it, but if these reasons are nonsensical it's a good sign that I'm not missing anything.

LLMs are very prone to priming in my experience. That is the human psychology name for what you are describing; whether it should be applied to LLMs I don't know, but it describes the phenomenon perfectly.

Makes sense as priming is at the core of how an LLM is trained.

“Given these words, predict the next word.”


It's not limited to arguing with LLMs, but if you want an honest opinion you should remember to push back even when it agrees with your hidden preference at first. Sometimes it is only being contrarian or supporting the underdog. Steelman the opposition.

Yes, outside of coding too, it’s a good idea to ask open-ended questions rather than ask for confirmation, to avoid this sycophantic bias.

There's an easy workaround that helps: instead of listing options, just describe the problem constraints and ask it to propose approaches independently.
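
A rough sketch of that workaround as a tiny prompt builder, in case it helps. The function name and the exact wording of the prompt are made up for illustration; the only point is that the prompt carries the problem and constraints and no candidate solutions, so there is no preference to leak.

```python
def constraints_only_prompt(problem, constraints, n_approaches=3):
    """Build a prompt that states the problem and constraints but no candidate solutions."""
    lines = [f"Problem: {problem}", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += [
        "",
        f"Propose {n_approaches} independent approaches and compare their tradeoffs.",
        "Ask clarifying questions before assuming anything that is not stated above.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(constraints_only_prompt(
        "Store and query recent raw sensor readings",
        ["Postgres is already in production", "Queries mostly touch the last 24 hours"],
    ))
```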

> I go back and forth a bit to see if its proposals are better than mine

I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing what's faster or allocates less.
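
To make that concrete, here is a minimal sketch of the kind of benchmark I mean, using Python's standard `timeit` and `tracemalloc`. The two candidate functions are toy stand-ins I picked for illustration (same result, different allocation behaviour); swap in the real approaches being compared.

```python
import timeit
import tracemalloc


def sum_squares_list(n):
    # Materializes the whole list before summing.
    return sum([i * i for i in range(n)])


def sum_squares_generator(n):
    # Streams values into sum() one at a time.
    return sum(i * i for i in range(n))


def measure(func, n, repeat=5):
    """Return (best seconds for one call, peak bytes allocated during one call)."""
    best = min(timeit.repeat(lambda: func(n), number=1, repeat=repeat))
    tracemalloc.start()
    func(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


if __name__ == "__main__":
    for candidate in (sum_squares_list, sum_squares_generator):
        seconds, peak = measure(candidate, 100_000)
        print(f"{candidate.__name__}: {seconds:.4f}s, peak {peak} bytes")
```

Timing and memory are measured in separate runs on purpose, since tracing allocations slows the code down and would skew the timing.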


I had the exact same experience yesterday.

I had a performance problem and went down the path of optimising part of a pipeline that, when benchmarked, was not the bottleneck, even if it looked plausible to me and the LLM. When I asked it to make a final benchmark for documentation, I found most of the work I did improved things by 30%, while another path would have improved them by an order of magnitude more.

Thankfully iteration is now faster than ever and given how fast it creates tests, previous tests created for the aborted optimisation were helpful.


> Turns out AI is terrible at guessing what's faster or allocates less

s/AI/a human being/ would work equally well, lol.

Jokes aside, I do like the approach of letting the AI build something deterministic and make decisions based on that.


Yup, just like people!

I think this approach is more common than the hype for actual work. I do something similar: a lot of back and forth, then settle on something, often with now-known tradeoffs, written by hand to spot issues as a final guard, keep naming consistent, etc.

I bet you've contributed a lot of training trajectories for those AIs.

Good!

I'm in the java ecosystem, so YMMV.

- Automatic spring service detection

- Debugger (remote, local, with access to state, stack and ability to modify the state while stepping through), though I assume this is possible with neovim?

- built-in profiler

- can run individual tests seamlessly

- understands bytecode enhancers like Lombok

- Find Usage, find symbol, language specific navigation, showing class hierarchies, going up/down the hierarchies etc (maybe in conjunction with LSP, other editors can do a decent job?)

- Advanced refactoring (extracting classes, interfaces, inlining functions, extracting functions/methods)

- built-in database explorer

- built-in Git support (I have struggled mightily with VS Code's Git interactions - but this might just be an individual preference)

- markdown/html previews

Basically, I barely have to get out of the IDE.


> - can run individual tests seamlessly

This is the main one for me. If I am working on a large project with decent unit test coverage, the feedback loop in IntelliJ or Visual Studio is just much quicker than the alternatives because you can run and debug the specific tests you need.


Why can you not run specific tests from neovim?

If you really wanted to you could add some trigger on save for a file that would re-run tests for said file. Maybe a plugin or key bind could run a specific test which you choose in buffer.
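
As a rough sketch of the on-save idea, here is an editor-agnostic stand-in: a small polling watcher in Python rather than an actual Neovim autocmd. It assumes a Maven project and the common convention that `Foo.java` is covered by `FooTest`; both of those, and all the function names, are my assumptions. `mvn test -Dtest=<Class>` is the usual Surefire way to run a single test class.

```python
import subprocess
import time
from pathlib import Path


def matching_test_class(source: Path) -> str:
    """Assumed convention: OrderService.java is covered by OrderServiceTest."""
    return source.stem + "Test"


def maven_test_command(source: Path) -> list:
    """Command that runs only the test class matching the given source file."""
    return ["mvn", "test", f"-Dtest={matching_test_class(source)}"]


def changed_since(path: Path, last_mtime: float) -> bool:
    """True if the file was written after the given modification time."""
    return path.stat().st_mtime > last_mtime


def watch(source: Path, poll_seconds: float = 0.5) -> None:
    """Poll the file forever and re-run its tests every time it is saved."""
    last = source.stat().st_mtime
    while True:
        time.sleep(poll_seconds)
        if changed_since(source, last):
            last = source.stat().st_mtime
            subprocess.run(maven_test_command(source), check=False)
```

Calling `watch(Path("src/main/java/app/OrderService.java"))` (a hypothetical path) blocks and re-runs `OrderServiceTest` on each save. It still does not give you the step debugger, which is the part the heavyweight IDEs make effortless.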


I'm sure there is a way to use a step debugger with a specific unit test in Neovim or whatever, but it's one of the things heavyweight IDEs do with no setup or configuration.

Characterizing Zed as a text editor is disingenuous.

Zed has documentation, go-to definition/usage, local/remote debugger, can run/debug individual tests, has git and markdown. Essentially all the core IDE functionality is there. All of these work as well or better for Rust in Zed than IntelliJ.

Bytecode decompilation is a very Java-specific thing and I've not used Zed for Java yet. I suspect that they'll get around to it eventually if they don't already have it.

I've never used database plugins in any IDE I've ever used so I can't compare/contrast between Zed and JetBrains products. However, as soon as I see an IDE gain a database plugin, I know it's the beginning of the end for that IDE. After the database plugin is the CSS minimizer, then the JavaScript bundler; I guess the AI plugin is the new hip thing. For the twilight years of my IntelliJ/CLion usage, the first thing I'd do upon installation would be to go remove all the damn plugins.


I am not sure if you are replying to me or someone else. I never said I'm characterizing Zed as a text editor. Never used it, and have no opinion on it. My listing above was on why I am sticking with JetBrains. Also did not say anything about AI.

IDEs/Editors ultimately are a very personal choice assuming they have sufficient features for a given language ecosystem.


We use GitLab. They are in no way in an incredible position to moonshot anything. They are yet another git provider with a management plane around it.


I've built a developer platform around GitLab, and it's got some nice stuff, but it's not revolutionary.

But that's not all that relevant to the opportunity in front of them.

The opportunity, generally, exists because of their place in an industry that most folk believe will be very different a decade from now.

That belief is going to lead a lot of CTOs to try new things. When a company tries something new, it almost always picks a new vendor to work with, rather than adding complexity or risk to an existing vendor engagement.

Yes, there are other alternatives, but they are less well known, require self hosting, and/or are secondary products of companies with very broad focus.

Atlassian might be another, given how much of the rest of the software development cycle they have their hooks in, but many tech leaders have unresolved JIRA trauma. :)


I agree they probably have the opportunity (next biggest brand after GitHub), but nothing they are doing looks like they are taking it. Duo seems to be their best effort so far to ride the AI wave, which everyone and their mother does already.

The article mixes transaction semantics (distributed or otherwise) with idempotency.

Idempotency or not, many points in the article are about atomic transactions.


By that logic, all "injustice" is "I don't like it when X happens" - there is nothing more.


Wow, I contracted in Jamnagar for Reliance building software back in 1999-2000. It was fun building a web interface to report on their IoT (not called IoT back then) devices - sensors, meters and whatnots through a CORBA/C++ interface. That was very advanced for those days.


This is not true in my experience at all. I never write such a detailed spec for AI - and that is my value as the human in the loop - to be iterative, to steer and make decisions. The AI in fact catches more edge cases than I do, and can point me to things that I never considered myself. Our productivity has increased manyfold, and code quality has increased significantly because writing tests is no longer a chore or an afterthought, or the biggest one for us - "test setup is too complicated". All of that is gone. And it is showing in a decrease in customer-reported issues.

