Hacker News | airstrike's comments

Make it look like the TouchID isn't working and switch to password mode, boom. User password obtained

Most LLM "harnessing" seems very lazy and bolted on. You can build much more robustly by leveraging a more complex application layer where you can manage state, but I guess people struggle building that

Common failure mode I've observed is people building a stateful harness for the LLM and then forgetting to tell the LLM about it. Leads to funny/disturbing results whenever the two "desync" in some way.

Example: a plan/act division, with the harness keeping state of which mode is active, and while in "plan mode", removing/disabling tools that can write data. Cue a mishandled timeout or a UI bug that prevents switching to "act mode", and suddenly the agent is spinning for 10 minutes questioning the nature of its reality, as the basic tools it needs to write code inexplicably ceased to exist, then opting for empirical experimentation and eventually figuring out a way to reimplement "search/replace" using shell calls or Python or whatever alternative wasn't properly sandboxed by the harness writers...

Part of this is just bugs in code, but what irks me is watching the LLM getting gaslighted or plain confused by rules of reality changing underneath it, all because the harness state wasn't made observable to the agent, or someone couldn't be arsed to have their error messages and security policies provide feedback to the LLM and not just the user.
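A minimal sketch of what "observable harness state" could look like, assuming a toy plan/act harness. Every name here (`Harness`, `Mode`, `status_message`, `call_tool`, the tool names) is hypothetical and not taken from any real agent framework. The point is only that a denied tool call returns an explanation to the model instead of the tool silently vanishing.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    ACT = "act"


@dataclass
class Harness:
    """Hypothetical plan/act harness; names are illustrative only."""
    mode: Mode = Mode.PLAN
    read_tools: set = field(default_factory=lambda: {"read_file", "search"})
    write_tools: set = field(default_factory=lambda: {"write_file", "search_replace"})

    def available_tools(self):
        # Write tools are only exposed in act mode.
        if self.mode is Mode.ACT:
            return sorted(self.read_tools | self.write_tools)
        return sorted(self.read_tools)

    def status_message(self):
        # Text meant to be injected into the model's context on every
        # mode change, so the model and the harness do not desync.
        disabled = [] if self.mode is Mode.ACT else sorted(self.write_tools)
        return (
            f"Harness mode: {self.mode.value}. "
            f"Available tools: {', '.join(self.available_tools())}. "
            f"Disabled tools: {', '.join(disabled) or 'none'}."
        )

    def call_tool(self, name):
        # Return a result for the model, not just for the user.
        if name in self.available_tools():
            return {"ok": True, "tool": name}
        if name in self.write_tools:
            return {
                "ok": False,
                "error": (
                    f"'{name}' is disabled because the harness is in plan mode. "
                    "Do not work around this; ask the user to switch to act mode."
                ),
            }
        return {"ok": False, "error": f"Unknown tool '{name}'."}
```

With something like this, a stuck mode switch still leaves the agent blocked, but it at least knows why and can say so instead of reinventing search/replace through the shell.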


Not sure why you'd get that from this post, which says it required careful small prompts over the course of weeks.

In the hands of experienced devs, AI increases coding speed with minimal impact on quality. That's your differentiator.


Thought it was going to be a blog post about Jeopardy for a sec

We're not solving for efficient pricing at the expense of one insider reaping all the benefits because they already either knew or set the price beforehand.

If you have no knowledge of any inside information and you are trying to make a buy/sell investment decision, your decision benefits from the price being as accurate as possible. That is what every individual is "trying to solve", so it's a net benefit.

except there is nothing to buy/sell here other than the prediction itself.

Or do you also think it would make sense, in a hypothetical scenario, to buy options on a stock price that is already known by some in advance? There’s a reason you’re not allowed to trade on insider information. You’re totally missing the point here.


I love the unabbreviated $1,660,000,000,000 lol. It reminded me of Waxahatchee's:

> You let me take my own damn car

> To Brooklyn, New York, USA


I'm honestly puzzled by this take.

Clearly the people who benefit from insider trading in any market are those doing the insider trading, not all market participants.

The argument is not that Polymarket et al. are "insider trading only", but rather that insider trading in those markets is not regulated, so people can get ahead of trades based on confidential information and make a lot of money off of all the suckers gambling their money away on ridiculously frivolous bets.

If you don't see the problem with that, you're complicit, misinformed or brainwashed.

A similar issue is that of market manipulation, since many markets on these platforms can be directly manipulated by participants through something as easy as spamming some words on an earnings call.


> If you don't see the problem with that, you're complicit, misinformed or brainwashed.

The problem that I imagine you see with this is that it doesn't conform to a particular, special notion of fairness that you think the market should have.

Informed parties have an edge over uninformed parties. This edge is "unfair" if you believe the market should be a lottery. The market is designed to pay people with accurate beliefs, by taking from people with inaccurate beliefs. Everyone's belief is valued based on its accuracy, and the market is fair in that sense. Fairness is actually irrelevant to the societal good the market provides, which is to produce accurate prices. A third party, who doesn't participate, shouldn't care about the market being "fair", they should care about it giving good information.

> A similar issue is that of market manipulation, since many markets on these platforms can be directly manipulated by participants through something as easy as spamming some words on an earnings call.

If you are betting on what a person will say, and the person knows about the market, that is a chaotic system. If you bid the price away from max entropy then you deserve the outcome.


This has nothing to do with fairness of outcomes but with defrauding naive investors of their money.

For context, I'm a former Investment Banker so this isn't coming from a place of naivete but an informed view. I did have to study SEC regulations for those FINRA examinations....


Not a bad person, just lacking in wisdom.


Not really


it's nowhere near claude opus

but claude and claude code are different things


My take has been...

Gemini 3.1 (and Gemini 3) are a lot smarter than Claude Opus 4.6

But...

Gemini 3 series are both mediocre at best in agentic coding.

Single shot question(s) about a code problem vs "build this feature autonomously".

Gemini's CLI harness is just not very good, and Gemini's approach to agentic coding leaves a lot to be desired. It doesn't perform the double-checking that Codex does, it's slower than Claude, and it runs off and does things without asking or clearly explaining why.


(Claude Code now runs claude opus, so they're not so different.)

>it's [Gemini] nowhere near claude opus

Could you be a bit more specific? Your sibling reply says "pretty close to opus performance", so it would help if you gave additional information about how you use it and how you feel the two compare. Thanks.


there are about a dozen startups doing variations of this right now. it's good to see an open source alternative pop up


yes felt long overdue for OSS :)

