airstrike · 2026-02-24T20:45:29 1771965929

Make it look like the TouchID isn't working and switch to password mode, boom. User password obtained

airstrike · 2026-02-24T20:24:32 1771964672

Most LLM "harnessing" seems very lazy and bolted on. You can build much more robustly by leveraging a more complex application layer where you can manage state, but I guess people struggle building that

TeMPOraL · 2026-02-25T00:49:54 1771980594

Common failure mode I've observed is people building a stateful harness for the LLM and then forgetting to tell the LLM about it. Leads to funny/disturbing results whenever the two "desync" in some way.

Example: a plan/act division, with the harness keeping state of which mode is active, and while in "plan mode", removing/disabling tools that can write data. Cue a mishandled timeout or a UI bug that prevents switching to "act mode", and suddenly the agent is spinning for 10 minutes questioning the nature of its reality, as the basic tools it needs to write code inexplicably ceased to exist, then opting for empirical experimentation and eventually figuring out a way to reimplement "search/replace" using shell calls or Python or whatever alternative wasn't properly sandboxed by the harness writers...

Part of this is just bugs in code, but what irks me is watching the LLM getting gaslighted or plain confused by rules of reality changing underneath it, all because the harness state wasn't made observable to the agent, or someone couldn't be arsed to have their error messages and security policies provide feedback to the LLM and not just the user.
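To make that concrete, here is a rough sketch of what "observable harness state" could look like. Everything in it is hypothetical (the `Harness`, `Tool`, and `Mode` names, the two toy tools, the message wording) and it is not any real agent framework's API. The idea is only that the current mode gets surfaced to the model every turn, and that a blocked tool call returns an explanation instead of the tool silently disappearing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Mode(Enum):
    PLAN = "plan"
    ACT = "act"


@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    writes: bool  # True if the tool can modify data


@dataclass
class Harness:
    # Hypothetical harness: all names here are illustrative assumptions.
    mode: Mode = Mode.PLAN
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def allowed(self, tool: Tool) -> bool:
        # Write-capable tools are only usable in act mode.
        return self.mode is Mode.ACT or not tool.writes

    def status_for_model(self) -> str:
        # Text meant to be put in the model's context each turn,
        # so the active mode is never hidden state.
        enabled = sorted(t.name for t in self.tools.values() if self.allowed(t))
        disabled = sorted(t.name for t in self.tools.values() if not self.allowed(t))
        return (
            f"harness mode: {self.mode.value}; "
            f"enabled tools: {', '.join(enabled) or 'none'}; "
            f"disabled by mode: {', '.join(disabled) or 'none'}"
        )

    def call(self, name: str, *args: str) -> dict:
        tool = self.tools.get(name)
        if tool is None:
            return {"ok": False, "error": f"unknown tool '{name}'"}
        if not self.allowed(tool):
            # Explain the policy to the model rather than pretending
            # the tool never existed.
            return {
                "ok": False,
                "error": (
                    f"'{name}' is disabled in {self.mode.value} mode; "
                    "it becomes available once the harness switches to act mode"
                ),
            }
        return {"ok": True, "result": tool.fn(*args)}


harness = Harness()
harness.register(Tool("read_file", lambda path: f"<contents of {path}>", writes=False))
harness.register(
    Tool("write_file", lambda path, text: f"wrote {len(text)} chars to {path}", writes=True)
)

blocked = harness.call("write_file", "a.txt", "hello")   # refused, with a reason
status_plan = harness.status_for_model()                 # what the model would see
harness.mode = Mode.ACT
allowed = harness.call("write_file", "a.txt", "hello")   # now goes through
```

With something like this, a stuck mode switch still leaves the agent blocked, but at least it is told why, which makes it far less likely to go hunting for an unsandboxed workaround.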

airstrike · 2026-02-23T11:57:58 1771847878

Not sure why you'd get that from this post, which says it required careful small prompts over the course of weeks.

In the hands of experienced devs, AI increases coding speed with minimal impact to quality. That's your differentiator.

airstrike · 2026-02-22T17:49:12 1771782552

Thought it was going to be a blog post about Jeopardy for a sec

airstrike · 2026-02-22T16:40:57 1771778457

We're not solving for efficient pricing at the expense of one insider reaping all the benefits because they already either knew or set the price beforehand.

fsckboy · 2026-02-25T17:44:41 1772041481

if you have no knowledge of any inside information and you are trying to make a buy/sell investment decision, your decision benefits from the price being as accurate as possible. That is what every individual is "trying to solve" so it's a net benefit.

airstrike · 2026-02-27T12:11:03 1772194263

except there is nothing to buy/sell here other than the prediction itself.

Or do you also think it would make sense, in a hypothetical scenario, to buy options on a stock price that is already known by some in advance? There’s a reason you’re not allowed to trade on insider information. You’re totally missing the point here.

airstrike · 2026-02-22T15:24:59 1771773899

I love the unabbreviated $1,660,000,000,000, lol. It reminded me of Waxahatchee's:

> You let me take my own damn car

> To Brooklyn, New York, USA

airstrike · 2026-02-22T03:37:44 1771731464

I'm honestly puzzled by this take.

Clearly the people who benefit from insider trading in any market are those doing the insider trading, not all market participants.

The argument is not that Polymarket et al. are "insider trading only" but rather that insider trading in those markets is not regulated, so people can get ahead of trades based on confidential information and make a lot of money off of all the suckers gambling their money away on ridiculously frivolous bets.

If you don't see the problem with that, you're complicit, misinformed or brainwashed.

A similar issue is that of market manipulation, since many markets on these platforms can be directly manipulated by participants in ways as easy as spamming some words on an earnings call.

alphazard · 2026-02-22T04:16:40 1771733800

> If you don't see the problem with that, you're complicit, misinformed or brainwashed.

The problem that I imagine you see with this, is that it doesn't conform to a particular, special, notion of fairness that you think the market should have.

Informed parties have an edge over uninformed parties. This edge is "unfair" if you believe the market should be a lottery. The market is designed to pay people with accurate beliefs, by taking from people with inaccurate beliefs. Everyone's belief is valued based on its accuracy, and the market is fair in that sense. Fairness is actually irrelevant to the societal good the market provides, which is to produce accurate prices. A third party, who doesn't participate, shouldn't care about the market being "fair", they should care about it giving good information.

> A similar issue is that of market manipulation, since many markets in these platforms can be directly manipulated by participants in manners as easily as spamming some words on an earnings call.

If you are betting on what a person will say, and the person knows about the market, that is a chaotic system. If you bid the price away from max entropy then you deserve the outcome.

airstrike · 2026-02-22T14:52:55 1771771975

This has nothing to do with fairness of outcomes but with defrauding naive investors of their money.

For context, I'm a former Investment Banker so this isn't coming from a place of naivete but an informed view. I did have to study SEC regulations for those FINRA examinations....

airstrike · 2026-02-21T20:08:23 1771704503

Not a bad person, just lacking in wisdom.

marxisttemp · 2026-02-21T20:46:34 1771706794

Not really

airstrike · 2026-02-20T13:38:10 1771594690

it's nowhere near claude opus

but claude and claude code are different things

dudeinhawaii · 2026-02-20T16:49:59 1771606199

My take has been...

Gemini 3.1 (and Gemini 3) are a lot smarter than Claude Opus 4.6

But...

Gemini 3 series are both mediocre at best in agentic coding.

Single shot question(s) about a code problem vs "build this feature autonomously".

Gemini's CLI harness is just not very good and Gemini's approach to agentic coding leaves a lot to be desired. It doesn't perform the double-checking that Codex does, it's slower than Claude, and it runs off and does things without asking or clearly explaining why.

logicallee · 2026-02-20T15:03:46 1771599826

(Claude Code now runs claude opus, so they're not so different.)

>it's [Gemini] nowhere near claude opus

Could you be a bit more specific, because your sibling reply says "pretty close to opus performance" so it would help if you gave additional information about how you use it and how you feel the two compare. Thanks.

airstrike · 2026-02-20T04:33:40 1771562020

there are about a dozen startups doing variations of this right now. it's good to see an open source alternative pop up

tmustier · 2026-02-20T09:31:20 1771579880

yes felt long overdue for OSS :)