Hacker News | mosselman's comments

What a nonsense, generated article.

> For context: GLM 5.1 ran the same task and reached 7.3x. Kimi K2.6 reached 5x. DeepSeek V4 Pro reached 3.3x. The models that stopped early did so because they issued no tool calls for five consecutive rounds, they concluded they couldn’t make further progress and stopped. Qwen3.7-Max didn’t stop.

By this reasoning I could release a model that lacks all the basic optimisations. Have it optimise itself for hours to reach 20x the throughput and then claim that the model is superior to the others?

I am not saying that is what happened here, but the reporting is abysmal.


It is not the model's job to stop or continue; it's the agent's. Qwen has nothing to do with it.
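To make the point concrete, here is a minimal sketch of the stopping rule the quoted article describes (stop after five consecutive rounds without tool calls), written as agent-side logic. The names `run_agent`, `step`, `idle_limit` and `max_rounds` are hypothetical and not taken from any of the agents mentioned; this only illustrates that the rule lives in the harness loop, not in the model.

```python
from typing import Callable, List


def run_agent(
    step: Callable[[int], List[str]],
    max_rounds: int = 100,
    idle_limit: int = 5,
) -> int:
    """Run rounds until `idle_limit` consecutive rounds produce no tool calls.

    `step` is a hypothetical callback: given the round number, it returns
    the list of tool calls the model issued in that round.
    Returns the number of rounds that were executed.
    """
    idle = 0  # consecutive rounds without any tool call
    for round_no in range(1, max_rounds + 1):
        tool_calls = step(round_no)
        if tool_calls:
            idle = 0  # any tool call resets the idle streak
        else:
            idle += 1
            if idle >= idle_limit:
                return round_no  # the agent, not the model, decides to stop
    return max_rounds
```

Raising `idle_limit` or nudging the model after an idle round would change when every model "stops", which is why the stopping behaviour alone says little about the model itself.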

I just switched to the latest codewhale agent (in Rust), and it would perform much better by his criteria. It has a much better async IO implementation and orchestration, with no more deadlocks like in the typical TypeScript tooling. It just doesn't stop out of the blue the way claude, kimi or opencode do.


It optimized the Extend Attention operator in Triton. All models were optimizing the same operator.

They didn't optimize their own kernels or their own runtime, which I think is what you are implying.

So all that needs to happen now is that The Doors will do another tour and I am in!

I am super impressed by this. The landing page is GREAT, it really shows you what it can do and I tried creating a polypad and it works just as well as the landing page makes it seem.

I would have loved to have this when I was in school and I can imagine it is a great tool for teachers. I am going to let my kids play around with this.

Rereading my comment, it sounds like some sort of lame ad, which I can assure you it isn't. I am just excited about it.


Same! It’s one of those projects where as soon as I scroll the landing page, I can hear the fireworks of ideas going off in my head! It seems to work really well on mobile as well. I think it’s got real visual aesthetic appeal: it takes me right back to the hand-drawn colour diagrams in the textbooks when I was at primary school.


How come I see my own GitHub avatar in the designed editor along with others? I assume those are other people also visiting the builder? Not very cool, if I may say so.


I think those are placeholders, lol


They aren't; I saw my own avatar.


You even traveled in time to deliver us this benchmark.

I really like this benchmarking. Have you evaluated the judge benchmark somehow? I'd love to set up my own similar benchmark.


Haha, just fixed the date!

I haven't evaluated the judge benchmark. You have everything needed in the repo to do so though, so be my guest. It took me a bit of time to put all this together, and I won't have much more time to dedicate to it for a couple of weeks.

BTW, if you explore the repo, sorry for all the French files...


I would like to second this solid advice.

I have a very nice grinder: a solis caffissima digital coffee grinder. It is available under a different brand name in the US I think.

I make filter coffee with a very basic earthenware filter holder and high-quality yet very ordinary melitta filters. Sometimes I mix it up with an aeropress, which gives a different taste because of its low-acidity way of making coffee. I just drip the coffee into a nice thermos so I can make 4 cups in one go and pour from the thermos.

My coffee is much nicer than what I get in most places, both professional ones and people's homes, and it doesn’t cost me much in effort, money and, very importantly, workspace footprint.

Espresso machines require a lot of space and maintenance, and espresso is a lot of trouble to make.

Having said all this, I am quite intrigued by all the stories about the negative effects of coffee. I had thought it was only about influencing sleep, and I had never thought about the memory and mood effects. I will study this some more in the coming months.


Does this have a unified API? In playing around with some of these, including unified libraries to work with various providers, I've found you are, at some point, still forced to do provider-specific work for things such as setting temperatures, setting reasoning effort, setting tool choice modes, etc.

What I'd like is for a proxy or library to provide a truly unified API where it will really let me integrate once and then never have to bother with provider quirks myself.
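A rough sketch of what I mean, assuming two made-up providers: `UnifiedRequest`, `to_provider_a`, `to_provider_b`, every payload field name and the effort-to-budget numbers are hypothetical and do not describe any real provider or GoModel itself. The idea is only that the caller sets temperature, reasoning effort and tool choice once, and the adapter layer absorbs the quirks.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class UnifiedRequest:
    """Provider-independent request (hypothetical shape)."""
    prompt: str
    temperature: Optional[float] = None
    reasoning_effort: Optional[str] = None  # "low", "medium" or "high"
    tool_choice: Optional[str] = None       # "auto", "none" or "required"


# Hypothetical mapping for a provider that wants a token budget
# instead of an effort label.
EFFORT_TO_BUDGET = {"low": 1024, "medium": 4096, "high": 8192}


def to_provider_a(req: UnifiedRequest) -> Dict[str, Any]:
    """Build a payload for made-up provider A, which uses effort labels."""
    payload: Dict[str, Any] = {"input": req.prompt}
    if req.temperature is not None:
        payload["temperature"] = req.temperature
    if req.reasoning_effort is not None:
        payload["reasoning"] = {"effort": req.reasoning_effort}
    if req.tool_choice is not None:
        payload["tool_choice"] = req.tool_choice
    return payload


def to_provider_b(req: UnifiedRequest) -> Dict[str, Any]:
    """Build a payload for made-up provider B, which nests and renames fields."""
    payload: Dict[str, Any] = {"text": req.prompt}
    if req.temperature is not None:
        payload["sampling"] = {"temp": req.temperature}
    if req.reasoning_effort is not None:
        payload["thinking_budget"] = EFFORT_TO_BUDGET[req.reasoning_effort]
    if req.tool_choice is not None:
        payload["tools_mode"] = req.tool_choice.upper()
    return payload
```

Unset fields are simply omitted, so each provider falls back to its own defaults instead of receiving a value it might reject.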

Also, are you planning on doing an open-source rug pull like so many projects out there, including litellm?


1. Yes, we have an OpenAI-compatible API and we develop GoModel with Postel’s law in mind: https://gomodel.enterpilot.io/docs/about/technical-philosoph... .

2. Regarding being open-source and the license, I've described our approach here transparently: https://gomodel.enterpilot.io/docs/about/license


Just look at the second file in the list then I guess.

This post is about exploring code, not documentation. Nobody is going to warn you about the README unless it is super outdated.


You should be concerned about a government issuing these ridiculous and dangerous controls on what you can do in society, not about whether, within that dystopia, it is fair to submit in one way or another.

Also, kids understand perfectly well that different parents have different rules.

I don’t think the government or Apple should be responsible for protecting you from mopey teenagers by blocking free internet access for everyone just so that it “is fair”. Are you even hearing yourself?


So if someone kicks you in the nuts (apt for your username) you shouldn’t be mad because some other person 10000km away got shot?

