Hacker News

Every time a new image gen comes out I keep saying that it won't get better, just to be surprised again and again. Some of the examples are incredible (and incredibly scary: I feel like this is truly the point where telling whether something is AI-generated becomes impossible).



So do you think there will be a better image model in a year?

I'll bite: no, I don't think so. If the examples are not cherry-picked, and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence; there isn't much space for further improvement. The images don't just look real, they look tasteful: the model is not just generating a credible image, it's generating one that shows the talent of a good photographer/designer/artist.

I'm honestly unsure what could be improved at this point.

Consistency? So it fails less often?

Based on the released images (especially the one "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").


There is definitely room for improvement: https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

Especially when it comes to detailed outputs or non-standard prompts.

I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.


Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.

I experimented with procedurally generating Waldo-style scavenger-hunt images with Flux models, with rather disappointing (if unsurprising) results.


That's a good example, actually.

If you asked me what I expected, since this one has "thinking", it's that the model would have thought to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit".


I wonder if at this point you could just ask the agent to iteratively refine the image in smaller portions.
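A crude version of that region-by-region pass is really just tile bookkeeping: split the canvas into overlapping boxes, refine each one, blend it back. A minimal sketch (the function name and parameters are mine, not from any particular tool):

```python
def tile_regions(h, w, tile=256, overlap=32):
    """Yield (top, left, bottom, right) boxes covering an h x w image.

    Boxes overlap by `overlap` pixels so each region can be refined
    independently and blended back without visible seams.
    """
    step = tile - overlap
    boxes = []
    for top in range(0, max(h - overlap, 1), step):
        for left in range(0, max(w - overlap, 1), step):
            # Clamp the box to the image bounds at the right/bottom edges.
            boxes.append((top, left, min(top + tile, h), min(left + tile, w)))
    return boxes

# A 512x512 image with 256px tiles and 32px overlap needs a 3x3 grid.
print(len(tile_regions(512, 512)))  # 9
```

An agent loop would then feed each box to the model as an edit region instead of regenerating the whole image at once.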

I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.

It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
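The frustrating part is that the exact-count constraint the models keep missing is trivial to satisfy procedurally: each slice is just a wedge of 360/n degrees. A minimal sketch (all names are mine) that emits an SVG pizza with exactly n equal slices, separated by gaps:

```python
import math

def pizza_svg(n_slices=10, radius=100, gap_deg=4, cx=120, cy=120):
    """Emit an SVG of a pizza split into n equal slices with gaps.

    Each slice is a circular wedge of (360/n - gap_deg) degrees, so the
    slice count is exact by construction.
    """
    wedge = 360.0 / n_slices - gap_deg
    paths = []
    for i in range(n_slices):
        start = i * 360.0 / n_slices + gap_deg / 2
        end = start + wedge
        # Wedge endpoints on the circle, in SVG coordinates.
        x1 = cx + radius * math.cos(math.radians(start))
        y1 = cy + radius * math.sin(math.radians(start))
        x2 = cx + radius * math.cos(math.radians(end))
        y2 = cy + radius * math.sin(math.radians(end))
        paths.append(
            f'<path d="M{cx},{cy} L{x1:.1f},{y1:.1f} '
            f'A{radius},{radius} 0 0 1 {x2:.1f},{y2:.1f} Z" '
            'fill="goldenrod" stroke="peru"/>'
        )
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="240" height="240">'
            + "".join(paths) + "</svg>")

svg = pizza_svg(10)
print(svg.count("<path"))  # 10 slices, every time
```

Which is maybe the real takeaway: for diagrams with hard numeric constraints, having the model write drawing code beats having it paint pixels.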


Cost? Speed?

> I'm honestly unsure what could be improved at this point.

That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have the model fail miserably.

Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.

Here's one I just came up with:

  A Mercator projection of earth where the land/oceans are inverted. (aka land = ocean, and oceans = land)
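What makes this a good prompt is that the requested transform is precisely specified: over a rasterized land mask, "inverted" is just a logical NOT. A toy sketch of the constraint (the 4x4 mask is made up, standing in for a real Mercator land raster):

```python
import numpy as np

# Toy "world" land mask: True = land, False = ocean.
land = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
], dtype=bool)

# The inverted world: land becomes ocean and vice versa.
inverted = ~land

# The two properties any correct output must satisfy:
assert not np.any(land & inverted)  # no pixel is land in both maps
assert np.all(land | inverted)      # every pixel is land in exactly one
```

A model that "understands" the prompt has to honor both properties everywhere, which is easy to spot-check against a real coastline.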

Good point.

So I guess while "realism" (or believability) is really good now, prompt adherence has much room for improvement.

(Put another way, realism has always been "solved" if the model gets to output whatever it wants, as long as it looks realistic. Now, though, failures look less like a malfunction and more like an inattentive human mistake or oversight, so even when the model gets it wrong it's hard to tell without knowing what the prompt was.)


> it's hard to tell it's wrong without knowing what the prompt was.

Yeah, this is actually a huge point of frustration on Reddit, where lots of people post their "impressive generated images" but fail to disclose the prompts, so the audience can only evaluate realism/fidelity and not how faithfully the model actually followed the prompt.



