Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

No, not every combination. The question is about the specific combination of a pelican on a bicycle. It might be easy to come up with another test, but we're looking at the results from a particular one here.


More likely you would just train for emitting svg for some description of a scene and create training data from raster images.


None of this works if the testers are collaborating with the trainers. The tests ostensibly need to be arms-length from the training. If the trainers ever start over-fitting to the test, the tester would come up with some new test secretly.


You can easily make a RLAIF loop.

- Take a list of n animals * m vehicule

- Ask a LLM to generate SVG for this n*m options

- Generate png from the svg

- Ask a Model with vision to grade the result

- Change your weight accordingly

No need to human to draw the dataset, no need of human to evaluate.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: