
I've been trying this out with real estate photos (interiors and exteriors of single-family homes), and it's pretty good but definitely hallucinates. I gave it a photo of the front of a house, and it talked about windows on the second story, even though it was a one-story house. I gave it the interior of a bathroom, and it mentioned a couple of pieces of art on the wall (there were none) and doubled down by describing them when I asked it to.

I am curious if any of the smart folks here can tell me: is that an issue with the image recognition, or with the language piece? Is it mistakenly identifying things in the picture, or is the identification correct but the LLM is just predicting, partway through its response, that a description of this sort of bathroom would usually mention a couple of pieces of art?



I don't know that Google has said exactly how it implemented this feature under the hood. The underlying PaLM 2 model does not natively handle images, though, so it's presumably relying on other services like Lens to handle the image-understanding part. I suspect those services are feeding imperfect information back to Bard, which then relays that info to the user.
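If that's the architecture, the failure mode is easy to see: the LLM never sees pixels, only a text description of the image, so any detail the captioner invents is indistinguishable from ground truth. Here's a minimal sketch of that kind of two-stage pipeline (the function names and the caption are hypothetical stand-ins, not Google's actual implementation):

```python
# Hypothetical two-stage pipeline: a vision service produces a text caption,
# and a text-only LLM answers questions using ONLY that caption. Any error
# in the caption is invisible to the LLM, which treats it as fact.

def caption_image(image_path: str) -> str:
    # Stand-in for a vision service like Lens; imagine it (wrongly)
    # reports art on the wall of a bathroom that has none.
    return "A bathroom with a white vanity and two framed art pieces on the wall."

def build_prompt(caption: str, question: str) -> str:
    # The LLM's entire view of the image is this text.
    return (
        f"Image description: {caption}\n"
        f"User question: {question}\n"
        "Answer based only on the image description."
    )

prompt = build_prompt(caption_image("bathroom.jpg"),
                      "Describe the art on the wall.")
print(prompt)
```

From the LLM's point of view the framed art is an established fact, so when asked to describe it, it will elaborate confidently, which would match the "doubling down" behavior described above.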



