Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is great but does anyone know of essentially the same thing BUT it can actually get content visually? ie., it is great to have a transcript but unless the models can 'see' the video they will miss valuable info like if the in-video captions show a github repo link or something
 help



There are many models that extracts context from real time video. But they often just rely on identifying simpel object and tasks. Not longer context spanning a video.



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: