Hacker News

- LLaVA is not the SOTA open VLM; InternVL-1.5 is: https://huggingface.co/spaces/opencompass/open_vlm_leaderboa...

You need to compare the evals against strong open VLMs, including InternVL-1.5 and CogVLM.

- This is not the "first-ever multimodal model built on top of Llama3"; there's already a LLaVA built on Llama3-8B: https://huggingface.co/lmms-lab



As with InternVL, the lack of llama.cpp support severely limits its applications. Close-to-GPT-4V performance, runnable locally on any machine (no GPU needed), would be huge for the accessibility community.


Very curious how it performs on OCR tasks compared to InternVL. To be competitive at reading text you need tiling support, and InternVL does tiles exceptionally well.
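To make the tiling point concrete: instead of downscaling the whole page to the encoder's input size (which destroys small text), the image is cut into encoder-resolution crops that are each encoded at native resolution. A minimal sketch of the idea; the 448-px tile size and the overlap are illustrative assumptions, not InternVL's actual dynamic-resolution scheme:

```python
def tile_boxes(width, height, tile=448, overlap=32):
    """Compute overlapping crop boxes (left, top, right, bottom) that
    cover an image, so each crop can be fed to the vision encoder at
    native resolution instead of downscaling the whole page.

    tile/overlap values are illustrative, not any specific model's."""
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            boxes.append((left, top,
                          min(left + tile, width),
                          min(top + tile, height)))
    return boxes

# A 1000x800 scan yields a 3x2 grid of crops:
boxes = tile_boxes(1000, 800)
```

Models that do this well typically also encode a downscaled overview of the full image alongside the crops, so global layout isn't lost.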


I think CogVLM2 is even better than InternVL at OCR (my use case is extracting information from invoices).


After some superficial testing with bad-quality scans you can find on Kaggle, I can't confirm that. CogVLM2 refuses to handle scans that InternVL-1.5 can still comprehend.


Thank you for the link! Our initial testing suggests MiniCPM outperforms InternVL for GUI understanding: https://github.com/OpenAdaptAI/OpenAdapt/issues/637#issuecom...

(InternVL appears to hallucinate more.)


I'm going to be saying "First Ever AI [something]" for the next 15 years for clout and capital; I'm not going to listen to anybody's complicated ten-step funnel if they're not doing the obvious.



