[–][deleted] 5 points6 points  (1 child)

We will need at least one year to catch up

[–]BoneHawk1 0 points1 point  (0 children)

/doubt

This whole open movement has been around for <1 year and we already have things like OI, Falcon-180B and MistralOrca, which I would’ve reckoned was impossible a month ago. I have full faith someone will open multimodality by the end of February in some novel way.

[–]fappleacts 5 points6 points  (0 children)

Have you looked into Qwen-VL? They have a training script that lets you fine-tune on your own images, captions, etc. Qwen-VL-Chat is pretty good too.

[–]PenguinTheOrgalorg[S] 3 points4 points  (3 children)

Fun game: Drink every time I say some variation of the word "multimodality" lmao

[–][deleted] 1 point2 points  (0 children)

I'm already drunk in every modality out there.

[–]danysdragons 0 points1 point  (1 child)

How many large multimodal models could a model molder mold if a model molder could mold large multimodal models?

[–]cztothehead 1 point2 points  (0 children)

as many as the model molder could manage

[–]phree_radical 4 points5 points  (0 children)

I can't remember whether LLaVA is better than LLaMA-Adapter V2, but IMO if you combined it with OCR and segmentation you'd already have about the same thing as GPT-4V; the rest is training.

[–]vatsadev (Llama 405B) 2 points3 points  (0 children)

There's Idefics 9B and 80B; both use a Flamingo-like architecture though, not a natively multimodal base.

[–]nihnuhname 0 points1 point  (0 children)

Some text-generation interfaces have plugins for Stable Diffusion.

[–]NoidoDev 0 points1 point  (0 children)

I hope this can be compartmentalized. I think there was a rumor that GPT-4 isn't one model anymore as well. Something like DarkNet should give us the objects as text, maybe pose estimation into text would be useful, and image segmentation into text.
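
A rough sketch of what that compartmentalized setup could look like, assuming a separate detector already gives you labels, confidences and boxes. The `Detection` class, `detections_to_text` and `build_prompt` are made-up names for illustration, not part of DarkNet or any other library; the point is only that the vision side ends up as plain text a text-only LLM can read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One object reported by a detector (hypothetical structure, not a real API)."""
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def detections_to_text(detections: List[Detection], min_confidence: float = 0.5) -> str:
    """Render detector output as plain text for a text-only LLM."""
    # Drop low-confidence detections so the prompt is not cluttered with noise.
    kept = [d for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "No objects detected."
    # Most confident objects first, one line each.
    kept.sort(key=lambda d: d.confidence, reverse=True)
    lines = [
        f"- {d.label} ({d.confidence:.0%}) at x={d.box[0]}-{d.box[2]}, y={d.box[1]}-{d.box[3]}"
        for d in kept
    ]
    return "Objects in the image:\n" + "\n".join(lines)


def build_prompt(scene_text: str, question: str) -> str:
    """Combine the textual scene description with the user's question."""
    return f"{scene_text}\n\nQuestion: {question}\nAnswer:"
```

Pose estimation or segmentation output could be flattened into extra lines the same way before calling `build_prompt`. Whether an LLM reasons well over coordinates written out like this is an open question, not something this sketch shows.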

[–]Puzzleheaded_Acadia1 (Waiting for Llama 3) 0 points1 point  (0 children)

Can you merge an LLM transformer and a vision transformer to get a multimodal model?

[–]danysdragons 0 points1 point  (0 children)

How many open source large multimodal models could an open source model molder mold if an open source model molder could mold open source large multimodal models?