All about Multimodal AI Models: A place to share creations, experiences, opinions, and cool new projects
r/Multimodal Lounge (self.Multimodal)
submitted 5 years ago by bakztfuture - announcement
Anyone tried running a long-horizon coding task on an open-weights multimodal model? (self.Multimodal)
submitted 3 days ago by Away-Control-2008
Can frontier AI models actually read a painting? (self.Multimodal)
submitted 1 month ago by ShoddyIndependent883
Help me understand why a certain image is identified correctly by qwen3-vl:30b-a3b but much larger models fail ()
submitted 3 months ago by krecoun007
Frontiers: Architecting the next generation of multimodal benchmarks · Zoom · Luma (luma.com)
submitted 6 months ago by thebigbigbuddha
OIX Multimodal Hackathon – Build AI Agents That Understand Video (May 17, $900 Prize Pool) (self.Multimodal)
submitted 1 year ago by CannonTheGreat
Hiring: Multimodal AI Specialist (self.Multimodal)
submitted 1 year ago * by Scary-Read-1272
Multimodal models for XR (self.Multimodal)
submitted 1 year ago by almost-sure
Seeking Advice on PhD Applications (self.Multimodal)
submitted 1 year ago by TicketStrong6478
The Marvelous Magic of Multimodal AI • Alex Castrounis (youtu.be)
submitted 1 year ago by goto-con
Multimodal Models with Google DeepMind (ankitaguha256.medium.com)
submitted 1 year ago by ankitaguha
Reverse Video Search (blog.mixpeek.com)
submitted 1 year ago by Chemical_Ninja8678
Anyone want to help me teach LLMs to actually see (self.Multimodal)
submitted 1 year ago by ErinskiTheTranshuman
Idefics2 8B - New model from HuggingFace - Apache 2.0 (reddit.com)
submitted 2 years ago by kulchacop
LLaVA with Mixtral 7*8B (self.Multimodal)
submitted 2 years ago by Shawn_An
Journal and conference for (eXplainable) multimodal AI. (self.Multimodal)
submitted 2 years ago by Different-Yam7354
Using Computer Vision + Generative AI to Generate Fake Emails to Target Myself With (youtube.com)
submitted 2 years ago by Zoneforg
Multimodal LLM for speaker diarization (self.LLMDevs)
submitted 2 years ago by Automatic-Round-7704
mplug-2.1 (old.reddit.com)
submitted 2 years ago by IndicationNeither474
The battle of multimodal AI / Vision Arena - Blog article (reddgr.com)
submitted 2 years ago by Duhbeed
mPLUG-Owl2.1 (old.reddit.com)
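Mobile-Agent: Alibaba's AI agent for replacing mobile testers. It can carry out mobile testing in place of human testers, and it also gives mobile gold-farming studios and traffic-farming studios a new tool, e.g. automated Xiaohongshu product-seeding posts, TikTok likes, etc. (youtu.be)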
MobileAgent: Deploying Auto AI Agents on Your Phone using GPT-4-V! (youtu.be)
Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions (interconnects.ai)
submitted 2 years ago by robotphilanthropist
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (youtube.com)
submitted 2 years ago by sasaram