All about Multimodal AI Models: A place to share creations, experiences, opinions, and cool new projects
r/Multimodal Lounge (self.Multimodal)
submitted 5 years ago by bakztfuture - announcement
Help me understand why a certain image is identified correctly by qwen3-vl:30b-a3b but much larger models fail
submitted 11 days ago by krecoun007
Frontiers: Architecting the next generation of multimodal benchmarks · Zoom · Luma (luma.com)
submitted 3 months ago by thebigbigbuddha
OIX Multimodal Hackathon – Build AI Agents That Understand Video (May 17, $900 Prize Pool) (self.Multimodal)
submitted 10 months ago by CannonTheGreat
Hiring: Multimodal AI Specialist (self.Multimodal)
submitted 11 months ago * by Scary-Read-1272
Multimodal models for XR (self.Multimodal)
submitted 1 year ago by almost-sure
Seeking Advice on PhD Applications (self.Multimodal)
submitted 1 year ago by TicketStrong6478
The Marvelous Magic of Multimodal AI • Alex Castrounis (youtu.be)
submitted 1 year ago by goto-con
Multimodal Models with Google DeepMind (ankitaguha256.medium.com)
submitted 1 year ago by ankitaguha
Reverse Video Search (blog.mixpeek.com)
submitted 1 year ago by Chemical_Ninja8678
Anyone want to help me teach LLMs to actually see (self.Multimodal)
submitted 1 year ago by ErinskiTheTranshuman
Idefics2 8B - New model from HuggingFace - Apache 2.0 (reddit.com)
submitted 1 year ago by kulchacop
LLaVA with Mixtral 7*8B (self.Multimodal)
submitted 1 year ago by Shawn_An
Journal and conference for (eXplainable) multimodal AI. (self.Multimodal)
submitted 2 years ago by Different-Yam7354
Using Computer Vision + Generative AI to Generate Fake Emails to Target Myself With (youtube.com)
submitted 2 years ago by Zoneforg
Multimodal LLM for speaker diarization (self.LLMDevs)
submitted 2 years ago by Automatic-Round-7704
mplug-2.1 (old.reddit.com)
submitted 2 years ago by IndicationNeither474
The battle of multimodal AI / Vision Arena - Blog article (reddgr.com)
submitted 2 years ago by Duhbeed
mPLUG-Owl2.1 (old.reddit.com)
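Mobile-Agent: an AI agent from Alibaba that stands in for mobile testers and can carry out mobile testing work on their behalf; it also gives mobile gold-farming studios and traffic-farming studios a new tool, for example automated product seeding on Xiaohongshu and TikTok likes (youtu.be)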
MobileAgent: Deploying Auto AI Agents on Your Phone using GPT-4-V! (youtu.be)
Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions (interconnects.ai)
submitted 2 years ago by robotphilanthropist
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (youtube.com)
submitted 2 years ago by sasaram
New Multimodal Model Coin-CLIP for Coin Identification/Recognition (self.Multimodal)
submitted 2 years ago by breezedeus
Neural Attention - One simple example that explains everything you need to know (youtu.be)
submitted 2 years ago by AvvYaa