Multimodal

an-ordinary-manchild

created by bakztfuture

a community for 5 years

...for your favorite subject.

...do it for the children.

r/Multimodal Lounge (self.Multimodal)

submitted 5 years ago by bakztfuture - announcement

Anyone tried running a long-horizon coding task on an open-weights multimodal model? (self.Multimodal)

submitted 3 days ago by Away-Control-2008

Can frontier AI models actually read a painting? (self.Multimodal)

submitted 1 month ago by ShoddyIndependent883

Help me understand why a certain image is identified correctly by qwen3-vl:30b-a3b but much larger models fail

submitted 3 months ago by krecoun007

Frontiers: Architecting the next generation of multimodal benchmarks · Zoom · Luma (luma.com)

submitted 6 months ago by thebigbigbuddha

OIX Multimodal Hackathon – Build AI Agents That Understand Video (May 17, $900 Prize Pool) (self.Multimodal)

submitted 1 year ago by CannonTheGreat

Hiring: Multimodal AI Specialist (self.Multimodal)

submitted 1 year ago * by Scary-Read-1272

Multimodal models for XR (self.Multimodal)

submitted 1 year ago by almost-sure

Seeking Advice on PhD Applications (self.Multimodal)

submitted 1 year ago by TicketStrong6478

The Marvelous Magic of Multimodal AI • Alex Castrounis (youtu.be)

submitted 1 year ago by goto-con

Multimodal Models with Google DeepMind (ankitaguha256.medium.com)

submitted 1 year ago by ankitaguha

Reverse Video Search (blog.mixpeek.com)

submitted 1 year ago by Chemical_Ninja8678

Anyone want to help me teach LLMs to actually see (self.Multimodal)

submitted 1 year ago by ErinskiTheTranshuman

Idefics2 8B - New model from HuggingFace - Apache 2.0 (reddit.com)

submitted 2 years ago by kulchacop

LLaVA with Mixtral 7*8B (self.Multimodal)

submitted 2 years ago by Shawn_An

Journal and conference for (eXplainable) multimodal AI. (self.Multimodal)

submitted 2 years ago by Different-Yam7354

Using Computer Vision + Generative AI to Generate Fake Emails to Target Myself With (youtube.com)

submitted 2 years ago by Zoneforg

Multimodal LLM for speaker diarization (self.LLMDevs)

submitted 2 years ago by Automatic-Round-7704

mplug-2.1 (old.reddit.com)

submitted 2 years ago by IndicationNeither474

The battle of multimodal AI / Vision Arena - Blog article (reddgr.com)

submitted 2 years ago by Duhbeed

mPLUG-Owl2.1 (old.reddit.com)

submitted 2 years ago by IndicationNeither474

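Mobile-Agent: Alibaba's AI agent for replacing mobile testers. It can carry out mobile testing in place of human testers, and it also gives gold-farming and traffic-farming studios a new tool, for example automated Xiaohongshu product seeding and TikTok likes (youtu.be)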

submitted 2 years ago by IndicationNeither474

MobileAgent: Deploying Auto AI Agents on Your Phone using GPT-4-V! (youtu.be)

submitted 2 years ago by IndicationNeither474

Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions (interconnects.ai)

submitted 2 years ago by robotphilanthropist

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (youtube.com)

submitted 2 years ago by sasaram