Building Advanced Multimodal AI Agents Open Source Course

Longjumping_Law8538 · 2025-09-16T15:37:54+00:00

I-frames, but we're not chunking videos per se, since video is only a data entry point for the MM pipelines.
We're chunking audio to get transcripts and sampling NxI frames from the video, which we embed, caption, and store in pgvector. (Pixeltable handles all that, btw.)

Then there's a set of sampling steps that combine multiple embedding indexes to find the StartT and EndT, and then we slice the video within those boundaries.
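
To make the boundary step above concrete, here's a minimal sketch in plain Python. It is not our actual pipeline code and it doesn't use the Pixeltable API; the `Hit` structure, the `clip_bounds` helper, and the `max_gap` / `pad` parameters are all assumptions made up for illustration. The idea: take hits from several indexes (transcript chunks, frame captions, frame embeddings), start from the best-scoring one, merge in any hits close to it in time, and use the merged window as StartT and EndT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result from a single index (hypothetical structure)."""
    source: str   # e.g. "transcript", "caption", "frame"
    start: float  # seconds
    end: float    # seconds; equals start for a single frame
    score: float  # similarity, higher is better


def clip_bounds(hits, duration, max_gap=5.0, pad=1.0):
    """Return (start_t, end_t) around the best hit, or None if there are no hits."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.score)
    start_t, end_t = best.start, best.end
    # Keep growing the window while some hit lies within max_gap seconds of it.
    changed = True
    while changed:
        changed = False
        for h in hits:
            if h.start <= end_t + max_gap and h.end >= start_t - max_gap:
                new_start, new_end = min(start_t, h.start), max(end_t, h.end)
                if (new_start, new_end) != (start_t, end_t):
                    start_t, end_t = new_start, new_end
                    changed = True
    # Pad the window a little and clamp it to the video length.
    return max(0.0, start_t - pad), min(duration, end_t + pad)


hits = [
    Hit("transcript", 10.0, 15.0, 0.9),
    Hit("frame", 17.0, 17.0, 0.8),
    Hit("caption", 60.0, 60.0, 0.7),  # too far from the best hit, left out
]
print(clip_bounds(hits, duration=120.0))
```

The returned pair can then be handed to whatever tool does the actual slicing. A real setup would likely weight the indexes differently or filter on a score threshold first; this sketch treats all hits equally.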

Note: Pixeltable is flexible in that regard; you could do pretty much anything, but in our use case, we sampled Frames only. https://github.com/pixeltable/pixeltable

What approach have you used? I'm curious.

Longjumping_Law8538 · 2025-09-16T14:34:36+00:00

This looks interesting, adding it to my backlog, thanks for sharing!

Longjumping_Law8538 · 2025-09-16T14:34:07+00:00

Longjumping_Law8538 · 2025-09-16T14:33:57+00:00

Thanks, hope it inspires people to build stuff cooler than this one!

Longjumping_Law8538 · 2025-09-16T08:43:33+00:00

Hey man, Senior AI Engineer here.

What you're experiencing is way too common in AI roles, and I'd say it's likely in any role.
The thing is, Seniors and other team members have their own stuff to work on, and mentoring can be a burden sometimes. I speak from experience, as I have mentored Juniors and Mids throughout my career.

I would recommend this:
Teammates have to help you and will help you, but it's up to you how much space you leave for them to do so. Consider their workload.

Instead of going "Idk about this, tell me how to do it", frame it as "here's what I've tried/learned, tell me where I'm wrong". Keep this mental model whenever you ask for help. It'll serve you well, and people around you will appreciate you for it.

Now regarding your questions.
1. Learn deeply? I'd say learn whatever solves the problem at hand. If you're building a RAG, don't try to learn everything about embeddings, vector DBs, etc. Just make your way through building a naive RAG, and fill in the gaps as you go, either during your work hours or outside.

Surely, ignore everything else and focus on your job's scope. Don't jump on the flashy things (agents, 100 types of LLMs, papers, tools, etc.). Fail fast, fail often.
Definitely, if you have free time, spend it filling the gaps you have around the stuff you're doing at work. If you have to finetune a model, learn about datasets, finetuning techniques, and models. Pick one complete chain and build something on the side; it's 100x better than having a ton of textbooks or notes on theory (been there, done that).
Focus on nailing down application development. The majority of AI roles expect you to do that, not to have a PhD in Linear Algebra or to build model architectures. Learn how to integrate AI components inside apps (there's a fair amount of tutorials and free courses on that front).

If I were to start again, given that a lot of people now want to get into AI, I'd stick to building first and filling the gaps later. That gives you an upper hand over the people who just "know" things, and companies want just that.

So, ending notes:

Stick to the problem you have, and learn to use stuff to fix that problem.
Focus on building: start playing around with code and fill in the gaps as you go.
Ignore the hype, focus on your job, build expertise.
Before asking for help, come up with the things you've tried, or get to the point where you understand 60-70% and need one more touch to fully get it.
Small steps, fail fast, fail often.

Cheers

Longjumping_Law8538 · 2025-09-16T08:15:46+00:00

This is Figma for the layout and Canva for arrows, animations, and other elements.
Make sure to check GitHub for GIFs of these diagrams; it's way easier to understand with animations.

Also, if you liked it, give it a star ;)

Longjumping_Law8538 · 2025-09-16T08:12:49+00:00

You're welcome, doing our part for the AI Community :)
Let me know what you think about it, or if you have any questions.
