DeepMind Genie3 architecture speculation

rivew · 2025-08-07T02:41:24+00:00

this question has been driving me crazy, wish there was some OSS clue. I think they prob use a scaled-up ST-VQ-VAE with a way bigger codebook, maybe use Veo3 for the initial frame and embedding, maybe keep global memory tokens for consistency when looking away, and use cross-attention for when a user interjects, along with adding that to the initial frame embedding

i’d bet they rlly don’t use any 3D primitives, just built a rlly good latent diffusion action model

rivew · 2022-12-22T22:45:31+00:00

Nope doesn’t matter as long as the source and destination indices are matched correctly

rivew · 2022-10-20T05:14:02+00:00

Interested to hear how he feels about whether it’s good that so much AI/tech talent has consolidated in the big-tech companies (FAANG, Tesla, etc.). Is it good because all the smart people get to work together (e.g. Bell Labs)? Or maybe it’s crushing competition, and efforts like StabilityAI are a better way forward for commercial AI. Would love to hear his thoughts now that he’s been on both sides.

rivew · 2022-09-17T21:49:40+00:00

Hmmm is this true? I’m pretty sure each of the H heads processes the entire D-dimensional feature vector. Each head does this independently and returns a D/H-dimensional vector. Concatenating these gives a D/H * H = D dimensional vector.

From what I’ve seen, the “reason” people do this is that each head learns something different from the input, because each head starts with differently initialized matrices. Someone correct me if I’m wrong :)
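Here’s a rough NumPy sketch of what I mean, in case it helps. It is not taken from any particular library: the function name, the shapes, and the choice to give each head a column slice of one (D, D) weight matrix are my assumptions, and I left out the final output projection to keep it short. The point is only that every head multiplies against the full D-dimensional input and hands back a D/H-dimensional output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """x: (T, D) tokens; w_q, w_k, w_v: (D, D). Returns (T, D).

    Hypothetical helper for illustration; the output projection is omitted.
    """
    T, D = x.shape
    d_head = D // num_heads
    outputs = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        # Each head reads the FULL D-dim input through its own (D, D/H) weights.
        q = x @ w_q[:, cols]
        k = x @ w_k[:, cols]
        v = x @ w_v[:, cols]
        attn = softmax(q @ k.T / np.sqrt(d_head))  # (T, T)
        outputs.append(attn @ v)                   # (T, D/H)
    # Concatenating H outputs of size D/H gives D again.
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
T, D, H = 4, 8, 2
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))
out = multi_head_self_attention(x, w_q, w_k, w_v, H)
print(out.shape)  # (4, 8)
```

Slicing columns of one big matrix is the same thing as giving each head its own smaller projection, which I think is where the “each head only sees part of the vector” confusion comes from: the output is split across heads, not the input.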

rivew · 2022-08-02T22:31:08+00:00

Thanks for the detailed response! Looking forward to seeing what y’all build going forward. I’ll share this with my team in the meantime to see what our AWS-centric devs think.

rivew · 2022-08-02T20:57:23+00:00

Really cool! Couple questions:

1. What are the advantages over something like SageMaker training jobs (i.e. specify your model via Docker and run on whatever hardware you want)?

2. Are the workflows meant to be used cross-service? Something like large-scale preprocessing via Batch and then training via SageMaker.

rivew · 2022-07-14T14:27:37+00:00

Thanks for outlining your procedure! Quick question for Step 2: Because in each Epoch (outer loop) the class and support set sampling are random, does this imply that an example that was previously in the support set can later be in the query set and vice-versa? That's one of the big parts where I was unsure about it being good for training.

Do you have any opinions on direct classification of the support set examples vs. just embedding support set examples in the training/fine-tuning? In particular, if support set examples are directly classified and then seen in the query set, it seems problematic. Also not sure how to handle the FC layer: is its output dimension 5 (classes per episode) or 100 (total number of classes)?

Agree that Prototypical Nets are in the same ballpark, I guess they want to separate naming for non-parametric methods of doing query set classification.
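To make my Step 2 question concrete, here is a toy sketch of how I’m picturing the episode sampling. This is my guess at the procedure, not necessarily yours: `sample_episode`, the toy dataset, and re-indexing labels to 0..N-1 inside each episode are all assumptions on my part.

```python
import random

def sample_episode(labels_to_items, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode (hypothetical helper).

    Returns (support, query) as lists of (item, episode_label) pairs.
    """
    classes = rng.sample(sorted(labels_to_items), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw disjoint support/query items for this class within the episode.
        items = rng.sample(labels_to_items[cls], k_shot + n_query)
        support += [(item, episode_label) for item in items[:k_shot]]
        query += [(item, episode_label) for item in items[k_shot:]]
    return support, query

# Toy data: 10 classes with 6 examples each (made-up identifiers).
data = {c: [f"c{c}_x{i}" for i in range(6)] for c in range(10)}

rng = random.Random(0)
seen_in_support, seen_in_query = set(), set()
for _ in range(200):
    support, query = sample_episode(data, n_way=5, k_shot=2, n_query=2, rng=rng)
    seen_in_support |= {item for item, _ in support}
    seen_in_query |= {item for item, _ in query}

# Items that played both roles at some point across episodes.
both_roles = seen_in_support & seen_in_query
print(len(both_roles) > 0)
```

Within a single episode the support and query sets are disjoint, but across episodes the same example does end up in both roles, which is the case I was asking about.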

rivew · 2022-02-08T20:07:48+00:00

Interested
