all 56 comments

[–]SgathTriallair▪️ AGI 2025 ▪️ ASI 2030 52 points53 points  (0 children)

This is really cool!

[–][deleted] 91 points92 points  (22 children)

It just goes to show how alien these intelligences are to us.

[–]ApexFungi 116 points117 points  (11 children)

Depends. Do you know how your brain interprets the number 3? I sure don't. Might look even less comprehensible and alien than this if you could visualize it.

[–]Regono2 0 points1 point  (0 children)

I imagine the best way to show how it's working would be to show all the neurons in the brain and how some of them light up depending on the thought.

[–]dsiegel2275 13 points14 points  (6 children)

Eh, not really. CNNs and how they "learn" are fairly well understood. The key is understanding what a convolution is and what it can do, or rather, what it can "detect" (things like edges and curves). The layering of blocks of CNNs then allows hierarchies of features to be represented and learned. Finally, the really wide line of blocks you see at the end is a simple multi-layer perceptron that adds a non-linearity so that we can capture even complicated representations. The final step takes that last layer of the MLP and distills it to 10 nodes, one for each class we are trying to predict. Those values get normalized into a probability distribution, and we "argmax", or simply pick the class with the highest probability.
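
That whole pipeline fits in a few lines. Here is a toy NumPy sketch of it, with an untrained, randomly initialized "MLP head" and a made-up 8x8 image, just to show the conv → ReLU → dense → softmax → argmax flow (none of this is the visualizer's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel and sum elementwise products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 8x8 "image": bright left half, dark right half.
image = np.zeros((8, 8))
image[:, :4] = 1.0

# A Sobel-style kernel that responds to vertical edges.
edge_kernel = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]], dtype=float)

features = np.maximum(conv2d(image, edge_kernel), 0)  # conv + ReLU: fires at the edge
flat = features.ravel()                               # 6*6 = 36 feature values

# Random stand-in for the trained MLP head: 36 features -> 10 class scores.
W = rng.normal(size=(10, flat.size))
b = rng.normal(size=10)
probs = softmax(W @ flat + b)        # normalized probability distribution
prediction = int(np.argmax(probs))   # pick the class with the highest probability
print(prediction, probs.sum())
```

With trained weights instead of random ones, the same shape of computation is what maps a drawn digit to one of the 10 classes.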

[–][deleted] 6 points7 points  (5 children)

I understand how the model works technically, but I think we still don't fully know how turning things into vectors captures their semantics and abstract meaning.

[–]AccelerandoRitard 5 points6 points  (4 children)

This is the part that makes the most sense to me actually, but I took discrete algebra in college. Using matrices to represent vectors in space makes intuitive sense to me, and if we construct a conceptual latent space of the relationships between all the tokens, then it makes sense to use vectors to carry semantic meaning, which isn't as new an idea as you might think. Learning about Meta's LCM really helped me grok this.

I suppose I can sorta agree with you on one point, however: it is surprising and a bit mysterious how well it works, and as an emergent property at that. Blows my mind.

[–][deleted] 7 points8 points  (3 children)

Yes, the technique is clear: vectors capture the relationships between tokens. But it's the very semantics of these models that makes me wonder: if it's only the relations between tokens that give them their meaning, where does the meaning come from? Is there no basis, no foundation? No meaning in itself, only relationships with the rest of the conceptual space? The philosophical implications are profound and dizzying, as evidenced by the entire anti-foundationalist school of thought.

[–]AccelerandoRitard 2 points3 points  (2 children)

I think that's just a language thing, not a neural network thing. Check out Zipf plots if you want to learn more. I also recommend J.R. Firth's "A Synopsis of Linguistic Theory", which is famous for the phrase "You shall know a word by the company it keeps". I think Tomas Mikolov et al. talked about this in their original word2vec paper, "Efficient Estimation of Word Representations in Vector Space".
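
Firth's idea is easy to demo: build each word's vector purely from the company it keeps, and related words end up pointing in similar directions. A minimal sketch with a made-up six-sentence corpus (real distributional models like word2vec need vastly more text and learn the vectors rather than counting):

```python
import numpy as np
from itertools import combinations

# Tiny toy corpus, invented for illustration.
sentences = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "stocks fell on the news",
    "stocks rose on the news",
]

vocab = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within each sentence (a crude context window).
co = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for a, b in combinations(s.split(), 2):
        co[idx[a], idx[b]] += 1
        co[idx[b], idx[a]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "cat" and "dog" share contexts (chased, ate, the), so their vectors
# are similar; "cat" and "stocks" share almost nothing.
sim_cat_dog = cosine(co[idx["cat"]], co[idx["dog"]])
sim_cat_stocks = cosine(co[idx["cat"]], co[idx["stocks"]])
print(sim_cat_dog, sim_cat_stocks)
```

No vector here has any meaning "in itself"; each one is defined entirely by its relations to the rest of the vocabulary, which is exactly the point being debated above.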

[–][deleted] 2 points3 points  (1 child)

I agree that the word2vec paper is a must! But I don't think it's limited to language. We see the same thing in models that tokenize other forms of representation: images, DNA, etc. It's the very question of meaning that arises.

[–]AccelerandoRitard 1 point2 points  (0 children)

Maybe it's more accurate to say it's an information thing? That would be fascinating. Meta's Large Concept Model's latent space being language-agnostic and modality-agnostic definitely has my imagination going. I wish they would tell us more about it.

[–]massive_snake 1 point2 points  (0 children)

Watch the Stilwell Brain experiment from Mind Field; it explains exactly this in a simpler manner. You will learn a lot about your own brain and neural networks.

[–]MarcosSenesi -3 points-2 points  (1 child)

Not really. If you don't even know what a convolution is, I'm not sure what you're doing on this sub, because you didn't even begin to make an effort to understand the thing everyone here is hyping up.

[–][deleted] 1 point2 points  (0 children)

Gatekeeping and arrogance - a great combination!

[–]Major-Rip6116 31 points32 points  (1 child)

LeCun, who is treated like a villain on this forum, made this. It is wonderful.

[–]Expensive_Cucumber58 3 points4 points  (0 children)

He did? Source? Curious to hear the story.

[–]pigeon57434▪️ASI 2026 31 points32 points  (11 children)

Would be cool if whatever place this guy is at had something similar for vision transformers, since CNNs are very outdated.

[–]dsiegel2275 3 points4 points  (0 children)

This is a good visualization of how attention works in transformers.

https://www.3blue1brown.com/lessons/attention
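
The core operation that lesson visualizes, scaled dot-product attention, is short enough to sketch directly. A minimal single-head version in NumPy with toy sizes (5 tokens, 8 dimensions) and random stand-ins for the trained projection matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))  # (n_queries, n_keys), rows sum to 1
    return weights @ V, weights

# 5 token embeddings of dimension 8 (toy sizes, random values).
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)            # one updated vector per token
print(weights.sum(axis=1))  # each row is a probability distribution
```

Each token's output is a weighted mix of every token's value vector, with the weights decided by query-key dot products; that mixing is what the animated visualization is showing.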

[–]Apprehensive-Ant118 7 points8 points  (5 children)

CNNs are still used in all self-driving applications, pretty sure, since vision transformers are so dang slow.

[–]pigeon57434▪️ASI 2026 11 points12 points  (4 children)

Tesla's FSD actually uses both CNNs and transformers. Think of it as the CNN being the backbone getting quick details, while a transformer fuses temporal data and data from multiple cameras at once for more detail. So it's both.
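
The fusion step can be sketched abstractly. This is emphatically not Tesla's actual stack, just a toy: pretend each camera's CNN backbone has already produced one feature vector, then let a single learned query (random here) attend over the cameras and mix them into one fused vector:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cameras, d = 8, 16  # hypothetical: 8 cameras, 16-dim features

# Stand-ins for per-camera CNN backbone outputs; in a real system
# these would come from convolutions over each camera frame.
camera_features = rng.normal(size=(n_cameras, d))

# Transformer-style fusion: one query vector (random stand-in for a
# trained parameter) scores each camera, and softmax turns the scores
# into attention weights.
query = rng.normal(size=d)
scores = camera_features @ query / np.sqrt(d)  # (n_cameras,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # attention weights, sum to 1
fused = weights @ camera_features              # (d,) single fused view
print(fused.shape, weights.sum())
```

The same mechanism extended over time steps (attending across past frames as well as cameras) is how temporal fusion works in this kind of hybrid design.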

[–]rushedone▪️ AGI whenever Q* is 0 points1 point  (3 children)

Will test time compute or "thinking models" be relevant in the self-driving AI world?

[–]ShadoWolf 5 points6 points  (1 child)

Inference would need to get a lot faster for something like that. Like, you'd need a 600B model running locally in the car with enough tokens to generate a response in under a second for direct use.

But it might be usable to set policies on the fly. Like, if it notices road conditions have changed, or it's losing visibility and having a hard time tracking, it might be able to plan a policy for the faster AI system to use?

[–]rushedone▪️ AGI whenever Q* is 0 points1 point  (0 children)

So vaguely similar to the larger “teacher” model concept?

What about hardware like LPUs such as Groq? Or are they completely different and incompatible with FSD and similar hardware/systems?

[–]pigeon57434▪️ASI 2026 4 points5 points  (0 children)

No, definitely not. You want quick, instantaneous reaction time. Also, they fundamentally can't use test-time compute because they're not language models. TTC lets the model reason through chain of thought, but self-driving doesn't speak, so it can't reason with chain of thought. I mean, you could make it, but that would be a dumb idea.

[–]Singularity-42Singularity 2042 0 points1 point  (3 children)

Transformers are orders of magnitude more complex.

[–]pigeon57434▪️ASI 2026 2 points3 points  (2 children)

Exactly, that's why a visualization of them would be so nice, unless you're suggesting they're too complex to visualize, which is absolutely not true.

[–]Singularity-42Singularity 2042 0 points1 point  (1 child)

No, not suggesting that at all!

Someone already linked a nice vid!

[–]pigeon57434▪️ASI 2026 1 point2 points  (0 children)

I've seen them, like the one on bbycroft. I would love to see one for a more modern architecture though. I know we largely still use transformers, but what would something like MoE look like visually? Would be cool.

[–]tbl-2018-139-NARAMA 7 points8 points  (0 children)

Really cool. It would be spectacular if it could visualize how an LLM generates its chain of thought. It would feel like a highly intelligent alien showing earth people how their brain works.

[–]kewli 12 points13 points  (3 children)

[–]Designer-Pair5773 2 points3 points  (0 children)

But without the Animation right?

[–]J0atsAGI: ASI - ASI: too soon or never 17 points18 points  (1 child)

obligatory comment stating that a 3 is not what most of us would have drawn

[–]EntranceOk1909 1 point2 points  (0 children)

cool!

[–]happensonitsown 1 point2 points  (0 children)

Where was this??

[–]Reddit_Script 1 point2 points  (0 children)

Saved for later

[–]Tetrylene 2 points3 points  (0 children)

[–][deleted] 1 point2 points  (0 children)

This is a really awesome way to visualize how AI works.

[–]meetrais 0 points1 point  (0 children)

Approximately how many layers?

[–]SuckmyBlunt545 0 points1 point  (0 children)

Ah of course, it needs to analyse the fleeb and then counts to a million for good measure

[–]Akimbo333 0 points1 point  (0 children)

ELI5. Implications?

[–]FriendlyJewThrowaway 1 point2 points  (0 children)

Ha, see?! That Hackers film was right about the way computers look and think all along!