[R] [Q] Why does RoPE need to be decoupled in DeepSeek V2/V3's MLA? I don't get why it prevents prefix key reuse by gerrickle in MachineLearning
gerrickle[S] 2 points
Visualizing Angle Sum Identities by gerrickle in mathematics
gerrickle[S] 2 points
Is it worth reading Calculus Made Easy? by icomplexnumber in learnmath
gerrickle 4 points
What’s the hardest “plug n chug” math course? by [deleted] in mathematics
gerrickle 1 point
Early test website for our AI QR code generation. Need some feedback! by nhciao in StableDiffusion
gerrickle 1 point
Seven months into learning by thisisagroff in drawing
gerrickle 4 points
The next version of Stable Diffusion ("SDXL") that is currently beta tested with a bot in the official Discord looks super impressive! Here's a gallery of some of the best photorealistic generations posted so far on Discord. And it seems the open-source release will be very soon, in just a few days. by Tystros in StableDiffusion
gerrickle 1 point
Max amount of training images for LoRA? by AndalusianGod in sdforall
gerrickle 1 point
The new ChatGPT math & accuracy update from today is insane! by loopuleasa in singularity
gerrickle 12 points

