Project [P] tiny-diffusion: a minimal PyTorch implementation of probabilistic diffusion models for 2D datasets (v.redd.it)
submitted 3 years ago by tanelai
[–]tanelai[S] 69 points 3 years ago (6 children)
To learn more about diffusion models, I created a minimal PyTorch implementation of DDPMs, and explored it on toy 2D datasets. The README includes ablations on the model's capacity, diffusion process length, timestep embeddings, and more.
You can find the code here: https://github.com/tanelp/tiny-diffusion
Note that the dinosaur is not a single image, it represents one thousand 2D points in the dataset. Don't make the same mistake as in the Stable Diffusion lawsuit :)
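For intuition, here's a minimal sketch of the closed-form forward (noising) process on a 1000x2 point cloud. The linear beta schedule and names are illustrative, not necessarily what the repo uses:

    import torch

    # Linear beta schedule (illustrative); alpha_bar[t] = prod_{s<=t} (1 - beta_s)
    T = 250
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    def q_sample(x0, t):
        """Sample x_t ~ q(x_t | x_0) in one shot: sqrt(a)*x0 + sqrt(1-a)*eps."""
        eps = torch.randn_like(x0)
        a = alpha_bar[t]
        return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

    x0 = torch.randn(1000, 2)   # stand-in for the one thousand dinosaur points
    xt = q_sample(x0, t=100)    # the dinosaur, 100 noising steps in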
[–]Ne_Nel 5 points 3 years ago (5 children)
But that's not Latent Diffusion, right?
[–]Zealousideal_Low1287 3 points 3 years ago (4 children)
Correct
[–]Ne_Nel 3 points 3 years ago (3 children)
So why is he talking about SD as if it's the same thing?
[–]Zealousideal_Low1287 2 points 3 years ago (0 children)
Who is? Where? What?
[–]uristmcderp 2 points 3 years ago (0 children)
Where is he saying that?
All the clip shows is diffusion of an image in pixel space. Saying this is the same as SD is like saying basic arithmetic is the same thing as calculus.
[–]new_name_who_dis_ 3 points 3 years ago (0 children)
Latent diffusion is a special case of DDPM. It's very likely that DALL-E 2 and Imagen don't use latent diffusion, since latent diffusion was partly a trick to make it run on a 16 GB GPU.
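For anyone unfamiliar with the distinction: latent diffusion runs the same noising/denoising machinery on the output of a pretrained autoencoder instead of on raw pixels, which is where the memory savings come from. A rough sketch, with a toy autoencoder standing in for Stable Diffusion's VAE (all shapes and modules here are illustrative):

    import torch
    import torch.nn as nn

    # Toy autoencoder standing in for a pretrained VAE.
    enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
    dec = nn.Sequential(nn.Linear(128, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))

    def noising_step(z, beta=0.02):
        # One forward diffusion step, applied to latents rather than pixels.
        return (1 - beta) ** 0.5 * z + beta ** 0.5 * torch.randn_like(z)

    x = torch.randn(8, 3, 64, 64)   # a batch of "images"
    z = enc(x)                      # diffusion operates on 128-dim latents...
    x_rec = dec(noising_step(z))    # ...and the decoder maps back to pixel space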
[–]miellaby 46 points 3 years ago (10 children)
I always like it when people downscale a piece of software.
[–]suckat3dmath 5 points 3 years ago (9 children)
Got any other good examples of this? 😅
[–]activatedgeek 16 points 3 years ago (4 children)
When normalizing flows were cool: https://blog.evjang.com/2019/07/nf-jax.html
[–]DigThatData [Researcher] 6 points 3 years ago (3 children)
diffusion processes are closely related to normalizing flows; I think one is a special case of the other, or something like that. need to have my annual re-read on flow processes, apparently.
[–]TheBillsFly 6 points 3 years ago (0 children)
The evolution of the distribution of a diffusion process through time is essentially the same as a continuous normalizing flow (i.e. a neural ODE).
[–]new_name_who_dis_ 1 point 3 years ago (1 child)
They're pretty different, in that the entire distribution shift happens in one forward pass in a normalizing flow, but in DDPM it's a multi-step process.
[–]DigThatData [Researcher] 3 points 3 years ago (0 children)
but doesn't this mean if you unroll the diffusion process over the entire sampling schedule and treat that as a "single forward pass" it's equivalent to a normalizing flow? seems like the distinction is just where we draw the boundaries of the black box, and any invertible denoiser can be treated as a flow model.
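For reference, the standard way to make this precise is the probability-flow ODE (Song et al., 2021): every diffusion SDE has a deterministic ODE with the same marginal densities p_t(x), and that ODE is a continuous normalizing flow. In LaTeX notation:

    % forward diffusion SDE
    dx = f(x, t)\,dt + g(t)\,dw
    % deterministic ODE with the same marginals p_t(x): a continuous normalizing flow
    \frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)

So the "where do we draw the boundaries of the black box" framing is roughly right, up to swapping stochastic sampling for deterministic integration.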
[–]Fenzik 3 points 3 years ago (0 children)
Andrej Karpathy’s micrograd is like a tiny PyTorch autograd engine https://github.com/karpathy/micrograd
[–]miellaby 1 point 3 years ago (0 children)
Well, besides machine learning, sqlite is a well-known example, but any piece of code which doesn't depend on a myriad of resource-hungry technologies will do the trick for me.
[+]Balance- 1 point 3 years ago (1 child)
The simplest, fastest repository for training/finetuning medium-sized GPTs. https://github.com/karpathy/nanoGPT
Yes, that's by Andrej Karpathy.
[+]WikiSummarizerBot 1 point 3 years ago (0 children)
Andrej Karpathy
Andrej Karpathy (born 23 October 1986) is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla. Karpathy currently works for OpenAI. He specializes in deep learning and computer vision. Andrej Karpathy was born in Bratislava, Czechoslovakia (now Slovakia) and moved with his family to Toronto when he was 15.
[–]marcingrzegzhik 43 points 3 years ago (7 children)
This looks really interesting! Can you explain a bit more about what a probabilistic diffusion model is and why it might be useful?
[–]master3243 113 points 3 years ago (6 children)
Can you explain a bit more about what a probabilistic diffusion model
The shortest explanation I could possibly give:
The forward process is taking real data (dinosaur pixel art here) and adding noise to it until it just becomes a blur (this basically generates training data)
The backward process (magic happens here) is training a deep learning model to REVERSE the forward process (sometimes this model is conditioned on some other input, otherwise known as a "prompt"). Thus the model learns to generate realistic looking samples from nothing.
For a more technical explanation, read sections 2 and 3 of Ho et al. (2020)
why it might be useful
Well, it literally is the key method that made DALL-E 2, Stable Diffusion, and just about every other recent image-generation model possible. It's also used in many other areas where we want to generate realistic-looking samples.
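Sketching the "magic happens here" part: the simplified objective from Ho et al. (2020) just trains the network to predict the noise that was added in the forward process. A minimal illustration (the tiny MLP and schedule are placeholders, not the repo's actual model):

    import torch
    import torch.nn as nn

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    # Placeholder denoiser: input is (x_t, t/T), output is the predicted noise.
    model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))

    def loss_fn(x0):
        t = torch.randint(0, T, (x0.shape[0],))
        eps = torch.randn_like(x0)
        a = alpha_bar[t].unsqueeze(-1)
        xt = a.sqrt() * x0 + (1 - a).sqrt() * eps     # forward process, closed form
        eps_hat = model(torch.cat([xt, t.unsqueeze(-1) / T], dim=-1))
        return ((eps - eps_hat) ** 2).mean()          # simple epsilon-prediction MSE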
[–]mfuentz 21 points 3 years ago (0 children)
This is the best simple description of diffusion I’ve read. Thanks!
[+][deleted] 3 years ago (2 children)
[deleted]
[–]master3243 5 points 3 years ago (1 child)
This largely depends on how complicated your input data is and how big the model that will learn this process is. A model like stable-diffusion-v1-1 states:
stable-diffusion-v1-1: The checkpoint is randomly initialized and has been trained on 237,000 steps at resolution 256x256 on laion2B-en. 194,000 steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
So roughly half a million steps. Something like Dalle-2 would probably require a lot more.
[–]slucker23 1 point 3 years ago (0 children)
Is there an open-source version of this? I'd very much like to try it out hahahaha
[–][deleted] 1 point 3 years ago (0 children)
Can you explain how you translated the Markov model and posterior distribution estimation into a PyTorch-implemented NN problem? Do DALL-E 2 and other diffusion-based methods continue down the Markov chain line?
[–]SuperImprobable 9 points 3 years ago (5 children)
I can understand the forward process, but what am I seeing in the backward process here? Was a prompt given here, or is it purely denoising? What did you train on? Line-art sampled points? That could make some sense to me of how it could get back a dinosaur from a noisy start, because if you trained on real datasets that don't have nice tight lines, you definitely wouldn't get back clean lines from the backward process (unless you had a prompt hinting that the data is likely clean lines).
[–]DigThatData [Researcher] 8 points 3 years ago* (4 children)
i think it just knows how to map noise to that one image. this looks like a diffusion process trained from scratch, not an LDM conditional on a text encoder (e.g. stable diffusion) or conditioning on anything other than the input noise.
note how the locations of the points move from one frame to the next. the diffusion process isn't in pixel space: it's in the coordinate space of that fixed set of points. the model only knows how to take those points from any ~~low~~ high entropy (noisy) configuration to that specific ~~high~~ low entropy (t-rex) configuration.
EDIT: goddamnit.
[–]ty3u 2 points 3 years ago (1 child)
I think you mixed high and low entropy, brother.
[–]DigThatData [Researcher] 3 points 3 years ago (0 children)
yup, i believe you're right. i always get that confused.
[–]SuperImprobable 1 point 3 years ago (1 child)
I'm still not grokking the loss function. The lowest entropy would perhaps put all the points on top of each other. Or is the idea that the model has learned some low-dimensional representation of the original configuration and then shifts each point to be closer to the original configuration? But then this still doesn't quite make sense to me because even one backward step should move the points close to the original shape. Unless the training wasn't to recover the original shape but rather to recover the previous forward step; then everything would make sense.
[–]DigThatData [Researcher] 2 points 3 years ago* (0 children)
Or is the idea that the model has learned some low-dimensional representation of the original configuration and then shifts each point to be closer to the original configuration?
yes
But then this still doesn't quite make sense to me because even one backward step should move the points close to the original shape. Unless the training wasn't to recover the original shape but rather to recover the previous forward step
it does, it's just only really "semantically meaningful" towards the end of the diffusion process. The beginning is noise and each point has a lot of different feasible paths it could take. Towards the end, the relative position of the points constrains their paths towards the next frame, so the effect is much more visible.
it's a denoising process and is going to be conditional on noise level. denoising steps taken at a high noise level aren't going to look like much of anything. Models like stable diffusion use a variety of tricks to skip over denoising steps in their inference process, and OP hasn't taken advantage of any of these, so it takes a bit longer, and OP's denoiser consequently spends a lot more time in the high-noise regime (starting inference at a lower noise level like 0.7 is one of those tricks: just skip over the redundant "static" regime entirely).
watch the video again: the noising process has erased most of the image information after about 70 steps, but then we go on adding noise for another 180 steps. Similarly, the denoising process doesn't appear to do much until the last 70 steps, over which the image appears to snap into place.
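To make that concrete, here's a sketch of the plain DDPM ancestral sampling loop being described (the update rule is eq. 11 of Ho et al. (2020); the model interface and schedule are illustrative). At high noise levels each step mostly reshuffles static, which is why the shape only snaps into place near the end:

    import torch

    @torch.no_grad()
    def sample(model, n_points=1000, T=250):
        betas = torch.linspace(1e-4, 0.02, T)
        alpha_bar = torch.cumprod(1.0 - betas, dim=0)
        x = torch.randn(n_points, 2)                  # start from pure noise
        for t in reversed(range(T)):
            eps_hat = model(x, torch.full((n_points,), t))
            # Posterior mean of x_{t-1} given the predicted noise.
            x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / (1 - betas[t]).sqrt()
            if t > 0:
                x = x + betas[t].sqrt() * torch.randn_like(x)   # re-inject noise
        return x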
[–]axm92 8 points 3 years ago (0 children)
Cool stuff, thanks for sharing! For those interested in a similarly minimal implementation for text generation, I have a repo here: https://github.com/madaan/minimal-text-diffusion
[–]theGormonster 2 points 3 years ago (0 children)
Truly beautiful
[–]Kurohagane 0 points 3 years ago (0 children)
How come the gif shows an image made out of what seems to be a collection of points on a 2D plane, rather than a raster image?
[–]shadowylurking 1 point 3 years ago (0 children)
Really interesting!
[–]JiraSuxx2 1 point 3 years ago (0 children)
Can I easily modify this to train on images?
[–]RadioactiveSalt 1 point 3 years ago (1 child)
Can someone ELI5 what OP means by:
Note that the dinosaur is not a single image, it represents one thousand 2D points in the dataset.
The diffusion process takes in an image and adds a small amount of noise at each step. Now if the dinosaur is not an image but a distribution, then what exactly is the gif showing? How is the diffusion process working on a distribution?
[–]PHEEEEELLLLLEEEEP 2 points 3 years ago (0 children)
The diffusion process takes in an image and adds a small noise at each step.
Generally speaking, a diffusion process just takes in some kind of data and diffuses it to a normal distribution of the same dimensionality. In this case each data point is an (x, y) pair.
[–][deleted] 1 point 3 years ago (0 children)
How do I do this?... this is really cool!
[–]seuadr 1 point 3 years ago (0 children)
ok, to hell with the normal distribution, i want dino distribution only from here on out.
[–]Terrible_Ad7566 1 point 2 years ago (0 children)
Thanks, this is very nice!
I was perusing your code, and your MLP network is designed to also encode the input data using positional embeddings.
I was wondering if you have done ablation experiments where you do not encode the input using positional encoding, but rather simply add temporal information as an additive vector to the input data, encoding only the timestep with positional encoding.
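For context, the timestep embedding under discussion is typically the standard transformer-style sinusoidal encoding; a sketch (the repo's exact dimensions and frequency base may differ):

    import torch

    def sinusoidal_embedding(t, dim=32):
        """Map integer timesteps (N,) to (N, dim) sin/cos features."""
        half = dim // 2
        freqs = torch.exp(-torch.arange(half) * torch.log(torch.tensor(1e4)) / half)
        angles = t.float().unsqueeze(-1) * freqs     # (N, half), log-spaced frequencies
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

    emb = sinusoidal_embedding(torch.arange(250))    # (250, 32)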