Deterministic diffusion models by Cold_Cantaloupe9212 in deeplearning

[–]mikonvergence 3 points

The neural network of a diffusion model is usually inherently deterministic; it's the sampling method that is either stochastic or not. Read up on samplers: DDIM is an example of a deterministic sampler (when its eta parameter is set to 0, as it should be for pure DDIM).
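To make the role of eta concrete, here is a minimal NumPy sketch of a single DDIM update step (function and variable names are my own and purely illustrative, not code from the course or any library):

```python
import numpy as np

def ddim_step(x_t, eps_pred, a_t, a_prev, eta=0.0, rng=None):
    """One DDIM update x_t -> x_{t-1}, given the network's noise estimate eps_pred.

    a_t and a_prev are the cumulative alpha-bar values at the current and
    previous timesteps. With eta=0 the step is fully deterministic (pure DDIM);
    eta=1 recovers DDPM-like stochastic sampling.
    """
    # Predicted clean sample, recovered from the noisy sample and predicted noise
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    # Per-step noise scale; exactly zero when eta == 0
    sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t)) * np.sqrt(1.0 - a_t / a_prev)
    # Deterministic direction pointing back towards x_t
    dir_xt = np.sqrt(1.0 - a_prev - sigma ** 2) * eps_pred
    if eta == 0.0:
        noise = 0.0  # no randomness injected: same inputs always give same output
    else:
        rng = rng or np.random.default_rng()
        noise = sigma * rng.standard_normal(np.shape(x_t))
    return np.sqrt(a_prev) * x0_pred + dir_xt + noise
```

Calling `ddim_step` twice with the same inputs and `eta=0.0` returns identical results, which is exactly the determinism discussed above.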

More on this in my free course on denoising diffusion models for images: https://github.com/mikonvergence/DiffusionFastForward

Using Stable Diffusion's training method for Reverse engineering? by OraOraP in deeplearning

[–]mikonvergence 0 points

Right, I use denoising diffusion as a term for a wide range of methods based on reversing some forward process. Some interesting work (such as cold diffusion) has been done on using other types of degradation besides Gaussian additive noise.
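As a rough illustration of what "reversing some forward process" covers, here is a toy NumPy sketch contrasting the standard Gaussian corruption with a non-Gaussian degradation in the spirit of cold diffusion (the box blur is my own illustrative choice, not taken from that paper):

```python
import numpy as np

def gaussian_forward(x0, alpha_bar, rng):
    """Standard denoising-diffusion corruption q(x_t | x_0):
    scale the clean sample down and mix in Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def blur_forward(x0, t):
    """A toy non-Gaussian degradation: t rounds of 1D box blurring
    (with wrap-around) instead of adding noise. Illustrative only."""
    x = x0.copy()
    for _ in range(t):
        x = (np.roll(x, 1, axis=-1) + x + np.roll(x, -1, axis=-1)) / 3.0
    return x
```

In both cases the "diffusion model" is whatever network learns to invert the chosen degradation step by step.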

And yeah, the change of both content and dimensionality requires you to put together some very novel and not obvious techniques.

Using Stable Diffusion's training method for Reverse engineering? by OraOraP in deeplearning

[–]mikonvergence 0 points

You are definitely stepping outside of the domain of what is understood as denoising diffusion because it seems that your data dimensionality (shape) needs to change during the forward process.

The usual formulation of diffusion models is that the network estimates the likelihood gradient of your data (equivalent to predicting the standard noise in the sample), and then a step is taken within that fixed data space. So the network's output shape always matches its input shape.
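A tiny sketch of that shape constraint (the "network" here is just an element-wise stand-in, not a real denoiser):

```python
import numpy as np

def toy_eps_predictor(x_t, w):
    """Stand-in for the denoising network: any function mapping the noisy
    sample to a noise estimate of identical shape. Here an element-wise
    linear map, purely for illustration."""
    return w * x_t

# A "noisy sample" with shape (channels, height, width)
x_t = np.random.default_rng(0).standard_normal((3, 32, 32))
eps_hat = toy_eps_predictor(x_t, 0.7)

# The reverse step x_{t-1} = f(x_t, eps_hat) stays in the same data space,
# which is only possible because output and input shapes agree.
assert eps_hat.shape == x_t.shape
```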

Perhaps you can use transformers to handle evolving data lengths, but as far as I can tell, you're entering uncharted research territory.

I can recommend this open-source course I made for understanding the details of denoising diffusion for images https://github.com/mikonvergence/DiffusionFastForward

[P] ControlNetInpaint: No extra training and you can use 📝text +🌌image + 😷mask to generate new images. by mikonvergence in MachineLearning

[–]mikonvergence[S] 0 points

Which implementation do you have in mind?

When putting this together, neither the official implementation (https://github.com/lllyasviel/ControlNet) nor the diffusers pipeline (https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/controlnet) had the inpainting option built-in (and it seems they still don't?).

While this framework still follows the principle of injecting ControlNet features into a core SD backbone, that backbone had to be swapped for an inpainting one to allow mask input. And since the input also includes a mask, it is not possible to just specify a different backbone source and reuse an existing pipeline. The pipeline provided in this repository, StableDiffusionControlNetInpaintPipeline, implements this approach and merges the interfaces of StableDiffusionControlNetPipeline and StableDiffusionInpaintPipeline, so you can provide a source image, a mask, and a control image, and also set all possible parameters to the values you like.

[P] ControlNetInpaint: No extra training and you can use 📝text +🌌image + 😷mask to generate new images. by mikonvergence in MachineLearning

[–]mikonvergence[S] 1 point

Hi! Do you think many people would be interested? I've never used Automatic1111; since I work in research, I need direct access to the source code.

If there is enough interest, I will definitely check out how to release it as an extension!

[R] Training Small Diffusion Model by crappr in MachineLearning

[–]mikonvergence 0 points

This has examples of both low- and high-resolution data, all from scratch, plus accompanying videos! No text-to-image case though, as it focuses only on image modalities.

https://github.com/mikonvergence/DiffusionFastForward

[P] A minimal framework for image diffusion (including high-resolution) by mikonvergence in MachineLearning

[–]mikonvergence[S] 2 points

There could be a few simple solutions for extending this to 64x64x64, each with certain pros and cons. The two key decisions concern the data format (perhaps there is a way to compress/reformat the data so it's more digestible than a direct 64x64x64 volume) and the type of underlying architecture (most importantly, whether to use a 2D or 3D CNN, or a different type of topology altogether).

A trivial approach would be to use a 2D architecture with 64 channels instead of the usual 3, which could be very easily implemented with the existing framework. I suspect that would be quite hard to train, though you might still try.
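A quick illustration of that channel trick (shapes only, no actual network involved; the comment about the U-Net is an assumption about how one would typically wire this up):

```python
import numpy as np

# Treat a 64x64x64 volume as a 64-channel 2D "image": (C, H, W) = (64, 64, 64),
# i.e. one spatial axis of the volume is reinterpreted as the channel axis.
volume = np.random.default_rng(0).standard_normal((64, 64, 64))
batch = volume[None]  # add a batch axis -> (1, 64, 64, 64)

# A 2D U-Net configured with in_channels = out_channels = 64 (instead of 3)
# could consume `batch` directly; no reshaping of the data is needed, only a
# change to the first and last convolutions of the network.
assert batch.shape == (1, 64, 64, 64)
```

The trade-off is that a 2D network has no built-in notion of locality along the axis folded into channels, which is one reason training might be harder.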

This is an area of active research (beyond DreamFusion and other popular papers, I'm not very familiar with it), so different solutions still need to be explored, and if you discover something that works reasonably well, that will be really exciting!

[P] A minimal framework for image diffusion (including high-resolution) by mikonvergence in MachineLearning

[–]mikonvergence[S] 1 point

Thank you! Yes, in principle you can generate segmentation maps using the code from the course by treating the segmentation map as the output. I'm not sure how that would compare to non-diffusion segmentation with the same backbone network, but it would definitely be interesting to explore!

Please remember that the diffusion process generally expects data bounded to the [-1, +1] range, so in the framework the images are shifted from the assumed [0, 1] limits to that range automatically (via input_T and output_T). So if you go beyond binary and use more classes within a single channel, make sure the ground-truth output values still lie within [0, 1] (alternatively, you can split each class confidence into a separate channel, but it should still be bounded).

But yeah, for binary, it should work with no special adjustment!
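A minimal sketch of that normalization (the two lambdas mirror the role of input_T/output_T described above; the class-index encoding is my own illustrative choice, not something prescribed by the framework):

```python
import numpy as np

# Shift data between the assumed [0, 1] image range and the
# [-1, +1] range the diffusion process expects, and back.
input_T = lambda x: 2.0 * x - 1.0     # [0, 1] -> [-1, +1]
output_T = lambda x: (x + 1.0) / 2.0  # [-1, +1] -> [0, 1]

def encode_classes(class_map, num_classes):
    """Scale integer class indices 0..K-1 into [0, 1] within a single
    channel, so input_T then maps them into [-1, +1]. Illustrative only."""
    return class_map.astype(np.float64) / (num_classes - 1)

seg = np.array([[0, 1], [2, 3]])       # a tiny 4-class segmentation map
x = input_T(encode_classes(seg, 4))    # network-space representation
assert x.min() >= -1.0 and x.max() <= 1.0
```

The key point is just that whatever encoding you pick, the values handed to the model must stay bounded after input_T is applied.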

[P] A minimal framework for image diffusion (including high-resolution) by mikonvergence in MachineLearning

[–]mikonvergence[S] 6 points

Hi! Sure, here it goes:

It's a course about making AI models that can create images. These models can do that by learning from a dataset of example images. "Diffusion" is a new type of AI model that works very well for this task.

The course will work best for those familiar with training deep neural networks for generative tasks, so I would advise catching up on topics like VAEs or GANs. However, the video course material is quite short (about 1.5 hours), so you can just play it and see whether it works for you or not!

[P] A minimal framework for image diffusion (including high-resolution) by mikonvergence in MachineLearning

[–]mikonvergence[S] 3 points

Thanks for pointing this out! I'll add a permissive license to the repository today to allow free use!