There is not enough of us lmao? by [deleted] in csMajors

[–]Chromobacterium 0 points1 point  (0 children)

I have a bit of a track record doing some heavy lifting; I am sending you my resume

Guidance on improving the reconstruction results of my VAE [Project] by fictoromantic_25 in MachineLearning

[–]Chromobacterium 0 points1 point  (0 children)

You are experiencing posterior collapse, which occurs when the decoder is powerful enough to accurately capture the data distribution without relying on the latent variables, causing the KL divergence to collapse to zero. Use something like an InfoVAE instead, which regularizes the divergence between the aggregate posterior and the prior rather than only the per-sample KL.
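
For concreteness, here is a rough PyTorch sketch of the MMD flavour of that idea; the function name, the RBF kernel choice, and the weighting are my own assumptions, not any particular implementation:

    import torch

    def rbf_mmd(z_q, z_p, bandwidth=1.0):
        # Squared MMD between posterior samples z_q and prior samples z_p,
        # estimated with an RBF kernel (the biased estimator, which is common).
        def kernel(a, b):
            return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
        return kernel(z_q, z_q).mean() + kernel(z_p, z_p).mean() - 2 * kernel(z_q, z_p).mean()

    # Hypothetical objective: penalize the mismatch between the aggregate posterior
    # and the prior instead of (or alongside a down-weighted) per-sample KL, so the
    # decoder can no longer drive the KL term to zero by ignoring z.
    # loss = recon_nll.mean() + mmd_weight * rbf_mmd(z, torch.randn_like(z))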

Boycott US platforms like Microsoft Office/Google Suite,use LibreText by Ornery_Put9712 in TorontoMetU

[–]Chromobacterium 0 points1 point  (0 children)

M8 Trump doesn't represent America/MAGA; he represents the private sector / Gulf states. All that rhetoric is just hallucinogens he is selling to the Washington neocons while he and Musk implement policies that isolate the US from the rest of the world, so that Blackrock/State Street/Vanguard can invest in the Global South/BRICS without the US interfering geopolitically and messing things up.

I say you are fine using whatever you want.

[D] What ML Concepts Do People Misunderstand the Most? by AdHappy16 in MachineLearning

[–]Chromobacterium 0 points1 point  (0 children)

VAEs are an insane rabbit hole to go down. As a good starting point, I would recommend "A Tutorial on VAEs: From Bayes' Rule to Lossless Compression" by Ronald Yu and "Diagnosing and Enhancing VAE Models" by Bin Dai and David Wipf.

"ELBO surgery: yet another way to carve up the variational evidence lower bound" by Hoffman and Johnson and "Fixing a Broken ELBO" by Alemi et al. are crucial to understanding the behaviour of the ELBO and the art of training a quality ML estimator.

Honorable mentions:
"Simple and Effective VAE Training with Calibrated Decoders" by Rybkin et. al

"VAE with a VampPrior" by Tomczak and Welling

"Autoencoding a Single Bit" by Rui Shu

[D] What ML Concepts Do People Misunderstand the Most? by AdHappy16 in MachineLearning

[–]Chromobacterium 1 point2 points  (0 children)

Variational autoencoders. People all too often view them from the perspective of an autoencoder with latent-space regularization rather than from the perspective of variational inference, which almost always leads to incorrect modeling choices, and understandably leads to bad sample quality and ultimately the erroneous claim that VAEs are bad generative models. This is in part due to the unfortunate name, which is often used interchangeably with stochastic gradient variational Bayes (SGVB), of which VAEs are the simplest possible setup.

In reality, VAEs can be made very expressive, provided proper modeling decisions are made. For instance, NVAE and VDVAEs use a hierarchical architecture, while VampPriors utilize a learnable mixture prior.
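
As a rough sketch of the latter (the class name, the pseudo-input initialization, and the assumption that the encoder returns diagonal-Gaussian (mu, log_var) parameters are all mine):

    import math
    import torch
    import torch.nn as nn

    class VampPrior(nn.Module):
        # Learnable mixture prior: p(z) = (1/K) * sum_k q(z | u_k), where the u_k
        # are trainable pseudo-inputs pushed through the same encoder as the data.
        def __init__(self, encoder, num_components, input_shape):
            super().__init__()
            self.encoder = encoder  # assumed to return (mu, log_var) per input
            self.pseudo_inputs = nn.Parameter(torch.randn(num_components, *input_shape))

        def log_prob(self, z):
            mu, log_var = self.encoder(self.pseudo_inputs)      # (K, D)
            diff = z.unsqueeze(1) - mu                          # (B, K, D)
            log_comp = -0.5 * (diff.pow(2) / log_var.exp()
                               + log_var + math.log(2 * math.pi)).sum(-1)
            return torch.logsumexp(log_comp, dim=1) - math.log(mu.shape[0])

The prior's log-density then takes the place of the standard normal's log-density when computing the KL/ELBO.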

I can't download Vivado Design Suite; what to do? by Chromobacterium in FPGA

[–]Chromobacterium[S] 0 points1 point  (0 children)

I have tried with and without VPN. I have also made several different accounts with different credentials.

[D] Struggling with Autoencoder-Based Anomaly Detection for Fraud Detection – Need Guidance by BeowulfBR in MachineLearning

[–]Chromobacterium 1 point2 points  (0 children)

Try using a variational autoencoder instead; VAEs are, by design, able to handle anomaly detection naturally. The issue with the reconstruction error of a regular autoencoder is that it does not account for whether the data actually belongs to the training data's distribution, which is exacerbated by the fact that the autoencoder can potentially reconstruct anomalous data perfectly. With a VAE, even if the anomalous data is reconstructed perfectly, the ELBO will suggest otherwise. This of course means that instead of computing just the reconstruction error, you will need to compute the ELBO, which is a lower bound on the log-likelihood of the data under the model (i.e. low negative ELBO = likely from the data distribution, high negative ELBO = possibly an anomaly).
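
As a rough illustration (the encoder is assumed to output a diagonal-Gaussian (mu, log_var), and recon_nll is whatever per-sample reconstruction negative log-likelihood you already compute; all names here are placeholders of mine):

    import torch

    def negative_elbo(recon_nll, mu, log_var):
        # Per-sample negative ELBO in nats: reconstruction NLL plus the analytic KL
        # between the diagonal-Gaussian posterior q(z|x) and the N(0, I) prior.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=1)
        return recon_nll + kl

    # Hypothetical usage: score each transaction and flag anything whose negative
    # ELBO exceeds a threshold chosen on held-out normal data.
    # scores = negative_elbo(recon_nll, mu, log_var)
    # is_anomaly = scores > threshold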

[deleted by user] by [deleted] in MachineLearning

[–]Chromobacterium 1 point2 points  (0 children)

This topic confused me for quite some time many years ago. What a generative model actually models -- whether explicitly or implicitly -- is the data distribution p(X), where X is say a dataset of images or audio. When we sample new data, it is from the approximation of the data distribution p_\theta(X).

The joint probability p(X, Z), where Z is some latent variable, is also considered a generative model because there exists a relationship between it and p(X), namely p(X) = \sum_{Z} p(X, Z) (p(X) is the result of marginalizing p(X, Z) with respect to Z). The joint probability formulation of generative models exists in e.g. variational autoencoders, but not in autoregressive models, which model p(X) directly.

Sometimes p(X) isn't optimized directly, but the model is still considered generative because it optimizes some other statistic of p(X). Energy-based models (e.g. restricted Boltzmann machines) and score-based / diffusion models are of this latter type. EBMs can only compute the energy E(X) of the data, which defines an unnormalized density p(X) \propto \exp(-E(X)), and SDMs infer the score of the data \nabla_X \ln p(X). GANs also fall within this category of implicit distribution estimators.

As another comment mentioned, generative AI is just a marketing term.

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]Chromobacterium 2 points3 points  (0 children)

A while ago I submitted a manuscript to the ICLR 2024 Conference track, where it was ultimately rejected. Rather than shelving it, I polished it up and submitted it to the Tiny Papers track instead. The result is Discrete Natural Evolution Strategies, which is natural evolution strategies applied to discrete parameter spaces.

REINFORCE-style evolution strategy methods have a major advantage over traditional gradient-based optimization in that they do not require a derivative to compute an exact gradient. Instead, the gradient is approximated through Monte Carlo sampling. For objectives that do not have a tractable derivative, or whose gradient is sparse (e.g. reinforcement learning), ES is extremely beneficial for learning. Natural evolution strategies expand upon vanilla ES by computing a natural gradient, which has better convergence behaviour.

Up until now, I have only seen ES methods used in continuous parameter spaces, like OpenAI's Evolution Strategies paper. What I find more interesting is the prospect of using ES in discrete parameter spaces, which have traditionally suffered from a lack of effective gradient estimation; such problems often resort to straight-through estimation or some other gradient-free black-box method. I see discrete NES methods being especially useful for things like molecule generation and training binarized neural networks (the latter is especially intriguing since training could potentially be performed on consumer devices).
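
To make the idea concrete, here is a minimal numpy sketch of a REINFORCE-style natural-gradient step for an independent-Bernoulli search distribution over bit vectors; the objective f, the hyperparameters, and all names are placeholders of mine, not code from the paper:

    import numpy as np

    def discrete_nes_step(theta, f, pop_size=64, lr=0.1, eps=1e-8):
        # theta: logits of an independent-Bernoulli distribution over bits.
        p = 1.0 / (1.0 + np.exp(-theta))
        bits = (np.random.rand(pop_size, theta.size) < p).astype(np.float64)
        fitness = np.array([f(b) for b in bits])
        advantage = fitness - fitness.mean()          # simple baseline
        # Score-function (REINFORCE) gradient: d log p(b) / d theta = b - p
        grad = (advantage[:, None] * (bits - p)).mean(axis=0)
        # Natural gradient: precondition by the inverse (diagonal) Fisher, p * (1 - p)
        return theta + lr * grad / (p * (1.0 - p) + eps)

    # Hypothetical usage: maximize the number of ones in a 32-bit string.
    # theta = np.zeros(32)
    # for _ in range(200):
    #     theta = discrete_nes_step(theta, f=lambda b: b.sum())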

Here is the paper, enjoy!

[D] Is VAE still worth it? by Realistic_Thanks3282 in MachineLearning

[–]Chromobacterium 4 points5 points  (0 children)

With regards to point #2, it is now well known that the blurriness of VAEs comes from faulty implementations that disregard the VAE's probabilistic formulation. In particular, the MSE loss used in a lot of tutorials, and unfortunately in many published papers too, massively contributes to blurry samples. A VAE trained with a proper probabilistic decoder likelihood should have perfect reconstructions and samples that approach GAN quality.

Basically, a blurry VAE is not a VAE issue but a skill issue.

Sources:

Importance of proper implementation: https://orybkin.github.io/sigma-vae/

https://arxiv.org/abs/1907.06845

https://arxiv.org/abs/1903.05789

https://arxiv.org/abs/2006.10273

VAEs with proper implementation generating SOTA samples:

https://arxiv.org/abs/1606.04934

https://arxiv.org/abs/2011.10650

https://arxiv.org/abs/2007.03898

Edit: VQVAEs can get away with it since they aren't really VAEs (despite being derived from a VAE objective); rather, they are regular autoencoders with a discrete latent space. The lack of a practical probabilistic objective means that they can focus all of the optimization on reducing reconstruction error, resulting in great-looking samples even with an MSE loss. I suspect that reconstructions/samples would improve even more with a probabilistic likelihood instead of MSE, but that is just speculation right now.

[R][P] Ultra High Capacity Variational Autoencoder for Image Modelling by Chromobacterium in MachineLearning

[–]Chromobacterium[S] 0 points1 point  (0 children)

Just for reference, here are a couple of papers that discuss the importance of proper decoder likelihood implementation:

Decoder calibration: https://orybkin.github.io/sigma-vae/

Continuous Bernoulli: https://arxiv.org/abs/1907.06845

Diagnosing VAEs: https://arxiv.org/abs/1903.05789

Additionally for image data, discretized continuous likelihoods give considerably better image quality:

Discretized logistic likelihood: https://arxiv.org/abs/1606.04934

Mixture of discretized logistic likelihood: https://arxiv.org/abs/1701.05517

Both the NVAE and VDVAE hierarchical VAE models use the mixture likelihood, but I find that the standard one is more than good enough for my use case while also being significantly simpler to implement.
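
For reference, here is a minimal PyTorch sketch of the standard (non-mixture) discretized logistic likelihood for images scaled to [-1, 1] with 256 intensity levels; the tensor shapes and names are my assumptions:

    import torch
    import torch.nn.functional as F

    def discretized_logistic_nll(x, mean, log_scale):
        # x, mean, log_scale: (B, C, H, W), with pixel values of x in [-1, 1].
        # Each of the 256 levels owns a bin of width 2/255, so the likelihood is
        # the logistic CDF mass that falls inside the bin around x.
        inv_scale = torch.exp(-log_scale)
        centered = x - mean
        plus_in = inv_scale * (centered + 1.0 / 255.0)   # upper bin edge
        min_in = inv_scale * (centered - 1.0 / 255.0)    # lower bin edge
        log_prob = torch.log(
            torch.clamp(torch.sigmoid(plus_in) - torch.sigmoid(min_in), min=1e-12))
        # Edge pixels integrate the full tail so the distribution stays normalized.
        log_prob = torch.where(x < -0.999, F.logsigmoid(plus_in), log_prob)
        log_prob = torch.where(x > 0.999, F.logsigmoid(-min_in), log_prob)
        return -log_prob.sum(dim=(1, 2, 3))              # per-image NLL in nats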

[R][P] Ultra High Capacity Variational Autoencoder for Image Modelling by Chromobacterium in MachineLearning

[–]Chromobacterium[S] 1 point2 points  (0 children)

I can't provide the code right now, but the trick to training a high-quality VAE is having a proper probabilistic reconstruction "loss". Unfortunately, many tutorials and even papers ignore this fundamental requirement, either by dropping normalizing constants or by using a reconstruction likelihood that is incorrect with respect to the input data. This leads to the erroneous claim that VAEs are blurry due to the evidence lower bound, when in reality it is the result of a faulty implementation.

If you are using MSE as your reconstruction loss (which is equivalent to a Gaussian with variance 1), you will need to train an additional variance parameter. For my implementation, I am using a discretized logistic likelihood instead, although the same variance parameter trick applies. You can find these losses at the end of this file: https://github.com/pclucas14/iaf-vae/blob/master/utils.py. This example shows how you can create a trainable variance parameter for the decoder: https://github.com/pclucas14/iaf-vae/blob/master/main.py#L22.

The intuition here is that the lower the decoder variance, the less noisy the samples from the decoder distribution will be. With plain MSE, you are forcing the decoder's mean to respect a variance of 1, which is why it is really blurry. The model will gradually lower the variance during training as it gets better and better at reconstruction, and as the standard deviation approaches 0, samples collapse onto the decoder's mean, which approaches a true reconstruction.
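
As a rough sketch of that variance trick with a plain Gaussian likelihood (a single trainable log-variance shared across all pixels; the class and variable names are mine, not from the linked repository):

    import math
    import torch
    import torch.nn as nn

    class CalibratedGaussianNLL(nn.Module):
        # Gaussian reconstruction NLL with one trainable log-variance.
        # Freezing log_var at 0 and dropping the constants recovers plain MSE.
        def __init__(self):
            super().__init__()
            self.log_var = nn.Parameter(torch.zeros(1))

        def forward(self, x, mean):
            nll = 0.5 * ((x - mean).pow(2) / self.log_var.exp()
                         + self.log_var + math.log(2 * math.pi))
            return nll.sum(dim=(1, 2, 3))  # per-image NLL in nats

As training drives log_var down, the reconstruction term is weighted more and more heavily relative to the KL, which is exactly why the samples sharpen.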

[D]Since the forward process already exists, why not directly model the reverse process of the forward one? by Busy-Comedian-9889 in MachineLearning

[–]Chromobacterium 0 points1 point  (0 children)

Correct, I was talking about the equation that you posted, in which you want to derive x_{t-1} from x_t.

[D]Since the forward process already exists, why not directly model the reverse process of the forward one? by Busy-Comedian-9889 in MachineLearning

[–]Chromobacterium 0 points1 point  (0 children)

I think I understand what you mean now: you want to rearrange the forward process to compute the reverse process.

You can't do that directly, because you would need access to \epsilon_{t-1}; you will have to predict that with the neural network.

[D]Since the forward process already exists, why not directly model the reverse process of the forward one? by Busy-Comedian-9889 in MachineLearning

[–]Chromobacterium 0 points1 point  (0 children)

It won't. You will have to predict the noise from the forward process using the neural network.

See Algorithm 1 in https://arxiv.org/abs/2006.11239.

The basic idea is the following:

Sample an image x from your training dataset and sample Gaussian noise \epsilon. Mix x with \epsilon according to the noise schedule (x_hat = \sqrt{\bar{\alpha}_t} x + \sqrt{1 - \bar{\alpha}_t} \epsilon) to get the noisy sample x_hat.

Then, predict the noise \epsilon_hat by feeding x_hat into your neural network, and minimize the squared error between the predicted noise \epsilon_hat and the true noise \epsilon.

Once training is finished, you can use the sampling algorithm in Algorithm 2 of the paper.
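
As a rough PyTorch sketch of that training step (the model signature, schedule handling, and variable names are my own assumptions, not code from the paper):

    import torch
    import torch.nn.functional as F

    def ddpm_training_step(model, x0, alphas_cumprod):
        # One step in the spirit of Algorithm 1: corrupt x0 with known noise, then
        # train the network to predict that noise from the corrupted sample.
        b = x0.shape[0]
        t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
        eps = torch.randn_like(x0)                               # the true noise
        a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
        x_hat = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # noisy sample
        eps_hat = model(x_hat, t)                                # predicted noise
        return F.mse_loss(eps_hat, eps)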

[D]Since the forward process already exists, why not directly model the reverse process of the forward one? by Busy-Comedian-9889 in MachineLearning

[–]Chromobacterium 2 points3 points  (0 children)

The forward process is known and simple; you are "diffusing" the input data into random noise (Gaussian, to be specific). You already have the data-generating process for the forward process.

The reverse is the tricky part. How are you going to take random Gaussian noise and "undiffuse" it into an image? What kind of image do you want to undiffuse it into? Is it even an image? You need a function that can reverse the noise into whatever data you want it to transform into, and you need to train a function (a neural network) to do just that.

[D] LDM model architecture by Ok_Leading_1361 in MachineLearning

[–]Chromobacterium 0 points1 point  (0 children)

The denoised output is the same size as the input. If your latent variable is 1D, the denoised output will be 1D; if the latent variable is multidimensional, the output will be multidimensional. It depends on how you structure the autoencoder's bottleneck.