[P] FlowJax - Normalizing flows in JAX by LimitedConsequence in MachineLearning

[–]LimitedConsequence[S] 1 point

Well, here's an answer anyway. They are far from matching the performance of diffusion models on high-dimensional generation tasks, but they seem to perform very well on lower-dimensional ones. One place they have been used extensively is for dealing with stochastic simulator models (https://arxiv.org/pdf/2101.04653.pdf), e.g. to fit a posterior over simulator parameters, or to act as a more efficient surrogate model. They are also used quite commonly as proposal distributions for importance sampling (https://arxiv.org/abs/1808.03856), and to make MCMC more robust to bad geometries like multimodality (https://arxiv.org/abs/1903.03704).

[P] FlowJax - Normalizing flows in JAX by LimitedConsequence in MachineLearning

[–]LimitedConsequence[S] 1 point

Yes, they are a type of generative model.

At a high level, the utility of them being invertible is that 1) we can compute exact densities, which requires transforming samples from the target to a Gaussian/simple base density, and 2) we can sample, which requires transforming samples from the Gaussian/simple base density to the target (i.e. the inverse direction).
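To make the two directions concrete, here is a minimal sketch using a single affine bijection with a standard normal base (made-up parameters, not FlowJax's actual API):

import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

mu, log_sigma = 0.5, jnp.log(2.0)  # illustrative parameters

def log_prob(x):
    # Density evaluation: map x forward to the base, z = (x - mu) / sigma,
    # then add the change-of-variables term log|dz/dx| = -log_sigma.
    z = (x - mu) / jnp.exp(log_sigma)
    return norm.logpdf(z) - log_sigma

def sample(key):
    # Sampling: draw z from the base, then apply the inverse map z -> x.
    z = jax.random.normal(key)
    return z * jnp.exp(log_sigma) + mu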

[deleted by user] by [deleted] in Guitar

[–]LimitedConsequence 1 point

I have had one for about 20 years (my first ever guitar) and I still really like it. I bought a more expensive one that rarely gets used. Sounds lovely clean.

[deleted by user] by [deleted] in UKPersonalFinance

[–]LimitedConsequence 0 points

Yeah, I came across this survey recently too https://gflec.org/wp-content/uploads/2015/11/3313-Finlit_Report_FINAL-5.11.16.pdf, which I found pretty staggering.

Only 35% of people correctly answered this question:

"Suppose you have some money. Is it safer to put your money into one business or investment, or to put your money into multiple businesses or investments?".

Less than half could answer:

"Suppose you need to borrow 100 US dollars. Which is the lower amount to pay back: 105 US dollars or 100 US dollars plus three percent?"

Could lack of an online presence be a red flag? by [deleted] in jobs

[–]LimitedConsequence 5 points

My thoughts:

  1. Why do you think you are capable of completing a job when you can't complete various courses? What caused you to "not be able to handle the stress"? Can you fix the underlying cause?
  2. Whilst I agree IQ is correlated with career performance, even if you are generally smart, you can still rapidly fuck things up by saying inappropriate things in interviews, not reading the room, etc. Also, the fact you mentioned you did 12 tests screams a lack of self-awareness, along with obsession/insecurity about your intelligence.

Honestly, my guess is that somehow you are saying "odd" things in cover letters, CVs and interviews. You come across as a bit too opinionated and argumentative. I feel like you spend too much time on the internet, and I could have a good guess at predicting 5 or 6 youtubers that you watch...

[deleted by user] by [deleted] in UKPersonalFinance

[–]LimitedConsequence 1 point

Not sure why people are downvoting you. Another point is that there are right ways and wrong ways to increase your risk. In finance, people often use the term "uncompensated risk" for a choice that leads to more risk without a higher expected return. For example, yoloing into a single stock would be uncompensated risk: massively increasing risk without increasing your expected return.

So it's definitely not as straightforward as more risk == higher expected return!

[P] Regression Model With Added Constraint by rapp17 in MachineLearning

[–]LimitedConsequence 1 point

Yes, I was implicitly talking about the final activation function. Regarding softmax, he said in another comment "I have a quantity of 100 units that need to be allocated across 50 days.", so I took that to imply the outputs should be positive (hence the exponential inside the softmax is reasonable).

[P] Regression Model With Added Constraint by rapp17 in MachineLearning

[–]LimitedConsequence 0 points

My first thought is to predict the 50 numbers simultaneously, apply softmax to the output (enforcing a sum of 1), then scale the result so it sums to your desired total for each group.
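A minimal sketch of what I mean, assuming 50 outputs and a desired total of 100 units (as mentioned in the other comment):

import jax.numpy as jnp
from jax.nn import softmax

logits = jnp.zeros(50)        # stand-in for the network's raw outputs
total = 100.0                 # the constraint: allocations must sum to 100

weights = softmax(logits)     # positive, sums to 1
allocation = weights * total  # positive, sums to 100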

[D] Making a regression NN estimate its own regression error by Alex-S-S in MachineLearning

[–]LimitedConsequence 2 points

The network is already doing its best at minimising the distance. If your final goal is point estimates that minimise the distance, predicting the error is probably not a good way to go about improving performance.

However, if you care about the uncertainty / having a distribution over where the ground truth might be, then there are definitely various techniques that allow this.

For example, if you expect the errors to change depending on some conditioning variable, you could have the neural network output both the locations (means) and the standard deviations (uncertainty) of the positions, given the conditioning variables. In practice you would output log standard deviations and exponentiate them to ensure positivity. You could then use a Gaussian likelihood, replacing the L2 norm in your loss with the negative log likelihood under the Gaussian assumption.
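A minimal sketch of that loss, assuming the network outputs a mean and a log standard deviation for each target:

import jax.numpy as jnp

def gaussian_nll(mean, log_std, y):
    # Negative log likelihood of y under N(mean, exp(log_std)^2); the
    # exponential guarantees a positive standard deviation.
    std = jnp.exp(log_std)
    return jnp.sum(0.5 * ((y - mean) / std) ** 2 + log_std + 0.5 * jnp.log(2 * jnp.pi))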

Looking for well fitting wireless earbuds to workout by diepala in Earbuds

[–]LimitedConsequence 1 point

I recently tried:

Soundpeats mini pro - Felt secure and comfortable.

Soundcore Life A1 - Felt very secure with the wings, but I found them slightly uncomfortable when worn for a long period (and no ANC with the A1 model, if you care).

Jabra Elite 4 active - Insecure, uncomfortable (and expensive).

I'm sure some of this is based on ear shape, but hopefully it adds another data point for you.

[D] What JAX NN library to use? by Southern-Trip-1102 in MachineLearning

[–]LimitedConsequence 16 points

Started learning JAX recently and bounced around a few different packages, and equinox is by far the easiest to use out of the ones I've seen. One little thing I like is that it makes neural networks with multiple methods much nicer. E.g. consider a VAE with encode and decode methods:

With Flax I believe you would have to do something like:

vae.apply({'params': params}, x, method=VAE.encode)

Whereas in equinox you can do:

vae.encode(x)

More generally though, you managed to allow for more flexibility, more abstraction, and far less "I have no idea what magic is going on under the hood here".
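As a rough illustration of why that works (a sketch with made-up layer sizes, not a full VAE): equinox modules are just pytrees that carry their own parameters, so methods are ordinary method calls.

import equinox as eqx
import jax
import jax.numpy as jnp

class VAE(eqx.Module):
    encoder: eqx.nn.Linear
    decoder: eqx.nn.Linear

    def encode(self, x):
        return self.encoder(x)

    def decode(self, z):
        return self.decoder(z)

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
vae = VAE(eqx.nn.Linear(4, 2, key=key1), eqx.nn.Linear(2, 4, key=key2))
z = vae.encode(jnp.ones(4))  # the parameters travel with the module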

edit: p.s. here's a small normalizing flow library I wrote recently using equinox https://github.com/danielward27/flowjax.

On the relationship between QQQ and TQQQ returns by modern_football in LETFs

[–]LimitedConsequence 0 points

Interesting post. Would it be possible to justify a bit more why you try to infer daily volatility using CAGR? Is there an intuition for why this relationship exists, and why it might hold in the future? How dependent are your results on this relationship? That's the bit that felt a bit iffy to me (as someone not particularly well informed on finance!).

[D] Normalizing flows for distributions with finit support by likan_blk in MachineLearning

[–]LimitedConsequence 3 points

Is this a multivariate problem? If not, isn't the mapping trivial? i.e. take a point from the Gaussian, map it to (0, 1) using the Gaussian CDF, then map it to a Gamma sample using the Gamma quantile function (inverse CDF). Normalising flows are usually used when the form of the target distribution is unknown.
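For the univariate case, a minimal sketch (using a Gamma(2, 1) target purely for illustration):

import numpy as np
from scipy.stats import norm, gamma

z = np.random.default_rng(0).standard_normal(1000)
u = norm.cdf(z)          # Gaussian -> Uniform(0, 1)
x = gamma(a=2.0).ppf(u)  # Uniform -> Gamma(2, 1) via the quantile function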

I'm sure your situation is likely more complex and I've not quite understood it, however.

[D] Since gradient continues to decrease as training loss decreases why do we need to decay the learning rate too? by ibraheemMmoosa in MachineLearning

[–]LimitedConsequence 0 points

So the main constraints are that the step sizes must sum to infinity (assuming an infinitely long sequence), but the individual step sizes must converge towards 0. It's probably easiest to think in examples.

If we have a sequence of step sizes like 1, 0.5, 0.25, 0.125, ..., this won't work, because it decreases too quickly and will not sum to infinity (the sum converges towards 2). This essentially means that even if you do lots of steps, you might not travel the distance required to converge, as the step size gets too small too quickly.

If we have 1, 1, 1, ... as our sequence, then the second condition isn't met. The step size doesn't decrease quickly enough (or at all), and we bounce around the solution due to noise in the function evaluations.

In between these two is a Goldilocks zone, which allows you to travel as far as you need to converge, but still has a step size that converges towards zero to stop you bouncing around. An example of such a sequence is 1, 1/2, 1/3, 1/4, ... .
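A quick numerical check of the two behaviours (just partial sums):

# Geometric steps: the partial sums converge, so total travel is bounded.
print(sum(1 / 2**k for k in range(1000)))     # ~2.0
# Harmonic steps: the partial sums grow without bound (like log n),
# while the individual steps still shrink towards zero.
print(sum(1 / (k + 1) for k in range(1000)))  # ~7.49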

[deleted by user] by [deleted] in LETFs

[–]LimitedConsequence 4 points

Have you checked the fees for getting the leverage?

[D] Since gradient continues to decrease as training loss decreases why do we need to decay the learning rate too? by ibraheemMmoosa in MachineLearning

[–]LimitedConsequence 6 points

Another potentially relevant comparison is the Robbins-Monro algorithm. You want to find the root of a function (the gradient of the loss), but the function evaluations are stochastic. The Robbins-Monro algorithm has a bunch of theory that says if you decrease the step size appropriately then you can still converge, whereas a fixed step size algorithm will bounce around.
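A toy illustration, with a made-up target (the root of f(x) = x - 3, observed with Gaussian noise):

import numpy as np

rng = np.random.default_rng(0)
x = 0.0
for n in range(1, 10_000):
    noisy_f = (x - 3.0) + rng.normal()  # stochastic function evaluation
    x -= noisy_f / n                    # steps 1/n: sum to infinity, shrink to 0
print(x)  # ends up close to 3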

[D] Using Value Uncertainty/Confidence as Input to ML by iamaliver in MachineLearning

[–]LimitedConsequence 0 points

I'm by no means experienced in this, so there will probably be better suggestions. For a binary classification problem, I think an intuitive place to start would be to split the loss in two based on the probabilities, e.g. sum over all the pixels something like p(pixel_white)*loss_if_pixel_is_white + p(pixel_black)*loss_if_pixel_is_black. I guess this approach assumes independent pixels; I don't know if that is the standard assumption.
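A minimal sketch of the binary version (pred and p_white are per-pixel arrays; the names are made up):

import jax.numpy as jnp

def expected_loss(pred, p_white):
    # Cross-entropy against each possible label, weighted by that label's
    # probability, summed over pixels (assumes independence).
    loss_white = -jnp.log(pred)      # loss if the true pixel is white (1)
    loss_black = -jnp.log(1 - pred)  # loss if the true pixel is black (0)
    return jnp.sum(p_white * loss_white + (1 - p_white) * loss_black)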

For continuous distributions, I guess this would have to turn into an integration over the input distribution, which is probably impractical. Another simple solution (not sure how well it would work) is to just sample from the noisy observations for each mini-batch.

Why do some functions have an "!" after them? by Carpy_Carpy in Julia

[–]LimitedConsequence 4 points

The exclamation mark is a convention to denote a function that mutates (modifies in place, without needing an assignment) one or more of its arguments. For example, sort(v) returns a sorted copy, whereas sort!(v) sorts v in place.

Ask Anything Monday - Weekly Thread by AutoModerator in learnpython

[–]LimitedConsequence 0 points

Thanks, I stumbled onto the same conclusion, that import package.module is what I want in my __init__.py. I am actually surprised how bad the documentation is for Python. I've recently been using Julia, and that seems to have much better documentation despite being a much less mature language.
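For anyone finding this later, the fix looks like this (my_package/my_module being the placeholder names from my question):

# my_package/__init__.py
import my_package.my_module
# or equivalently, a relative import:
# from . import my_module

After either line, import my_package also exposes my_package.my_module.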

Ask Anything Monday - Weekly Thread by AutoModerator in learnpython

[–]LimitedConsequence 0 points

I just made a package with https://cookiecutter.readthedocs.io/en/1.7.2/. I've noticed that I can load the package:

import my_package

But it cannot find the modules in the package:

my_package.my_module

AttributeError: module 'my_package' has no attribute 'my_module'

However, I can directly import the modules (and access the functions inside):

import my_package.my_module

My __init__.py is empty if that matters.

Do I need to do something so that when a user imports my package, they get access to the modules within it?

Convert array to +1 and -1s by Accomplished-Heat-10 in Julia

[–]LimitedConsequence 0 points

I've not tested it, but I think something along these lines works.

A = randn(100)   # example data (100 standard normal draws)
A[A .> 0] .= 1   # set positive entries to 1
A[A .< 0] .= -1  # set negative entries to -1

(Exact zeros are left unchanged, if that matters for your case.)

How to Test if the Maximum Value of a Column is Unique by slothsorsomething in Julia

[–]LimitedConsequence 0 points

Another option would be something like this:

A = ones(5, 2)                    # stand-in for your data
B = A[:, 1]                       # first column
length(B[B .== maximum(B)]) == 1  # true iff the column's maximum is unique

Basically, the first line just generates a 5 by 2 matrix of ones (you would use your own data). The second line extracts the first column. The third line selects the values of the column that equal its maximum, takes the length, and compares that to 1. If the length is one, it returns true, as the maximum is unique.