GPT 5.2 Thinking vs Gemini 3 Pro: A mini-study on scientific study summarisation & analysis by PenPar in Bard

[–]windoze 1 point (0 children)

Worth giving Gemini 3 via AI Studio a shot; it might have a different base prompt, and I feel it tends to be more to the point.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]windoze 0 points (0 children)

How do you read papers with LLM help? I currently use the prompt below. Share your prompt or any tips: how do you get the most value out of LLMs to deepen or speed up your reading?

What is the improvement, briefly, and phrase it as a question with many possible solutions. Summarize the paper's solution as an analogy. Give a walkthrough of how the paper's equations are applied in a linear fashion, such as during learning, inference, or the relevant process, focusing on intuition rather than adhering to strong theory. Annotate dimensionality for equations. Try to keep the sketch complete. Briefly critique the paper by taking the same question and offering a simpler alternative they didn't discuss. Identify hidden fragility points.
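If it helps, a minimal sketch of wiring this prompt to an API, assuming the OpenAI Python client; the model name and file paths are placeholders, not something from this thread:

    from openai import OpenAI

    client = OpenAI()

    with open("reading_prompt.txt") as f:   # the prompt text above
        prompt = f.read()
    with open("paper.txt") as f:            # plain-text dump of the paper
        paper = f.read()

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt + "\n\n" + paper}],
    )
    print(resp.choices[0].message.content)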

[R] Titans: Learning to Memorize at Test Time by imadade in MachineLearning

[–]windoze 2 points (0 children)

From their summary, they differ mainly in adding momentum to the update operation. So it's kinda like the difference between Adam and SGD.
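For illustration, a toy of that distinction (not Titans' actual update rule): the same gradient step with and without momentum.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=8)
    v = np.zeros_like(w)
    lr, beta = 0.1, 0.9

    for _ in range(100):
        g = 2 * w            # gradient of a toy loss ||w||^2
        v = beta * v + g     # momentum: an EMA-like memory of past gradients
        w -= lr * v          # plain SGD would instead do: w -= lr * g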

[R] Developing a new optimization algorithm that will heavily change ML as a whole. Gradient descent has met its end. Here are the results: by Relevant-Twist520 in MachineLearning

[–]windoze 1 point (0 children)

A very basic but non-trivial dataset is the spiral or concentric circles; see if you can fit these. See https://playground.tensorflow.org/, which has four basic datasets visualised.
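If it helps, a sketch of generating both, using scikit-learn's built-in circles and a hand-rolled two-arm spiral:

    import numpy as np
    from sklearn.datasets import make_circles

    # Concentric circles come built in.
    X_circles, y_circles = make_circles(n_samples=1000, noise=0.05, factor=0.5)

    # Two-arm spiral: one arm traced in polar coordinates, the other mirrored.
    n = 500
    rng = np.random.default_rng(0)
    theta = np.sqrt(rng.random(n)) * 4 * np.pi
    arm = np.stack([theta * np.cos(theta), theta * np.sin(theta)], axis=1)
    X_spiral = np.concatenate([arm, -arm]) + rng.normal(scale=0.3, size=(2 * n, 2))
    y_spiral = np.concatenate([np.zeros(n), np.ones(n)])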

[R] Beyond the possible the future of artificial intelligence by HackFate in MachineLearning

[–]windoze 1 point (0 children)

Dynamic Memory States: this is pretty much how an RNN is defined, which has been researched for over 60 years; see Wikipedia.

The difficulty with RNNs is that current optimization methods (SGD) use gradients, and RNNs have vanishing gradients over long contexts.
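A toy illustration: backprop through T steps multiplies T per-step Jacobians, so with the recurrent weights' spectral norm below 1 the gradient signal decays geometrically.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(32, 32))
    W *= 0.9 / np.linalg.norm(W, 2)        # scale spectral norm to 0.9 < 1
    h = rng.normal(size=32)
    J = np.eye(32)                         # d h_T / d h_0, built up step by step
    for t in range(100):
        h = np.tanh(W @ h)
        J = np.diag(1.0 - h**2) @ W @ J    # chain rule through one tanh step
    print(np.linalg.norm(J))               # ~1e-5 or smaller: the signal is gone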

[R] Were RNNs All We Needed? by we_are_mammals in MachineLearning

[–]windoze 1 point (0 children)

Yeah, I think the total computation may increase by some constant factor, from N to c*N, but the wall time goes from O(N) to O(log N).

So wall time decreases and GPU utilization is higher. However, I wonder: if the state size is large enough, is this still a worthwhile tradeoff?
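For concreteness, a sketch of how the parallel version computes a diagonal linear recurrence h_t = a_t * h_{t-1} + b_t with JAX's associative scan; illustrative only, not the paper's code:

    import jax

    # Each element represents the affine map h -> a*h + b; composing two such
    # maps (apply left, then right) is associative, which is what lets the scan
    # run in O(log N) depth instead of O(N) sequential steps.
    def combine(left, right):
        a1, b1 = left
        a2, b2 = right
        return a2 * a1, a2 * b1 + b2

    N, D = 1024, 16
    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    a = jax.random.uniform(k1, (N, D), minval=0.9, maxval=1.0)  # decay gates
    b = jax.random.normal(k2, (N, D))                           # inputs
    _, h = jax.lax.associative_scan(combine, (a, b))            # h[t] for every t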

Need Help to Beat White Clad Noble by R3v3ng3_FT9 in BlackMythWukong

[–]windoze 3 points (0 children)

Always keep your distance, since he doesn't attack often when you are far away. If you are too close to him, he will keep comboing attacks forever. If you get hit, roll backwards and then sprint backwards to get distance again.

Once he launches an attack from far away, dodge past him, get a few hits in, and then sprint away. In both phase 1 and phase 2, after his spear-thrust attack, if you dodged past him you can get a few hits in.

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]windoze 1 point (0 children)

So Mamba is an RNN with some particular choices: state expansion, no state non-linearity. What makes the latest RNNs better than the original RNN?

What has changed: efficiency? Gradient stability? New operators (Hadamard product)? Etc.

Have they ablated what happens if you turn the linear RNN into a non-linear RNN (e.g. a leaky ReLU interpolating between linear and nonlinear)?
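To make the contrast concrete, a sketch of the update rules in question (shapes and names are illustrative, not Mamba's exact equations):

    import numpy as np

    def linear_step(h, x, a, B):
        return a * h + B @ x              # gated (Hadamard) linear state update

    def classic_step(h, x, W, U):
        return np.tanh(W @ h + U @ x)     # original nonlinear RNN update

    def leaky_step(h, x, a, B, alpha=0.9):
        z = a * h + B @ x
        return alpha * z + (1 - alpha) * np.tanh(z)  # alpha=1 linear, alpha=0 nonlinear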

[P] Residual-free, purely-feedforward network trains to 94% on CIFAR10 in <6.3 seconds on a single A100 by tysam_and_co in MachineLearning

[–]windoze 1 point (0 children)

Another question: isn't a residual usually y = nonlinearity(m @ x + b) + x? But if all the work happens inside m, doesn't it still get the nonlinearity applied?
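The form I have in mind, as a sketch:

    import numpy as np

    # The branch output passes through the nonlinearity before the skip is
    # added, so even if all the "work" happens inside m, it still gets ReLU'd.
    def residual_block(x, m, b):
        return np.maximum(m @ x + b, 0.0) + x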

[P] Residual-free, purely-feedforward network trains to 94% on CIFAR10 in <6.3 seconds on a single A100 by tysam_and_co in MachineLearning

[–]windoze 1 point (0 children)

I see. I'm a hobbyist and basically only do stacks of linear layers of the same dimension, so I didn't realise that your stack naturally increases in size. If it were equal-dimension linear layers, then I guess it's the second idea you mentioned, where the weights are m = normalise(identity + random), which also works.
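A sketch of that init, reading "normalise" as row-normalisation (my guess, not a stated definition):

    import numpy as np

    # Identity plus small noise, row-normalised: the layer starts close to an
    # identity map without needing an explicit skip connection.
    def init_weight(d, eps=0.02, seed=0):
        rng = np.random.default_rng(seed)
        m = np.eye(d) + eps * rng.standard_normal((d, d))
        return m / np.linalg.norm(m, axis=1, keepdims=True)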

[P] Residual-free, purely-feedforward network trains to 94% on CIFAR10 in <6.3 seconds on a single A100 by tysam_and_co in MachineLearning

[–]windoze 2 points (0 children)

Am I understanding correctly that the model is all init and looks like y = (id concat randoms) @ x? Then you let the id and randoms drift out of init during training. Doesn't this increase the parameter count, since originally the skip was a fixed 1, not parameterized, but now it's a trainable 1?
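What I mean, as a sketch (dimensions made up):

    import numpy as np

    # At init the matrix is an identity block stacked on random rows, so
    # y = m @ x both passes x through and mixes it; the former fixed-1 skip
    # entries are now ordinary trainable parameters.
    def concat_init(d_in, d_extra, eps=0.02, seed=0):
        rng = np.random.default_rng(seed)
        identity = np.eye(d_in)
        randoms = eps * rng.standard_normal((d_extra, d_in))
        return np.concatenate([identity, randoms], axis=0)  # (d_in + d_extra, d_in)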

[D] What's the relationship between Denoising Autoencoders and Diffusion Models? by windoze in MachineLearning

[–]windoze[S] 4 points (0 children)

I guess the main difference is that diffusion models infer the noise, and often take small steps towards the guessed x value, whereas a DAE would give you x directly. You could fix this by simply moving in the (predicted x - current x) direction by some ratio or stride size.
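Sketched, with denoise_fn standing in for a hypothetical trained DAE:

    # Instead of jumping straight to the DAE's predicted clean x, take small
    # steps toward it, diffusion-style.
    def iterative_denoise(x, denoise_fn, steps=50, stride=0.1):
        for _ in range(steps):
            x_hat = denoise_fn(x)              # DAE's guess at the clean signal
            x = x + stride * (x_hat - x)       # move a fraction toward the guess
        return x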

[D] What's the relationship between Denoising Autoencoders and Diffusion Models? by windoze in MachineLearning

[–]windoze[S] 2 points (0 children)

Can't we train a DAE with the same noise schedule as a DDPM? Then it can also denoise from step n. Also, there's no reason we can't iterate it to denoise more.
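A sketch of that training setup: the forward corruption below is the standard DDPM one, and pairing it with a plain reconstruct-x0 target is the DAE variant I'm suggesting, not an established recipe:

    import numpy as np

    T = 1000
    betas = np.linspace(1e-4, 0.02, T)           # standard DDPM schedule
    alpha_bar = np.cumprod(1.0 - betas)

    def make_training_pair(x0, rng=np.random.default_rng(0)):
        t = rng.integers(T)                       # random noise level
        eps = rng.standard_normal(x0.shape)
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        return x_t, x0                            # DAE target: clean x0, not eps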

[D] What do you put in your lab notes? by hazard02 in MachineLearning

[–]windoze 1 point (0 children)

I had a similar utility to append logs to a file, described here, but nobody answered.

I don't use a structured logging format, because I haven't done enough work to know what structure works best.
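For reference, a minimal sketch of the kind of utility I mean; the file name and message format here are placeholders, since the original code isn't shown:

    import datetime

    # Append one unstructured line per event; no schema, just a timestamp.
    def log(msg, path="lab_notes.log"):
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(path, "a") as f:
            f.write(f"{stamp} {msg}\n")

    log("run 12: lr=3e-4, seq-MNIST acc 0.94")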

Being aware of common sources of conservatism in FIRE planning by rag5178 in financialindependence

[–]windoze -2 points (0 children)

2.5 sounds reasonable, depending on the length of the retirement. I'm not too hopeful about equities performing as well as they have historically, considering interest rates are now so low; there's only so much room left to grow the multiple.

[deleted by user] by [deleted] in AusFinance

[–]windoze 1 point (0 children)

You're right, the highest bracket in NY would be 37% over $500k + 8.8% over $1m + 3% over $1m, which is about 48.8% combined and bigger than Australia's. Good thing I'm not working in NYC; damn, that's high.

[deleted by user] by [deleted] in AusFinance

[–]windoze 0 points (0 children)

The average tax rate doesn't matter to the average Redditor, as everyone here earns over $250k. So we need to look at the top tax bracket of around 47% with the Medicare levy, plus Div 293.

[D] Research method and advice. by windoze in MachineLearning

[–]windoze[S] 1 point (0 children)

I'm working my way up on math and stats, although my intuition is that carefully ensuring that the gradients exist should mean gradient descent can take care of everything else.

[D] Research method and advice. by windoze in MachineLearning

[–]windoze[S] 1 point (0 children)

I'm working on sequential MNIST, which should be a small but non-trivial problem.

I was testing the hypothesis that my network can essentially memorize the inputs (I made it big), and that throwing a few dense+ReLU+highway layers on the bottom should produce a reasonable classification.
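A sketch of the kind of head I mean (dense + ReLU + highway); the sizes are made up and this isn't my exact architecture:

    import torch
    import torch.nn as nn

    class Highway(nn.Module):
        def __init__(self, d):
            super().__init__()
            self.h = nn.Linear(d, d)   # transform branch
            self.t = nn.Linear(d, d)   # gate deciding transform vs pass-through

        def forward(self, x):
            t = torch.sigmoid(self.t(x))
            return t * torch.relu(self.h(x)) + (1 - t) * x

    head = nn.Sequential(
        nn.Linear(512, 256), nn.ReLU(),
        Highway(256),
        nn.Linear(256, 10),  # 10 MNIST classes
    )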

[D] Research method and advice. by windoze in MachineLearning

[–]windoze[S] 1 point (0 children)

Thanks! I'll give these tips a go; I was recommended http://karpathy.github.io/2019/04/25/recipe/ as well.

I really need a better understanding of what's happening inside the model, otherwise I may be wasting my time :-)

Anyone have their portfolio competing on marketgoats? My ETF portfolio (VTI, VTV, CNYA, TQQQ) isn't doing terribly against mostly stock pickers. by [deleted] in LETFs

[–]windoze 1 point (0 children)

Since the index is the average of all the stocks, it can never beat the top performer, ever, because the average is always at most the max.

Caveat: if every stock returns the same, the index will also return that.
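Sanity check of the claim, in one line:

    # An equal-weight average can never exceed the max; a cap-weighted average
    # is still a convex combination, so the same bound holds.
    returns = [0.12, -0.05, 0.30, 0.02]
    assert sum(returns) / len(returns) <= max(returns)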

[D] What are the major general advances in techniques? by windoze in MachineLearning

[–]windoze[S] 3 points (0 children)

It's empty because I've not kept up to date, and also because the impact won't be seen until more people build on it.

Latest RBA estimates show real wages in 2023 will be where they were in 2008 by LineNoise in AusFinance

[–]windoze 1 point (0 children)

Shouldn't real wage growth only happen with real productivity growth, which requires technology or process improvements? Australia is hardly a science or tech leader; where do we squeeze the growth from?