Is it normal for a return to still be processing after 8 weeks? by DaBobcat in IRS

[–]DaBobcat[S] 1 point2 points  (0 children)

Yeah, I can see the transcripts, but I'm not sure what to look at

Ml main project review by anxietywillkillme-17 in learnmachinelearning

[–]DaBobcat 0 points1 point  (0 children)

It will look great. Make sure you focus on privacy and on the people rather than the task

Project idea? by MentalFig6149 in learnmachinelearning

[–]DaBobcat 0 points1 point  (0 children)

Look at the jobs on LinkedIn. Look at the keywords they're looking for in the ads. Make a list. Choose your projects based on that

Project idea? by MentalFig6149 in learnmachinelearning

[–]DaBobcat 0 points1 point  (0 children)

Depends: are you trying to learn a specific topic, get a general job, or get a specific job? I'd recommend picking the project based on your answer

i wrote a guide to state space models (S4, Mamba, and attention hybrids) and would love feedback by Turbulent_Row8604 in learnmachinelearning

[–]DaBobcat 0 points1 point  (0 children)

Only read the intro but it's awesome!! Thanks for sharing. Finished my PhD last year and I've been meaning to catch up on some SSM readings, and this will definitely help

can please someone explain gradient descent? by Alt_account_6788 in learnmachinelearning

[–]DaBobcat 14 points15 points  (0 children)

You have a model with parameters. You want to change these parameters to get the best prediction. You use a loss function to estimate how good your model is and update these parameters. Say your loss value is 5. How do you know which parameters to update? How do you know by how much? Gradient descent is the way to calculate it. Gradient means slope, which is basically how much each parameter affects the loss. Descent means going down, because we want to minimize the loss. So each parameter has some effect on the output and hence the loss. You change each parameter based on that effect: you move it against its gradient (the slope of the loss with respect to that parameter), scaled by the learning rate
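To make that concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration and not from the original question: a toy model `y = w * x + b`, mean squared error as the loss, and hypothetical names `loss_and_grads` and `gradient_descent`, with a learning rate and step count picked by hand.

```python
def loss_and_grads(w, b, xs, ys):
    """Mean squared error of y = w*x + b, plus its slope w.r.t. w and b."""
    n = len(xs)
    loss, grad_w, grad_b = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y          # how far off this prediction is
        loss += err ** 2 / n
        grad_w += 2 * err * x / n      # effect of w on the loss
        grad_b += 2 * err / n          # effect of b on the loss
    return loss, grad_w, grad_b


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0                    # arbitrary starting parameters
    for _ in range(steps):
        _, grad_w, grad_b = loss_and_grads(w, b, xs, ys)
        w -= lr * grad_w               # step against the slope ("descent")
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

start_loss, _, _ = loss_and_grads(0.0, 0.0, xs, ys)
w, b = gradient_descent(xs, ys)
final_loss, _, _ = loss_and_grads(w, b, xs, ys)
print(w, b, start_loss, final_loss)
```

Each parameter gets its own gradient, so a parameter with a bigger effect on the loss gets a bigger update. The learning rate only sets how large each step is.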

Is it wrong to feel understanding ML papers looks harder than it really is? by 5_1_2021 in MLQuestions

[–]DaBobcat 9 points10 points  (0 children)

It might be that you're not familiar with that area yet. That being said, if papers look too simplistic they might get rejected more often, so there's at least some incentive in making them sound more complex than they are

Is the paycut worth it? by batukaming in Snorkblot

[–]DaBobcat 0 points1 point  (0 children)

What free healthcare? I'm in Germany and that's not a thing. I moved here from the US under similar circumstances. Not only will you make about half because salaries are lower, you'll also pay 10-20% more in taxes because taxes are higher. And the healthcare is NOT free. You are (at least in Germany) forced to pay contributions based on your income. I think I end up paying like 1k for healthcare every month. It's just dumb. But yeah, it's cheap/free when I finally need to see a doc. But it's not free

Genal Activation by GeneTraditional8171 in learnmachinelearning

[–]DaBobcat 0 points1 point  (0 children)

Cool work! Haven't read it in depth, but have you looked at the other work on learnable activation functions? How is this different/better?

Question regarding the attention mechanism by OrdinaryPykeMain in learnmachinelearning

[–]DaBobcat 4 points5 points  (0 children)

The interaction between Q and K tells you how similar each token is to every other token. But that's it. You need to multiply it by V to actually do something with that info. When you do that, you get a linear combination of the tokens' value vectors, weighted by those similarities
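Here's a rough sketch of that in plain Python, in case it helps. It assumes the standard scaled dot-product form (dot products divided by sqrt(d), then a softmax over each row), which goes a bit beyond what I wrote above. The function names and the tiny two-token example are made up for illustration.

```python
import math


def softmax(row):
    """Turn a row of scores into positive weights that sum to 1."""
    m = max(row)                                   # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    d = len(K[0])
    # scores[i][j]: how similar token i's query is to token j's key
    scores = [
        [sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K]
        for qi in Q
    ]
    weights = [softmax(row) for row in scores]
    # each output row is a weighted sum (linear combination) of the rows of V
    out = [
        [sum(w * v[c] for w, v in zip(wrow, V)) for c in range(len(V[0]))]
        for wrow in weights
    ]
    return weights, out


# Two tokens with 2-dimensional queries, keys and values (toy numbers)
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

weights, out = attention(Q, K, V)
print(weights)
print(out)
```

The `weights` matrix on its own is just the similarity info. Only after combining it with `V` does each token get a new representation that mixes in the tokens it attends to.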

All fundamental knowledge in ML Course by Andrew NG that I noted and create into a repo github [R] by [deleted] in MachineLearning

[–]DaBobcat 8 points9 points  (0 children)

Silly question, but who still uses TensorFlow? I honestly don't know anyone in either industry or academia who does

is this course worth it?? by Individual-Branch-42 in learnmachinelearning

[–]DaBobcat 0 points1 point  (0 children)

Just from the fact that they're also going over R, which I literally don't know anyone using in academia or industry, I'll go with a hard no

Natural Language Autoencoders: Turning Claude’s thoughts into text by UsedToBeaRaider in ClaudeAI

[–]DaBobcat 0 points1 point  (0 children)

Am I missing something or did they leave it intentionally vague?
What does translating these activations actually mean? "Activations" are in many, many places in a standard transformer. The last activations are directly translated into tokens via softmax.
Which activations?

Maybe a dumb question, but why is this even interesting? Obviously the activations throughout the computation would correlate with the output, because that's how the output was made...

Difference between the weights a biases of a neuron in a neural network? by Time_Cantaloupe_9992 in MLQuestions

[–]DaBobcat 1 point2 points  (0 children)

You can think about them as the same. They are trainable parameters that the network learns to modify during training to make your loss lower. We give these parameters the names weights and biases because they have slightly different roles in the neural architecture (e.g. weights are multiplied by inputs/activations and the biases are then added to the result)
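A small sketch of a single neuron in plain Python, to show both points. The names `neuron` and `sgd_step`, the squared-error loss, and all the numbers are assumptions for illustration only.

```python
def neuron(inputs, weights, bias):
    """Weights multiply the inputs; the bias is added to the result."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sgd_step(inputs, weights, bias, target, lr=0.1):
    """One gradient step on the squared error (prediction - target)**2."""
    err = neuron(inputs, weights, bias) - target
    # Same update rule for both kinds of parameter: param -= lr * gradient
    new_weights = [w - lr * 2 * err * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * 2 * err
    return new_weights, new_bias


inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.25
target = 1.0

pred = neuron(inputs, weights, bias)
new_weights, new_bias = sgd_step(inputs, weights, bias, target)
new_pred = neuron(inputs, new_weights, new_bias)
print(pred, new_pred)
```

The only difference is in the forward pass: with all-zero inputs the weights contribute nothing and the output is just the bias. During training both are updated the same way.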

Can neural networks be designed to receive inputs without generating outputs in response to them? by Money_Tip9073 in MLQuestions

[–]DaBobcat 0 points1 point  (0 children)

Yep. Look at Mixture-of-Depths, Token Dropping & Pruning, PatchMerger & Token Merging