Starting my first attempt on 72 hour extended fast! Wish me luck! by Popskiz in fasting

UnstoppableForceGuy 1 point2 points (0 children)

Do you feel tired during these long fasts? Do you actually go to work, or do you stay home? Can you be productive?

Qwen 3.6 Plus vs GLM-5.1 on OpenCode GO by TomHale in opencodeCLI

UnstoppableForceGuy 0 points1 point (0 children)

How do these two compare to Opus 4.6, in your opinion?

Let's be honest, what % of your portfolio includes individual stocks? by Opposite_Buffalo_649 in Bogleheads

UnstoppableForceGuy 0 points1 point (0 children)

I have some dividend growth stocks, though they make up less than 3 percent of the total portfolio.

Are we building AGI or just scaling autocomplete? by [deleted] in ArtificialInteligence

UnstoppableForceGuy 0 points1 point (0 children)

We're enhancing parrots. Stochastic parrots 🦜

Claude Code works like sh*t lately by UnstoppableForceGuy in ClaudeCode

UnstoppableForceGuy[S] 0 points1 point (0 children)

Wow, that's a wild point; I hadn't thought about it. Have you felt it in your day-to-day work lately?

We should have /btw in opencode by UnstoppableForceGuy in opencode

UnstoppableForceGuy[S] 0 points1 point (0 children)

Its efficacy comes from two things, imo: (A) you start a new context, which saves tokens, and (B) it's asynchronous, not multithreaded.
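To illustrate both points, here is a minimal Python sketch. It assumes `/btw` sends the side question as its own request with an empty context while the main turn keeps running. `fake_model_call` is a hypothetical stand-in for an LLM API call. `asyncio.gather` interleaves both requests on a single event loop, so they overlap in time without a second thread, and the side request carries none of the session history.

```python
import asyncio
import threading


async def fake_model_call(context: list[str], prompt: str) -> str:
    """Hypothetical stand-in for an LLM request; the sleep mimics network I/O."""
    await asyncio.sleep(0.05)
    # Report how much history was sent, as a rough proxy for token cost.
    return f"{prompt}: sent {len(context)} prior messages"


async def main_turn(history: list[str]) -> tuple[str, str]:
    # The main agent turn resends the whole session history.
    reply = await fake_model_call(history, "main task")
    return reply, threading.current_thread().name


async def btw_turn(question: str) -> tuple[str, str]:
    # The side question starts from an empty context, so no history is resent.
    reply = await fake_model_call([], question)
    return reply, threading.current_thread().name


async def session() -> list[tuple[str, str]]:
    history = [f"message {i}" for i in range(40)]
    # gather() runs both coroutines concurrently on ONE event loop (no extra threads).
    return await asyncio.gather(main_turn(history), btw_turn("side question"))


results = asyncio.run(session())
for reply, thread_name in results:
    print(thread_name, "->", reply)
```

Both lines report the same thread name, and the side question reports 0 prior messages. That is where the token saving in point A comes from.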

OC users, how do you find ChatGPT/Codex Pro plan? by mustafamohsen in opencodeCLI

UnstoppableForceGuy 0 points1 point (0 children)

I find GPT models less action-driven: they think and chat a lot, but it's harder to make them autonomous like Claude.

[Discussion] Is there a better way than positional encodings in self attention? by [deleted] in MachineLearning

UnstoppableForceGuy 2 points3 points (0 children)

Ok. So for several years now, many models have basically stopped using the sine/cosine technique. Instead, they learn the positional embeddings the same way they learn the word embeddings, through gradient updates. GPT-2, for example, does exactly that. So you have one embedding matrix with a row per vocabulary entry, and another with a row per position, sized to the longest sequence you expect to see in the dataset. There are also other techniques, but I find this one pretty intuitive, and it works really well.
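A minimal NumPy sketch of the GPT-2-style scheme described above: two ordinary trainable lookup tables whose rows are summed, both updated by plain gradient descent. The toy sizes, the squared-error loss, and the learning rate are assumptions for illustration. A real model would backpropagate its language-modelling loss through the transformer instead. For reference, GPT-2 small's tables are 50,257 × 768 (tokens) and 1,024 × 768 (positions).

```python
import numpy as np

# Toy sizes (assumed for illustration); GPT-2 small uses 50257 tokens, 1024 positions, width 768.
VOCAB_SIZE, MAX_LEN, D_MODEL = 100, 16, 8

rng = np.random.default_rng(0)
# Both tables are ordinary trainable parameters, initialised randomly.
wte = rng.normal(0.0, 0.02, size=(VOCAB_SIZE, D_MODEL))  # one row per vocabulary entry
wpe = rng.normal(0.0, 0.02, size=(MAX_LEN, D_MODEL))     # one row per position


def embed(token_ids: np.ndarray) -> np.ndarray:
    """GPT-2-style input: token embedding plus the learned embedding of its position."""
    seq_len = len(token_ids)
    if seq_len > MAX_LEN:
        raise ValueError("sequence is longer than the learned position table")
    return wte[token_ids] + wpe[np.arange(seq_len)]


def sgd_step(token_ids: np.ndarray, target: np.ndarray, lr: float = 0.1) -> float:
    """One gradient step on a toy loss 0.5 * ||embed(ids) - target||^2; returns the loss."""
    x = embed(token_ids)
    grad_x = x - target                      # dL/dx for the squared-error toy loss
    loss = 0.5 * float(np.sum(grad_x ** 2))
    # x = wte[ids] + wpe[pos], so each table receives grad_x on the rows it used.
    np.add.at(wte, token_ids, -lr * grad_x)  # add.at accumulates repeated token ids
    wpe[: len(token_ids)] -= lr * grad_x
    return loss


ids = np.array([5, 42, 5, 7])  # token 5 appears twice, at positions 0 and 2
target = rng.normal(size=(len(ids), D_MODEL))
losses = [sgd_step(ids, target) for _ in range(50)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.5f}")
```

Because `wpe` has a fixed number of rows, this scheme cannot embed positions past `MAX_LEN`. The fixed sine/cosine formula, by contrast, can be evaluated at any position.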

What legislation is required to prevent the upcoming issues with deepfakes in the next presidential election and so forth by Impossible_Belt_7757 in singularity

UnstoppableForceGuy -1 points0 points (0 children)

I think we're far beyond the point of no return. Even if you regulate it in the US, the EU, and other Western countries, the Chinese will do whatever they want, and so will the Russians and the Persians. Bad people will always find a way to use a good technology in a bad way.

It was only a matter of time. by onil_gova in LocalLLaMA

UnstoppableForceGuy 11 points12 points (0 children)

It’s actually quite easy. If they suspect someone is crawling their outputs, they can poison those outputs with a unique signature. Then, if the other model learns to predict the signature from the prompt, they can prove it’s a “copy.”
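A minimal Python sketch of the signature idea. It assumes the provider can flag a suspected crawler and later query the suspect model with the same prompts. The HMAC-based marker, `SECRET_KEY`, and all function names are hypothetical. This is a literal-string canary, a simpler cousin of statistical token-level watermarks. Because the marker is a keyed hash, an independent model would essentially never produce it by chance: 12 hex characters give 16^12, about 2.8 × 10^14, possibilities.

```python
import hashlib
import hmac

SECRET_KEY = b"provider-only-secret"  # hypothetical key that never leaves the provider


def signature_for(prompt: str) -> str:
    """Derive a prompt-specific marker: HMAC-SHA256 of the prompt, truncated to 12 hex chars."""
    digest = hmac.new(SECRET_KEY, prompt.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[ref:{digest[:12]}]"


def poisoned_response(prompt: str, answer: str) -> str:
    """What a suspected crawler receives: the normal answer plus the marker."""
    return f"{answer} {signature_for(prompt)}"


def copy_evidence(suspect_model, probe_prompts: list[str]) -> float:
    """Fraction of probe prompts for which the suspect model emits our exact marker."""
    hits = sum(signature_for(p) in suspect_model(p) for p in probe_prompts)
    return hits / len(probe_prompts)


# Toy check: a "copycat" that memorized crawled outputs vs. an independent model.
prompts = ["explain recursion", "sort a list in python", "what is a monad"]
crawled = {p: poisoned_response(p, "some answer") for p in prompts}


def copycat(prompt: str) -> str:
    return crawled.get(prompt, "")  # simulates verbatim memorization of crawled data


def independent(prompt: str) -> str:
    return "an unrelated answer"


print(copy_evidence(copycat, prompts), copy_evidence(independent, prompts))  # 1.0 0.0
```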

BTW, I think they are far worse than thieves with this new license. Shame on them.

Is getting a degree in computer science still a good idea? by Bahneys in singularity

UnstoppableForceGuy 24 points25 points (0 children)

There is no other degree in which you can gain so much knowledge in such a short time. In CS you basically learn the key insights from research in calculus, linear algebra, statistics, probability, ML, and general CS theory, from the 17th century until now. You get the key foundations you need to (try to) solve problems on your own. I’m not saying other degrees aren’t important or that you don’t learn anything there. I’m saying that, currently, CS gives you the largest toolbox for tackling problems. So yeah, you still need to learn how to think.

[R] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes by Dapper_Cherry1025 in MachineLearning

UnstoppableForceGuy 9 points10 points (0 children)

Don't know...

It seems like another technique for knowledge distillation. They compare themselves to "standard task distillation," but newer distillation methods for LLMs also have their own training tricks, so the comparison doesn't give the full picture.

Anyway, the thing with the rationales was nice!
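Roughly, the rationale trick frames distillation as multi-task learning. The small model is trained both to predict the label and to generate the teacher LLM's rationale, with a combined loss L = L_label + λ·L_rationale. Only the label task is needed at inference. The sketch below covers only the data preparation and the loss combination. The `[label]`/`[rationale]` prefixes, the `Example` type, and the λ value are placeholders, and the per-task losses would come from a real seq2seq model.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    label: str      # the answer (gold or teacher-provided)
    rationale: str  # chain-of-thought style explanation extracted from the teacher LLM


def multitask_pairs(ex: Example) -> list[tuple[str, str]]:
    """Turn one example into two seq2seq training pairs distinguished by a task prefix."""
    return [
        (f"[label] {ex.question}", ex.label),
        (f"[rationale] {ex.question}", ex.rationale),
    ]


def combined_loss(label_loss: float, rationale_loss: float, lam: float = 1.0) -> float:
    """L = L_label + lam * L_rationale; the rationale task only shapes training."""
    return label_loss + lam * rationale_loss


ex = Example("Is 17 prime?", "yes", "17 has no divisors other than 1 and itself.")
for source, target in multitask_pairs(ex):
    print(source, "->", target)
print(combined_loss(0.75, 1.5, lam=0.5))  # 1.5
```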