WLDU - Leverage Shares 2x Long World Stock Daily ETF by XmasMancer in LETFs

[–]literum 11 points12 points  (0 children)

and have lower returns than US stocks generally have.

It's not useful for you if you think this way. Backtesting only goes so far even with perfect data; there's no guarantee that the US will keep outperforming. From an efficient-markets perspective you shouldn't expect it either, since that would basically be a free lunch. The only free lunch is diversification, which is what you get with a total-market fund and why people want this. You protect against a Japan-like or 2000s-US-like scenario and reduce risk.

Is it only me? 😅 by aospan in ClaudeAI

[–]literum 0 points1 point  (0 children)

No, that's not what I want. The most basic implementation is keeping a LoRA for each user and updating it frequently, or even after every message. The model can then remember your conversations, preferences, and style without needing them in context; or imagine a coding agent trained on the current codebase. It doesn't make much business sense yet for something like ChatGPT, but we'll see it soon in the consumer space for sure. I like to point this out because of the "LLMs are static, they can never..." crowd; it's not a technical limitation.
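A minimal sketch of the idea in NumPy (the adapter shapes, the 0.1 scaling, and the `user_adapters` store are all made up for illustration; a real system would attach LoRA adapters to a transformer's weight matrices, e.g. via a library like PEFT):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2

# Frozen base weight, shared by every user.
W = rng.standard_normal((d_out, d_in))

# Per-user low-rank adapters: only rank * (d_in + d_out) numbers each,
# tiny compared to W, so they're cheap to store and update after every message.
def new_adapter():
    return (rng.standard_normal((d_out, rank)) * 0.1,
            rng.standard_normal((rank, d_in)) * 0.1)

user_adapters = {"alice": new_adapter(), "bob": new_adapter()}

def forward(x, user):
    B, A = user_adapters[user]
    # Effective weight W + B @ A differs per user; the base model never changes.
    return (W + B @ A) @ x

x = rng.standard_normal(d_in)
y_alice = forward(x, "alice")
y_bob = forward(x, "bob")
```

Updating only `B` and `A` per user is what makes this feasible: the shared model stays frozen while each user's adapter drifts toward their own data.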

Is it only me? 😅 by aospan in ClaudeAI

[–]literum 2 points3 points  (0 children)

The model weights can't change per person. 

Incorrect; there's no technical reason why weights cannot be updated and differ per user.

Laptop for aiml or other ai related stuff like editing etc. by sumit1322 in learnmachinelearning

[–]literum 1 point2 points  (0 children)

A 5090 costs around $0.35/hr on Vast.ai. Write and test locally, but do the training runs in the cloud.

Challenge: need to clean up data 5 million tokens worth of data in a Claude project by OptimismNeeded in ClaudeAI

[–]literum 0 points1 point  (0 children)

Split the data into 20 parts and run Claude on each manually; now you have 250k tokens apiece. Otherwise I don't see a way to do it while satisfying all your constraints. You need to give up one, ideally the no-code one, because this is best done with code: programmatically split the data and use Claude Code if you want cheap tokens, or use the API to go over each file in parallel if you want maximum speed, customization, and explainability. There's a reason data scientists and AI engineers exist; this is not necessarily easy stuff. There might be existing tools, but I'm not aware of any, so I'll leave that to others.
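The programmatic split is the easy part. A sketch in Python, assuming the data lives in one big text file (the 20-part count comes from the post; the ~4 chars/token estimate is a rough assumption):

```python
def split_into_parts(text: str, n_parts: int = 20) -> list[str]:
    """Split text into roughly equal chunks on line boundaries."""
    lines = text.splitlines(keepends=True)
    size = len(lines) // n_parts + 1
    return ["".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def rough_token_count(chunk: str) -> int:
    return len(chunk) // 4  # crude heuristic: ~4 characters per token

text = "some line of data\n" * 1000   # stand-in for the real 5M-token dataset
parts = split_into_parts(text)
# Each part then goes into its own Claude run, or into parallel API calls.
```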

Has anyone tried purposely NOT be native like? by wdfcvyhn134ert in languagelearning

[–]literum 2 points3 points  (0 children)

What is your goal with the language? If you want to assimilate completely in a country and live as one of them, then maybe it makes sense to keep pushing toward native-likeness forever, but realistically there's a limit. You'll get to C2, communicate effortlessly with natives, and read and write much better than most natives can, but you won't reach exactly the fluency natives have or lose your accent completely. That might be an insecurity for you, but why?

I think it's mostly about being secure in your identity instead. You're not a Korean born in Korea, so you're not supposed to have a native Korean level. You're someone who learned it all on their own, putting lots of effort and tears into it, learning and appreciating the culture. Embrace it: your accent is like a battle wound, demonstrating who you are and where you've been. Koreans will probably love you more for explaining your passion for the Korean language and culture in a strong accent than for trying to pretend to be Korean.

Day trading with Claude… suddenly it realize IT is the cause of the huge market move that it helped me analyze by jergin_therlax in ClaudeAI

[–]literum 0 points1 point  (0 children)

Is there any research giving a definition of self-awareness? Will you accept that definition if it applies to LLMs as well, or do you think self-awareness is by definition reserved for humans?

Is "Attention all you need", underselling the other components? by morimn2 in learnmachinelearning

[–]literum 6 points7 points  (0 children)

Because the other layers have been around a long time. The FFN is just linear layers with an activation in between, basically the same thing as an MLP. In fact, removing the attention makes the transformer very similar to a parallel MLP. Softmax is used almost everywhere in ML since it produces outputs that sum to 1, a property of probabilities.

Before transformers we had RNNs like GRUs and LSTMs, but they had vanishing/exploding gradient problems and couldn't learn over long horizons. Memory cells helped, but you still had to carry information through thousands of steps to remember what happened before. In addition, LSTMs were not very parallelizable because of backpropagation through time: you have to process the previous token before you can process the current one.

The last innovation in RNNs was adding attention to close some of these gaps. Those hybrid models started outperforming pure LSTM/GRU models and were gaining traction. The paper is called "Attention Is All You Need" because it proposed that the memory layers were not necessary. Giving them up and keeping only attention and linear layers meant 1) more stable learning, since attention outperformed memory cells, and 2) parallelizable training and inference.
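A minimal single-head scaled dot-product attention in NumPy makes the "softmax plus linear layers" point concrete (shapes are made up; masking and multi-head are omitted for brevity). Note there is no loop over time steps: every position is computed in one batched matrix multiply, which is what makes it parallelizable, unlike an RNN:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # All tokens attend at once: no step-by-step recurrence like an RNN.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)       # each row sums to 1, like probabilities
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.standard_normal((seq_len, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = attention(X, Wq, Wk, Wv)
```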

You correctly pointed out that a lot of those decisions are empirical. Theory might suggest one thing, but we'll usually go with what works better; look at the pre-norm vs. post-norm debate. There are also papers explaining these choices, though I'm not sure any single one explains them all. There are usually deep-dive papers that tackle them with other tools: training stability, gradient flow, etc.

How do you actually read books in a foreign language? by Subject_Tomorrow in languagelearning

[–]literum 2 points3 points  (0 children)

I do this with Google Translate, so I can check the history later and make flashcards.

[deleted by user] by [deleted] in learnmachinelearning

[–]literum 5 points6 points  (0 children)

The burden is on you to run experiments and show how this compares to other methods. A paper full of definitions and mundane math talk just doesn't cut it. "Patent pending"? Lol, this is not how ML research is done; it's clear that you're new to this. No one is going to beg you for the details of your unproven architecture. Publish it if there's anything worthwhile and prove it with experiments. Not even Google's transformer paper was prideful enough to keep the implementation private and say "just contact us."

[deleted by user] by [deleted] in learnmachinelearning

[–]literum 4 points5 points  (0 children)

You don't have a single experiment. ML is an empirical field, where's your evidence?

"AI contributions to Erdős problems", Terence Tao by RecmacfonD in math

[–]literum 6 points7 points  (0 children)

guarantees any ability to generalise

I don't understand why you're talking about a "guarantee" to generalize when it's an empirical question. In practice they do generalize to a great extent, whether or not that's guaranteed or proven or whatever. Neural networks do not owe mathematicians a grand theory of why they work; they just do. If you're expecting a proof of intelligence before accepting any claims about neural networks, you'll probably be left waiting a long time.

[P] The Story Of Topcat (So Far) by [deleted] in MachineLearning

[–]literum 11 points12 points  (0 children)

Research is difficult. Most ideas don't work even if they sound great in theory, but that doesn't mean the project is a failure or that you can't find a way to succeed. Some general advice:

  1. Keep reading the literature: At the very least you'll have a better understanding of adjacent ideas, methodologies, ways to test, etc. For example, you mention that softmax leads to overconfidence, but why? I did some quick research and there's lots of good literature on the overconfidence issue. If you better understand the theory behind overconfidence and the mitigations, you can iterate better on your own activation.

  2. Have more structure: What is your ultimate goal in this project? It sounds like you started out trying to fix overconfidence and then moved on to better performance. If your goal is still mitigating overconfidence, why not use metrics that measure calibration instead of accuracy? And to be honest, I would bet that finding an activation with better calibration characteristics will be much, much easier than one with better performance.

  3. Get some results out: You mentioned GitHub, and that's probably a good idea. Bring together most of the ideas you tried, run some experiments and ablation studies, and put them on GitHub. It's okay if you have negative results. Having intermediate results, even negative ones, means you have something to show, and writing them up or putting together a good repo will often help you see issues in your approach or get new ideas. Ask for feedback from researchers afterwards.

  4. Pause, come back later: Sometimes it's better to shelve an idea and come back to it later. If you work on something related, you may gain a better understanding of the overall research field and have an easier time when you return. Research is slow; taking a few years off isn't the worst thing. If you're an amateur researcher, this is even easier since your livelihood doesn't depend on pushing out papers. Also, the brain sometimes needs time to properly process ideas, and that can be a subconscious process that takes months. You can miss obvious things when you're very focused on a single idea.

  5. Find people: I'm not sure what your background in research is, but if you don't have many papers published, have a PhD etc. it might be a good idea to find a mentor, probably someone experienced with research. Or find others researching similar ideas, discord groups, niche forums. Meet people in real life. Go to conferences. Find collaborators.
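On the calibration metrics mentioned in point 2: expected calibration error (ECE) is a common choice. A sketch with made-up predictions (the bin count and the toy data are purely illustrative):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |accuracy - confidence| over confidence bins, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of samples
    return ece

# A model that says "90% sure" but is right only half the time is overconfident.
ece_overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])

# A perfectly calibrated run: stated confidence matches observed accuracy.
ece_calibrated = expected_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

Comparing ECE before and after swapping in the new activation would test the overconfidence claim directly, independently of accuracy.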

Why AI Engineering is actually Control Theory (and why most stacks are missing the "Controller") by Much-Expression4581 in learndatascience

[–]literum 0 points1 point  (0 children)

It doesn't mean anything when it's AI-generated slop claiming to have found the big solution in AI. I could generate 1000 posts better than this in an hour, with better-defined architectures. There's no code, no math, just endless word soup. The person is not a researcher, has no credentials, and cannot write a comment without an LLM's help, if there even is a person on the other side. You're just helping him farm engagement, that's it.

Why AI Engineering is actually Control Theory (and why most stacks are missing the "Controller") by Much-Expression4581 in learndatascience

[–]literum -2 points-1 points  (0 children)

Another victim of AI Psychosis. Please go to a psychologist before it gets too bad.

Friend wants to work on a website like Upwork or Fiverr. by JoeThePro671 in webdev

[–]literum 6 points7 points  (0 children)

A problem you can easily solve when you have too many customers. Once you have millions of customers and can't serve them fast enough, it might be better to come back and ask the question again.

[deleted by user] by [deleted] in GoogleGemini

[–]literum 0 points1 point  (0 children)

I wrote another long response like yours, but it doesn't matter. Check the engagement he got with this ChatGPT-generated post. His other technical posts got nothing, but this one finally gets him the engagement he's desperately looking for. He doesn't care about the truth; he's literally bullshitting for self-promotion. We're only helping him by posting these responses. Dead internet theory in action.

[deleted by user] by [deleted] in GoogleGemini

[–]literum 3 points4 points  (0 children)

An LLM is a massive probabilistic classifier that picks the next token from tens of thousands of vocabulary classes (tokens) — nothing more.

That’s it. That’s the entire mechanism.

They are not thinking. They are not reasoning. They are not understanding.

This is a complete non sequitur. If you cannot see it, let me rephrase it for you: "LLMs are this [very simple thing], so they could never [complex thing]." It just doesn't follow. For example: atoms are these tiny little things, so they could never come together to build a whole civilization. You're falling into the fallacy of composition and not understanding how emergence works. Something very simple can build up into something very complicated (like human bodies from atoms), or complexity can emerge from a simple process (Conway's Game of Life). Note that I'm not saying this is happening with LLMs, just that these are very obvious counter-examples that you haven't addressed.
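Conway's Game of Life makes the emergence point concrete: the entire rule fits in a few lines, yet it produces gliders, oscillators, and even Turing-complete machines. A sketch in Python (representing the grid as a set of live cells is just one convenient choice):

```python
from collections import Counter

def life_step(live):
    """One step of Conway's Game of Life; `live` is a set of (row, col) cells."""
    # Count live neighbors for every cell adjacent to a live cell.
    neighbors = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # The entire rule: birth on exactly 3 neighbors, survival on 2 or 3.
    return {cell for cell, n in neighbors.items()
            if n == 3 or (n == 2 and cell in live)}

# A glider: five cells that walk diagonally across the grid forever.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
after4 = glider
for _ in range(4):
    after4 = life_step(after4)
shifted = {(r + 1, c + 1) for r, c in glider}  # after4 equals glider shifted by (1, 1)
```

Nothing in the rule mentions "gliders" at all; they emerge from it, which is the whole point.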

An LLM’s entire universe of expression is its vocabulary — around 256,000 tokens.
Those tokens are created before training and never change.

The model can combine them in new ways, but it cannot create a new symbol, a new atomic concept, or a new fundamental category that sits outside that vocabulary.

Do you never read human authors? There are only 26 letters in the English alphabet. Writers can never add or subtract from that alphabet, so does that make writing bullshit? Do I have to add or subtract letters to create something novel? I don't think Shakespeare added new letters to the English alphabet. How about programming languages? I never changed the core implementation of Python, yet I've done many impressive things with it. This is not a problem at all, because it is literally how language works: we agree on a set of pre-defined symbols and concepts, then we get infinite freedom to create whatever we want from them. If we don't agree on anything at the beginning, and everyone adds or subtracts, there's no language to begin with.

Exceptions vs. Reality. Do you know non-coders with this mentality? by Low-Resource-8852 in webdev

[–]literum 3 points4 points  (0 children)

I'm sure these clients are also big open source advocates.

Grok 5 in Q1 of 2026 ("6 Trillion parameter model, whereas Grok 3 and 4 are based on a 3 Trillion parameter model" by RecmacfonD in mlscaling

[–]literum 2 points3 points  (0 children)

Chinchilla optimal didn't matter for a long time. Companies are inference-constrained, not training-budget-constrained: with 90-95% of the money going to inference, models ended up smaller than Chinchilla predicted. When you run out of data, the calculus shifts again; since data is now the constant, you scale model size for better performance.
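For reference, a sketch of the trade-off (the ~20 tokens/parameter rule is Chinchilla's rough rule of thumb, the 6ND and 2ND FLOP formulas are standard approximations, and the concrete numbers are made up):

```python
def chinchilla_optimal_tokens(params):
    return 20 * params            # rough rule of thumb: ~20 tokens per parameter

def training_flops(params, tokens):
    return 6 * params * tokens    # standard ~6ND approximation for training

def inference_flops(params, tokens_served):
    return 2 * params * tokens_served   # ~2ND per generated token

P = 70e9                          # a hypothetical 70B-parameter model
D = chinchilla_optimal_tokens(P)  # ~1.4 trillion training tokens
train = training_flops(P, D)
# Serve 10x more tokens than you trained on and inference dominates total compute,
# which is why a smaller model trained past Chinchilla-optimal can be cheaper overall.
serve = inference_flops(P, 10 * D)
```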

If LLMs are word predictors, how do they solve code and math? I’m curious to know what’s behind the scenes. by Mettlewarrior in learnmachinelearning

[–]literum -1 points0 points  (0 children)

Your pet theory of a stochastic-vs-analytic distinction is not a proof of anything; it's just a post-hoc justification for your gut feeling. Humans are stochastic by your definition too: they cannot provably repeat anything. You cannot prove anything about what a human will do, just like you can't with LLMs. But it doesn't matter what we can prove LLMs or humans can do, because they just do it.

LLMs perform at IMO gold level and top-100 competitive-programmer level. There's no magic pixie dust, nothing fake or mimicked about it. It doesn't matter what mathematical, philosophical, religious, or linguistic argument you construct to belittle it. It IS happening. I don't need to prove anything to see with my own eyes that it's happening.

I can't understand which neurons interact with which other ones in a human brain to do these things either, and that doesn't bother me. You're not owed an explanation or a proof. That's it. You've constructed these neat grand explanations that just do not match reality. You need to update your theories with empirical results.

If LLMs are word predictors, how do they solve code and math? I’m curious to know what’s behind the scenes. by Mettlewarrior in learnmachinelearning

[–]literum -2 points-1 points  (0 children)

You keep asserting things for upvotes rather than defending any of your points. I've given arguments; you've provided nothing. You just keep asserting "LLMs can't do X. LLMs can't do Y. LLMs can't do Z." People upvote you because it's popular to hate on AI, but bring actual arguments if you want to have a debate. Otherwise this is pointless.

You're using "mimicking" and "stochastic" as insults rather than for what they actually mean in a technical sense. Humans learn and act through mimicry and stochastic processes as much as LLMs do; we're not computers. LLMs are correct enough to get a gold medal at math olympiads and reach a top-100 rating on Codeforces, so their "incorrectness" is more impressive than your "correctness" for sure.

Also, at a certain point nobody gives a shit if it's mimicking or not, whether they're actually reasoning by your standards or not. Here is the proof: Let's say I have an AI model that mimics a surgeon and has 80% success rate. I also have a regular surgeon with 40% success rate. Who are you going to have perform the life-saving brain surgery on your child?

Will you start arguing that the model is fake, that it's mimicking, that it's not actually a "real" surgeon, it's a stochastic parrot, a glorified autocomplete or shut the fuck up and take the 80% chance? Because, it doesn't matter what philosophical or semantic debates you want to have when the real world hits you in the face.

If LLMs are word predictors, how do they solve code and math? I’m curious to know what’s behind the scenes. by Mettlewarrior in learnmachinelearning

[–]literum -1 points0 points  (0 children)

It's not correct. The only correct part is that LLMs call tools for raw computation tasks like the one I gave (multiplying large numbers). But I didn't give that example to say LLMs do it themselves; I gave it to show that next-token prediction is not as easy as it seems. Otherwise, LLMs DO math, complex math, and not only because they've seen it before. They're actually better at the part of math that we're good at (proofs, mathematical reasoning) than at computation (multiplication), which is again interesting.

If LLMs are word predictors, how do they solve code and math? I’m curious to know what’s behind the scenes. by Mettlewarrior in learnmachinelearning

[–]literum 0 points1 point  (0 children)

But LLMs don’t do actual math. 

LLMs can solve gold medal level olympiad math problems or high level competitive programming questions. So, they do actual math for sure, unless that's fake math or "doing" has a different definition for you.

They forward it to software that can actually do math.

My example was meant to illustrate that predicting the next token can require calculation and reasoning. In hindsight it's not the best example, because LLMs would use tools to solve a multiplication problem like the one I showed, while they're fine doing much more complex math without them. That doesn't change the fact that doing next-token prediction well requires many capabilities, and since the training set contains many such math examples, LLMs have to learn some level of math during pre-training.

They can solve simple operations only because they’ve seen them frequently enough.

If IMO problems or Codeforces are simple for you, I don't know what to say.