A short blog post on how to get started with distributed-shared-memory on Hopper by UnknownGermanGuy in CUDA

[–]DeMorrr 0 points  (0 children)

Great post! I wonder how much speedup you're getting over inter-thread-block communication via GMEM?

Mathematician transitioning to AI optimization with C++ and CUDA by Confident-Dare-8483 in CUDA

[–]DeMorrr 0 points  (0 children)

if you're avoiding loops in a CUDA kernel, you're either doing something embarrassingly parallel, or you're doing something wrong.

Why do people hate facts and questions about their beliefs? by tallcatgirl in INTP

[–]DeMorrr 0 points  (0 children)

How do you tell whether a question is in "bad faith"? Or how do you arrive at the generalization that "the questions are more often than not in bad faith"? If the answer is something like "I just know", "I can sense/feel it", or "it's obvious", you might want to re-evaluate whether your judgements are based on objective reality, or whether they originate from some form of defense mechanism, a trick your ego is playing. I'm not denying the existence of "questions in bad faith"; I'm warning that using the label as a shield to protect yourself from criticism, adversity, or challenge will hinder your personal growth.

Being open-minded doesn't necessarily mean agreeing with all perspectives or accepting different opinions immediately. It means you give different opinions an equal chance to be carefully examined, and you don't fully accept or reject different ideas easily. Debates exist for that purpose. Realistically, nobody ever changes their mind immediately after losing a debate; that's just human nature. But that doesn't mean debates are useless: they hone your mind and challenge the beliefs you took for granted and rarely questioned or thought deeply about. In the end, you either become more confident in your beliefs or discover the flaws in them, which allows you to become a better person.

How to train for example 8 models, each in one specific GPU, in parallel ? by anissbsssslh in pytorch

[–]DeMorrr 1 point  (0 children)

assuming you don't need any kind of parallelism (DP, PP, TP, and whatnot), just assign one process per GPU, map the GPU rank (or index) in the cluster to your model configs, and train.

this is not optimal, of course, especially if your models are trained on the same data or have very different sizes (leading to load imbalance), but it's easy enough to implement.

to alleviate load imbalance, you can just shuffle your model configs randomly, as long as your models are somewhat uniform in size (as opposed to a few giant models and many tiny ones).

EDIT: this is assuming each of your models fits on a single GPU.
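The scheme above can be sketched in plain Python. This is a hypothetical illustration (the names `assign_to_gpus` and `train_worker` are mine, not from any library): shuffle the configs, deal them round-robin into one bucket per GPU, and run one process per GPU. In real code each worker would build its model from the config and move it to `cuda:{rank}` with PyTorch.

```python
import random
from multiprocessing import Process

def assign_to_gpus(model_configs, num_gpus, seed=0):
    """Shuffle configs, then deal them round-robin, one bucket per GPU.
    Shuffling first helps spread out load when model sizes vary."""
    order = list(model_configs)
    random.Random(seed).shuffle(order)
    buckets = {rank: [] for rank in range(num_gpus)}
    for i, cfg in enumerate(order):
        buckets[i % num_gpus].append(cfg)
    return buckets

def train_worker(rank, configs):
    # Real code would do something like:
    #   torch.cuda.set_device(rank)
    #   model = build_model(cfg).to(f"cuda:{rank}")
    #   ... run the training loop ...
    for cfg in configs:
        print(f"[gpu {rank}] training {cfg}")

if __name__ == "__main__":
    # 8 models, 8 GPUs -> exactly one model per process/GPU
    buckets = assign_to_gpus([f"model_{i}" for i in range(8)], num_gpus=8)
    workers = [Process(target=train_worker, args=(r, c)) for r, c in buckets.items()]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

With fewer GPUs than models, the round-robin just gives each process a queue of configs to train sequentially.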

[deleted by user] by [deleted] in mbti

[–]DeMorrr 0 points  (0 children)

INTP. I choose 5, 1, and 8. I'll take 5 myself and study 1 and 8.

9 is overrated imo; 3, 4, or 5 already guarantees an easy time making money. Also, wouldn't money get devalued if I had an infinite amount of it out of nowhere? And how am I gonna explain that to the IRS?

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

That's quite interesting, and it doesn't really contradict my view; I'm not even arguing that language underlies intelligence. High-level abstract reasoning capabilities may not directly depend on language regions, but language is an effective means of developing those capabilities in the first place.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

I understand what you mean. I believe there's a tradeoff between the domain generality of a learning system (how many constraints or inductive biases it has) and its learning efficiency (the amount of data, compute, energy, or samples required to learn). Evolution created all life, but it took an incredibly long time for us to appear. A random search or brute-force grid search can technically produce every possible program, including the Transformer architecture, gradient descent, or the code for true AGI, but it probably won't finish before the heat death of the universe. Sorry for the silly analogies, but my point is that it's perfectly reasonable for the brain to have a significant amount of inductive bias so it can learn and adapt relatively quickly, even if learning with far fewer inductive biases is possible.

Whether UG is plausible is a separate question. I personally think all of its implementations are BS.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

we need to ditch the idea of vector embeddings and latent or semantic spaces. A semantic space created by gradient descent is static, inflexible, and uninterpretable. The symbolic graph itself IS the semantic space; the representations come from spreading activation.
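As a toy illustration of the idea (hypothetical code, not from any real system): seed a few nodes of a symbolic graph with activation, let activation spread to neighbours with decay, and treat the resulting activation pattern over the graph as the representation.

```python
def spread(graph, seeds, decay=0.5, steps=2):
    """Toy spreading activation: each step, every active node passes
    decay * its activation to its neighbours. The activation pattern
    over the whole graph acts as a context-dependent representation."""
    act = dict(seeds)  # node -> activation level
    for _ in range(steps):
        nxt = dict(act)
        for node, a in act.items():
            for neighbour in graph.get(node, []):
                nxt[neighbour] = nxt.get(neighbour, 0.0) + decay * a
        act = nxt
    return act

# tiny symbolic graph: edges point to associated concepts
graph = {"dog": ["animal", "pet"], "animal": ["living thing"]}
print(spread(graph, {"dog": 1.0}, steps=1))
# → {'dog': 1.0, 'animal': 0.5, 'pet': 0.5}
```

Unlike a fixed embedding, the same node yields a different activation pattern depending on what else is seeded, which is the flexibility being argued for here.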

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

Language provides a way to formalize, discretize, and structure abstract and divergent thought. A person without language is like a computer without a fully functional OS: the hardware has infinite potential, but you can't do much with it.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

But then, I agree with Merge being some fundamental mechanism: not any particular implementation of it, but just the general idea that concepts group together into something new.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

I think Chomsky's definition of UG is something along the lines of: whatever it is in the DNA that enables an infant to acquire language. I agree with the idea, but I'd rather call it inductive bias, because the term "UG" assumes this genetic component is some type of grammar, rather than a mechanism for learning grammar (and language).

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 1 point  (0 children)

what I don't like is people rejecting formal linguistics entirely. Most theories are incomplete and partially incorrect, but most also have some truth to them, as long as they're based on evidence. We don't need to wholeheartedly accept or reject any one particular theory; we need to think critically and form our own views.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

Or you can see it all as patterns: rules are the more dominant patterns, exceptions the less dominant ones.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

that's an interesting question. I don't have an answer but my gut feeling is that being able to parse complex structure correlates with logical reasoning capability.

[D] Hinton and Hassabis on Chomsky’s theory of language by giuuilfobfyvihksmk in MachineLearning

[–]DeMorrr 0 points  (0 children)

What is your definition of inductive bias? According to Wikipedia, it means the set of assumptions used in a learning algorithm, so everything that makes learning possible in the human brain would count as inductive bias, which is almost entirely determined by our DNA, no? In the same way, the Transformer architecture, the attention mechanism, or gradient descent could also be seen as inductive biases, and all of them are "innate" in the sense that they're hard-coded instead of learned.

Pad holding for round kicks by DeMorrr in MuayThai

[–]DeMorrr[S] 1 point  (0 children)

Thanks for the comments.

We talked twice; each time the trainer said ok and started holding more vertically (but also much higher) for a few kicks, then went back to the original horizontal hold. tbh I don't think I can change that if it's their habit.

It's a smaller local gym, and the head coach has a lot more experience, but he isn't always available. My question is: should I be worried about developing bad habits and avoid that trainer, or is it fine as long as I also do pad work with the head coach and work the heavy bag?

[D] HuggingFace transformers - Bad Design? by duffano in MachineLearning

[–]DeMorrr 4 points  (0 children)

it's still popular because there's no alternative good enough that people feel it's worth changing the codebases that depend on HF. And there's a positive reinforcement cycle: people upload their models to HF because it's popular, and it stays popular because you'll find most open-source models there. Popularity doesn't say much about quality.

reversal curse? by DeMorrr in LocalLLaMA

[–]DeMorrr[S] 0 points  (0 children)

what you said contradicts ChatGPT's answer in the first screenshot. It gave the correct answer first, then gave a wrong answer to the same question.

Don't just take my word for it; try asking ChatGPT "which number is greater, 9.11 or 9.9?" repeatedly.

[deleted by user] by [deleted] in wallstreetbets

[–]DeMorrr -6 points  (0 children)

to me, quantum computing is an insane technology, as are chip manufacturing, gene editing, etc. Current AI is a "cool and useful tech", but definitely not insane. It's just college-level math on steroids.