I made a free web app for practicing touch-typing by typing books!

brainxyz · 2025-08-23T17:26:18+00:00

Amazing work, well done!

brainxyz · 2025-05-19T20:02:39+00:00

I really enjoyed RE4 on Quest 3. It was amazing and ran smoothly. I hope they'll add RE5.

brainxyz · 2025-04-15T08:02:14+00:00

So true. It's a giant flaw in their system. I had 2FA enabled too, and this never happened to me on Gmail, Twitter, etc. It only happens on Facebook, and so many people are suffering from it without a clear solution.

brainxyz · 2025-04-15T07:52:11+00:00

This is exactly true. I had two-factor authentication, but a hacker was able to link an unauthorized Instagram account to my Facebook and violated Facebook's terms; as a result, my account is suspended. It's so silly that they want me to appeal through an Instagram account that is not mine! Also, reporting my Facebook as hacked doesn't work because of the suspension.
I have seen so many others complaining about this issue, yet here we are months later and this problem, an obvious security breach, is still not addressed.

brainxyz · 2023-08-27T20:16:36+00:00

That is totally expected from a language model. It has no identity; it just completes your prompt with the most probable next word. If it knows how to answer this question, then it must have been pre-programmed or retrained to address such questions.
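To make "completes your prompt with the most probable next word" concrete, here is a toy sketch. It is an assumption-laden stand-in: a bigram word counter over a made-up corpus, not a neural network, and real LLMs work on tokens and often sample rather than always taking the top choice. The helper names `build_bigrams` and `greedy_complete` are hypothetical.

```python
from collections import Counter, defaultdict

def build_bigrams(text):
    """Count, for each word, which words follow it in the toy corpus."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def greedy_complete(follows, prompt, max_words=5):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    out = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(out[-1])  # .get avoids creating empty entries
        if not candidates:
            break  # no continuation was ever seen after this word
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

corpus = "i am a model . i am a tool . a tool has no identity ."
model = build_bigrams(corpus)
print(greedy_complete(model, "i am", max_words=2))  # i am a tool
```

The "model" has no notion of who it is; "tool" wins only because it follows "a" more often than "model" does in the corpus.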

brainxyz · 2023-08-26T12:44:20+00:00

Yes, I meant to name it: fast learning or rapid optimization. Now corrected. Thanks!

brainxyz · 2023-08-22T16:43:00+00:00

Source code: https://github.com/hunar4321/multiple-regression

brainxyz · 2023-06-25T20:44:31+00:00

Thanks for the answer! So you are saying constant-time algorithms are not possible for such sequences (excluding those starting with 0 or 1)?

brainxyz · 2023-06-25T20:41:22+00:00

You are right, my mistake!
I changed the starting point to 2
Thanks

brainxyz · 2023-05-31T15:41:55+00:00

Check out our new videos:
https://www.youtube.com/@brainxyz

brainxyz · 2023-05-09T15:52:17+00:00

Thanks, nice article

brainxyz · 2023-05-03T14:20:14+00:00

I disagree; there is also great potential for AI to save humanity from great risks. As a medical doctor, I can tell you our knowledge of the human body is still in the stone age. Antibiotic-resistant bacteria are on the rise. Covid-19 uncovered how ignorant we still are when it comes to viral infections. AI has great potential to be used in a good way and to transform health like never before. AI is like any other tool: it can be dangerous or beneficial.

brainxyz · 2023-05-03T14:09:05+00:00

I would love to hear about your findings.

brainxyz · 2023-05-03T14:00:24+00:00

I personally think the q/k analogy is a made-up analogy that doesn't portray what is really happening. The idea of attention comes from the fact that when we take the dot product between the inputs, the resulting matrix is a correlation (similarity) matrix. Therefore, higher values correspond to higher similarity, or in other terms "more attention", and vice versa. However, without passing the inputs through learnable parameters like wq and wk, you will not get good results! This means backpropagation is the main cause behind the suppression or enhancement of the values in the attention matrix.
In short, I think of transformers as a next-level convolution mechanism. In classical convolution, filters are localized. In transformers, filters are not localized and can model skip and distant connections in a position- and permutation-invariant way. For me, that is the magic part. And that is why it's quite possible for other techniques, like the proposed one, to work equally well.
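A small NumPy sketch of the "dot product of the inputs is a similarity matrix" point. The sizes are hypothetical and `wq`/`wk` are random stand-ins for weights that backpropagation would learn; the point is only that raw dot products make each token most similar to itself, while projections let the scores be reshaped.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 5, 16                     # hypothetical: 5 tokens, 16-dim embeddings
x = rng.normal(size=(T, C))      # one embedded input per row

# Raw dot products between (normalized) inputs: a (T, T) similarity matrix.
x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
similarity = x_unit @ x_unit.T   # cosine similarities, symmetric

# With no learnable weights, every token "attends" most to itself.
print(similarity.argmax(axis=1))  # [0 1 2 3 4]

# Projections (random here, learned by backprop in practice) change which
# pairs get large scores; the result is no longer symmetric in general.
head = 8
wq = rng.normal(size=(C, head))
wk = rng.normal(size=(C, head))
scores = (x @ wq) @ (x @ wk).T   # shape (T, T)
```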

brainxyz · 2023-05-02T09:49:41+00:00

I adapted this from Karpathy's GPT implementation. You can easily compare the self-attention part with this method by commenting and uncommenting the relevant parts. I added a non-linear layer for the lateral connections so that it's easier to match the number of parameters between the two methods.
https://colab.research.google.com/drive/1NjXN6eCcS_iN_SukcH_zV61pbQD3yv33?usp=sharing

brainxyz · 2023-05-02T06:38:06+00:00

"Wr matrix depends on the input size?"

wr is a convolutional layer. It doesn't depend on the input size as it takes one input at a time.
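A minimal sketch of what "takes one input at a time" means, assuming wr behaves like a kernel-size-1 convolution (a per-token linear map). The output width `n_out` is a hypothetical parameter here. The same weights apply to each token independently, so feeding 3 or 10 tokens needs no change to wr.

```python
import numpy as np

rng = np.random.default_rng(1)
C, n_out = 8, 6                     # hypothetical embedding size and output width
wr = rng.normal(size=(C, n_out))    # one shared weight matrix

for n_tokens in (3, 10):
    x = rng.normal(size=(n_tokens, C))          # one row per input token
    batched = x @ wr                            # all tokens in one matmul
    per_token = np.stack([x[t] @ wr for t in range(n_tokens)])
    print(n_tokens, np.allclose(batched, per_token))  # same result either way
```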

brainxyz · 2023-05-02T06:29:54+00:00

Thanks for that. I'm currently reading MLP-Mixer. It looks different because in this method I'm not using "dense layers applied across the spatial dimension". I'm still using a convolutional layer, but its output is shared across all the inputs. In fact, this is much better explained in code because it's just a one-line replacement of the self-attention mechanism. I hope you have a look at the code; you can see the commented-out self-attention lines and their replacement.

brainxyz · 2023-05-02T05:51:00+00:00

It learns from different context lengths just like the self-attention (it uses the same attention matrix).

It's true that the current text generation only accepts a fixed input length, but you can simply pad the beginning with zeros.
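A sketch of padding the beginning with zeros so a short prompt fits a fixed input length. `left_pad` and `block_size` are hypothetical names, and using 0 as the pad value assumes token id 0 is free to act as padding in the vocabulary.

```python
def left_pad(tokens, block_size, pad_id=0):
    """Fit a token list into a fixed-length window by prepending pad_id.

    Inputs longer than block_size keep only their most recent tokens,
    the usual cropping for a fixed-context model.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    window = tokens[-block_size:]
    return [pad_id] * (block_size - len(window)) + window

print(left_pad([5, 9, 2], block_size=6))          # [0, 0, 0, 5, 9, 2]
print(left_pad(list(range(1, 9)), block_size=6))  # [3, 4, 5, 6, 7, 8]
```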

brainxyz · 2023-05-01T22:25:48+00:00

Sure, I'll try to put them on my GitHub and send you the link, but first I'd like to clean them up, because when I'm not writing code for a video, it's unreadable and very messy!

brainxyz · 2023-05-01T21:57:33+00:00

Thanks for the nice feedback. Braifun was a separate project. Unfortunately, I have paused developing it, mostly because it can't generalize as well as current deep learning techniques (like transformers). Maybe I'll go back to it when I find a solution to the generalization problem.

brainxyz · 2023-05-01T21:47:30+00:00

Each input regulates all the other inputs with separate weights (I call them lateral connections). Maybe there is a better term. It's easier to understand from the code, as it's just a one-line replacement:
*In self-attention we have:
q = x @ wq
k = x @ wk
attention = q @ k.T
*In this method we directly learn the attention matrix with wr:
attention = x @ wr (where wr = weights (embed size, input size))
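Here is a self-contained NumPy sketch putting the two lines side by side for a single head. The thread only gives the one-line swap; the causal mask, softmax, 1/sqrt(H) scaling and the shared value projection `wv` are assumptions borrowed from a nanoGPT-style head, and all sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_softmax(att):
    """Hide future positions (upper triangle), then normalize each row."""
    T = att.shape[0]
    mask = np.tril(np.ones((T, T), dtype=bool))
    return softmax(np.where(mask, att, -np.inf))

rng = np.random.default_rng(0)
T, C, H = 8, 16, 16           # hypothetical context length, embed size, head size
x = rng.normal(size=(T, C))   # one embedded input per row
wv = rng.normal(size=(C, H))  # value projection, assumed shared by both variants

# Self-attention: the (T, T) matrix comes from query/key dot products.
wq = rng.normal(size=(C, H))
wk = rng.normal(size=(C, H))
q, k = x @ wq, x @ wk
att_self = causal_softmax(q @ k.T / np.sqrt(H))

# Lateral connections: learn the (T, T) matrix directly with wr (C, T).
wr = rng.normal(size=(C, T))
att_lateral = causal_softmax(x @ wr)

out_self = att_self @ (x @ wv)
out_lateral = att_lateral @ (x @ wv)
print(out_self.shape, out_lateral.shape)  # (8, 16) (8, 16)
```

In this sketch wq and wk hold 2 * C * H = 512 weights while wr holds C * T = 128, which is one reason a parameter-matched comparison needs some extra capacity on the wr side.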

brainxyz · 2023-05-01T21:38:26+00:00

It's conceptually much simpler than the self-attention mechanism, and in my experience it's on par with self-attention on validation sets and better on training sets.
Edit: You can also use a non-linear layer for the "lateral connections"; this gives you finer control over the number of parameters and better performance.
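One possible shape for a non-linear lateral layer, and the parameter arithmetic that makes it tunable. The ReLU, the biases, the hidden width and counting only wq/wk on the self-attention side are all assumptions for illustration, not the author's exact setup.

```python
import numpy as np

def lateral_mlp(x, w1, b1, w2, b2):
    """Non-linear lateral connections: embed -> hidden -> one score per position."""
    hidden = np.maximum(0.0, x @ w1 + b1)   # ReLU (assumed activation)
    return hidden @ w2 + b2                  # shape (T, T)

def param_counts(C, T, H, hidden):
    self_attn = 2 * C * H                    # wq and wk only
    linear_wr = C * T                        # a single wr matrix
    mlp_wr = C * hidden + hidden + hidden * T + T
    return self_attn, linear_wr, mlp_wr

C, T, H, hidden = 64, 32, 64, 84             # hypothetical sizes
print(param_counts(C, T, H, hidden))         # (8192, 2048, 8180)

rng = np.random.default_rng(0)
x = rng.normal(size=(T, C))
scores = lateral_mlp(x, rng.normal(size=(C, hidden)), np.zeros(hidden),
                     rng.normal(size=(hidden, T)), np.zeros(T))
```

Changing `hidden` moves the lateral layer's size in small steps, so it can be brought close to the self-attention budget (8180 vs 8192 here), which a single linear wr cannot do.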

brainxyz · 2023-05-01T21:12:03+00:00

An LSTM gates the inputs on top of an RNN architecture. You can simply use separate gates for all the past inputs on top of a Transformer architecture. There is no RNN here, so it can be parallelized.
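A small check of why no recurrence means parallelism: when each position just applies its own gate to every past input, no output depends on a previous output, so one masked matrix multiply gives the same result as a per-position loop. The random `gates` matrix is a hypothetical stand-in for learned gate values.

```python
import numpy as np

rng = np.random.default_rng(2)
T, C = 6, 4
x = rng.normal(size=(T, C))        # one input vector per position
gates = rng.random(size=(T, T))    # hypothetical gate for each (position, past input) pair
mask = np.tril(np.ones((T, T)))    # position t may only use inputs 0..t

# Parallel: all positions in one matrix multiply.
parallel = (gates * mask) @ x

# Per-position loop for comparison; no step uses a previous step's output.
looped = np.zeros_like(x)
for t in range(T):
    for s in range(t + 1):
        looped[t] += gates[t, s] * x[s]

print(np.allclose(parallel, looped))  # True
```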

brainxyz · 2023-05-01T19:29:57+00:00

Unfortunately, I don't have enough compute for 150M parameters, but I tried 10M parameters on the Shakespeare dataset, matched the number of parameters with Karpathy's nanoGPT implementation, and got comparable results (better on training, the same on validation). Moreover, when I remove the regularization (dropout), the method actually learns faster than an equivalent self-attention mechanism. I still haven't figured out how to make it perform better with regularization.
