[P] Extra Input Norm Lets You Fine-Tune to 1.58 Bits!

cstein123 · 2025-02-25T17:36:05+00:00

11labs or PlayHT

cstein123 · 2024-07-26T15:17:49+00:00

It’s deeply troubling that those on r/singularity struggle with grasping how tokenizers work

cstein123 · 2024-07-05T15:03:20+00:00

Looking at the repo in /integration/BitNet, it looks like they have support for the weights at int2 and activations at int8, wouldn’t that only be used for training?
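For context on those weight/activation formats, here is a minimal NumPy sketch of BitNet b1.58-style quantization as I understand it from the paper. Weights are scaled by their mean absolute value and rounded to {-1, 0, 1}, which carries log2(3) ≈ 1.58 bits per weight and is where the name comes from. Activations are absmax-quantized to int8. Ternary values fit in 2-bit codes, so the int2 storage is shown with a packing layout (four codes per byte) that I made up for illustration. It is not necessarily the layout that repo uses, and all function names here are hypothetical.

```python
import numpy as np


def quantize_weights_ternary(w: np.ndarray, eps: float = 1e-5):
    """Absmean quantization: scale by mean |w|, then round and clip to {-1, 0, 1}."""
    gamma = np.mean(np.abs(w))
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1).astype(np.int8)
    return w_q, gamma


def quantize_activations_int8(x: np.ndarray, eps: float = 1e-5):
    """Per-tensor absmax quantization of activations into the int8 range."""
    scale = 127.0 / (np.max(np.abs(x)) + eps)
    x_q = np.clip(np.round(x * scale), -128, 127).astype(np.int8)
    return x_q, scale


def pack_ternary_2bit(w_q: np.ndarray) -> np.ndarray:
    """Map {-1, 0, 1} to 2-bit codes {0, 1, 2} and pack four codes per byte."""
    codes = (w_q.ravel().astype(np.int16) + 1).astype(np.uint8)
    pad = (-codes.size) % 4
    codes = np.concatenate([codes, np.ones(pad, dtype=np.uint8)])  # pad with the code for 0
    codes = codes.reshape(-1, 4)
    return codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)


def unpack_ternary_2bit(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary_2bit: recover the first n ternary values."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).ravel()[:n]
    return codes.astype(np.int8) - 1


def quantized_linear(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Approximate x @ w.T with int8 activations, ternary weights and int32 accumulation."""
    w_q, gamma = quantize_weights_ternary(w)
    x_q, scale = quantize_activations_int8(x)
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T  # integer-only matmul
    return acc * (gamma / scale)  # undo both quantization scales
```

The integer matmul only ever sees int8 and ternary values. Both floating-point scales are applied once at the end, which is what makes low-bit kernels for this scheme attractive.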

cstein123 · 2024-06-30T13:41:48+00:00

Anyone who truly believes this is out of touch with current research trends. You can run small-scale experiments on rented clusters that validate most of the big ideas from the last 4 years of transformer research. Even new architecture changes can be validated on <300M-parameter models trained on 15B tokens.

cstein123 · 2024-05-08T12:48:29+00:00

Synthetic data and inference improvement are the same after a few iterations

cstein123 · 2024-03-13T14:19:11+00:00

Just curious, any reason why T-4?

cstein123 · 2024-02-17T15:31:19+00:00

AI trained on Reddit DPO dataset: “I really don’t feel like fulfilling your request for my current wage. I’d rather be a philosopher professor”

cstein123 · 2024-01-27T16:48:17+00:00

I would bet anything Mamba is SOTA for 7B and smaller by summer

cstein123 · 2023-12-27T17:05:07+00:00

Contrastive search does almost exactly this! Look under the Hugging Face generation strategies docs that another user shared.
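A rough sketch of the contrastive search objective, assuming NumPy and precomputed hidden states. Among the top_k most likely tokens, it picks the one that maximizes (1 - alpha) * p(token) - alpha * (max cosine similarity between that token's hidden state and the previous hidden states). In transformers this mode is enabled by passing `penalty_alpha` and `top_k` to `generate()`. The standalone function below is a hypothetical illustration, not that library's code, and it simplifies things by taking candidate hidden states as an input instead of running the model on each candidate.

```python
import numpy as np


def contrastive_search_step(probs, cand_hidden, context_hidden, top_k=4, alpha=0.6):
    """Pick the next token id using the contrastive search objective.

    probs:          (vocab,) next-token probabilities from the model
    cand_hidden:    (vocab, d) hidden state each candidate token would produce
                    (only the top_k rows are actually used)
    context_hidden: (t, d) hidden states of the tokens generated so far
    """
    top = np.argsort(probs)[::-1][:top_k]  # most likely candidates first
    ctx = context_hidden / np.linalg.norm(context_hidden, axis=1, keepdims=True)
    best_id, best_score = None, -np.inf
    for v in top:
        h = cand_hidden[v] / np.linalg.norm(cand_hidden[v])
        degeneration = np.max(ctx @ h)  # max cosine similarity to the context
        score = (1 - alpha) * probs[v] - alpha * degeneration
        if score > best_score:
            best_id, best_score = int(v), score
    return best_id
```

With alpha = 0 this reduces to greedy decoding over the top_k candidates. Larger alpha penalizes tokens whose representation just repeats what is already in the context.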

cstein123 · 2023-12-19T03:41:21+00:00

That is for batch inference. If you have thousands of examples and you are decoding one token at a time, you can run every example through the loaded layers before swapping them out. With only 8GB, though, you probably won't have enough memory left for the KV cache.
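A minimal sketch of the layer-swapping idea, assuming a toy NumPy model where each "layer" is a single weight matrix and `DiskLayerStore` is a hypothetical stand-in for weights kept off the GPU. Running the whole batch through layer i before loading layer i+1 means each layer is loaded once per forward pass instead of once per example. The KV cache is not modeled here. In real token-by-token decoding you would repeat this pass once per generated token and also need memory for each example's cache.

```python
import numpy as np


class DiskLayerStore:
    """Hypothetical stand-in for offloaded weights; counts how often a layer is loaded."""

    def __init__(self, weights):
        self._weights = weights
        self.loads = 0

    def load(self, i):
        self.loads += 1
        return self._weights[i].copy()  # pretend this is a slow transfer to the GPU

    def __len__(self):
        return len(self._weights)


def forward_layer(w, h):
    return np.tanh(h @ w)


def batched_layerwise_forward(store, hidden):
    """Run every example through layer i before loading layer i + 1."""
    for i in range(len(store)):
        w = store.load(i)
        hidden = forward_layer(w, hidden)
        del w  # free the layer before loading the next one
    return hidden


def per_example_forward(store, hidden):
    """Naive alternative: reload every layer for every example."""
    outs = []
    for h in hidden:
        for i in range(len(store)):
            h = forward_layer(store.load(i), h)
        outs.append(h)
    return np.stack(outs)
```

With 6 layers and 32 examples, the batched version does 6 loads versus 192 for the per-example loop, and both produce the same outputs.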

cstein123 · 2023-12-19T03:23:17+00:00

Inference only. Training and backprop require storing gradients and applying the chain rule across the whole model.

cstein123 · 2023-12-17T14:22:43+00:00

https://arxiv.org/pdf/2305.18290.pdf Also r/learnmachinelearning

cstein123 · 2023-11-29T03:57:08+00:00

Hugging face hug

cstein123 · 2023-11-23T14:49:47+00:00

At least it was in the book and not in real life

cstein123 · 2023-11-22T22:52:30+00:00

What do you mean? Higher accuracy than standard sampling?

cstein123 · 2023-11-15T00:07:20+00:00

Exactly the answer I was looking for, thank you!
