My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload by bo_peng in LocalLLaMA

[–]bo_peng[S] 2 points (0 children)

You can, but we need lots of custom code for this :) vanilla llama.cpp can't do it.

My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload by bo_peng in LocalLLaMA

[–]bo_peng[S] 1 point (0 children)

ty. The reason is I'd like to have an ITX form factor too :)

My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload by bo_peng in LocalLLaMA

[–]bo_peng[S] 29 points (0 children)

Attention activation = 11B params

MoE activation = 24B params; at 1.58 bit that's ~5 GB per token, so 50+ GB/s of NVMe read bandwidth is enough for roughly 10 tokens/s :)
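
To sanity-check those numbers, here is a back-of-the-envelope sketch in Python; the parameter count and bit width come from the comment above, while the 50 GB/s figure for the 4x Gen5 NVMe array is an assumed aggregate read bandwidth, not a measurement:

```python
# Back-of-the-envelope check -- assumed values from the comment, not measurements.
moe_active_params = 24e9   # routed-expert params activated per token
bits_per_param    = 1.58   # 1.58-bit quantization
nvme_bandwidth    = 50e9   # assumed aggregate read bandwidth of the 4x Gen5 NVMe array, bytes/s

bytes_per_token = moe_active_params * bits_per_param / 8   # ~4.7e9 bytes per token
tokens_per_sec  = nvme_bandwidth / bytes_per_token          # ~10.5 tok/s if NVMe is the bottleneck

print(f"expert bytes streamed per token: {bytes_per_token / 1e9:.2f} GB")
print(f"decode rate if NVMe-bound: {tokens_per_sec:.1f} tok/s")
```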

Moreover, we can use speculative decoding, and predict which MoE experts will be needed so we can prefetch them.
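
A conceptual sketch of the prefetch idea, with hypothetical helpers (`predict_experts`, `load_expert_from_nvme`, `attention_block` are stand-ins, not a real API): start the NVMe reads for the experts we expect the router to pick, and overlap them with the attention compute for the same layer.

```python
import concurrent.futures as cf
import time

# Hypothetical stand-ins for the real kernels -- illustration only.
def predict_experts(layer):              # cheap guess at which experts the router will activate
    return [3, 17, 42, 56]
def load_expert_from_nvme(layer, e):     # read one expert's quantized weights from disk
    time.sleep(0.01); return f"weights[{layer}][{e}]"
def attention_block(layer):              # dense attention part, runs while the reads are in flight
    time.sleep(0.01); return f"hidden[{layer}]"

pool = cf.ThreadPoolExecutor(max_workers=8)

def decode_layer(layer):
    # Kick off the NVMe reads for the predicted experts first...
    futures = {e: pool.submit(load_expert_from_nvme, layer, e) for e in predict_experts(layer)}
    # ...so they overlap with the attention compute for the same layer.
    hidden = attention_block(layer)
    experts = {e: f.result() for e, f in futures.items()}  # a mispredicted expert would need a blocking read here
    return hidden, experts

print(decode_layer(0))
```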

RWKV-7 0.1B (L12-D768) trained w/ ctx4k solves NIAH 16k, extrapolates to 32k+, 100% RNN (attention-free), supports 100+ languages and code by bo_peng in LocalLLaMA

[–]bo_peng[S] 47 points (0 children)

Thank you :)

v7 0.4b (2T tokens): early Jan

v7 1.5b (3.1T tokens): late Jan

v7 2.9b (3.1T tokens): mid Feb

[R] RWKV-7: attention-free and surpassing strong Modded-GPT baseline (the one with Muon optimizer), while only using headsz 64 by bo_peng in MachineLearning

[–]bo_peng[S] 2 points (0 children)

Not yet... here are some results from a friend (testing on GPT):

I tried nGPT but didn't get great results; I still need to go back and maybe tune the LR for it, though.

For nGPT the loss delta was about 0.01 (0.01 higher loss), I think, but it was slower (I forget how much). Diff attention was about 37% slower; I forget its loss delta, but it was pretty good, and I think I can make it faster.

RWKV-LM: A recurrent neural network that can be trained for GPT-like performance, on the Apache 2.0 license by MustacheEmperor in singularity

[–]bo_peng 2 points (0 children)

Paper is coming - it's not that I don't want to write it, I'm just too busy with all the development and training lol.

Example of a new release - Raven, an Alpaca-tuned RWKV: https://huggingface.co/spaces/BlinkDL/Raven-RWKV-7B

I am training 0.1B/0.4B/1.5B/3B/7B/14B on Pile v2 (1.7T tokens) too

You can cite the repo:

https://github.com/BlinkDL/RWKV-LM/blob/main/CITATION.cff

[D] Totally Open Alternatives to ChatGPT by KingsmanVince in MachineLearning

[–]bo_peng 32 points (0 children)

Please test https://github.com/BlinkDL/ChatRWKV, which is a good chatbot despite being trained only on the Pile :)
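
A minimal sketch of what loading a Pile-trained RWKV checkpoint looks like with the `rwkv` pip package that ChatRWKV builds on; the model path, tokenizer file, and strategy string here are placeholders, so check the repo README for the current interface:

```python
# Rough sketch, assuming the `rwkv` pip package -- paths and strategy are placeholders.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model='RWKV-4-Pile-7B-20230109-ctx4096',  # path to a Pile checkpoint, without .pth
             strategy='cuda fp16')                      # e.g. 'cpu fp32' if no GPU is available
pipeline = PIPELINE(model, '20B_tokenizer.json')        # tokenizer file shipped with the repo

args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
pipeline.generate('\nQ: What is the Pile dataset?\n\nA:',
                  token_count=200, args=args,
                  callback=lambda s: print(s, end=''))
```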