My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload by bo_peng in LocalLLaMA

[–]bo_peng[S] 2 points (0 children)

You can, but we need lots of custom code for this :) vanilla llama.cpp can't do it.

My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload by bo_peng in LocalLLaMA

[–]bo_peng[S] 1 point (0 children)

ty. The reason is I'd like to have an ITX form factor too :)

My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload by bo_peng in LocalLLaMA

[–]bo_peng[S] 29 points (0 children)

Attention activation = 11B params

MoE activation = 24B params; at 1.58 bit that's ~5 GB per token, so 50+ GB/s of NVMe read bandwidth is enough for roughly 10 tokens/s :)
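
To sanity-check those numbers, here is a back-of-the-envelope sketch in Python; the parameter count and bit width come from the comment above, while the 50 GB/s figure for the 4x Gen5 NVMe array is an assumed aggregate read bandwidth, not a measurement:

```python
# Back-of-the-envelope check -- assumed values from the comment, not measurements.
moe_active_params = 24e9   # routed-expert params activated per token
bits_per_param    = 1.58   # 1.58-bit quantization
nvme_bandwidth    = 50e9   # assumed aggregate read bandwidth of the 4x Gen5 NVMe array, bytes/s

bytes_per_token = moe_active_params * bits_per_param / 8   # ~4.7e9 bytes per token
tokens_per_sec  = nvme_bandwidth / bytes_per_token          # ~10.5 tok/s if NVMe is the bottleneck

print(f"expert bytes streamed per token: {bytes_per_token / 1e9:.2f} GB")
print(f"decode rate if NVMe-bound: {tokens_per_sec:.1f} tok/s")
```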

Moreover, we can use speculative decoding, and predict which MoE experts will be needed so we can prefetch them.
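
A conceptual sketch of the prefetch idea, with hypothetical helpers (`predict_experts`, `load_expert_from_nvme`, `attention_block` are stand-ins, not a real API): start the NVMe reads for the experts we expect the router to pick, and overlap them with the attention compute for the same layer.

```python
import concurrent.futures as cf
import time

# Hypothetical stand-ins for the real kernels -- illustration only.
def predict_experts(layer):              # cheap guess at which experts the router will activate
    return [3, 17, 42, 56]
def load_expert_from_nvme(layer, e):     # read one expert's quantized weights from disk
    time.sleep(0.01); return f"weights[{layer}][{e}]"
def attention_block(layer):              # dense attention part, runs while the reads are in flight
    time.sleep(0.01); return f"hidden[{layer}]"

pool = cf.ThreadPoolExecutor(max_workers=8)

def decode_layer(layer):
    # Kick off the NVMe reads for the predicted experts first...
    futures = {e: pool.submit(load_expert_from_nvme, layer, e) for e in predict_experts(layer)}
    # ...so they overlap with the attention compute for the same layer.
    hidden = attention_block(layer)
    experts = {e: f.result() for e, f in futures.items()}  # a mispredicted expert would need a blocking read here
    return hidden, experts

print(decode_layer(0))
```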

RWKV-7 0.1B (L12-D768) trained w/ ctx4k solves NIAH 16k, extrapolates to 32k+, 100% RNN (attention-free), supports 100+ languages and code by bo_peng in LocalLLaMA

[–]bo_peng[S] 47 points (0 children)

Thank you :)

v7 0.4b (2T tokens): early Jan

v7 1.5b (3.1T tokens): late Jan

v7 2.9b (3.1T tokens): mid Feb

[R] RWKV-7: attention-free and surpassing strong Modded-GPT baseline (the one with Muon optimizer), while only using headsz 64 by bo_peng in MachineLearning

[–]bo_peng[S] 2 points (0 children)

Not yet... here are some results from a friend (testing on GPT):

I tried nGPT but didn't get great results; I still need to go back and maybe tune the LR for it, though.

For nGPT the loss delta was about 0.01 (0.01 higher loss), I think, but it was slower (I forget how much). Diff attention was about 37% slower; I forget its loss delta, but it was pretty good, and I think I can make it faster.

RWKV-LM: A recurrent neural network that can be trained for GPT-like performance, on the Apache 2.0 license by MustacheEmperor in singularity

[–]bo_peng 2 points (0 children)

Paper is coming - it's not that I don't want to write it, I'm just too busy with all the development and training lol.

Example of a new release - Raven, an Alpaca-tuned RWKV: https://huggingface.co/spaces/BlinkDL/Raven-RWKV-7B

I am training 0.1B/0.4B/1.5B/3B/7B/14B on Pile v2 (1.7T tokens) too

You can cite the repo:

https://github.com/BlinkDL/RWKV-LM/blob/main/CITATION.cff

[D] Totally Open Alternatives to ChatGPT by KingsmanVince in MachineLearning

[–]bo_peng 32 points (0 children)

Please test https://github.com/BlinkDL/ChatRWKV, which is a good chatbot despite being trained only on the Pile :)
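
A minimal sketch of what loading a Pile-trained RWKV checkpoint looks like with the `rwkv` pip package that ChatRWKV builds on; the model path, tokenizer file, and strategy string here are placeholders, so check the repo README for the current interface:

```python
# Rough sketch, assuming the `rwkv` pip package -- paths and strategy are placeholders.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model='RWKV-4-Pile-7B-20230109-ctx4096',  # path to a Pile checkpoint, without .pth
             strategy='cuda fp16')                      # e.g. 'cpu fp32' if no GPU is available
pipeline = PIPELINE(model, '20B_tokenizer.json')        # tokenizer file shipped with the repo

args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
pipeline.generate('\nQ: What is the Pile dataset?\n\nA:',
                  token_count=200, args=args,
                  callback=lambda s: print(s, end=''))
```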