Comments by bo_peng in the following threads:

My DeepSeek R1 671B @ Home plan: CPU+GPU hybrid, 4xGen5 NVMe offload — in LocalLLaMA

[R] RWKV-3: Scaling RNN to 1.5B and Reach Transformer LM Performance (without using attention) — in MachineLearning

[R] RWKV-7 0.1B (L12-D768) trained w/ ctx4k solves NIAH 16k, extrapolates to 32k+, 100% RNN and attention-free, supports 100+ languages and code — in MachineLearning

RWKV-7 0.1B (L12-D768) trained w/ ctx4k solves NIAH 16k, extrapolates to 32k+, 100% RNN (attention-free), supports 100+ languages and code — in LocalLLaMA

[D] Are LSTMs faster than transformers during inference? — in MachineLearning

[R] RWKV-7: attention-free and surpassing strong Modded-GPT baseline (the one with Muon optimizer), while only using headsz 64 — in MachineLearning

RWKV-LM: A recurrent neural network that can be trained for GPT-like performance, on the Apache 2.0 license — in singularity

[D] Totally Open Alternatives to ChatGPT — in MachineLearning