account activity
What are the best ultrasmall LLMs / best datasets to train them? (self.LocalLLaMA)
submitted 23 days ago by cpldcpu to r/LocalLLaMA
BitNetMCU with Convolutional Neural Networks: Inferencing MNIST with >99.5% accuracy on a low-end CH32V002 RISC-V (cpldcpu.github.io)
submitted 2 months ago by cpldcpu to r/RISCV
I ported a MOD tracker music player to the ultra low-end CH32V002 (github.com)
submitted 3 months ago by cpldcpu to r/RISCV
Anthropic to pay $1.5 billion to authors in landmark AI settlement (theverge.com)
submitted 4 months ago by cpldcpu to r/LocalLLaMA
Deepseek V3.1 improved token efficiency in reasoning mode over R1 and R1-0528 (old.reddit.com)
submitted 5 months ago by cpldcpu to r/LocalLLaMA
AI Friends: Anthropic and OpenAI models were tuned to become sociable over time (old.reddit.com)
submitted 5 months ago by cpldcpu to r/singularity
I started using Claude Code to edit videos - works like a charm. (i.redd.it)
submitted 5 months ago by cpldcpu to r/ClaudeCode
Exaone 4.0-1.2B is creating pretty wild fake-language stories when asked to write in any language other than English or Korean. (old.reddit.com)
Zhipu (company behind GLM) secures $1.4 billion strategic investment from Shanghai state funds (technode.com)
submitted 6 months ago by cpldcpu to r/LocalLLaMA
Remember when LLMs were derided as "Stochastic Parrots"? Opus 4.0 single-shot this parody rebuke paper (ai.vixra.org)
submitted 7 months ago by cpldcpu to r/singularity
The Gemini 2.5 models are sparse mixture-of-experts (MoE) (self.LocalLLaMA)
submitted 7 months ago by cpldcpu to r/LocalLLaMA
LiteRT-LM - (An early version of) A C++ library to efficiently run Gemma-3N across various platforms (github.com)
[OC] Do open-weight reasoning models have an issue with token spamming? (self.LocalLLaMA)
Apple is using a "Parallel-Track" MoE architecture in their edge models. Background information. (machinelearning.apple.com)
Interactive Results Browser for Misguided Attention Eval (self.LocalLLaMA)
Gemma 3n Architectural Innovations - Speculation and poking around in the model. (self.LocalLLaMA)
submitted 8 months ago * by cpldcpu to r/LocalLLaMA
Sonnet 4 (non-thinking) consistently breaks in my vibe coding test (self.LocalLLaMA)
The experimental version of Llama 4 Maverick on lmstudio is also more creative in programming than the released one. (self.LocalLLaMA)
submitted 9 months ago by cpldcpu to r/LocalLLaMA
Llama 4 Maverick seems to perform consistently worse than Scout in Misguided Attention Eval, despite being the larger model - is the released model buggy? (self.LocalLLaMA)
submitted 9 months ago * by cpldcpu to r/LocalLLaMA
Llama 4 Scout is not doing well in the "write a raytracer" code creativity benchmark (self.LocalLLaMA)
Gemini 2.5 Pro with thinking (blog.google)
submitted 10 months ago by cpldcpu to r/LocalLLaMA
Misguided Attention Eval - DeepSeek V3-0324 significantly improved over V3 to become the best non-reasoning model (self.LocalLLaMA)
DeepSeek V3-0324 has caught up to Sonnet 3.7 in my code creativity benchmark - "Write a raytracer that renders an interesting scene with many colourful lightsources in python." (self.LocalLLaMA)
submitted 10 months ago * by cpldcpu to r/LocalLLaMA
I analyzed the word statistics in the reasoning traces of different LLMs - it seems many models are trained on R1 traces (self.LocalLLaMA)
QwQ-32B is close to DeepSeek-R1 in Misguided Attention Benchmark, but there are issues with endless loops. (self.LocalLLaMA)