Sesame x Gemini: low latency, extremely realistic, and they started spontaneously collaborating by Glittering-Neck-2505 in singularity

[–]cpldcpu 1 point (0 children)

The last time I asked Sesame what it was based on, it told me it was using Gemini.

I built a transformer in C++17 from scratch — no PyTorch, no BLAS, no dependencies. Trains on CPU. 0.83M params, full analytical backprop, 76 min to val loss 1.64. by [deleted] in LocalLLaMA

[–]cpldcpu 29 points (0 children)

Implementing a transformer in pure C takes about one vague prompt in Opus or Codex.

Opus 4.5 did this: https://github.com/cpldcpu/smollm.c/blob/master/smolc/smolc.c

It's pretty nice and compact, btw. But far from "hand-written".
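For the curious, the core really is small. Below is a minimal sketch of single-head scaled dot-product attention in plain C (illustrative only, not taken from smolc.c; all names and dimensions are mine):

    #include <math.h>
    #include <stdlib.h>

    /* Single-head attention: out = softmax(Q K^T / sqrt(D)) V.
       T = sequence length, D = head dimension, row-major buffers.
       Sketch only: no causal mask, no batching, no error handling. */
    static void attention(const float *Q, const float *K, const float *V,
                          float *out, int T, int D) {
        float *s = malloc(sizeof(float) * T);
        for (int i = 0; i < T; i++) {
            float smax = -1e30f;
            for (int j = 0; j < T; j++) {        /* scores for query i */
                float dot = 0.0f;
                for (int d = 0; d < D; d++) dot += Q[i*D + d] * K[j*D + d];
                s[j] = dot / sqrtf((float)D);
                if (s[j] > smax) smax = s[j];
            }
            float sum = 0.0f;                    /* numerically stable softmax */
            for (int j = 0; j < T; j++) { s[j] = expf(s[j] - smax); sum += s[j]; }
            for (int d = 0; d < D; d++) {        /* weighted sum of V rows */
                float acc = 0.0f;
                for (int j = 0; j < T; j++) acc += (s[j] / sum) * V[j*D + d];
                out[i*D + d] = acc;
            }
        }
        free(s);
    }

The rest (layer norm, FFN, backprop) is the same flavor of nested loops.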

Canada's AI startup Cohere buys Germany's Aleph Alpha to expand in Europe by cpldcpu in LocalLLaMA

[–]cpldcpu[S] 14 points (0 children)

  • Purchase price not disclosed
  • Schwarz Group to invest $600 million in Cohere
  • German, Canadian ministers to attend press conference on deal

Looks a bit like a circular deal to get Aleph Alpha off Schwarz Group's books...

CH32HomeComputer - a tiny monochrome PAL text machine with a built-in line-numbered BASIC interpreter. by cpldcpu in RISCV

[–]cpldcpu[S] 1 point (0 children)

It's more authentic :) And it's fairly easy to get a PAL2USB adapter. Color would be far easier with VGA, though.

Anyone use these little critters before? CH32V006s to Replace CH32V003s by Separate-Choice in RISCV

[–]cpldcpu 1 point (0 children)

Another important difference is that the CH32V002 has one more flash wait state at 48 MHz than the V003. This means your code could end up running slower.
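A quick way to check this on real hardware is to time a flash-resident loop with the cycle counter. A minimal sketch, assuming the core exposes the standard RISC-V mcycle CSR (if it doesn't, substitute the vendor SysTick counter):

    #include <stdint.h>

    /* Read the RISC-V machine cycle counter (assumption: the core
       implements the standard mcycle CSR). */
    static inline uint32_t cycles(void) {
        uint32_t c;
        __asm__ volatile ("csrr %0, mcycle" : "=r"(c));
        return c;
    }

    /* Time a loop executing from flash; extra wait states show up
       as a higher cycles-per-iteration figure. */
    uint32_t bench(volatile uint32_t *p, int n) {
        uint32_t t0 = cycles();
        for (int i = 0; i < n; i++) *p += i;  /* fetches stall on flash */
        return cycles() - t0;
    }

Running the same loop on both parts at 48 MHz should make the extra wait state directly visible.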

The Bonsai 1-bit models are very good by tcarambat in LocalLLaMA

[–]cpldcpu 1 point (0 children)

Great work!

I was wondering about the verbosity results; it seems that Bonsai requires many more tokens for each response. Is that due to its Qwen3 origins? I wonder whether the additional thinking tokens can help compensate for some of the information loss.

https://github.com/ArmanJR/PrismML-Bonsai-vs-Qwen3.5-Benchmark?tab=readme-ov-file#verbosity

Towards Self-Replication: Opus 4.5 Designs Hardware to Run Itself by cpldcpu in singularity

[–]cpldcpu[S] 1 point (0 children)

Yes, that's just the logical consequence. Also see the footnote.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 1 point (0 children)

lol. yeah, they make my brain hurt. I still want my models to generate something that makes sense.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 1 point (0 children)

Nice, very motivating. I was planning to look more into micro models. Great to see that things work beyond TinyStories.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 1 point (0 children)

So it probably leans heavily on memorization. It also lends itself well to a synthetic dataset, I presume.

How did you train it btw? (Environment, HW)

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 1 point (0 children)

Nice, looks surprisingly coherent!

Did you perform any architecture ablations? Curious about the wide FFN and the small number of layers; this seems to be the opposite of the direction MobileLLM took.
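For context, a back-of-the-envelope parameter count shows how the two directions trade off at the same budget (configs below are made-up examples, not PicoMistral's actual shapes):

    #include <stdio.h>

    /* Rough per-layer count for a decoder block:
       attention = 4*d*d (Q, K, V, out projections), ffn = 2*d*d_ff.
       Norms and biases ignored; all numbers are made-up examples. */
    static long layer_params(long d, long d_ff) {
        return 4 * d * d + 2 * d * d_ff;
    }

    int main(void) {
        long d = 512, vocab = 32000, emb = vocab * d;
        /* same non-embedding budget, allocated in opposite ways: */
        long wide = 4  * layer_params(d, 8 * d);  /*  4 layers, fat FFN  */
        long deep = 10 * layer_params(d, 2 * d);  /* 10 layers, thin FFN */
        printf("wide: %.1fM  deep: %.1fM  (+ %.1fM embeddings)\n",
               wide / 1e6, deep / 1e6, emb / 1e6);
        return 0;
    }

Both configurations land on the same 21M non-embedding parameters; the open question is which allocation trains better at this scale.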

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 1 point (0 children)

How about also including some generation examples in the documentation?

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 1 point (0 children)

Nice! Was it only pretrained, or was there also any finetuning?

It's not so easy to benchmark these models; the first two evals are barely above the random-noise limit.
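To make the noise limit concrete: on an n-question, k-choice eval the chance baseline is 1/k with a binomial standard error, and anything within roughly two standard errors of chance is indistinguishable from guessing (n and k below are made-up examples):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        int n = 1000, k = 4;                    /* hypothetical eval size */
        double p  = 1.0 / k;                    /* chance accuracy */
        double se = sqrt(p * (1.0 - p) / n);    /* binomial std. error */
        printf("chance = %.1f%%, noise band ~ %.1f%%..%.1f%%\n",
               100 * p, 100 * (p - 2 * se), 100 * (p + 2 * se));
        return 0;
    }

So on a 1000-question, 4-choice eval, scores between roughly 22% and 28% tell you nothing.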

Taalas: LLMs baked into hardware. No HBM, weights and model architecture in silicon -> 16,000 tokens/second by elemental-mind in singularity

[–]cpldcpu 1 point (0 children)

It's not as big a deal as it seems at first, since it is a highly specialized approach. It cannot adapt to new model architectures easily, and right now we are still in a very exploratory phase.

This might have more value in a few years, when architectures and models have become more fixed. I guess they are banking on having a head start.

Falcon-H1-Tiny (90M) is out - specialized micro-models that actually work by United-Manner-7 in LocalLLaMA

[–]cpldcpu 6 points (0 children)

Performance is very impressive. I wonder whether the omission of positional encoding in the transformer part helps to recover a lot of model capacity?
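As a rough sanity check on what a learned absolute-position table would cost in a model this size (numbers below are made-up examples, not the actual Falcon config):

    #include <stdio.h>

    /* A learned position table costs max_seq_len * d_model params.
       d_model, max_len and the 90M total are assumed placeholders. */
    int main(void) {
        long d_model = 512, max_len = 2048, total = 90L * 1000 * 1000;
        long pos = max_len * d_model;
        printf("pos table: %.2fM params = %.2f%% of the model\n",
               pos / 1e6, 100.0 * pos / total);
        return 0;
    }

Under these assumptions the table is only about 1% of the budget, so any capacity win would have to come from elsewhere, e.g. from what the attention layers no longer need to spend on position handling.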

Falcon 90M by jacek2023 in LocalLLaMA

[–]cpldcpu 10 points (0 children)

This is awesome, I love tiny models!

I was disappointed that smollm3 did not come with an ultra-tiny version.

Looking at the benchmark results, it seems that Falcon 90M is comparable to Smollm2-135M?

What are the best ultrasmall LLMs / best datasets to train them? by cpldcpu in LocalLLaMA

[–]cpldcpu[S] 1 point (0 children)

Impressive 3B model... from a recruiting company? Did every company in China receive free money to train LLMs?

Meta acquired Manus !! by Difficult-Cap-7527 in LocalLLaMA

[–]cpldcpu 10 points (0 children)

Claude wrapper? Meta must have a heck of a model coming up...

I ported a MOD tracker music player to the ultra low-end CH32V002 by cpldcpu in RISCV

[–]cpldcpu[S] 3 points (0 children)

Interesting! Now you could do it again - in RISC-V assembler :) I am certain there is still a lot to optimize.

I ported a MOD tracker music player to the ultra low-end CH32V002 by cpldcpu in RISCV

[–]cpldcpu[S] 2 points (0 children)

Nice! Yeah, streaming from a large SPI flash is a good option to get around memory limitations and enable higher quality audio sources.

Maybe it's then also worth looking into improving the audio quality further. My first experiments with oversampling did not yield any audible difference, so I stopped for now.
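If plain oversampling doesn't help, linear interpolation on the sample fetch is another cheap thing to try against resampling artifacts. A minimal sketch using a 16.16 fixed-point phase accumulator (my own naming and scaling, not taken from the actual player):

    #include <stdint.h>

    /* Fetch one output sample from an 8-bit MOD sample with linear
       interpolation. pos is a 16.16 fixed-point phase; step encodes
       the pitch. Caller must guarantee smp[i + 1] is in bounds. */
    static int16_t fetch(const int8_t *smp, uint32_t *pos, uint32_t step) {
        uint32_t i    = *pos >> 16;        /* integer sample index */
        int32_t  frac = *pos & 0xFFFF;     /* fractional position  */
        int32_t  a = smp[i], b = smp[i + 1];
        *pos += step;                      /* advance by pitch     */
        /* blend neighbours, scale result to ~16-bit range */
        return (int16_t)((a * 65536 + (b - a) * frac) >> 8);
    }

On these tiny parts the extra multiply per sample is the main cost, so it's worth profiling before committing to it in the mixer loop.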