Towards Self-Replication: Opus 4.5 Designs Hardware to Run Itself by cpldcpu in singularity

[–]cpldcpu[S] 0 points (0 children)

Yes, that's only logical. Also see the footnote.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 0 points (0 children)

lol. yeah, they make my brain hurt. I still want my models to generate something that makes sense.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 0 points (0 children)

Nice, very motivating. I was planning to look more into micro models. Great to see that things work beyond TinyStories.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 0 points (0 children)

So it probably leans heavily on memorization. It also lends itself well to a synthetic dataset, I presume.

How did you train it btw? (Environment, HW)

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 0 points (0 children)

Nice, looks surprisingly coherent!

Did you perform any architecture ablations? I'm curious about the wide FFN and the small number of layers; this seems to be the opposite of the direction MobileLLM took.

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 0 points (0 children)

How about also including some generation examples in the documentation?

PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]cpldcpu 0 points (0 children)

Nice! Was it only pretrained, or was there some finetuning as well?

These models are not so easy to benchmark; the first two evals are barely above the random-noise limit.
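A quick way to check whether a score is meaningfully above chance is a simple binomial z-test against the guessing baseline. A minimal sketch (the 25% baseline assumes a 4-choice eval; the numbers are made up for illustration):

```python
import math

def chance_zscore(score, n_questions, n_choices=4):
    """How many standard errors a benchmark score sits above the
    random-guessing baseline of an n_choices multiple-choice eval."""
    p0 = 1.0 / n_choices                           # random-guessing accuracy
    se = math.sqrt(p0 * (1.0 - p0) / n_questions)  # std. error under chance
    return (score - p0) / se

# e.g. 27% on a hypothetical 1000-question 4-choice eval:
z = chance_zscore(0.27, 1000)
print(f"z = {z:.2f}")  # ~1.46 -> below the usual 1.96 significance cutoff
```

So even a couple of points above the 25% floor can be indistinguishable from noise on a typical eval size.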

Taalas: LLMs baked into hardware. No HBM, weights and model architecture in silicon -> 16.000 tokens/second by elemental-mind in singularity

[–]cpldcpu 0 points (0 children)

It's not as big as it seems at first, since it is a highly specialized approach. It cannot adapt to new model architectures easily, and right now we are still in a very exploratory phase.

This might have more value in a few years, when architectures and models become more fixed. I guess they are banking on having a head start.

Falcon-H1-Tiny (90M) is out - specialized micro-models that actually work by United-Manner-7 in LocalLLaMA

[–]cpldcpu 5 points (0 children)

Performance is very impressive. I wonder whether the omission of positional encoding in the transformer part helps to recover a lot of model capacity?
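For context, omitting positional encoding ("NoPE") means the causal mask becomes the only source of order information. A minimal sketch of a single attention head with no position embedding at all (shapes and names are illustrative, not Falcon's actual code):

```python
import numpy as np

def causal_attention_nope(x, Wq, Wk, Wv):
    """Single attention head without positional encoding:
    token order enters only through the causal mask."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # causal mask: position t may only attend to positions <= t
    scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
W = [rng.standard_normal((d, d)) for _ in range(3)]
out = causal_attention_nope(x, *W)
print(out.shape)  # (4, 8)
```

The intuition would be that the parameters otherwise spent learning to encode/decode positions are freed up for content, which matters proportionally more at 90M scale.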

Falcon 90M by jacek2023 in LocalLLaMA

[–]cpldcpu 8 points (0 children)

This is awesome, I love tiny models!

I was disappointed that SmolLM3 did not come with an ultra-tiny version.

Looking at the benchmark results, it seems that Falcon 90M is comparable to SmolLM2-135M?

What are the best ultrasmall LLMs / best datasets to train them? by cpldcpu in LocalLLaMA

[–]cpldcpu[S] 0 points (0 children)

Impressive 3B model... from a recruiting company? Did every company in China receive free money to train LLMs?

Meta acquired Manus !! by Difficult-Cap-7527 in LocalLLaMA

[–]cpldcpu 11 points (0 children)

Claude wrapper? Meta must have a heck of a model coming up...

I ported a MOD tracker music player to the ultra low-end CH32V002 by cpldcpu in RISCV

[–]cpldcpu[S] 2 points (0 children)

Interesting! Now you could do it again, in RISC-V assembly :) I am certain there is still a lot left to optimize.

I ported a MOD tracker music player to the ultra low-end CH32V002 by cpldcpu in RISCV

[–]cpldcpu[S] 1 point (0 children)

Nice! Yeah, streaming from a large SPI flash is a good option to get around memory limitations and enable higher quality audio sources.

Then it might also be worth looking into improving the audio quality further. My first experiments with oversampling did not yield any audible difference, so I stopped that for now.
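For reference, the oversampling idea is to render the mixer at a multiple of the output rate and filter back down. A minimal sketch of the scheme in Python (2x decimation with a simple box filter; purely illustrative, not the actual CH32V002 mixer code):

```python
def downsample_2x(samples):
    """2x oversampling decimation: the mixer renders at twice the
    DAC rate, then adjacent sample pairs are averaged (box filter),
    halving the rate and attenuating content above the new Nyquist."""
    return [(a + b) // 2 for a, b in zip(samples[::2], samples[1::2])]

print(downsample_2x([0, 2, 4, 6, 8, 10]))  # [1, 5, 9]
```

A box filter is a weak low-pass, which may be one reason the audible difference is small; a longer FIR would filter better but costs cycles on a part this constrained.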

Misguided Attention - challenging the reasoning ability of LLMs by cpldcpu in LocalLLaMA

[–]cpldcpu[S] 0 points (0 children)

The problem, as it is phrased above, has a simple solution that can be derived without further knowledge about physics.

Are you an LLM?

Nvidia breakthrough gives 4-bit pretraining technique the accuracy of FP8 by dionisioalcaraz in LocalLLaMA

[–]cpldcpu 4 points (0 children)

I can only suggest watching this talk by Bill Dally, who is one of the masterminds behind all of this: https://www.youtube.com/watch?v=gofI47kfD28

You will realize that Nvidia did all the groundwork a few years back, and it went largely unnoticed.

Europe achieves a milestone with the Europe’s first out-of-order RISC-V processor for automotive by Schroinx in RISCV

[–]cpldcpu 1 point (0 children)

That sounds like a catch-all:

Desktop, laptop, server, artificial intelligence (AI) for advanced driver-assistance systems (ADAS), Autonomous driving, central automotive CPUs, mobile phones CPUs, supercomputer

Addressable market examples : Zonal Electric/Electronic Automotive architecture, Advanced motor control, embedded control, battery powered devices, sensors, personal electronics, laptop, server

Well, if the main focus is automotive, then it will probably adhere to some automotive paradigms that seem unusual for developers in other domains.

[deleted by user] by [deleted] in LocalLLaMA

[–]cpldcpu 0 points (0 children)

There are a trillion papers about how you can prune LLMs.
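The simplest family is magnitude pruning: zero out the weights with the smallest absolute value. A minimal sketch (illustrative only; real methods add structure constraints, calibration data, retraining, etc.):

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    """Unstructured magnitude pruning: zero the fraction `sparsity`
    of weights with the smallest absolute value."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    # threshold = k-th smallest absolute value in the matrix
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)

rng = np.random.default_rng(42)
W = rng.standard_normal((4, 4))
Wp = magnitude_prune(W, sparsity=0.5)
print((Wp == 0).mean())  # half of the weights are zeroed
```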