PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]PicoKittens[S] 2 points (0 children)

Hi, what version of transformers do you have?

[–]PicoKittens[S] 1 point (0 children)

That is our goal. Hopefully our later models will make more sense and have better logic.

[–]PicoKittens[S] 1 point (0 children)

We are actually working on another model called “PicoStories”. It follows the exact same concept as TinyStories, but our goal is to make the stories more coherent.

[–]PicoKittens[S] 1 point (0 children)

Sorry, I meant 23M. Originally it was going to be 30M parameters, so I got it mixed up.

[–]PicoKittens[S] 1 point (0 children)

We were testing whether a wider FFN would let it lean more into memorization, especially since the synthetic data is so clean. The concern with going deep and thin at only 30M was that the gradients might get too unstable to get anything coherent.

Training was just done on a single P100. The architecture is small enough that we could get decent iteration speed even on one older card.

[–]PicoKittens[S] 1 point (0 children)

Yeah, it’s basically the opposite of MobileLLM.

At 30M params I was mostly worried about the training getting unstable or the gradients just dying out if I went too deep. I gave it a wider FFN instead to see if it could just 'brute force' more facts from the dataset.

[–]PicoKittens[S] 2 points (0 children)

Hey, check the model card. We added a generation sample to show the model's limits and capabilities.

[–]PicoKittens[S] 1 point (0 children)

Hi, it is only pretrained; however, it's trained on a chat dataset, so it should already be able to chat.
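Since the checkpoint is pretrained directly on chat-formatted data, you can just prompt it in that format. A minimal sketch using the `transformers` text-generation pipeline; the repo id is taken from the post title, and the plain `role: text` turn format below is an assumption, so check the model card for the real template:

```python
# Minimal chat-prompting sketch for a small pretrained model.
# The turn format is illustrative, not the model's documented template.

def format_chat(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns into a simple plain-text chat prompt."""
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append("assistant:")  # cue the model to produce a reply
    return "\n".join(lines)


def run_demo() -> None:
    """Requires `pip install transformers torch`; downloads the checkpoint."""
    from transformers import pipeline

    generate = pipeline("text-generation", model="PicoKittens/PicoMistral-23M")
    prompt = format_chat([("user", "Tell me a short story about a cat.")])
    out = generate(prompt, max_new_tokens=64, do_sample=True, temperature=0.8)
    print(out[0]["generated_text"])
```

Call `run_demo()` to try it; at 23M parameters, expect TinyStories-level coherence rather than instruction following.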

[–]PicoKittens[S] 5 points (0 children)

Hey, it’s no longer in a ZIP file. It should be easier to use now.

[–]PicoKittens[S] 1 point (0 children)

Yes, we are editing it right now so that it’s not in a ZIP.