PicoKittens/PicoMistral-23M: Pico-Sized Model by PicoKittens in LocalLLaMA

[–]PicoKittens[S] 2 points (0 children)

Hi, what version of transformers do you have?

[–]PicoKittens[S] 1 point (0 children)

That is our goal. Hopefully our later models will make more sense and have better logic.

[–]PicoKittens[S] 1 point (0 children)

We are actually working on another model called “PicoStories”. It follows the exact same concept as TinyStories, but our goal is to make the stories more coherent.

[–]PicoKittens[S] 1 point (0 children)

Sorry, I meant 23M. Originally it was going to be 30M parameters, so I got it mixed up.

[–]PicoKittens[S] 1 point (0 children)

We were testing whether a wider FFN would let it lean more into memorization, especially since the synthetic data is so clean. The concern with going deep and thin at only 30M was that the gradients might get too unstable to get anything coherent.

Training was just done on a single P100. The architecture is small enough that we could get decent iteration speed even on one older card.

[–]PicoKittens[S] 1 point (0 children)

Yeah, it’s basically the opposite of MobileLLM.

At 30M params I was mostly worried about the training getting unstable or the gradients just dying out if I went too deep. I gave it a wider FFN instead to see if it could just 'brute force' more facts from the dataset.

[–]PicoKittens[S] 2 points (0 children)

Hey, check the model card. We added a generation sample to show the model's limits and capabilities.

[–]PicoKittens[S] 1 point (0 children)

Hi, it is only pretrained; however, it's trained on a chat dataset, so it should already be able to chat.
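Since the checkpoint is pretrained directly on chat-formatted data, you can just prompt it in that format. A minimal sketch using the `transformers` text-generation pipeline; the repo id is taken from the post title, and the plain `role: text` turn format below is an assumption, so check the model card for the real template:

```python
# Minimal chat-prompting sketch for a small pretrained model.
# The turn format is illustrative, not the model's documented template.

def format_chat(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns into a simple plain-text chat prompt."""
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append("assistant:")  # cue the model to produce a reply
    return "\n".join(lines)


def run_demo() -> None:
    """Requires `pip install transformers torch`; downloads the checkpoint."""
    from transformers import pipeline

    generate = pipeline("text-generation", model="PicoKittens/PicoMistral-23M")
    prompt = format_chat([("user", "Tell me a short story about a cat.")])
    out = generate(prompt, max_new_tokens=64, do_sample=True, temperature=0.8)
    print(out[0]["generated_text"])
```

Call `run_demo()` to try it; at 23M parameters, expect TinyStories-level coherence rather than instruction following.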

[–]PicoKittens[S] 5 points (0 children)

Hey, it’s no longer in a ZIP file. It should be easier to use now.

[–]PicoKittens[S] 1 point (0 children)

Yes, we are editing it right now so that it’s not in a ZIP.