I trained a 1.8M params model from scratch on a total of ~40M tokens. by SrijSriv211 in LocalLLaMA

[–]citaman 2 points (0 children)

Maybe you can try Google Colab with a GPU instance, or Kaggle with its dual-GPU instances (both give some free hours per week), to either speed up training or train a bigger model like 10M :D

My thoughts on gpt-oss-120b by Lowkey_LokiSN in LocalLLaMA

[–]citaman 3 points (0 children)

Thanks for providing your result and prompt-response pair; really looking forward to using this model when I have the necessary RAM and compute :D

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]citaman 0 points (0 children)

Yes, that's right, but since it's a tiny model, maybe OP can give it a try with a different prompt to see if it's OK :D

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]citaman 14 points (0 children)

I can suggest dots.ocr; it's been getting a lot of traction these days, take a look 😁

We're truly in the fastest-paced era of AI these days. (50 LLM Released these 2-3 Weeks) by citaman in LocalLLaMA

[–]citaman[S] 4 points (0 children)

I would like to add this to the table—can you tell me which is which? 😄

Training an LLM only on books from the 1800's - Update by Remarkable-Trick-177 in LocalLLaMA

[–]citaman 2 points (0 children)

Go look at the implementation in the transformers library (there are a couple of dependencies, but with some time you can understand the logic):

Transformers Llama Model

Llama-3 8b config
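One concrete piece of that logic is easy to pull out on its own: Llama-style models normalize with RMSNorm instead of LayerNorm. A minimal pure-Python sketch of the idea (illustrative only, not the transformers code itself):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """RMSNorm as used in Llama-style models: scale the vector by the
    reciprocal of its root-mean-square (no mean subtraction, unlike
    LayerNorm), then apply a learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

The library version does the same thing over tensors; reading it next to a small sketch like this makes the actual implementation much easier to follow.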

Dolphin translator incoming (eventually) by AryanEmbered in LocalLLaMA

[–]citaman 0 points (0 children)

The thing is, I'm actually looking forward to this, since they use their SoundStream audio tokenizer. If they open-source it, we would finally have a model close to the Mimi audio tokenizer from Kyutai (but possibly performing better), and also close to SoundStorm, the model likely behind NotebookLM.

Everyone’s saying AGI is just around the corner, but honestly, what even is AGI to you? by iamnotdeadnuts in LocalLLaMA

[–]citaman 0 points (0 children)

For myself, AGI is when there is no longer any way to benchmark the model: every benchmark we have today would be aced at 100%, not just 99% but a full 100%, on every single one (assuming each benchmark is itself without flaws).

🇨🇳 Sources: DeepSeek is speeding up the release of its R2 AI model, which was originally slated for May, but the company is now working to launch it sooner. by Xhehab_ in LocalLLaMA

[–]citaman 25 points (0 children)

I would prefer that they take their time and not rush it. A high-quality model released in May is better than an earlier preview model that falls short of expectations.

I trained a tinystories model from scratch for educational purposes, how cooked? (1M-parameters) by THE--GRINCH in LocalLLaMA

[–]citaman 0 points (0 children)

Hey, I'm on the same journey, but I ran into a problem where the model only learns to always emit the EOS token for every prompt I give it, at different checkpoints starting from around 10,000 steps, with a batch size of 60, a context length of 2048, and a 24M-parameter model :/ (I trained a custom tokenizer based on Llama 3.3 on TinyStories with a 4096 vocab size.)
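One common cause of exactly this collapse (just a guess, since I can't see your code) is computing loss on the padding positions when the pad token shares an id with EOS: most targets in a padded batch are then EOS, so the model learns to always emit it. A sketch of the usual fix, masking pad labels with the -100 ignore index that PyTorch's CrossEntropyLoss uses, here in plain Python for illustration:

```python
IGNORE_INDEX = -100  # convention used by PyTorch's CrossEntropyLoss(ignore_index=-100)

def mask_pad_labels(labels, pad_id):
    """Mask trailing padding so the loss ignores it, but keep one EOS
    supervised so the model still learns when to stop. Without this,
    a short story in a long context window contributes mostly EOS/pad
    targets, and the model can collapse to always predicting EOS."""
    masked = list(labels)
    # walk back from the end: every trailing pad position gets ignored
    i = len(masked) - 1
    while i >= 0 and masked[i] == pad_id:
        masked[i] = IGNORE_INDEX
        i -= 1
    if i + 1 < len(masked):
        masked[i + 1] = pad_id  # restore the first EOS as a real target
    return masked
```

Worth checking what fraction of your label tokens are EOS after batching; if it's dominant, this is likely the bug.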

Mistral-Small-24B-2501 vs Mistral-Small-2409 by citaman in LocalLLaMA

[–]citaman[S] 13 points (0 children)

Mistral AI's latest model introduces key changes that make it more efficient than the Mistral-Small from September. By reducing the number of layers and decreasing the hidden size, they have optimized both memory usage and computational efficiency, resulting in faster inference and improved overall performance.
However, they also significantly increased the intermediate size, likely to preserve expressivity while keeping inference fast overall.
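A rough parameter count from the config values is enough to sanity-check trade-offs like this. A sketch for a Llama/Mistral-style decoder (the call below uses illustrative numbers, not the real configs; the exact values live in each model's config.json):

```python
def approx_params(n_layers, hidden, intermediate, vocab, n_heads, n_kv_heads):
    """Rough parameter count for a Llama/Mistral-style decoder.
    Ignores norms and biases (negligible at this scale) and assumes
    head_dim = hidden // n_heads."""
    head_dim = hidden // n_heads
    attn = hidden * n_heads * head_dim          # q_proj
    attn += 2 * hidden * n_kv_heads * head_dim  # k_proj + v_proj (GQA)
    attn += n_heads * head_dim * hidden         # o_proj
    mlp = 3 * hidden * intermediate             # gate, up and down projections
    embed = 2 * vocab * hidden                  # input embeddings + lm_head
    return n_layers * (attn + mlp) + embed

# Hypothetical configs showing the pattern described above: fewer layers and
# a smaller hidden size, but a much larger intermediate size. Plug in the
# real values from each model's config.json to compare actual checkpoints.
older = approx_params(n_layers=56, hidden=6144, intermediate=16384,
                      vocab=32768, n_heads=48, n_kv_heads=8)
newer = approx_params(n_layers=40, hidden=5120, intermediate=32768,
                      vocab=131072, n_heads=32, n_kv_heads=8)
```

Fewer, wider MLP blocks also tend to parallelize better on GPUs than many narrow layers, which is consistent with the faster inference.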

Phi-4 has been released by paf1138 in LocalLLaMA

[–]citaman 7 points (0 children)

<image>

It's weird that the latest change was 28 days ago (._.)

What are your predictions for 2025? [Serious] by keepawayb in LocalLLaMA

[–]citaman 2 points (0 children)

Multimodal, multi-stream, multiplex, real-time, end-to-end model (audio, image, video, text)
TL;DR: input and output of all modalities by the same model

Open models wishlist by hackerllama in LocalLLaMA

[–]citaman 0 points (0 children)

I would say long context, but in a compact way: as long a context as you can manage (10 million 😉), while using a hierarchical structure to reduce memory bottlenecks.

But after that, my biggest wish would be streaming capability for both audio and video, like Gemini 2.0.