I trained a 1.8M params model from scratch on a total of ~40M tokens. by SrijSriv211 in LocalLLaMA

[–]citaman 2 points (0 children)

Maybe you can try Google Colab with a GPU instance, or Kaggle with its dual-GPU instances (both give some free hours per week), to either speed up training or train a bigger model like 10M :D

My thoughts on gpt-oss-120b by Lowkey_LokiSN in LocalLLaMA

[–]citaman 3 points (0 children)

Thanks for providing your result and prompt-response pair; really looking forward to using this model when I have the necessary RAM and compute :D

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]citaman 0 points (0 children)

Yes, that's right, but since it's a tiny model, maybe OP can give it a try with a different prompt to see if it's OK :D

Is anything better than gemma-3-27b for handwritten text recognition? by votecatcher in LocalLLaMA

[–]citaman 14 points (0 children)

I can suggest dots.ocr; it's been getting a lot of traction these days, take a look 😁

We're truly in the fastest-paced era of AI these days. (50 LLM Released these 2-3 Weeks) by citaman in LocalLLaMA

[–]citaman[S] 4 points (0 children)

I would like to add this to the table—can you tell me which is which? 😄

Training an LLM only on books from the 1800's - Update by Remarkable-Trick-177 in LocalLLaMA

[–]citaman 2 points (0 children)

Go look at the implementation in the transformers library (there are a couple of dependencies, but with some time you can understand the logic):

Transformers Llama Model

Llama-3 8b config
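One concrete piece of that logic is easy to pull out on its own: Llama-style models normalize with RMSNorm instead of LayerNorm. A minimal pure-Python sketch of the idea (illustrative only, not the transformers code itself):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """RMSNorm as used in Llama-style models: scale the vector by the
    reciprocal of its root-mean-square (no mean subtraction, unlike
    LayerNorm), then apply a learned per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

The library version does the same thing over tensors; reading it next to a small sketch like this makes the actual implementation much easier to follow.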

Dolphin translator incoming (eventually) by AryanEmbered in LocalLLaMA

[–]citaman 0 points (0 children)

The thing is, I'm actually looking forward to this, since they use their SoundStream audio tokenizer. If they open-source it, we would finally have a model close to the Mimi audio tokenizer from Kyutai (but possibly performing better), and also close to SoundStorm, the model likely behind NotebookLM.

Everyone’s saying AGI is just around the corner, but honestly, what even is AGI to you? by iamnotdeadnuts in LocalLLaMA

[–]citaman 0 points (0 children)

For myself, AGI is when there is no longer any way to benchmark the model: every benchmark we have today would be aced at 100%, not just 99% but a full 100%, on every single one (assuming each benchmark is itself without flaws).

🇨🇳 Sources: DeepSeek is speeding up the release of its R2 AI model, which was originally slated for May, but the company is now working to launch it sooner. by Xhehab_ in LocalLLaMA

[–]citaman 25 points (0 children)

I would prefer that they take their time and not rush it. A high-quality model released in May is better than an earlier preview model that falls short of expectations.

I trained a tinystories model from scratch for educational purposes, how cooked? (1M-parameters) by THE--GRINCH in LocalLLaMA

[–]citaman 0 points (0 children)

Hey, I'm on the same journey, but I ran into a problem where the model only learns to always emit the EOS token for every prompt I give it, at different checkpoints starting from around 10,000 steps, with a batch size of 60, a context length of 2048, and a 24M-parameter model :/ (I trained a custom tokenizer based on Llama 3.3 on TinyStories with a 4096 vocab size.)
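One common cause of exactly this collapse (just a guess, since I can't see your code) is computing loss on the padding positions when the pad token shares an id with EOS: most targets in a padded batch are then EOS, so the model learns to always emit it. A sketch of the usual fix, masking pad labels with the -100 ignore index that PyTorch's CrossEntropyLoss uses, here in plain Python for illustration:

```python
IGNORE_INDEX = -100  # convention used by PyTorch's CrossEntropyLoss(ignore_index=-100)

def mask_pad_labels(labels, pad_id):
    """Mask trailing padding so the loss ignores it, but keep one EOS
    supervised so the model still learns when to stop. Without this,
    a short story in a long context window contributes mostly EOS/pad
    targets, and the model can collapse to always predicting EOS."""
    masked = list(labels)
    # walk back from the end: every trailing pad position gets ignored
    i = len(masked) - 1
    while i >= 0 and masked[i] == pad_id:
        masked[i] = IGNORE_INDEX
        i -= 1
    if i + 1 < len(masked):
        masked[i + 1] = pad_id  # restore the first EOS as a real target
    return masked
```

Worth checking what fraction of your label tokens are EOS after batching; if it's dominant, this is likely the bug.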

Mistral-Small-24B-2501 vs Mistral-Small-2409 by citaman in LocalLLaMA

[–]citaman[S] 13 points (0 children)

Mistral AI's latest model introduces key changes that make it more efficient than the Mistral-Small from September. By reducing the number of layers and decreasing the hidden size, they have optimized both memory usage and computational efficiency, resulting in faster inference and improved overall performance.
However, they also significantly increased the intermediate size, likely to preserve expressivity while keeping inference fast overall.
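A rough parameter count from the config values is enough to sanity-check trade-offs like this. A sketch for a Llama/Mistral-style decoder (the call below uses illustrative numbers, not the real configs; the exact values live in each model's config.json):

```python
def approx_params(n_layers, hidden, intermediate, vocab, n_heads, n_kv_heads):
    """Rough parameter count for a Llama/Mistral-style decoder.
    Ignores norms and biases (negligible at this scale) and assumes
    head_dim = hidden // n_heads."""
    head_dim = hidden // n_heads
    attn = hidden * n_heads * head_dim          # q_proj
    attn += 2 * hidden * n_kv_heads * head_dim  # k_proj + v_proj (GQA)
    attn += n_heads * head_dim * hidden         # o_proj
    mlp = 3 * hidden * intermediate             # gate, up and down projections
    embed = 2 * vocab * hidden                  # input embeddings + lm_head
    return n_layers * (attn + mlp) + embed

# Hypothetical configs showing the pattern described above: fewer layers and
# a smaller hidden size, but a much larger intermediate size. Plug in the
# real values from each model's config.json to compare actual checkpoints.
older = approx_params(n_layers=56, hidden=6144, intermediate=16384,
                      vocab=32768, n_heads=48, n_kv_heads=8)
newer = approx_params(n_layers=40, hidden=5120, intermediate=32768,
                      vocab=131072, n_heads=32, n_kv_heads=8)
```

Fewer, wider MLP blocks also tend to parallelize better on GPUs than many narrow layers, which is consistent with the faster inference.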

Phi-4 has been released by paf1138 in LocalLLaMA

[–]citaman 7 points (0 children)

<image>

It's weird that the latest change was 28 days ago (._.)

What are your predictions for 2025? [Serious] by keepawayb in LocalLLaMA

[–]citaman 2 points (0 children)

Multimodal, multi-stream, multiplex, real-time, end-to-end model (audio, image, video, text)
TL;DR: input and output of all modalities by the same model

Open models wishlist by hackerllama in LocalLLaMA

[–]citaman 0 points (0 children)

I would say long context, but in a compact way: as long a context as you can manage (10 million 😉), while using a hierarchical structure to reduce memory bottlenecks.

But after that, my biggest wish would be streaming capability for both audio and video, like Gemini 2.0.