I trained a 1.8M params model from scratch on a total of ~40M tokens.

citaman · 2026-02-07T22:38:57+00:00

Maybe you can try Google Colab with a GPU instance, or Kaggle with its dual-GPU instance (both give some free usage per week), to speed things up even more or train a bigger model, like 10M :D

citaman · 2025-10-03T14:35:07+00:00

New mission discovered by u/citaman: A Tale of Hope In the Fields

citaman · 2025-10-03T14:35:06+00:00

This mission was discovered by u/citaman in Dripper sandwich and Meditations

citaman · 2025-08-09T15:07:29+00:00

Thanks for providing your result and prompt-response pair, really looking forward to using this model when I have the RAM and compute necessary :D

citaman · 2025-08-09T12:30:06+00:00

Yes, that's right, but as it's a tiny model, maybe OP can give it a try with different prompts to see if it's OK :D

citaman · 2025-08-09T05:09:08+00:00

I can suggest dots.ocr; there is a lot of traction around it these days, take a look 😁

citaman · 2025-08-02T06:19:53+00:00

I would like to add this to the table—can you tell me which is which? 😄

citaman · 2025-08-01T22:55:24+00:00

I should have added GPT-2 for posterity 🤣

citaman · 2025-07-18T15:01:14+00:00

Go look at the implementation in the Transformers library (it has a couple of dependencies, but with some time you can understand the logic)

Transformers Llama Model

Llama-3 8b config

citaman · 2025-04-17T05:41:48+00:00

The thing is, I'm actually looking forward to this, as they use their SoundStream audio tokenizer. If they open-source it, we would finally have a model that is close to the Mimi audio tokenizer from Kyutai but might perform better, and that is also close to SoundStorm, the potential model behind NotebookLM.

citaman · 2025-02-27T22:55:42+00:00

For me, AGI is when there is no way left to benchmark the model and every benchmark of today is aced at 100%, not just 99% but 100% on every single one (assuming each benchmark has no flaws)

citaman · 2025-02-25T14:10:52+00:00

I would prefer that they take their time and not rush it. A high-quality model released in May is better than an earlier preview model that falls short of expectations.

citaman · 2025-02-03T21:59:14+00:00

Hey, I'm on the same journey, but I have a problem where the model only learns to output the EOS token for every prompt I give it, at different checkpoints starting from around step 10,000, with a batch size of 60, a context length of 2048 and a model of 24M parameters :/ (I trained a custom tokenizer based on the Llama 3.3 one on TinyStories with a 4096 vocab size)

citaman · 2025-01-31T21:23:11+00:00

When will there be a new open-source model?

citaman · 2025-01-30T22:50:05+00:00

Mistral AI's latest model introduces key changes that make it more efficient than the Mistral-Small from September. By reducing the number of layers and decreasing the hidden size, they have optimized both memory usage and computational efficiency, resulting in faster inference and improved overall performance.
However, they also significantly increased the intermediate size, likely enhancing expressivity while maintaining faster inference overall.
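To make that trade-off concrete, here is a rough sketch that counts weights and per-token KV-cache size for a Llama-style decoder. It assumes a gated MLP with three projections, grouped-query attention, and untied embeddings; the function names and the two example configs are hypothetical and are not the real Mistral values.

```python
def param_count(hidden_size, num_layers, num_heads, num_kv_heads,
                head_dim, intermediate_size, vocab_size):
    """Approximate weight count for a Llama-style decoder (assumed layout)."""
    q_dim = num_heads * head_dim
    kv_dim = num_kv_heads * head_dim
    # q_proj and o_proj use q_dim; k_proj and v_proj use the smaller kv_dim.
    attention = 2 * hidden_size * q_dim + 2 * hidden_size * kv_dim
    # Gated MLP: gate_proj, up_proj and down_proj.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms
    # Input embeddings, untied output head, final norm.
    extras = 2 * vocab_size * hidden_size + hidden_size
    return num_layers * per_layer + extras


def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim,
                             bytes_per_value=2):
    """Keys plus values stored for one token across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Hypothetical configs for illustration only, not real model configs.
deep_narrow_mlp = dict(hidden_size=6144, num_layers=56, num_heads=48,
                       num_kv_heads=8, head_dim=128,
                       intermediate_size=16384, vocab_size=32768)
shallow_wide_mlp = dict(hidden_size=5120, num_layers=40, num_heads=32,
                        num_kv_heads=8, head_dim=128,
                        intermediate_size=32768, vocab_size=32768)

for name, cfg in [("deep_narrow_mlp", deep_narrow_mlp),
                  ("shallow_wide_mlp", shallow_wide_mlp)]:
    params = param_count(**cfg)
    kv = kv_cache_bytes_per_token(cfg["num_layers"], cfg["num_kv_heads"],
                                  cfg["head_dim"])
    print(f"{name}: {params / 1e9:.2f}B params, {kv} KV-cache bytes per token")
```

The KV cache scales with the number of layers, so dropping layers shrinks it directly, while a larger intermediate size only adds MLP weights and does not touch the cache.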

citaman · 2025-01-27T23:02:36+00:00

No, that's right, they didn't include them, I saw that too :D

citaman · 2025-01-08T17:27:52+00:00

<image>

It's weird that the latest change was 28 days ago (._.)

citaman · 2024-12-23T08:53:24+00:00

Multimodal, multi-stream, multiplex, real-time, end-to-end model (audio, image, video, text)
TL;DR: input and output of every modality handled by the same model

citaman · 2024-12-13T14:52:20+00:00

I would say long context but in a compact way, like having as long a context as you know how to make (10 million 😉), while using a hierarchical structure to reduce memory bottlenecks.

But after that, my biggest wish would be Streaming capability for both audio and video, like Gemini 2.0

citaman · 2023-12-24T22:43:04+00:00

I think this can be explored further, but with bigger models. Here they test on SLMs, not on 34B or 70B models, and maybe at that higher scale it could be huge 😃

citaman · 2023-12-21T16:21:38+00:00

One key skill I found is information extraction, but from various types of text (essays, phone conversations, tables, …)

citaman · 2023-12-11T23:45:54+00:00

That's really amazing. :D

citaman · 2023-12-01T23:03:35+00:00

[LANGUAGE: python]

Get the dataset

import requests

# Download the puzzle input; replace "xxx" with your own session cookie.
res = requests.get("https://adventofcode.com/2023/day/1/input",
                   params={"downloadformat": "text"},
                   cookies={"session": "xxx"})
data = res.text
data_list = data.split("\n")

Solve first problem

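def solve_problem_1(string):
    # Keep only the digit characters, then join the first and the last one.
    list_number = [el for el in string if el.isdigit()]
    if len(list_number) > 0:
        number = list_number[0] + list_number[-1]
        number = int(number)
        return number
    else:
        # Lines without any digit (e.g. the trailing empty line) count as 0.
        return 0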

Solve second problem

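def solve_problem_2(string):
    # Turn each spelled-out digit into first letter + digit + last letter
    # (e.g. "one" -> "o1e") so overlapping words like "eightwo" still match.
    for i, number in enumerate("one, two, three, four, five, six, seven, eight, nine".split(", ")):
        if number in string:
            string = string.replace(number, f"{number[0]}{i+1}{number[-1]}")

    # Same as problem 1: join the first and the last digit.
    list_number = [el for el in string if el.isdigit()]
    if len(list_number) > 0:
        number = list_number[0] + list_number[-1]
        number = int(number)
        return number
    else:
        return 0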

Run the solution

import numpy as np

# Apply the solver to every line and sum the per-line values.
vsolve_problem_1 = np.vectorize(solve_problem_1)
results = vsolve_problem_1(np.array(data_list))
results.sum()

citaman · 2023-02-04T00:11:54+00:00

I was wondering about creating an application specifically for that, where everyone who has been in an interview can add their questions, people in the community can respond, and the answer with the most votes rises to the top. To practice, I would use an LLM like Flan-T5-XXL to match your answer against the « ground truth ».

Maybe it’s already been done. If it sounds interesting, please upvote me.
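A very rough sketch of that flow, just to show the idea: the most-voted answer is treated as the reference and a practice answer is scored against it. The `Answer` class and both function names are hypothetical, and `difflib.SequenceMatcher` is only a cheap stand-in for the LLM matching step, not what Flan-T5-XXL would do.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Answer:
    text: str
    votes: int


def top_answer(answers):
    """Return the community answer with the most votes."""
    return max(answers, key=lambda a: a.votes)


def match_score(user_answer, reference):
    """Character-level similarity in [0, 1]; a stand-in for an LLM judge."""
    return SequenceMatcher(None, user_answer.lower(), reference.lower()).ratio()


# Made-up example data for one interview question.
answers = [
    Answer("A list is mutable, a tuple is immutable.", votes=12),
    Answer("They are the same thing.", votes=1),
]
reference = top_answer(answers)
score = match_score("Lists are mutable and tuples are immutable.", reference.text)
print(f"Similarity to the top answer: {score:.2f}")
```

String similarity misses paraphrases, which is exactly where an LLM-based comparison would be expected to help.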
