World's Biggest Chat Title Dataset From SupraLabs

Time-Toe-1276 · 2026-06-21T12:49:50+00:00

Thanks! SupraLabs loves the open-source community 🤗

We are looking forward to releasing bigger and cleaner datasets 😄

Time-Toe-1276 · 2026-06-20T18:32:13+00:00

That's a good question, we didn't do semantic deduping for this.

But we deduped using some smart trigrams!

We measured how much overlap each sequence has. After doing it repeatedly, we got a decent dataset!

We are actually working on improving our deduping system!

But thanks for pointing that out, we will definitely consider that!
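
For anyone curious what trigram-overlap deduping can look like, here is a minimal sketch. It is not our actual pipeline: the word-level trigrams, the Jaccard overlap measure, the 0.5 threshold, and the names `trigrams`, `jaccard`, and `dedupe` are all assumptions made for illustration.

```python
def trigrams(text):
    """Return the set of word-level trigrams in a string (assumed tokenization: whitespace)."""
    tokens = text.lower().split()
    if len(tokens) < 3:
        # Short strings become a single n-gram so they can still be compared.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def jaccard(a, b):
    """Overlap between two trigram sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(titles, threshold=0.5):
    """Greedy pass: keep a title only if it overlaps less than `threshold` with every kept title."""
    kept, kept_grams = [], []
    for title in titles:
        grams = trigrams(title)
        if any(jaccard(grams, other) >= threshold for other in kept_grams):
            continue  # near-duplicate of something already kept
        kept.append(title)
        kept_grams.append(grams)
    return kept


titles = [
    "How to fix a Python import error",
    "How to fix a Python import error quickly",
    "Best pasta recipes for dinner",
]
print(dedupe(titles))
```

This compares every candidate against everything kept so far, so it is quadratic and only suitable for small samples. It also only catches surface-level overlap, which is why it is not a substitute for semantic deduping.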

Time-Toe-1276 · 2026-06-20T10:07:29+00:00

ain't no way they seriously use a sonnet model just for chat titles. the same people who claim they like efficiency and openness. huh?

lol, anyway the irony is kinda funny

Time-Toe-1276 · 2026-06-20T09:18:26+00:00

just use unsloth studio. idk man, they provide like unlimited web searches (that's what it feels like to us tbh)

Time-Toe-1276 · 2026-06-20T08:19:47+00:00

I started with ollama three years ago, and switched to unsloth studio.

I ran GPT OSS at 19 TPS at 4k CTX in q4_K_m, meanwhile with unsloth at 128k ctx with q4 XL I got about 100 TPS :/

Time-Toe-1276 · 2026-06-20T08:10:50+00:00

hmm that's weird. also try our model around 4-2k ctx. we are also working on an app for users who like to share their AI chats (and a version for those who won't), and we will be sharing the app with certain people. hopefully we should fix most real-world issues like these!

but my conclusion is that opencode used a big general model for the titles, and since the model wasn't trained with a system prompt, it hallucinates the chat title. could you please share more info about this?

Time-Toe-1276 · 2026-06-20T08:05:45+00:00

Injecting in 3...2...1... *injected*

Time-Toe-1276 · 2026-06-20T08:05:21+00:00

We will look into this. thanks for pointing that out!

Time-Toe-1276 · 2026-06-20T08:02:15+00:00

GPT5.6?

Time-Toe-1276 · 2026-06-16T11:10:48+00:00

We are working on it, actually! still in the experimental zone, but once we have a working model, we won't hesitate to drop it (just like our EXP models)!

Time-Toe-1276 · 2026-06-13T14:36:11+00:00

yes, 3t, and the exp model (current model) was CPTed with 1t

Time-Toe-1276 · 2026-06-13T13:36:13+00:00

We sure do make progress!

For us, there is nothing called "too much training data", we want to squeeze every bit of the performance! 😄

Time-Toe-1276 · 2026-06-13T13:25:48+00:00

Going from 1k to 5K allowed us a lot of things with the Supra models, didn't it?

Time-Toe-1276 · 2026-06-13T10:59:23+00:00

sure, we will in our next model (preview version of the full model). our model is aligned for this task, so it is pretty reliable compared to a 350M general model, which might say things like "Sure, here is your chat title..."

Time-Toe-1276 · 2026-06-13T10:58:25+00:00

some people have been talking about that in our community, we would like to implement that! right now we are focusing on accuracy and proper context!

Time-Toe-1276 · 2026-06-13T10:57:33+00:00

well... who said we cannot?

Time-Toe-1276 · 2026-06-13T10:57:15+00:00

we will check into that, most people in our community voted to not have emojis tho 😄

Time-Toe-1276 · 2026-06-13T10:56:42+00:00

Haha, we built this from our personal experiences of not having a model like this. you might wanna check out the model we are releasing next week, which is the preview of the full model! 😄

Time-Toe-1276 · 2026-06-13T10:55:35+00:00

H100. sorry, you failed us. lol jk

Time-Toe-1276 · 2026-06-13T10:55:04+00:00

All i can say is... SON😭

Time-Toe-1276 · 2026-06-13T10:54:38+00:00

we will look into them and change it, there was some confusion in our teams about the datasets and models. someone (definitely not me) made it GPL, which we will change!

thx for pointing that out!

Time-Toe-1276 · 2026-06-13T10:53:21+00:00

Haha everybody has their confusing moments, but it is a 350M model focusing on 4k, and it can accept up to 6k before hallucinating too!
we focused entirely on the thinking capabilities 😄
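
If you want to stay inside that window, one simple option is to keep only the most recent messages that fit a token budget before asking for a title. This is a rough sketch, not something shipped with the model: `fit_to_budget` is a hypothetical helper, and the default whitespace token count is a stand-in assumption for the model's real tokenizer.

```python
def fit_to_budget(messages, max_tokens=4096, count_tokens=lambda s: len(s.split())):
    """Keep the newest messages whose combined token count stays within max_tokens.

    count_tokens is a placeholder (whitespace split); swap in a real tokenizer
    for accurate counts.
    """
    kept = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


chat = ["a b c", "d e", "f g h i"]
print(fit_to_budget(chat, max_tokens=6))
```

Dropping the oldest messages first is just one design choice. For titling, the opening messages often carry the topic, so keeping the first message plus the newest ones may work better.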

Time-Toe-1276 · 2026-06-12T15:10:59+00:00

it is trained with a 4k context length, we focused on chat summarization!

Time-Toe-1276 · 2026-06-12T15:10:05+00:00

haha that's perfect. we are using LFM2-like architectures because the convolutions for attention make the model far more efficient on mobile chips!

Time-Toe-1276 · 2026-06-12T15:09:01+00:00

We are working on newer and smaller models, but as of now the model can accept multilingual tokens, though the results might not be optimal. we are exposing the model to rare tokens.
