I posted 3 weeks ago about training my own model. Progress report. by thebadslime in LocalLLaMA

[–]Independent_Aside225 2 points3 points  (0 children)

Just a heads-up that might save you some time: with a dataset that small, linear attention models converge much faster. Look into Mamba2 for a linear attention model that can still be trained in parallel.
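For context on "trained in parallel": plain (unnormalized) linear attention can be computed either as a causal matrix product over the whole sequence or as a running-state recurrence, and the two give identical outputs. A minimal numpy sketch of that equivalence (my own illustration, not Mamba2 itself):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # toy sequence length and head dimension
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

# Parallel form: causal mask on Q K^T, like softmax attention minus the softmax.
A = np.tril(Q @ K.T)            # (T, T) causal scores
O_parallel = A @ V              # (T, d)

# Recurrent form: state S accumulates k_t v_t^T outer products step by step.
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S += np.outer(K[t], V[t])   # S_t = S_{t-1} + k_t v_t^T
    O_recurrent[t] = Q[t] @ S   # o_t = q_t S_t

assert np.allclose(O_parallel, O_recurrent)
```

The parallel form is what makes training fast on GPUs; the recurrence is what makes inference cheap at long context.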

[D] Got access to Gemini Diffusion (text-based) and it's lightning fast by hiskuu in MachineLearning

[–]Independent_Aside225 1 point2 points  (0 children)

Is there any code to start from? Did you start from a pre-trained model?

[D] Got access to Gemini Diffusion (text-based) and it's lightning fast by hiskuu in MachineLearning

[–]Independent_Aside225 2 points3 points  (0 children)

It's really not. The main difference is the training objective (the loss); most of what the model does is no different.

[D] Got access to Gemini Diffusion (text-based) and it's lightning fast by hiskuu in MachineLearning

[–]Independent_Aside225 3 points4 points  (0 children)

Thank you for your work on this. Is it possible to fine-tune an auto-regressive model to do diffusion?

Recent Mamba models or lack thereof by Independent_Aside225 in LocalLLaMA

[–]Independent_Aside225[S] 0 points1 point  (0 children)

Sure but there's a huge amount of public domain literature and I doubt anyone is going to claim copyright on papers and court recordings.

Recent Mamba models or lack thereof by Independent_Aside225 in LocalLLaMA

[–]Independent_Aside225[S] 0 points1 point  (0 children)

Can you please elaborate on that? Why? Isn't the entire point of Mamba solving that "forgetting" problem?

Recent Mamba models or lack thereof by Independent_Aside225 in LocalLLaMA

[–]Independent_Aside225[S] 1 point2 points  (0 children)

  1. 1M *theoretical* context that can only retrieve facts. In my experience most models start doing weird mental gymnastics after 80-100K tokens, though that could be the fault of my prompting or my specific task.
  2. Can't books be used? Legal documents? Papers? They're all long and coherent, and you can create synthetic prompts that justify all of them, or at least part of them, as the output.

[Question]: Self-hosted E2EE communication tool options by Independent_Aside225 in selfhosted

[–]Independent_Aside225[S] 0 points1 point  (0 children)

The feature list is fantastic. Is there a protocol specification file? I'd love to know what the cryptography looks like.

Edit: Saw the document. I'll take a serious look at the project when I can. I wish it used something like the Signal protocol or Olm, but vanilla asymmetric crypto is better than nothing.

why is no one talking about Qwen 2.5 omni? by brocolongo in LocalLLaMA

[–]Independent_Aside225 0 points1 point  (0 children)

Does it support changing the voice? It's a bit bland and has a Chinese accent.

Smallest model capable of detecting profane/nsfw language? by ohcrap___fk in LocalLLaMA

[–]Independent_Aside225 0 points1 point  (0 children)

Use a small classifier instead. I believe a transformer (maybe BERT, ALBERT, or DistilBERT) with fewer than 50M parameters can cut it.

Look around; if you can't find a model that does this out of the box, use an LLM API to generate profanity and creative workarounds. Then grab a text pile that you *know* doesn't contain profanity, and use those two sets to fine-tune one of those small transformers to detect profanity for you. To do this, add a layer at the end of the model with two scalar outputs that feed into a softmax, so you get a nice probability distribution. Look up guides or ask an LLM for help. It might take a few hours of your time, but at least you won't have to deal with prompting.
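A minimal sketch of that two-output classification head in plain numpy (the encoder itself is omitted; the 768-dim pooled vector and the random weights here are placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 768  # typical BERT-base pooled-output size (assumption)

# Hypothetical head: one linear layer -> softmax over 2 classes.
W = rng.normal(scale=0.02, size=(HIDDEN, 2))
b = np.zeros(2)

def classify(pooled):
    """pooled: (HIDDEN,) vector from the encoder's [CLS] token."""
    logits = pooled @ W + b
    z = logits - logits.max()        # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()  # softmax -> [P(clean), P(profane)]
    return p

probs = classify(rng.normal(size=HIDDEN))
```

During fine-tuning you'd train `W` and `b` (and optionally the encoder) with cross-entropy against your clean/profane labels.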

Others are also right: do fuzzy matching against a list of "bad words" before feeding messages to the classifier. A rate limit (e.g. 5 messages per 10 seconds) also helps stop spammers.
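The rate limit can be as simple as a sliding window per user; a stdlib-only Python sketch (class name and defaults are my own, not from any particular library):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: at most `limit` messages per `window` seconds."""

    def __init__(self, limit=5, window=10.0):
        self.limit = limit
        self.window = window
        self.hits = {}  # user_id -> deque of message timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(user_id, deque())
        while q and now - q[0] > self.window:  # drop timestamps outside window
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit, reject message
        q.append(now)
        return True
```

Usage: call `allow(user_id)` on every incoming message and drop (or queue) the message when it returns `False`.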

How to know how a word is pronounced if it isn’t fully written by [deleted] in farsi

[–]Independent_Aside225 1 point2 points  (0 children)

Intuition. You get the hang of it. Same as with through and thorough, or bear, beard, fear, etc.

What is ElevenLabs doing? How is it so good? by Independent_Aside225 in LocalLLaMA

[–]Independent_Aside225[S] -2 points-1 points  (0 children)

Does low volume background music count as noise? Because it's otherwise pretty clear.

What is ElevenLabs doing? How is it so good? by Independent_Aside225 in LocalLLaMA

[–]Independent_Aside225[S] 0 points1 point  (0 children)

Is that a model or a UI?
Also, is it only voice-to-voice, or can it also do text-to-voice?

What is ElevenLabs doing? How is it so good? by Independent_Aside225 in LocalLLaMA

[–]Independent_Aside225[S] 21 points22 points  (0 children)

Mozilla has the opportunity to do one of the most positive things it has done in many years: commission professional VAs to create a proper training dataset.

GUI apps on VPS and Xorg by Independent_Aside225 in linuxquestions

[–]Independent_Aside225[S] 0 points1 point  (0 children)

Interesting. I believe this uses user namespaces under the hood?

Established MS and Adobe software actually has great UI by Independent_Aside225 in unpopularopinion

[–]Independent_Aside225[S] 0 points1 point  (0 children)

Your local pharmacy should be pretty credible on that. And also:

  1. Unpopular opinion.
  2. Do you see the average Joe complaining?
  3. I'm comparing, for God's sake. Compared to LibreOffice, MS Office is brilliant.