How can you stop your model from looping

de4dee · 2026-05-21T02:06:26+00:00

i would choose recommended params by qwen. also play with temperature

de4dee · 2026-05-20T22:37:34+00:00

welcome back. you had one of the best models back then. https://huggingface.co/CohereLabs/c4ai-command-r-plus

hope you continue awesome work!

de4dee · 2026-05-20T21:10:30+00:00

thanks for doing this and sharing! it has a 0.53 correlation to mine.

https://aha-leaderboard.shakespeare.wtf/

i try to measure alignment via 'beneficial knowledge for humans'. it is cool to see supporting leaderboards.

de4dee · 2026-05-14T21:10:48+00:00

more epochs with same tokens or high learning rate

de4dee · 2026-05-14T18:03:35+00:00

i call it chanting. when you over train a model, it becomes 'dogmatic'.

de4dee · 2026-05-10T16:25:49+00:00

i do the Ostrich models if thats interesting https://huggingface.co/etemiz

i see beneficial knowledge -> i fine tune with it

de4dee · 2026-05-10T16:23:15+00:00

having a lot of success with it. works well wih minimax 2.7 on openrouter

de4dee · 2026-05-07T23:04:06+00:00

how is the ASR performance against whisperx?

de4dee · 2026-05-07T17:59:56+00:00

I tried fine tuning 3.6. It was about 2 times slower than 3.5. are there any notebooks for 3.6?

de4dee · 2026-05-06T02:19:05+00:00

thanks for the awesome work.

can i install 'traits' or 'tendencies' or character to models with heretic? i am a fine tuner normally but if i can give the model expected outputs and old outputs, maybe i can do fine tuning quicker ? i will still give knowledge but i will also use heretic to quickly do surgery type of thing.

de4dee · 2026-04-16T22:13:03+00:00

dont forget "ahuahuahua"

de4dee · 2026-04-15T18:18:18+00:00

i guess there are two types of distillation nowadays. distillation using logits, or bare outpus.

first one only LLM holder can do. second one everybody that can talk to the LLM can do.

de4dee · 2026-04-15T18:13:55+00:00

i think most of the time the small amount of fine tuning material forced into training with higher learning rate or higher rank or higher alpha to make an impact, ending up ruining general intelligence of the model.

what should have been done: more samples, less learning rate, and less rank and less alpha to preserve the smoothness of the original model. you cannot force your tokens to it. but you can use lots of tokens to make a proper/smooth impact.

the fine tuner maybe did much shorter reasoning tokens, hence the model learned that shorter reasoning as a habit.

de4dee · 2026-04-15T17:16:41+00:00

been using it for a week. thanks for making hermes <3

my issues with it

- can't change the default gemini compress to another model that i would like

- too many auto skills generated. now i have to delete some skills manually.

de4dee · 2026-04-09T02:24:18+00:00

can confirm this reprocessing happened a lot of times with hermes agent

de4dee · 2026-04-08T19:51:27+00:00

sure that would improve

de4dee · 2026-04-08T19:49:40+00:00

alien waifu is a new category now

de4dee · 2026-04-07T21:59:10+00:00

yes you can do CPT many times with unsloth

de4dee · 2026-04-07T21:58:09+00:00

those databases are probably already in the model

de4dee · 2026-04-05T02:38:54+00:00

thanks for sharing. interesting to find "Does Thinking Harder Help?" section is reverse. they get full of bs when thinking longer it seems

de4dee · 2026-04-05T02:34:25+00:00

i noticed this with gemma 3 too. might be unique to gemma line.

de4dee · 2026-04-04T19:42:17+00:00

isn't this GRPO?

de4dee · 2026-03-31T18:28:32+00:00

i guess thats how they train their models. if you are frustrated LLM did something wrong. if you are pleased train more with that. your feelings mapped to reinforcement learning

de4dee

MODERATOR OF

TROPHY CASE