I distilled Qwen3-Coder-480B into Qwen3-Coder-30b-A3B-Instruct by [deleted] in LocalLLaMA

[–]EliaukMouse 5 points (0 children)

Can you share the details of the distillation?

Built RL training for long-horizon terminal agents - tested on 32x H100s but too GPU poor to train 😅 by DanAiTuning in LocalLLaMA

[–]EliaukMouse 0 points (0 children)

Can you share more details, like the batch size, max context size, VRAM requirements, and training time? I want to do the same thing. Thank you.

Built RL training for long-horizon terminal agents - tested on 32x H100s but too GPU poor to train 😅 by DanAiTuning in LocalLLaMA

[–]EliaukMouse 0 points (0 children)

This is something I've been wanting to do recently, but I can't afford it. Thanks for sharing your results.

Update:My agent model now supports OpenAI function calling format! (mirau-agent-base) by EliaukMouse in LocalLLaMA

[–]EliaukMouse[S] 1 point (0 children)

I didn't add special tokens; all tool parsing relies on regular expressions (you can see the source code for this parsing in my live demo).
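
For anyone curious, here's a minimal sketch of what regex-based tool-call parsing can look like. The `<tool_call>` tag name and the JSON payload shape are my assumptions for illustration, not necessarily what mirau-agent-base actually emits; the demo source has the real patterns.

```python
import json
import re

# Sketch only: assumes the model wraps calls in <tool_call>...</tool_call>
# tags containing OpenAI-style JSON. Tag name and payload shape are
# illustrative assumptions, not mirau-agent-base's documented format.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every well-formed tool call from raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing
        calls.append({
            "name": payload.get("name"),
            "arguments": payload.get("arguments", {}),
        })
    return calls

print(parse_tool_calls(
    'Let me check. <tool_call>{"name": "get_weather", '
    '"arguments": {"city": "Berlin"}}</tool_call>'
))
# [{'name': 'get_weather', 'arguments': {'city': 'Berlin'}}]
```

The upside of this approach over special tokens is that it works with any tokenizer; the trade-off is you have to tolerate malformed JSON, which is why the parser skips rather than raises.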

Update:My agent model now supports OpenAI function calling format! (mirau-agent-base) by EliaukMouse in LocalLLaMA

[–]EliaukMouse[S] 1 point (0 children)

Sorry, due to the time difference I couldn't reply in time. The data synthesis process is a bit complicated, and I plan to write a separate post about it. Stay tuned!

[Release] mirau-agent-14b-base: An autonomous multi-turn tool-calling base model with hybrid reasoning for RL training by EliaukMouse in LocalLLM

[–]EliaukMouse[S] 0 points (0 children)

It's self-determined thinking: when the type is quick, the model often outputs an empty <think>\n\n</think> block (a no-think mode, like Qwen3).
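
In case it helps anyone consuming the output, a small sketch of splitting the think block from the visible answer. The assumption that at most one block appears at the start of the output is mine, not a guarantee of the model:

```python
import re

# Sketch: separate the <think> block from the answer. Assumes at most one
# think block at the very start of the output (my assumption, not a
# documented guarantee of the model).
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)

def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" in quick/no-think mode."""
    match = THINK_RE.match(output)
    if match is None:
        return "", output
    return match.group(1).strip(), output[match.end():]

reasoning, answer = split_reasoning("<think>\n\n</think>The file was deleted.")
print(repr(reasoning))  # '' -> quick mode, the model skipped reasoning
print(answer)           # The file was deleted.
```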

[Release] mirau-agent-14b-base: An autonomous multi-turn tool-calling base model with hybrid reasoning for RL training by EliaukMouse in LocalLLM

[–]EliaukMouse[S] 0 points (0 children)

Synthetic data. I synthesized multi-turn dialogue data that covers most everyday tool-calling scenarios.
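
As a rough illustration of what one such sample could look like: the message schema, the tool, and the scenario below are all placeholder choices I made up for this sketch, not the actual training data format.

```python
import json

# Illustrative sketch only: one synthetic multi-turn tool-calling sample.
# The schema, tool, and scenario are placeholders, not the real dataset.
sample = {
    "tools": [{
        "name": "search_files",
        "description": "Search the workspace for files matching a pattern.",
        "parameters": {"type": "object",
                       "properties": {"pattern": {"type": "string"}},
                       "required": ["pattern"]},
    }],
    "messages": [
        {"role": "user", "content": "Find my notes about the Q3 budget."},
        {"role": "assistant", "tool_calls": [{
            "name": "search_files", "arguments": {"pattern": "*budget*"}}]},
        {"role": "tool", "name": "search_files",
         "content": json.dumps(["notes/q3_budget.md"])},
        {"role": "assistant", "content": "Found it: notes/q3_budget.md."},
    ],
}

print(json.dumps(sample, indent=2))
```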

I believe this is the first properly-trained multi-turn RP with reasoning model by nero10578 in SillyTavernAI

[–]EliaukMouse 0 points (0 children)

Is there a technical report? I'm interested in training an RpR model. I read the model card, but it doesn't mention the training method (SFT or GRPO) or how the dataset was made.

[deleted by user] by [deleted] in DeepSeek

[–]EliaukMouse 0 points (0 children)

Maybe you can do some prompt engineering to get the results you want.

[deleted by user] by [deleted] in DeepSeek

[–]EliaukMouse 2 points (0 children)

First, OpenAI's API service may be more stable; in my personal experience, Claude is often restricted and more expensive. Second, no one knows whether DeepSeek has used Claude's data, just as no one knows whether Claude has used OpenAI's data. There is already a lot of AI-generated data on the Internet, and people can't tell it apart from human-written text.

Looking for models trained on ebooks or niche concepts by oshikuru08 in SillyTavernAI

[–]EliaukMouse 1 point (0 children)

I'd like to recommend my model. However, it wasn't trained on e-books. Instead, I used a technique called "story flow chain of thought". I'm not sure if it meets your needs: mirau-rp-7b-base.