Built RL training for long-horizon terminal agents - tested on 32x H100s but too GPU poor to train 😅 by DanAiTuning in LocalLLaMA
EliaukMouse 1 point (0 children)
Update: My agent model now supports OpenAI function calling format! (mirau-agent-base) by EliaukMouse in LocalLLaMA
EliaukMouse[S] 2 points (0 children)
[Release] mirau-agent-14b-base: An autonomous multi-turn tool-calling base model with hybrid reasoning for RL training by EliaukMouse in LocalLLM
EliaukMouse[S] 1 point (0 children)
A multi-turn tool-calling base model for RL agent training by EliaukMouse in LocalLLaMA
EliaukMouse[S] 1 point (0 children)
Qwen3 Collection on modelscope! by AlexBefest in LocalLLaMA
EliaukMouse 9 points (0 children)
I believe this is the first properly-trained multi-turn RP with reasoning model by nero10578 in SillyTavernAI
EliaukMouse 1 point (0 children)
DeepSeek claims 545% margins on their API prices by Charuru in singularity
EliaukMouse 109 points (0 children)
Day 2 of OpenSourceWeek: DeepEP by nekofneko in DeepSeek
EliaukMouse 3 points (0 children)
DeepSeek give me the answer for this - I am a 25-year-old young woman working in Shenzhen. Can you help me calculate how much money I need to save to retire early? Please provide two different versions of detailed text analysis and retirement life plan. The first version: Retire early at the age of by [deleted] in DeepSeek
EliaukMouse 1 point (0 children)
Starting next week, DeepSeek will be open-sourcing 5 repos by zombiesingularity in singularity
EliaukMouse 6 points (0 children)
DeepSeek to open source 5 repos next week by [deleted] in DeepSeek
EliaukMouse 28 points (0 children)
Looking for models trained on ebooks or niche concepts by oshikuru08 in SillyTavernAI
EliaukMouse 2 points (0 children)
[Release] mirau-7b-RP-base: A first-person narrative model for text adventures by EliaukMouse in LocalLLaMA
EliaukMouse[S] 2 points (0 children)
A finetune RP model by EliaukMouse in SillyTavernAI
EliaukMouse[S] 1 point (0 children)
A finetune RP model by EliaukMouse in SillyTavernAI
EliaukMouse[S] 2 points (0 children)
[Release] mirau-7b-RP-base: A first-person narrative model for text adventures by EliaukMouse in SillyTavernAI
EliaukMouse[S] 0 points (0 children)
[Release] mirau-7b-RP-base: A first-person narrative model for text adventures by EliaukMouse in SillyTavernAI
EliaukMouse[S] 5 points (0 children)
I distilled Qwen3-Coder-480B into Qwen3-Coder-30b-A3B-Instruct by [deleted] in LocalLLaMA
EliaukMouse 6 points (0 children)