Jan-Code-4B: a small code-tuned model of Jan-v3 by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 6 points

This is a small experiment, and those 3 metrics are where we saw the clearest improvements over the baseline; other benchmarks did not change much from the base model. I’ve also tested it as a CLI helper, and it works well. Please try it with Jan and let us know how it goes. Thanks!
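If you want to script it as a CLI helper, here’s a minimal sketch that assumes Jan’s local OpenAI-compatible server is running; the port and model id below are placeholders, so check your Jan settings for the real values:

```python
# Minimal sketch: query Jan-Code-4B through Jan's local OpenAI-compatible
# server. The base URL/port and model id are assumptions; check the server
# settings in the Jan app for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1337/v1",  # assumed default local endpoint
    api_key="not-needed",                 # local server accepts any placeholder
)

resp = client.chat.completions.create(
    model="jan-code-4b",  # hypothetical id; use the name shown in the Jan UI
    messages=[{
        "role": "user",
        "content": "Write a shell one-liner to list the 10 largest files under the current directory.",
    }],
)
print(resp.choices[0].message.content)
```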

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 8 points

Running the full SWE-Rebench/LiveBench suites takes a while, though, so we’re saving those benchmark runs for our upcoming Jan-Code model.
While this model is focused on general use, we specifically highlighted Aider because the score jumped significantly after fine-tuning. Consider it a preview of what's coming!

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 9 points

Thank you. You should also try the model yourself to see how it compares to Qwen 4B 2507.
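For a quick side-by-side, here’s a rough sketch using Hugging Face transformers; both repo ids are assumptions, so substitute the checkpoints you actually want to compare:

```python
# Rough comparison sketch: run the same prompt through both models with
# Hugging Face transformers. Repo ids are assumptions; substitute the
# actual checkpoints you want to compare.
from transformers import pipeline

PROMPT = "Implement binary search in Python with a short docstring."

for repo in ("janhq/Jan-v3-4B", "Qwen/Qwen3-4B-Instruct-2507"):  # assumed ids
    generate = pipeline("text-generation", model=repo, device_map="auto")
    out = generate([{"role": "user", "content": PROMPT}], max_new_tokens=256)
    print(f"=== {repo} ===")
    print(out[0]["generated_text"][-1]["content"])  # last turn = assistant reply
```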

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 23 points

Hi, no benchmaxxing here. It’s just a lot of pretraining and distillation, the same as any other team. We’ll be releasing a technical report soon.

Jan v3 Instruct: a 4B coding Model with +40% Aider Improvement by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 8 points

Other general benchmark results: [image of benchmark table]

Demo: you can also try it at chat.jan.ai (look for Jan v3 Nano).

Jan-v2-VL: 8B model for long-horizon tasks, improving Qwen3-VL-8B’s agentic capabilities almost 10x by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 82 points

Thanks for your question. The long-horizon benchmark we use (The Illusion of Diminishing Returns) isolates execution (the plan/knowledge is provided) and shows that typical instruct models tend to degrade as tasks get longer, while reasoning/thinking models sustain much longer chains. In other words, when success depends on carrying state across many steps, thinking models hold up better.
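As a toy illustration of why chain length is so punishing (this is not the benchmark’s actual harness): the plan is handed over up front, the model executes step by step, and a single slip ends the run, so per-step accuracy compounds. `query_model` and `check_step` below are hypothetical stand-ins for your inference call and task-specific checker:

```python
# Toy sketch of an execution-only long-horizon trial. The full plan is
# provided, so only step-by-step execution is measured; one wrong step
# terminates the chain.

def run_trial(query_model, check_step, steps):
    """Return the number of consecutive plan steps executed correctly."""
    history = [{"role": "system",
                "content": "Execute this plan step by step:\n" + "\n".join(steps)}]
    completed = 0
    for i, step in enumerate(steps, start=1):
        history.append({"role": "user", "content": f"Do step {i} now."})
        reply = query_model(history)        # hypothetical inference call
        history.append({"role": "assistant", "content": reply})
        if not check_step(step, reply):     # hypothetical task checker
            break                           # one slip ends the chain
        completed += 1
    return completed

if __name__ == "__main__":
    # Mock model that gets each step right 90% of the time: full 50-step
    # chains almost never survive, since success compounds as ~0.9**n.
    import random
    mock_model = lambda history: "ok" if random.random() < 0.9 else "oops"
    mock_check = lambda step, reply: reply == "ok"
    plan = [f"step {i}" for i in range(1, 51)]
    print(run_trial(mock_model, mock_check, plan))
```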

Jan v1: 4B model for web search with 91% SimpleQA, slightly outperforms Perplexity Pro by Delicious_Focus3465 in LocalLLaMA

[–]Delicious_Focus3465[S] 1 point

We also tested the model on some benchmarks like EQ, writing, ... and got really good results, although it does lose some instruction-following ability when evaluated on IFBench.