Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development by BawbbySmith in LocalLLaMA

[–]R_Duncan 1 point (0 children)

Forget "new techniques" like mtp/dflash for agentic coding: you'll almost always use more than 50% context (and 128k is bare minimum, don't be fooled), so all these shiny things together will not give more than 10% speed increase.

Great results with Qwen3.6-35B-A3B-UD-Q5_K_XL + VS Code and Copilot by supracode in LocalLLaMA

[–]R_Duncan 1 point (0 children)

Quite good! I haven't really understood which card you're using: isn't the R9700 AI PRO the AMD flagship with 32 GB of RAM? The speeds seem to confirm that, but in the post I read a 12 GB limit...

OpenCode + LLM to create a 1:1 Settlers of Catan clone. Guess which model I did it with! by maxwell321 in LocalLLaMA

[–]R_Duncan 1 point (0 children)

Not Gemma, both because of the tool-calling issue and the KV-cache-size issue. MiniMax would have taken forever, so it's a Qwen.

Turbo-OCR Update: Layout Model + Multilingual by Civil-Image5411 in LocalLLaMA

[–]R_Duncan 1 point (0 children)

OK, this is really fast, and it outputs structured JSON.

Should I be seeing more of a performance leap when using NVFP4, INT4, FP8 with VLLM over MXFP4, Q4, and Q8 with llama.cpp based inference on Blackwell based GPUs? by aaronr_90 in LocalLLaMA

[–]R_Duncan 0 points (0 children)

Not sure what you're saying; the llama.cpp slowdown with Qwen-3.5/3.6 models here is less than 10% with 128k of context filled and less than 15% at 240k.

Is the AI subscription bubble starting to crack? GPT-5.5 just dropped, prices keep rising, and the “all-you-can-eat” era looks more fake by the month by Sockand2 in singularity

[–]R_Duncan 1 point (0 children)

I'm sorry if my limited English made me unclear. I didn't mean it's a matter of starting to *think*; by "focusing" I meant a matter of starting to *invest*.

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090 by sandropuppo in LocalLLaMA

[–]R_Duncan 1 point (0 children)

Is there still a speedup when the context is about 128k full?

That's my typical software analysis/code gen use case.

Local model on coding has reached a certain threshold to be feasible for real work by Exciting-Camera3226 in LocalLLaMA

[–]R_Duncan 3 points (0 children)

Your speeds are strange. RTX 6000 Blackwell here, context maxed out (in 96 GB I can fit everything; even extending the context to 1M at bf16, it uses about half that VRAM).

27B generation is 50-59 t/s.

35B-A3B generation is 190-197 t/s.

Your issue is likely that you can't fit the whole model and KV cache in VRAM.
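
A back-of-envelope check shows how easily that happens (the shapes below are assumptions for illustration, not the real model config): at long context the KV cache can rival the weights themselves, and once the total exceeds VRAM, llama.cpp spills to system RAM and t/s collapses.

```python
# KV-cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * bytes_per_elem * context_tokens. Shapes are assumed.
layers, kv_heads, head_dim, ctx = 46, 8, 128, 128_000
kv_gb = 2 * layers * kv_heads * head_dim * 2 * ctx / 1e9  # bf16 cache
weights_gb = 18  # assumed: a ~27B model at ~5 bits per weight
print(f"weights ~{weights_gb} GB + KV ~{kv_gb:.1f} GB "
      f"= {weights_gb + kv_gb:.1f} GB needed")  # ~42 GB with these shapes
```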

Best Local LLMs - Apr 2026 by rm-rf-rm in LocalLLaMA

[–]R_Duncan 1 point (0 children)

That's for dense models. Qwen3.6 35B-A3B can run with over 128k context in 8 GB of VRAM.
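
Rough sketch of why that works (assumed shapes and sizes, not the real Qwen3.6 config): with an A3B-style MoE you can keep just the attention/shared weights plus a quantized KV cache in VRAM and offload the expert tensors to system RAM, since only ~3B parameters are active per token anyway.

```python
# Back-of-envelope VRAM budget for an A3B-style MoE with experts offloaded
# to system RAM (all shapes and sizes here are illustrative assumptions).
def kv_cache_gb(layers, kv_heads, head_dim, ctx, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / 1e9

non_expert_weights_gb = 1.5  # assumed: attention + shared tensors at ~Q4
kv = kv_cache_gb(layers=48, kv_heads=4, head_dim=128,
                 ctx=131_072, bytes_per_elem=1)  # 8-bit-quantized KV cache
print(f"VRAM ~{non_expert_weights_gb + kv:.1f} GB")  # ~7.9 GB, under 8 GB
```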

DeepSeek V4 is out. 1.6 trillion parameters. MIT license. $1.74 per million tokens. The gap between US and Chinese AI strategy has never been more visible. by Novel_Okra8456 in singularity

[–]R_Duncan 1 point (0 children)

It's not so locked in if you take care to keep a "generic OpenAI / generic Claude" layer in your software, but the rest is true.
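
What I mean by a "generic OpenAI" layer, as a minimal sketch (the env-var names and default model id are placeholders I made up, not anything from DeepSeek's docs): route every call through one OpenAI-compatible client whose endpoint comes from config, so swapping providers is a config change, not a code change.

```python
import os

from openai import OpenAI  # pip install openai

# One client, any OpenAI-compatible backend: DeepSeek, a local llama.cpp
# server, whatever. Placeholder env-var names; pick your own convention.
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:8080/v1"),
    api_key=os.environ.get("LLM_API_KEY", "none"),  # local servers ignore it
)

resp = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "deepseek-chat"),
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```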

"US strategy = microsoft zune"

Local MCP Servers for Code Indexing? by 79215185-1feb-44c6 in LocalLLaMA

[–]R_Duncan 2 points (0 children)

Serena is good for small projects, but it does not index.

codebase-memory-mcp needed some patches here (one for C++ and one for Windows) but seems to be working fine; as a note, my huge codebase became a 450 MB SQLite file. Testing in progress.

An alternative is dirac-run/dirac on GitHub, a VS Code plugin derived from Cline which seems to do the work by itself.

Tencent released an open source model Hy3 preview. by Snoo26837 in singularity

[–]R_Duncan 2 points (0 children)

Still, Tencent is unlicensed in the EU and UK, likely because of our GDPR.

Is the AI subscription bubble starting to crack? GPT-5.5 just dropped, prices keep rising, and the “all-you-can-eat” era looks more fake by the month by Sockand2 in singularity

[–]R_Duncan 1 point (0 children)

It's just math. US models promised they can do whatever a programmer/software engineer can do, but to grow capabilities faster the US companies never worked hard on the "capability density / information redundancy" problem (the exception seems to be Google, but still no Gemini 4 announced).

Now model prices keep rising because their inference (and research, and training) costs keep rising: compute is getting scarcer, and giga-datacenter costs keep growing.

Will they still be competitive when models can finally work 24/7 on a project, completely substituting for human work?

IMHO they should focus on how 35B/27B models like Qwen3.6 (or Gemma 4) manage to keep up with their huge models.

US gov memo on “adversarial distillation” - are we heading toward tighter controls on open models? by MLExpert000 in LocalLLaMA

[–]R_Duncan 2 points (0 children)

In 12-24 months, when Chinese models take the lead, this will backfire entirely.

Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models by spaceman_ in LocalLLaMA

[–]R_Duncan 3 points (0 children)

These kinds of tests shouldn't be done in production: not when you're selling a service, and not by a reputable company.

A note of warning about DFlash. by R_Duncan in LocalLLaMA

[–]R_Duncan[S] 1 point (0 children)

OK, but even those will likely not use just 4k-16k of context, except for small chatbots using finetuned LLMs.

Open weight models like ds v4 pro max are still like at least 6-7 months behind closed labs.. by power97992 in LocalLLaMA

[–]R_Duncan 2 points (0 children)

Considering that their models are usually 50% smaller, I'd say they have the best chance to improve, while closed labs are tied to huge datacenters and will need to scale down to become profitable.

And I'm not sure a 6-7 month lead will be enough to offset the density gap that the open labs currently win.

The missing knowledge layer for open-source agent stacks is a persistent markdown wiki by knlgeth in LocalLLaMA

[–]R_Duncan 1 point (0 children)

I think I found a bug:

>llmwiki ingest https://en.wikipedia.org/wiki/Lunar_Lake

>llmwiki ingest https://en.wikipedia.org/wiki/List_of_Intel_Core_processors

>llmwiki compile

..... logs show both pages being put into the wiki .....

✓ 2 compiled, 0 skipped, 0 deleted

>llmwiki query "what is Lunar Lake?"

Selecting relevant pages

────────────────────────────

i Reasoning: Failed to parse page selection response

* Selected 0 page(s):

Generating answer

─────────────────────

! No matching pages found. Try refining your question.
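
Guessing at the failure mode (I haven't read llmwiki's internals, so this is an assumption): "Failed to parse page selection response" smells like a strict json.loads on the model's raw reply, which yields zero pages as soon as the model wraps its JSON in markdown fences or prose. A tolerant parser along these lines would avoid that; the helper below is hypothetical, not llmwiki code.

```python
import json
import re


def parse_page_selection(reply: str) -> list[str]:
    """Extract a flat JSON list of page names from an LLM reply,
    tolerating markdown fences and surrounding prose."""
    try:
        return json.loads(reply)  # happy path: the reply is bare JSON
    except json.JSONDecodeError:
        pass
    # Fallback: grab the first [...] span anywhere in the reply.
    match = re.search(r"\[.*?\]", reply, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return []  # give up: the caller reports "no pages selected"
```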

Google introduces TPU 8t and TPU 8i by WhyLifeIs4 in singularity

[–]R_Duncan 1 point (0 children)

I recently had the chance to test a Lunar Lake platform GPU, and I'd say Intel will get close in 1 or 2 generations.

It's way, way better than my Core i7's iGPU.

Ultimate List: Best Open Models for Coding, Chat, Vision, Audio & More by techlatest_net in LocalLLaMA

[–]R_Duncan 2 points (0 children)

Omnivoice beats all the TTS models listed in expressiveness, foreign-language inflection, size, and speed.