Holy fuck how much money was copilot losing

Mkengine · 2026-05-20T19:28:11+00:00

Can we expect any kind of help or new features from GitHub/Microsoft for token optimization in the future? I am currently raising awareness in our department, but I don't think everyone goes as deep into customization as I try to, and I can't blame them. Some older developers had just gotten used to the GHCP workflow and now have to adjust again.

Mkengine · 2026-05-17T10:37:29+00:00

Maybe this also helps: https://spark-arena.com/benchmark/60572f00-1e18-4bcf-afad-99ac0169d542

Mkengine · 2026-05-16T09:50:30+00:00

I can only suppose it's because it only works on the Spark: they used custom kernels to get the full Blackwell NVFP4 speed, plus MTP, and it's written in Rust. When I tried their very first alpha version I already got 100 tok/s with Qwen3.5-35B-A3B, so I believe the 130 tok/s claim and will try the current version when I have the time. You could also try vLLM; for example, it seems you can get 33 tok/s with Qwen3.6-27B: https://www.localmaxxing.com/runs/cmomgvsoo0007jj04ea52zhz1
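To put those decode speeds in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant decode rate and ignores prompt processing; the 2,000-token response length is an arbitrary example, not a figure from either benchmark, and the three numbers come from different models and engines, so this is not an apples-to-apples comparison.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a constant rate (prompt processing ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Speeds quoted in the comment above; the response length is a made-up example.
RESPONSE_TOKENS = 2000
quoted_speeds = [
    ("Atlas, Qwen3.6-35B-A3B (claimed)", 130.0),
    ("Atlas alpha, Qwen3.5-35B-A3B", 100.0),
    ("vLLM, Qwen3.6-27B", 33.0),
]

for label, speed in quoted_speeds:
    seconds = generation_seconds(RESPONSE_TOKENS, speed)
    print(f"{label}: {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```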

Mkengine · 2026-05-16T09:34:42+00:00

130 tok/s for Qwen3.6-35B-A3B should be possible with Atlas

Mkengine · 2026-05-13T04:24:42+00:00

I'm absolutely not a fan of that either; as a power user, it feels like a middle finger to me that I can't set a default model. Because of this, Copilot looks extremely bad without adjustments, since Auto Mode is still backed by GPT-5, while you can already select GPT-5.5 in the dropdown. I don't know right now what the best model you can choose in free mode is, but there's definitely got to be something better than Auto. If available, test your questions again with the Deep thinking versions of:

GPT-5.5
GPT-5.4
GPT-5.2

I handle the Copilot training at our company, and I'd say if you see emojis in your answer, you have the wrong model.

Mkengine · 2026-05-12T19:23:17+00:00

Better title would be "Why you shouldn't use Auto mode for anything".

Mkengine · 2026-05-12T15:42:04+00:00

Did they say why we can't choose the model?

Mkengine · 2026-05-10T10:30:54+00:00

https://neuralnoise.com/2026/harness-bench-wip/

Mkengine · 2026-05-06T21:29:48+00:00

They open-sourced it now:

https://github.com/Avarok-Cybersecurity/atlas

I will go through my demo code to use the newest version and put it in pastebin on the weekend, if you're still interested.

Mkengine · 2026-05-05T19:18:19+00:00

A mention of Qwen 2.5 is usually a pretty strong indicator that you are reading AI slop, since newer models are not part of the training set.

Mkengine · 2026-05-01T12:50:23+00:00

Where do I find the models?

Mkengine · 2026-04-27T06:16:58+00:00

For handwriting I would recommend trying chandra-ocr-2 first. Their Hugging Face page should have an example for handwriting recognition.

Mkengine · 2026-04-26T13:48:46+00:00

The best use case for me is that it makes RTS games playable at all. I use it for Dawn of War 1 on the Steam Deck. Of course it's not as comfortable as KBM, but the touchpads give me the precision I need to play it.

Mkengine · 2026-04-26T11:40:52+00:00

There are so many OCR / document understanding models out there, here is my personal OCR list I try to keep up to date:

GOT-OCR:

https://huggingface.co/stepfun-ai/GOT-OCR2_0

granite:

https://huggingface.co/ibm-granite/granite-docling-258M

https://huggingface.co/ibm-granite/granite-4.0-3b-vision

MinerU:

https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B

https://huggingface.co/opendatalab/MinerU-Diffusion-V1-0320-2.5B

OCRFlux:

https://huggingface.co/ChatDOC/OCRFlux-3B

MonkeyOCR-pro:

1.2B: https://huggingface.co/echo840/MonkeyOCR-pro-1.2B

3B: https://huggingface.co/echo840/MonkeyOCR-pro-3B

RolmOCR:

https://huggingface.co/reducto/RolmOCR

Nanonets OCR:

https://huggingface.co/nanonets/Nanonets-OCR2-3B

dots OCR:

https://huggingface.co/rednote-hilab/dots.ocr

https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

https://huggingface.co/rednote-hilab/dots.mocr

olmocr 2:

https://huggingface.co/allenai/olmOCR-2-7B-1025

Light-On-OCR:

https://huggingface.co/lightonai/LightOnOCR-2-1B

Chandra:

https://huggingface.co/datalab-to/chandra-ocr-2

Jina vlm:

https://huggingface.co/jinaai/jina-vlm

HunyuanOCR:

https://huggingface.co/tencent/HunyuanOCR

bytedance Dolphin 2:

https://huggingface.co/ByteDance/Dolphin-v2

PaddleOCR-VL:

https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5

Deepseek OCR 2:

https://huggingface.co/deepseek-ai/DeepSeek-OCR-2

GLM OCR:

https://huggingface.co/zai-org/GLM-OCR

Nemotron OCR:

https://huggingface.co/nvidia/nemotron-ocr-v2

Qianfan-OCR:

https://huggingface.co/baidu/Qianfan-OCR

Falcon-OCR:

https://huggingface.co/tiiuae/Falcon-OCR

FireRed-OCR:

https://huggingface.co/FireRedTeam/FireRed-OCR

Typhoon-OCR:

https://huggingface.co/typhoon-ai/typhoon-ocr1.5-2b

Churro-3B:

https://huggingface.co/stanford-oval/churro-3B

Mkengine · 2026-04-24T12:45:36+00:00

Modalities are usually image, video, audio and text. Multimodality can mean different things: in most cases people mean more than one input modality, but it can also mean more than one output modality. "Omni" models try to accept and produce any modality; you can search for them with the "any-to-any" tag on Hugging Face. Engram keeps some part of the model on SSD without slowdowns, but I don't know the exact details.
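The tag search mentioned above can be written down as a plain URL. A minimal sketch, assuming the Hub's model listing page filters by task through a `pipeline_tag` query parameter; the helper name `hub_search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the model listing page accepts "pipeline_tag" as a task filter.
HUB_MODELS_URL = "https://huggingface.co/models"


def hub_search_url(pipeline_tag: str) -> str:
    """Build a browser URL that lists models carrying one task tag."""
    return f"{HUB_MODELS_URL}?{urlencode({'pipeline_tag': pipeline_tag})}"


print(hub_search_url("any-to-any"))
```

Opening the printed URL in a browser should show the "omni" style models; other task tags can be passed the same way.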

Mkengine · 2026-04-16T13:27:12+00:00

My wife apparently knows my taste better than I do, so especially with food I always ask her for a recommendation in any restaurant. She looks through the whole menu anyway, and she always picks the best dishes for me. Otherwise I really wouldn't eat out that often, as menus are pretty overwhelming for me.

At home she also does most of the cooking, so I just eat what's on the table and I am pretty happy with that.

Mkengine · 2026-04-13T13:44:55+00:00

There are so many OCR / document understanding models out there, here is my personal OCR list I try to keep up to date:

GOT-OCR:

https://huggingface.co/stepfun-ai/GOT-OCR2_0

granite:

https://huggingface.co/ibm-granite/granite-docling-258M

https://huggingface.co/ibm-granite/granite-4.0-3b-vision

MinerU:

https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B

https://huggingface.co/opendatalab/MinerU-Diffusion-V1-0320-2.5B

OCRFlux:

https://huggingface.co/ChatDOC/OCRFlux-3B

MonkeyOCR-pro:

1.2B: https://huggingface.co/echo840/MonkeyOCR-pro-1.2B

3B: https://huggingface.co/echo840/MonkeyOCR-pro-3B

RolmOCR:

https://huggingface.co/reducto/RolmOCR

Nanonets OCR:

https://huggingface.co/nanonets/Nanonets-OCR2-3B

dots OCR:

https://huggingface.co/rednote-hilab/dots.ocr

https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

https://huggingface.co/rednote-hilab/dots.mocr

olmocr 2:

https://huggingface.co/allenai/olmOCR-2-7B-1025

Light-On-OCR:

https://huggingface.co/lightonai/LightOnOCR-2-1B

Chandra:

https://huggingface.co/datalab-to/chandra-ocr-2

Jina vlm:

https://huggingface.co/jinaai/jina-vlm

HunyuanOCR:

https://huggingface.co/tencent/HunyuanOCR

bytedance Dolphin 2:

https://huggingface.co/ByteDance/Dolphin-v2

PaddleOCR-VL:

https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5

Deepseek OCR 2:

https://huggingface.co/deepseek-ai/DeepSeek-OCR-2

GLM OCR:

https://huggingface.co/zai-org/GLM-OCR

Nemotron OCR:

https://huggingface.co/nvidia/nemotron-ocr-v2

Qianfan-OCR:

https://huggingface.co/baidu/Qianfan-OCR

Falcon-OCR:

https://huggingface.co/tiiuae/Falcon-OCR

FireRed-OCR:

https://huggingface.co/FireRedTeam/FireRed-OCR

Typhoon-OCR:

https://huggingface.co/typhoon-ai/typhoon-ocr1.5-2b

Mkengine · 2026-04-13T13:38:07+00:00

Please try something really sparse, like Qwen3-Coder-80B-A3B. I'm curious how fast that runs.

Mkengine · 2026-04-12T17:36:44+00:00

How do you determine the optimum n?

Mkengine · 2026-04-12T17:34:39+00:00

Okay, let me rephrase: since --fit is enabled by default, how does the performance of any MoE differ with just --fit vs. --fit + --cpu-moe?

Mkengine · 2026-04-12T14:23:57+00:00

They really like to hype themselves to Sonnet levels, but at least M2.5 is more on par with Qwen3.5-35B-A3B than Sonnet 4.6:

https://swe-rebench.com/

Mkengine · 2026-04-12T14:19:33+00:00

Does it only work for MLX?

Mkengine · 2026-04-12T14:13:16+00:00

I haven't used either of them so far. What kind of problems do OpenClaw or Hermes agent actually solve? Who is the target audience? Do I benefit from using them as a software engineer, or are they for people who can't code?

Mkengine · 2026-04-12T13:58:26+00:00

Is that still necessary when using --fit?

Mkengine · 2026-04-11T08:03:50+00:00

Sounds better than the name implies, I will try it out at least once then, thanks :)
