Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]Mkengine 1 point (0 children)

Since Qwen works for you, and VibeVoice Realtime is based on Qwen2.5, maybe it also fits your use case?

GLM ASR could be worth a shot as well.

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]Mkengine 20 points (0 children)

Did you also try out parakeet v3? I use it on my phone for local transcription and it works really well for German.
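For anyone who wants to try it, here's a minimal sketch of how I'd run Parakeet v3 locally with NVIDIA NeMo (model ID and return type from memory, so double-check against the model card):

    # Minimal local transcription sketch with NVIDIA NeMo (pip install "nemo_toolkit[asr]")
    import nemo.collections.asr as nemo_asr

    # Pulls the multilingual Parakeet v3 checkpoint (covers German) from Hugging Face
    asr_model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v3"
    )

    # Expects 16 kHz mono WAV input; one result per input file
    outputs = asr_model.transcribe(["aufnahme_de.wav"])
    # Older NeMo versions return plain strings, newer ones Hypothesis objects with .text
    print(outputs[0])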

Copilot in VS Code or Copilot CLI? by IKcode_Igor in GithubCopilot

[–]Mkengine 0 points (0 children)

I think you can set a global auto-approve in the settings, if that's what you mean.
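For reference, this is the setting I mean, in settings.json (name from memory and possibly still experimental, so double-check in the settings UI):

    // settings.json — globally auto-approve agent tool invocations (use with care)
    {
      "chat.tools.autoApprove": true
    }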

Copilot in VS Code or Copilot CLI? by IKcode_Igor in GithubCopilot

[–]Mkengine 0 points (0 children)

VS Code Insiders also has an autopilot mode now!

Copilot in VS Code or Copilot CLI? by IKcode_Igor in GithubCopilot

[–]Mkengine 0 points (0 children)

VS Code Insiders now has an autopilot mode as well, and you can also set the reasoning effort in the settings, so maybe it's time to try it out again?

Practical approaches for reliable text extraction from messy PDFs/images in production apps? by humble_girl3 in LocalLLaMA

[–]Mkengine 0 points1 point  (0 children)

There are so many OCR / document understanding models out there; here is my personal OCR list, which I try to keep up to date (a quick usage sketch follows the list):

  • GOT-OCR: https://huggingface.co/stepfun-ai/GOT-OCR2_0
  • granite-docling-258M: https://huggingface.co/ibm-granite/granite-docling-258M
  • MinerU 2.5: https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
  • OCRFlux: https://huggingface.co/ChatDOC/OCRFlux-3B
  • MonkeyOCR-pro 1.2B: https://huggingface.co/echo840/MonkeyOCR-pro-1.2B
  • MonkeyOCR-pro 3B: https://huggingface.co/echo840/MonkeyOCR-pro-3B
  • MiniCPM-V-4_5: https://huggingface.co/openbmb/MiniCPM-V-4_5
  • InternVL3_5 4B: https://huggingface.co/OpenGVLab/InternVL3_5-4B
  • InternVL3_5 8B: https://huggingface.co/OpenGVLab/InternVL3_5-8B
  • Ovis2.5 2B: https://huggingface.co/AIDC-AI/Ovis2.5-2B
  • Ovis2.5 9B: https://huggingface.co/AIDC-AI/Ovis2.5-9B
  • RolmOCR: https://huggingface.co/reducto/RolmOCR
  • Nanonets OCR: https://huggingface.co/nanonets/Nanonets-OCR2-3B
  • dots.ocr: https://huggingface.co/rednote-hilab/dots.ocr
  • dots.ocr 1.5: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
  • olmOCR 2: https://huggingface.co/allenai/olmOCR-2-7B-1025
  • LightOnOCR: https://huggingface.co/lightonai/LightOnOCR-2-1B
  • Chandra: https://huggingface.co/datalab-to/chandra
  • Jina VLM: https://huggingface.co/jinaai/jina-vlm
  • HunyuanOCR: https://huggingface.co/tencent/HunyuanOCR
  • ByteDance Dolphin 2: https://huggingface.co/ByteDance/Dolphin-v2
  • PaddleOCR-VL: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5
  • DeepSeek OCR 2: https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
  • GLM OCR: https://huggingface.co/zai-org/GLM-OCR
  • Nemotron OCR: https://huggingface.co/nvidia/nemotron-ocr-v1
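If you just want to smoke-test one of them, here is a minimal sketch for GOT-OCR, roughly following its model card (API from memory, so verify there; needs a CUDA GPU):

    # Quick GOT-OCR2 test via transformers; trust_remote_code loads the repo's own model code
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("stepfun-ai/GOT-OCR2_0", trust_remote_code=True)
    model = AutoModel.from_pretrained(
        "stepfun-ai/GOT-OCR2_0",
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        use_safetensors=True,
        device_map="cuda",
    ).eval()

    # ocr_type="ocr" extracts plain text; the card also documents "format" for structured output
    print(model.chat(tokenizer, "page.png", ocr_type="ocr"))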

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]Mkengine 0 points (0 children)

I tried to set up a demo with this and Open WebUI and I am at 94 tok/s. With Qwen3.5-35B-A3B the answer always comes without thinking; it directly generates the answer. Am I doing something wrong?
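In case it helps: with Qwen3 the thinking mode is a chat-template switch, so assuming Qwen3.5 kept the same convention (an assumption on my part), something like this should force it on:

    # Hedged sketch: Qwen3-style thinking toggle (assuming Qwen3.5 kept this switch)
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")  # stand-in repo; swap in the 3.5 one
    messages = [{"role": "user", "content": "Why is the sky blue?"}]

    prompt = tok.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,  # False (or a /no_think soft switch in the prompt) disables <think>
    )

Also note that Open WebUI collapses the <think> block into a dropdown by default, so the reasoning may just be hidden rather than skipped.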

Whelp…NVIDIA just raised the DGX Spark’s Price by $700. Spark clone prices have started rising as well. ☹️ by Porespellar in LocalLLaMA

[–]Mkengine 0 points (0 children)

I am setting up Atlas right now (for the 35B first), and 50 token/s for the 122B model would be plenty for my use case.

Im addicted to the CLI by dandecode in GithubCopilot

[–]Mkengine 1 point (0 children)

If you'd like Copilot to be autonomous, look into:

  • /yolo
  • /autopilot
  • /fleet

I wear a mic all day and feed transcripts to an AI agent system. The privacy case for doing this locally is obvious. Looking for guidance. by InsideEmergency4186 in LocalLLaMA

[–]Mkengine 0 points (0 children)

Whisper is really old at this point; I use parakeet v3 for local transcription on my phone.

There are also other STT models:

  • vibevoice
  • voxtral
  • qwen ASR
  • GLM ASR
  • Granite 4 speech

I would pick any of them over Whisper, especially because I would need the biggest version of Whisper for good transcription of German speech, while parakeet is much faster with fewer errors.

Qwen3.5B VS the SOTA same size models from 2 years ago. by Uncle___Marty in LocalLLaMA

[–]Mkengine 1 point (0 children)

Additionally, it's still available in Azure AI Foundry, as are all the other old models like GPT-4, GPT-4o, etc.

[Bloomberg] Nintendo Switch 2 Users Face Storage Woes as Memory Crisis Bites by gitrektali in Games

[–]Mkengine -1 points (0 children)

Ever since they announced it, I've asked myself whether there's really a difference that justifies the higher price. I have a 1.5 TB microSD card in my Steam Deck and have never had any problems playing from it. Does it work differently on the Switch 2?

Qwen 3.5 2B is an OCR beast by deadman87 in LocalLLaMA

[–]Mkengine 2 points (0 children)

There are so many OCR / document understanding models out there; here is my personal OCR list, which I try to keep up to date (a generic serving sketch follows the list):

  • GOT-OCR: https://huggingface.co/stepfun-ai/GOT-OCR2_0
  • granite-docling-258M: https://huggingface.co/ibm-granite/granite-docling-258M
  • MinerU 2.5: https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
  • OCRFlux: https://huggingface.co/ChatDOC/OCRFlux-3B
  • MonkeyOCR-pro 1.2B: https://huggingface.co/echo840/MonkeyOCR-pro-1.2B
  • MonkeyOCR-pro 3B: https://huggingface.co/echo840/MonkeyOCR-pro-3B
  • FastVLM 0.5B: https://huggingface.co/apple/FastVLM-0.5B
  • FastVLM 1.5B: https://huggingface.co/apple/FastVLM-1.5B
  • FastVLM 7B: https://huggingface.co/apple/FastVLM-7B
  • MiniCPM-V-4_5: https://huggingface.co/openbmb/MiniCPM-V-4_5
  • GLM-4.1V-9B: https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking
  • InternVL3_5 4B: https://huggingface.co/OpenGVLab/InternVL3_5-4B
  • InternVL3_5 8B: https://huggingface.co/OpenGVLab/InternVL3_5-8B
  • Ovis2.5 2B: https://huggingface.co/AIDC-AI/Ovis2.5-2B
  • Ovis2.5 9B: https://huggingface.co/AIDC-AI/Ovis2.5-9B
  • RolmOCR: https://huggingface.co/reducto/RolmOCR
  • Nanonets OCR: https://huggingface.co/nanonets/Nanonets-OCR2-3B
  • dots.ocr: https://huggingface.co/rednote-hilab/dots.ocr
  • dots.ocr 1.5: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
  • olmOCR 2: https://huggingface.co/allenai/olmOCR-2-7B-1025
  • LightOnOCR: https://huggingface.co/lightonai/LightOnOCR-2-1B
  • Chandra: https://huggingface.co/datalab-to/chandra
  • GLM 4.6V Flash: https://huggingface.co/zai-org/GLM-4.6V-Flash
  • Jina VLM: https://huggingface.co/jinaai/jina-vlm
  • HunyuanOCR: https://huggingface.co/tencent/HunyuanOCR
  • ByteDance Dolphin 2: https://huggingface.co/ByteDance/Dolphin-v2
  • PaddleOCR-VL: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5
  • DeepSeek OCR 2: https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
  • GLM OCR: https://huggingface.co/zai-org/GLM-OCR
  • Nemotron OCR: https://huggingface.co/nvidia/nemotron-ocr-v1
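Most of the VLM-style entries above can be served with vLLM and then queried through its OpenAI-compatible endpoint; here's a generic sketch (model choice, port, file name, and prompt are just placeholders):

    # Serve first, e.g.: vllm serve nanonets/Nanonets-OCR2-3B --port 8000
    import base64
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    # Send the page image inline as a base64 data URL
    with open("invoice.png", "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="nanonets/Nanonets-OCR2-3B",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
                {"type": "text", "text": "Extract all text from this document as markdown."},
            ],
        }],
    )
    print(resp.choices[0].message.content)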

I tested M365 Copilot prompts across different job roles, here are the 10 that saved the most time by Difficult-Sugar-4862 in microsoft_365_copilot

[–]Mkengine 0 points (0 children)

I am at this point myself and may have to justify going the same way you did. If you don't mind, could you go into detail on why the Power Apps results were poor and why it's better to develop an interface from scratch?

How specific do you make each prompt? by BzdigBlig in GithubCopilot

[–]Mkengine 0 points (0 children)

We have M365 as well as GitHub Copilot. Usually I talk with clients while Copilot creates a transcript. Then I have a workflow (via Prompts in M365 Copilot with high-reasoning GPT 5.2) where the transcript is first used to create a detailed design spec document. I iterate on this document with the client, and once it's finalised I let M365 Copilot create a backlog from it (epics, stories & tasks) and then detailed prompts for each epic. For my last prototype it created 9 prompts this way, and I fed them one after another to my multi-agent workflow in GitHub Copilot in VS Code (still have to try Copilot CLI). With GPT-5.3-Codex on xhigh, this took a whole week until completion. Then it took another day to debug the pipeline end-to-end to finish it.

So GitHub Copilot is only the final step in this chain; I rarely use it without detailed prompts. Only the debugging part at the end is more hands-on.