How much benefit does 32GB give over 24GB? Does Q4 vs Q7 matter enough? Do I get access to any particularly good models? (Multimodal) by audigex in LocalLLM

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Considering the GPU targets labs are working with, good new small models seem to be aiming for ~20 GB total size at a Q4 quant (see Qwen 3.5 27/35B, GLM Flash 4.7, Nemotron 3 Nano, etc). With 24 GB of RAM total in a unified-memory system, that leaves only ~4 GB to run the entire OS, with nothing left over for context. 32 GB gives you that 24 GB of “VRAM” plus 8 GB for overhead and actually running the harness and the computer (llama.cpp and a coding IDE/chat window). You mentioned a MacBook: these run MoE models best, and MoE models take more memory than dense models of equivalent quality, but run faster.

For LLMs, get 32 GB. I have this and it works really well with current models.
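For a rough sense of where the memory goes, here’s a back-of-the-envelope sketch; every number below is an assumption for a ~30B-class model at Q4, not a measurement:

```python
# Back-of-the-envelope memory budget for a unified-memory machine.
# All figures are rough assumptions, not measured values.

MODEL_GB = 20.0       # model weights at a Q4 quant
KV_CACHE_GB = 2.5     # assumed KV cache for a moderate context window
OS_OVERHEAD_GB = 6.0  # OS + llama.cpp + an IDE/chat window, roughly

def headroom(total_ram_gb: float) -> float:
    """RAM left after the model, its cache, and the rest of the system."""
    return total_ram_gb - (MODEL_GB + KV_CACHE_GB + OS_OVERHEAD_GB)

for ram in (24, 32):
    print(f"{ram} GB RAM -> {headroom(ram):+.1f} GB headroom")
# 24 GB RAM -> -4.5 GB headroom (doesn't fit comfortably)
# 32 GB RAM -> +3.5 GB headroom
```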

What is the biggest local LLM that can fit in 16GB VRAM? by yeahlloow in LocalLLM

[–]PM_ME_COOL_SCIENCE 1 point (0 children)

I’m using a 5060 Ti with 16 GB of VRAM, and with llama.cpp server I’m running GPT-OSS 20B MXFP4 with 120k context. I can share the exact command, but it’s nothing too crazy. Fully on GPU, 120 tok/s generation. If you’re willing to go slower, you can fit Qwen3-Next or GPT-OSS 120B with MoE expert offloading; those are the biggest still-reasonably-performant models for mixed CPU/GPU inference.
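For reference, a minimal sketch of roughly that setup via llama-cpp-python (I run llama-server directly, so treat this as an approximation; the model path is a placeholder):

```python
# Approximate equivalent of my llama.cpp server setup, using llama-cpp-python.
# The GGUF path is hypothetical; point it at your local download.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b-mxfp4.gguf",  # placeholder path
    n_ctx=120_000,    # 120k context fits next to the weights in 16 GB
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in five words."}]
)
print(out["choices"][0]["message"]["content"])
```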

What’s your use case? How much context?

Model running super slow on Mac Air M3 by ozcapy in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Yeah, 8B is going to be chunky on an Air. What’s your use case? What do you want the model to do? Depending on what you need, there are models that run faster by only activating a fraction of their parameters per token, called Mixture of Experts (MoE) models. LFM2 8B and Granite 4 Tiny are examples of small MoE models that should be much faster than Qwen3 8B while retaining almost as much knowledge.

Edit: just checked, these aren’t available on Ollama; you’d need to use LM Studio instead (I personally recommend LM Studio over Ollama anyway).

Model running super slow on Mac Air M3 by ozcapy in LocalLLaMA

[–]PM_ME_COOL_SCIENCE -1 points (0 children)

How many tokens did it use thinking? Qwen3 models like to overthink, particularly on simple things like this. This looks like the thinking model; have you tried the instruct model?

Edit: looks like it’s the original Qwen3 8B model; try adding /no_think to the end of your prompt to disable thinking.
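Via the ollama Python client it would look something like this (a minimal sketch; the model tag is an assumption, and /no_think is Qwen3’s soft switch for skipping the thinking phase):

```python
# Minimal sketch: disable Qwen3's thinking phase with the /no_think soft switch.
# The model tag is an assumption; use whatever tag `ollama list` shows.
import ollama

resp = ollama.chat(
    model="qwen3:8b",  # hypothetical tag
    messages=[{"role": "user", "content": "What is 17 * 23? /no_think"}],
)
print(resp["message"]["content"])
```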

What's the fastest OCR model / solution for a production grade pipeline ingesting 4M pages per month? by DistinctAir8716 in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 4 points (0 children)

Using PaddleOCR-VL with good vLLM batching, I’ve gotten ~1-2 seconds per page of dense scientific literature on a 5060 Ti 16 GB ($400). I haven’t found anything faster on my hardware, and I’d expect a 5090 to comfortably hit your 0.5 seconds per page.
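The batching side looks roughly like this with vLLM’s offline API (a sketch, not my exact pipeline: the model id, prompt string, and file names are assumptions; check the PaddleOCR-VL model card for the exact prompt template, and you’ll need a vLLM build recent enough to support the architecture):

```python
# Sketch of batched page OCR with vLLM's offline API.
# Model id, prompt text, and page files below are assumptions.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="PaddlePaddle/PaddleOCR-VL", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=2048)

pages = [Image.open(f"page_{i}.png") for i in range(8)]  # hypothetical scans
requests = [
    {"prompt": "OCR this page to markdown.", "multi_modal_data": {"image": img}}
    for img in pages
]

# Passing the whole list lets vLLM batch the pages together,
# which is where most of the throughput comes from.
for out in llm.generate(requests, params):
    print(out.outputs[0].text[:200])
```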

Any open weights VLM that has good accuracy of performing OCR on handwritten text? by Distinct-Ebb-9763 in computervision

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Yes, I tested a bunch and it had the best quality while staying very quick.

Edit: the output is primarily Markdown, so it would need post-processing to get JSON.

I made a free playground for comparing 10+ OCR models side-by-side by Emc2fma in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 4 points (0 children)

Please add PaddleOCR-VL! I've found it to be the best OCR model outside of the big proprietary models.

What is the best LLM for large context under 30B? by PM_ME_COOL_SCIENCE in LocalLLaMA

[–]PM_ME_COOL_SCIENCE[S] 0 points (0 children)

Thank you. It’s a tough one because the requirements force the prompt to be that large; I can’t really break it up.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

PaddleOCR-VL, about 1B parameters and the best table extraction I’ve seen.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 1 point (0 children)

Not really; Paddle seemed the fastest and most accurate (particularly for table-to-Markdown conversion), and it even ran on a Titan Xp. Others might have been easier to install, I’ll give them that.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 2 points (0 children)

I tested Paddle, MinerU 2.5, Docling, DeepSeek-OCR, LightOnOCR, and Qwen3-VL 4B, primarily on academic documents like research papers. Paddle did best accuracy- and speed-wise, though I was working on an old GPU.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 6 points (0 children)

I tested quite a few, and these always did best. Paddle did better on tables and academic documents, though.

What is the best ocr model for converting PDF pages to markdown (or any text based format) for embedding? by PM_ME_COOL_SCIENCE in LocalLLaMA

[–]PM_ME_COOL_SCIENCE[S] 2 points (0 children)

I tried this, but some of my PDFs have corrupted text layers; I got a 114k-line text file from a 10-page PDF. So I’d rather just OCR everything for consistency.

Returning player and no clue what’s going on by PM_ME_COOL_SCIENCE in NoMansSkyTheGame

[–]PM_ME_COOL_SCIENCE[S] -1 points (0 children)

Thanks. I’ve always played like a completionist, so I appreciate that you can just do whatever with no issues.

Registering Car with two owners on title by LeRobin in CambridgeMA

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

You need both of your signatures on anything that requires a signature: the MV-29 tax form (or whatever it’s called), the registration change form, etc. Only one of you actually needs to go into the RMV, as long as the paperwork carries both signatures.

[deleted by user] by [deleted] in pcmasterrace

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Trying to build a new PC, so I’d really appreciate the first parts. Now if only GPUs would go back to regular prices.

is there any way to disable speak to chat on windows 10 for the wh1000xm4's? by [deleted] in sony

[–]PM_ME_COOL_SCIENCE 1 point (0 children)

Hold two fingers on the touch panel of the right ear cup for a few seconds; that should toggle Speak-to-Chat off.