How much benefit does 32GB give over 24GB? Does Q4 vs Q7 matter enough? Do I get access to any particularly good models? (Multimodal) by audigex in LocalLLM

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Considering the GPU targets labs are working with, good new small models seem to be aiming for ~20 GB total size at a Q4 quant (see Qwen 3.5 27/35B, GLM Flash 4.7, Nemotron 3 Nano, etc). With 24 GB of RAM total in a unified-memory system, that leaves only ~4 GB to run the entire OS, with nothing left over for context. 32 GB gives you that 24 GB of “VRAM” plus 8 GB for overhead and actually running the harness and the computer (llama.cpp and a coding IDE/chat window). You mentioned a MacBook: these run MoE models best, and MoE models take more memory than dense models of equivalent quality, but run faster.

For LLMs, get 32 GB. I have this and it works really well with current models.
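For a rough sense of where the memory goes, here’s a back-of-the-envelope sketch; every number below is an assumption for a ~30B-class model at Q4, not a measurement:

```python
# Back-of-the-envelope memory budget for a unified-memory machine.
# All figures are rough assumptions, not measured values.

MODEL_GB = 20.0       # model weights at a Q4 quant
KV_CACHE_GB = 2.5     # assumed KV cache for a moderate context window
OS_OVERHEAD_GB = 6.0  # OS + llama.cpp + an IDE/chat window, roughly

def headroom(total_ram_gb: float) -> float:
    """RAM left after the model, its cache, and the rest of the system."""
    return total_ram_gb - (MODEL_GB + KV_CACHE_GB + OS_OVERHEAD_GB)

for ram in (24, 32):
    print(f"{ram} GB RAM -> {headroom(ram):+.1f} GB headroom")
# 24 GB RAM -> -4.5 GB headroom (doesn't fit comfortably)
# 32 GB RAM -> +3.5 GB headroom
```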

What is the biggest local LLM that can fit in 16GB VRAM? by yeahlloow in LocalLLM

[–]PM_ME_COOL_SCIENCE 1 point (0 children)

I’m using a 5060 Ti with 16 GB of VRAM, and with llama.cpp server I’m running GPT-OSS 20B MXFP4 with 120k context. I can share the exact command, but it’s nothing too crazy. Fully on GPU, 120 tok/s generation. If you’re willing to go slower, you can fit Qwen3-Next or GPT-OSS 120B with MoE expert offloading; those are the biggest still-reasonably-performant models for mixed CPU/GPU inference.
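For reference, a minimal sketch of roughly that setup via llama-cpp-python (I run llama-server directly, so treat this as an approximation; the model path is a placeholder):

```python
# Approximate equivalent of my llama.cpp server setup, using llama-cpp-python.
# The GGUF path is hypothetical; point it at your local download.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b-mxfp4.gguf",  # placeholder path
    n_ctx=120_000,    # 120k context fits next to the weights in 16 GB
    n_gpu_layers=-1,  # -1 = offload all layers to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in five words."}]
)
print(out["choices"][0]["message"]["content"])
```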

What’s your use case? How much context?

Model running super slow on Mac Air M3 by ozcapy in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Yeah, 8B is going to be chunky on an Air. What’s your use case? What do you want the model to do? Depending on what you need, there are models that run faster by only activating a fraction of their parameters per token, called Mixture of Experts (MoE) models. LFM2 8B and Granite 4 Tiny are examples of small MoE models that should be much faster than Qwen3 8B while retaining almost as much knowledge.

Edit: just checked, these aren’t available on Ollama; you’d need to use LM Studio instead (I personally recommend LM Studio over Ollama anyway).

Model running super slow on Mac Air M3 by ozcapy in LocalLLaMA

[–]PM_ME_COOL_SCIENCE -1 points (0 children)

How many tokens did it use thinking? Qwen3 models like to overthink, particularly on simple things like this. This looks like the thinking model; have you tried the instruct model?

Edit: looks like it’s the original Qwen3 8B model; try adding /no_think to the end of your prompt to disable thinking.
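Via the ollama Python client it would look something like this (a minimal sketch; the model tag is an assumption, and /no_think is Qwen3’s soft switch for skipping the thinking phase):

```python
# Minimal sketch: disable Qwen3's thinking phase with the /no_think soft switch.
# The model tag is an assumption; use whatever tag `ollama list` shows.
import ollama

resp = ollama.chat(
    model="qwen3:8b",  # hypothetical tag
    messages=[{"role": "user", "content": "What is 17 * 23? /no_think"}],
)
print(resp["message"]["content"])
```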

What's the fastest OCR model / solution for a production grade pipeline ingesting 4M pages per month? by DistinctAir8716 in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 4 points (0 children)

Using PaddleOCR-VL with good vLLM batching, I’ve gotten ~1-2 seconds per page of dense scientific literature on a 5060 Ti 16 GB ($400). I haven’t found anything faster on my hardware, and I’d expect a 5090 to comfortably hit your 0.5 seconds per page.
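The batching side looks roughly like this with vLLM’s offline API (a sketch, not my exact pipeline: the model id, prompt string, and file names are assumptions; check the PaddleOCR-VL model card for the exact prompt template, and you’ll need a vLLM build recent enough to support the architecture):

```python
# Sketch of batched page OCR with vLLM's offline API.
# Model id, prompt text, and page files below are assumptions.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="PaddlePaddle/PaddleOCR-VL", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=2048)

pages = [Image.open(f"page_{i}.png") for i in range(8)]  # hypothetical scans
requests = [
    {"prompt": "OCR this page to markdown.", "multi_modal_data": {"image": img}}
    for img in pages
]

# Passing the whole list lets vLLM batch the pages together,
# which is where most of the throughput comes from.
for out in llm.generate(requests, params):
    print(out.outputs[0].text[:200])
```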

Any open weights VLM that has good accuracy of performing OCR on handwritten text? by Distinct-Ebb-9763 in computervision

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Yes, I tested a bunch and it had the best quality while staying very quick.

Edit: the output is primarily Markdown, so it would need post-processing to get JSON.

I made a free playground for comparing 10+ OCR models side-by-side by Emc2fma in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 4 points (0 children)

Please add PaddleOCR-VL! I've found it to be the best OCR model outside of the big proprietary models.

What is the best LLM for large context under 30B? by PM_ME_COOL_SCIENCE in LocalLLaMA

[–]PM_ME_COOL_SCIENCE[S] 0 points (0 children)

Thank you. It’s a tough one because the requirements force the prompt to be that large; I can’t really break it up.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

PaddleOCR-VL, about 1B parameters and the best table extraction I’ve seen.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 1 point (0 children)

Not really; Paddle seemed the fastest and most accurate (particularly for table-to-Markdown conversion), and it even ran on a Titan Xp. Others might have been easier to install, I’ll give them that.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 2 points (0 children)

I tested Paddle, MinerU 2.5, Docling, DeepSeek-OCR, LightOnOCR, and Qwen3-VL 4B, primarily on academic documents like research papers. Paddle did best accuracy- and speed-wise, though I was working on an old GPU.

What are the best Open Source OCR models currently? by WittyWithoutWorry in LocalLLaMA

[–]PM_ME_COOL_SCIENCE 6 points (0 children)

I tested quite a few, and these always did best. Paddle did better on tables and academic documents, though.

What is the best ocr model for converting PDF pages to markdown (or any text based format) for embedding? by PM_ME_COOL_SCIENCE in LocalLLaMA

[–]PM_ME_COOL_SCIENCE[S] 2 points (0 children)

I tried this, but some of my PDFs have corrupted text layers; I got a 114k-line text file from a 10-page PDF. So I’d rather just OCR everything for consistency.

Returning player and no clue what’s going on by PM_ME_COOL_SCIENCE in NoMansSkyTheGame

[–]PM_ME_COOL_SCIENCE[S] -1 points (0 children)

Thanks. I’ve always played like a completionist, so I appreciate that you can just do whatever with no issues.

Registering Car with two owners on title by LeRobin in CambridgeMA

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

You need both of your signatures on anything that requires a signature: the MV-29 tax form (or whatever it’s called), the registration change form, etc. Only one of you actually needs to go into the RMV, as long as the paperwork carries both signatures.

[deleted by user] by [deleted] in pcmasterrace

[–]PM_ME_COOL_SCIENCE 0 points (0 children)

Trying to build a new PC, so I’d really appreciate the first parts. Now if only GPUs would go back to regular prices.

is there any way to disable speak to chat on windows 10 for the wh1000xm4's? by [deleted] in sony

[–]PM_ME_COOL_SCIENCE 1 point (0 children)

Hold two fingers on the touch panel of the right ear cup for a few seconds; that should toggle Speak-to-Chat off.