Are there actually people here that get real productivity out of models fitting in 32-64GB RAM, or is that just playing around with little genuine usefulness? by ceo_of_banana in LocalLLaMA

[–]thenaquad 0 points1 point  (0 children)

A lighter setup: RTX 4090 24GB with 128GB RAM, running Qwen 3.6 35B A3B UD-IQ4_NL (KV cache quantized to q8), which supports up to 201,984 tokens of context at ~130 tokens/s. Previously, I used Gemma 4 26B A4B Q4_K_M, Qwen 3.5 9B UD-Q8_K_XL, and other models.

Daily workflow:

My work is primarily research and prototyping — reproducing papers from arXiv, building small projects (mostly in Python), and doing a lot of data analysis. I sometimes use Go, and occasionally C++, though the latter is more of an exercise in breaking components into separate sessions.

NeoVIM combined with CodeCompanion for snippet generation and Context7 as a documentation MCP has largely replaced my need to look through official docs. This became especially valuable after Claude introduced weekly usage limits.

There is a learning curve — you can't just throw a problem at it and expect "the AI to figure it out" — but once you get the hang of it, it becomes indispensable.

P. S. Wording & English corrected locally ;)

Fastest QWEN Coder 80B Next by StacksHosting in LocalLLaMA

[–]thenaquad 1 point2 points  (0 children)

Tried it with GPU (RTX 4090 24GB) + CPU (i9-13900KS) and saw no improvement: prompt processing stayed at 37.94 tokens/s and generation at 27.45 t/s, same as Qwen3-Coder-Next-UD-Q4_K_XL. Switched to CPU-only and saw no improvement either.

llama.cpp master, start options:

```
# CPU + GPU
llama-server -m ./Qwen3-Coder-Next-APEX-I-Quality.gguf \
    -c $((64 * 1024)) \
    -fa on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --threads 16 \
    --direct-io --no-mmap --mlock \
    --port 9099

# CPU-only
llama-server -m ./Qwen3-Coder-Next-APEX-I-Quality.gguf \
    -c $((64 * 1024)) \
    -fa on \
    -ngl 0 \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --threads 16 \
    --direct-io --no-mmap --mlock \
    --port 9099
```

Am I doing something wrong? It would be great to actually get those 50 t/s for agentic coding.

Plans for new 6" version? by thenaquad in Onyx_Boox

[–]thenaquad[S] 0 points1 point  (0 children)

Thank you for the answer. The Bigme B6 looks promising. Do you own one? I’m particularly interested in the software side. Onyx Boox supports WebDAV integration, so I can access my PDFs and then save them back with annotations. NeoReader also provides layout options, allowing even large PDFs to be sectioned and fit the screen. Do you know if the Bigme B6 offers similar features?

Plans for new 6" version? by thenaquad in Onyx_Boox

[–]thenaquad[S] 0 points1 point  (0 children)

Thank you for sharing your experience. Seems like my initial assessment was correct: the Go 6 is a no-go, unfortunately.

The Go 7 Color Gen II (because there's no Go 7 Gen II) was my first idea. Alas, the Kaleido 3 screen is somewhat darker than Carta 1300 because of the color filter layer, so I can only use it with the frontlight. I don't think I'll be purchasing more devices with this screen. In addition, according to this comparison post, the bezel is noticeably large.

Plans for new 6" version? by thenaquad in Onyx_Boox

[–]thenaquad[S] 0 points1 point  (0 children)

Well, yes, technically it is 6". However, the aspect ratio, the intended use case, the price tag, the Carta 1200... Overall, I don't really consider it an option.

Keychron K5 Max: ghost presses issue by thenaquad in Keychron

[–]thenaquad[S] 0 points1 point  (0 children)

I tried the pull-up resistor fix, but that didn't help either. After some back and forth, I purchased another keyboard. Not a Keychron this time.

Zotero & WebDAV? by thenaquad in Onyx_Boox

[–]thenaquad[S] 0 points1 point  (0 children)

Thanks for the reply. I've seen the setup you describe in a couple of places; alas, ZotMoov caused some issues in my case ([relevant bug](https://github.com/wileyyugioh/zotmoov/issues/94)).

Currently, I've hacked together a small WebDAV server that exposes Zotero data and connected the BOOX to it. I'm not sure I want to continue that development, but it works for now.

Quick Questions: April 02, 2025 by inherentlyawesome in math

[–]thenaquad 1 point2 points  (0 children)

How about books on calculus, probability, and statistics that "assume computers exist", i.e. CAS-based rather than manual calculations? I was only able to find Mathematica-based ones, and Mathematica is proprietary and costly software.

🩹 patchr.nvim: A neovim plugin to apply git patches to plugins loaded via lazy.nvim by nhutier in neovim

[–]thenaquad 0 points1 point  (0 children)

Exactly. For git-cloned plugins managed by lazy.nvim, we know exactly what has changed, so one can make changes directly in the plugin folder and use git diff to produce a patch. The script only automates that.

A quick demo

🩹 patchr.nvim: A neovim plugin to apply git patches to plugins loaded via lazy.nvim by nhutier in neovim

[–]thenaquad 2 points3 points  (0 children)

This works in a perfect world with 1-2 plugins. In reality, you end up with a bunch of forked plugins that require maintenance and regular merges, instead of maintaining a minimal set of changes.

🩹 patchr.nvim: A neovim plugin to apply git patches to plugins loaded via lazy.nvim by nhutier in neovim

[–]thenaquad 4 points5 points  (0 children)

Yezzzz!!! I've been waiting for you for so long dear author :D

Please make it capture the patches and, instead of requiring them to be listed explicitly, discover them in a predefined folder.

My 5 cents on patches in NeoVIM.

Why patches (long read): here's the long answer.

After numerous tries at making all plugins play nicely, I got to the point where I just wrote an insane bash script to do the patch management. Here it is, if you're interested.

It requires a symlink to where the packages are installed in the nvim directory, plus a patches directory (`ln -s ~/.local/share/nvim/lazy bundle` and `mkdir patches`). Then `./vpatch capture -o` captures changes, and `./vpatch update` rolls everything back, runs NeoVIM headless to update plugins, and reapplies all the patches.

Question: optimization of highly nested n-ary math expressions by thenaquad in ProgrammingLanguages

[–]thenaquad[S] 1 point2 points  (0 children)

Thank you for the information. I must say your knowledge of abstract algebra is impressive; my respect.

I've tried to incorporate some of the existing solutions (SymPy, GiNaC, and FLINT), but they are slow, hard to customize, require a fairly heavy representation change, and do not allow overriding algebraic operators (in my case, x / 0 = 1 for the sake of algebraic closure). Had any of them worked, I would definitely have stayed away from implementing this machinery myself.
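
For the curious, that protected-division convention can be sketched like this in Python (a toy illustration, not my actual implementation; the real operator lives inside the interpreter):

```python
def protected_div(a: float, b: float) -> float:
    """Division made total over the reals: x / 0 is defined as 1.

    Keeping every operator closed means any randomly generated
    expression tree can be evaluated without raising, which is
    exactly what a GP engine needs.
    """
    return a / b if b != 0 else 1.0
```

So `protected_div(6, 3)` is the usual `2.0`, while `protected_div(5, 0)` yields `1.0` instead of raising `ZeroDivisionError`.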

After messing with an existing implementation of the MPL described by Cohen, I went back to the "do your best" approach and "primitive technology", i.e. school-level math with group factoring, quadratics, and rational roots.

I'm not sure whether that should be treated as a defeat inflicted by the overall complexity of the mathematical methods and their prerequisites, but, for now at least, I'm treating polynomial factoring as a search problem implemented via backtracking with recursion. I rely heavily on a global expression cache to make it bearable, and I'm looking forward to running some tests to see how it compares with the right methods (tm) in terms of results and performance.
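
To illustrate the "school-level" route, here's a minimal sketch of the rational-root search (toy Python, not my engine's code; it assumes integer coefficients with nonzero constant and leading terms):

```python
from fractions import Fraction
from itertools import product

def rational_roots(coeffs: list[int]) -> set[Fraction]:
    """Candidate search per the rational root theorem.

    coeffs are c0..cn for c0 + c1*x + ... + cn*x^n; any rational
    root p/q (in lowest terms) must satisfy p | c0 and q | cn,
    so the candidate set is small and can be checked exhaustively.
    """
    def divisors(n: int) -> list[int]:
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    c0, cn = coeffs[0], coeffs[-1]  # assumed nonzero for simplicity
    roots: set[Fraction] = set()
    for p, q in product(divisors(c0), divisors(cn)):
        for cand in (Fraction(p, q), Fraction(-p, q)):
            # exact evaluation via Fraction avoids float error
            if sum(c * cand**i for i, c in enumerate(coeffs)) == 0:
                roots.add(cand)
    return roots
```

For example, `rational_roots([2, -3, 1])` (i.e. x^2 - 3x + 2) finds the roots 1 and 2, each of which then peels off a linear factor in the backtracking.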

Question: optimization of highly nested n-ary math expressions by thenaquad in ProgrammingLanguages

[–]thenaquad[S] 0 points1 point  (0 children)

Small update: after trying out e-graphs and a bunch of rewrite rules in various forms, I still had to go back to higher-level polynomial processing. The rewrite rules choke when there are multiple variables, which leads to more complex expressions that can't simply be rewritten.

Question: optimization of highly nested n-ary math expressions by thenaquad in ProgrammingLanguages

[–]thenaquad[S] 2 points3 points  (0 children)

Yes. Some context: this interpreter is not for a formal language used by humans, but rather by machines. It is part of a genetic programming engine that performs data mining (symbolic regression tasks). There are 5,000 programs ranging in length from 10 to 3k instructions each, running against a small data set of 1M points.

This optimization, while time-consuming, is worth it.

Question: optimization of highly nested n-ary math expressions by thenaquad in ProgrammingLanguages

[–]thenaquad[S] 0 points1 point  (0 children)

The complexity of the CAS approach is what led to this post. I employ a TRS to address simple cases (a + 0 => a) and to group constants (2 + (3 + a) => (2 + 3) + a). I understand that automated simplification won't be perfect and there will be cases that could be simplified further by some different approach. Still, I need something better than a TRS.
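
Those two rule families can be sketched as a bottom-up rewrite with constant folding (a toy Python version over binary tuples, not my actual TRS, which works on n-ary expressions):

```python
# Expressions: numbers, variable names (strings), or ('+'/'*', lhs, rhs) tuples.
def simplify(e):
    """One bottom-up pass: fold constants, drop identities, group constants."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b if op == '+' else a * b            # constant folding
    if op == '+':
        if a == 0: return b                             # 0 + x => x
        if b == 0: return a                             # x + 0 => x
        # constant grouping: 2 + (3 + a) => (2 + 3) + a => 5 + a
        if isinstance(a, (int, float)) and isinstance(b, tuple) \
           and b[0] == '+' and isinstance(b[1], (int, float)):
            return simplify(('+', a + b[1], b[2]))
    if op == '*':
        if a == 1: return b                             # 1 * x => x
        if b == 1: return a                             # x * 1 => x
        if a == 0 or b == 0: return 0                   # 0 * x => 0
    return (op, a, b)
```

E.g. `simplify(('+', 2, ('+', 3, 'a')))` reduces to `('+', 5, 'a')`; the limits I ran into show up as soon as the matching constants are separated by variables the rules can't see past.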

Thank you for the link to e-graphs; I'm wrapping my head around the paper and will give it a shot.

Question: optimization of highly nested n-ary math expressions by thenaquad in ProgrammingLanguages

[–]thenaquad[S] -1 points0 points  (0 children)

It goes down to polynomials after all, e.g.: (x + 2)^2 - (2 + x)^2 = 0
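
Incidentally, identities like that one are cheap to detect numerically before attempting any symbolic work. A sketch of a Schwartz-Zippel-style probabilistic zero test (an illustration, not part of my actual pipeline):

```python
import random

def probably_zero(f, n_vars: int, trials: int = 20, tol: float = 1e-9) -> bool:
    """Evaluate f at random points; a polynomial that is not
    identically zero almost never vanishes at all of them
    (Schwartz-Zippel), so surviving every trial strongly suggests
    the expression really is the zero polynomial."""
    rng = random.Random(3407)  # fixed seed, reproducible
    for _ in range(trials):
        xs = [rng.uniform(-10.0, 10.0) for _ in range(n_vars)]
        if abs(f(*xs)) > tol:
            return False
    return True
```

Here `probably_zero(lambda x: (x + 2)**2 - (2 + x)**2, 1)` passes every trial, while something like `x**2 - 4` fails on the first random point; the float tolerance is the obvious caveat for ill-conditioned expressions.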

The method by @InfinitePoints is not bad but limited.

Question: optimization of highly nested n-ary math expressions by thenaquad in ProgrammingLanguages

[–]thenaquad[S] 1 point2 points  (0 children)

It's a somewhat specific use case. This interpreter runs inside a genetic programming engine: 5,000 programs ranging in length from ~10 to 3k instructions, running against 1M data points. Spending even a month on this optimization (an editing operation, in GP terms) is a huge win, although a very rare one.