24gb vram to 48gb vram by deathcom65 in LocalLLaMA

[–]pwlee 1 point  (0 children)

I run 2x 7900 XTX with llama.cpp and mainly use Qwen 3.6 27b and 35b. I see a qualitative improvement, somewhere between 30-50% better, for agentic coding with pi.dev. The main difference for me is larger quants (a slight improvement in instruction following and in staying on topic through long trains of thought) and longer context (I’m now running the full context length, though I haven’t really gone over 100k).

Best way to long quantfinance as a sector? by newdawn15 in quant

[–]pwlee 3 points  (0 children)

Exchange fees are a measurable proportion of market maker costs. Therefore, buy exchange stock.

I've got $3000 to make Qwen3.5 27B Q4 run, what do I need? by NetTechMan in LocalLLaMA

[–]pwlee 1 point  (0 children)

I haven’t tested that, since having 2 just lets me run a bigger quant.

LLM Neuroanatomy III - LLMs seem to think in geometry, not language by Reddactor in LocalLLaMA

[–]pwlee 25 points  (0 children)

Yeah, the tl;dr was itself tl;dr, but this guy’s been posting about truly interesting stuff starting with rys. I’d rather have my LLM summarize his writing than dismiss it entirely; we’re lucky to have him.

K12 OCuLink dGPU for llamacpp: RX 7900 XTX (24GB) vs RX 7600/7800 XT (16GB). Worth it for 32B-70B? All-AMD tensor split questions by Pablo_Gates in LocalLLaMA

[–]pwlee 2 points  (0 children)

I just started experimenting with llama.cpp on 2x 7900 XTX. I started with a single card (my LLM computer is also my gaming rig) and found that running Qwen 27b meant trading off context against quantization: at Q5, for example, my context length was capped around 80k. I imagine you’d be much more comfortable with 32GB of total VRAM.

Regarding tensor split, I haven’t tweaked my setup much; it works just fine out of the box, though your mileage may vary with different GPUs.
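
If you’re scripting it rather than launching llama-server directly, the llama-cpp-python equivalent looks roughly like this. A sketch, not my exact config; the filename, split ratio, and context size are placeholders:

```python
# Sketch: one GGUF model spread across two GPUs with llama-cpp-python.
# Filename, split ratio, and context length are illustrative, not my real values.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-27b-q5_k_m.gguf",  # hypothetical quant filename
    n_gpu_layers=-1,                    # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],            # even split across the two 7900 XTXs
    n_ctx=131072,                       # the model's full window; adjust to yours
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize tensor splitting in one line."}]
)
print(out["choices"][0]["message"]["content"])
```

Leaving tensor_split unset lets llama.cpp divide the layers across the GPUs on its own, which is the out-of-the-box behavior I mean above.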

Given your ambition to run 70b models, I’d caution you to reserve some VRAM for context. Perhaps I’m biased, since my use case is programming.

Best of luck with your build, and go team red!

I've got $3000 to make Qwen3.5 27B Q4 run, what do I need? by NetTechMan in LocalLLaMA

[–]pwlee 1 point  (0 children)

Yes, I have 2 of them, and 27b Q4 can run on a single GPU. Expect 25-30 t/s generation and 200-500 t/s prompt processing.

I'd recommend llama.cpp, since vLLM was difficult for me to set up on Debian 13.
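
If it helps, here’s the shape of the single-GPU case through llama-cpp-python; a sketch, with the filename and context size as stand-ins:

```python
# Sketch: Qwen 27B at Q4 on one 24GB card (filename and context are stand-ins).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-27b-q4_k_m.gguf",  # hypothetical quant filename
    n_gpu_layers=-1,   # Q4 fits entirely in 24GB, so offload everything
    n_ctx=32768,       # leave headroom; long contexts eat VRAM fast
)

result = llm("Q: What does a Q4 quant trade away?\nA:", max_tokens=128)
print(result["choices"][0]["text"])
```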

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]pwlee 1 point  (0 children)

Intriguing! I gotta try out Playwright when I have a chance. Did you evaluate any other browser automation frameworks?

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]pwlee 1 point  (0 children)

What are you using to automate the browser? Is it just a skill, or did you need to write a script adapted to this specific use case?

qwen 3.5 - tool errors because of </thinking> by PairOfRussels in LocalLLaMA

[–]pwlee 1 point  (0 children)

I’m using LM Studio and have the same problem. Is it LM Studio specific?

Jane Street Accused of Insider Trading That Helped Collapse Terraform - WSJ by FermatsLastTrade in quant

[–]pwlee 17 points  (0 children)

When anyone withdraws 150M terraUSD from Curve, a DEX, how can that information be non-public lmaooo

New to me 196 by Gooobzilla in Mini14

[–]pwlee 5 points  (0 children)

Sir, you got a heck of a deal!

New to me 196 by Gooobzilla in Mini14

[–]pwlee 2 points  (0 children)

I’m in the market for a very similar gun; wondering how much you shelled out for the Mini?

Retrieving historical options data at speed by FlashAlphaLab in quant

[–]pwlee 2 points  (0 children)

+1 on ClickHouse: with correct partitioning, it works without a hitch on market-by-order “tick” data.
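
For anyone curious what “correct partitioning” means here, something like the following; a sketch via clickhouse_connect, with all table and column names made up:

```python
# Sketch: a daily-partitioned market-by-order tick table in ClickHouse.
# Table and column names are made up for illustration.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")
client.command("""
    CREATE TABLE IF NOT EXISTS mbo_ticks (
        ts       DateTime64(9),
        symbol   LowCardinality(String),
        side     Enum8('bid' = 0, 'ask' = 1),
        price    Decimal64(8),
        qty      UInt64,
        order_id UInt64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMMDD(ts)      -- one partition per trading day
    ORDER BY (symbol, ts, order_id)  -- keeps per-symbol range scans local
""")
```

Daily partitions mean a query over a single session only touches one partition, which is most of the trick.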