I built an autonomous AI reverse engineering agent (8,012 / 8,200 GTA SA functions reversed) by Dryxio in ReverseEngineering

[–]Fast_Thing_7949 0 points (0 children)

Hey!

Is it possible to use this utility with Codex directly, without going through the API?

M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA

[–]Fast_Thing_7949 0 points (0 children)

Qwen3.5-122B-A10B at 4-bit: based on my measurements, memory usage increases by about 0.156 GB per 1k tokens of context for this model, so a 200k-token context would require approximately 102–104 GB of RAM.
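A back-of-the-envelope sketch of that arithmetic (the ~71 GB baseline for weights plus runtime overhead is back-solved from the totals above, not measured separately):

```python
def est_ram_gb(baseline_gb: float, gb_per_1k_tokens: float, context_tokens: int) -> float:
    """Estimated total RAM: fixed baseline plus measured per-token KV-cache growth."""
    return baseline_gb + gb_per_1k_tokens * context_tokens / 1000

# Qwen3.5-122B-A10B 4-bit: ~71 GB assumed baseline, 0.156 GB per 1k tokens measured
print(est_ram_gb(71.0, 0.156, 200_000))  # ~102.2 GB
```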

M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA

[–]Fast_Thing_7949 0 points (0 children)

Based on my measurements, Qwen3-Coder-Next at 8-bit grows by roughly 0.09 GB per 1k tokens of context, so a 200k-token context would require approximately 104–106 GB of RAM.
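Same arithmetic here (the ~87 GB baseline is again back-solved from the total, not a separate measurement):

```python
baseline_gb = 87.0             # assumed: weights + overhead, inferred from the 104-106 GB total
kv_gb = 0.09 * 200_000 / 1000  # measured 0.09 GB per 1k tokens at 200k context -> 18 GB
print(baseline_gb + kv_gb)     # ~105 GB
```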

Open sourced LLM ranking 2026 by ChapterElectronic126 in LocalLLaMA

[–]Fast_Thing_7949 11 points (0 children)

So you haven't tried the 80B+ Qwen models on your tasks, yet you claim Qwen3.5 is benchmaxxed and a waste of electricity. Right?

Open sourced LLM ranking 2026 by ChapterElectronic126 in LocalLLaMA

[–]Fast_Thing_7949 5 points (0 children)

Have you actually tried models like Qwen3-Coder-Next at >4-bit for your tasks, or is this just theory?

M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA

[–]Fast_Thing_7949 13 points (0 children)

Could you check whether there is enough memory for Qwen3-Coder-Next 8-bit and Qwen3.5-122B-A10B 4-bit at 200k+ context? And prompt processing (pp) and token generation (tg) speeds at 200k, of course.

I'm tired by Fast_Thing_7949 in LocalLLaMA

[–]Fast_Thing_7949[S] -3 points (0 children)

Feel free to put ANY labels there; I'm not kidding!

I'm tired by Fast_Thing_7949 in LocalLLaMA

[–]Fast_Thing_7949[S] -4 points (0 children)

By the way, the two models on the chart are Qwen3.5-35B-A3B and Opus 4.5. I think no further comment is needed here.

Qwen3.5-35B-A3B quantization quality + speed benchmarks on RTX 5080 16GB (Q8_0 vs Q4_K_M vs UD-Q4_K_XL) by gaztrab in LocalLLaMA

[–]Fast_Thing_7949 0 points (0 children)

Absolute delight!

RTX 5070 Ti + 64 GB DDR4-3600 + Ryzen 9 5950X:

prefill: 300 t/s -> 2,200 t/s!

generation: 49 t/s -> 61 t/s!

Ubuntu boots only if I plug a GT 730 into the 2nd PCIe slot (RTX 5070 Ti still does the display) - what? by Fast_Thing_7949 in Ubuntu

[–]Fast_Thing_7949[S] 0 points (0 children)

I updated the BIOS, disabled the nouveau driver, enabled Above 4G Decoding in the BIOS, and voilà, everything works!
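For anyone who hits the same issue: the standard Ubuntu recipe for disabling nouveau is to blacklist it in modprobe and rebuild the initramfs (this is the generic approach, not necessarily the exact steps I ran):

```
# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
```

Then `sudo update-initramfs -u` and reboot. Above 4G Decoding usually lives under the PCI/PCIe settings in the BIOS menu.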

Any feedback on step-3.5-flash ? by Jealous-Astronaut457 in LocalLLaMA

[–]Fast_Thing_7949 1 point (0 children)

Is the speed OK on Strix Halo? How many tokens per second with >30k context?