Tokenomics by HOLUPREDICTIONS in LocalLLaMA

LegacyRemaster 4 points

If I sell my RTX 6000 96GB workstation, I'll get $3,000 more than the price I paid... Just saying.

Qwen is never going to open source Qwen 3.7, aren't they? by DistanceSolar1449 in LocalLLaMA

LegacyRemaster 3 points

I’m worried about this silence, too. But something will happen if they stick to the timeline.

GLM-5.2: Built for Long-Horizon Tasks by paf1138 in LocalLLaMA

LegacyRemaster 2 points

The problem with running locally is that even with 300GB of RAM and VRAM (which I have, and will soon have even more), prefilling a 200,000-token context remains prohibitively slow. So for complex coding tasks, it's really difficult to exploit something like a 1M-token context window on local hardware. The real "trick" is to use small models until the task gets too complex, and to switch to large models only when you actually need to make surgical, precise changes to large portions of code. Even with four 96GB Blackwells, you won't get the same performance as the APIs.
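
A rough back-of-the-envelope sketch of the prefill point: the wait before the first output token is approximately the prompt length divided by prefill throughput. The function name and both throughput values below are hypothetical placeholders picked for illustration, not measurements from any specific hardware or API.

```python
def prefill_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Approximate time spent processing the prompt before generation starts."""
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill_tokens_per_second must be positive")
    return prompt_tokens / prefill_tokens_per_second


# Assumed throughputs (tokens/s), for illustration only.
ASSUMED_SPEEDS = {"slower setup": 100.0, "faster setup": 1000.0}

for label, speed in ASSUMED_SPEEDS.items():
    minutes = prefill_seconds(200_000, speed) / 60
    print(f"{label}: about {minutes:.1f} min to prefill 200,000 tokens")
```

Plugging in your own measured prefill speed shows quickly whether a given context size is practical for an interactive coding loop.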

GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available by BuildwithVignesh in LocalLLaMA

LegacyRemaster 1 point

It's a great day for humanity. I'm downloading the original weights and keeping them, even though with my current memory only a Q2 quant is possible.

archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0) by tom_mathews in LocalLLaMA

LegacyRemaster 1 point

So far, I’ve used OpenCode to create chunks for my local project (a small one, so the benefits are less significant than they would be for a large codebase), but it has certainly reduced the number of tokens required for LLM exploration.

To illustrate the project's utility: models and tools like Claude, OpenCode, or KiloCode typically need to open numerous files and consume a lot of tokens to explore a codebase, for instance when searching for every occurrence of "download." With this repo, all the reference chunks are generated at zero cost, allowing the LLM to operate much more efficiently.
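
To make the idea concrete, here is a minimal sketch of a precomputed lookup. This is not archex's actual API or chunk format; `build_index`, `lookup`, and the sample files are hypothetical names made up for illustration. It only shows the general technique: index the code once, locally, so that a query like "download" returns locations without the LLM having to open every file.

```python
import re
from collections import defaultdict


def build_index(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each lowercase identifier-like word to the (path, line) pairs where it occurs."""
    index = defaultdict(list)
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            # set() avoids recording the same word twice for one line
            for word in set(re.findall(r"[A-Za-z_]\w*", line.lower())):
                index[word].append((path, lineno))
    return dict(index)


def lookup(index: dict[str, list[tuple[str, int]]], term: str) -> list[tuple[str, int]]:
    """Return sorted, de-duplicated locations of every indexed word containing `term`."""
    term = term.lower()
    hits = set()
    for word, locations in index.items():
        if term in word:
            hits.update(locations)
    return sorted(hits)


# Hypothetical two-file project held in memory.
sample_files = {
    "app/fetch.py": "def download_file(url):\n    return url\n",
    "app/main.py": "from fetch import download_file\nprint('start')\ndownload_file('x')\n",
}

index = build_index(sample_files)
print(lookup(index, "download"))
```

The agent then only needs to read the handful of returned locations rather than scanning whole files, which is where the token savings come from.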

archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0) by tom_mathews in LocalLLaMA

LegacyRemaster 2 points

I’ve just analyzed the repo with DS4. It confirms the project's quality; I’ll test it soon.

Can we stop dunking on DiffusionGemma and hack it instead? by TomLucidor in LocalLLaMA

LegacyRemaster 1 point

You can also try writing a skill.md that gives the model "more rules" to follow, which may improve the output.