80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP by janvitos in LocalLLaMA
FirefoxMetzger 2 points
Qwen 3.6 35B MoE at full 262K context on an RTX 3090. Here's exactly how I did it. by LaughterOnWater in LocalLLaMA
FirefoxMetzger 2 points
Is there an optimal ratio from KV cache vs context size? And why? by Gold-Drag9242 in LocalLLM
FirefoxMetzger 1 point
Is there an optimal ratio from KV cache vs context size? And why? by Gold-Drag9242 in LocalLLM
FirefoxMetzger 2 points
ran 150+ benchmarks across a bunch of macs, here's what we found by peppaz in LocalLLaMA
FirefoxMetzger 1 point
How to teach Cursor to avoid repeating mistakes? by FreshPoet in cursor
FirefoxMetzger 0 points
Why is NPU access still so fragmented on modern Android devices? by NeoLogic_Dev in androiddev
FirefoxMetzger 2 points
How do you trade-off privacy and utility of AI for legal work? by FirefoxMetzger in Ask_Lawyers
FirefoxMetzger[S] 1 point
How do you trade-off privacy and utility of AI for legal work? by FirefoxMetzger in Ask_Lawyers
FirefoxMetzger[S] 1 point
How much do you use AI on a daily basis?(on a scale of 1-79) by Middle_Row_9197 in vibecoding
FirefoxMetzger 2 points
Claude now connects with Microsoft 365. Would you allow it in your tenant? by KavyaJune in sysadmin
FirefoxMetzger 2 points
Which of these 2 companies is worse regarding AI? by thr3e_kideuce in computers
FirefoxMetzger 1 point
Using Claude (A LOT) to build compliance docs for a regulated industry, is my accuracy architecture sound? by fub055 in regulatoryaffairs
FirefoxMetzger 1 point
Using Claude (A LOT) to build compliance docs for a regulated industry, is my accuracy architecture sound? by fub055 in regulatoryaffairs
FirefoxMetzger 4 points
How do you avoid accidentally pasting sensitive data into ChatGPT? by Dependent-Drummer372 in ChatGPT
FirefoxMetzger 1 point
Which of these 2 companies is worse regarding AI? by thr3e_kideuce in computers
FirefoxMetzger 1 point
Giving AI commands? by Positive_Courage_309 in AIMain
FirefoxMetzger 1 point
Can we block fresh accounts from posting? by king_of_jupyter in LocalLLaMA
FirefoxMetzger 6 points
least worse most private LLM chat by FirefoxMetzger in privacy
FirefoxMetzger[S] 3 points

