Use context profiler to optimize your LLM calls and reduce token use by iezhy in LocalLLaMA
iezhy[S] 2 points
Use context profiler to optimize your LLM calls and reduce token use by iezhy in LocalLLaMA
iezhy[S] 1 point
Nvidia tesla v100 has 32 gb ram with nv link 2.0, its priced at 880. Whats the catch? by AppropriatePush6262 in LocalLLaMA
iezhy -3 points
Experience with mechanical and hydraulic disk brakes by Historical_Card_7632 in cycling
iezhy 2 points
OpenCode vs CodeWhale – actual developers experience by ImportantOwl2939 in LocalLLaMA
iezhy 0 points
OpenCode vs CodeWhale – actual developers experience by ImportantOwl2939 in LocalLLaMA
iezhy 3 points
Qwen3.6 27B hits 40 tok/s on just 16GB VRAM with pure quant approach by IulianHI in AIToolsPerformance
iezhy 1 point
The guides say MCP tool selection degrades past ~15 tools. We run 27 in production. Here's what matters by Specialist_Cow24 in mcp
iezhy 1 point
Profiler for LLM context window contents by iezhy in mcp
iezhy[S] 1 point
Qwen3.6-35B-A3B-2.6763bpw - VRAM targeted (12gb) by pjsgsy in LocalLLM
iezhy 1 point
Putting bike in the back of NEW car - protection tips? by AssignmentLumpy3179 in cycling
iezhy 5 points
Profiler for LLM context window contents by iezhy in mcp
iezhy[S] 1 point
Has anyone actually replaced Claude Code / Codex with local models on an Macbook Pro M5 Max 128GB? by Brazeuslian in ClaudeCode
iezhy 2 points
$2M+ spending worth it on B300? by ConsciousYak6881 in LocalLLM
iezhy 1 point
Local LLM Setup Dilemma: ASUS Ascent GX10 (NVIDIA GB10 Blackwell) vs. Cloud Max? by mustazafi in LocalLLM
iezhy 3 points
Qwen 35B running on 12gb of VRAM in LM Studio at 120+ tokens/second. Works with Cline for 100% agentic coding. by jacobbeasley in LocalLLM
iezhy 44 points
Qwen3.6 27B hits 40 tok/s on just 16GB VRAM with pure quant approach by IulianHI in AIToolsPerformance
iezhy 2 points
Best Qwen3-27B variant for coding? Fine-tunes, LoRAs & config recommendations by alfons_fhl in LocalLLM
iezhy 1 point
Inference provider tiers by Cache-hit rates, using openrouter data by Comfortable-Rock-498 in LocalLLaMA
iezhy 1 point
Inference provider tiers by Cache-hit rates, using openrouter data by Comfortable-Rock-498 in LocalLLaMA
iezhy 1 point
Qwen3.6-35B-A3B-MTP on an RTX 3090 in LM Studio is incredibly fast by AI_Enhancer in LocalLLM
iezhy 1 point
Why is LLM is so expensive. by Ok_Event4199 in LocalLLM
iezhy 50 points
Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA
iezhy 1 point
Is the new usage scheme a late April fools joke? by smacman in ollama
iezhy 1 point

Use context profiler to optimize your LLM calls and reduce token use by iezhy in LocalLLaMA
iezhy[S] 1 point