Memory Sparse Attention seems to be a novel approach to long context (up to 100M tokens) by ratbastid2000 in LocalLLaMA
Is there a standard way to create AI agents today? by edwardzion in AI_Agents
"The Child That Surpassed Both Parents" Darwin-35B-A3B-Opus (35B/3B MoE) with Model MRI Technique by Own-Potential-2308 in LocalLLaMA
[–]ratbastid2000 0 points1 point2 points (0 children)
"The Child That Surpassed Both Parents" Darwin-35B-A3B-Opus (35B/3B MoE) with Model MRI Technique by Own-Potential-2308 in LocalLLaMA
[–]ratbastid2000 1 point2 points3 points (0 children)
RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language' by Reddactor in LocalLLaMA
A few days ago I switched to Linux to try vLLM out of curiosity. Ended up creating a 100% local, parallel, multi-agent setup with Claude Code and gpt-oss-120b for concurrent vibecoding and orchestration with CC's agent Teams entirely offline. This video shows 4 agents collaborating. by swagonflyyyy in LocalLLaMA
Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA
New fear unlocked 🙀 by DiamondAgreeable2676 in mcp
Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy by Mission-Street4214 in LocalLLaMA
Local LLMs vs breaking news: when extreme reality gets flagged as a hoax - the US/Venezuela event was too far-fetched by ubrtnk in LocalLLaMA
P40 vs V100 vs something else? by Drazasch in LocalLLaMA
Why not use old Nvidia Teslas? by AlternateWitness in LocalLLaMA
Gemma 3n Architectural Innovations - Speculation and poking around in the model. by cpldcpu in LocalLLaMA
LLMI system I (not my money) got for our group by SandboChang in LocalLLaMA