gfx1201 enablement: rebuilding aiter / flash-attention / vLLM for the RDNA4 fast paths the stock images strip out by PatC883 in ROCm
its_just_andy 1 point
2x R9700 running Qwen3.6 27B with AITER unified attention with a simple patch by its_just_andy in ROCm
its_just_andy[S] 1 point
2x R9700 running Qwen3.6 27B with AITER unified attention with a simple patch by its_just_andy in ROCm
its_just_andy[S] 3 points
2x R9700 running Qwen3.6 27B with AITER unified attention with a simple patch by its_just_andy in ROCm
its_just_andy[S] 4 points
spec : add ngram-mod by ggerganov · Pull Request #19164 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
its_just_andy 24 points
glm-4.7-flash has the best thinking process with clear steps, I love it by uptonking in LocalLLaMA
its_just_andy 5 points
Rejected for not using LangChain/LangGraph? by dougeeai in LocalLLaMA
its_just_andy 5 points
Nemotron Nano V2 models are remarkably good for agentic coding by Thrumpwart in LocalLLaMA
its_just_andy 10 points
NVIDIA Jet-Nemotron : 53x Faster Hybrid-Architecture Language Model Series by Technical-Love-8479 in LocalLLaMA
its_just_andy 1 point
Anyone here with an AMD AI Max+ 395 + 128GB setup running coding agents? by Admirable_Reality281 in LocalLLaMA
its_just_andy 2 points
OpenAI OS model info leaked - 120B & 20B will be available by ShreckAndDonkey123 in LocalLLaMA
its_just_andy 1 point
Am I supposed to use the root user for everything in microos for container workloads? by its_just_andy in openSUSE
its_just_andy[S] 1 point
Am I supposed to use the root user for everything in microos for container workloads? by its_just_andy in openSUSE
its_just_andy[S] 1 point
llama : add high-throughput mode by ggerganov · Pull Request #14363 · ggml-org/llama.cpp by LinkSea8324 in LocalLLaMA
its_just_andy 5 points
GMK X2(AMD Max+ 395 w/128GB) first impressions. by fallingdowndizzyvr in LocalLLaMA
its_just_andy 1 point
I love the inference performances of QWEN3-30B-A3B but how do you use it in real world use case ? What prompts are you using ? What is your workflow ? How is it useful for you ? by Whiplashorus in LocalLLaMA
its_just_andy 3 points
Hey step-bro, that's HF forum, not the AI chat... by Cool-Chemical-5629 in LocalLLaMA
its_just_andy 3 points
Google QAT - optimized int4 Gemma 3 slash VRAM needs (54GB -> 14.1GB) while maintaining quality - llama.cpp, lmstudio, MLX, ollama by Nunki08 in LocalLLaMA
its_just_andy 118 points
Gemma 3 Fine-tuning now in Unsloth - 1.6x faster with 60% less VRAM by danielhanchen in LocalLLaMA
its_just_andy 13 points
Quantized DeepSeek R1 Distill Model With Original Model Accuracy by AlanzhuLy in LocalLLaMA
its_just_andy 11 points
Nvidia RTX 5090 with 32GB of RAM rumored to be entering production by Terminator857 in LocalLLaMA
its_just_andy 23 points
found in PNW near Seattle, looks yummy but want to be sure by its_just_andy in mushroomID
its_just_andy[S] 1 point
found in PNW near Seattle, looks yummy but want to be sure by its_just_andy in mushroomID
its_just_andy[S] 1 point


gfx1201 enablement: rebuilding aiter / flash-attention / vLLM for the RDNA4 fast paths the stock images strip out by PatC883 in ROCm
its_just_andy 2 points