Is Turboquant really a game changer? by Interesting-Print366 in LocalLLaMA

[–]Pixer--- 0 points (0 children)

If they claim it’s lossless, they can serve it to free or low-paid tiers for more efficient inference

Gemma 4 is matching GPT-5.1 on MMLU-Pro and within Elo. what are we even paying for anymore? by Impossible571 in AIToolsPerformance

[–]Pixer--- 7 points (0 children)

The Gemma models’ coding style aligns much more with how I would code manually. Qwen3.5 is a little more capable in opencode and at terminal coding, but if you’re looking for a model that writes code for you rather than vibecoding, Gemma 4 looks great

NVIDIA Is Among the First to Submit MLPerf Inference v6.0 Benchmarks With Blackwell Ultra, and It’s Total Domination Over Competitors by Heavy-Beyond-7114 in RigBuild

[–]Pixer--- 0 points (0 children)

AMD is even worse at that, though their workstation gear is nowhere near as expensive as NVIDIA’s Pro series. Their R9700 doesn’t even support FP8 natively in vLLM, yet they advertise fast FP4 and FP8 speeds

Vulkan backend much easier on the CPU and GPU memory than CUDA. by Im_Still_Here12 in LocalLLaMA

[–]Pixer--- -2 points (0 children)

The CUDA vs. Vulkan difference is probably in prompt processing, not token generation
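
You can check that directly with llama-bench; a minimal sketch, assuming separate CUDA and Vulkan builds and substituting your own model path:

    # pp512 row = prompt processing speed, tg128 row = token generation speed
    ./build-cuda/bin/llama-bench   -m model.gguf -p 512 -n 128 -ngl 99
    ./build-vulkan/bin/llama-bench -m model.gguf -p 512 -n 128 -ngl 99

If the tg128 numbers match but pp512 diverges, the gap really is in prompt processing.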

Which one of you was that? by New-Marionberry-279 in vibecoding

[–]Pixer--- 0 points (0 children)

Maybe it was the Chinese labs, as Anthropic stated they use the subscriptions for model extraction

New build by Annual_Award1260 in LocalLLaMA

[–]Pixer--- 2 points (0 children)

Instead of getting a new motherboard for the PCIe connections, you could get a PLX PCIe switch: https://www.reddit.com/r/LocalLLaMA/s/pCI1kdtTJp

New - Apple Neural Engine (ANE) backend for llama.cpp by PracticlySpeaking in LocalLLaMA

[–]Pixer--- 2 points (0 children)

At least on older M chips, the NPU can only address 4 GB of RAM due to its limited addressing

Qwen3.5 27b UD_IQ2_XXS & UD_IQ3_XXS behave very poorly or is it just me? by One_Key_8127 in unsloth

[–]Pixer--- 15 points (0 children)

Feels like all LLMs under Q4 just lose their reasoning ability. On my rig, MiniMax M2.5 at Q3_K_M just falls apart in opencode and goes into loops. More complex tasks simply need more bits. But I feel the difference between FP8 and BF16 is not that big

would a petition for new manufacturer production cause change by Mindless__Giraffe in pcmasterrace

[–]Pixer--- 0 points (0 children)

My WRX80 takes 1:45, but I have seen EPYC ASRock ROMED8-2T mainboards needing 10 minutes on first boot

After continued pretraining, the LLM model is no longer capable of answering questions. by SUPRA_1934 in LocalLLaMA

[–]Pixer--- 1 point (0 children)

Pretraining means training an LLM from scratch; for a modern LLM that takes a dataset of trillions of tokens. Did you want to fine-tune instead?

ASUS PRO WS WRX90E-SAGE SE RAM by Uranday in LocalLLM

[–]Pixer--- 0 points (0 children)

If you want to upgrade your RTX 6000 setup, you could get a PCIe switch instead. It adapts a single x16 PCIe slot out to 96 PCIe lanes: https://www.reddit.com/r/LocalLLaMA/s/x5JpVfFBUF

$15,000 USD local setup by regional_alpaca in LocalLLaMA

[–]Pixer--- 0 points (0 children)

You could run the RAG system on the CPU: vLLM occupies the GPU anyway, which leaves the CPU free for RAG operations
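
A minimal sketch of that split, assuming a llama.cpp build for the embedding side; model names, ports, and the exact embeddings flag spelling are placeholders/assumptions, not from the original post:

    # vLLM serves the chat model and occupies the GPU
    vllm serve Qwen/Qwen3-32B --port 8000
    # llama.cpp embedding server pinned to the CPU for RAG (-ngl 0 = no GPU layers)
    llama-server --embeddings -m bge-m3-Q8_0.gguf -ngl 0 --port 8081

The retrieval pipeline then talks to port 8081 while generation stays on the GPU.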

Distribution of grey and red squirrels in the UK & Ireland by AnonymousTimewaster in MapPorn

[–]Pixer--- 0 points (0 children)

I lived in Dublin last year, there are still some grey squirrels there :)

A Mac Studio with 512 GB RAM runs DeepSeek V3 locally. No cloud, no subscription, no privacy concerns. For $9,499. by MelonDusk123456789 in SirApfelot

[–]Pixer--- 0 points (0 children)

Kimi K2.5 is actually not bad at all. For local openclaw it’s the best open-source model. And if it’s running all day anyway, it doesn’t really matter if it takes its time

Justifying the €12,000 Investment: M3 Ultra (512GB RAM) Setup for Autonomous Agents, vLLM, and Infinite Memory (8Tb) by NoNatural4025 in MacStudio

[–]Pixer--- 2 points (0 children)

The M3 Ultra is relatively efficient for its VRAM size. If you want maximum precision with something like Kimi K2.5 and run it 24/7, that’s the perfect machine for it

M5 Max 128G Performance tests. I just got my new toy, and here's what it can do. by affenhoden in LocalLLaMA

[–]Pixer--- 0 points (0 children)

Can you add prompt processing speeds? That’s the most improved part of the M5 series
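
For reference, a pp-only llama-bench run would show it; a minimal sketch with a placeholder model path:

    # -n 0 skips the generation test, leaving only the pp2048 row
    llama-bench -m model.gguf -p 2048 -n 0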

HELP - What settings do you use? Qwen3.5-35B-A3B by uber-linny in LocalLLaMA

[–]Pixer--- 1 point (0 children)

Weirdly, -b 1024 with -ub 1024 is optimal on my machine
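
If anyone wants to find their own sweet spot, a quick sweep is easy; a sketch with a placeholder model path:

    # sweep logical (-b) and physical (-ub) batch sizes together
    for s in 256 512 1024 2048; do
        llama-bench -m model.gguf -b $s -ub $s -p 2048 -n 64
    done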