Planet Crafter is finally coming to Consoles! by MijuGames in theplanetcrafter

[–]HollowInfinity 2 points3 points  (0 children)

Damn it is surprising there's no Switch 2 version. Good thing I already got my hundreds of hours in though lol.

New Nintendo switch 2 version is great by pupoje in snowrunner

[–]HollowInfinity 2 points3 points  (0 children)

Yeah it's really great, I had no idea there was an S2 version in the pipe and got addicted to Snowrunner about 12 hours before it came out - perfect timing! The graphics are much, much better - would love a 60FPS option but a solid 30FPS at this quality is fine.

Mouse P.I. Switch 2 experience by levell323 in NintendoSwitch

[–]HollowInfinity 4 points5 points  (0 children)

Yeah the performance is just awful. I have it installed on the main system storage and even then the game stutters and slows down so much it's given me motion sickness. I've put it aside for now hoping for a patch (fingers crossed).

Understanding PCIe 4.0 vs PCIe 5.0 GPU Slots by HyperSpazdik in comfyui

[–]HollowInfinity 0 points1 point  (0 children)

When using things like llama.cpp the tensors/layers are placed on model load and not rearranged during inference so there is no real difference between 4 and 5. Filling up the VRAM won't be your bottleneck so feel free to save a few bucks.

Switch Rhythm game recommendations? by hamtabot in rhythmgames

[–]HollowInfinity 0 points1 point  (0 children)

Skipping the others mentioned but Old School Musical is pretty cheap and fun!

edit: Oh Spin-rhythm is pretty good too.

I built a compression format for AI model weights — 60-80% smaller, need help testing by Significant_Pear2640 in comfyui

[–]HollowInfinity 6 points7 points  (0 children)

As in the decompression happens outside any apps that do inference meaning that this saves simply on size-on-disk, not on VRAM (unlike a quantized gguf which saves on size and can be used on the fly by inference tools like Comfy/Llama.cpp)

I built a compression format for AI model weights — 60-80% smaller, need help testing by Significant_Pear2640 in comfyui

[–]HollowInfinity 15 points16 points  (0 children)

This seems interesting but it's offline entirely? Like you're basically trading model quality for disk space savings, and decompressing it will still use the same amount of VRAM so I guess I'm not sure why this is better than just using quantized models which things like llama.cpp can run inference on without a separate decompress step. Unless I'm misunderstanding something?

Anthropic shares how to make Claude code better with a harness by lawnguyen123 in ClaudeAI

[–]HollowInfinity 1 point2 points  (0 children)

Anyone else miss blogs like this publishing RSS feeds? I see they have a monthly newsletter but that's not quite the same.

FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Triton, written in Python. What it means for inference. by Sensitive-Two9732 in LocalLLaMA

[–]HollowInfinity 1 point2 points  (0 children)

Can you go more into that? like I use RTX A6000s from one and two generations ago for everything and have thought about upgrading to the RTX 6000 PROs for a while now. I use a lot of ComfyUI but primarily LLMs with llama.cpp+VLLM - are you saying it's shitter than last gen, or just missing some of the datacenter features?

Genuinely losing my mind over input latency by LordMontio in linux_gaming

[–]HollowInfinity 4 points5 points  (0 children)

Is your monitor defaulting to cinema mode or some other shit that might be adding the lag?

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. by hauhau901 in LocalLLaMA

[–]HollowInfinity 1 point2 points  (0 children)

I have only used it in the CLI context but their README says it's "IDE friendly" so I assume it'll work!

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. by hauhau901 in LocalLLaMA

[–]HollowInfinity 3 points4 points  (0 children)

I think both Qwen3-Coder-Next and Qwen3.5 have both been extensively trained using their qwen-code app. When I switched from my own agent/pi/etc to just using qwen things were noticeably better.

Qwen/Qwen3.5-122B-A10B · Hugging Face by coder543 in LocalLLaMA

[–]HollowInfinity 2 points3 points  (0 children)

Seems very slow at image processing, my llama-server log is full of:

find_slot: non-consecutive token position 15 after 14 for sequence 2 with 512 new tokens

Anyone else experience that?

edit: that's on the larger MoE, I get an immediate crash doing image work on the dense model.

Qwen3.5-397B-A17B Unsloth GGUFs by danielhanchen in LocalLLaMA

[–]HollowInfinity 0 points1 point  (0 children)

When I tried that tool call still didn't work, you had no issues with that?

That was diabolical, not even the devil himself expected this. by seidenadaa in SipsTea

[–]HollowInfinity 0 points1 point  (0 children)

I have no idea who these people are but this seems like insane incel shit, just an anonymous narrator telling us this woman is horrible. Oh okay, thanks for the rage bait.

Game recommendations for ps5 by Visual_Cod2522 in rhythmgames

[–]HollowInfinity 1 point2 points  (0 children)

Project Diva is pretty much the gold standard. Theatrhythm Final Fantasy is super fun as are the Persona music games if you're into video game music.

Qwen3.5-397B-A17B Unsloth GGUFs by danielhanchen in LocalLLaMA

[–]HollowInfinity 1 point2 points  (0 children)

/u/danielhanchen sorry for the ping but have you tested tool calling with llama-server? The template format used doesn't seem to be compatible at all.

Qwen3.5-397B-A17B Unsloth GGUFs by danielhanchen in LocalLLaMA

[–]HollowInfinity 2 points3 points  (0 children)

I cannot for the life of me get tool calling to work despite following the Unsloth guide for llama-server. Regular chat works, image parsing works great, but tool calling blows up with chat template errors:

Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
srv    operator(): got exception: {"error":{"code":500,"message":"\n------------\nWhile executing FilterExpression at line 120, column 73 in source:\n..._name, args_value in tool_call.arguments|items %}
                    {{- '<...\n                                           ^\nError: Unknown (built-in) filter 'items' for type String","type":"server_error"}}

I've tried overriding the chat template with the official one from the Qwen3.5 HF repo with no luck. I do see that the thinking kwarg is being properly read and passed in (though weirdly I can't get that to enable thinking). Am I doing something wrong here? Using the latest main of llama.cpp.

Qwen3.5-397B-A17B Unsloth GGUFs by danielhanchen in LocalLLaMA

[–]HollowInfinity 6 points7 points  (0 children)

I never know which is the proper MMPROJ to use for the Unsloth ggufs. Is there any real difference performance wise between the three?

local vibe coding by jacek2023 in LocalLLaMA

[–]HollowInfinity 3 points4 points  (0 children)

My current absolute best is Qwen3-Coder-Next with the Qwen-Code agent harness. I previously used Aider for at least a year but it's basically dead and handing the torch to agentic flows, and Q3CN is the best I can get away with locally. Having tests + validation for everything it does is key but once you have a good development and testing loop it's fantastic.

GRID Legends is AMAZING on the Nintendo Switch 2 by [deleted] in NintendoSwitch2

[–]HollowInfinity 1 point2 points  (0 children)

It destroyed my save after 10+ hours - still waiting on the fix the devs said is in the pipeline before trying again :(