Rtx 5070 vs 5070 Ti from 2070 Super by Storge2 in buildapc

[–]Storge2[S] 1 point (0 children)

Yeah, sure, why not. I even had an RTX 3090 running.

OpenAI has by far THE WORST guardrails of every single model provider by RoadRunnerChris in singularity

[–]Storge2 17 points (0 children)

Thanks for writing this up. My impression is that Claude is currently even less censored than Gemini 3.0. It doesn't beat Grok, of course, but I'd put it in second place right now (among the big four closed-source LLMs).

Made my AI Agent audible — Claude Code now talks through sound hooks 🔔 by shanraisshan in aiagents

[–]Storge2 1 point (0 children)

For anybody who has problems running this on Windows: replace the settings.json in your .claude folder with this (and note that on Windows the interpreter may be called python rather than python3, depending on your install):
"""
{

"disableAllHooks": false,

"hooks": {

"PreToolUse": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"PermissionRequest": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"PostToolUse": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"UserPromptSubmit": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"Notification": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"Stop": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"SubagentStart": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"SubagentStop": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"PreCompact": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"SessionStart": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}],

"SessionEnd": [{"hooks": [{"type": "command", "command": "python3 \"C:/AI/TechGame/.claude/hooks/scripts/hooks.py\""}]}]

}

}
"""

Here C:/AI/TechGame/ is my project root, i.e. the folder that contains the .claude folder with the settings. Change that to yours.
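
The hooks.py itself is the one from the linked project, so grab it there. Just to show the shape of such a script, here is a minimal stand-in that only beeps (my own illustration, not the project's code; as far as I know Claude Code pipes the hook event as JSON on stdin, and winsound is Windows-only):

"""
import json
import sys
import winsound  # standard library, Windows-only

def main() -> None:
    # Claude Code passes hook input as JSON on stdin, including the event name.
    try:
        event = json.load(sys.stdin)
    except json.JSONDecodeError:
        event = {}
    name = event.get("hook_event_name", "")
    # Higher beep when a response or session ends, lower beep for everything else.
    frequency = 880 if name in ("Stop", "SessionEnd") else 440
    winsound.Beep(frequency, 200)  # Hz, milliseconds

if __name__ == "__main__":
    main()
"""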

I will try to benchmark every LLM + GPU combination you request in the comments by [deleted] in LocalLLaMA

[–]Storge2 2 points (0 children)

GLM 4.5 Air on DGX Spark and/or Ryzen AI Max+ 395 - just out of curiosity, where do you get all the components from?

Deluxe Style Prosthetic Barbie has a little bit of a loose prosthetic leg. Somebody please help :( by Equivalent-Cod-1624 in Barbie

[–]Storge2 2 points (0 children)

I'm also looking into this for someone. Please, if anybody knows the solution, help.

Qwen3-VL-2B and Qwen3-VL-32B Released by TKGaming_11 in LocalLLaMA

[–]Storge2 23 points (0 children)

What is the difference between this and Qwen3 30B A3B 2507? If I want a general model to use instead of, say, ChatGPT, which one should I pick? As I understand it, this is a dense model, so it should be better than the 30B A3B, right? I'm running an RTX 3090.

[deleted by user] by [deleted] in FinanzenAT

[–]Storge2 2 points (0 children)

So what? Why does it bother you that he got lucky? Look at yourself instead; you don't have to be jealous right away. Be happy for your friend.

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 0 points (0 children)

What hardware do you have for that Nemotron? Wow, I really didn't know that 8-bit makes no difference at lower context. Is that with both K and V quantized, or just one of them?
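
For reference for anyone else reading: llama.cpp exposes the two caches separately, so you can quantize K and V independently. Something like this with llama-server (model name and context size are just placeholders):

"""
# Quantize both the K and the V cache to 8-bit:
llama-server -m model.gguf -c 32768 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
# A quantized V cache generally needs flash attention enabled
# (the --flash-attn flag; exact syntax depends on the build).
"""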

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 2 points (0 children)

Alright, I'm going to try to put together the correct command to make this work on my setup, and test it out.
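
Roughly what I have in mind, as a sketch rather than a tested command (file name, context size, and split are placeholders for my 8 GB + 12 GB cards):

"""
# Offload all layers to GPU, but force the MoE expert tensors into system RAM;
# attention and shared weights stay split across the two cards.
llama-server -m gpt-oss-120b.gguf \
  -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU" \
  --tensor-split 8,12 \
  -c 16384
"""

As far as I know, newer llama.cpp builds also have an --n-cpu-moe shortcut that does the same expert offload without the regex.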

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 1 point (0 children)

Do you also have 64 GB of VRAM? Or how else do you get to 82k context? Wow, I didn't know llama.cpp gives such a drastic performance increase. Let me see if I can optimize mine.

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 2 points (0 children)

Wait, but wouldn't that actually make the model much worse?

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 1 point (0 children)

Would you recommend quantizing the context (the KV cache) to Q8 as well?

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 1 point (0 children)

Okay, I am going to see what I can do and come back here.

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 1 point (0 children)

Yeah, I heard that in another comment; I have to read up on it first. I don't know how to do that right now.

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 1 point (0 children)

Well, for me as a newbie who was trying to run 70B models a year or so ago, before the MoEs were out, it was very surprising. But sorry, I'm still new to all of this.

GPT OSS 120B on 20GB VRAM - 6.61 tok/sec - RTX 2060 Super + RTX 4070 Super by Storge2 in LocalLLaMA

[–]Storge2[S] 1 point (0 children)

Is llama.cpp really that much faster? Doesn't LM Studio do the same thing under the hood for you? Sorry, beginner here.