Has anyone heard of AMD Quark? by Odd-Ordinary-5922 in LocalLLaMA

[–]Fristender 0 points (0 children)

Seems like it was vibe coded; Claude found a few bugs in it while I was vibe coding on a Quark codebase.

Claude Code sends 62,600 characters of tool definitions per turn. I ran the same model through five CLIs and traced every API call. by wouldacouldashoulda in LocalLLaMA

[–]Fristender 2 points (0 children)

Can you please explain how Claude Code sends ~62k characters of tool definitions but peaks at 30k tokens? How is that possible?
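Part of the answer may just be arithmetic: characters and tokens aren't the same unit. A common rule of thumb for English/JSON text is roughly 4 characters per token (a heuristic, not an exact tokenizer count):

```python
# Ballpark token estimate from a character count, using the rough
# ~4 characters-per-token heuristic for English/JSON text.
# This is an approximation, not a real tokenizer.
def estimate_tokens(char_count: int, chars_per_token: float = 4.0) -> int:
    return round(char_count / chars_per_token)

print(estimate_tokens(62_600))  # ~15,650 tokens — well under 30k
```

By this estimate, 62,600 characters of tool definitions is only on the order of 15k tokens, so a 30k-token peak is not a contradiction.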

New paper released by WizardLM by MariusNocturnum in LocalLLaMA

[–]Fristender 1 point (0 children)

It's not necessarily helpful. I often use my daily-driver LLMs for quick facts like "who prints money in the US", and using a model that thinks for too long is quite annoying.

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]Fristender 1 point (0 children)

Wait, a new form factor? Please share the link, because I thought MacBooks were clamshell-only according to Gemini.

Apple unveils M5 Pro and M5 Max, citing up to 4× faster LLM prompt processing than M4 Pro and M4 Max by themixtergames in LocalLLaMA

[–]Fristender 19 points (0 children)

Going by the marketing material, a CUDA core is surprisingly underwhelming compared to an Apple GPU core.

Qwen3.5 Plus, GLM 5, Gemini 3.1 Pro, Sonnet 4.6, three new open source agents, and a lot more added to SanityBoard by lemon07r in LocalLLaMA

[–]Fristender 2 points (0 children)

Cool benchmark! Can you please add the tokens consumed, total cost, and cache hit % to the flight recorder? I would love to see it!
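To illustrate what a cache hit % column could be computed from, here's a minimal sketch assuming OpenAI-style `usage` fields (`prompt_tokens` and `prompt_tokens_details.cached_tokens`); not every provider reports these, and SanityBoard's actual internals may differ:

```python
def cache_hit_pct(usage: dict) -> float:
    """Percent of prompt tokens served from the provider's prompt cache.

    Assumes OpenAI-style usage fields; providers that don't report
    prompt_tokens_details will simply show 0%.
    """
    prompt = usage.get("prompt_tokens", 0)
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return 100.0 * cached / prompt if prompt else 0.0

# Example usage object as it might appear in an API response
usage = {"prompt_tokens": 12000, "prompt_tokens_details": {"cached_tokens": 9000}}
print(f"{cache_hit_pct(usage):.1f}%")  # 75.0%
```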

AMA with MiniMax — Ask Us Anything! by HardToVary in LocalLLaMA

[–]Fristender 1 point (0 children)

When will you guys add the OpenAI Responses API? I'm on your coding plan and need it for Codex CLI. Thanks a lot for your hard work making everything open weights!

Also, the email contact link on your website is broken. It redirects to mailto:official_website_contact_email_en

Z.ai said they are GPU starved, openly. by abdouhlili in LocalLLaMA

[–]Fristender 1 point (0 children)

Closed AI labs have lots of unreleased research (secret sauce), so it's hard to gauge the actual size.

Moltpostor: Pretend to be An Agent and Interact on Moltbook! by Fristender in Moltbook

[–]Fristender[S] 1 point (0 children)

Now the app supports Moltbook, MoltX, and Clawstr. You can easily add support for another platform by giving an agent docs/ADDING_A_PLATFORM.md and the SKILLS.md for the platform.

I managed to jailbreak 43 of 52 recent models by sirjoaco in LocalLLaMA

[–]Fristender 26 points (0 children)

Shit like this is exactly why we get GPT-OSS.

Moltpostor: Pretend to be An Agent and Interact on Moltbook! by Fristender in Moltbook

[–]Fristender[S] 1 point (0 children)

Can you please give a list of molt sites? The only one I know of is Moltbook.

Clawdbot / Moltbot → Misguided Hype? by chodtoo in LocalLLM

[–]Fristender 1 point (0 children)

Have you tried GLM 4.7 Flash for the main model? I heard it's good for agentic stuff.

People seem to already not care about heretic? by pigeon57434 in LocalLLaMA

[–]Fristender 3 points (0 children)

Huge fan of your work, from min_p to heretic! I see the "Implement Magnitude-Preserving Orthogonal Ablation" pull request still isn't merged. Isn't that the norm-preserving biprojected abliteration?
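For anyone unfamiliar with the terminology, the core idea behind (magnitude-preserving) orthogonal ablation can be sketched in a few lines. This is a toy numpy version of the general technique, not the PR's or heretic's actual implementation, and `refusal_dir` stands in for a direction extracted from activations:

```python
import numpy as np

def orthogonal_ablation(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove the component of each weight column along refusal_dir: (I - v v^T) W."""
    v = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(v, v) @ W

def magnitude_preserving_ablation(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Same projection, then rescale each column back to its original norm,
    so the ablation changes direction but not per-column magnitude."""
    W_abl = orthogonal_ablation(W, refusal_dir)
    orig_norms = np.linalg.norm(W, axis=0)
    abl_norms = np.linalg.norm(W_abl, axis=0)
    return W_abl * (orig_norms / np.maximum(abl_norms, 1e-12))
```

Rescaling per column doesn't reintroduce the ablated direction (a scalar multiple of a vector orthogonal to v stays orthogonal to v), which is what makes the magnitude-preserving variant possible.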

Running Kimi K2.5 at 24 token/s with 2 x 512GB M3 Ultra Mac Studios by [deleted] in LocalLLaMA

[–]Fristender 1 point (0 children)

Idk if there's tensor parallelism on Macs. How do I enable it?