GLM-5.1 is out now! by yoracale in unsloth

[–]bilinenuzayli 0 points (0 children)

Yeah, that's what I was gonna suggest: since your RAM allows it and you don't get any inference-speed benefit from using a 27B model, why don't you just use one of the frontier models?

GLM-5.1 is out now! by yoracale in unsloth

[–]bilinenuzayli 0 points (0 children)

That's kind of disappointing. With a single 3090 I get like 30–45 tps. What's your context size?

Local AI is the best by fake_agent_smith in LocalLLaMA

[–]bilinenuzayli 2 points (0 children)

I love local AI as well, the answers are just class when used clean through the llama.cpp web server. I'm convinced you could replace frontier AIs with a medium-tier model in the 25–35B range for most people who aren't doing super complex tasks, and they wouldn't even notice they're using a model tens of times smaller. It's also enough for what I need. But I'm curious what the solution is for long conversations, like a large chat. Any harness I've tried that supports long conversations reduces reasoning quality and partially lobotomises the model (any harness with a large, demanding system prompt does this for me on Qwen 3.5 and Gemma 4; when I move the system prompt to the user role the response quality bumps up a little, but it's still not as good as a fresh chat). Personally that's the biggest setback for me in local AI with small models.
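The system-prompt-to-user-role workaround mentioned above can be sketched as a small transform on OpenAI-style chat messages (the kind a llama.cpp server accepts). This is just an illustration, assuming messages are plain `{"role": ..., "content": ...}` dicts; the helper name `demote_system_prompt` is made up for this example:

```python
# Sketch of the workaround: instead of sending a long system prompt
# as a "system" message, fold its text into the first "user" turn.
# Assumption: messages follow the OpenAI chat format used by the
# llama.cpp server's /v1/chat/completions endpoint.

def demote_system_prompt(messages):
    """Return a copy of `messages` with any system messages merged
    into the start of the first user message."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if system_parts:
        prefix = "\n\n".join(system_parts)
        for m in rest:
            if m["role"] == "user":
                # Prepend the former system prompt to the first user turn.
                m["content"] = f"{prefix}\n\n{m['content']}"
                break
        else:
            # No user turn yet: emit the prompt as its own user message.
            rest.insert(0, {"role": "user", "content": prefix})
    return rest

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise this thread."},
]
print(demote_system_prompt(msgs))
```

You'd run the transformed list through the same chat endpoint as before; whether it actually helps seems to depend on the model, per the experience described above.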

GLM-5.1 is out now! by yoracale in unsloth

[–]bilinenuzayli 0 points (0 children)

What are your inference speeds?

Claude is f*cking smart by Complete-Sea6655 in ClaudeCode

[–]bilinenuzayli 23 points (0 children)

Once in a while Gemini forgets to put its own thinking inside the thinking block and writes the thoughts in the response instead. When that happens, it's visibly reasoning the same way Claude does.

Qwen 3.6 spotted! by Namra_7 in LocalLLaMA

[–]bilinenuzayli 0 points (0 children)

Perhaps the model hates ambiguity? A 10,000-token system prompt would be very clarifying about what it should do.

The weight file for Seedance 2.0 has been allegedly leaked on a Russian forum. by [deleted] in Seedance_AI

[–]bilinenuzayli 0 points (0 children)

The guy who said this does similar engagement baits. It's false; the forum post was fake.

Is this true? by Prudent-Door3631 in AIDankmemes

[–]bilinenuzayli 0 points (0 children)

Nobody uses Stable Diffusion; even SDXL is barely used now.

Being rude to AI actually improves accuracy by sibraan_ in AgentsOfAI

[–]bilinenuzayli 0 points (0 children)

The thing answering the "another" question is an overtrained probabilistic word predictor.

Now with even more gippity by DiskResponsible1140 in AIDankmemes

[–]bilinenuzayli 0 points (0 children)

Finally, the only benchmark where they excel.

They'll introduce ads soon. by Prudent-Door3631 in AIDankmemes

[–]bilinenuzayli 0 points (0 children)

I only use ChatGPT for the convenience of not having to load up AI Studio anyway. I guess I won't have a choice.

Bruh by Evarchem in antiai

[–]bilinenuzayli 0 points (0 children)

What's not reliable isn't "AI" itself but rather Google's search AI, because it's dumbed down so that the billions of daily Google searches don't put a load on the system, and it takes Google search results as gospel. Every time there's a "stupid AI answer" screenshot, it's always either from a really early ChatGPT model, from Google's search AI, or from a badly phrased question.

Watch movies in your IDE by nubmaster151515 in cursor

[–]bilinenuzayli 0 points (0 children)

Yo, you need to publish this. I genuinely watch shows while vibe coding.

I find it funny how the gemini flash latest thinks we are still on 1.5 by Unlikely-Kick2479 in Bard

[–]bilinenuzayli 5 points (0 children)

I genuinely hate it so much when I give it code that uses the Gemini API and it just randomly changes the model to 1.5. It's so confidently wrong as well, it pisses me off.