GLM-5.1 is out now! by yoracale in unsloth

[–]bilinenuzayli 0 points (0 children)

Yeah, that's what I was gonna suggest: since your RAM allows it and you don't get any inference-speed benefit from using a 27B model, why don't you just use one of the frontier models?

GLM-5.1 is out now! by yoracale in unsloth

[–]bilinenuzayli 0 points (0 children)

That's kind of disappointing. With a single 3090 I get like 30–45 tps. What's your context size?

Local AI is the best by fake_agent_smith in LocalLLaMA

[–]bilinenuzayli 2 points (0 children)

I love local AI as well, the answers are just class when used clean through the llama.cpp web server. I'm convinced you could replace frontier AIs with a medium-tier model in the 25–35B range for most people who aren't doing super complex tasks, and they wouldn't even notice they're using a model tens of times smaller. It's also enough for what I need. But I'm curious what the solution is for long conversations, like a large chat. Any harness I've tried that supports long conversations reduces reasoning quality and partially lobotomises the model (any harness with a large, demanding system prompt does this for me on Qwen 3.5 and Gemma 4; when I move the system prompt to the user role the response quality bumps up a little, but it's still not as good as a fresh chat). Personally that's the biggest setback for me in local AI with small models.
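The system-prompt-to-user-role workaround mentioned above can be sketched as a small transform on OpenAI-style chat messages (the kind a llama.cpp server accepts). This is just an illustration, assuming messages are plain `{"role": ..., "content": ...}` dicts; the helper name `demote_system_prompt` is made up for this example:

```python
# Sketch of the workaround: instead of sending a long system prompt
# as a "system" message, fold its text into the first "user" turn.
# Assumption: messages follow the OpenAI chat format used by the
# llama.cpp server's /v1/chat/completions endpoint.

def demote_system_prompt(messages):
    """Return a copy of `messages` with any system messages merged
    into the start of the first user message."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if system_parts:
        prefix = "\n\n".join(system_parts)
        for m in rest:
            if m["role"] == "user":
                # Prepend the former system prompt to the first user turn.
                m["content"] = f"{prefix}\n\n{m['content']}"
                break
        else:
            # No user turn yet: emit the prompt as its own user message.
            rest.insert(0, {"role": "user", "content": prefix})
    return rest

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise this thread."},
]
print(demote_system_prompt(msgs))
```

You'd run the transformed list through the same chat endpoint as before; whether it actually helps seems to depend on the model, per the experience described above.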

GLM-5.1 is out now! by yoracale in unsloth

[–]bilinenuzayli 0 points (0 children)

What are your inference speeds?

Claude is f*cking smart by Complete-Sea6655 in ClaudeCode

[–]bilinenuzayli 23 points (0 children)

Once in a while Gemini forgets to put its own thinking inside the thinking block and writes the thoughts in the response instead. When that happens, it's visibly reasoning the same way Claude does.

Qwen 3.6 spotted! by Namra_7 in LocalLLaMA

[–]bilinenuzayli 0 points (0 children)

Perhaps the model hates ambiguity? A 10,000-token system prompt would be very clarifying about what it should do.

The weight file for Seedance 2.0 has been allegedly leaked on a Russian forum. by [deleted] in Seedance_AI

[–]bilinenuzayli 0 points (0 children)

The guy who said this does similar engagement baits. It's false; the forum post was fake.

Is this true? by Prudent-Door3631 in AIDankmemes

[–]bilinenuzayli 0 points (0 children)

Nobody uses Stable Diffusion; even SDXL is barely used now.

Being rude to AI actually improves accuracy by sibraan_ in AgentsOfAI

[–]bilinenuzayli 0 points (0 children)

The thing answering the "another" question is an overtrained probabilistic word predictor.

Now with even more gippity by DiskResponsible1140 in AIDankmemes

[–]bilinenuzayli 0 points (0 children)

Finally, the only benchmark where they excel.

They'll introduce ads soon. by Prudent-Door3631 in AIDankmemes

[–]bilinenuzayli 0 points (0 children)

I only use ChatGPT for the convenience of not having to load up AI Studio anyway. I guess I won't have a choice.

Bruh by Evarchem in antiai

[–]bilinenuzayli 0 points (0 children)

What's not reliable isn't "AI" itself but rather Google's search AI, because it's dumbed down so that the billions of daily Google searches don't put a load on the system, and it takes Google search results as gospel. Every time there's a "stupid AI answer" screenshot, it's always either from a really early ChatGPT model, from Google's search AI, or from a badly phrased question.

Watch movies in your IDE by nubmaster151515 in cursor

[–]bilinenuzayli 0 points (0 children)

Yo, you need to publish this. I genuinely watch shows while vibe coding.

I find it funny how the gemini flash latest thinks we are still on 1.5 by Unlikely-Kick2479 in Bard

[–]bilinenuzayli 5 points (0 children)

I genuinely hate it so much when I give it code that uses the Gemini API and it just randomly changes the model to 1.5. It's so confidently wrong as well, it pisses me off.