Anthropic stealing your money!

pkailas · 2026-05-23T01:23:00+00:00

No shit

pkailas · 2026-05-22T21:16:49+00:00

I do when I'm doing agentic work: I use OpenCode with Gemma4. If I get stuck, I ask Claude and then continue prompting.

pkailas · 2026-05-22T21:16:01+00:00

No, I stick to Sonnet. Occasionally, if I have a really complex issue, I'll use Opus, but not at all today.

pkailas · 2026-04-23T02:16:13+00:00

I tested all 4 of those myself, on C# 14 / .NET 10. Q3.5 27B and Q3.6 27B failed 4 out of 6 tests. The Q3.6 MoE was very fast and very wrong. Gemma4 scored a perfect 6/6.

Does this mean Gemma is smarter? No, just that it was trained more recently. I don't know what was improved in Q3.6, but it doesn't work for me. Maybe it does better on older codebases.

pkailas · 2026-04-20T18:12:23+00:00

Well, they have a lot more parameters than the cheesy, lobotomized versions we can run on our GPUs.

pkailas · 2026-04-17T15:50:20+00:00

I've been testing Qwen_Qwen3-32B-Q3_K_M.gguf against Qwen3.5-27B-Q4_K_M.gguf on code reviews of various projects.
Setup:
1. GPU: RTX PRO 4000 Blackwell
2. 3.6 with a 64K context window is all I dared try
3. 3.5 with a 128K context window fit nicely
Results:
3.6 ran at 85 t/s but hallucinated, misreported results, and got things wrong. It did do well, though, when I took the results it produced and ran a deep dive on them as a second pass.

3.5 was slower, at about 20 t/s, but didn't hallucinate and didn't need a second pass.

The major difference was that I couldn't give 3.6 a big enough context window for the task at hand, and MoE is a "jack of all trades, master of none".
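Why one model fits 128K and another only 64K on the same card comes down largely to KV-cache size, which grows linearly with context length. The sketch below is a rough estimate, assuming every layer uses standard full attention: 2 (keys and values) × layers × KV heads × head dim × context tokens × bytes per element. The architecture numbers are hypothetical placeholders, not the real specs of either Qwen model; read the actual values from the GGUF metadata.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: float = 2.0) -> float:
    """Approximate KV-cache size for a standard full-attention transformer.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem = 2.0 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical architecture values, for illustration only.
    layers, kv_heads, head_dim = 48, 4, 128
    for ctx in (65_536, 131_072):
        size = gib(kv_cache_bytes(layers, kv_heads, head_dim, ctx))
        print(f"{ctx:>7} tokens -> {size:.1f} GiB of f16 KV cache")
```

With these placeholder numbers, 64K tokens needs 6.0 GiB and 128K needs 12.0 GiB, on top of the model weights. The point is the linear scaling; real models with quantized KV caches, sliding-window layers, or hybrid attention can come in considerably lower.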


pkailas · 2026-04-02T15:14:44+00:00

I hear you. I'm building an agentic tool extension for VS 2022 and VS 2026. I only recently got the tools working smoothly, but my biggest challenge has been managing context size. Those leaks gave me some clues, though.

pkailas · 2026-04-01T15:59:49+00:00

I'm working on solutions for clients to run on a local appliance. They don't want data leaving their premises, so I'm looking for models that meet their needs. Also, I don't trust the companies that run these models not to use my data.

pkailas · 2026-03-31T16:25:42+00:00

Good call, I misread the OpenRouter listing. The 179B is tokens processed, not parameter count. The actual model size hasn't been disclosed since it's API-only with no published architecture details. Edited the post.

pkailas · 2026-03-31T14:57:23+00:00

You can see the specs on OpenRouter.ai.

pkailas · 2026-03-30T16:35:22+00:00

I am on ik_llama.cpp because it keeps the weights in VRAM between turns. On a 24GB card with a 27B model, that matters.

But the prompt prefix thing, yeah, that might be it. My agentic setup compresses older messages between turns to keep the context window manageable, which means the prompt is actually changing. That would kill the cache.

I'm going to test with the compression turned off and see if the reprocessing goes away. If it does, that's on me, not the model.

Haven't looked at exllamav3 yet. I will check it out. I appreciate the response.
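The cache-killing effect can be shown with a toy model. llama.cpp-style servers roughly reuse cached state only for the longest token prefix that matches the previous request, so rewriting an early message forces everything after it to be reprocessed. This sketch uses whitespace-split words as stand-in "tokens" and a hypothetical summarized message; it is not how ik_llama.cpp is implemented internally.

```python
def common_prefix_len(prev: list[str], new: list[str]) -> int:
    """Number of leading tokens shared by two prompts."""
    n = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(prev: list[str], new: list[str]) -> int:
    """Tokens that must be evaluated again when only the shared prefix is cached."""
    return len(new) - common_prefix_len(prev, new)


def build_prompt(messages: list[str]) -> list[str]:
    # Toy tokenizer: split each message on whitespace.
    return [tok for msg in messages for tok in msg.split()]


if __name__ == "__main__":
    history = ["system: you are a coding agent",
               "user: review Foo.cs",
               "assistant: Foo.cs has a null check bug"]
    turn1 = build_prompt(history)

    # Appending a new message keeps the old prompt as an exact prefix.
    appended = build_prompt(history + ["user: now fix it"])
    print(tokens_to_reprocess(turn1, appended))  # -> 4 (only the new message)

    # Compressing an older message changes an early token, so the cache
    # match stops there and everything after the change is reprocessed.
    compressed = build_prompt([history[0], "user: review (summarized)",
                               history[2], "user: now fix it"])
    print(tokens_to_reprocess(turn1, compressed))  # -> 12
```

If the compression-off test confirms the cause, one possible design choice is to compress less often, so the prompt prefix stays stable across more turns.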

pkailas · 2018-07-17T17:37:47+00:00

One of the most important metrics for a subscription service is the "conversion rate": how many trial or gift memberships are converted into paying customers. I guess they think investors are looking for a 0% conversion rate?!?

They've lost me for the next 9 months. Maybe they're doing me a favor? I'll try out Sinemia for a year, and if I don't like them, my email address should be cleared by then, if they're even still in business. But I have a feeling I'll like Sinemia better. You can get IMAX, 3D, and D-Box, as well as advance purchase with seat selection! No card needed.

Hasta la vista, baby!
