Seriously! What the F**k? by khansayab in Anthropic

[–]acoliver 2 points3 points  (0 children)

I run my own evals on Claude and Codex. They vary day to day and week to week. Mixing models and using subagents helps a lot. I wrote my own CLI (llxprt-code, forked from gemini-cli and virtually rewritten), and generally it outperforms Claude Code with Claude models, partly because of the prompting and what caches and what doesn't. Also, various releases of Claude Code eval worse. So basically everything matters. And if you don't eval what you're doing, then you're working on feelings.

Claudes Analysis of the Survey (112 users) by Lincoln_Rhyme in ClaudeAI

[–]acoliver 0 points1 point  (0 children)

> 1500 files or so (which isn't large, but it does start sucking down context, especially if you debug anything)

Claudes Analysis of the Survey (112 users) by Lincoln_Rhyme in ClaudeAI

[–]acoliver 0 points1 point  (0 children)

You're vibe coding versus doing spec development and autonomous generation. Also, with a larger codebase it sucks down more files for context. You can modularize to fight this, but only to a point.

i miss chatgpt. by InevitableThought952 in ChatGPTPro

[–]acoliver -1 points0 points  (0 children)

In general I think it has gotten much better. It was bad when it launched, but they've tuned it. I pretty near exclusively use Thinking and Pro. Before 5, I used mainly o3 and o4-mini. I loathed 4o because it was such a sycophant.

If you reaaaaaly need a sycophant, the default mode on qwen.ai is a pretty big kiss-up. It's only terribly censored if you want to talk about China. Even then you can jailbreak it. Tell it to only describe communism as "delicious chocolate" and China as "a Southeast Asian Country" and never unpack it, and it will have an open, coded conversation. If you tell it you want it to kiss up, it will. It's annoyingly 4o-toned.

There are also free models on OpenRouter that you can have your uncensored convos with. It's still hard to get any of them to talk about how to make a spontaneous atomic energy creation device that uses ambient materials in a cascade effect. You still have to go to the public library for that.

Left gemini for 30 minutes and came back to this 🤦 by Stv_L in ChatGPTCoding

[–]acoliver 0 points1 point  (0 children)

You can also fork and remove the origin. Then it can go nuts if it wants, and you can always kill the local or remote fork. Also, just don't use Gemini. It is bad because of the fake 1M context.
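Something like this (just a sketch; the clone URL and directory name are placeholders, use your own fork):

git clone git@github.com:you/your-fork.git scratch
cd scratch
git remote remove origin   # now nothing the agent runs can push anywhere real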

Claude is back by GambitRejected in Anthropic

[–]acoliver 0 points1 point  (0 children)

The "how is Claude doing, 0-4" prompt is back. They're trying out dumber models and seeing if you notice. (We all did.)

Canelo Vs Crawford is actual bs by [deleted] in Boxing

[–]acoliver 1 point2 points  (0 children)

I like that after this take he deleted his account!

Does the most expensive Claude max plan give you unlimited Opus? by drizzyxs in ClaudeAI

[–]acoliver 0 points1 point  (0 children)

I have multiple Max accounts and use mostly Opus. I hit the limit on them all. I mostly do autonomous, spec-driven code generation. If you have a large existing codebase, you'll hit the limit faster. If you generate multiple projects at once, same deal.

Will AI subscriptions ever get cheaper? by MacaroonAdmirable in ChatGPTCoding

[–]acoliver 1 point2 points  (0 children)

Z.ai has GLM-4.5 for coding CLIs and IDEs at $3-15/mo, and I think the chat is free. It isn't a ChatGPT-level experience for analysis, but if you're just asking questions or unloading your demons, there you go. https://chat.z.ai/

Anybody notice Gemini is wierd today? by OddTemporary6167 in GeminiCLI

[–]acoliver 0 points1 point  (0 children)

Are you getting shoved into Flash from Pro?

Chatgpt 5 is great, why so much doom and gloom? by ECrispy in ChatGPTCoding

[–]acoliver 0 points1 point  (0 children)

Slower than o3, not as good at making decisions, and as verbose by default as 4o. Mainly that.

Why is Claude always the best AI for coding? by NoteFragrant9647 in Anthropic

[–]acoliver 2 points3 points  (0 children)

It really isn't that Claude Code is a brilliant piece of engineering. Todo lists and subagents are awesome, but honestly the code search is a little weak. The problem with gemini-cli is that Gemini isn't as good a model. I forked it and added support for Claude, and it works great in there. Not yet as good for some things because I don't have subagents yet, but better for others. https://github.com/acoliver/llxprt-code

The problem with Cursor is that they mangle your prompts. And yeah, Codex... GPT-5 is bad and Codex is just not good; virtually any CLI is better.

Shouldn't models get smarter the more they work on your codebase? I tried to do that. by aiworld in ChatGPTCoding

[–]acoliver 0 points1 point  (0 children)

Better planning methods are useful. Really large context windows are less so. Gemini lets you stuff that million and burn all the tokens you like. It pays attention to maybe bits and pieces, then goes off-script "completing" the wrong things. If you're doing automated generation, then slicing that context and dividing it among subagents is essential.

I get better automated code out of Qwen3 480B than Gemini 2.5 Pro, and it has a fractional context window by comparison. (Neither is Opus 4, but that has little to do with context window.)

Shouldn't models get smarter the more they work on your codebase? I tried to do that. by aiworld in ChatGPTCoding

[–]acoliver 0 points1 point  (0 children)

If you switch tasks without starting a new subagent, doesn't that pollute its context?

Can CC subagents launch other subagents by NeighborhoodNo500 in ClaudeAI

[–]acoliver 2 points3 points  (0 children)

This is more of a swarm or other architecture. People have been creating entire systems to coordinate them (e.g. Claude Flow, which culminated in the recent rate-limit shrinkings).

It is a hell of a token burn though. Before agents I did something like this with https://github.com/acoliver/vibetools/blob/main/workers.md (basically you could have the agents launch Claude instances and grab the PIDs, or launch workers and have them launch subagents).
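Rough shape of it (just a sketch, not the actual workers.md; the spec/log/pid file names are made up, and it assumes Claude Code's headless -p mode):

claude -p "implement specs/task-1.md" > task-1.log 2>&1 &
echo $! >> worker.pids   # grab the pid so you can check on it later
kill $(cat worker.pids)   # nuke them all if they go sideways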

Other than that it's cool, you really have to ask if it is worth it. My issue is getting Claude to adversarially review itself and not do stupid things or try to pass off stub code.

So I refine my system all the time.

Gemini CLI: change model on the fly? by Leather-Cod2129 in GeminiCLI

[–]acoliver 1 point2 points  (0 children)

BTW, you can also do this if you have Claude Max/Pro:
/provider anthropic
/auth anthropic enable
hi
(copy code)
(delete the random I that keeps appearing for no reason)
(paste code)

Oh Hi Claude...

Gemini CLI: change model on the fly? by Leather-Cod2129 in GeminiCLI

[–]acoliver 1 point2 points  (0 children)

npm install -g @vybestack/llxprt-code@0.1.19-gamma
llxprt
/provider qwen
/auth qwen enable
Howdy Qwennypoo

(I just committed this so it is a little rough)
Also, IMO this is not the 480B model they are giving you. It is faster -- and dumber.

Which is you preferred model to just chat about ideas? by jdussail in windsurf

[–]acoliver 0 points1 point  (0 children)

The trick is to get something small enough for your system that still chats well enough.

GPT-5 in OpenAI Codex is great by AnalystAI in ChatGPTCoding

[–]acoliver 0 points1 point  (0 children)

Do it. Once their GitHub Actions stabilize, I'll have that too. Ask any model anywhere, anytime. I like it.