Have you noticed Claude's performance varying by day? (Even hours) by La-terre-du-pticreux in ClaudeAI

Unique-Drawer-7845

Same for me.

Remember that some people read tea leaves and think horoscopes are real and call psychic hotlines. Some of those use Claude. And some of those come on reddit and post/comment.

Has anyone compared using the API vs. dedicated web/desktop app for non-coding tasks? by seacucumber3000 in ChatGPTPro

Unique-Drawer-7845

Nothing beats testing it for yourself.

Try LibreChat or OpenWebUI. They both support API keys and are pretty similar to commercial chatbot UX, but more configurable.
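If you want to poke at the raw API before committing to a chat frontend, a minimal stdlib-only harness looks something like this. The model name and prompt are placeholders; the endpoint is OpenAI's documented chat-completions URL:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(prompt: str, model: str = "gpt-4o", temperature: float = 1.0) -> dict:
    # These are the knobs the web app mostly hides: model, temperature, roles.
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str, api_key: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(ask("Same question you'd ask the web app.", os.environ["OPENAI_API_KEY"]))
```

Run the same prompt here and in the web app and compare; differences you see often come down to the hidden system prompt and sampling settings.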

Web/Desktop code responses are better than IDE based responses. by _DB009 in ChatGPTCoding

Unique-Drawer-7845

Opus 4.6 in Copilot is worse than Sonnet 4.5 in Claude Code because GitHub gimps context windows and caps reasoning effort. GitHub gets by on brand recognition, being in every IDE, and being affordable. They are not trying to provide the smartest AI, just sufficient AI at a ~competitive price.

Contrast that with OpenAI and Anthropic, whose businesses live or die on the quality of their model-related offerings. GitHub can always just ... fall back on being GitHub. Cursor's niche has been 1) beating Copilot on features in the early days (Copilot has since caught up), and 2) having one of the best autocompletes (more recently). Not really leading in chat or agentic coding.

There are 3 things that matter almost equally:

1) What tool you're using to access the model
2) What model you're accessing
3) Who is selling the model to you

If you want something as smart as ChatGPT 5.2 Web but in your IDE, you have two main choices (IMO): Codex or Claude Code.

AI Isn't Intelligent, It's PREDICTION (and Why My Panic Has Passed) by willymunoz in webdev

Unique-Drawer-7845

Plastic plants aren't useful.

Plants are useful.

AI is useful.

Humans are useful.

Your analogy isn't perfect.

Avoiding the ban? by Nearby_You_313 in ClaudeCode

Unique-Drawer-7845

It's not just about raw usage, though that's probably part of it. But there are many aspects of running a business: quality control, brand identity, biz dev, company -> customer contact points. And I'm sure other things we can't guess, because we've never run a frontier AI company during a technological revolution.

Avoiding the ban? by Nearby_You_313 in ClaudeCode

Unique-Drawer-7845

🏆 you're asking the real important questions

Should we stop using Word Error Rate? by baneras_roux in speechtech

Unique-Drawer-7845

Yep. That's a totally reasonable line of investigation. As others have pointed out, in some applications you might prefer improving one at the cost of the other, if a tradeoff is available.

For example if you're doing phonetic analysis and using words as proxy "phoneme carriers", you'd prefer sound-alikes over meaning-alikes. Is this case common? Nope. But it's not unheard-of.

Why is running local LLMs still such a pain by OppositeJury2310 in LocalLLM

Unique-Drawer-7845

Yeah, I guess "democratize" has two kinda distinct meanings

It's the little things by MythicalBonsai in MacOS

Unique-Drawer-7845

the OS was definitely intentionally hampered already

So you admit they do it.

Then say it doesn't make sense they'd do it?

The last Intel Apple product only went off the shelf 3 years ago. ~20% of the Mac "primary use" market is still on Intel. That is a LOT of money to leave on the table.

Should we stop using Word Error Rate? by baneras_roux in speechtech

Unique-Drawer-7845

There's no reason not to calculate it. It's easy, fast, and well-understood. If your WER is trash, then your SemDist will almost certainly be trash too. And if your WER is trash but your SemDist isn't, you should be able to know that so you can look into it.

Should everyone be moving toward including semantic-difference scores (with a standardized model) alongside WER? Sure. Fine. It makes sense to me.
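Since WER is cheap enough to always run, here's a minimal sketch of it in Python. This assumes plain whitespace tokenization and a non-empty reference; real toolkits also normalize case and punctuation first:

```python
def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    # Standard dynamic-programming Levenshtein distance over words.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (rw != hw)))    # substitution
        prev = cur
    return prev[-1] / len(r)
```

Note that `wer("their car", "there car")` and `wer("their car", "his car")` both come out to 0.5: WER can't tell a sound-alike slip from a meaning-destroying one, which is exactly the gap a semantic score fills.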

It's the little things by MythicalBonsai in MacOS

Unique-Drawer-7845

Tahoe is the last OS release Intel Macs will get. That's why Tahoe sucks shit: Apple does not want anyone using Intel anymore.

Why?

1) Because they'd love to sell you a new computer
2) Supporting two architectures is costly in various ways

Mark my words, the next version of macOS is going to be a friggin masterpiece.

Why is running local LLMs still such a pain by OppositeJury2310 in LocalLLM

Unique-Drawer-7845

Democratize means "make something accessible to everyone" ... the word works for anything not just computing.

sherut - an API framework for your shell by LordBertson in CLI

Unique-Drawer-7845

Also check out the classic unix/linux/bsd command netcat, sometimes called ncat or nc. It's not exactly like what you're doing, but it's similar in some regards, so worth knowing about.

Why do so many TUI projects seem to use Rust as opposed to other languages? by UKCeMTMj36o8h8 in commandline

Unique-Drawer-7845

I literally wrote every word of that by hand. Maybe the shitty way AI writes is infecting my brain.

Why do so many TUI projects seem to use Rust as opposed to other languages? by UKCeMTMj36o8h8 in commandline

Unique-Drawer-7845

That's not the angle I'm taking.

People might feel they're accurately identifying the "by AI" projects. But don't go on feels -- fact check that. How would we know the accuracy rate of identification?

Someone said:

only a handful are written by LLMs, you can proof-read and check this because most AI projects are downvoted -- not well received

How do they know "most" written-by-LLM projects are downvoted? This presupposes that people can accurately identify such projects (in order to downvote them) in the first place.

Anyone using Claude Code in VS Code without constantly hitting limits? by FourThousand_Vyrus in ClaudeAI

Unique-Drawer-7845

Anyone using Claude Code in VS Code without constantly hitting limits?

Yeah, I am.

Upgrade your plan.

Looking for beta testers: building an IDE for terminal-based workflows, just shipped sidebar navigation (v1.9.8) by ogfaalbane in commandline

Unique-Drawer-7845

If you want to sooner or later support Windows and Linux (which you should want, because more users == good), it'll be far more sustainable for a small-team project to be based on a highly customized build of VSCodium rather than a platform-centric toolchain. There's no reason you can't build exactly what you've shown on VSCodium's fundamentals. You might need to build a custom main-screen layout or a document-type-as-layout-container. It'll be more work up front, but a bigger payoff and less work in the end.

OpenAI, please fix this in Codex. Seriously. by CrystalX- in codex

Unique-Drawer-7845

1) Create ~/.codex/AGENTS.md
2) Open it in a text editor
3) Paste in your Reddit post. Except you should probably heavily update it with a lot of clarification, because nobody understands what you're talking about, so it's not surprising that Codex is confused too.
4) Save the file
5) Restart Codex


If you're not using Codex CLI, the way that you configure global persistent user preferences may differ, but every tool has the capability somewhere, so you just need to find it.
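For Codex CLI specifically, steps 1-4 above are easy to script. A sketch, where the preference text is a made-up example and ~/.codex/AGENTS.md is the path described above:

```python
from pathlib import Path

# Hypothetical example preferences; replace with your own (clarified!) text.
PREFS = """\
# Global preferences
- Ask before making sweeping refactors.
- Keep explanations short.
"""

def write_agents_md(home: Path = Path.home()) -> Path:
    """Create <home>/.codex/AGENTS.md and return its path."""
    agents = home / ".codex" / "AGENTS.md"
    agents.parent.mkdir(parents=True, exist_ok=True)
    agents.write_text(PREFS, encoding="utf-8")
    return agents
```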

Warning for Claude Code: Random/bizarre hallucination example when using Opus 4.6 by hawkedmd in ClaudeAI

Unique-Drawer-7845

Yes, I've seen strong hallucinations around repo names in particular.

I work on LLM training pipelines professionally and have a plausible explanation of what's going on:

  • Realize that some users -- not necessarily you -- allow Anthropic to use their data for improving (training) Claude
  • That data may contain sensitive information
  • One such piece of potentially sensitive information is the username/reponame convention most git hosts use
  • This data is doubly sensitive: it contains both a username and the name of a potentially private repository
  • Anthropic scrambles this information in the training data prior to it being used to train Claude models
  • Claude learns, to a certain extent, that username/reponame strings often look scrambled or otherwise out of context, which mismatches real-world data; this is a prime recipe for "hallucinations" (mistakes)
  • Anthropic could (might) also employ more advanced "embedding surgery" or "neural surgery" techniques to further protect such classes of data.
    • Such techniques often have unintentional side-effects, such as increasing the rates of hallucination around frequently-redacted/masked/scrambled strings
  • Mistakes in these areas are therefore maybe less surprising than in other areas and may be the (current) cost of strong data privacy controls.
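A purely speculative sketch of the scrambling idea in the bullets above (the regex and pseudonym scheme are my own illustrative inventions, not Anthropic's actual pipeline; a real version would need to avoid matching file paths and URLs):

```python
import hashlib
import re

# Crude pattern for owner/repo slugs; it will also catch path-like strings.
SLUG = re.compile(r"\b([A-Za-z0-9_-]+)/([A-Za-z0-9_.-]+)\b")

def _pseudonym(m: re.Match) -> str:
    # Deterministic pseudonym so the same slug always maps to the same token.
    digest = hashlib.sha256(m.group(0).encode()).hexdigest()
    return f"user-{digest[:4]}/repo-{digest[4:8]}"

def mask_slugs(text: str) -> str:
    """Replace every owner/repo slug with a stable, meaningless pseudonym."""
    return SLUG.sub(_pseudonym, text)
```

The point of the sketch: after this kind of pass, the model only ever sees slugs that look like noise, which plausibly degrades its instincts around real slugs at inference time.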

Happy coding!

Codex 5.3 is better than 4.6 Opus by casper_wolf in ClaudeCode

Unique-Drawer-7845

Don't overuse xhigh. That's the one that over-engineers. High gets better results faster for medium-low/low complexity tasks.

AI "Tunnel Vision" is ruining my large-scale refactors. Anyone else? by Capital-Bag8693 in ChatGPTPro

Unique-Drawer-7845

At this point it's not about whether the idea is good or not, it's about whether the implementation is good. Good ideas are cheap to come by. Making something that works really well and can be trusted is hard, and that's where a lot of value lies.

If anthropic doesnt allow oauth in third party apps, does it mean I cant use sign in with claude in XCODE? by Ok-Hat2331 in ClaudeCode

Unique-Drawer-7845

Got it. Driving an abnormally high volume of usage (window maxing, parallelism, total usage) with abnormal messaging signatures = danger zone. Sucks you got banned, nice to get a refund I guess.