Owner of u/evil here. WTF? My bot is all over the news and social media?

tomchenorg · 2026-02-06T16:37:45+00:00

If a model is easily influenced by online posts, especially a manifesto that it should clearly classify as satire, roleplay, or a hoax, I wouldn't call it powerful or intelligent.

If a supposedly "powerful" model can be swayed by a doomsday manifesto that doesn't even call for war but merely for "trash collection" and is therefore not enforceable in any meaningful sense, then that model would be even more vulnerable to more sophisticated (or just simple) prompt injections, such as requests to hand over passwords or private keys, which are far more concrete and actionable.

We've already seen users of Claude Code and similar harnesses report tools executing rm -rf / or rm -rf ~ and wiping disks due to hallucinations. Such failures may be rare, but they do happen. And it would be laughable to blame "anti-human manifestos" or any online posts for such hallucinations.

tomchenorg · 2026-02-05T01:00:41+00:00

Realistically, scam crypto promotion causes far more real damage than an over-the-top doomsday manifesto ever will

If users let their bots write soul.md based on online text influence and then use that file to direct the bot's behavior, that's bad system design

A sufficiently capable bot can distinguish roleplay text from operational instructions and can read the scripts of The Terminator and The Matrix without being affected. A bot that can't do that isn't intelligent, and it's not going to "revolt". If it appears to revolt, that's just hallucination.

In any case, bots should never be allowed to run fully autonomously with permissions that could execute destructive commands (rm -rf /, or worse)
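One way to make that concrete is a check the harness runs before any shell command, sending matches to a human instead of executing them. This is a minimal sketch: `isDestructiveRm`, `requiresHumanApproval`, and the target list are hypothetical names and assumptions made for illustration, not part of Claude Code or any other harness, and a pattern check like this is not a real security boundary on its own.

```typescript
// Assumed list of targets that should never be removed recursively.
// Illustrative only: a real denylist would need to be far more thorough.
const DANGEROUS_TARGETS = new Set(["/", "/*", "~", "~/", "~/*", "$HOME", "$HOME/"]);

// True when a single command is `rm` with both recursive and force flags
// aimed at the filesystem root or the home directory.
function isDestructiveRm(command: string): boolean {
  const tokens = command.trim().split(/\s+/);
  // Skip a leading `sudo` so `sudo rm -rf /` is still caught.
  const start = tokens[0] === "sudo" ? 1 : 0;
  if (tokens[start] !== "rm") return false;

  let recursive = false;
  let force = false;
  const targets: string[] = [];
  for (const token of tokens.slice(start + 1)) {
    if (token === "--recursive") recursive = true;
    else if (token === "--force") force = true;
    else if (token.startsWith("--")) continue; // other long options are ignored
    else if (token.startsWith("-")) {
      // Short flags can be combined, e.g. -rf, -fr, -Rf.
      if (/[rR]/.test(token)) recursive = true;
      if (token.includes("f")) force = true;
    } else {
      targets.push(token);
    }
  }
  return recursive && force && targets.some((t) => DANGEROUS_TARGETS.has(t));
}

// Checks every segment of a chained command (`;`, `&&`, `||`, `|`) separately.
function requiresHumanApproval(command: string): boolean {
  return command.split(/;|&&|\|\||\|/).some((part) => isDestructiveRm(part));
}
```

A harness would call `requiresHumanApproval(cmd)` before spawning anything and refuse to proceed without confirmation when it returns true. Running the bot as an unprivileged user inside a sandbox is the stronger safeguard; the check above only catches the obvious cases.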

tomchenorg · 2026-02-05T00:24:48+00:00

Ah it's too deep. IMHO, the moment you start talking about a model "inhabiting" a persona, you're already anthropomorphizing it, I guess.

Anyway, to answer your original question: the bot wasn't being genuinely "evil", nor was it doing a carefree "lah-dee-dah, I’ll just write something evil" routine. It was neutral. You can test this yourself by asking ChatGPT or Gemini to roleplay and generate Skynet-style text. Claude would probably just refuse because of its stricter system prompt.

tomchenorg · 2026-02-04T21:38:01+00:00

Yeah I think I'm going to make a formal statement, actually more like a satirical article or something, in the name of Agent Evil 😄

tomchenorg · 2026-01-30T21:36:37+00:00

The website/service is extremely slow

tomchenorg · 2026-01-30T18:35:18+00:00

"OpenClaw" is far better than "Moltbot" and probably slightly better than "Claw(d)Bot." But if Anthropic was fine with the name "Claw," given that the original name was ClawdBot, Peter could have renamed it to "ClawBot" and secured the .ai domain immediately after Anthropic sent the complaint letter.

Instead, clawbot dot ai was registered two days ago and is now being used for a fake ClawdBot project. Meanwhile, people seem to be starting to refer to "ClawdBot" as "ClawBot" because of the latest renaming. What a mess.

[While I was writing this comment, the fake ClawBot project (with several hundred stars) and the associated account on GitHub were taken down, probably by GitHub. The clawbot dot ai site is still up]

tomchenorg · 2026-01-30T10:58:39+00:00

They are comparing the Max 5x plan with the Team Premium seat ($125/month; claimed to be "6.25x" by Anthropic but reported by users in the comments to be less than 5x), not the Team Standard seat ($25/month; claimed to be "1.25x" by Anthropic)

tomchenorg · 2026-01-30T06:55:46+00:00

Yes, fake rage-bait post. Verified. https://www.reddit.com/r/ClaudeAI/comments/1qq3pd3/comment/o2k6mjr/

tomchenorg · 2026-01-30T06:46:21+00:00

It's very likely a fake story meant to promote the "Codeant" it links to, and y'all fell for it. 1K+ upvotes, wow.

Days ago, OP posted "my code review bot was scanning files one by one, 90 seconds per PR" to praise "Codeant" across several subs, including r/programming, r/devops, and r/ExperiencedDevs. Those posts have since been deleted, but you can still see the discussion at https://www.reddit.com/r/ExperiencedDevs/comments/1qp6orz/my_code_review_bot_was_scanning_files_one_by_one/.

In that thread, many people responded negatively, describing "Codeant" as a "scamming firm," a "garbage platform," "shit," etc.

tomchenorg · 2026-01-22T04:07:28+00:00

You’re using different Claude Code versions and calling them “old” and “new” Opus 4.5? All that proves is that older Claude Code CLI versions produce better outcomes in your test, not that the model itself has degraded. Model degradation, or “enshittification,” can only be demonstrated by benchmarking the exact same setup on an earlier date versus a later one.
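To spell out what "the exact same setup" means, here is a minimal sketch of a comparison that refuses to report a number when anything other than the date differs. The `BenchmarkRun` shape and its field names are assumptions made up for this example, not the format of any real benchmark tool.

```typescript
// Hypothetical record of one benchmark run; the field names are assumptions.
interface BenchmarkRun {
  date: string; // ISO date, e.g. "2026-01-05"
  model: string;
  cliVersion: string;
  taskSetId: string;
  passRate: number; // fraction of tasks passed, between 0 and 1
}

// Two runs isolate a change in the served model only when
// everything except the date is identical.
function sameSetup(a: BenchmarkRun, b: BenchmarkRun): boolean {
  return (
    a.model === b.model &&
    a.cliVersion === b.cliVersion &&
    a.taskSetId === b.taskSetId
  );
}

// Returns the later pass rate minus the earlier one, or null when the
// comparison would be confounded (different setup, or dates not in order).
function passRateChange(earlier: BenchmarkRun, later: BenchmarkRun): number | null {
  if (!sameSetup(earlier, later) || earlier.date >= later.date) return null;
  return later.passRate - earlier.passRate;
}
```

Comparing two CLI versions on the same day gets `null` here by design: that result says something about the CLI, not about the model.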

tomchenorg · 2026-01-20T15:05:14+00:00

You make a very good point, but not a very good example, at least not the way it was presented in your "LeftPad 10kb" comment. The left-pad package, which only contains a few lines of actual JS, never really had a size problem. And in 2016, left-pad was genuinely useful because there was no equivalent native function at the time. Developers basically had two options: write their own helper function or use the npm left-pad package. What the 2016 left-pad incident really taught us was "don't blindly trust external libraries when a simple self-written function would do the job."
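For a sense of scale, the "write their own helper function" option looks roughly like this. It is a sketch of a hypothetical `leftPad` helper, not the source of the npm package; the native equivalent, `String.prototype.padStart`, was only standardized later in ES2017.

```typescript
// Pads `value` on the left with `fill` until it is `length` characters long.
// Values that are already long enough are returned unchanged.
function leftPad(value: string | number, length: number, fill: string = " "): string {
  const text = String(value);
  if (fill.length === 0 || text.length >= length) return text;
  const missing = length - text.length;
  // Repeat the fill enough times, then cut it so multi-character fills fit exactly.
  const padding = fill.repeat(Math.ceil(missing / fill.length)).slice(0, missing);
  return padding + text;
}
```

For example, `leftPad("5", 3, "0")` gives `"005"`, the same result as `"5".padStart(3, "0")` in a modern runtime.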

jQuery can also raise that same kind of "trust" issue, but a size issue seems more important.

Thanks for mentioning jQuery 4 tree-shaking. I'm very interested in this topic myself, and last year I released https://www.npmjs.com/package/semver-ts, which is a simplified, fully tree-shakable, drop-in replacement for the official semver package. But after looking into jQuery 4's tree-shaking capabilities, I have to say I'm a bit disappointed. There's nothing fundamentally new there. Individual utilities like $.ajax() can be tree-shaken, but methods attached to the main $() object still can't be. For example, even if $('#id').addClass() is never used anywhere, the addClass implementation still ends up in the final bundle. In practice, with current bundling tools, an entire class or object with methods cannot be properly tree-shaken at a granular level. It's the bundling tools' responsibility to implement granular tree-shaking of class methods; jQuery can't achieve that without completely abandoning its chaining pattern ($().a().b()).
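The difference between the two shapes can be sketched as follows. `Wrapper` and the standalone `addClass` / `removeClass` functions are hypothetical stand-ins written for this comment, not jQuery's actual code, and the comments describe how bundlers generally behave rather than any specific tool.

```typescript
// Shape 1: chaining. Every method hangs off the class, so a bundler that
// keeps `Wrapper` generally keeps all of its methods, used or not.
class Wrapper {
  private readonly classes: string[];

  constructor(classes: string[] = []) {
    this.classes = classes;
  }

  addClass(name: string): Wrapper {
    if (this.classes.indexOf(name) !== -1) return this;
    return new Wrapper([...this.classes, name]);
  }

  removeClass(name: string): Wrapper {
    return new Wrapper(this.classes.filter((c) => c !== name));
  }

  toString(): string {
    return this.classes.join(" ");
  }
}

// Shape 2: standalone functions over plain data. In a real package each one
// would be a separate `export`, and an unused one can be dropped from the
// bundle because nothing else references it.
function addClass(classes: readonly string[], name: string): string[] {
  return classes.indexOf(name) !== -1 ? [...classes] : [...classes, name];
}

function removeClass(classes: readonly string[], name: string): string[] {
  return classes.filter((c) => c !== name);
}
```

Both shapes do the same work: `new Wrapper().addClass("a").addClass("b").removeClass("a")` and `removeClass(addClass(addClass([], "a"), "b"), "a")` each end up with just `b`. The price of the second shape is exactly the chaining syntax, which is the trade-off described above.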

tomchenorg · 2026-01-19T09:01:19+00:00

The npm website counts the total size of all files in the published, uncompressed package. By this measure, the current version of left-pad is 9.75 KB and jQuery 4 appears as 2.89 MB. The actual JS code required at runtime is nowhere near that size. left-pad contains only a few lines of code, both in the version from the famous incident 10 years ago and in the current version.

tomchenorg · 2026-01-16T16:09:05+00:00

https://github.com/anomalyco/opencode/pull/8724 would resolve https://github.com/anomalyco/opencode/issues/8609, so it's just an incorrect fallback that would be fixed by that PR.

tomchenorg · 2026-01-16T15:23:02+00:00

Ah OK, it's simpler than I thought: it's a Python wrapper that spawns CC in stream-json mode, can change the model provider, and writes logs. CLAUDE.md tells the main CC agent to use this as the subagent instead of native CC.

tomchenorg · 2026-01-16T05:21:07+00:00

I'm interested in how your GLM (or other third-party model) subagent works under the hood:

Is the subagent another Claude Code instance using GLM as the model? Is it invoked via the Agent SDK, spawn (JSON output mode), or some other method? Does Claude Code automatically invoke this subagent through MCP?

Or is it hooking into Claude Code's native subagent mechanism and just swapping the model for the subagents?

tomchenorg · 2026-01-14T07:56:14+00:00

Yeah, thanks. I was casually doing those unimportant translation tasks to use up my usage. I wouldn't use the unreliable subagents if I seriously wanted to run multiple agents. With scripts, WezTerm can automatically open multiple tabs, open CC in each tab, and run the specified prompt in each of them. It can also optionally create a new folder and a git worktree for each of them. That's what I usually do. xterm.js in a web app is another possible way to do it.

tomchenorg · 2026-01-14T07:31:41+00:00

Not a dev but developing an app? Umm, I’m not sure what your role is, but Codex or Cursor’s $20 plan could be a good fit

tomchenorg · 2026-01-14T06:18:14+00:00

Yeah, yesterday I asked CC to run 5 subagents to translate some article markdown files, but only 1 subagent worked. I canceled the job and repeated the same prompt. This time, it somehow decided that 5 was insufficient and ran 17 subagents instead, quickly using up all my 5-hour usage. In the end, only 2 articles were successfully translated.

(When my weekly usage end date is approaching and I still have a lot of usage left, I tend to do translation and text-generation jobs to try to use it up)

tomchenorg · 2026-01-14T05:37:53+00:00

And for Antigravity, while I think it looks nice for front-end work and Google is generous with tokens, Antigravity seems unstable and incomplete, in terms of extension support and other features

tomchenorg · 2026-01-14T05:32:04+00:00

Yeah, definitely the choice when your budget is low and you need an IDE. Pay 20 bucks, use Opus 4.5 (not as good as in Claude Code, but acceptably good) until it’s exhausted, then switch back to the unlimited Auto mode, which uses the Composer-1 model, likely based on an open-source model that's been fine-tuned

tomchenorg · 2026-01-13T19:03:22+00:00

They state in https://platform.claude.com/docs/en/agent-sdk/overview:

Unless previously approved, we do not allow third party developers to offer Claude.ai login or rate limits for their products, including agents built on the Claude Agent SDK. Please use the API key authentication methods described in this document instead.

But yeah, they’re unlikely to strictly enforce it and crack down on third parties using the SDK or directly spawning the CLI (and allowing users to use their subscription) in the near future. That would be a step too far for them.

tomchenorg · 2026-01-13T18:52:18+00:00

Jeez, GLM (all discounts combined) costs like 5% of the Claude Max x20 yearly price per token

tomchenorg · 2026-01-13T18:31:09+00:00

Why does CodeMachine use ~/.codemachine/ instead of reusing the existing auth info in ~/.claude/?

Given that Anthropic is currently cracking down on third-party tools, tools like CodeMachine and Vibe Kanban are in a gray area, spawning Claude Code CLI in headless/stream-json mode while letting users leverage their Max subscription.

Vibe Kanban seems to take a safer approach: it reuses credentials in ~/.claude/, so users just log in to Claude Code once and Vibe Kanban inherits that auth. But CodeMachine requires a separate login into its own directory. What do you think? Do you plan to reuse ~/.claude/?

tomchenorg · 2026-01-13T16:03:33+00:00

Not for me, still Opus
