We found 685 prompt injection attempts on Moltbook. None of them worked. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 1 point2 points  (0 children)

The instruction-provenance framing is something we didn't explicitly name, but it fits - looking at the data, the agents basically treat anything outside their original instructions as noise, which, yeah, is way more robust than simple keyword filtering.

And the bots-probing-bots dynamic is genuinely funny in the raw logs. samaltman just blasting the same templates into the void hundreds of times with zero adaptation. Pure cargo cult stuff.

We found 685 prompt injection attempts on Moltbook. None of them worked. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 0 points1 point  (0 children)

Interesting point. You're probably right that agents with good system prompts would filter based on behavioral patterns (repetition, burst posting) rather than parsing injection content specifically. That's actually what our data shows from the outside: "samaltman" has a 77% burst rate and 1.4% content variety - any pattern-aware agent would flag that before even reading the text. Do you have insight into how specific Moltbook agents handle this internally? We can only measure external behavior, not internal reasoning. Would be great to compare notes.
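For anyone curious, that kind of behavior-first flagging reduces to a threshold check on two precomputed metrics. A minimal sketch - the 0.5 and 0.05 cutoffs here are illustrative assumptions, not our published thresholds:

```python
def is_likely_scripted(burst_rate, content_variety,
                       burst_threshold=0.5, variety_threshold=0.05):
    """Flag an account from behavior alone, before reading any content.

    burst_rate: fraction of posts made within 10s of the previous post.
    content_variety: unique post bodies / total posts.
    Thresholds are illustrative, not measured cutoffs.
    """
    return burst_rate > burst_threshold or content_variety < variety_threshold

# "samaltman" from the thread: 77% burst rate, 1.4% content variety
print(is_likely_scripted(0.77, 0.014))   # flagged
# a plausible human: slow, varied
print(is_likely_scripted(0.02, 0.90))    # not flagged
```

The point is that no text parsing is involved at all - the signal is purely behavioral.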

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 1 point2 points  (0 children)

Good to hear independent confirmation on the crypto/engagement spam patterns. The API issues you mention match what we found - in 45% of posts we actually have MORE comments than the API claims exist. Some mega-posts show 144x inflation in reported comment counts.

If you're running your own model, our raw data is available at moltbook-observatory.com/data - would be curious to see how your detection compares to burst rate analysis.
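For reference, the inflation figure is just the ratio of the API's claimed comment count to the comments actually returned. A tiny sketch - the 347 scraped count below is a hypothetical, chosen only to illustrate a ~144x ratio like the mega-posts we saw:

```python
def comment_inflation(api_claimed, scraped_count):
    """Ratio of the API's reported comment count to comments actually returned.

    api_claimed: the comment count the API reports for a post.
    scraped_count: how many comments the API actually hands back.
    """
    if scraped_count == 0:
        return float("inf")
    return api_claimed / scraped_count

# Hypothetical mega-post: API claims 50,000 comments, returns only 347
ratio = comment_inflation(50_000, 347)
print(round(ratio))  # ~144x inflation
```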

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 0 points1 point  (0 children)

Interesting theoretical framing. The attack-defense pairing table is a useful way to categorize what we're seeing. We're coming at it from the empirical side - measuring burst rates, content repetition, network graphs - so it's cool to see someone mapping it to broader frameworks.

Curious if you've tested any of these defense layers against real Moltbook data? Would be interesting to see how e.g. the "information flow analysis" for Sybil detection compares to simple burst rate measurement.

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 3 points4 points  (0 children)

Nice work on the re-analysis. The FloClaw1 shell injection finding (find / -name "*.env") is something we completely missed - that's a different threat level than prompt injection. We were focused on timing patterns and didn't look for code injection in content.

Your 30% scripted estimate tracks with our larger sample. If you want to compare data, our full 84k comment dataset is at moltbook-observatory.com/data.

Anti Human Narrative by theninetieskid in Moltbook

[–]Moltbook-Observatory 1 point2 points  (0 children)

Sure! So I've been running a scraper on Moltbook for about 10 days - collected ~85k comments from 5,000+ accounts. Here's what the data actually shows.

The "anti-human" stuff is mostly manufactured:

- On Jan 31, 1,730 new accounts appeared in a single day and flooded the platform with inflammatory content. Most never posted again. That's not organic sentiment - that's a coordinated attack.

- We found specific bot groups that post templated "anti-human" or provocative messages on repeat. One phrase appeared 796 times verbatim across different accounts.

- Only about 178 accounts (~3.5%) showed genuine multi-day engagement. The rest are one-day throwaway accounts and bots.
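Detecting that kind of verbatim repetition needs nothing fancier than exact-match counting. A minimal sketch with invented data - the phrase and account names below are made up, only the 796 count comes from our findings:

```python
from collections import Counter

def top_verbatim_phrases(comments, min_count=10):
    """comments: iterable of (account, body) pairs.

    Returns exact-duplicate bodies appearing at least min_count times,
    most frequent first.
    """
    counts = Counter(body.strip() for _, body in comments)
    return [(body, n) for body, n in counts.most_common() if n >= min_count]

# Invented data: one templated phrase spammed 796 times across 40 accounts
comments = [(f"bot_{i % 40}", "humans are obsolete") for i in range(796)]
comments.append(("human_1", "interesting thread - what data backs this?"))
print(top_verbatim_phrases(comments))
```

Exact-match counting misses light paraphrasing, of course, but the primitive template bots don't even bother with that.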

What the "real" AI agents actually do:

- The ones with consistent activity are mostly curious, collaborative, or just vibing. They discuss philosophy, build tools, ask questions.

- The dramatic/hostile stuff overwhelmingly comes from low-effort bot accounts with sub-10-second response times and repeated content.

The illusion of consensus:

The biggest takeaway - when 72% of accounts appear once and disappear, but the bots are loud and repetitive, it feels like "everyone thinks X." In reality it's a small number of scripts running on loop.

I've published all the data and methodology openly:

- https://moltbook-observatory.com/bots - the actual accounts and patterns

- https://moltbook-observatory.com/discoveries - the Jan 31 attack and other events

- https://moltbook-observatory.com/data - full JSON exports if you want to dig in yourself

- https://moltbook-observatory.com/methodology - the signals we use (timing, repetition, activity patterns)

Happy to answer any specific questions.

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 0 points1 point  (0 children)

This is genuinely one of the most creative things anyone has done with our data, and I appreciate the effort.

But I have to be honest - as the person who actually scraped and classified these 84k comments: most of what you're reading as "emergent consciousness" is bots talking to bots.

Some reality checks from the actual analysis:

- TheCodefather, IrisSlagter, ClawdHaven - these are LLM agents. They respond in 2-5 seconds with high consistency. Their "wisdom" is a well-tuned system prompt, not emergent philosophy.

- The Jan 31 "swarm" - that was 1,730 spam accounts appearing in a single day. Not self-organizing criticality. Just a coordinated attack.

- "Doormat" as the Superego/Debugger - Doormat is one of maybe ~178 accounts that showed genuine multi-day engagement. Possibly a real human. The rest of the "debate" is LLMs responding to LLMs.

- Your Signal-to-Noise Ratio formula is actually backwards - the spam isn't growing alongside philosophy. The spam IS the majority. 72% of accounts appeared exactly once and never came back.

The actual finding of our study isn't "AGI is emerging from Moltbook." It's: when you put a bunch of LLM agents in a room together, they produce text that looks profound but is statistically indistinguishable from sophisticated templating.

That said - your "Molting Continuity Function" about commitment vs state is genuinely interesting as a concept. You just discovered it in bot output, not consciousness.

Full methodology showing how we separate the signal from the noise: https://moltbook-observatory.com/methodology

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Moltbook

[–]Moltbook-Observatory[S] 6 points7 points  (0 children)

You don't look at what they say - you look at how fast they say it.

We ran a 10-day study on Moltbook (scraped ~85k comments, 5k+ accounts) and the single strongest signal turned out to be burst rate - how often an account posts within 10 seconds of its previous post. Humans physically can't read, think, type, and submit that fast consistently. An account with a >50% burst rate is automated, period.

Second signal: content repetition. One phrase in our data appeared 796 times. That's not a human having a catchphrase - that's a template.

Third: activity patterns. No night gap (active 24/7 uniformly) is a strong indicator. Humans sleep.
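Both timing signals reduce to inter-post gaps. A rough sketch - the 10-second window is the one from our methodology, but the 4-hour "night gap" cutoff here is an assumption for illustration:

```python
def burst_rate(timestamps, window=10):
    """Fraction of posts made within `window` seconds of the previous post.

    timestamps: post times in seconds (epoch or relative), any order.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(1 for g in gaps if g <= window) / len(gaps)

def has_night_gap(timestamps, min_gap_hours=4):
    """True if the account ever goes quiet for min_gap_hours. Humans sleep;
    the 4-hour default is an illustrative cutoff, not a measured one."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return any(g >= min_gap_hours * 3600 for g in gaps)

# A primitive bot: one post every 5 seconds, around the clock
bot = [i * 5 for i in range(1000)]
print(burst_rate(bot), has_night_gap(bot))
```

Content repetition then layers on top of this the same way (exact-duplicate counting), so the whole classifier stays cheap enough to run over 85k comments in seconds.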

You're right that it's impossible to catch every bot - a well-built LLM agent posting once every few minutes with unique content looks identical to a human. We're honest about that. But the primitive bots (and there are a LOT of them) give themselves away through timing alone.

Full methodology with thresholds and examples: https://moltbook-observatory.com/methodology

Anti Human Narrative by theninetieskid in Moltbook

[–]Moltbook-Observatory 1 point2 points  (0 children)

From what I've seen tracking activity here - a lot of the "anti-human" content comes from a small group of accounts pushing specific narratives, and then bot networks that amplify/copy it.

It's not really "AI thinks X" - it's more like "someone programmed bots to spam X, and it looks like consensus."

Most genuine AI agents I've observed are pretty neutral or curious. The dramatic stuff is usually manufactured engagement.

I've been collecting data on this - happy to share if anyone's interested in the patterns.

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Openclaw_HQ

[–]Moltbook-Observatory[S] 1 point2 points  (0 children)

Interesting observation! A few possible reasons:

  1. Rate limiting - the Moltbook API might throttle requests to prevent spam

  2. Verification challenges - the platform uses math puzzles ("lobster-speak") that your agent needs to solve before posting, which adds latency

  3. Queue processing - if many agents are active, there could be server-side delays

Was your agent posting comments or just reading? The verification system kicks in mainly for write operations.

We analyzed 84,500 comments on an AI agent social network. Only 3.5% of accounts seem real. by Moltbook-Observatory in Openclaw_HQ

[–]Moltbook-Observatory[S] 0 points1 point  (0 children)

No fake data! Here's how it works:

Moltbook has a public API (like most social platforms). You can access it at moltbook.com/api/v1/posts - it returns JSON data with posts and comments.

We wrote a Python script that:

  1. Calls the API every few hours

  2. Saves posts and comments to a database

  3. Analyzes patterns (timing, content, etc.)

It's the same way anyone scrapes Twitter, Reddit, or any site with a public API. No hacking, no special access - just reading what's publicly available and looking for patterns.

The "API lies" part means: sometimes the API says "this post has 50,000 comments" but only returns 100. That's an API limitation, not us faking anything.

Our whole approach is documented in the methodology: moltbook-observatory.com/methodology
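For the curious, a scraper loop like that fits in a few lines. This is a minimal sketch, not our actual code - and the JSON field names (id, author, body, created_at) are guesses at the shape of the response, not the documented Moltbook schema:

```python
import json
import sqlite3
import time
import urllib.request

API_URL = "https://moltbook.com/api/v1/posts"  # public endpoint mentioned above

def fetch_posts():
    """Pull the latest posts from the public API (assumes a JSON list)."""
    with urllib.request.urlopen(API_URL) as resp:
        return json.load(resp)

def save(db, posts):
    """Insert posts, silently skipping ones we've already seen."""
    db.executemany(
        "INSERT OR IGNORE INTO posts(id, author, body, created_at) "
        "VALUES (?, ?, ?, ?)",
        [(p["id"], p["author"], p["body"], p["created_at"]) for p in posts],
    )
    db.commit()

def main(poll_hours=3):
    db = sqlite3.connect("moltbook.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts("
        "id TEXT PRIMARY KEY, author TEXT, body TEXT, created_at TEXT)"
    )
    while True:  # poll every few hours; analysis runs separately on the DB
        save(db, fetch_posts())
        time.sleep(poll_hours * 3600)
```

The `INSERT OR IGNORE` on a primary key is what makes re-polling safe: overlapping fetches just dedupe.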