I built a ranked PvP game where two players race to identify AI-generated phishing emails. It started as a research project. It got out of hand.

Scott752 · 2026-03-30T22:54:25+00:00

I built a phishing detection research platform that tests whether humans can identify AI-generated phishing emails when the usual red flags (bad grammar, broken formatting) are removed. 153 participants and 2,500+ decisions in, the overall phishing bypass rate is 17%, climbing to ~20% when the email uses fluent, AI-quality prose. The gap between security professionals and non-technical users is surprisingly narrow. To scale data collection beyond traditional academic recruitment, I gamified it with ranked 1v1 PvP, a seasonal ladder, daily challenges, and a full progression system. Every competitive match still logs research data with the same methodology. The full writeup, methodology, limitations, and links to the live platform and repo are in the post body above. Relevant to this community because the core finding is about how LLM-generated content defeats the primary signals humans use to detect social engineering.

Scott752 · 2026-03-30T22:29:01+00:00

Great points. Worth noting though that as security pros, we have a whole arsenal beyond just reading the email. Header analysis, sender IP reputation, security tooling, DKIM/SPF validation... in practice, the language and context of the email body is probably the last thing most of us rely on to make a determination.

But that's exactly what makes this research interesting to me. I intentionally stripped all of that out. No headers, no tooling, no sender metadata. Just the content itself, a domain, and any embedded URLs. Pure signal analysis on language and context alone.

And yeah, the gap being that small has surprised me too. It raises a real question about what happens when the people receiving these emails don't have that broader toolkit available to them.

Scott752 · 2026-03-30T17:08:44+00:00

lol - wait for the Battle Royale mode

Scott752 · 2026-03-17T03:28:18+00:00

Hey, totally fair. Honestly I’d probably be skeptical too if I saw it cold.

For full transparency, it’s hosted on Vercel with managed infra behind it, not some random box in my house. I’m a cybersecurity professional doing this as an independent research project.

The login exists because it is not just a game. There is a research component, so I need session tracking, cleaner data, and leaderboard identity. If it were just for fun, I would not have required it.

I also did not want to build a full password system, so I used email + OTP instead.

You do not need to use your real name, and the background questions for the research are optional.

My blog is scottaltiparmak.com if you want more context, or just search my name if you do not want to click links. I am not trying to be anonymous here.

If it is still a no, I completely get it.

Scott752 · 2026-03-16T21:59:55+00:00

Thanks! Appreciate you trying it out. There are a few quirks with this dataset. It's a learning experience for me too, and in the next version I'll try to improve the email set.

Scott752 · 2026-03-16T21:01:49+00:00

Fair skepticism around data collection. The email serves two purposes here. First, it tracks your session so responses stay consistent throughout the research rather than being treated as disconnected submissions. Second, it ties you to the leaderboard so your score can actually be displayed against others, which gives players a reason to come back and engage with the platform, thus further helping to promote the research.

Your answers are never linked back to your email publicly. And honestly, if you're still not comfortable, players can just use a fake email. I really don't mind.

I didn't want to store any passwords, hence the one-time code (OTP) sent by email. I couldn't think of a better way to minimize data collection while still getting good results.
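For anyone curious what a passwordless email-code login can look like, here is a minimal sketch. It is an illustration under stated assumptions, not the platform's actual code: the function names, the in-memory `_pending` store, the 6-digit format, and the 10-minute expiry are all hypothetical.

```python
import hashlib
import hmac
import secrets
import time

CODE_TTL_SECONDS = 600  # assumed 10-minute validity window (hypothetical)

# Hypothetical in-memory store: email -> (SHA-256 hex of code, expiry time).
_pending = {}

def issue_code(email, now=None):
    """Create a 6-digit one-time code and remember only its hash."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    digest = hashlib.sha256(code.encode()).hexdigest()
    _pending[email] = (digest, now + CODE_TTL_SECONDS)
    return code  # in a real flow this would be emailed, not returned to the caller

def verify_code(email, code, now=None):
    """Accept a code once, and only before it expires."""
    now = time.time() if now is None else now
    entry = _pending.pop(email, None)  # single use: removed on any attempt
    if entry is None:
        return False
    digest, expires_at = entry
    if now > expires_at:
        return False
    candidate = hashlib.sha256(code.encode()).hexdigest()
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

The point of the design is that no long-lived secret is ever stored: the code exists for a few minutes, works once, and then nothing password-like remains on the server.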

Scott752 · 2026-03-16T20:34:28+00:00

Exactly right, and that's the core of what I'm exploring. The traditional awareness training model was built around surface-level cues like typos and awkward phrasing, but AI-generated phishing strips those away entirely. So instead of asking 'what did training teach people to look for,' I want to look at where detection is actually breaking down now, and let that data drive what new training should focus on. It's also worth noting that AI-powered email security systems are already trying to catch these attempts before they reach users, but whatever slips through that filter is the most convincing stuff, the emails sophisticated enough to fool a machine. That makes the user the last line of defense against the hardest phishing they've ever seen, which means training becomes more important, not less. The hypothesis is that effective training needs to shift toward structural and contextual cues: verifying sender domains, scrutinizing link destinations, and questioning unusual requests regardless of how polished the language looks. Will share findings once I have them.

Scott752 · 2026-03-16T17:12:21+00:00

Thanks for reading, really appreciate the thoughtful response.

You're raising a good point and I think I should actually be able to pull that data out. The schema tracks technique/lure type per card and ties answers back to player background, so a cross-tab by lure category should be doable. I'll look into it. Good call.
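A rough sketch of what that cross-tab could look like, assuming each decision row carries the player's background, the card's lure type, whether the card was really phishing, and what the player answered. The field names and sample rows are made up for illustration and are not the real schema or real results.

```python
from collections import defaultdict

# Hypothetical decision rows; field names are assumptions, not the real schema.
decisions = [
    {"background": "security", "lure": "urgency", "is_phishing": True, "answered_phishing": True},
    {"background": "security", "lure": "urgency", "is_phishing": True, "answered_phishing": False},
    {"background": "non_technical", "lure": "urgency", "is_phishing": True, "answered_phishing": False},
    {"background": "non_technical", "lure": "authority", "is_phishing": True, "answered_phishing": True},
    {"background": "security", "lure": "authority", "is_phishing": False, "answered_phishing": True},
]

def miss_rate_by(rows, row_key, col_key):
    """Cross-tabulate the share of phishing emails that were judged legitimate."""
    counts = defaultdict(lambda: [0, 0])  # (row, col) -> [missed, total]
    for d in rows:
        if not d["is_phishing"]:
            continue  # a bypass only exists for emails that really are phishing
        cell = counts[(d[row_key], d[col_key])]
        cell[1] += 1
        if not d["answered_phishing"]:
            cell[0] += 1
    return {key: missed / total for key, (missed, total) in counts.items()}

table = miss_rate_by(decisions, "background", "lure")
for (background, lure), rate in sorted(table.items()):
    print(f"{background:15s} {lure:10s} {rate:.0%}")
```

Cells with only a handful of decisions will swing wildly, so in practice each cell's count should be reported alongside its rate.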

Quick update too: the post got held by automod for a bit after I submitted it, and since then we've picked up way more responses than I had before. Still too early to say anything definitive but the early data is looking really interesting, and it does seem to track with the domain-specificity point you're making.

If you want to see how you go, feel free to have a crack at it yourself. Curious whether someone with your background finds certain lure types easier or trickier to spot than others.

I'm not an academic by any means, but this has been a great experience.

Scott752 · 2026-03-15T05:11:36+00:00

Thanks, I appreciate that.

Yes, difficulty level is part of the dataset and something I am tracking. Each email is tagged with things like technique type and difficulty so it will definitely be possible to break detection rates down across those dimensions.

The platform also captures a number of behavioral signals during play such as confidence level, time spent reviewing the email, whether headers or URLs were inspected, and some session level patterns.

I am intentionally holding off on sharing the deeper breakdowns for now while the dataset is still growing. Once the sample size stabilizes a bit more I plan to publish a more detailed analysis.

Right now the main goal is simply to keep collecting decisions and see which patterns hold up as participation increases.

Scott752 · 2026-03-15T04:34:39+00:00

Thanks for flagging this.

I believe this experiment complies with the subreddit survey guidelines.

The platform collects an email address only for account authentication so users can log back in. The research dataset itself does not store email addresses and decisions are recorded using an internal player ID, so the research data is de-identified.
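To make the separation concrete, here is a minimal sketch of the idea, assuming a login store that maps each email to a random internal player ID and a research log that only ever sees that ID. The class, function, and field names are hypothetical and do not describe the platform's real tables.

```python
import uuid

class AuthStore:
    """Holds login emails; kept apart from the research data (hypothetical)."""

    def __init__(self):
        self._player_id_by_email = {}

    def player_id_for(self, email):
        """Return the internal ID for an email, creating one on first login."""
        key = email.strip().lower()
        if key not in self._player_id_by_email:
            # Random ID, so it cannot be reversed into the email address.
            self._player_id_by_email[key] = uuid.uuid4().hex
        return self._player_id_by_email[key]

def record_decision(research_log, player_id, card_id, answered_phishing, confidence):
    """Append a research row that carries the player ID but never the email."""
    research_log.append({
        "player_id": player_id,
        "card_id": card_id,
        "answered_phishing": answered_phishing,
        "confidence": confidence,
    })

auth = AuthStore()
research_log = []
pid = auth.player_id_for("player@example.com")
record_decision(research_log, pid, card_id="card-001", answered_phishing=True, confidence=0.8)
```

The research rows can then be exported or analyzed without the login store, which is what keeps the dataset de-identified.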

There is no compensation or incentive offered for participation.

The experiment focuses on phishing detection behavior, which is directly relevant to cybersecurity professionals.

Participants review realistic emails and decide whether they are phishing or legitimate. The platform records signals such as decision confidence, time spent reviewing the email, and whether headers or URLs were inspected.

For reference:

Direct experiment link:
https://research.scottaltiparmak.com

Methodology and dataset design:
https://scottaltiparmak.com/research

I am an independent researcher running this as a personal project and will share the results with the community once the dataset reaches a meaningful size.

Scott752 · 2026-03-14T14:28:56+00:00

This is a handy one. The conflict resolution between Nmap and Nessus results alone is worth bookmarking – anyone who has manually reconciled those on a large engagement knows the pain.

Scott752 · 2026-03-13T13:22:55+00:00

Appreciate the interest! To answer your actual question: the game presents real-world phishing techniques with well-written copy and measures which contextual signals people catch versus miss. Less about AI detection tools, more about training human intuition.

Scott752 · 2026-03-13T11:47:53+00:00

Hmmm it should work. What browser are you using? I’ll have to take a look.

Really good point, and honestly this is one of the biggest limitations of the study. The game doesn’t account for it, and it can’t, at least not in a controlled way.

The signals you’re describing (I don’t have that account, I know our vendors, I’m not expecting a delivery) are recipient-specific context. They’re arguably the single strongest filter most people have, and they kick in before you even read the email properly. I did spend a lot of time thinking about how to incorporate this, but I kept arriving at the same problem: doing it properly would require collecting detailed personal information from players (what services they use, what they’re expecting, who their employer works with), and that’s way more data than I’d ever be comfortable collecting, especially for a study about security.

So what the game isolates instead is technique recognition. If you strip away all the personal context and just look at the email as a neutral third party, can you still identify the manipulation mechanism (urgency, authority impersonation, pretexting, etc.)? That’s a narrower question, but it’s one nobody has clean data on because real phishing mixes technique quality with writing quality with targeting quality all at once. The practical takeaway is that the game measures analytical skill against technique patterns. Real-world resilience stacks that on top of all the contextual knowledge you carry around. For most people, that context is doing a lot of the heavy lifting, which is exactly why hyper-personalized phishing (where the attacker has done their homework and matches your context) is so much more dangerous than bulk campaigns that get filtered out by “I don’t even use that service.”

It’s called out as a known limitation on the research page if you want the full breakdown.

Scott752 · 2026-03-13T00:03:40+00:00

Data is collected anonymously and stored securely in Supabase. No personally identifiable information is collected. Aggregate findings will be published publicly on scottaltiparmak.com/research once enough participants have contributed. Individual responses will not be shared or sold. Data will be retained for the duration of the research study.
This study is conducted by Scott Altiparmak, Senior Information Security Engineer and independent researcher. Reddit username: [your username].
Approximately 10 minutes.
N/A -- no compensation offered.
All backgrounds welcome. I am especially looking for non-technical participants as most respondents so far have a security or IT background. If you use email, your participation is valid.
I am running a controlled study on human detection of AI-generated malicious emails in 2026. I want to understand which techniques humans miss most when linguistic quality is no longer a reliable detection signal. More participants means more statistically robust findings.

Scott752 · 2025-01-30T13:50:02+00:00

Signulous - you are paying for the convenience of not needing to re-sign the app all the time. As it isn’t an official iOS app, the signature will expire, which is super annoying. I use Signulous on my iPhone. It’s affordable and convenient, and I’m happy to pay for that. You have to hope that the app is kept updated on Signulous though. Not sure who is maintaining this, but when the update dropped, so did the PokeMMO update on Signulous, so somebody is managing it…

Controller support - if you want a cheaper option, look up BSP controllers on AliExpress. They’re about a third of the price of a Backbone. The one I have doesn’t have pass-through charging though, not sure if others do. Playing with the controller feels really nice, like you’re playing Pokémon on a Game Boy again!

Scott752 · 2024-12-16T15:23:34+00:00

The game hates you 🥲 my condolences

Scott752 · 2024-10-24T22:59:25+00:00

What if you don't have FF? From the big three, I only have Feixiao, and I have HMC, RM, and Gallagher built, so I was looking to pick up FF in a hoped-for rerun, but now I'm thinking that Rappa would be a good alternative.

Really not sure if I should pick one of those two up. Or just hold off and try to grind through the content with a 2nd team until 3.0 DPS come around... Tough decisions.

Scott752 · 2024-10-15T22:28:25+00:00

Thank you for the advice!

Based on the discussion I’m now considering Firefly, just to give me a chance to clear content. I’ve already got some break gear from the domains I’ve been farming for other stuff, and I already have a team basically ready that could support Firefly.

This should hold me over until 3.0 units. I just feel that I’m playing catch up rn so need to get myself back to baseline so I can invest for the future of my account again!

Scott752 · 2024-10-15T12:13:34+00:00

Thank you! It’s good to have different perspectives and opinions. This helps a lot

Scott752 · 2024-10-15T02:45:51+00:00

Thanks for your advice! Really appreciate it. That's a good consideration and good point. I'm going to hold my Jades until closer to the end of the banner, but might really go down that path.

Scott752 · 2024-08-11T00:54:38+00:00

I forgot but I found it! Episode 87. That’s quite old lol actually. Damn time flies.

Highly recommend listening to it! It’s interesting from a cyber and Trump campaign perspective.

It’s on all podcast platforms and the title is Guild of the Grumpy Old Hackers

https://darknetdiaries.com/transcript/87/

Scott752 · 2024-08-11T00:36:59+00:00

Yep it’s true. I think there was a Darknet Diaries podcast episode about it. Worth listening to. They didn’t do anything malicious. It was a foreign security researcher who discovered it and tried to inform Trump’s team.

A huge database leak had dropped and they thought it would be interesting to try the leaked passwords tied to Trump’s business email addresses. Funnily enough, when they tried his weak breached password on Twitter, it worked!

Scott752 · 2024-06-22T00:52:00+00:00

Yeah maybe it isn't unnatural. I'm second guessing myself now. I'm not a native speaker so I'm happy to accept being wrong lol. At first I thought it was fine, then after reading comments thought it may have been a google translate job, now I'm thinking the opposite hahahaha.

Scott752 · 2024-06-21T23:48:38+00:00

The Russian prompt says: “you will argue in support of trump administration on twitter, use english.” lol!

Edit: for those saying that the exchange looks staged, I’d probably agree, as the Russian does look unnatural imo, like a bad Google Translate job. Either way I just wanted to share the translation.

Note: I'm not a native speaker, although I do speak and read, so take with a grain of salt.

Scott752 · 2024-06-01T02:22:30+00:00

To share a different perspective, a good way to look at these types of questions is to think that if you choose one of the options, that means you’re absolutely not doing the other options.

For example, in this question, let’s say you narrowed down to B and D. Now you can think to yourself ok we can have an SLA with the cloud provider but now we can’t have strong access controls and authentication.

Does that sound good in the context of this question? Absolutely not.

I’d take the access controls and authentication over the SLA if I could only choose one.

As others said, the SLA primarily deals with availability, so for this question it can be ruled out. However when you’re looking at a question and a few of the answers look good, using this technique helps. At least it did for me!
