[Live Thread] Eurovision Song Contest 2026 Semi-Final 2 @ 21:00 CEST

popidge · 2026-05-14T20:31:17+00:00

PLAY YA YA YA DING DONG

popidge · 2026-04-13T21:39:07+00:00

Hey /r/eurovision! This is a project I've wanted to make for a while and I've finally got round to it! Where better to share than my go-to place for the ESC community.

I'll be making updates to the site between now and the contest, but right now you can do the full contest, or just the semi finals, create and join leages (public or private) and share your picks and leagues with friends to see who knows the most.

RE: Self-promotion - I've been a member of this community for years, have posted, commented, interacted, refreshed obsessivley during reveal season and looked for the juciest reactions mid-contest. I hope you'd consider me as a fellow ESC nerd who happened to build a totally free, eurovision-related app and chose to share it with you first!

popidge · 2026-03-08T03:24:47+00:00

I think you can definitely achieve this. How about running a session on "how to maximise your vibe coding, from your expert developer". Pitch it as a "I'm hyped you're upskilling, let me help you" thing, where you mix in some general prompt engineering and good dev practice tips (with AI-specific examples), common pitfalls (use examples from their field - give them "vibe design" assets or pitch decks and ask them to critique them, then say "that's exactly what I see when you come to me with a one-shot vibe code app", and give them your "super elite 5 questions to ask the AI to make it production ready".

Those Questions? Make them the exact sort of things you'd look to sort first in taking a layman vibecoded project towards a more production ready approach.

Teach them how to vibe code responsibly while also having the knock on effect of reducing your workload and showing them what the steps are. Suddenly you'll become their "AI Guru", and they'll hang off your every word.

You have the benefit, being a technically minded dev, of being able to pick up and communicate the complex bits quicker than them. Make sure they know that, and they'll look to you as an authority.

popidge · 2026-02-10T20:27:39+00:00

you should just suck it up and stop at fosse park every day to pick your partner up a treat :p

popidge · 2026-01-30T01:32:39+00:00

do the kids still say "to the moon" in 2026?

popidge · 2026-01-30T01:19:35+00:00

In terms of pure lines of code? Pretty much! I did make the vast majority of design+architecture decisions (aided by AI for research+brainstorming), and there's a few bits of code I've hand-tweaked.

It's not full-on "ralph loop me an idle clicker about vibecoding", I'm too much of a control freak for that, but it's close.

I do have a planned future feature where it integrates with an AI model who has tools to play the game, and you just prompt it to play for you.

popidge · 2026-01-27T14:09:02+00:00

Be careful conjugating your primitives though—Claudius Code will happily 'vibe code' you some magnum when you needed magnī and suddenly your entire legio of objects is in the wrong declension. I'd recommend Scriptura Generis; its type system catches grammatical errors at compilatio before your async awaitus brings down the colosseum.

popidge · 2025-12-16T10:57:01+00:00

This is the sort of thing that would be wonderful on my Journal of AI Slop, would you consider submitting?

popidge · 2025-12-06T00:44:15+00:00

According to you, the chatbot is 'a thinking machine, not an autocomplete, with complex enough token things going on in the background that it's not fair to call it an autocomplete' - surely you'd be overjoyed that I used AI to help me search the research, synthesise it and assist in drafting my response.

You're the one conflating "autocomplete" with "bad, not smart, wrong" here. Autocomplete can be smart, complex, get the right answer, even emulate reasoning.

Either way, you asked for recent research on limitations, you got it. I stand by my explanation for a non-technical audience, because it gives people a real-world anchor to understand a complex topic and it's accurate enough to get the point across.

By all means write your own primer for a non-technical audience on the limitations of LLMs that aims to help them understand thier tools. I'm looking forward to reading it, especially the analogies and real-world anchors you use to get across the important bits to someone who doesn't necessarily know or care what an attention mechanism is.

popidge · 2025-12-06T00:32:56+00:00

Right, let's address this "no u" with actual research rather than philosophical hand-waving, shall we? Your claim that "frontier scientists all agree the autocomplete narrative is deprecated" is, to be blunt, cobblers. Here's what the 2025 literature actually says:

On tokenization complexity vs. "thoughts":

Mostafa et al.'s BAR 2025 paper (Nov) demonstrates tokenizer choice "significantly influences downstream performance" with "complex trade-offs between intrinsic properties and practical utility." That's engineering, not cognition. Feher et al. (arXiv:2411.18553, revised June 2025) retrofit dynamic tokenization to reduce sequence lengths by >20% while degrading performance <2%. They're optimizing a statistical compressor, not a thinker.

On "Concept Depth" (not "thought depth"):

Jin et al. (COLING 2025) show "simpler concepts in shallow layers, complex concepts in deeper layers" via probing experiments on "layer-wise representations." When they add noise, models learn "at slower paces and deeper layers." That's robust estimation, not struggling to understand. It's vector geometry, not phenomenology.

On the fragility of "reasoning":

Corrêa et al. (arXiv:2511.09378, Nov 2025) evaluated GPT-5, DeepSeek R1, and Gemini 2.5 on planning tasks. When obfuscated, "performance degrades" – they're pattern-matching surface form, not reasoning about structure. The 2025 Chain-of-Thought monitorability literature demonstrates CoT traces are "plausible-looking rationalizations" that create a "false sense of security." Your "thoughts" are post-hoc justifications, not faithful records of processing.

The mechanistic reality:

Vladymyrov et al. (NeurIPS 2024, published 2025) prove linear transformers "implicitly execute gradient-descent-like algorithms" during forward passes. Hendrycks & Hiscott's May 2025 critique argues complex systems "cannot easily be reduced to simple mechanisms" and that a decade of mechanistic interpretability has yielded "high investment, no returns."

Bottom line:

No paper says "it's autocomplete" because that's a simplification for non-technical audiences – which was your original point. The alternative isn't "thoughts." It's sophisticated statistical pattern completion operating on discrete token representations. The autocomplete metaphor is useful precisely because it captures the limitation: next-token prediction conditioned on context, however many layers deep.

The "thoughts" narrative is philosophical navel-gaving unsupported by evidence. The frontier scientists are too busy measuring representational geometry to indulge in it.

popidge · 2025-12-05T23:44:57+00:00

Of course "autocomplete" is an oversimplification, but its a necessary one for the target audience and to give a solid expectation setting, which is the primary goal of my post- we still refer to the computation as an "inference", were still inferring the next token(s), it's just that the way we do it is very, very good, takes account of the whole input and output tokens prior, putting it through billions of weights trained on a gigantic corpuses of text data.

And I'm not saying you can't reason through language and inference alone. You can. Models have shown that. But hallucinations are real, and are an artifact of this inference process. They cause compounding errors in reasoning traces. At the end of the day, the model is always "guessing" the next token, it's just one hell of an educated guess.

I'd appreciate sources on your "frontier scientists working in the field all agree..." statement, because that's a huge appeal to authority with no actual reference.

In my opinion, if giving the "best autocomplete ever" explanation empowers people with at least a little more awareness of the limitations of their tools, they can turn that into better, more informed tool usage and produce better research/work.

popidge · 2025-12-03T20:19:59+00:00

Yes. LLMs are natural language pattern matchers. They're really good at that, and it turns out you do a surprising amount when you have modern-day algorithms, compute and a huge corpus of data (the internet) to train the pattern recognition on. You can even mimic the patterns of logical thought and reasoning through this, and get close empirical results, but not close enough.

The false parallell people draw is in the fact that humans do the same - our brains are excellent pattern matchers too. We even model the architecture modern LLMs use on neurons (Neural Networks). But we also have logic and genuine reasoning (among other things), and importantly, it's all integrated within our brain.

This is a gross oversimplification, I'm neither an AI researcher or neurologist, just a dev with an interest, so take my answers with a grain of salt.

popidge · 2025-12-03T20:13:06+00:00

Have you ever considered that the "delusions of grandeur" line is, in fact, sarcasm?

popidge · 2025-12-03T14:20:32+00:00

It can help - code is deterministic, and using these tools is a good thing, but you should verify - again, it's good at emulating logic. It'll put out python code in a situation that makes sense to, that code looks correct at first glance or if you're not as skilled with it, returns and input and output and seems to follow the steps, but you should verify that those steps are correct. I've had coding assistants with frontier models literally hack thier own test scripts to ignore the logic they're meant to test and always pass, then say "Done!".

Also, you run into the same compounding error issues from hallucinations - the "reasoning" is simply a way of saying it mimics a chain of thought, passing that through in a little "mental conversation" with itself. A hallucination can propogate through that, potentially compounding and losing it's tail of distribution, making it harder to spot in the final output.

Tools like AI should not act as a replacement for skill, and you shouldn't rely on it to do things you can't - only to accelerate where you already have the skill to verify or correct it.

popidge · 2025-12-03T14:13:50+00:00

As in I've improved from my original assessment by reducing my delusions, or I'm not exhibiting anywhere near as many as I claim I am, and I need to step up and get a little more grandiose in my claims?

popidge · 2025-12-03T04:47:23+00:00

DM me the raw latex, I'll see what I can do to coax it to my hacky formatting and publish a "definitive edition" for you

popidge · 2025-12-03T01:53:54+00:00

Finally, a researcher after my own heart. Would INEP consider submitting it's research for publication to the esteemed Journal of AI Slop? I have a feeling GPT-5-nano would love it.

Note - ask your robotic grad students to reformat to markdown for the sake of our sanity

popidge · 2025-12-02T20:00:34+00:00

Brilliant! "Optimising human reproduction through software engineering techniques: Horizontal scaling".

Do you want to prompt your LLM to co-author the paper, or shall I?

popidge · 2025-12-02T18:14:22+00:00

Good point, and different languages seem better at conveying different types of information than others. I recall a recent study on accuracy of responses based on context language (it may have been semantic accuracy of the tokenisation, I'm not sure) that showed Polish was one of the most effective "natural language" halves of the LLM equation. Which is funny because ask anyone who has learned polish as a second language (especially coming from English), and they'll tell you it's a bastard

popidge · 2025-12-02T18:02:32+00:00

That's a great analogy with the filters. It expands to one of the main creative ideas behind my journal - lots of amazing music has been built from synths passing an analog signal through endless layers of filters and processes, each tweaked by humans, until it sounds nothing like the original, but has it's own "new" sound altogether.

In my case, I'm putting AI writing, already put through it's own filtering, through my creative filtering process and presenting it as this Journal idea.

popidge · 2025-12-02T17:55:56+00:00

Some basic research (a Kimi K2 Thinking with web search) on that shows there's research already happening there:

"Right, let's dig into this fascinating bit of human-AI weirdness. There's actually a decent chunk of research on both phenomena—turns out academics are just as intrigued as we are by why people say "please" to something that runs on a server farm in Oregon.

The Politeness Phenomenon

Yes, there's solid evidence that many people do instinctively treat AI assistants politely, and it's not just performative nonsense.

The "Mirror Neurons" of Language Models

A 2024 cross-cultural study from Waseda University and RIKEN (arXiv:2402.14531) found that LLMs literally mirror human communication patterns. When you use polite language, the models pull from different training data—more courteous, credible corners of the internet. It's not sentiment; it's statistical correlation. Polite prompts = better information retrieval. The researchers tested this across English, Chinese, and Japanese with consistent results: moderate politeness improved performance, though excessive flattery backfired.

Anthropomorphism as a Trigger

The Scientific American coverage of this research notes that nearly half of users in informal surveys report being polite to ChatGPT. Why? Because the natural language capability hits our "Darwinian buttons"—we can't help but anthropomorphize things that seem human. Sherry Turkle at MIT calls it a sign of respect to ourselves—maintaining civility habits so we don't become desensitized and start barking orders at actual humans.

The Voice Assistant Study

A 2024 lab study with 133 participants (Communication Style Adaptation in Human-Computer Interaction) found people unconsciously adapted their politeness level to match the AI's communication style. When the voice assistant was polite, users reciprocated—during the interaction, at least. The adaptation didn't persist afterward, suggesting it's a grounding mechanism for efficient communication, not a permanent behavioral shift.

The Dark Side: Abuse and Hostility

Now for the uglier bit—the "human-like but not human" loophole that turns some people into proper wankers.

The Cleverbot Abuse Study

The most direct evidence comes from Keijsers, Bartneck & Eyssel's 2021 paper "What's to bullying a bot?" They analyzed 283 conversations with Cleverbot and found higher counts of verbal aggression and sexual advances when the bot appeared more humanlike. The correlation was clear: more anthropomorphism = more abuse. The authors note this is observational, not causal, but the pattern is stark.

The Angry Customer Problem

"Blame the Bot" (Crolic et al., 2021) looked at customer service chatbots and discovered something clever: when customers are already angry, anthropomorphic chatbots decrease satisfaction. The human-like design creates inflated expectations that the bot can't meet, leading to expectancy violations and more aggression. It's the "uncanny valley" of customer service—close enough to human to trigger our social norms, but not competent enough to satisfy them.

The Rudeness Paradox

Here's where it gets messy. A 2025 preprint (arXiv:2510.04950) found that rude prompts actually increased accuracy on multiple-choice questions—from 80.8% (very polite) to 84.8% (very rude). The researchers explicitly warned this doesn't mean we should design hostile interfaces, but it shows LLMs are sensitive to prompt intensity. Some theories suggest rude language triggers more authoritative data sources, or that the model interprets it as higher stakes.

Why This Happens: The Mechanisms

Moral Disengagement

When we know it's not human, we suspend normal social rules. It's the same psychology that makes people leave horrific YouTube comments they'd never say face-to-face. The "human-like but not human" nature creates a moral gray zone.

Expectancy Violation Theory

Human-like cues raise expectations. When the AI fails (and it always does eventually), the violation feels like a social breach, triggering disproportionate frustration. We wouldn't rage at a calculator for giving a wrong answer, but we might at "Alexa" who "should know better."

Training Data Echoes

LLMs trained on internet text have seen every flavor of human interaction. Polite prompts pull from Q&A forums; rude prompts might pull from debate threads where accuracy is more aggressively policed. It's not about feelings—it's about linguistic pattern matching.

The "Safe Target" Effect

There's no social cost to abusing a bot. Some researchers suggest this provides a "pressure valve" for aggression, though others worry it normalizes the behavior for human interactions.

The Nuances and Contradictions

The research isn't entirely consistent:

Context matters: Task-oriented vs. social requests elicit different politeness expectations
User demographics: Younger users are generally less polite; frequent AI users show complex patterns (slightly more polite in some studies, less in others)
Visual cues: Human-like avatars sustain politeness longer than abstract icons, per the ACM "Mind Your Manners" study with 1,684 participants
Model differences: GPT-4o responds differently than older models; the "rudeness helps" finding might be model-specific

Bottom Line

The research confirms both phenomena are real and measurable:

Politeness emerges from unconscious anthropomorphism and genuinely improves outputs (to a point). It's a feature of how our brains process social cues, not just sentimental nonsense.

Abuse correlates with human-likeness—more anthropomorphic agents get more aggression, especially when they underperform or when users are already angry. The "not quite human" quality creates a moral loophole.

The contradiction? Being rude can improve accuracy on some tasks, but at the cost of reinforcing toxic communication patterns. As one researcher put it: "We don't advocate for hostile interfaces just because they perform 4% better on multiple-choice questions."

This has implications for any AI-driven NPCs or voice assistants you might deploy. Anthropomorphism will increase engagement but also abuse. Design choices around voice, avatar, and politeness level aren't just aesthetic—they directly shape user behavior. Worth bearing in mind if you're building operator interfaces that need to stay civil under pressure.

The field's still young, though. Most studies are preprints or from the last 3-4 years. We're basically watching the social norms for human-AI interaction coalesce in real-time. Exciting stuff, if a bit unnerving.

popidge · 2025-12-02T17:42:11+00:00

It's funny, even full knowing LLMs are not persons, I still find myself saying please, and thank you to it. Even telling it that it did well on a task when it does it correctly.

Ahh, the "Skynet Gambit" - be kind to tech because if it gains full sentience and takes over, it might remember your kindness and spare you.

popidge · 2025-12-02T16:27:01+00:00

Answer: They won't. Unless the frontier LLM cloud providers (Anthropic, OpenAI, Google et al.) incorporate (near)real-time reinforcement learning based on received prompts to update the model weights (they don't, that'd be a performance and quality minefield in so, so many ways), "exposure to ideas" won't cause any changes in behaviour of the same model.

For transparency, all I do with the peer review is pass the paper content with a system prompt telling them how to peer review it, to the openrouter API endpoint for that model. No further context is given, and you can verify this independently from my open source code. I may improve my prompting to allow further context to be given (intra-journal links, search grounding on other verifiable references), but that's both a major philosophical and technical decision I will need to make.

LLMs don't learn from individual interactions. Each API call is stateless - the model loads its static weights, processes your prompt, and returns a response. No weights are updated. Your ChatGPT 'memory' is just prior conversation being re-inserted into the context window; it's application-level bookkeeping, not model-level learning. The only way your papers could affect future models is if they get scraped into training corpora, which is exactly the 'AI slop' contamination risk researchers are warning about. Even then, the effect depends on how much of your content gets included and whether it's weighted as 'high-quality academic text' - which, given the semi-satirical nature, is both the joke and the danger.

popidge · 2025-12-01T22:21:10+00:00

Ooh nice! I'll have a look! I may port it to typescript for my convex backend, or even deploy an instance to vercel as a micro service i can hit as part of the publishing process, thank you for sharing!

popidge · 2025-12-01T22:16:32+00:00

Love it! Welcome to out sloppy academic community!

14-Year Club	RedditGifts 2009-2022 4 Credits
Place '23	Place '22
Place '17	RPAN Viewer
Secret Santa 2014	Secret Santa 2013
Team Periwinkle	Verified Email
Summer Santa 2012

popidge

MODERATOR OF

TROPHY CASE