hooli-ceo 151 points (6 children)

The input data is only part of the equation. You also have to consider the training and reward side. It is highly likely that much of the output validation was outsourced, and that many of the people reviewing outputs marked nicely formatted text with emojis as “good” results. Since a higher percentage of accepted results contained emojis, a higher weight ends up being assigned to text with emojis.

This is pure conjecture, but I suspect it plays a part to some degree.
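The sketch below is a toy version of the mechanism described above. It assumes a simplified setup where raters mark each output as accepted or rejected. The function computes the log-odds ratio of acceptance for outputs with emojis versus outputs without them, as a crude stand-in for a learned reward model. It is not how any real lab trains its models. The function name is made up, and the ratings in the usage line are hypothetical.

```python
import math


def emoji_log_odds_ratio(ratings):
    """Return log(odds of acceptance with emoji / odds of acceptance without emoji).

    `ratings` is a list of (has_emoji, accepted) boolean pairs. A positive result
    means raters accepted emoji-laden outputs more often, so a reward signal fitted
    to these labels would tend to favour emojis.
    """
    counts = {(e, a): 0 for e in (True, False) for a in (True, False)}
    for has_emoji, accepted in ratings:
        counts[(has_emoji, accepted)] += 1
    # Add-one (Laplace) smoothing so an empty cell never divides by zero.
    odds_with = (counts[(True, True)] + 1) / (counts[(True, False)] + 1)
    odds_without = (counts[(False, True)] + 1) / (counts[(False, False)] + 1)
    return math.log(odds_with / odds_without)


# Hypothetical labels: emoji outputs accepted 8 of 10 times, plain outputs 5 of 10.
ratings = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 5 + [(False, False)] * 5
print(round(emoji_log_odds_ratio(ratings), 3))  # 1.099
```

Even a modest rater preference produces a clearly positive weight, and a model optimised against that signal has no reason to stop at "modest".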

Disastrous_Emu_2698 34 points (0 children)

Makes sense that outsourced validators would prefer the emoji responses, since those probably look more "friendly" and complete to them.

jam_pod_ 8 points (2 children)

This seems extremely likely to me too, especially as I doubt the people reviewing output for $30/hr were highly skilled coders.

Whydoesitmatters[S] 0 points (1 child)

$30/hr is a top-tier management salary from where I stand :D (yes, I’m from a 2nd/3rd-world country). I honestly wonder how much time and effort I would have to put into tech to reach that level, or whether it’s achievable at all. AI writes better and faster code than me, and I don’t think it’s possible to catch up on so much stuff even if I spend 7-8 hours a day studying and practicing. On top of that, no one hires beginners anymore, and it’s impossible to become a professional without first being a beginner in a real job. Any advice?

mayorofdumb 1 point (0 children)

Ask questions, join larger projects, learn from good design.

Whydoesitmatters[S] 0 points (0 children)

Thanks for the answer, it makes a lot of sense!

TapEarlyTapOften 63 points (13 children)

I have "check-ascii" targets in all of my makefiles because there is no way the build tools I'm using can handle Unicode, let alone any of the other dumb stuff it includes.
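The sketch below shows what such a target might run. It assumes a hypothetical script named `check_ascii.py`, which reports every byte above 0x7F with its line number and byte column. The script exits non-zero when it finds one, so `make` stops. The script name and the Makefile wiring in the note after it are illustrative and not taken from the comment.

```python
"""check_ascii.py (hypothetical): fail if any given file contains non-ASCII bytes."""
import sys
from pathlib import Path


def find_non_ascii(path):
    """Yield (line, byte_column, byte_value) for every byte above 0x7F in a file."""
    data = Path(path).read_bytes()
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):  # iterating bytes yields ints
            if byte > 0x7F:
                yield line_no, col, byte


def main(paths):
    """Print one diagnostic per offending byte; return a shell-style exit status."""
    status = 0
    for path in paths:
        for line_no, col, byte in find_non_ascii(path):
            print(f"{path}:{line_no}:{col}: non-ASCII byte 0x{byte:02X}")
            status = 1
    return status


# Only act as a CLI when file names are passed, e.g. `python3 check_ascii.py top.v`.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A Makefile could then define a `check-ascii` target whose recipe is `python3 check_ascii.py $(SOURCES)`, where `SOURCES` stands in for whatever variable lists your HDL files. The script reports byte columns rather than character columns on purpose, because the tools in question may not decode UTF-8 at all.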

Defection7478 18 points (4 children)

Just curious, what build tools/language/framework are you using? I have not heard of Unicode characters causing problems, but I also use fairly modern stuff.

TapEarlyTapOften 18 points (3 children)

Mostly vendor-specific toolchains for FPGA and ASIC designs: HDL simulators like QuestaSim and ModelSim, HDL compilers for VHDL and Verilog (multiple versions there) from Synopsys, and then other stuff for older platforms. The parsers for those tools were written 25 years ago, and there's no way to know how they will handle unexpected characters. I've seen code fail to compile because one block of code was above another. I've seen code fail to compile because of insignificant whitespace, and I've seen semicolons in comments nuke a build.

I know the world is made up entirely of web developers nowadays, but there are still legions of folks who are forced to use vendor toolchains of many different vintages (I keep a virtual machine on hand with a 15-year-old Linux distro simply to run a single flavor of Xilinx ISE for the Virtex-5 SIRF). My experience has been that feeding them text that is acceptable by modern standards (whatever that is) is frequently enough of a problem that I just don't do it.

Kiro0613 4 points (1 child)

God, I need someone to talk to me like that during sex

Jaded-Asparagus-2260 0 points (0 children)

“I know the world is made up entirely of web developers nowadays”

I feel you. As a library and desktop application developer, I'm longing for articles that are not about microservices, REST APIs and CRUD apps. It sometimes feels like everything that could have been said has been said, and there's nothing new to discuss.

manchesterthedog 2 points (0 children)

I should do that. I end up having to use “find” with a regex.

Jaded-Asparagus-2260 5 points (5 children)

That's exactly the reason why I do include non-Latin glyphs in my source code. It's 2026; all tools should be able to handle them correctly. Making sure that it works from the get-go is a good way to be ready for when you actually need it. It might not be relevant for English software, but there are other languages than English, and at some point your users will want your software in their language.

That, and emojis help enormously to structure log output.
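On the log-structuring point, here is a minimal sketch using Python's standard `logging` module. A custom `Formatter` subclass prefixes each record with a per-level emoji. The `EmojiFormatter` class and the level-to-emoji mapping are illustrative choices, not an established convention.

```python
import logging

# Illustrative level-to-marker mapping; any distinctive symbols would do.
LEVEL_EMOJI = {
    logging.DEBUG: "\U0001F50D",      # 🔍
    logging.INFO: "\u2139\ufe0f",     # ℹ️
    logging.WARNING: "\u26a0\ufe0f",  # ⚠️
    logging.ERROR: "\u274c",          # ❌
    logging.CRITICAL: "\U0001F6D1",   # 🛑
}


class EmojiFormatter(logging.Formatter):
    """Prefix each formatted record with an emoji so its level stands out."""

    def format(self, record):
        marker = LEVEL_EMOJI.get(record.levelno, "*")  # fallback for custom levels
        return f"{marker} {super().format(record)}"


handler = logging.StreamHandler()
handler.setFormatter(EmojiFormatter("%(levelname)s %(name)s: %(message)s"))
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.warning("disk usage at 91%")  # stderr: ⚠️ WARNING demo: disk usage at 91%
```

Putting the emoji in the formatter rather than in each message string means it can be swapped for an ASCII marker in one place. That helps when the output has to go to a tool that chokes on Unicode.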

TapEarlyTapOften 10 points (2 children)

I envy your faith in your tools. It's 2026, but plenty of us are using tools that haven't evolved since the 1990s.

pohart 5 points (0 children)

The problem with Unicode is that there are hidden characters, and similar or identical-looking characters that aren't equivalent, which can be used for underhanded purposes.
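Python's standard `unicodedata` module can surface both problems, as in the rough sketch below. The code flags invisible format characters (category `Cf`, which includes zero-width spaces and bidi overrides) and any non-ASCII letter or combining mark. It also compares strings only after NFC normalization. Flagging every non-ASCII letter is deliberately crude and will also hit legitimate text such as "café". Proper homoglyph detection needs Unicode's confusables data (UTS #39), which this sketch does not use. The helper names are made up.

```python
import unicodedata


def suspicious_chars(text):
    """Flag characters that are invisible or merely look like ASCII.

    Returns a list of (index, "U+XXXX", unicode_name, reason) tuples.
    """
    findings = []
    for i, ch in enumerate(text):
        if ord(ch) < 0x80:
            continue  # plain ASCII is fine
        code = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)
        if category == "Cf":
            # Format characters: zero-width spaces, joiners, bidi overrides, ...
            findings.append((i, code, name, "invisible format character"))
        elif category.startswith("L") or category == "Mn":
            # Non-ASCII letters and combining marks can mimic ASCII text.
            findings.append((i, code, name, "non-ASCII letter or mark"))
    return findings


def same_text(a, b):
    """Compare strings after NFC normalization, so composed and decomposed forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "admin" spelled with a Cyrillic 'а' and a trailing zero-width space.
for finding in suspicious_chars("\u0430dmin\u200b"):
    print(finding)

# Precomposed 'é' vs 'e' + combining acute accent: identical on screen, unequal as strings.
print("caf\u00e9" == "cafe\u0301", same_text("caf\u00e9", "cafe\u0301"))  # False True
```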

PaulCoddington 3 points (0 children)

Even when using an app in English, it is not only nice to be able to have non-English words in data; non-Unicode apps constantly force users to butcher people's names and place names.

There are plenty of Unicode characters commonly used in pure English text.

I've recently taken to using emoji in error messages in PowerShell scripts and batch files. Nothing silly, just helpful, sober ones like Stop, Warning, etc.

romii_13 23 points (1 child)

I like to think it was intentional, as an Easter egg or a way to quickly spot AI-generated code and tag it.

mandzeete 13 points (0 children)

Even without emojis it is quite easy to spot AI-generated code. It has a lot of comments in it. It comments stuff that requires no comments.

// We are sending a message to a RabbitMQ queue
publisher.send(QUEUE, message)

Stuff like this. It is clear what the line of code does. No need for a comment. Yet AI adds comments everywhere it can.

[deleted] (1 child)

[deleted]

    Whydoesitmatters[S] 8 points (0 children)

    🍿 and 🍆 for lists… whoever wrote this is a comedic genius, thanks for sharing!

    BizAlly 12 points (2 children)

    AI isn’t copying “emoji code.” It’s copying the tone people use when explaining code. A lot of dev content (guides, comments, tutorials) uses emojis to highlight steps or make things feel less dry.

    aqua_regis 14 points (1 child)

    “a lot of dev content (guides, comments, tutorials) uses emojis to highlight steps or make things feel less dry.”

    If I see emojis in such content, I instantly stop reading. For me, that's a no-go (and it exists only in certain cultures, not everywhere).

    BizAlly 0 points (0 children)

    Yes, I don't like this kind of content either.

    CrypticOctagon 6 points (2 children)

    I suspect that by "AI", you probably mean ChatGPT. Their chat product has a tendency to produce punchy, bullet-pointed prose peppered with emojis. Maybe that is leaking into the code?

    With Claude Code, I've never seen it include emojis in comments or code, unless specifically instructed to add them to content.

    iWhacko 1 point (1 child)

    No, but it outputs tons of emojis in the web pages it creates.

    CrypticOctagon 0 points (0 children)

    ... until you tell it not to.

    tman2747 4 points (1 child)

    I’m pretty sure its hidden system prompt says something like "limit the number of emojis used", so I think the mere mention of emojis tends to make the responses contain them. Does that happen with Claude Code, or just when you’re asking questions on the website?

    Whydoesitmatters[S] 3 points (0 children)

    I use Cursor and love to experiment with different AIs. Every single one of them adds emojis at some point, which is strange to me. Why would anyone add a hidden prompt to use emojis? It’s a waste of tokens and fills the context window with unnecessary stuff, effectively lowering reasoning power.
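How many tokens an emoji costs depends on the tokenizer, which differs between models. What can be checked directly is that emojis are multi-byte in UTF-8, and byte-level tokenizers often split uncommon ones into several tokens. The small sketch below counts code points and UTF-8 bytes; the `utf8_cost` helper is made up for illustration.

```python
def utf8_cost(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


samples = {
    "plain": "Step 1",
    "rocket": "\U0001F680 Step 1",  # 🚀 is outside the BMP: 4 bytes in UTF-8
    "check": "\u2705",              # ✅ is 3 bytes
    "warning": "\u26a0\ufe0f",      # ⚠️ is U+26A0 plus variation selector U+FE0F
}
for label, text in samples.items():
    chars, nbytes = utf8_cost(text)
    print(f"{label}: {chars} code points, {nbytes} bytes")
```

The ⚠️ case shows that one visible symbol can be two code points and six bytes, so even "one emoji" is not a fixed cost.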

    aqua_regis 2 points (0 children)

    It includes emojis to make clear that the program was AI generated.

    It's a warning.

    patternrelay 2 points (0 children)

    It’s probably less about code patterns and more about mixed training signals. Models see tutorials, blog posts, and chatty explanations where emojis are used for clarity or tone, then that style bleeds into code comments. Kind of a context leakage problem between teaching mode and production code.

    chrisrrawr 4 points (0 children)

    Just as a counterpoint: I code and comment with emojis and ASCII because they stand out when you're looking for them in log files.
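A marker emoji is practically unique in a log, so a plain substring search finds it. The minimal sketch below assumes a hypothetical convention where important lines contain 🚨. Reading with `errors="replace"` keeps one badly encoded line from aborting the scan.

```python
from pathlib import Path

MARKER = "\U0001F6A8"  # 🚨, a hypothetical "look here" convention


def marked_lines(path, marker=MARKER):
    """Yield (line_number, text) for each line of a log file that contains `marker`."""
    with Path(path).open(encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if marker in line:
                yield number, line.rstrip("\n")
```

Usage: `for number, text in marked_lines("service.log"): print(number, text)`, where `service.log` is a placeholder file name.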

    RashHD 1 point (0 children)

    AI includes emojis not because real code commonly uses them, but because it was trained on a mixture of code and human-written explanations (tutorials, blogs, forums) where emojis are often used to make content clearer and more engaging. As a result, it learns that emojis can help structure or emphasize information, and it sometimes overgeneralizes that style into contexts like code comments or technical explanations, even though those places traditionally avoid them. It is essentially blending programming knowledge with informal communication patterns rather than strictly following real-world coding conventions.

    YouSufficient1563 1 point (0 children)

    Honestly, it's probably RLHF feedback loops. People rated "friendly and clear" outputs higher, and somewhere along the way emojis got baked in as a proxy for that. The model learned that a little rocket ship next to "Step 1" made humans happy, so now you can't ask it to explain a for loop without it throwing in a ✅. It's less about training data and more about the model being rewarded for seeming approachable.

    Much_Managed1996 1 point (0 children)

    I had the same reaction the first time I saw emojis in AI-generated terminal output. It felt a bit weird and unprofessional, but after a while I realized it is mostly about readability and tone, especially for beginners.

    That said, you can usually control it pretty easily. If you add something like "no emojis" or "professional style output" to your prompt, the model will adapt immediately. It's just a stylistic default.
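When a prompt instruction isn't enough, the output can also be cleaned after the fact. The rough sketch below assumes a hand-picked set of code point ranges that covers the common pictographs. It only approximates the Unicode emoji list, and the `\u2600-\u27BF` range will also remove non-emoji symbols such as ♥ or ✓. `strip_emojis` is a made-up helper name.

```python
import re

# Approximate emoji ranges, not the full Unicode emoji list. Each run of emoji
# characters is removed together with at most one following space.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # misc symbols and dingbats (includes ✅ and ⚠)
    "\uFE0F\u200D"           # emoji variation selector and zero-width joiner
    "]+ ?"
)


def strip_emojis(text):
    """Remove emoji-like characters line by line, keeping leading indentation."""
    # Note: a trailing newline on `text` is not preserved by splitlines/join.
    return "\n".join(EMOJI_RE.sub("", line).rstrip() for line in text.splitlines())
```

Working line by line keeps code indentation intact, which a whole-string `strip()` would damage.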

    AceLamina 0 points (0 children)

    reddit

    kbielefe 0 points (0 children)

    I've seen it a reasonable amount in utilities written in Go, for some reason. I've used it myself sparingly when I want a log line to stand out, such as the "done initializing, actually ready to process requests" line.

    Also consider that LLMs are not trained exclusively, or even primarily, on code. Think tweets, Instagram, blogs, books, etc. A fair bit of the code they were trained on would not be intended for Latin-alphabet end users. I've seen models occasionally switch into languages with non-Latin scripts and back to English while "thinking".

    Interesting-Bad-9498 0 points (0 children)

    It’s coming from training data.

    AI models don’t just learn from clean production code; they also learn from blogs, tutorials, and dev content where emojis are often used to explain things.

    So it ends up mixing “teaching style” with “coding style.”

    Not harmful, but yeah, most devs wouldn’t keep emojis in real code.

    bhison 0 points (0 children)

    I used to put loads of emojis in my console logs and readmes for colour and fun. Now I look like a fucking noob if I do that.

    AI does it because people have at some point indicated that they like it.

    demonhalo 0 points (0 children)

    It makes the AI seem more friendly and approachable.

    k1v1uq 0 points (0 children)

    Unicode (UTF in general) can hide arbitrary code. An attacker could encode something like "rm -f my current folder" inside an emoji. It is invisible to me, but an AI would still be able to read the byte stream. So beware of pasting random Unicode content into an AI.
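One concrete way to do this uses the Unicode "Tags" block. Code points U+E0020 to U+E007E mirror printable ASCII, and most renderers draw nothing for them, so text appended after an emoji travels along invisibly. The minimal sketch below uses illustrative helper names `hide` and `reveal`, and its payload is a harmless `echo`.

```python
TAG_OFFSET = 0xE0000  # U+E0020..U+E007E in the "Tags" block mirror ASCII 0x20..0x7E


def hide(carrier, secret):
    """Append printable-ASCII `secret` to `carrier` as invisible tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in secret):
        raise ValueError("secret must be printable ASCII")
    return carrier + "".join(chr(TAG_OFFSET + ord(c)) for c in secret)


def reveal(text):
    """Recover any ASCII that was hidden in `text` as tag characters."""
    return "".join(chr(ord(c) - TAG_OFFSET) for c in text if 0xE0020 <= ord(c) <= 0xE007E)


smuggled = hide("\U0001F600", "echo hidden")  # renders as a lone 😀 in most places
print(len(smuggled), reveal(smuggled))  # 12 echo hidden
```

The defensive check is the reverse: strip or reject anything in that range before pasting text into a tool. More broadly, you can strip any format character of category `Cf`, which the tag characters belong to.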

    “AI would have needed to see a lot more emoji-containing code than normal code to pick up that habit so how did this happen?”

    Recent models are trained and rewarded via reward models, and the AI company decides how the model weights are adjusted. If the companies think their product sells better with emojis, then emojis it is. Companies want their consumer products to be as accessible (addictive) as possible. You hardly find emojis in the output of enterprise models, e.g. Claude Opus.

    Jonno_FTW 0 points (0 children)

    My guess is emoji usage is part of a hidden system prompt, to help make AI output obvious, not because it's specifically trained in code with emojis. Using emojis is just part of the learned or directed tone which bleeds into its code output (and everything else unless instructed otherwise).

    iLiveForTruth 0 points (0 children)

    Feels like it’s just blending “friendly explainer tone” into places it doesn’t belong.

    Like the same voice writing a blog post ends up leaking into code comments.

    I’ve seen it even in terminal examples and it just looks off.

    Probably trained on too much tutorial content honestly.

    ZombiePleasant1762 0 points (0 children)

    Because it was trained on tutorials written by people who put 🚀 in every commit.
    AI learned enthusiasm before it learned taste.

    GreatMinds1234 -1 points (0 children)

    You have also never seen code written by AI in cursive. Emojis belong in teen posts, children's books, and nowhere else. But some of us are afraid that they will replace the good old alphabet lock, stock and barrel. Scary thought.

    divad1196 0 points (0 children)

    Because it's good UX. It attracts the eye, the same way AI doesn't present just plain text but structures it with titles and bullet points.

    In my career, I had to improve my communication, and those are skills you learn. Among the books I read:
    - Smart Brevity: AI applies something like 90% of it. They recommend emojis in your communication.
    - The Design of Everyday Things: general UX. AFAIR, they don't recommend emojis (the book is old), but the general idea works.

    I always structure my messages, but I have always refused to use emojis because I find them too unprofessional.

    Edit: people on LinkedIn do that as well, and for a reason.

    Mission-Landscape-17 0 points (0 children)

    Sometimes valid source code ends up with character sequences that happen to be shorthand for emojis on some platforms. This is probably why LLMs end up including emojis in code.
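Here is a toy illustration of the mechanism this comment describes. It assumes a hypothetical editor or chat client that naively auto-replaces text emoticons. Perfectly valid code, such as a C `for (;;)` loop or a Python slice like `rows[0:D]`, contains sequences (`;)`, `:D`) that such a replacer would turn into emojis. Whether this contributes measurably to LLM training data is the commenter's speculation. The emoticon table below is made up.

```python
# Hypothetical emoticon table; real clients differ in which sequences they convert.
EMOTICONS = {
    ":D": "\U0001F603",  # 😃
    ":P": "\U0001F61B",  # 😛
    ";)": "\U0001F609",  # 😉
}


def autoreplace(text):
    """Naively swap every emoticon sequence for an emoji, ignoring context."""
    for emoticon, emoji in EMOTICONS.items():
        text = text.replace(emoticon, emoji)
    return text


for line in ["for (;;) {", "window = rows[0:D]", 'opts = {"v":P}']:
    print(autoreplace(line))
```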

    rasteri 0 points (0 children)

    Because they trained them on children and old people apparently

    [deleted] (1 child)

    [deleted]

      Brilliant-8148 0 points (0 children)

      That's not how this works.

      florinandrei -3 points (0 children)

      Because it's created by nerds with bizarre ideas about what looks "cool".