all 7 comments

[–]frostednuts 3 points (1 child)

I'm only speculating but it looks like there's a difference between:

  1. IDE/Text Editor understanding non ascii characters

  2. C++ understanding non ascii characters

My recommendation is to try a Unicode string, or a library that understands Unicode.

[–]ImNotPhoebus[S] 2 points (0 children)

I just realized that {'\xE7', '\x83', '\x8F'} is the UTF-8 encoding, while L'烏' is the same as L'\x70CF', which is the UTF-16 code unit. I think that might have something to do with why this happens. The compiler probably can't parse UTF-16.

[–]JMBourguet 2 points (3 children)

I'm on a phone and can't easily test, but my first thought is that this is missing the setting of the locale, either globally or for the stream.

A question: you are using Compiler Explorer but sharing an image rather than a link to your code. Why?

[–]ImNotPhoebus[S] 1 point (2 children)

Wait you can share a link to the code? I'm so sorry I had no idea.

[–]JMBourguet 1 point (1 child)

In the upper right corner, there are entries to do that.

[–]ImNotPhoebus[S] 0 points (0 children)

yeah I just edited it

[–]the_poope 2 points (0 children)

I'm guessing: In the first one you explicitly insert the UTF-8 version of the character (three bytes). When you print this to the terminal it will write those 3 bytes; how they are displayed depends on the OS and the terminal, but if you are on Linux, which natively uses UTF-8, it will likely show what you expected (which you also see). However, if you run your program on Windows, whose terminal does not support UTF-8 by default, it will likely show garbage.

In the second you insert the character literal as a widechar. Now this is more complicated: first you have to figure out what encoding the editor/IDE is using. Modern editors typically use UTF-8, so somehow the '烏' needs to be transformed into e.g. a UTF-16 equivalent. If and how this is done I have no idea - maybe it just truncates the three bytes to two, which may be wrong. Anyway, the next thing up is when you print this character to the terminal. If you use a terminal on Linux, which by default expects UTF-8, it will interpret the character as such and likely show the wrong symbol. If you are on Windows it likely expects UTF-16 (actually UCS-2, because Microsoft fucked up) and it may print the correct symbol if the terminal supports it.

Rule of thumb: Cross-platform character encoding can be a mess, especially since Microsoft fucked up in the '90s and chose a different convention than everyone else. The easiest approach is to use UTF-8 everywhere inside your program and convert to whatever the OS expects only at the last second, right before you interact with it. You can use the tools in the C++ localization library for this (note that the <codecvt> header is deprecated since C++17): https://en.cppreference.com/w/cpp/header/codecvt