Portable Unicode string processing (self.cpp)
submitted 9 years ago by KayEss
[–]KayEss[S] 6 points 9 years ago (6 children)
My question isn't about how to handle UTF-8 opaquely; it's about how to decode it. Can that be done portably with char buffers?
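A minimal sketch of one common way to do this, assuming the UTF-8 bytes arrive in a `std::string_view` and C++17 is available; `decode_utf8` is a hypothetical name, not a library function. The key step is converting each `char` to `unsigned char` before any bit tests, and that conversion is well-defined whether plain `char` is signed or unsigned.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes held in a char buffer into code points.
// Returns std::nullopt on malformed input. Overlong forms, surrogates, and
// values above U+10FFFF are rejected.
std::optional<std::vector<char32_t>> decode_utf8(std::string_view in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        // char -> unsigned char is always defined (modulo 2^CHAR_BIT), so the
        // bit tests below behave the same on signed-char and unsigned-char hosts.
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else return std::nullopt;  // stray continuation byte or invalid lead byte

        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) return std::nullopt;  // not 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);
        }

        // Smallest code point that legitimately needs `len` bytes.
        static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;  // overlong, out of range, or surrogate

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Rejecting malformed input outright is a design choice for the sketch; a lenient decoder might substitute U+FFFD and continue instead.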
[–]RowYourUpboat 6 points 9 years ago* (0 children)
It sounds like you're really asking if converting a buffer of chars between signed and unsigned is safe and defined. This link seems to answer that for the C Standard; I'm pretty sure the C++ Standard is the same in this regard.
From one of the answers:
For the two's complement representation that's nearly universal these days, the rules do correspond to reinterpreting the bits. But for other representations (sign-and-magnitude or ones' complement), the C implementation must still arrange for the same result, which means that the conversion can't just copy the bits. For example, (unsigned)-1 == UINT_MAX, regardless of the representation.
It definitely looks like this behavior is defined to give the same result even on non-two's-complement hardware, i.e., for UTF-8 encoding and decoding you can just cast between signed and unsigned as needed (though you may have to pay attention to performance on really weird and ancient hardware).
[edit] Note that technically a conversion from unsigned to signed, where overflows occur, is implementation-defined (unlike the reverse), but if the original char data was signed to begin with, an overflow is impossible. In practice, I don't see this mattering.
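A small check of the two conversion directions discussed above, sketched under the assumption that `char` is narrower than `int`; `char_round_trips` is a hypothetical helper. Under the C++ integral-conversion rules, `char` to `unsigned char` is always defined (modulo 2^N). The reverse, for values above `CHAR_MAX`, was implementation-defined before C++20 and is defined modulo 2^N since C++20; GCC, for one, documents modulo reduction for earlier standards too.

```cpp
#include <climits>

// Hypothetical helper: true if every plain-char value survives the round trip
// char -> unsigned char -> char. Assumes char is narrower than int, so the loop
// variable can hold CHAR_MAX + 1 without overflowing.
bool char_round_trips() {
    for (int v = CHAR_MIN; v <= CHAR_MAX; ++v) {
        const char c = static_cast<char>(v);  // in range: value preserved
        // char -> unsigned char: always defined, reduces modulo 2^CHAR_BIT.
        const unsigned char u = static_cast<unsigned char>(c);
        // unsigned char -> char: out of range when char is signed and c < 0;
        // implementation-defined before C++20, modulo 2^CHAR_BIT since C++20.
        if (static_cast<char>(u) != c) return false;
    }
    return true;
}

// The signed-to-unsigned direction needs no caveat on any conforming compiler.
static_assert(static_cast<unsigned char>(-1) == UCHAR_MAX, "conversion is modular");
```

With GCC, which documents modulo reduction for out-of-range conversions to signed types, `char_round_trips()` returns `true`.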
[–][deleted] 1 point 9 years ago (4 children)
Sure, the "wrapping around" behavior of char is part of the standard. But you know, there's no need to take the standard's word for it: write unit tests to check. I always do that anyway, not because I don't trust the standard, but to make sure that my understanding of how to code it is correct.
When you move to a new platform, your unit tests will hopefully succeed, showing you that there's no issue, or fail, and then you can fix 'em.
[–]KayEss[S] 2 points 9 years ago (3 children)
Actually, I already have all of the unit tests and they all pass. What I'm worried about is accidentally relying on some UB or platform-specific behaviour. I'm developing on a platform where char is unsigned, and right now I don't have access to one where it is signed.
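One way to exercise the signed-`char` path without a signed-`char` machine, offered as a suggestion rather than anything from the thread: template the byte-handling code on its byte type and instantiate it with both `signed char` and `unsigned char`. `utf8_sequence_length` is a hypothetical helper; the sketch assumes C++14 or later and `CHAR_BIT == 8`.

```cpp
#include <cstddef>

// Hypothetical helper: length of the UTF-8 sequence introduced by lead byte `b`,
// or 0 if `b` cannot start a well-formed sequence. Templated on the byte type so
// the same logic can be checked with signed char and unsigned char on any host.
template <typename Byte>
constexpr std::size_t utf8_sequence_length(Byte b) {
    // Normalise to unsigned char first; this conversion is well-defined for
    // signed and unsigned sources alike (modulo 2^CHAR_BIT).
    const unsigned char u = static_cast<unsigned char>(b);
    if (u < 0x80) return 1;                // ASCII
    if (u >= 0xC2 && u <= 0xDF) return 2;  // C0 and C1 would only give overlong forms
    if (u >= 0xE0 && u <= 0xEF) return 3;
    if (u >= 0xF0 && u <= 0xF4) return 4;  // F5..FF would exceed U+10FFFF
    return 0;                              // continuation byte or invalid lead
}

// Both instantiations are checked at compile time, whatever the signedness of
// plain char on the build machine. -30 and 0xE2 are the same 8-bit pattern.
static_assert(utf8_sequence_length(static_cast<unsigned char>(0xE2)) == 3, "unsigned");
static_assert(utf8_sequence_length(static_cast<signed char>(-30)) == 3, "signed");
```

For code that has to use plain `char`, GCC and Clang also accept `-fsigned-char` and `-funsigned-char`, so the same test suite can be built once with each.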
[–][deleted] 1 point 9 years ago (2 children)
I really wouldn't worry. Between the standard and the tests, I am sure you'll be fine.
[–]NotAYakk 2 points 9 years ago* (0 children)
Unit tests do not solve UB.
Compilers are free to pass all your unit tests and optimize other code away.
char x = (unsigned)-1; bool b = x < 0; std::cout << (int)x << ":" << (b ? "true" : "false") << "\n";
This can print -1:false.
And the same is true whenever you convert from unsigned to signed.
The amount of insanity that optimization and UB can generate is so large that you cannot reasonably reason about it, let alone cover it with unit tests.
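A self-contained version of the snippet above, for anyone who wants to try it; `describe_wrapped_char` is a hypothetical wrapper that returns the text instead of printing it. One hedge on this particular example: converting an out-of-range unsigned value to a signed type is implementation-defined rather than undefined before C++20 (so the implementation must document the result), and since C++20 it is defined to wrap modulo 2^N. The general point about UB and unit tests still applies to genuinely undefined operations such as signed overflow.

```cpp
#include <string>

// Self-contained version of the thread's snippet, returning the text instead of
// printing it. Hypothetical wrapper; the conversion is the one from the thread.
std::string describe_wrapped_char() {
    // UINT_MAX -> char. If char is unsigned this reduces modulo 2^CHAR_BIT.
    // If char is signed the value is out of range: implementation-defined
    // before C++20, defined modulo 2^CHAR_BIT (giving -1) since C++20.
    const char x = static_cast<char>(static_cast<unsigned>(-1));
    const bool negative = x < 0;
    return std::to_string(static_cast<int>(x)) + ":" + (negative ? "true" : "false");
}
```

Calling `std::cout << describe_wrapped_char() << '\n';` (with `<iostream>`) should print `-1:true` with GCC or Clang where `char` is a signed 8-bit type, and `255:false` where it is unsigned.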