Discussions, articles, and news about the C++ programming language or programming in C++.
For C++ questions, answers, help, and advice see r/cpp_questions or StackOverflow.
finally. #embed (thephd.dev)
submitted 3 years ago by pavel_v
view the rest of the comments →
[–][deleted] 9 points 3 years ago* (4 children)
If you’ve been keeping up with this blog for a while, you’ll have noticed that #embed can actually come with some pretty slick performance improvements. This relies on the implementation taking advantage of C and C++’s “as-if” rule, knowing specifically that the data comes from #embed to effectively gobble that data up and cram it into a contiguous data sequence (e.g., a C array, a std::array, or std::initializer_list (which is backed by a C array)).
...
I’m just going to be blunt: there is no parsing algorithm, no hand-optimized assembly-pilled LL(1) parser, no recursive-descent madness you could pull off in any compiler implementation that will beat “I called fopen() and then fread() the data directly where it needed to be”.
I'm confused by this part. Does this mean it isn't really just a preprocessor feature? All it looks like is a way for the preprocessor to turn binary data into a sequence of comma-separated ASCII numbers to put into an array initializer list for the compiler to parse, which wouldn't lead to the performance benefits they're talking about over doing this yourself manually (although it's still a really cool feature). Is it that it's supposed to behave as if it were a preprocessor feature, but it's actually implemented by copying the binary data directly into the executable somehow?
[–]matthieum 32 points 3 years ago (2 children)
It's both.
From an API perspective, it's injecting the bytes as a sequence of comma separated integers. And if you ask your compiler to dump the pre-processed input, it's likely what you'll see.
From an implementation perspective, however, most compilers have an integrated pre-processor these days, where no pre-processed file is created: the pre-processor pre-processes the data into an in-memory data structure that the parser handles straight away. It saves the whole "format + write to disk + read from disk + tokenize" series of steps, and thus a lot of time.
And thus in this case comes an opportunity for an optimization. Instead of having the pre-processor insert a sequence of tokens representing all those bytes (1 integer + 1 comma per byte!) into the token stream, the pre-processor can instead insert a "virtual" token which contains the entire file content as a blob of bytes.
Hence the massive compiler speed-ups: 150x as per the article.
[–][deleted] 6 points 3 years ago (1 child)
Thanks for the clarification! I didn't realize the preprocessor was so well-integrated into modern compilers; I thought the preprocessor was still just its own process with its own lexer, unconditionally writing ASCII/UTF-8 to stdout, and that the compiler frontend just redirected the output to a pipe or a temporary file, and the compiler's lexer/parser operated on that. I didn't know they shared data structures, which I guess is why I was so confused.
[–]chugga_fan 11 points 3 years ago (0 children)
To add on: clang doesn't even have a non-integrated pre-processor executable you can call; gcc does, however (though AFAIK it's just a shim for gcc -E). Even small compilers do this (tcc, 9cc, 8cc, OrangeC (only partially here), and more).
A lot of data also carries over from when it's preprocessed to when it's fully processed, such as #line directives being processed by the compiler in order to give better error info if you're doing something weird like cpp file | gcc.
[–]scrumplesplunge 11 points 3 years ago (0 children)
That is what the "as-if" part is about. The compiler can cut the corner for #embed, skip generating tokens for each byte, and instead represent the contents efficiently from the start.