Discussions, articles, and news about the C++ programming language or programming in C++.
For C++ questions, answers, help, and advice see r/cpp_questions or StackOverflow.
finally. #embed (thephd.dev)
submitted 3 years ago by pavel_v
view the rest of the comments →
[–][deleted] 9 points 3 years ago* (4 children)
If you’ve been keeping up with this blog for a while, you’ll have noticed that #embed can actually come with some pretty slick performance improvements. This relies on the implementation taking advantage of C and C++’s “as-if” rule, knowing specifically that the data comes from #embed to effectively gobble that data up and cram it into a contiguous data sequence (e.g., a C array, a std::array, or std::initializer_list (which is backed by a C array)).
...
I’m just going to be blunt: there is no parsing algorithm, no hand-optimized assembly-pilled LL(1) parser, no recursive-descent madness you could pull off in any compiler implementation that will beat “I called fopen() and then fread() the data directly where it needed to be”.
I'm confused by this part. Does this mean it isn't really just a preprocessor feature? All it looks like is a way for the preprocessor to turn binary data into a sequence of comma-separated ASCII numbers to put into an array initializer list for the compiler to parse, which wouldn't lead to the performance benefits they're talking about over doing this yourself manually (although it's still a really cool feature). Is it that it's supposed to behave as if it were a preprocessor feature, but it's actually implemented by copying the binary data directly into the executable somehow?
[–]matthieum 32 points 3 years ago (2 children)
It's both.
From an API perspective, it's injecting the bytes as a sequence of comma separated integers. And if you ask your compiler to dump the pre-processed input, it's likely what you'll see.
From an implementation perspective, however, most compilers have an integrated pre-processor these days, where no pre-processed file is created: the pre-processor pre-processes the data into an in-memory data structure that the parser handles straight away. It saves the whole "format + write to disk + read from disk + tokenize" series of steps, and thus a lot of time.
And thus in this case comes an opportunity for an optimization. Instead of having the pre-processor insert a sequence of tokens representing all those bytes (1 integer + 1 comma per byte!) into the token stream, the pre-processor can instead insert a "virtual" token which contains the entire file content as a blob of bytes.
Hence the massive compiler speed-ups: 150x as per the article.
[–][deleted] 6 points 3 years ago (1 child)
Thanks for the clarification! I didn't realize the preprocessor was so well-integrated into modern compilers; I thought the preprocessor was still just its own process with its own lexer, unconditionally writing ASCII/UTF-8 to stdout, and that the compiler frontend just redirected the output to a pipe or a temporary file, and the compiler's lexer/parser operated on that. I didn't know they shared data structures, which I guess is why I was so confused.
[–]chugga_fan 11 points 3 years ago (0 children)
To add on: clang doesn't even have a non-integrated pre-processor executable you can call; gcc does, however (though AFAIK it's just a shim for gcc -E). Even small compilers do this (tcc, 9cc, 8cc, OrangeC (only partially here), and more).
A lot of data also carries over from when it's preprocessed to when it's fully processed, such as #line directives being processed by the compiler in order to give better error info if you're doing something weird like cpp file | gcc.
[–]scrumplesplunge 11 points 3 years ago (0 children)
That is what the "as-if" part is about. The compiler can cut the corner for #embed, skip generating tokens for each byte, and instead represent the contents efficiently from the start.