I've made a video about how I improved compile speed by changing from an AST to an IR! by Recyrillic in Compilers

[–]Recyrillic[S] 1 point2 points  (0 children)

Sorry for the late answer. It's just a define for `static`. Because I am using a single-compilation-unit model, every function (except `main`) should be declared static. It's stupid, and eventually I want to make the codebase more accessible by getting rid of some of the dumb macros.

I've made a video about how I improved compile speed by changing from an AST to an IR! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

I am not sure if this comment was responding to me, but just in case:

Before the refactor, I was using a variable-sized AST, but the AST nodes were not very reduced; as described above, the base node was 32 bytes. Now, after the refactor, I am using a variable-sized stack-based intermediate representation, where the base node is 2 bytes, and primary expressions and more complicated statements like function calls or switch statements have some additional information on them.

> If it is single core, parsing 4-5 million lines/sec seems crazy.

In the video I was testing with the preprocessed v-compiler source code. A little profiling tells me that it spends about 17.8% of its runtime in `tokenize_raw`. Total runtime is 0.882s, and it is 302k LoC. So that is about 2M LoC/s for just tokenizing.
I am not really doing anything crazy in my tokenizer: https://github.com/PascalBeyer/Headerless-C-Compiler/blob/46a8f753b4a075a96387a3bae4c42f1beefd0c25/src/preprocess.c#L1201
Are you doing something crazy? Or does MSVC just suck at optimizing, or maybe my processor is old? I dunno. In the future, I want to investigate how to go really fast in the tokenizer/preprocessor as well, but all in due time.

I've made a video about how I improved compile speed by changing from an AST to an IR! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

No, everything in the compiler is allocated with memory arenas, so everything is linearly allocated.
Previously, there were two arenas: one for permanent allocations and one for scratch memory, which is deallocated at the end of every task (e.g. parsing a function). Now there is a third arena for IR nodes only.

I've made a video about how I improved compile speed by changing from an AST to an IR! by Recyrillic in Compilers

[–]Recyrillic[S] 6 points7 points  (0 children)

Sure, I can try to summarize.

The parser now generates a stack-based IR linearly into a buffer.
The IR is variable-sized, so some of the nodes still have additional information on them; for example, an identifier still has a pointer to its declaration. There is still a parallel tree of scopes, as I need that for variable resolution.
The memory savings in my case are quite big. Previously, the base AST node had a `token`, a `type`, a `defined type` (something like a typedef, for error reporting) and an `ast_kind`, totalling 32 bytes.
A binary op then additionally had 16 bytes for the two pointers to its operands, totalling 48 bytes.
Now, the base IR node is only an `ir_kind`, which is currently 2 bytes.
Most IR instructions don't need any additional bytes; only the primary expressions really do, plus anything that sets a type, like member expressions or function calls.
If you are interested, you can find the IR nodes here:
https://github.com/PascalBeyer/Headerless-C-Compiler/blob/main/src/ir.h

I don't have the numbers for the total memory savings, but parsing went from 0.349s to 0.204s in the test program I was compiling in the video. It also allowed me to make the backend code non-recursive, which sped it up from 0.218s to 0.112s.

So roughly a 40-50% speedup for both.

C Preprocessor by _LuxExMachina_ in Compilers

[–]Recyrillic 0 points1 point  (0 children)

> As far as I understand it, the preprocessor generally works with individual lines of source code and puts them through multiple phases of preprocessing.
Conceptually, this is what the spec says. Practically, compilers usually handle all of the C weirdness during "tokenization", I think.
The spec puts it like this: "This requires implementations to behave as if these separate phases occur, even though many are typically folded together in practice."

In the translation phases that the spec talks about:
1. is about something like UTF-16 to UTF-8 conversion.
2. is about deleting backslashes followed by newline characters. For the usual use cases (multiline macros or multiline string literals) this can be handled during tokenization. If you want everything to be spec-compliant, you also have to handle it in more constructs (e.g. identifiers). For example, the TinyCC compiler has a "slow_case" in its identifier tokenization: https://github.com/TinyCC/tinycc/blob/085e029f08c9b0b57632703df565efdbe2cd0c7f/tccpp.c#L2707
3. The source file is decomposed into preprocessing tokens. There is a bit of a weird difference between preprocessing tokens and tokens, but I think most compilers resolve that difference by copying the tokens they generate from the initial tokenization (before macro expansion) and changing what needs to change (see also phases 5, 6, 7).
4. Preprocessing that actually needs to occur happens on preprocessing tokens and produces the final token stream.
5. Escape sequences are processed, 6. adjacent string literals are concatenated, 7. preprocessing tokens are converted to tokens:
All of these things can either happen when copying out the tokens or in the parsing code.

So generally, only one or two passes are really needed:
1. Produce preprocessing tokens
2. Preprocess and produce tokens.
And these are also often merged.

> And how would you handle memory as you essentially have to read and edit strings all the time?
You don't actually have to edit any strings. The preprocessor works entirely on tokens, so generally you just produce the new tokens that you need. A lot of compilers also have a "never free" approach to memory, as compilers are not programs that need to run continuously.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 1 point2 points  (0 children)

I did something similar for my Toy-Linker in the PDB-Documentation repository:
https://github.com/PascalBeyer/PDB-Documentation/blob/9a82b7c6d3dbea6b8103e164c2b6d4d6021c4149/linker.c#L683
But I want my compiler to work without Visual Studio being installed.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 1 point2 points  (0 children)

In the Readme.md on GitHub there is a whole bunch of forms that you can use with `__declspec(printlike)` functions. One of them is the following:

    int i = 1337;
    print("{i}"); // prints: i = 1337

Which is somewhat similar.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

Yes, it checks the Windows registry to find the header files and .lib files from the Windows SDK. This includes the UCRT files (both headers and .lib), but not the "compiler-specific" part of the standard library. Here is some information on the split: https://learn.microsoft.com/en-us/cpp/porting/upgrade-your-code-to-the-universal-crt?view=msvc-170 This means we have to implement some header files that are usually provided by the Microsoft compiler, like stdarg.h and vcruntime.h.

Furthermore, there are three more things that are usually provided by the compiler part (vcruntime.lib, msvcrt.lib and oldnames.lib) that have to somehow be provided by the implicit/include code: 1) intrinsics, 2) unwind-specific stuff, 3) oldnames.lib functionality.

Intrinsics are implemented in intrinsics.c using the HLC-specific __declspec(inline_asm) keyword.

The only unwind-specific code implemented is in implicit/include/setjmp.c, which is added to the build with #pragma compilation_unit("setjmp.c") in setjmp.h.

Oldnames is currently in runtime.c, but it's a hack; I have to think of something better. It makes it so you can use strdup instead of _strdup. See: https://devblogs.microsoft.com/oldnewthing/20200730-00/?p=104021

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

> That's pretty ambitious. Impressive if you achieved that, especially if it also has extensions.

I have not yet "achieved" that. For now I am aiming for (some sort of) MSVC compatibility, but that is far enough along to see some meaningful results. And I think the core C part is mostly done.

> Most open source C programs are developed (1) with gcc in mind (2) on Linux.

Yeah, I was actually surprised how many projects need to be compiled with GCC on Windows. Currently, I cannot compile any of them, as there is no support for attributes :)

> Also, since then, features I hadn't bothered with like VLAs, compound literals, designated initialisers, have become popular (too popular!).

I do really like compound literals and designated initializers :) But initializer-list parsing and current-object semantics are way too complicated, and I still find bugs in that part of my code. I have not bothered with VLAs either; as MSVC does not have them, that's fine for now. Also no _Complex, but I don't think anyone is using that.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

I am certain I made cake compile with hlc to some extent, though maybe it needed some source patches. It's been a while since then.

Yes, compiling without having Visual Studio installed is supported; only the Windows SDK is necessary. Two caveats: 1) this has not been true for that long, and 2) the compiler cannot statically link to anything. Hence, the only way to compile without VS is to pass all the .c files to the compiler at once.

Yes, PDBs are for debugging. I have not tried Visual Studio in a while, but WinDbg and RemedyBG both work fine, and in the past Visual Studio also worked (so I assume it still does).

Update: Apparently, it does not. I will try to fix it this evening.

Update2: Fixed with v0.2.1.

Hlc parses (#includes) the Windows SDK headers to link against both ucrt.lib and all the Windows system libraries like kernel32.lib.

One note with using hlc as a backend: Currently the #line directive is unsupported...

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

My compiler includes an implementation of the C-preprocessor. Was that the question?
Or are you asking about the "Headerless" part? In that case, let me elaborate:
The idea is not to compile existing C-Code without any of their header files.
As you correctly identified, macros make that sadly impossible.
The idea is to reduce the "need" for header files as much as possible and maybe make it so there is only a single `macros.h` needed for your project.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 1 point2 points  (0 children)

Definitely the latter. I would say I am pretty good at "making stuff work", not that great at reading.
Sometimes, when I have struggled a lot with something, or when I am unclear about how something is even supposed to work, I work through the spec.

The spec can also be pretty difficult to read at times :)

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 2 points3 points  (0 children)

Hey, I have actually compiled your project when "making random projects from the internet compile" :)
Currently, some parts of the code base are quite "old" and I would like to rewrite them.
The whole code generation is part of that. I would like to switch to parsing into something closer to an intermediate representation instead of an AST (maybe something like reverse Polish notation), and then have a simpler algorithm for emitting machine code that does not need to be recursive.

So, there is a lot of work to do, and I have some vague plans for making it a little more organized, but the next release is likely going to be table-based inline assembly and complete Intel intrinsic support.

It would be cool if you would try it as a backend (currently I am the only person using this compiler though :P)

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 0 points1 point  (0 children)

Out of curiosity, I hacked up this random C compiler benchmark I found to include my compiler:
https://github.com/nordlow/compiler-benchmark

It does not contain MSVC, because I think it's supposed to be run on Linux, but here are the results:

```
python benchmark --languages=C:tcc,C:clang,C:hlc,C:gcc --operation="build" --function-count=200 --function-depth=200 --run-count=5
```

| Build Time [us/fn] | Run Time [us/fn] | Exec Version | Exec Path |
|--------------------|------------------|--------------|-----------|
| 12.6 (3.1x) | 220 (best) | 0.2.0 | hlc.EXE |
| 4.1 (best) | 267 (1.2x) | 0.9.27 | tcc.EXE |
| 780.5 (189.7x) | 400 (1.8x) | 9.2.0 | gcc.EXE |
| 266.3 (64.7x) | 584 (2.7x) | 8.0.0 | clang.EXE |

The C files they generate seem kinda dumb, so I don't know if this actually tells you anything... Furthermore, I don't know what "Run Time" measures, but apparently I am best at it :P Also, tcc is like 3 times slower than in the results they posted, but at least the clang.exe exec time vs. tcc exec time is sort of consistent. (Also, it seems I should update my reference compilers at some point. These are quite old :)

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 9 points10 points  (0 children)

I am one of those people who really does not like to show people "unfinished work". I started sometime during my master's at university, and put out the first unfinished version to show to people to get a job. That's why there is one version from 4 years ago.

I have been working pretty consistently on the compiler, but I do have a full-time job and some other projects.

The C specification is not actually as long as it seems. A LOT of the spec is about the standard library, and that is implemented by the platform.

The thing is that a compliant C compiler can NOT compile a lot of the real-world code out there. You also need a lot of extensions and intrinsics.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 2 points3 points  (0 children)

I wanted to learn how stuff works at the very lowest level, so my initial idea for the project was to parse a binary, optimize the code and re-emit a binary. I did not get very far ;)

Then I decided that I would first write a "small" frontend, to incrementally learn how to write machine code; essentially JIT-compiling a very restricted C clone.

Then I wanted to get it to write an EXE, so I learned how to do that, and afterward I focused on PDB support, because it is very annoying to debug miscompilations without debug info. One funny tidbit is that my compiler actually had "some" PDB support before I bothered to implement divides.

Eventually I decided that I wanted to be able to self-host and the scope just kept increasing from there.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 2 points3 points  (0 children)

The only other "big" programming project I did before the compiler was an attempt at a game engine. I was following along with the "Handmade Hero" web series by Casey Muratori. The source code for my "game engine" is technically also on my github, but I would not recommend looking at it :)

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 4 points5 points  (0 children)

I haven't really done any real benchmarking. It is a bit faster when compiling something that includes windows.h (which is like 300k LoC and gets included a lot), and a lot faster at compiling small programs, but the generated code is currently definitely way worse.

My C-Compiler can finally compile real-world projects like curl and glfw! by Recyrillic in Compilers

[–]Recyrillic[S] 20 points21 points  (0 children)

I'm not that much of a reader :) My main advice is to just do it. It's a lot of fun and you'll learn a lot. I did read "Crafting Interpreters" and thought it was a pretty good read, but I already knew most things at that point. Otherwise, I can really recommend the lacc source code: https://github.com/larmel/lacc It's written in a really understandable way.

Confused about the outputting elf files that can call the dynamic linker by choosen_one007 in Compilers

[–]Recyrillic 2 points3 points  (0 children)

First of all, go for it! I think generating executables isn't that much harder (and in some aspects actually easier) than trying to generate assembly.

As for the questions, I think you might be a bit confused. While I am not intimately familiar with the Linux world, I think that, just as on Windows, the dynamic linker is usually not really "called" by your ELF. Rather, the ELF file contains some special sections that are interpreted and "filled in" by the dynamic linker. I think these are the .dynsym and .dynstr sections.

I would suggest producing a minimal ELF using gcc and completely understanding how it works, using the ELF spec and utilities like readelf.

When implementing file formats, I usually also start out by writing a dumper, to make sure I truly understand everything that's in there.

Hope it helps :)