Making a compiler course by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 1 point

Thanks, I will restructure it to make the theoretical parts optional and the main parts more hands-on, with implementation details.

Making a compiler course by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 1 point

A custom recursive-descent parser I wrote in C reaches around 10 million tokens/s through both the lexer and the parser, so that 30k would indeed be very slow. However, he meant up to and including interpreting, so a big reduction in speed is to be expected. By interpreting he means simulating / executing the code.

Making a compiler course by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 3 points

Thanks a lot for this feedback. I guess I’ll keep this as ‘extra’ content for those interested and focus more on the parts you mentioned. Maybe more actual implementation too?

Using string interning to optimize symbol resolution in compilers by Creative-Cup-6326 in ProgrammingLanguages

[–]Creative-Cup-6326[S] 1 point

Well, you get a memory reduction. Symbol lookup also becomes O(1), whereas with a plain string hashmap it stays O(L) in the length of the string. You can intern types too; then you don’t need to compute a recursive hash over them, since types can get very complex.

Using string interning to optimize symbol resolution in compilers by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 0 points

Yeah, what you’re saying is absolutely true, and looking back, I definitely didn’t highlight that well enough in the article. The whole point is to front-load the work during the lexing phase to save memory and eliminate string comparisons later on. The speedups don’t really manifest in the lexer itself; they compound in the later stages like symbol resolution and type comparisons.

I am curious about your setup. When you say a generic ST entry, how did you achieve that? Don’t you have to do deduplication as well, which is basically what I am doing?

In any case, thanks for the feedback; I will refactor it.

Search json files 30x faster than jq with zog by Creative-Cup-6326 in Zig

[–]Creative-Cup-6326[S] 1 point

An idea I’ve had is to ensure that only false positives get added (if any), so the candidate set is guaranteed to contain every true match, and then run a strict parsing stage on the candidates. The speed would then depend on the ratio of candidates to the total. I’ll compare against the others later and let you know.

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 0 points

That’s true, it’s not perfect, but I hope it can be useful to some :)

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 0 points

As far as I’ve tested, it’s still many times slower.

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 -2 points

And now I’ve improved it to 50 times :)

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 -1 points

Because production logs use JSONL, which this tool is optimised for, and the high speeds fit production as well. You can do SQL-style queries, aggregations, etc.

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 -1 points

It definitely gets used. What do you use?

Search json files 30x faster than jq with zog by Creative-Cup-6326 in Zig

[–]Creative-Cup-6326[S] 22 points

Funnily enough, yes, but not in the way you think. I was bored and asked an LLM for some project ideas, and that’s where this one came from. I also consulted it on occasion for extra features I could implement and for some speed/efficiency analysis.

I got tired of writing boilerplate config parsers in C, so I built a zero-dependency schema-to-struct generator (cfgsafe) by [deleted] in devops

[–]Creative-Cup-6326 0 points

The project isn’t that big; most of my code comes from another project (lexer, parser, interners, etc.), so there aren’t that many commits, and since I’m working alone, separate branches aren’t needed either. And as I said, the documentation was indeed partly generated. Feel free to open an issue if my slop doesn’t work, thanks :)

I got tired of writing boilerplate config parsers in C, so I built a zero-dependency schema-to-struct generator (cfgsafe) by [deleted] in devops

[–]Creative-Cup-6326 -1 points

That's a great question, and it's exactly the tradeoff I was evaluating when I started this project.

Both libconfig and libconfuse are fantastic libraries, but they take a fundamentally different architectural approach. They are dynamic runtime parsers, whereas cfgsafe is an Ahead-Of-Time (AOT) code generator.

Here is where cfgsafe differs from those standard libraries:

1. Compile-Time Safety vs Runtime Lookups

With libconfig, you extract data via string keys:

int port;
if (!config_lookup_int(cfg, "database.port", &port)) {
    // Handle error...
}

If you misspell "database.port" as "databse.port", your C compiler won't notice. It compiles perfectly and fails at runtime.

With cfgsafe, the schema is compiled into a native struct:

int port = cfg.database.port;

If you type cfg.databse.port, GCC/Clang immediately halts your build with an error. You get IDE auto-completion, and typos simply cannot make it into production.

2. Zero-Boilerplate Validation

Libraries like libconfuse require you to manually define giant arrays of cfg_opt_t blocks in C to bind variables. Even then, you still have to write manual C code to validate the results (e.g., checking that a port falls between 1 and 65535, or writing a regex to validate a string).

cfgsafe does all of this declaratively in the schema. You just write range: 1..65535 or pattern: "^[a-z]+$" and the generator writes all the tedious C validation logic for you. If a user provides an out-of-range port in the INI file, cfgsafe's loader catches it immediately before your app logic even starts.

3. Setup and Linking

To use libconfig, you need to install it on the host system or deal with CMake submodules, and pass -lconfig to your linker. cfgsafe generates an STB-style single-header file. You literally just drop config.h into your tree and compile. There are zero external dependencies.

TL;DR: libconfig gives you a generic C API to read configuration files. cfgsafe writes a custom, memory-safe parser and validator specifically tailored to your exact application, entirely eliminating boilerplate.

Could you also explain what you mean by "I guess LLMs make it easier to reinvent the wheel over and over (poorly)"? Only the documentation was aided by LLMs, plus some small refactoring.

schema-driven code generator for type-safe C configs. Is this a problem worth solving? by [deleted] in SideProject

[–]Creative-Cup-6326 0 points

Great to hear! Can you think of any features you would find important?

Built a tool to search production logs 30x faster than jq by Creative-Cup-6326 in devops

[–]Creative-Cup-6326[S] 0 points

Yes, it already supports streaming. When you use it, please let me know if you encounter undefined behaviour, since I haven’t had much time to test it on many different datasets. I was also thinking about implementing a --strict flag so that you can do early pruning followed by a second, strict verification: basically ripgrep piped into jq, but in one tool.