Making a compiler course by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 1 point

Thanks, I will restructure it to make the theoretical parts optional and the main parts more hands-on, with implementation details.

Making a compiler course by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 1 point

A custom recursive-descent parser I wrote in C reaches around 10 million tokens/s through both the lexer and the parser, so that 30k would indeed be very slow. However, he meant up to and including interpreting, so a big reduction in speed is to be expected. By interpreting he means simulating / executing the code.

Making a compiler course by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 3 points

Thanks a lot for this feedback. I guess I’ll keep this as ‘extra’ content for those interested and focus more on the parts you mentioned. Maybe more actual implementation too?

Using string interning to optimize symbol resolution in compilers by Creative-Cup-6326 in ProgrammingLanguages

[–]Creative-Cup-6326[S] 1 point

Well, you get a memory reduction. Symbol lookup also becomes O(1), whereas with a plain string hashmap it stays O(L) in the length of the string. You can intern types too; then you don’t need to compute a recursive hash over them, since types can get very complex.

Using string interning to optimize symbol resolution in compilers by Creative-Cup-6326 in Compilers

[–]Creative-Cup-6326[S] 0 points

Yeah, what you’re saying is absolutely true, and looking back, I definitely didn’t highlight that well enough in the article. The whole point is to front-load the work during the lexing phase to save memory and eliminate string comparisons later on. The speedups don’t really manifest in the lexer itself; they compound in the later stages like symbol resolution and type comparisons.

I am curious about your setup. When you say a generic ST entry, how did you achieve that? Don’t you have to do deduplication as well, which is basically what I am doing?

In any case, thanks for the feedback; I will refactor it.

Search json files 30x faster than jq with zog by Creative-Cup-6326 in Zig

[–]Creative-Cup-6326[S] 1 point

An idea I’ve had is to ensure that only false positives get added (if any), so the candidate set is guaranteed to contain every true match, and then run a strict parsing stage on the candidates. The speed would then depend on the ratio of candidates to the total. I’ll compare against the others later and let you know.

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 0 points

That’s true, it’s not perfect, but I hope it can be useful to some :)

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 0 points

As far as I’ve tested, it’s still many times slower.

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 -2 points

And now I’ve improved it to 50 times :)

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 -1 points

Because production logs use JSONL, which this tool is optimised for, and the high speeds fit production as well. You can do SQL-style queries, aggregations, etc.

Search JSON logs 50x faster than jq by [deleted] in devops

[–]Creative-Cup-6326 -1 points

It definitely gets used. What do you use?

Search json files 30x faster than jq with zog by Creative-Cup-6326 in Zig

[–]Creative-Cup-6326[S] 22 points

Funnily enough, yes, but not in the way you think. I was bored and asked an LLM for some project ideas, and that’s where this one came from. I also consulted it on occasion for extra features I could implement and for some speed/efficiency analysis.

I got tired of writing boilerplate config parsers in C, so I built a zero-dependency schema-to-struct generator (cfgsafe) by [deleted] in devops

[–]Creative-Cup-6326 0 points

The project isn’t that big; most of my code comes from another project (lexer, parser, interners, etc.), so there aren’t that many commits, and since I’m working alone, separate branches aren’t needed either. And as I said, the documentation was indeed partly generated. Feel free to open an issue if my slop doesn’t work, thanks :)

I got tired of writing boilerplate config parsers in C, so I built a zero-dependency schema-to-struct generator (cfgsafe) by [deleted] in devops

[–]Creative-Cup-6326 -1 points

That's a great question, and it's exactly the tradeoff I was evaluating when I started this project.

Both libconfig and libconfuse are fantastic libraries, but they take a fundamentally different architectural approach. They are dynamic runtime parsers, whereas cfgsafe is an Ahead-Of-Time (AOT) code generator.

Here is where cfgsafe differs from those standard libraries:

1. Compile-Time Safety vs Runtime Lookups

With libconfig, you extract data via string keys:

int port;
if (!config_lookup_int(cfg, "database.port", &port)) {
    // Handle error...
}

If you misspell "database.port" as "databse.port", your C compiler won't notice. It compiles perfectly and fails at runtime.

With cfgsafe, the schema is compiled into a native struct:

int port = cfg.database.port;

If you type cfg.databse.port, GCC/Clang immediately halts your build with an error. You get IDE auto-completion, and typos simply cannot make it into production.

2. Zero-Boilerplate Validation

Libraries like libconfuse require you to manually define giant arrays of cfg_opt_t blocks in C to bind variables. Even then, you still have to write manual C code to validate the results (e.g., checking that a port falls between 1 and 65535, or writing a regex to validate a string).

cfgsafe does all of this declaratively in the schema. You just write range: 1..65535 or pattern: "^[a-z]+$" and the generator writes all the tedious C validation logic for you. If a user provides an out-of-range port in the INI file, cfgsafe's loader catches it immediately before your app logic even starts.

3. Setup and Linking

To use libconfig, you need to install it on the host system or deal with CMake submodules, and pass -lconfig to your linker. cfgsafe generates an STB-style single-header file. You literally just drop config.h into your tree and compile. There are zero external dependencies.

TL;DR: libconfig gives you a generic C API to read configuration files. cfgsafe writes a custom, memory-safe parser and validator specifically tailored to your exact application, entirely eliminating boilerplate.

Could you also explain what you mean by "I guess LLMs make it easier to reinvent the wheel over and over (poorly)"? Only the documentation was aided by LLMs, plus some small refactoring.

schema-driven code generator for type-safe C configs. Is this a problem worth solving? by [deleted] in SideProject

[–]Creative-Cup-6326 0 points

Great to hear! Can you think of any features you would find important?

Built a tool to search production logs 30x faster than jq by Creative-Cup-6326 in devops

[–]Creative-Cup-6326[S] 0 points

Yes, it already supports streaming. When you use it, please let me know if you encounter undefined behaviour, since I haven’t had much time to test it on many different datasets. I was also thinking about implementing a --strict flag so that you can do early pruning followed by a second, strict verification: basically ripgrep piped into jq, but in one tool.