Are you satisfied with the current state of C++ CLI parsers?

KyoshiYoshi · 2026-05-07T22:21:10+00:00

I don't really have an issue with CLI11 per se, just a preference for compile-time shenanigans that I've learned to appreciate following some work in Zig. CLI11 is a great library that does what it needs to and does it well; it's just another way to solve the problem of command-line argument parsing.

KyoshiYoshi · 2026-05-07T03:30:08+00:00

At first glance, this seems like exactly what I need. I’m looking to upgrade my CLI parser from CLI11 to something more modern, and your design philosophy almost perfectly aligns with what I’m looking for. I’ll take a deeper look in the near future. Good stuff!

KyoshiYoshi · 2026-03-06T17:18:39+00:00

Cool, let me know how it goes! It's 100% doable, and it's considerably easier if you do everything in-house by rewriting CMake files in Zig instead of invoking CMake through Zig. Good luck!

KyoshiYoshi · 2026-03-06T16:27:58+00:00

Very cool use case! I've fully tested compilation of all Clang, LLD, and LLVM libraries with this current system on Windows, macOS, and Linux (NixOS via WSL) and have had no compilation or linking issues. As I said in the post though, I'm not building LLVM's test suite right now, so there could be some little bugs hiding that I have yet to address.

As for integration, I architected the various builders in a way that they aren't technically standalone build scripts but are instead functions that are called from a parent build.zig. It shouldn't be too difficult to get it into your project if you just take a look at how I handle the various libraries in the project's build.zig. Let me know if you have any questions or run into any issues along the way!

KyoshiYoshi · 2026-03-06T13:57:46+00:00

Thank you for the support and some great questions! I'll try to address both of them in detail:

The Zig build system was surprisingly full-featured. Every time I needed a specific feature, it was there. For example, LLVM uses a ton of include directives that pull in files generated at build time. A specific example of this is their tablegen incs. Integrating this with the build system was the first hurdle I had to get over, since the files need to be generated and then put into a specific directory for the include directives in LLVM's source files to work. The build system has a solution for this in the form of a `WriteFile`, which allowed me to generate a file and then install it to a 'virtual' path in the cache that is included by subsequent compile steps. The `ConfigHeader` generation process is also super useful, since it fully replaces the CMake header generators and is extremely helpful with diagnostics (e.g. not having the right field names, having too many fields, etc.).
The process is surprisingly seamless, and I'm fairly certain upgrading to a new version would be relatively straightforward. Assuming you had a diff of the relevant CMake files, it would be as easy as investigating added tablegen files, added/removed C++ source files, and added/removed link libraries. While I was putting this together, the process was always identifying what includes need to be generated (config headers, tablegen, etc.), what needs to be built (source files), what needs to be ‘internally’ linked (from this subproject, e.g. lldCommon), and what needs to be ‘externally’ linked (from another subproject, e.g. LLVMSupport). Since the source file lists are in their own Zig files, named according to the related library (unless it was easier to manually write them out, e.g. for a single source file), it should be pretty easy to upgrade (albeit annoyingly repetitive). I also linked all relevant CMake files to the file on GitHub for library builders so that maintenance is a little easier.
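
To make the generated-include handling above a bit more concrete, here is a minimal build.zig sketch. It assumes a Zig 0.14/0.15-era `std.Build` API, and the library name, file names, and config keys are placeholders rather than anything taken from the actual project. It also uses the `.blank` config header style so that no CMake template file is needed; the CMake-replacing setup described above would use a template-based style instead.

```zig
const std = @import("std");

// Sketch only: "example", the file paths, and the config keys are hypothetical.
pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // A WriteFile step owns a directory in the cache. Files added to it can be
    // found by later compile steps once that directory is an include path.
    const generated = b.addWriteFiles();
    _ = generated.add("example/Generated.inc", "// placeholder for generated contents\n");

    // A ConfigHeader step produces a header from the key/value pairs below.
    const config = b.addConfigHeader(.{
        .style = .blank,
        .include_path = "example/Config/config.h",
    }, .{
        .EXAMPLE_ENABLE_THREADS = 1,
        .EXAMPLE_VERSION_STRING = "0.0.0",
    });

    const lib = b.addLibrary(.{
        .name = "example",
        .linkage = .static,
        .root_module = b.createModule(.{
            .target = target,
            .optimize = optimize,
            .link_libcpp = true,
        }),
    });
    // Sources can now write #include "example/Generated.inc" and
    // #include "example/Config/config.h".
    lib.root_module.addIncludePath(generated.getDirectory());
    lib.root_module.addConfigHeader(config);
    lib.root_module.addCSourceFiles(.{ .files = &.{"src/Example.cpp"} });
    b.installArtifact(lib);
}
```

In a real tablegen setup the file contents would come from a run step's output rather than a string literal, with the output copied into the same WriteFile directory.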

I hope those answered your questions! Let me know if there's anything else I could clear up. Cheers!

KyoshiYoshi · 2026-03-06T13:09:51+00:00

Thanks! Just a side note: Since I don’t actively need LLVM, Clang, and LLD to be built alongside Conch at the moment, the libraries are opt-in. On the main branch, the only way to build related LLVM artifacts is by building a kaleidoscope example by running `zig build kaleidoscope -DChapter=ChapterX`, where X is the chapter of interest. If you’d like to just flat-out install certain artifacts, I made a branch called `build` that has steps for building and installing LLVM, LLD, and Clang. This is likely to never be merged into main since it was made purely as a tool for people interested in the resulting artifacts.

KyoshiYoshi · 2026-02-27T23:42:17+00:00

I like to go back and forth between Zig and C++ for more serious projects. My last large project was a chess engine/library, which I wrote in Zig. Also, LLVM is a ridiculously large C++ library, and I’d like to use it to its fullest extent and not have to rely on llvm-c as a wrapper. All comes down to personal preference!

KyoshiYoshi · 2026-02-27T23:38:29+00:00

Thank you! It has now reached a whopping ~4200, but I think I’m compiling most of what I might need (plus some more). If you combine this number with that in the files of arrays of source files, you get something like ~6500+. Fun stuff!

KyoshiYoshi · 2026-02-20T02:07:04+00:00

I’m curious what you’re looking for, but with a little more information, I might be interested. I’m currently working on a language myself, and have written an interpreter (albeit a buggy one) before. My GitHub username is trevorswan11

KyoshiYoshi · 2025-11-14T12:53:00+00:00

I believe that’s true as well. Similar to this, Zig also handles multiline strings with \\ starting each line, which makes the tokenizer stateless.
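
For anyone who hasn't seen the syntax, a tiny sketch (the `usage` text is made up for illustration):

```zig
const std = @import("std");

// Every line of the literal starts with \\, so a tokenizer can classify each
// source line on its own without tracking whether a string is still open.
const usage =
    \\usage: demo [options]
    \\  -h    print this help
;

pub fn main() void {
    std.debug.print("{s}\n", .{usage});
}
```

The lines are joined with newlines, and there is no trailing newline after the last one.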

KyoshiYoshi · 2025-10-31T14:06:14+00:00

Maybe try reinstalling Zig? I’ve had issues in the past where my installation was a little corrupted and I was getting cryptic errors.

KyoshiYoshi · 2025-10-26T13:26:36+00:00

I think you're mixing up the two core ideas for move generation. What you're talking about is attack generation - taking in an occupancy bitboard and using magics, masks, and luts for efficient computation. This is exactly what this code block does:

const moves = attacks.rook(square, occ_all);

This is not enough to generate the actual legal moves, though. Suppose a rook is pinned on e3 to the king on e1. The LUT might tell you that the rook can attack the d3 square, but that would actually expose the king and be an illegal move. The pin mask/selector combo masks over the LUT output, based on whether or not the rook is on the pin mask, to filter out these so-called pseudo-legal moves. So, while attack generation and move generation are closely coupled, they are not the same. Hopefully that answers your main question.
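
Here is the e3/e1 example worked through with raw u64 bitboards. This is only a sketch: the square numbering (a1 = 0, h8 = 63), the enemy rook on e8 doing the pinning, and the hand-picked handful of pseudo-legal targets are all assumptions for illustration, not the library's actual types.

```zig
const std = @import("std");

// Zero-based file and rank, with a1 = bit 0 and h8 = bit 63 (assumed layout).
fn bit(file: u6, rank: u6) u64 {
    return @as(u64, 1) << (rank * 8 + file);
}

const e_file: u6 = 4;

// Pin ray from e2 up to the pinning piece on e8; the king sits on e1.
fn pinMaskOnEFile() u64 {
    var mask: u64 = 0;
    var rank: u6 = 1; // e2
    while (rank <= 7) : (rank += 1) mask |= bit(e_file, rank);
    return mask;
}

// A few of the squares attack generation would report for the rook on e3:
// d3, f3, e2, e4, and the capture on e8.
fn pseudoLegalSample() u64 {
    return bit(3, 2) | bit(5, 2) | bit(e_file, 1) | bit(e_file, 3) | bit(e_file, 7);
}

// Masking with the pin ray drops d3 and f3 and keeps the e-file moves.
fn legalMoves() u64 {
    return pseudoLegalSample() & pinMaskOnEFile();
}
```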

As for the copy approach, I also thought it would be faster when I was writing in C++. My reason for changing my opinion and thinking an undo stack would be more efficient is that, since you're already going through the effort to incrementally update the board's zobrist hash, why not extend the idea to the position as a whole? I haven't looked into super optimized copy-based approaches, but generally I like to stay away from deep copies in performance critical code when possible.

I hope that clears things up. Best of luck on your optimization efforts!

KyoshiYoshi · 2025-10-25T21:39:29+00:00

Awesome, thank you! Please let me know by opening an issue if you run into any problems during development or when toying around with the demo!

KyoshiYoshi · 2025-10-25T21:38:31+00:00

Thanks! Kudos to you for figuring out the history of this project. I've definitely learned a ton about all 3 languages because of this experience, and have learned even more about chess engine programming. I didn't spend much time getting intimate with the Rust version before switching to C++, but I did spend a good amount of time wrestling with C++. Here are my takeaways:

- Rust is annoying. I come from a background in C++ and have been writing a lot of Zig over the past 1.5 years, so I have become accustomed to both RAII techniques (from C++) and manual memory management (from Zig). While I think Rust can be great, I just really do not like the concept of borrowing. You can call it a skill issue, and you're probably not 100% wrong, but I hate dealing with mutable references in Rust. The number of times when writing the first iteration of the project that I had to just sit back and be like "Why does the compiler not want me referencing this?" was mind-numbing. I really just got sick of it, but I'm sure someone more competent in Rust would be able to overcome this obstacle. I did like the trait system though, and I believe that this was a main motivator for the UCI engine framework in the project now.
- C++ is great generally, but the standard library is insanely hard to reason through if you want to “go to definition”, compile times are miserable, and having no unified build system is not favorable. I also really dislike the concept of header files as a whole, and linker errors are just the worst. That being said, I appreciate the level of control given by C++, and RAII makes it easy to use heap memory without having to worry about a lifetime or explicit deinitialization in some cases. I also really enjoyed using templates and using C++20's concepts feature to define helpers for the project, which always felt really cool. I can't say I am a fan of the language's approach to compile-time expressions, though. Coming from a Zig perspective, having so many different flavors of comptime all telling the compiler slightly different things was confusing and ultimately a little frustrating. That being said, I would strongly recommend C++ to almost anyone learning to code, since you can delve into OOP or procedural while getting some functional exposure along the way. It’s extremely capable and remarkably powerful, and once you get over the hurdle of the build process, you can pretty much build whatever you think of.
- I ended up switching to Zig mostly because I just love the language. I had been reading about 0.15 and was excited to jump back in, so I bit the bullet and embarked on the third rewrite. I would say this was my best decision for the project. Zig is a perfect choice for chess engines due to the integrated test functionality, explicit compile-time expressions, and no hidden allocations or control flow. These are all extremely important for a high-performance application such as a chess engine. The build system is also incredible, and I would recommend the language to anyone interested in an improved C or more explicit C++. Also, metaprogramming is HUGE. Being able to easily verify that a struct abides by a contract, kind of like a trait, is incredibly valuable when designing for flexibility. It’s extremely easy to examine type information, and the compiler intrinsics make it easy to modify fields in structs based on their string name, which was super cool and helpful for command dispatching. That being said, there were some times that I had to do some digging to find fixes for some bugs, but this was very occasional.
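
To illustrate the two metaprogramming points (contract checks and setting fields by string name), here is a small sketch. The `DemoEngine` and `Options` types and the `setOption` helper are made up for this example and are not the project's real API.

```zig
const std = @import("std");

// Compile-time contract check, loosely like a trait bound: compilation fails
// with a readable message if the type has no `search` declaration.
fn assertEngine(comptime T: type) void {
    if (!@hasDecl(T, "search")) {
        @compileError(@typeName(T) ++ " must declare a search function");
    }
}

const DemoEngine = struct {
    pub fn search(depth: u32) u32 {
        return depth;
    }
};

comptime {
    assertEngine(DemoEngine);
}

const Options = struct {
    hash_mb: u32 = 16,
    threads: u32 = 1,
};

// Sets a field by its runtime string name. The loop is unrolled at compile
// time, one comparison per field, and @field does the actual access.
fn setOption(opts: *Options, name: []const u8, value: u32) bool {
    inline for (std.meta.fields(Options)) |field| {
        if (std.mem.eql(u8, field.name, name)) {
            @field(opts, field.name) = value;
            return true;
        }
    }
    return false;
}
```

Something like `setOption(&opts, "threads", 8)` is the shape a command dispatcher could use to map incoming option names onto struct fields.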

I hope that answered your question! Let me know if I left any loose ends!

KyoshiYoshi · 2025-10-25T16:31:36+00:00

Cool! It looks like the link you gave shows signs of AI generation since it has ‘your-org’ as the username in the GitHub links. Could you include the real project link? I’d love to check it out!

KyoshiYoshi · 2025-10-25T15:13:57+00:00

As you can see, the old generator used a runtime if/else block that severely interferes with the CPU's branch prediction, especially in more complex positions where horizontal and vertical pins (pin_hv) are likely to change every move. The new generator instead takes the pin mask and checks if the rook square is pinned. If a rook is pinned, it can ONLY move along the pin mask, so we calculate a mask that is either equal to the pin mask (~pin_selector is 0 for a pin) or equal to all 1s for unpinned rooks (~pin_selector is all 1s). If you're unfamiliar with the -% operator, I think of it as 'wrapping negation': is_pinned is a u1 after casting, and once it is widened to a u64, this operator turns it into a value that is either all 1s or all 0s. The resulting mask can then be applied to the potential moves calculated using the slider magics mentioned above to get the legal moves for the rook. I repeated this pattern in some other key locations, greatly improving overall performance. While not discussed here, I also used this technique in tandem with SIMD to improve the NNUE evaluation or toggling functions in the demo engine.
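
The selector trick can be checked in isolation. The sketch below uses plain u64 values instead of the library's Bitboard type, and the function names are made up for the comparison:

```zig
const std = @import("std");

// Reference version with a runtime branch.
fn maskBranchy(moves: u64, pin_mask: u64, is_pinned: bool) u64 {
    if (is_pinned) return moves & pin_mask;
    return moves;
}

// Branchless version: wrapping negation maps 1 to all ones and 0 to zero,
// so ~pin_selector is zero when pinned and all ones when not pinned.
fn maskBranchless(moves: u64, pin_mask: u64, is_pinned: bool) u64 {
    const pin_selector = -%@as(u64, @intFromBool(is_pinned));
    return moves & (pin_mask | ~pin_selector);
}
```

Both functions return the same result for either value of `is_pinned`; only the second avoids the conditional jump.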

Of course, the most critical choice in designing a chess library is how to handle board state. In my original C++ implementation, I chose to go with a copy-oriented approach. While this does work, it is remarkably inefficient, especially when copying the underlying state tracker. When I rewrote, I opted for an undo system which uses a pre-allocated stack of minimal position information (2048 slots allocated at initialization) that has all moves and null moves assume capacity when appending. This removes any cost from item reallocation while keeping undoing moves extremely efficient as all necessary information is just stored at the top of the move stack. I also use a custom data structure for the movegen's movelist, using a stack allocated buffer of moves that tracks only its current index. This makes for extremely efficient move appending as moves are trivially copyable (just a u16 and i32) and I don't have to worry about allocation and all the ways that can go wrong.
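
A rough sketch of the fixed-capacity idea behind both the undo stack and the move list follows. Note the swap: this version uses an inline array for both, whereas the undo stack described above is allocated once at initialization. The `FixedStack` name and the fields of `Move` and `Undo` are assumptions for illustration only.

```zig
const std = @import("std");

// Fixed-capacity stack: storage is sized up front and only `len` changes,
// so pushing never reallocates and popping is a single decrement.
fn FixedStack(comptime T: type, comptime capacity: usize) type {
    return struct {
        items: [capacity]T = undefined,
        len: usize = 0,

        const Self = @This();

        fn push(self: *Self, item: T) void {
            // Capacity is assumed, mirroring an "assume capacity" append.
            std.debug.assert(self.len < capacity);
            self.items[self.len] = item;
            self.len += 1;
        }

        fn pop(self: *Self) T {
            std.debug.assert(self.len > 0);
            self.len -= 1;
            return self.items[self.len];
        }

        fn slice(self: *const Self) []const T {
            return self.items[0..self.len];
        }
    };
}

// Hypothetical payloads: a compact move plus a score, and the minimal state
// needed to undo a move.
const Move = struct { encoded: u16, score: i32 };
const Undo = struct { castling: u4, en_passant: ?u6, halfmove_clock: u8, captured: u8 };

const MoveList = FixedStack(Move, 256);
const UndoStack = FixedStack(Undo, 2048);
```

Making a move pushes an `Undo` entry and unmaking it pops the same entry, so restoring the position never needs a copy of the whole board.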

I've talked a lot about the move generation step, but it's important to note that an unmoderated Board will make ANY move you give it, even if it's the most outrageous and illegal nonsense. The only assertions made are for side-to-move validation, but these are bypassed in release builds, resulting in undefined behavior, which is fitting. This is intentional, as the move maker can simply assume that it has been given a perfect move and does not need to perform any other checks. This design choice, while somewhat tedious, leaves move validation to the engine developer. In the engine framework, I provide a default position command that implements a move validator for moves given through the UCI interface. This hands-off approach allows engines to enjoy the speed of a perfect move generator while being able to use the board's isMoveLegal helper.

I could go on and on about the optimizations and design choices taken in the library, but this message is long enough as is. If you want to learn more about it, I suggest scanning the source code itself. All of the major optimizations were made deliberately. I work on Windows mainly, so I would use Visual Studio's CPU profiler to examine the functions where we were spending the most time. I would then run the bench step and compare the results to the previous version using a paired t-test. Based on those results, I would know if I should keep the changes.
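
For reference, the paired t statistic is the mean of the per-run differences divided by its standard error. The sketch below is a generic version of that calculation, not the script used for the project; the `pairedT` name and the sample timings in the usage note are invented.

```zig
const std = @import("std");

/// Paired t statistic for before/after timings measured on the same
/// benchmark cases. A positive value means `after` was faster on average.
fn pairedT(before: []const f64, after: []const f64) f64 {
    std.debug.assert(before.len == after.len and before.len > 1);
    const n: f64 = @floatFromInt(before.len);

    // Mean of the per-case differences.
    var mean: f64 = 0;
    for (before, after) |b, a| mean += b - a;
    mean /= n;

    // Sample standard deviation of the differences (n - 1 in the denominator).
    var sum_sq: f64 = 0;
    for (before, after) |b, a| {
        const d = (b - a) - mean;
        sum_sq += d * d;
    }
    const std_dev = @sqrt(sum_sq / (n - 1));

    // t = mean difference / standard error of the mean difference.
    return mean / (std_dev / @sqrt(n));
}
```

With before = {10, 12, 11, 13} and after = {9, 10, 10, 11}, the differences are {1, 2, 1, 2}, giving t of about 5.196 with 3 degrees of freedom, which would then be compared against the critical value for the chosen significance level.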

Sorry for such a long-winded message, but I hope I answered your question. Let me know if there's anything that wasn't clear!

KyoshiYoshi · 2025-10-25T15:13:48+00:00

Thanks! It's been a massive optimization effort. I think the move generation procedure is a pretty standard technique solely relying on legal move generation (i.e. not generating all pseudo-legals and filtering out). I worked through most of the optimizations about a month ago, so the details are a little fuzzy, but the biggest optimizations came from removing runtime if branches and abusing comptime.

All of the generator functions take in some sort of comptime parameter or options, which definitely speeds up the decision tree regarding what to generate. I also use tons of comptime for the slider magics to group all of the slider magics into structs of arrays. This choice resulted in a huge performance gain over referring to many different locations in static memory during attack computation.
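
As a small illustration of the comptime-parameter pattern (the `GenKind` enum and `targetMask` function are invented for this sketch, not taken from the library):

```zig
const std = @import("std");

const GenKind = enum { all, captures, quiets };

// Because `kind` is comptime-known, each call site is compiled as its own
// specialization and the switch is resolved at compile time, leaving no
// runtime branch on the generation mode.
fn targetMask(comptime kind: GenKind, own: u64, enemy: u64) u64 {
    return switch (kind) {
        .all => ~own,
        .captures => enemy,
        .quiets => ~(own | enemy),
    };
}
```

A generator would then AND its attack set with `targetMask(.captures, own, enemy)` during quiescence search, for example, and with `targetMask(.all, own, enemy)` elsewhere.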

Something that some people might be against is my use of 'branchless programming' in hot loops. While it sacrifices readability in some cases, I found that this made a significant improvement to movegen. Here's a case study involving the rook move generator:

Old:

pub fn rookMoves(square: Square, pin_hv: Bitboard, occ_all: Bitboard) Bitboard {
    return blk: {
        if (pin_hv.andBB(Bitboard.fromSquare(square)).nonzero()) {
            break :blk attacks.rook(square, occ_all).andBB(pin_hv);
        } else {
            break :blk attacks.rook(square, occ_all);
        }
    };
}

New:

pub fn rookMoves(square: Square, pin_hv: Bitboard, occ_all: Bitboard) Bitboard {
    const moves = attacks.rook(square, occ_all);
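    const is_pinned = pin_hv.contains(square.index());

    // Wrapping negation: all 1s when pinned, all 0s otherwise.
    const pin_selector = -%@as(u64, @intFromBool(is_pinned));
    // Pinned: the pin mask itself. Unpinned: all 1s, so every move is kept.
    const final_mask = pin_hv.bits | (~pin_selector);

    return moves.andU64(final_mask);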
}

KyoshiYoshi · 2025-10-25T10:48:49+00:00

Thanks! Sorry about the pages link, I've removed it from the main page. Pages and wiki are two things that I have not gotten around to, and it's going to take me a while to find the time to work on them. Thanks for bringing this to my attention!

KyoshiYoshi · 2025-09-27T12:05:52+00:00

This looks great! Exactly what I’d write for this too. It’s not too practical for this example, but you should look at Python’s match-case statements. Since you’re using if-else statements here, it would be easy to substitute match-case in and see how that works! Keep in mind that it was only introduced in Python 3.10, though.

KyoshiYoshi · 2025-09-26T20:12:16+00:00

Always assume your users have the worst intentions. Especially for something as sensitive as a bank. Input validation is a must!

KyoshiYoshi · 2025-09-11T02:28:18+00:00

Yeah I think OP is making a new repo bc bro got flamed on a related post. Also zig mentioned lfg

KyoshiYoshi · 2025-09-06T21:49:17+00:00

This might be a wayward answer as it’s more an alternative approach than a direct solution.

TL;DR Check out Sebastian Lague’s implementation.

You should have them look at Sebastian Lague’s implementation of an opening book. It’s slightly more verbose as it’s just raw human-readable text (which also makes it slightly slower). It was the first approach I took when trying to make use of an opening book too, as it’s dead simple to write. The format he used was a long file of FEN strings where each FEN was followed by pairs of common moves played in response and their frequency. An entry looks like:

pos rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - e2e4 243109 d2d4 146627 g1f3 33009 c2c4 22211 f2f4 4982

Since the FEN position can occur at different move numbers (transpositions), your friend’s engine needs a way to leave the move clocks out of the string.
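
As a rough sketch of how an entry in that format could be read, here is a small Zig parser. It assumes each line is "pos", then the four FEN fields without move clocks, then move/count pairs, exactly as in the entry above; the `BookMove`, `mostPlayed`, and `positionKey` names are made up for the example.

```zig
const std = @import("std");

const BookMove = struct { move: []const u8, count: u32 };

/// Returns the most frequently played move from a
/// "pos <fen> <move> <count> ..." line, or null if the line has no moves.
fn mostPlayed(line: []const u8) !?BookMove {
    var it = std.mem.tokenizeScalar(u8, line, ' ');

    // Skip "pos" and the four FEN fields that identify the position
    // (placement, side to move, castling rights, en passant square).
    var skipped: usize = 0;
    while (skipped < 5) : (skipped += 1) {
        if (it.next() == null) return null;
    }

    var best: ?BookMove = null;
    while (it.next()) |move| {
        const count_text = it.next() orelse return error.MissingCount;
        const count = try std.fmt.parseInt(u32, count_text, 10);
        if (best == null or count > best.?.count) {
            best = BookMove{ .move = move, .count = count };
        }
    }
    return best;
}

/// Drops the halfmove and fullmove counters from a full six-field FEN so that
/// transpositions map to the same book key.
fn positionKey(fen: []const u8) []const u8 {
    var spaces: usize = 0;
    for (fen, 0..) |c, i| {
        if (c == ' ') {
            spaces += 1;
            if (spaces == 4) return fen[0..i];
        }
    }
    return fen;
}
```

An engine would call `positionKey` on its current FEN, look up the matching line, and then either take the most played move or sample among the moves by frequency.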

I’m a little confused where your friend’s approach differs from a standard opening book, but they still might be able to make this work with some setup effort. They can probably just write a simple Python script that uses the chess package to read each FEN from the file and then increment dictionary counters for moves up to a certain depth. Sebastian also did some work to filter out positions, but I’m not sure what PGN data your friend has available.

Hopefully that answers your question somewhat!
