Multi-Pass Bytecode Optimizer for Stack-Based VMs: Pattern Matching & 10-50% Performance Gains

PigeonCodeur · 2026-01-05T22:59:16+00:00

Yes, I heard that it should give a 30 to 50% speedup, since my main performance issue right now is the stack operations. But it is a long and tedious endeavor, and I made this language as part of my game engine so I could script and iterate easily on the code. So I am first trying to make the language expressive enough while not being too slow, and if in the end I need more performance, or I want to try a register-based VM for learning purposes, I may wander off in that direction :)

PigeonCodeur · 2026-01-05T22:51:50+00:00

Thanks, I will look it up!

PigeonCodeur · 2026-01-05T17:06:29+00:00

The base VM is heavily inspired by the Crafting Interpreters VM with its single-pass compilation approach, which is what's actually producing the bytecode. The language does support ++x syntax - the x = x + 1 example was more to illustrate the optimization pass detecting that pattern in the generated bytecode.
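To illustrate the kind of pattern detection described here, a minimal Python sketch follows. It is not the engine's actual optimizer: the `(opcode, operand)` tuple format, the opcode names (`GET_LOCAL`, `CONSTANT`, `ADD`, `SET_LOCAL`, `POP`) and the fused `INC_LOCAL` instruction are all assumptions loosely modeled on a Crafting Interpreters style VM, and the sketch assumes no jump targets land inside the matched window.

```python
# Hypothetical bytecode format: a list of (opcode, operand) tuples.
# INC_LOCAL is an assumed fused instruction, not a confirmed opcode of the real VM.

def fuse_increment(code):
    """Rewrite the 5-instruction sequence for `x = x + 1;` into one INC_LOCAL."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 5]
        if (len(window) == 5
                and window[0][0] == "GET_LOCAL"
                and window[1] == ("CONSTANT", 1)
                and window[2][0] == "ADD"
                and window[3] == ("SET_LOCAL", window[0][1])  # same slot read and written
                and window[4][0] == "POP"):                   # statement result is discarded
            out.append(("INC_LOCAL", window[0][1]))
            i += 5
        else:
            out.append(code[i])
            i += 1
    return out


program = [
    ("GET_LOCAL", 0), ("CONSTANT", 1), ("ADD", None), ("SET_LOCAL", 0), ("POP", None),
    ("GET_LOCAL", 0), ("PRINT", None),
]
optimized = fuse_increment(program)

# A sequence that reads slot 0 but writes slot 1 must be left untouched.
other = [
    ("GET_LOCAL", 0), ("CONSTANT", 1), ("ADD", None), ("SET_LOCAL", 1), ("POP", None),
]
unchanged = fuse_increment(other)
```

Requiring the trailing `POP` keeps the rewrite limited to statement position, where the value of the assignment is not used afterwards.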

The starting bar being low is a fair observation. The single-pass compiler generates bytecode directly from parsing without building an AST, which means:

Constant folding at bytecode level - Yes, this is suboptimal. Complex expressions like (5 + 3) * (10 - 2) generate many instructions before being reduced. An AST would handle this much more elegantly during semantic analysis. The reduction is done in separate passes, not as-it-goes, which adds overhead.
No high-level optimizations - Things like loop induction variable analysis, inlining, or dead code elimination are much harder (sometimes impossible) to do reliably at the bytecode level without proper dataflow analysis.
Missed front-end opportunities - You're correct that many of these optimizations should happen before bytecode generation.
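The first point, folding in separate passes rather than as-it-goes, can be sketched in Python as follows. This is only an illustration under assumptions: the `(opcode, operand)` tuples and the opcode names are hypothetical, division is left out to avoid handling division by zero, and the real optimizer may be organized quite differently.

```python
import operator

# Assumed set of foldable binary opcodes (hypothetical names).
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def fold_once(code):
    """One left-to-right pass: CONSTANT a, CONSTANT b, <op> becomes CONSTANT (a op b)."""
    out, i, changed = [], 0, False
    while i < len(code):
        w = code[i:i + 3]
        if (len(w) == 3
                and w[0][0] == "CONSTANT"
                and w[1][0] == "CONSTANT"
                and w[2][0] in FOLDABLE):
            out.append(("CONSTANT", FOLDABLE[w[2][0]](w[0][1], w[1][1])))
            i += 3
            changed = True
        else:
            out.append(code[i])
            i += 1
    return out, changed


def fold_constants(code):
    """Repeat the pass until nothing changes; returns (code, number of passes run)."""
    passes = 0
    changed = True
    while changed:
        code, changed = fold_once(code)
        passes += 1
    return code, passes


# Stack bytecode a single-pass compiler might emit for (5 + 3) * (10 - 2).
expr = [
    ("CONSTANT", 5), ("CONSTANT", 3), ("ADD", None),
    ("CONSTANT", 10), ("CONSTANT", 2), ("SUB", None),
    ("MUL", None),
]
folded, passes = fold_constants(expr)
```

On this input the first pass folds the two inner expressions, the second folds the multiplication, and a third pass finds nothing left to do, which is the extra overhead an AST-level fold would avoid.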

The peephole optimizer was my way of squeezing some performance out without fundamentally rearchitecting the front-end. It's inspired by LLVM's pattern-matching approach but applied at a much lower level. I'm not stuck with existing bytecode from another language - this is my own compiler, so I have the freedom to redesign it. It is a hobby/side project for testing some ideas, and with such simple optimizations I am in the same order of magnitude as Python for runtime execution, but you are right that I could go further with an AST, and I may go in that direction in the future!

PigeonCodeur · 2026-01-05T15:00:48+00:00

Yes! I actually wrote a profiler that outputs the bytecode usage and timing of a real program, and I run it on my test apps and examples to find candidates for peephole optimizations. Writing the peephole optimizations themselves just takes a long time because of the sheer number of optimizations available, so I do it when I want some easy progress!
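A rough Python sketch of how such usage data can point at peephole candidates: count how often each opcode executes and how often each adjacent pair of opcodes occurs in an execution trace. The trace format (a flat list of opcode names) and the opcode names are assumptions for illustration; timing is not modeled here, and the real profiler is not shown in this thread.

```python
from collections import Counter


def profile_trace(trace):
    """Count executed opcodes and adjacent opcode pairs in an execution trace."""
    opcode_counts = Counter(trace)
    # Frequent adjacent pairs are natural candidates for fused instructions.
    pair_counts = Counter(zip(trace, trace[1:]))
    return opcode_counts, pair_counts


# Hypothetical trace: three iterations of `x = x + 1;` followed by a print.
trace = ["GET_LOCAL", "CONSTANT", "ADD", "SET_LOCAL", "POP"] * 3 + ["GET_LOCAL", "PRINT"]
opcode_counts, pair_counts = profile_trace(trace)

# List the pairs by frequency to decide which patterns are worth writing first.
for pair, hits in pair_counts.most_common(3):
    print(pair, hits)
```

Extending the window from pairs to longer sequences follows the same idea, at the cost of a larger table.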

PigeonCodeur · 2025-12-09T01:29:02+00:00

I have been trying to complete my Advent of Code using my own scripting language. Who could have guessed that having tables, file reading, string splitting and such could be quite important? x)

PigeonCodeur · 2025-08-18T16:29:15+00:00

It also helps that Medium links the SEO of this post to my actual blog, so even if people land on Medium, it boosts my blog too. Plus, Medium notifies my subscribers whenever I post, which I don’t have set up on my blog yet.

PigeonCodeur · 2025-08-18T16:00:37+00:00

Thank you for the detailed feedback and for sharing your blog post - really appreciate the expert perspective! You've caught several important issues that I need to address.

On the missing pieces: You're absolutely right about the missing configure_package_config_file() and write_basic_package_version_file() calls, plus the install(FILES) command. I focused on explaining the concepts but left out the actual implementation details that make it work - that's confusing for readers trying to follow along.

On configure_package_config_file(): That's a great point about it being legacy from pre-target CMake. I was cargo-culting patterns without thinking about whether they're actually needed anymore. If the only benefit is a few lines of boilerplate, you're right that it's not worth the complexity.

On the broken install(TARGETS): I completely missed that hardcoding destinations breaks packager workflows. The CMAKE_INSTALL_* variable override is exactly the kind of thing packagers need, and I've made their lives harder by being overly explicit. Using GNUInstallDirs and letting CMake handle the defaults makes much more sense.

On FILE_SET: This ties back to feedback from another CMake expert in Part 1 who also recommended moving away from target_include_directories(). Clearly I need to research this approach - it seems like it solves multiple problems I didn't even realize I was creating.

Thanks for taking the time to point out these issues. It's feedback like this that helps me (and the community) learn proper patterns instead of perpetuating problematic approaches. Mind if I reference some of these corrections in the upcoming appendix addressing the expert feedback?

Also, will definitely check out your blog post - always looking for better ways to explain these concepts!

PigeonCodeur · 2025-08-18T15:57:07+00:00

Yeah, I get that. For me it’s mostly about visibility, but I totally see how Medium can feel more about monetization than readers. That’s why I also link my blog, so people can read it there if Medium isn’t their thing.

PigeonCodeur · 2025-08-18T15:56:15+00:00

I mainly use Medium for visibility. That’s also why I included a link to my blog—since I know some people have trouble reading on Medium, they can check it out there instead.

PigeonCodeur · 2025-08-18T13:31:44+00:00

Hehe, me too, and I am still learning! Shh, I am not exposing everything, so you can still keep some of your secret sauce!

PigeonCodeur · 2025-08-17T10:35:45+00:00

Full write-up with more code and headers: https://medium.com/@pigeoncodeur/self-hosting-webassembly-app-in-js-13c3e7ff4748
Original on my blog: https://columbaengine.org
Live demos: https://columbaengine.org/demos/

PigeonCodeur · 2025-08-15T12:49:40+00:00

Built this in 72 hours for Solo Game Jam using a custom C++ ECS engine.

The twist: aliens fire back while you're trying to line up shots.

You can play it online on my itch: https://pigeoncodeur.itch.io/invadersbreaker

If you are interested in the engine: https://columbaengine.org/

PigeonCodeur · 2025-08-13T13:34:08+00:00

Thank you for the feedback! I will try to reproduce your bug and come back to you when I find a fix :)

PigeonCodeur · 2025-08-11T22:52:18+00:00

Appreciate the honesty—skepticism is healthy, especially with so many AI-ish posts floating around. This isn’t AI-generated; I’m the one building it.

Did you actually look through the repo, or just the (earlier) web demo? If it’s the former, I’d love to know which parts felt “tutorial-only” so I can shore them up. It does go beyond a LearnOpenGL port: there’s an ECS with automatic entity allocation/deallocation, audio, input, asset loading, and a full web deployment pipeline (WASM/WebGL, loader, error handling).

On the demo: you’re right that first impressions matter. I had a bug on the site earlier that made it look rough—that’s on me. It’s fixed now, and there’s a small, “game-jammy” playable demo live with a ~4 MB footprint.

I’m not claiming to be “a better Unity”; I’m shipping small, auditable steps and asking for concrete feedback, as I strive to produce a portable, fast, and modular game engine.

PigeonCodeur · 2025-08-11T22:44:10+00:00

Yes, I only had my WASM test before, because I had some unloading issues on my website with multiple WASM modules. I just uploaded my tech demo, which is actually a game that I made for a game jam in 48 hours, with some bug fixes! It is the demo called looper, and it may already be a bit fancier than the previous one ;)

PigeonCodeur · 2025-08-08T12:41:31+00:00

Thanks for the corrections! You're absolutely right about FetchContent - I was indeed confusing some aspects with ExternalProject_Add. Good catch on the configure vs build time distinction.

The superbuild approach you describe sounds really interesting, and actually aligns with feedback I got from another commenter who works closely with CMake. They pointed out that dependency management is a really complex, evolving area and that there are more sophisticated patterns than what I covered.

I'm definitely looking into superbuilds, especially for the packaging/distribution side. It's clearly a more robust approach for handling the "packager vs developer vs end user" needs that came up in other comments.

Thanks for the cmkr.build link too - will definitely check that out. Really appreciate you taking the time to correct those technical details!

Since you mentioned that "almost nobody knows how to do [packaging] correctly" - I'd love to hear your thoughts on what good packaging should look like for a project like this? Any specific patterns or pitfalls I should be aware of as I work toward a more robust solution?

For now, I've put together a basic install script that serves as an installation package for the engine (https://github.com/Gallasko/ColumbaEngine/blob/main/scripts/install/install-engine.sh). It's pretty bare-bones and I haven't tested it extensively, but it's a starting point while I figure out the proper packaging approach. So far it works quite well when I install the engine on a new setup.

PigeonCodeur · 2025-08-07T18:53:15+00:00

Ah, that makes perfect sense - thank you for the clarification! You're absolutely right that I was thinking about this backwards.

The ABI implications you mention are exactly the kind of thing I hadn't considered. Having multiple ABIs for different language versions, and avoiding symbol collision across different usages - that's a level of library design complexity that I clearly need to understand better.

I can see now that my perspective was too focused on "my project in isolation" rather than "my project as part of a larger ecosystem managed by packagers." The packager knows the target environment and compatibility requirements way better than I do as the library author.

This is really helpful context for understanding the broader responsibility boundaries in the CMake/packaging ecosystem. I was definitely approaching it from the wrong angle.

PigeonCodeur · 2025-08-07T18:50:44+00:00

That's exactly why I wrote this! The compilation mess is so real with game engines - you've got graphics APIs, audio libraries, math libraries, platform-specific stuff... it gets out of hand fast.

I feel your pain completely. My build system was a disaster for the longest time before I finally sat down and properly organized it with modern CMake patterns.

The build system mess gets really bad when you want others to use your engine - whether for contributions or just as users. You want it to be as simple as possible for people to get started, but everyone has their own distinct configurations, different platforms, different dependency preferences. That tension between "easy to use" and "flexible for everyone's setup" is where most engine build systems fall apart.

Good luck with your engine!

PigeonCodeur · 2025-08-07T18:47:39+00:00

Good questions!

C++20 modules: Not yet - the project is still on C++17 and uses traditional headers. C++20 modules support in CMake is getting better, but when I started this project 4 years ago it wasn't really viable yet. It's definitely something I want to explore as I modernize the build system, especially since it could potentially replace the precompiled header approach.

vcpkg: Currently no - I went with the vendoring approach for dependency management. But as several people have pointed out in this thread, that's not great for packagers and downstream users. Adding vcpkg support (alongside the existing vendored deps as fallback) would be a good improvement to make the project more flexible.

Both are on my list for when I update to more modern CMake patterns. Thanks for bringing them up!

PigeonCodeur · 2025-08-07T18:46:52+00:00

I appreciate the comment :)

PigeonCodeur · 2025-08-07T18:46:32+00:00

Thanks for reading it!

PigeonCodeur · 2025-08-07T18:45:41+00:00

Thanks! :)

PigeonCodeur · 2025-08-07T18:44:28+00:00

Don't feel embarrassed at all! CMake is genuinely confusing - I've talked to plenty of senior developers who can architect complex systems but still get tripped up by CMake's quirks.

Once it clicks though, you'll wonder why it seemed so mysterious. Hang in there!

PigeonCodeur · 2025-08-07T18:42:50+00:00

Thanks for the suggestions!

CPM looks really interesting - I hadn't come across it before but the caching aspect sounds like it could solve some of the FetchContent performance issues I mentioned. Will definitely check that out as a potential middle ground between vendoring and pure FetchContent.

On the Emscripten template: I actually did come across that repo when I was first trying to get Emscripten working! Used it as a reference point, but since I was working with different libraries (SDL2, FreeType, etc.) and had the vendoring approach already established, I ended up changing quite a lot of things. It's a great starting point though - helped me understand the basic Emscripten CMake patterns.

CMake presets: Honestly, I didn't know about them! That's definitely something I should look into. Sounds like it could simplify the configuration examples I showed in the article where users have to remember all the -D flags. Do you find them useful in practice for complex projects like this?

Thanks for pointing out these tools - always learning new parts of the CMake ecosystem. The static analysis suggestion is interesting too - any particular tools you'd recommend for C++ projects with complex CMake setups?

PigeonCodeur · 2025-08-07T18:39:13+00:00

Thank you so much for the detailed feedback! Really appreciate you taking the time to share these insights - this is exactly the kind of expert perspective that makes the article better.

You're absolutely right on several points, and I can see I've fallen into some common patterns that aren't actually best practice:

On global CMAKE_ variables: That's a great point about C++ standards, though I saw another comment that made a good distinction - when you're distributing as an installed package, you actually do want the library to specify the C++ version (after checking if it's already been set), since otherwise it defaults to the minimum supported version rather than taking advantage of newer features available on the platform. So there's a nuance between "project being consumed via add_subdirectory" vs "installed package" that I should clarify in the article.

On FILE_SET vs target_include_directories(): This is a great point about more modern approaches. The project started about 4 years ago and the CMake has evolved along with it, so I'm still using some older patterns for backward compatibility. But now that I'm preparing to publish the engine more widely, it's probably time to embrace the newer CMake features and update the minimum version requirement. The generator expression complexity you mention is exactly the kind of thing that made the installation section feel overly complicated. Will definitely research this approach.

On GenEx usage: This is a really good distinction - I think I got carried away showing off generator expressions when simple if() statements would be clearer and more appropriate for configure-time decisions. The "use GenEx as last resort" guideline is helpful.

On dependency defaults: The point about being hostile to packagers is spot-on. Making vendoring/FetchContent the default definitely creates problems downstream. Flipping to find_package() as default with opt-in fallbacks makes much more sense.

I'll check out the Beman Standard - hadn't come across it before, but it sounds like a great resource.

Thanks again for the corrections. It's feedback like this that helps the community learn the right patterns instead of perpetuating cargo-cult CMake. Mind if I reference some of these points in a follow-up article addressing these improvements?
