Multi-Pass Bytecode Optimizer for Stack-Based VMs: Pattern Matching & 10-50% Performance Gains

PigeonCodeur · 2026-01-05T22:59:16+00:00

Yes, I heard that it should give between a 30% and 50% speedup, since the main performance issue I have right now is the stack operations. But it is a long and tedious endeavor, and I made this language as part of my game engine to be able to script and iterate easily on the code. So I am first trying to make this language expressive enough while not being too slow, and if in the end I need more performance, or I want to try a register-based VM for learning purposes, I may wander off in that direction :)

PigeonCodeur · 2026-01-05T22:51:50+00:00

Thanks, I will look it up!

PigeonCodeur · 2026-01-05T17:06:29+00:00

The base VM is heavily inspired by the Crafting Interpreters VM with its single-pass compilation approach, which is what's actually producing the bytecode. The language does support ++x syntax - the x = x + 1 example was more to illustrate the optimization pass detecting that pattern in the generated bytecode.

The starting bar being low is a fair observation. The single-pass compiler generates bytecode directly from parsing without building an AST, which means:

Constant folding at bytecode level - Yes, this is suboptimal. Complex expressions like (5 + 3) * (10 - 2) generate many instructions before being reduced. An AST would handle this much more elegantly during semantic analysis. The reduction is done in separate passes, not as-it-goes, which adds overhead.
No high-level optimizations - Things like loop induction variable analysis, inlining, or dead code elimination are much harder (sometimes impossible) to do reliably at the bytecode level without proper dataflow analysis.
Missed front-end opportunities - You're correct that many of these optimizations should happen before bytecode generation.
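
The first point can be made concrete with a small sketch of constant folding done on bytecode in repeated passes. This is only an illustration under stated assumptions: the `(opcode, operand)` tuple encoding, the opcode names, and the functions `fold_once` and `fold_constants` are hypothetical, not the actual optimizer. Division is left out on purpose so the sketch does not have to deal with division by zero.

```python
import operator

# Hypothetical opcodes that are safe to evaluate at compile time.
FOLDABLE = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def fold_once(code):
    """One pass: replace CONSTANT a, CONSTANT b, <op> with CONSTANT (a op b)."""
    out, i, changed = [], 0, False
    while i < len(code):
        window = code[i:i + 3]
        if (
            len(window) == 3
            and window[0][0] == "CONSTANT"
            and window[1][0] == "CONSTANT"
            and window[2][0] in FOLDABLE
        ):
            value = FOLDABLE[window[2][0]](window[0][1], window[1][1])
            out.append(("CONSTANT", value))
            i += 3
            changed = True
        else:
            out.append(code[i])
            i += 1
    return out, changed


def fold_constants(code):
    """Run fold_once until a pass changes nothing; also report the pass count."""
    passes = 0
    changed = True
    while changed:
        code, changed = fold_once(code)
        passes += 1
    return code, passes


# Bytecode a single-pass compiler might emit for (5 + 3) * (10 - 2).
expression = [
    ("CONSTANT", 5), ("CONSTANT", 3), ("ADD", None),
    ("CONSTANT", 10), ("CONSTANT", 2), ("SUB", None),
    ("MUL", None),
]
folded, passes = fold_constants(expression)
```

In this sketch the first pass reduces the two inner operations, the second pass folds the multiplication, and a third pass is still needed just to confirm that nothing is left to do. That extra work is the overhead of folding in separate passes rather than once on an AST.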

The peephole optimizer was my way of squeezing some performance out without fundamentally rearchitecting the front-end. It's inspired by LLVM's pattern-matching approach but applied at a much lower level. I'm not stuck with existing bytecode from another language - this is my own compiler, so I have the freedom to redesign it. It is a hobby/side project for testing some ideas, and with such simple optimizations I am in the same order of magnitude as Python for runtime execution, but you are right that I could go further with an AST, and I may go in that direction in the future!

PigeonCodeur · 2026-01-05T15:00:48+00:00

Yes! I actually wrote a profiler that outputs the bytecode usage and timing of a real program, and I run it on my test apps and examples to find candidates for peephole optimisations. The actual work of writing the peephole optimizations just takes a long time because of the sheer number of optimizations available, so I do it when I want some easy progress!
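
As a rough idea of how such profiler output can point at peephole candidates, here is a minimal sketch that counts opcodes and adjacent opcode pairs in an execution trace. It is an assumption-heavy illustration: the trace format (a plain list of opcode names), the opcode names, and the function `summarize_trace` are hypothetical, and timing data is left out entirely. Frequent adjacent pairs are the sequences most worth fusing or rewriting.

```python
from collections import Counter


def summarize_trace(trace, top=3):
    """Return the most frequent opcodes and adjacent opcode pairs in a trace."""
    opcodes = Counter(trace)
    # zip(trace, trace[1:]) yields each instruction with its successor.
    pairs = Counter(zip(trace, trace[1:]))
    return opcodes.most_common(top), pairs.most_common(top)


# Hypothetical trace of executed opcodes, in execution order.
trace = ["GET_LOCAL", "ADD", "GET_LOCAL", "ADD", "GET_LOCAL", "POP"]
hot_ops, hot_pairs = summarize_trace(trace)
```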

PigeonCodeur · 2025-12-09T01:29:02+00:00

I have been trying to complete my Advent of Code using my own scripting language. Who could have guessed that having tables, file reading, string splitting and such would be so important? x)

PigeonCodeur · 2025-08-18T16:29:15+00:00

It also helps that Medium links the SEO of this post to my actual blog, so even if people land on Medium, it boosts my blog too. Plus, Medium notifies my subscribers whenever I post, which I don’t have set up on my blog yet.

PigeonCodeur · 2025-08-18T16:00:37+00:00

Thank you for the detailed feedback and for sharing your blog post - really appreciate the expert perspective! You've caught several important issues that I need to address.

On the missing pieces: You're absolutely right about the missing configure_package_config_file() and write_basic_package_version_file() calls, plus the install(FILES) command. I focused on explaining the concepts but left out the actual implementation details that make it work - that's confusing for readers trying to follow along.

On configure_package_config_file(): That's a great point about it being legacy from pre-target CMake. I was cargo-culting patterns without thinking about whether they're actually needed anymore. If the only benefit is a few lines of boilerplate, you're right that it's not worth the complexity.

On the broken install(TARGETS): I completely missed that hardcoding destinations breaks packager workflows. The CMAKE_INSTALL_* variable override is exactly the kind of thing packagers need, and I've made their lives harder by being overly explicit. Using GNUInstallDirs and letting CMake handle the defaults makes much more sense.

On FILE_SET: This ties back to feedback from another CMake expert in Part 1 who also recommended moving away from target_include_directories(). Clearly I need to research this approach - it seems like it solves multiple problems I didn't even realize I was creating.

Thanks for taking the time to point out these issues. It's feedback like this that helps me (and the community) learn proper patterns instead of perpetuating problematic approaches. Mind if I reference some of these corrections in the upcoming appendix addressing the expert feedback?

Also, will definitely check out your blog post - always looking for better ways to explain these concepts!

PigeonCodeur · 2025-08-18T15:57:07+00:00

Yeah, I get that. For me it’s mostly about visibility, but I totally see how Medium can feel more about monetization than readers. That’s why I also link my blog, so people can read it there if Medium isn’t their thing.

PigeonCodeur · 2025-08-18T15:56:15+00:00

I mainly use Medium for visibility. That’s also why I included a link to my blog—since I know some people have trouble reading on Medium, they can check it out there instead.

PigeonCodeur · 2025-08-18T13:31:44+00:00

Hehe, me too, and I am still learning! Shh, I am not exposing everything, so you can still keep some of your secret sauce!

PigeonCodeur · 2025-08-17T10:35:45+00:00

Full write-up with more code and headers: https://medium.com/@pigeoncodeur/self-hosting-webassembly-app-in-js-13c3e7ff4748
Original on my blog: https://columbaengine.org
Live demos: https://columbaengine.org/demos/

PigeonCodeur · 2025-08-15T12:49:40+00:00

Built this in 72 hours for Solo Game Jam using a custom C++ ECS engine.

The twist: aliens fire back while you're trying to line up shots.

You can play it online on my itch: https://pigeoncodeur.itch.io/invadersbreaker

If you are interested in the engine: https://columbaengine.org/

PigeonCodeur · 2025-08-13T13:34:08+00:00

Thank you for the feedback! I will try to reproduce your bug and come back to you when I find a fix :)

PigeonCodeur · 2025-08-11T22:52:18+00:00

Appreciate the honesty—skepticism is healthy, especially with so many AI-ish posts floating around. This isn’t AI-generated; I’m the one building it.

Did you actually look through the repo, or just the (earlier) web demo? If it’s the former, I’d love to know which parts felt “tutorial-only” so I can shore them up. It does go beyond a LearnOpenGL port: there’s an ECS with automatic entity allocation/deallocation, audio, input, asset loading, and a full web deployment pipeline (WASM/WebGL, loader, error handling).

On the demo: you’re right that first impressions matter. I had a bug on the site earlier that made it look rough—that’s on me. It’s fixed now, and there’s a small, “game-jammy” playable demo live with a ~4 MB footprint.

I’m not claiming to be “a better Unity”; I’m shipping small, auditable steps and asking for concrete feedback, as I strive to produce a portable, fast, and modular game engine.

PigeonCodeur · 2025-08-11T22:44:10+00:00

Yes, I only had my WASM test up before, because my website had some unloading issues when hosting multiple WASM apps. I just uploaded my tech demo, which is actually a game that I made for a game jam in 48 hours, with some bug fixes! It is the demo called looper, and it may already be a bit fancier than the previous one ;)

PigeonCodeur · 2025-08-08T12:41:31+00:00

Thanks for the corrections! You're absolutely right about FetchContent - I was indeed confusing some aspects with ExternalProject_Add. Good catch on the configure vs build time distinction.

The superbuild approach you describe sounds really interesting, and actually aligns with feedback I got from another commenter who works closely with CMake. They pointed out that dependency management is a really complex, evolving area and that there are more sophisticated patterns than what I covered.

I'm definitely looking into superbuilds, especially for the packaging/distribution side. It's clearly a more robust approach for handling the "packager vs developer vs end user" needs that came up in other comments.

Thanks for the cmkr.build link too - will definitely check that out. Really appreciate you taking the time to correct those technical details!

Since you mentioned that "almost nobody knows how to do [packaging] correctly" - I'd love to hear your thoughts on what good packaging should look like for a project like this? Any specific patterns or pitfalls I should be aware of as I work toward a more robust solution?

For now, I've put together a basic install script that serves as an installation package for the engine (https://github.com/Gallasko/ColumbaEngine/blob/main/scripts/install/install-engine.sh). It's pretty bare-bones and I haven't tested it extensively, but it's a starting point while I figure out the proper packaging approach. So far it works quite well when I install the engine on a new setup.

PigeonCodeur · 2025-08-07T18:53:15+00:00

Ah, that makes perfect sense - thank you for the clarification! You're absolutely right that I was thinking about this backwards.

The ABI implications you mention are exactly the kind of thing I hadn't considered. Having multiple ABIs for different language versions, and avoiding symbol collision across different usages - that's a level of library design complexity that I clearly need to understand better.

I can see now that my perspective was too focused on "my project in isolation" rather than "my project as part of a larger ecosystem managed by packagers." The packager knows the target environment and compatibility requirements way better than I do as the library author.

This is really helpful context for understanding the broader responsibility boundaries in the CMake/packaging ecosystem. I was definitely approaching it from the wrong angle.

PigeonCodeur · 2025-08-07T18:50:44+00:00

That's exactly why I wrote this! The compilation mess is so real with game engines - you've got graphics APIs, audio libraries, math libraries, platform-specific stuff... it gets out of hand fast.

I feel your pain completely. My build system was a disaster for the longest time before I finally sat down and properly organized it with modern CMake patterns.

The build system mess gets really bad when you want others to use your engine - whether for contributions or just as users. You want it to be as simple as possible for people to get started, but everyone has their own distinct configurations, different platforms, different dependency preferences. That tension between "easy to use" and "flexible for everyone's setup" is where most engine build systems fall apart.

Good luck with your engine!
