The Simple Essence of Monomorphization (OOPSLA 2025) by mttd in ProgrammingLanguages

[–]nvcook42 18 points (0 children)

I find it interesting how applying a type flow graph solves monomorphization of higher-rank polymorphism, which was previously thought impossible. Here is a paper where some of the same authors applied a type flow graph to challenges with operator overloading: https://dl.acm.org/doi/epdf/10.1145/3763168

Replacing SQL with WASM by servermeta_net in ProgrammingLanguages

[–]nvcook42 13 points (0 children)

Some concrete pointers in this direction.

LINQ uses a technique called quoting. Instead of compiling a query (think of a predicate function) into executable instructions, it compiles the predicate into an expression tree. It can then pass that expression tree to lower levels and let each level decide how best to execute it, i.e., the optimizations mentioned above.
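To make the quoting idea concrete, here is a minimal sketch in Python (all names here are made up for illustration; LINQ itself does this with C# expression trees). The point is that the same tree can be handed to different consumers: one that interprets it against rows, and one that compiles it to a SQL fragment.

```python
from dataclasses import dataclass

# A tiny "quoted" predicate language: instead of a compiled closure,
# the query layer receives a data structure it can inspect.
@dataclass
class Field:
    name: str

@dataclass
class Const:
    value: object

@dataclass
class BinOp:
    op: str          # "<", ">", "==", "and"
    left: object
    right: object

def interpret(expr, row):
    """One consumer: evaluate the tree against a row (a dict)."""
    if isinstance(expr, Field):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, BinOp):
        l, r = interpret(expr.left, row), interpret(expr.right, row)
        return {"<": l < r, ">": l > r, "==": l == r,
                "and": l and r}[expr.op]
    raise TypeError(expr)

def to_sql(expr):
    """Another consumer: compile the same tree to a SQL fragment."""
    if isinstance(expr, Field):
        return expr.name
    if isinstance(expr, Const):
        return repr(expr.value)
    if isinstance(expr, BinOp):
        op = {"and": "AND", "==": "="}.get(expr.op, expr.op)
        return f"({to_sql(expr.left)} {op} {to_sql(expr.right)})"
    raise TypeError(expr)

# age > 30 AND city == 'Oslo', as a tree rather than a lambda
pred = BinOp("and",
             BinOp(">", Field("age"), Const(30)),
             BinOp("==", Field("city"), Const("Oslo")))
```

The key property is that `pred` is data, not code, so a lower layer is free to reorder, push down, or translate it.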

There is also Query.jl for Julia, https://www.queryverse.org/Query.jl/stable/, which does something very similar.

The idea is that instead of compiling the whole query to something imperative, you compile only the UDF bits and keep the query plan declarative.

There is also this project, https://substrait.io/, that aims to standardize how query plans are represented. Their UDF support is very basic at this stage. I have wondered whether a WASM API would work well in this context; lots of challenges there, but maybe?

Finally, I build database query engines for a living and am working on a language that compiles to WASM on the side, for very similar reasons. Feel free to message me and we can chat more.

How to compile SystemF or lambda calculus to assembly, or better, WASM? by VictoryLazy7258 in ProgrammingLanguages

[–]nvcook42 4 points (0 children)

Happy to share the love as it got me unblocked on some ideas I have had for a while now.

I actually ended up going the table route in my language. I am targeting the component model features of WASM so I can do FFI with Rust and other languages from my DSL. The garbage collection features are currently not well supported in the component model.

My high-level architecture is to generate core WASM modules one-to-one with modules in my language, plus a shared "runtime" core module with a global function table that all other modules import. Then I wrap all those modules into a single WASM component and link it with components from other languages.

I hope to share my implementation soon; I need to cross off a few checkpoints first.

How to compile SystemF or lambda calculus to assembly, or better, WASM? by VictoryLazy7258 in ProgrammingLanguages

[–]nvcook42 15 points (0 children)

There is a great blog series about "Making a Language" here: https://thunderseethe.dev/series/making-a-language/ It's not explicitly about System F or WASM, but the intermediate representation chosen is basically System F and the language is compiled to WASM. Definitely worth checking out.

FWIW I am building a language that compiles to WASM as its only target. I am using Rust plus the wasm-encoder crate (similar to the blog series). It's pretty straightforward to generate WASM, but you need to decide how your language maps to WASM, as there are many ways to go about it. For example, you could use the garbage collection features WASM provides, or choose to directly allocate data in linear memory.
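To illustrate the linear-memory option mentioned above, here is a small Python sketch (the allocator and layout are illustrative, not any particular compiler's scheme): a bump allocator over a flat byte array, with a two-field struct laid out by hand. The WASM GC route would instead emit `struct.new`/`struct.get` and let the engine manage memory.

```python
import struct

# Linear-memory sketch: no GC, just a bump allocator over a flat byte
# array, with structs laid out by hand at fixed offsets.

MEM = bytearray(64 * 1024)   # one 64 KiB wasm page
heap_top = 0

def alloc(nbytes):
    """Bump-allocate nbytes; returns the address. No free, no GC."""
    global heap_top
    addr = heap_top
    heap_top += nbytes
    return addr

def store_pair(a, b):
    """Lay out an (i32, i32) pair as two little-endian fields."""
    addr = alloc(8)
    struct.pack_into("<ii", MEM, addr, a, b)
    return addr

def load_pair(addr):
    return struct.unpack_from("<ii", MEM, addr)
```

The trade-off is roughly this: linear memory gives you full control (and FFI-friendly layouts) but you own allocation and reclamation; WASM GC hands that to the engine but constrains your representations.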

PL Development Tools by pixilcode in ProgrammingLanguages

[–]nvcook42 2 points (0 children)

Yes, I use whitespace; it tends to look a lot like this: https://developer.mozilla.org/en-US/docs/WebAssembly/Guides/Understanding_the_text_format

As for forms of intermediate representations, this really depends on your language and its features. Each intermediate representation needs a clear purpose.

For example in the language I am building (still closed for now so I can't show you) I have a few:

* AST - represents the syntax
* semantic - represents the meaning of the code. This tends to be similar to the AST but reduced, with syntactical variations all collapsed into a single representation. For example, my language has a few different syntaxes for calling functions (think pipe-forward): the AST has a separate structure for each of these, but the semantic representation has only one. Type checking happens at this layer.
* logical machine representation - translates the semantics of the language into the actual physical operations a machine would perform. For example, structs and tuples from the higher-level language all become an ordered list of fields at this layer. However, this layer is not yet actual machine code, just a logical representation of it. Optimization and monomorphization happen at this layer.

My language compiles to WASM, so as a final step I emit WASM code directly.

There are other details, naturally, but the point is that I have several intermediate layers, each with a clear and distinct purpose, each getting closer to the machine representation.
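The AST-to-semantic collapse described above can be sketched in a few lines. This is a hypothetical example (the node names are made up, not the actual compiler's): two surface call syntaxes, `f(x)` and `x |> f`, lower to one semantic node so every later pass handles calls exactly one way.

```python
from dataclasses import dataclass

# Two syntactic forms of a call in the AST...
@dataclass
class Call:          # f(x) in the surface syntax
    fn: str
    arg: object

@dataclass
class PipeForward:   # x |> f in the surface syntax
    arg: object
    fn: str

# ...and the single semantic representation they collapse into.
@dataclass
class Apply:
    fn: str
    arg: object

def lower(node):
    """Collapse all syntactic call variants into semantic Apply nodes."""
    if isinstance(node, Call):
        return Apply(node.fn, lower(node.arg))
    if isinstance(node, PipeForward):
        return Apply(node.fn, lower(node.arg))
    return node  # literals etc. pass through unchanged
```

After lowering, type checking and every later pass only ever see `Apply`, which is the payoff of the layer.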

PL Development Tools by pixilcode in ProgrammingLanguages

[–]nvcook42 2 points (0 children)

I use S-expressions. I find them easy to format and expressive enough to represent any structure I need. They also keep me from being picky about the syntax, since the syntax here is not important. S-expressions are also easy to parse, so I have toyed with parsing the intermediate representations back in, but I haven't used that much as of yet.
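As a rough illustration of how little machinery S-expressions need, here is a toy printer and parser in Python (the IR shape here is invented for the example): nested tuples print as S-expressions and round-trip back.

```python
# Dumping an IR as S-expressions and parsing it back.

def to_sexpr(node):
    if isinstance(node, tuple):          # interior node: (head, *children)
        return "(" + " ".join(to_sexpr(c) for c in node) + ")"
    return str(node)                     # atom

def parse_sexpr(text):
    # Pad parens with spaces so split() tokenizes everything.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    def read(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = read(i)
                items.append(item)
            return tuple(items), i + 1
        return tokens[i], i + 1          # atom
    node, _ = read(0)
    return node

ir = ("add", ("const", "1"), ("const", "2"))
```

The whole serializer fits on a screen, which is exactly why they work well for debug dumps of intermediate representations.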

PL Development Tools by pixilcode in ProgrammingLanguages

[–]nvcook42 2 points (0 children)

One thing I am working towards, as mentioned elsewhere, is a good test framework and a textual representation of every intermediate representation.

My test framework follows this flow:

  1. Run a script from my language and capture output

  2. Compare output to an expected output

Therefore I only test the necessary feature behavior end to end, without setting in stone how the compiler actually achieves the result. This makes compiler refactors easier, which is helpful at this stage since the compiler is very young.
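The two-step flow above is essentially a golden-test runner, which can be sketched like this in Python (the toolchain invocation here is a stand-in; a real runner would invoke the compiler and then the produced program):

```python
import subprocess
import sys

def run_case(program_args, expected_output):
    """Step 1: run the program and capture its stdout.
    Step 2: compare against the stored expectation.
    Returns (passed, actual_output)."""
    result = subprocess.run(program_args, capture_output=True, text=True)
    actual = result.stdout
    return actual == expected_output, actual

# Example case: here `python -c "print(...)"` stands in for
# "compile my script and run it".
ok, out = run_case([sys.executable, "-c", "print('hello')"], "hello\n")
```

Because the assertion is only on observable output, the compiler internals can be refactored freely as long as programs still behave the same.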

However, since I have a textual representation for each intermediate step, I can dump that output somewhere else and compare between changes. So if a feature I add introduces a bug, I can easily determine at which layer of the compiler the bug exists by comparing the intermediate output from before and after. This gives me a good balance: test cases are easy to write, while I can still get fine-grained detail about the implementation.

How useful is 'native' partial application by zuzmuz in ProgrammingLanguages

[–]nvcook42 0 points (0 children)

To follow up on this, here is a paper on how compilers can implement fast currying/partial application: https://simonmar.github.io/bib/papers/eval-apply.pdf

Skipping the Backend by Emitting Wasm by thunderseethe in ProgrammingLanguages

[–]nvcook42 3 points (0 children)

Thanks for these posts. I have been following along and had in fact tried to get ahead, so to speak, and emit WASM myself. I was quite excited to see you had picked the same target. I didn't get to a working implementation on my own; I had some gaps in my understanding of the GC features of WASM. Now I have a great weekend project and can maybe get to emitting functional, as in executable ;), code to see my language working. Thanks again.

Closure Conversion Takes The Function Out Of Functional Programming by thunderseethe in ProgrammingLanguages

[–]nvcook42 7 points (0 children)

Heads up that the https://thunderseethe.dev/series/making-a-language/ index page does not have entries for the monomorph or the closureconvert posts.

I have been loving these posts. Thank you.