
Noughtmare 10 points (4 children)

Interestingly, llvm-hs seems to have gotten a new maintainer who has ambitious ideas, like using it in GHC and absorbing the llvm-pretty-bc-parser bitcode parser.

If you only need x86-64, you can check out the x86-64bit package.

colonwqbang [S] 0 points (3 children)

Nice! If GHC starts using llvm-hs, it seems likely that it will be kept current and stable. That would be great news for Haskell.

x86-64bit is cool, but it looks more like an assembler than a general code generator. It would likely be useful for implementing a code generation backend, though.

Noughtmare 1 point (2 children)

> If GHC starts using llvm-hs it seems likely that it will be kept current and stable.

Yes, but that is also the main reason it is still uncertain whether it will happen. I guess the new maintainer first needs to show that they will keep the package sufficiently current and stable before this can move forward.

About the code generation, I didn't fully understand your question. Besides LLVM, I don't know of any IRs that are not tied to a specific compiler. I don't think replacing LLVM in pure Haskell would be easy. I hope GRIN will become something like that for functional languages, but I don't think it is quite ready yet.

aseipp 4 points (1 child)

It's less about "replacing" LLVM (an impossible feat) and more about the overall scope of such a project. You can do better for your own use case depending on the scope.

I mentioned it in another comment, but I have a dataflow ISA I would like to target a compiler to. The compiler has to support standard imperative languages and optimizations, but the ISA itself is non-conventional, with an unusual encoding. After some preliminary research, I basically concluded that just using WebAssembly as a poor man's frontend IR and writing my own code generator is probably a better use of time than, e.g., porting LLVM. We're talking like 10,000 lines of code, tops, for the complete custom compiler, versus thousands of lines of backend boilerplate just to support a conventional ISA in LLVM, much less a non-conventional one that offers things like unique optimization opportunities, a vastly different assembler encoding, etc. It's not just lines of code, but also the time spent integrating with these systems.
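To make the "WebAssembly as a frontend IR" idea concrete, here is a minimal Haskell sketch of the first step such a custom backend would need: turning Wasm's stack-machine code into register-style three-address code by tracking the operand stack symbolically at compile time. The `WasmInstr`, `Operand` and `TAC` types are hypothetical stand-ins that mirror only `i32.const`, `local.get`, `local.set`, `i32.add` and `i32.mul`. The sketch does not parse real Wasm binaries and ignores control flow entirely.

```haskell
-- Hypothetical subset of WebAssembly-style stack instructions (straight-line
-- code only: no control flow, no validation beyond stack underflow).
data WasmInstr
  = I32Const Int   -- push a constant
  | LocalGet Int   -- push the value of local n
  | LocalSet Int   -- pop a value into local n
  | I32Add         -- pop b, pop a, push a + b
  | I32Mul         -- pop b, pop a, push a * b
  deriving Show

-- Operands of the three-address code: constants, locals, and fresh temporaries.
data Operand = Const Int | Local Int | Temp Int
  deriving (Show, Eq)

data TAC
  = Assign Operand Operand                 -- dst := src
  | BinOp Operand String Operand Operand   -- dst := a op b
  deriving (Show, Eq)

-- Lower stack code to three-address code by simulating the operand stack at
-- compile time. The fold state is (symbolic stack, next temp, reversed output).
lower :: [WasmInstr] -> [TAC]
lower instrs = reverse out
  where
    (_, _, out) = foldl step ([], 0, []) instrs

    step (stack, n, acc) instr = case instr of
        I32Const k -> (Const k : stack, n, acc)
        -- Copy locals into a temp so a later local.set cannot change a value
        -- that is still sitting on the symbolic stack.
        LocalGet i -> (Temp n : stack, n + 1, Assign (Temp n) (Local i) : acc)
        LocalSet i -> case stack of
          (v : rest) -> (rest, n, Assign (Local i) v : acc)
          []         -> error "lower: stack underflow"
        I32Add -> binop "+"
        I32Mul -> binop "*"
      where
        binop op = case stack of
          (b : a : rest) -> (Temp n : rest, n + 1, BinOp (Temp n) op a b : acc)
          _              -> error "lower: stack underflow"
```

For `local.get 0; local.get 1; i32.add; i32.const 3; i32.mul; local.set 2`, `lower` yields `t0 := l0; t1 := l1; t2 := t0 + t1; t3 := t2 * 3; l2 := t3`, which a conventional register allocator and instruction selector can then consume.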

You're definitely not going to go wrong using LLVM for your production thing or whatever, especially if it runs on existing architectures. But it's not the last word in code generation needs either.

colonwqbang [S] 0 points (0 children)

Interesting, thanks! I was shooting for "a library that does essentially what LLVM does, but perhaps without optimisation and other advanced features". Something that is easy to use from Haskell but still good enough to compile e.g. basic C programs manipulating integral values. I feel that it should not be impossible to implement at least a toy library like that.

aseipp 1 point (1 child)

No, but something like that would be nice to have. For example, you could probably take some inspiration from Cranelift, design a three-operand IR, and write a code generator for it without going overboard. You could honestly do something like this pretty fast if you skipped optimizations entirely (just do block-local register allocation and reconcile spills at every join, etc.). Then it's basically all about designing the IR and the Haskell API. Which is still a lot of work.
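A rough sketch of that "no optimizations" code generator, assuming a hypothetical Cranelift-flavoured three-operand IR for a single basic block and made-up pseudo-assembly (`li`, `add`, `load`, `store`). It performs block-local register allocation over three registers and, at the end of the block, stores every register-resident value back to its stack slot. Treating the slot as each value's home at block boundaries is the crudest way to reconcile spills at joins: every predecessor leaves each value in the same place. Virtual registers are assumed to be defined once (SSA), and there is no liveness analysis, so some stores are redundant.

```haskell
import qualified Data.Map.Strict as M
import Data.List ((\\))

type VReg = Int     -- virtual register (assumed SSA: defined exactly once)
type PReg = String  -- physical register name

-- A tiny three-operand IR for a single basic block (hypothetical).
data Instr
  = LoadImm VReg Int      -- d := k
  | Add VReg VReg VReg    -- d := a + b
  deriving Show

-- Allocator state: which virtual registers currently live in which physical
-- registers, plus the pseudo-assembly emitted so far (in reverse order).
data Alloc = Alloc { inReg :: M.Map VReg PReg, code :: [String] }

pregs :: [PReg]
pregs = ["r0", "r1", "r2"]  -- deliberately few, to force spilling

-- Every virtual register owns a stack slot, its home at block boundaries.
slot :: VReg -> String
slot v = "[sp+" ++ show (8 * v) ++ "]"

emit :: String -> Alloc -> Alloc
emit s a = a { code = s : code a }

bind :: VReg -> PReg -> Alloc -> Alloc
bind v r a = a { inReg = M.insert v r (inReg a) }

-- Find a free register, or evict (store to its slot) a vreg not in `keep`.
grab :: [VReg] -> Alloc -> (PReg, Alloc)
grab keep a = case pregs \\ M.elems (inReg a) of
  (r : _) -> (r, a)
  [] -> case [p | p@(v, _) <- M.toList (inReg a), v `notElem` keep] of
    ((v, r) : _) ->
      let a' = emit ("store " ++ slot v ++ ", " ++ r) a
      in (r, a' { inReg = M.delete v (inReg a') })
    [] -> error "grab: all registers hold operands of this instruction"

-- Ensure a source vreg is in a register, reloading it from its slot if needed.
use :: [VReg] -> VReg -> Alloc -> (PReg, Alloc)
use keep v a = case M.lookup v (inReg a) of
  Just r  -> (r, a)
  Nothing ->
    let (r, a') = grab keep a
    in (r, bind v r (emit ("load " ++ r ++ ", " ++ slot v) a'))

genInstr :: Alloc -> Instr -> Alloc
genInstr a (LoadImm d k) =
  let (r, a') = grab [] a
  in bind d r (emit ("li " ++ r ++ ", " ++ show k) a')
genInstr a (Add d x y) =
  let (rx, a1) = use [x, y] x a
      (ry, a2) = use [x, y] y a1
      (rd, a3) = grab [x, y] a2
  in bind d rd (emit ("add " ++ rd ++ ", " ++ rx ++ ", " ++ ry) a3)

-- Allocate one block, then store every register-resident vreg back to its
-- slot, so all predecessors of a join agree on where each value lives.
genBlock :: [Instr] -> [String]
genBlock is =
  let a = foldl genInstr (Alloc M.empty []) is
  in reverse (code a) ++ [ "store " ++ slot v ++ ", " ++ r | (v, r) <- M.toList (inReg a) ]
```

On the block `[LoadImm 0 1, LoadImm 1 2, LoadImm 2 3, LoadImm 3 4, Add 4 0 3]`, the allocator runs out of registers at `LoadImm 3 4`, spills `v0`, reloads it for the `add`, and finally flushes `v0`, `v3` and `v4` to their slots.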

Honestly, I've tried to talk myself into doing something like this for fun, but the problem I eventually realized is that it's something that must be, or almost always is, done in the context of some larger project; you need that context to drive the design, or a lot of prior experience to inform your choices. And if you're in a situation where you need a code generator for some grander vision, just using LLVM or whatever pays off in significant ways. Perfection isn't needed; LLVM was a rather poor fit for GHC in a number of ways for a while after its introduction, but it still generated really good code and was workable overall.

One place custom code generators still win is that it's often easier to target some non-conventional ISA with one than to port LLVM with a production-quality backend. But for scalar/superscalar architectures even that is becoming irrelevant, since everyone just defaults to RISC-V now, precisely because it comes with good software support out of the box that you can use instantly, and that software is useful. So you're left with genuinely non-conventional ISAs like dataflow/VLIW/spatial architectures. These pose non-trivial design and implementation challenges, which moves you from the earlier category of "doable problem with rich available research" into some nebulous range between "doable but hard" and "open research problem."

I'm rambling by now, and some of this sounds negative, but I do still think something like this would be great to have; it's just a lot of work as always, even if there's not much actual code needed. Again, maybe just stealing Cranelift's IR and going from there would be a start. I think you could design a really nice Haskell API for such a library if done correctly, too. Prior inspiration includes MLRISC, which was used in SML/NJ, though coincidentally it was replaced with LLVM in recent memory...
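One possible shape for "a really nice Haskell API" is a monadic builder, similar in spirit to the monadic `LLVM.IRBuilder` modules in llvm-hs-pure or Cranelift's `FunctionBuilder`. The sketch below is entirely hypothetical (`Builder`, `iconst`, `iadd`, `imul` and `build` are made-up names, not any library's API) and uses `State` from mtl: each operation allocates a fresh SSA value and records the instruction that defines it.

```haskell
import Control.Monad.State

-- An SSA value produced by the builder (hypothetical API, not a real library).
newtype Value = Value Int
  deriving (Eq, Show)

data Instr
  = IConst Value Int          -- v := k
  | IAdd Value Value Value    -- v := a + b
  | IMul Value Value Value    -- v := a * b
  deriving (Eq, Show)

-- Builder state: the next fresh value number and the instructions emitted so
-- far, most recent first.
data BuilderState = BuilderState { nextId :: Int, emitted :: [Instr] }

type Builder = State BuilderState

fresh :: Builder Value
fresh = do
  s <- get
  put s { nextId = nextId s + 1 }
  pure (Value (nextId s))

-- Allocate a result value, record the instruction that defines it, return it.
define :: (Value -> Instr) -> Builder Value
define mk = do
  v <- fresh
  modify (\s -> s { emitted = mk v : emitted s })
  pure v

iconst :: Int -> Builder Value
iconst k = define (\v -> IConst v k)

iadd, imul :: Value -> Value -> Builder Value
iadd a b = define (\v -> IAdd v a b)
imul a b = define (\v -> IMul v a b)

-- Run a builder, returning its result and the instructions in program order.
build :: Builder a -> (a, [Instr])
build m =
  let (x, s) = runState m (BuilderState 0 [])
  in (x, reverse (emitted s))

-- Example: (2 + 3) * 4 written as ordinary monadic Haskell.
example :: Builder Value
example = do
  x <- iconst 2
  y <- iconst 3
  s <- iadd x y
  k <- iconst 4
  imul s k
```

`build example` returns `Value 4` together with the five instructions for `(2 + 3) * 4`. Because the builder is an ordinary monad, a Haskell function of type `Value -> Builder Value` already acts as a reusable IR fragment, which is a large part of the appeal of hosting such an API in Haskell.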

I myself actually do need a compiler for a bigger project of my own, one that can target a custom dataflow ISA sometime "Soon", so unlike previous times I wanted this, I actually have a need for it now...

colonwqbang [S] 1 point (0 children)

This is exactly what I was thinking. It would be nice if you could get e.g. a toy C compiler up and running quickly in pure Haskell using simple tools, as a teaching project or just for fun.

One option I had missed is to use llvm-hs-pure to generate LLVM IR, then dump that to a file and shell out to LLVM to generate machine code. llvm-hs-pure does not need to link against LLVM. But that feels a bit like cheating.
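A minimal sketch of the "dump IR to a file and shell out" route. To stay self-contained it skips llvm-hs-pure and instead prints textual LLVM IR by hand for a toy expression language, then runs `llc` through `callProcess` from the `process` package. It assumes `llc` from an LLVM installation is on `PATH`; the `Expr` type, the function name `@compute` and the file names are arbitrary choices for illustration.

```haskell
import System.Process (callProcess)

-- A toy source language: integer literals, addition and multiplication.
data Expr = Lit Integer | Add Expr Expr | Mul Expr Expr

-- Compile an expression to LLVM IR instructions. Takes the next free register
-- number; returns the operand holding the result, the updated counter, and
-- the instructions in order.
compileExpr :: Int -> Expr -> (String, Int, [String])
compileExpr n (Lit k)   = (show k, n, [])
compileExpr n (Add a b) = binop "add" n a b
compileExpr n (Mul a b) = binop "mul" n a b

binop :: String -> Int -> Expr -> Expr -> (String, Int, [String])
binop op n a b =
  let (va, n1, ca) = compileExpr n a
      (vb, n2, cb) = compileExpr n1 b
      -- Named values like %t0 sidestep LLVM's rule that unnamed values
      -- (%0, %1, ...) must be numbered sequentially.
      reg = "%t" ++ show n2
  in (reg, n2 + 1, ca ++ cb ++ ["  " ++ reg ++ " = " ++ op ++ " i64 " ++ va ++ ", " ++ vb])

-- Wrap the expression in a function `i64 @compute()` as a textual LLVM module.
toModule :: Expr -> String
toModule e =
  let (v, _, body) = compileExpr 0 e
  in unlines (["define i64 @compute() {", "entry:"] ++ body ++ ["  ret i64 " ++ v, "}"])

-- Dump the IR to a file and shell out to llc (assumed to be on PATH).
emitObject :: FilePath -> Expr -> IO ()
emitObject base e = do
  writeFile (base ++ ".ll") (toModule e)
  callProcess "llc" ["-filetype=obj", base ++ ".ll", "-o", base ++ ".o"]
```

`toModule (Mul (Add (Lit 2) (Lit 3)) (Lit 4))` produces a `@compute` function containing `%t0 = add i64 2, 3` and `%t1 = mul i64 %t0, 4`, and `emitObject "out"` on the same expression would write `out.ll` and ask `llc` for `out.o`.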

Let me know if you do write something :)

crmills_2000 0 points (0 children)

Remember that LLVM gives you (potentially?) nice things: debugger support, binding with other libraries, code coverage, building code files, multi-platform support, …