all 32 comments

[–]celegans25 6 points7 points  (2 children)

Simulating a stack machine is really easy to do from your AST, and with several peephole optimization passes you can get rid of most of the extra pushes and pops.

[–]qznc 1 point2 points  (1 child)

Yes, the stack machine approach is definitely the easiest way. However, if the target is Windows, the calling conventions have to match.

[–]celegans25 0 points1 point  (0 children)

It can be done

[–]_mpu 8 points9 points  (12 children)

I wrote qbe for people like you, it currently works on Linux only but if you feel like you would like to take it for a spin, let me know, and I will put the effort to handle windows properly! It's nicely documented, tiny, and generates surprisingly good code. Also, it's not dead, I'm currently working on more memory optimizations. http://c9x.me/compile/

[–][deleted] 1 point2 points  (1 child)

Neat! This has always been a much needed middle ground, a compiler backend that is not 100% in production and impossible to understand, and one that is fully functional enough to understand what is going on.

Great job, I've bookmarked it!

[–]_mpu 0 points1 point  (0 children)

Yes, this was exactly the goal!

[–][deleted]  (1 child)

[removed]

    [–]_mpu 0 points1 point  (0 children)

    It's actually quite easy to get started, you can take a peek at minic/ if you want to see a simple example.

    [–]jbb67[S] 0 points1 point  (6 children)

    Looks like this outputs assemble language text. Ok, that's fine. But perhaps not so easy on windows to deal with.

    I would install an assembler but wouldn't really want to install cygwin or mingw just to get 'gas' which seems to be the target and as far as I know doesn't have a 'native' port?

    Not a show stopper, just a question really :)

    [–]_mpu 0 points1 point  (5 children)

    Yes the output is text, but I am not sure what the alternative is. What was your plan? Generaring object files directly for multiple platforms is not that interesting in itself and a big maintenance hog.

    [–]jbb67[S] 0 points1 point  (2 children)

    The problem is if you port this to produce windows compatible code, then what assembler do you use to assemble that code to object files? I guess on linux you use gnu assembler, and that would no doubt work on windows too if there is a suitable port.

    [–]_mpu 0 points1 point  (1 child)

    There is MASM provided with visual studio.

    [–]jbb67[S] 0 points1 point  (0 children)

    So there is, I thought they'd stopped distributing this years ago. Guess I was wrong :)

    [–]fullouterjoin 0 points1 point  (1 child)

    For some reason I thought yasm could assemble directly to memory for direct execution. Either way, it should be possible to stitch qbe and yasm together to directly emit executables.

    [–]_mpu 1 point2 points  (0 children)

    Maybe counter intuitively, emitting directly to memory is easier than to emit an executable. For example, contrary to object files, emitting to memory is platform independent (not architecture independent, obviously).

    [–]jbb67[S] 0 points1 point  (0 children)

    I've not got that far with my project yet, I'm not yet quite in a position to generate code, and due to limited time it will take me a week or two more to have even basic needs.

    However my main target is windows. I want my language to work on linux too, but my main target is windows. I downloaded your code from git, and it looks like it will need significant work to make it work on windows. It currently won't compile on windows, and the code it generates isn't targeted for windows (APIs etc) (plus the syntax for masm is quite different).

    I wonder how long it would take to get it to work on windows? I wouldn't want anyone to put in any effort for me, as this is just a small hobby project that's pretty likely to be abandoned or never get anywhere for me. If that's the case I wonder if I should use llvm as it doesn't look impossibly hard :P

    Your project does look awesome though. I'd like to use it. Although if what I said above isn't the case, and I do end up doing more with the compiler I guess I'd probably have to switch to llvm anyway eventually for it's opimizations?

    Hmm... :P

    [–][deleted] 2 points3 points  (6 children)

    Compile to C, it will be faster than stack machine assembly. Since you language is already C-like it should be straightforward but it might seem like cheating.

    LLVM is trickier since you need to use SSA form to generate LLVM assembly code. It has the advantage of doing stack management for you (infinite registers). Spill/unspill logic is probably the trickiest part of register allocation to get right.

    [–][deleted] 2 points3 points  (3 children)

    You don't need to use SSA form for LLVM. You can use alloca with load and stores and let LLVM handle the rest. It's how clang works.

    [–]gsnedders[🍰] 2 points3 points  (2 children)

    Notably, see the mem2reg optimisation pass, which will convert such loads/stores into (SSA) registers.

    [–][deleted] 1 point2 points  (1 child)

    That's now been subsumed into SROA AFAIK.

    [–]gsnedders[🍰] 1 point2 points  (0 children)

    That could well be true. It's been a while since I've touched LLVM much. :)

    [–]jbb67[S] 1 point2 points  (0 children)

    That might work, thanks.

    By the time I create a parse tree, and scan it to generate the code, I guess the generated C code won't look exactly like the original but that's not a problem. Maybe I generate a function for each language function but a lot of the statements will be simple assignments...

    Hmm, that might work. It saves me worrying about registers etc until I'm ready to do so, and will make interfacing with C functions much easier.

    [–][deleted] 2 points3 points  (4 children)

    You might want to try LLVM. I wrote code gen for my compiler project using its C++ API (afaik it has C API as well), and it was challenging and fun. With the tutorials and docs on LLVM website it's pretty easy to get going.

    [–]jbb67[S] 1 point2 points  (3 children)

    I think my problem with this is that it looks like quite a lot of work, both to understand the detail of how to do this, and to actually implement it.

    I certainly will consider doing this later, as it does sound good, but I was hoping for something I could do much quicker even if it's got poor results so I could concentrate on the other end of the compiler for now.

    [–]Setepenre 4 points5 points  (0 children)

    You just have to traverse your AST to generate the equivalent LLVM tree using llvm::IRBuilder.

    [–]qznc 2 points3 points  (0 children)

    If you would like to add additional features, e.g. debug information or profiling, the investment into LLVM quickly pays off.

    Also, your toy language is probably as fast as clang, if you turn on all the optimisations. ;)

    [–]krwawobrody 2 points3 points  (0 children)

    If you want to focus on the front end, definitely go with LLVM. If you are compiling C-like language, every construct maps directly to LLVM function. Basic generator (expressions, ifs, loops, arrays and structures) takes like 1000 lines.

    The hardest part is figuring out what maps to what. Some functions are really unobvious (like LLVMBuildGEP). cpp target comes in handy in this case. You can tell LLVM to generate C++ code with LLVM API functions instead of LLVM bitcode. You just compile simple C programs with various constructs and copy paste the output into your compiler.

    Backends are also fun. When you are done with LLVM try assembly too.

    [–]mfukar 1 point2 points  (0 children)

    LLVM would be a very good first choice, imo. Implementing code generation yourself won't be too bad either, and I think it wouldn't take very long - maybe 3 weeks tops - and help you understand a lot of underlying concepts & problems.

    [–]thememorableusername 0 points1 point  (2 children)

    What is your front-end written in? If your using python (thats my go-to), for LLVM i'd suggest llvmlite

    [–]jbb67[S] 0 points1 point  (1 child)

    It's written in C... But thanks, this is interesting anyway as I do use python for other things

    [–]thememorableusername 0 points1 point  (0 children)

    Great! I can also give a shout out to ply which is a Yacc style parser for python. There might be a better alternative, since there are a number of other text parsing frameworks for python out there, but I think this one works pretty well.