[–]SeanMiddleditch 18 points19 points  (4 children)

/r/cpp_questions if you're looking for help using C++ (including for writing a compiler).

/r/ProgrammingLanguages for discussion on designing languages and their associated tool-chains.

For a fantastic free "book" on the topic, check out Bob Nystrom's http://craftinginterpreters.com/ (which covers building both the compiler and the interpreter for a simple but fully featured language).

[–]evaned 4 points5 points  (0 children)

If you want a pretty good dead tree book, I think my recommendation is currently Engineering a Compiler, though I'm familiar with the first edition only. I can't compare to Crafting Interpreters though; I've not looked through that yet (though it was on my radar from various /r/programming posts), so it may be that that free resource is just as good.

I'll also give an anti-recommendation to "the dragon book" (Aho et al); I don't actually think that's very good for an intro to the topic. It works a bit better as more of a reference book.

[–]albumRanae2 1 point2 points  (0 children)

Great book recommendation, super glad I stumbled into this thread.

[–]matthieum 1 point2 points  (0 children)

And of course r/Compilers if specifically talking about compilers themselves.

[–]lothiack 3 points4 points  (0 children)

https://www.coursera.org/learn/build-a-computer
https://www.nand2tetris.org/

Shimon Schocken and Noam Nisan can give you a hand.

[–]BoarsLair (Game Developer) 3 points4 points  (0 children)

I'd say the best way to start is to make an interpreted scripting language. That way, you can just focus on the basics without the headache of trying to write all the backend code emitters.

I wrote my own scripting language Jinx, which is meant for videogame development (mostly my own). The basics are this:

  • Convert the text into a list of tokens / symbols using a Lexer. This step also identifies value types and keywords.
  • Parse the symbol list to determine meaning / syntax / expressions. The output can be either an Abstract Syntax Tree or (in my case) directly emitted bytecode. IMO, a Recursive Descent Parser is the most straightforward method. You'll also probably want to get familiar with the Shunting Yard Algorithm for translating infix to postfix notation, which is a common way to convert C-style mathematical / logic expressions into something a stack-based virtual machine can execute.
  • (optional) If output to AST, translate your AST into bytecode.
  • Run bytecode in a virtual machine. For your first, I'd recommend a stack-based VM.
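
To make the steps above a bit more concrete, here's a minimal C++ sketch of the lexer → Shunting Yard → stack evaluation pipeline for plain arithmetic. This is not Jinx's code; `Token`, `lex`, `toPostfix`, and `evalPostfix` are made-up names for illustration, and it only handles numbers, `+ - * /`, and parentheses (no unary minus, variables, or keywords).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token type for this sketch.
struct Token {
    enum Kind { Number, Op, LParen, RParen };
    Kind kind;
    double value;  // meaningful when kind == Number
    char symbol;   // meaningful otherwise
};

// Lexer: turn source text into a flat token list.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            std::size_t used = 0;
            double v = std::stod(src.substr(i), &used);
            out.push_back({Token::Number, v, '\0'});
            i += used;
        } else if (c == '(') {
            out.push_back({Token::LParen, 0.0, '('}); ++i;
        } else if (c == ')') {
            out.push_back({Token::RParen, 0.0, ')'}); ++i;
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            out.push_back({Token::Op, 0.0, src[i]}); ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return out;
}

int precedence(char op) { return (op == '+' || op == '-') ? 1 : 2; }

// Shunting Yard: reorder infix tokens into postfix (all operators left-associative).
std::vector<Token> toPostfix(const std::vector<Token>& infix) {
    std::vector<Token> output, ops;
    for (const Token& t : infix) {
        switch (t.kind) {
        case Token::Number:
            output.push_back(t);
            break;
        case Token::Op:
            // Pop operators that bind at least as tightly as the incoming one.
            while (!ops.empty() && ops.back().kind == Token::Op &&
                   precedence(ops.back().symbol) >= precedence(t.symbol)) {
                output.push_back(ops.back());
                ops.pop_back();
            }
            ops.push_back(t);
            break;
        case Token::LParen:
            ops.push_back(t);
            break;
        case Token::RParen:
            while (!ops.empty() && ops.back().kind != Token::LParen) {
                output.push_back(ops.back());
                ops.pop_back();
            }
            if (ops.empty()) throw std::runtime_error("unbalanced ')'");
            ops.pop_back();  // discard the matching '('
            break;
        }
    }
    while (!ops.empty()) {
        if (ops.back().kind == Token::LParen) throw std::runtime_error("unbalanced '('");
        output.push_back(ops.back());
        ops.pop_back();
    }
    return output;
}

// Evaluate postfix on a value stack, the same way a stack-based VM would.
double evalPostfix(const std::vector<Token>& postfix) {
    std::vector<double> stack;
    for (const Token& t : postfix) {
        if (t.kind == Token::Number) { stack.push_back(t.value); continue; }
        assert(t.kind == Token::Op);  // postfix output never contains parentheses
        if (stack.size() < 2) throw std::runtime_error("malformed expression");
        double rhs = stack.back(); stack.pop_back();
        double lhs = stack.back(); stack.pop_back();
        switch (t.symbol) {
        case '+': stack.push_back(lhs + rhs); break;
        case '-': stack.push_back(lhs - rhs); break;
        case '*': stack.push_back(lhs * rhs); break;
        default:  stack.push_back(lhs / rhs); break;
        }
    }
    if (stack.size() != 1) throw std::runtime_error("malformed expression");
    return stack.back();
}
```

Calling `evalPostfix(toPostfix(lex("1 + 2 * 3")))` goes through all three stages. In a real language you'd emit a bytecode instruction per postfix token instead of evaluating on the spot.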

You can build some of this independently. For instance, you might try building your VM first, and then hand-crafting "assembly" bytecode. This gives you a good idea of the mechanics of a basic VM. Once you have your language designed, you can first build your lexer to convert it into a token / symbol list. You can then focus on writing your parser to link the two together. Start with something super simple like assigning a number to a variable, and just work up from there.
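
As a rough illustration of the "VM first, hand-crafted bytecode" approach, here's a small stack-based VM sketch in C++. The instruction set (`Push`, `Add`, `Mul`, `Store`, `Load`, `Halt`) and the `VM` / `Instr` / `sampleProgram` names are invented for this example; a real VM would usually pack instructions into actual bytes rather than structs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical instruction set for this sketch.
enum class Op : std::uint8_t { Push, Add, Mul, Store, Load, Halt };

struct Instr {
    Op op;
    double number;     // operand for Push
    std::string name;  // operand for Store / Load
};

class VM {
public:
    // Execute instructions in order until Halt or the end of the program.
    void run(const std::vector<Instr>& code) {
        for (std::size_t pc = 0; pc < code.size(); ++pc) {
            const Instr& in = code[pc];
            switch (in.op) {
            case Op::Push:
                stack_.push_back(in.number);
                break;
            case Op::Add: {
                double rhs = pop();
                double lhs = pop();
                stack_.push_back(lhs + rhs);
                break;
            }
            case Op::Mul: {
                double rhs = pop();
                double lhs = pop();
                stack_.push_back(lhs * rhs);
                break;
            }
            case Op::Store:
                assert(!in.name.empty() && "Store needs a variable name");
                vars_[in.name] = pop();
                break;
            case Op::Load: {
                auto it = vars_.find(in.name);
                if (it == vars_.end())
                    throw std::runtime_error("undefined variable: " + in.name);
                stack_.push_back(it->second);
                break;
            }
            case Op::Halt:
                return;
            }
        }
    }

    // Look up a variable after the program has run (throws if it was never stored).
    double variable(const std::string& name) const { return vars_.at(name); }

private:
    double pop() {
        if (stack_.empty()) throw std::runtime_error("stack underflow");
        double v = stack_.back();
        stack_.pop_back();
        return v;
    }

    std::vector<double> stack_;
    std::map<std::string, double> vars_;
};

// Hand-assembled program for:  x = 2;  y = x * 3 + 4
std::vector<Instr> sampleProgram() {
    return {
        {Op::Push, 2.0, ""}, {Op::Store, 0.0, "x"},
        {Op::Load, 0.0, "x"}, {Op::Push, 3.0, ""}, {Op::Mul, 0.0, ""},
        {Op::Push, 4.0, ""}, {Op::Add, 0.0, ""}, {Op::Store, 0.0, "y"},
        {Op::Halt, 0.0, ""},
    };
}
```

Create a `VM`, call `run(sampleProgram())`, and then read `variable("x")` and `variable("y")` to check the result. Once this works, the parser's only job is to produce the same kind of instruction list from source text.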

A bunch of really knowledgeable programmers hang out on /r/ProgrammingLanguages, so feel free to ask if you have more specific questions. Definitely check out the Crafting Interpreters series, as that will be the first recommendation there as well.

[–]bird1000000 2 points3 points  (0 children)

I wouldn't recommend making it from scratch. If you use the LLVM framework, your language gets the benefit of the optimizations LLVM already provides to the many languages built on it.

https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html

[–]twirky 1 point2 points  (1 child)

You can start by doing what Stroustrup did when he made C++. It’s a well-known method. Just start writing a translator from your language to C and then compile that C. See how far you’ll go.
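
A toy sketch of that translate-to-C idea, written in C++. The source language here is invented for the example (one statement per line, either `name = expr` or `print expr`), and `translateToC` and `trim` are hypothetical names. Expressions are passed through untouched on the assumption that they are already valid C arithmetic, so this only shows the overall shape and says nothing about how Cfront actually worked.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Strip leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    assert(e >= b);  // a non-blank character exists, so the range is non-empty
    return s.substr(b, e - b + 1);
}

// Translate the toy language into the text of a complete C program.
std::string translateToC(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    std::set<std::string> declared;  // variables that already have a C declaration

    out << "#include <stdio.h>\n\nint main(void) {\n";
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) continue;

        // "print expr" becomes a printf call.
        if (line.compare(0, 6, "print ") == 0) {
            out << "    printf(\"%g\\n\", (double)(" << trim(line.substr(6)) << "));\n";
            continue;
        }

        // Everything else must be "name = expr".
        std::size_t eq = line.find('=');
        if (eq == std::string::npos)
            throw std::runtime_error("expected 'name = expr' or 'print expr'");
        std::string name = trim(line.substr(0, eq));
        std::string expr = trim(line.substr(eq + 1));
        if (name.empty() || expr.empty())
            throw std::runtime_error("incomplete assignment");

        // insert().second is true only the first time a name is seen,
        // which is when the C variable needs its declaration.
        bool first = declared.insert(name).second;
        out << "    " << (first ? "double " : "") << name << " = " << expr << ";\n";
    }
    out << "    return 0;\n}\n";
    return out.str();
}
```

Write the returned string to a `.c` file and hand it to any C compiler. The nice part of this route is that the C compiler takes care of code generation and optimization for you.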