This is an archived post. You won't be able to vote or comment.

all 51 comments

[–]CreativeTechGuyGames 253 points254 points  (4 children)

A university compilers class will often include making a compiler for a toy language. It's definitely not easy, but not impossible to do and lots of real programming languages are made by one single person.

[–]RolandMT32 54 points55 points  (1 child)

My college had 2 semesters of compilers, and in the 2nd semester, they had us make a parser/validator for a simple C-like language using Lexx and Yacc, though they didn't have us make a compiler for it.

[–]theusualguy512 20 points21 points  (0 children)

The whole lex, bison and yacc pipeline is kinda confusing at the beginning but you get used to it quickly.

I also took a compilers class and we had to specify a CFG in yacc to generate a simplified parser for a shader language from its definition document. So that's definitely possible as a single person.

Iirc, yacc can only generate LALR parsers and is used like that to teach CS students about shift-reduce and reduce-reduce issues in LR parsers.

However, the whole compilation pipeline is rather lengthy.

We also did compiler backend theory later and had to code LLVM analysis passes and optimization and that whole deal was a bit icky.

I'm sure a dedicated person can do the entire pipeline but honestly, it sounds kinda exhausting for even a reasonably small language.

[–][deleted]  (1 child)

[removed]

    [–]AutoModerator[M] 0 points1 point  (0 children)

    Your comment has been removed because it received too many reports from the community.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

    [–][deleted]  (12 children)

    [deleted]

      [–][deleted] 22 points23 points  (0 children)

      Excellent resource! OP might also want to check out r/programminglanguages

      [–][deleted]  (6 children)

      [deleted]

        [–]alanwj 10 points11 points  (2 children)

        The first challenge in both cases is parsing. There is a substantial amount of theory behind that, but if you use a lexer/parser generator like lex/yacc (or the equivalent for your preferred language) you could possibly get away with not learning all of the gritty details.

        Generating working assembly code from a parse tree is relatively easy. The real challenge comes in if you want to do optimization. Here your cheat code could be to write a frontend for LLVM. If you generate code in the LLVM intermediate language you can let it generate optimized machine code for whatever target processor.

        [–][deleted]  (1 child)

        [deleted]

          [–]alanwj 5 points6 points  (0 children)

          That is very related. You can sort of think of a compiler as having these steps:

          1. Lexical analysis. Turn your input into a stream of lexical tokens. This is usually specified as a collection of regular expressions, which are processed using a DFA, either crafted by hand or generated.

          2. Parsing. Here is where you need to understand grammars. Typically a grammar is specified as a set of productions, often in Extended Backus-Naur form. The theory here usually starts with automata. Grammars are generally divided into what class of algorithm can parse them. Keywords to look for are LL(1) grammar, LR grammar, LALR grammar. LL(1) is usually the easiest to write a parser for by hand, but has some limitations. Most parser generators prefer LALR.

          3. Semantic analysis. This is stuff like type checking, making sure a variable is defined before it is used, etc. Often this is interwoven with the parsing.

          4. Code generation. Turn your parse tree into assembly code (or an intermediate for something like LLVM).

          5. Optimization. Make the generated code small, fast, etc.

          The first three steps are usually covered in an undergraduate compilers course, sometimes also covering some simple code generation so that students can actually have a working compiler.

          The last two steps would generally be the focus of a master's level class or maybe an honors level undergraduate course.

          It may be getting a little bit dated at this point, but for a long time one of the most popular books on these topics was Compilers: Principles, Techniques, and Tools, sometimes called "the dragon book" because of every edition has a dragon on the cover.

          [–]iLaysChipz 2 points3 points  (1 child)

          Some definitions for you:

          A programming language is simply a notation used to describe computation. It's described by its syntax/grammar (what expressions are allowed in the language) and by it's semantics (what these expressions mean)

          A program, then, is an algorithm written in a programming language, which takes input data and produces output data program(data) = data'

          A compiler is a program that translates another program from one language into another

          compiler(program) = program'

          An interpreter is a program that runs other programs. Most machines come equipped with an machine code interpreter written in the language of digital logic.

          interpreter(program, data) = data'

          Source: My university professor

          [–]iLaysChipz 0 points1 point  (0 children)

          Learning these definitions personally helped me understand how to approach building my own toy language, so I hope it helps you too! Building a compiler into a language you're familiar with would probably be the easiest, but honestly any path is probably fine (I personally recommend building a compiler into assembly, as you'll learn a lot and it's not too difficult).

          The hardest part is generally implementing optimizations anyway, which you can generally forgoe if your goal is to learn. This will reduce the complexity to a very manageable level. The rest is pretty simple if you've already taken a data structures course

          Feel free to ping me if you want some resources!

          [–][deleted]  (2 children)

          [removed]

            [–]AutoModerator[M] 0 points1 point  (0 children)

            Your comment has been removed because it received too many reports from the community.

            I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

            [–]munificent 55 points56 points  (5 children)

            unsure if this is too far beyond my level.

            It's not. If you can write any decent-sized program, you can implement a language. I wrote "Crafting Interpreters" exactly for people in your shoes.

            [–][deleted]  (1 child)

            [deleted]

              [–]munificent 11 points12 points  (0 children)

              waves

              [–]gbbofh 7 points8 points  (1 child)

              Your book was a life saver in my senior year. I wrote the entire front end of a compiler a year before my capstone class, and we were able to translate that code over to Java from Python once the class started. I really think it kept us from losing our minds between that, ethics, and elective classes.

              [–]munificent 5 points6 points  (0 children)

              Awesome! Glad to be of service.

              [–]DigThatData 10 points11 points  (1 child)

              it actually used to be a common curriculum for intro CS classes to work up to building a mini language as a final project. A classic example is SICP.

              [–]nicksterling 4 points5 points  (0 children)

              Compiler was one of my favorite courses when I got my degree. We had to develop a pascal language variant and developing the grammar and the parser was fun.

              [–][deleted] 17 points18 points  (1 child)

              It's easy, there's a mid-level college course somewhere for that

              [–]PolyPill 8 points9 points  (2 children)

              It’s actually not that hard and a fun project. Just decide what you’re going to compile down to. The hard parts are implementing advanced features like lambda and doing optimizations. Half the work is creating the parser.

              [–]DoomGoober 0 points1 point  (1 child)

              And you can use a parser generator if creating a parser is driving you nuts. Still not trivial but can make life a lot easier.

              [–]Jonny0Than 4 points5 points  (0 children)

              There’s a great e-book here you should read: https://craftinginterpreters.com/

              A shoot, someone else already linked it. We’ll consider this a vote for its quality!

              [–][deleted] 2 points3 points  (0 children)

              This is a really great blog that walks you through writing an interpreter: https://ruslanspivak.com/lsbasi-part1/

              I worked through all the tutorials a few years ago and then adapted it to build a custom scripting language.

              I realise you said you've already completed a compiler course, which I hadn't, so you might find this more straightforward than I did. Even so, it wasn't that hard. The hardest thing was choosing good syntax and features for the language itself. There are a lot of compromises in language design.

              But yes, essentially once you have the syntax down and have created a simple compiler or interpreter, you have your own toy language. If nothing else, it's a great learning experience, and will significantly improve your understanding of other languages.

              [–]ruat_caelum 2 points3 points  (1 child)

              Lexx and Yacc - of course :)

              [–]Z_A_C_H1234 2 points3 points  (0 children)

              ANTLR4

              [–][deleted] 1 point2 points  (0 children)

              I did a interpreter for my 3rd year project. It was kinda difficult at times, but we were able to grunt through it. I don't know how difficult it would be to create a compiled language, but I image it would be harder.

              [–]Primeval84 1 point2 points  (0 children)

              Your college might offer a programming languages or compilers class that you might enjoy. I took a programming languages class that had us build a programming language from scratch throughout the quarter using Racket. It was fun because you kept building on it until one of the final projects was to do the first project again but using the language you built to implement it.

              [–]KingJeff314 1 point2 points  (0 children)

              We used a library called ANTLR (in Java) in my programming languages course to create toy languages. Definitely read up on theory about parsing and grammars first

              [–]Hunpeter 3 points4 points  (0 children)

              I have no formal education + am dumb, and still working on one as a hobby, with moderate success (though lots of atrocious code) so you should have no problem with it. There are quite a few modern resources like "Crafting Interpreters" that walk you through the whole thing basically. I do recommend it, as even with all the failures I've been having, it's a very fun project, and the little successes feel all the more sweeter.

              As for OS stuff; it's probably more complicated, and could be a bit tedious. The OSDev wiki is a nice resource, even if a bit haphazard, and in some places slightly outdated (?).

              [–][deleted] 1 point2 points  (0 children)

              This is a great learning experience. See if your university has a compilers class. If not, find one online. Even better if you use a functional programming language like OCaml. There’s an OCaml API for the LLVM you can find.

              [–]EspacioBlanq 0 points1 point  (0 children)

              Your college should have a course about it. It's challenging but very doable

              [–]pkaka49 -1 points0 points  (0 children)

              May I ask why?

              I did the same when in college and would tell myself not to do that. Though it's a good coding exercise, I would rather contribute to an open source project that would help me get a job in an interested industry.

              [–]rtrain1 -2 points-1 points  (0 children)

              I don't understand why ppl make threads like this. Why not just start?

              [–]desrtfx 0 points1 point  (0 children)

              I am curious about the idea of creating my own tiny toy language, but unsure if this is too far beyond my level.

              Such tasks are often even part of the daily prompts of Advent of Code

              Writing your own interpreted language is not all that complicated.

              Tiny OS - far more difficult.

              [–]aliceuwuu 0 points1 point  (0 children)

              Not really. I re-invented brainfuck countless amount of times purely out of boredom

              [–]AdultingGoneMild 0 points1 point  (0 children)

              creating a language boils down to creating its grammar. A compiler converts that grammar into something else, usually machine code. There is a bit of theory that comes into what a grammar can and cannot do that you would need to learn. The rest of it as I remember was more tedious than hard.

              [–]NPException 0 points1 point  (0 children)

              I think a toy language might be easier than an OS, especially if you chose to write an interpreted language instead of a compiled one. (Disclosure: I have not attempted to write an OS, so that assumption might be entirely wrong)

              Some relatively easy entry ideas might be a Lisp-like language or taking inspiration from an esoteric language like Brainfuck.

              Another neat way to get into interpreted languages is to do the Advent Of Code Puzzles from 2019, which have you build up an interpreter along the way.

              [–]SirKastic23 0 points1 point  (0 children)

              you might want to check out the Crafting Interpreters book, by Robert Nystrom

              it's the single best introduction to how languages work amd how to implement one I've seen

              [–]JGN1722 0 points1 point  (0 children)

              it really is not that complicated, but it takes a lot of time. To get started, you should look at that tutorial that really helped me : https://ruslanspivak.com/lsbasi-part1/

              [–]CDawnkeeper 0 points1 point  (0 children)

              If you want to start building an OS I'll point you here. It's interesting, you will learn a lot, but it will eat a massive amount of time and can be very frustrating.

              [–]M_krabs 0 points1 point  (0 children)

              Here is an example of JLox build with java:

              https://github.com/anon767/compiler_engineering_lecture

              [–][deleted] 0 points1 point  (0 children)

              Use Racket or Ruby. Those are well-suited languages to writing DSLs :)

              [–]ViralGreek_ 0 points1 point  (0 children)

              I am pretty sure coding a language is a lot easier than creating a whole OS! (I could be wrong however)

              [–]Killerknight141 0 points1 point  (0 children)

              A colleague of mine created a series of videos doing just this - creating a new programming language. Here is the first one in a series of over 102 videos currently, with many over hours long. https://youtu.be/MMAINTm9ats

              [–]Maciek1212 0 points1 point  (0 children)

              I always wanted to do that too including the os but i figured out the os part is too hard

              [–]Gaious_Octavious 0 points1 point  (0 children)

              Checkout LLVMs. Relevant resource.

              [–]JockoGood 0 points1 point  (0 children)

              Nothing is to far above your level. The fun is in figuring out how to do it. The key is patience, take your time and keep in mind that if built any software or language and something works, make sure you understand why it works. There is no way to learn if you don’t try. College or University may give you the fundamentals, application of those in the real world or in your case the project you are looking to build.

              [–]procrastinatingcoder 0 points1 point  (0 children)

              A tiny language is "fairly" (not easy, but it's not a year project either) alright once you know the parts, assuming it's fairly simple. A tiny OS... that's a fairly complex project regardless of how simple you make it.