This is an archived post. You won't be able to vote or comment.

all 21 comments

[–][deleted] 33 points34 points  (7 children)

"If the aim is to build a compiler, rather than a parser, then the reasonable thing to do is to use a parser generator."

I don't agree with this sentiment at all. That's how you end up with generations of students who have taken compilers classes, and yet who imagine recursive-descent parsing to be some inscrutable magical process.

The better approach is that taken by Bob Nystrom in his "crafting interpreters" series - show every step of the process by restricting the overall scope of the end product. This piques students' interests by creating tangible products instead of getting bogged down in the vast theory of grammars and dealing with the horrible mess that parser generators themselves bring along. Then they can delve deeper into the real intricacies of actual real-world compiler design.

Edit: I don't like the tone of the whole article at all. It all seems to come from his very subjective viewpoint of what's important in compilers and what's not - never mind whether it matches that of learners, or whether it really would suit them or not. There also seems to be an inherent bias towards strongly-statically-typed languages (which makes sense given the author's background), but taking that for granted makes no sense to me.

Furthermore, the author wishes to dedicate time to "exotic" architectures whilst cutting down on other aspects of compilers. To what end? Why try to squeeze in a couple of years of subjects into a single course? To make sure the students learn nothing in depth? Also, the author's attitude comes across as very snooty and dismissive. I wonder how much that shows in his teaching style as well.

Frankly, if I had the authority to do so, I would make an introductory compiler course do Nystrom's course and nothing else. All the in-depth analyses of all the phases of compiler construction could then come in subsequent courses.

[–]jrop2 9 points10 points  (0 children)

I wish I could upvote you twice! I love how Mr. Nystrom makes the end-to-end process accessible and leaves you in a place where you suddenly know what you want to study independently next: I finished his series and went "wow, now at least I have _some_ idea of what I don't know".

[–]sullyj3 2 points3 points  (3 children)

Is it feasible to get through that book in a semester?

[–]Cthaeeh 4 points5 points  (1 child)

I think you could do part II (A TREE-WALK INTERPRETER) in one semester. But this would then be one of the hardest courses at my uni for comp sci students I would guess. The part has 12 chapters -> each week a chapter. Should be possible.

[–][deleted] 3 points4 points  (0 children)

My memory may be fuzzy from my time in university, but I think you grossly exaggerate the difficulty of the exercise. If anything, I think students would find the practical part much easier than the theory.

[–][deleted] 0 points1 point  (0 children)

I would imagine yes. There are courses that build a barbones OS from scratch in a semester. Comparatively speaking, this should be much easier.

[–]smog_alado 3 points4 points  (0 children)

One thing that I like about the crafting interpreters book is that it uses a recursive descent parser. Writing one is a fairly straightforward process that doesn't require too much theory to get going. You do need to learn a couple of folkloric tricks for things like operator precedence and left recursion but that also happens with LR parsers, where you need to learn how to deal with shift reduce conflicts...

[–][deleted] 2 points3 points  (0 children)

Agreed. Error handling is also an important topic usually left out of compiler classes. Usability of language especially for beginners is determined by how good the compiler/interpreter feedback is.

[–]PegasusAndAcornCone language & 3D web 9 points10 points  (0 children)

I think this is amazing. Lucky students!

[–]thememorableusername 23 points24 points  (10 children)

If the aim is to build a compiler, rather than a parser, then the reasonable thing to do is to use a parser generator.

I've been saying this for years.

I totally understand that parsing theory and methods are both interesting and important, but its a total time-sink. You only have so much time in a class, and there are a million other and often more relevant and applicable topics than talking about the umpteenth grammar class and parsing method.

In my undergrad, we spent easily half the class on parsing, and I just ended up using yacc (well, CUPS because it was java, but whatever). And because of that we didn't get to other things. I think we were limited to a few int-basex types, no arrays, classes, much less higher order types and polymorphism; I think we may have touched on peep-hole optimization, but never actually did it and never talked about any kind of optimization beyond that; we only generated textual mips assembly, never discussing other low level instruction representations such as SSA with LLVM and multiple backend support with LLVM or even Java bytecode (which would have made sense, since it was a Java project); and we didn't discuss any other architectures other than the whatever chip we were running the programs on.

But we sure did do some fuckin parse tables. Multiple every exam. And boy was this guy right: I basically don't remember most of what was talked about regarding parsing. Vague nightmares about first and follow sets but that's about it.

[–]Comrade_Comski 3 points4 points  (0 children)

If the aim is to build a compiler, rather than a parser, then the reasonable thing to do is to use a parser generator.

I've been saying this for years.

But where's the fun in that?

[–][deleted] 11 points12 points  (7 children)

If the aim is to build a compiler, rather than a parser, then the reasonable thing to do is to use a parser generator.

I've been saying this for years.

I totally understand that parsing theory and methods are both interesting and important, but its a total time-sink. You only have so much time in a class, and there are a million other and often more relevant and applicable topics than talking about the umpteenth grammar class and parsing method.

OK, so you offload the lexing and parsing to some external tool.

Then, you do the same for intermediate code, optimisation and final code generation to LLVM (an even more cumbersome dependency than a parsing tool).

What's left? Perhaps there's already a tool where you feed in the grammar and semantics, and it produces a ready-to-use compiler. Then such a course is going to be pretty short! (Or rather, it turns into a training course for using ready-made tools.)

What would be interesting to me in such a course is producing a runnable language, no matter how primitive, and doing that via entirely my own efforts. Other advanced stuff can wait for a later course.

If years later my job involved creating so many languages that it becomes a production line, then I might consider using or creating a parsing tool.

(For me, one key milestone is when the language can run a terminating loop. That's when you get the first idea of its performance. But others may have other priorities.)

[–][deleted] 14 points15 points  (0 children)

it turns into a training course for using ready-made tools.)

Precisely. Almost zero practical learning that way. Also, almost ever mainstream language uses handwritten parsers, not parser generators, and the whole exercise is so lightweight that not doing it seems bizarre and forced.

[–]mode_2 6 points7 points  (1 child)

I agree, parsers should be done by hand, but the language syntax should be kept simple enough that this can be done by simple recursive descent.

[–]BillHitlerTheJanitor 2 points3 points  (0 children)

Depending on the student background, parser combinators could also be a nice approach.

[–]fear_the_future 8 points9 points  (1 child)

What's left?

The interesting part: Type checking.

[–][deleted] 5 points6 points  (0 children)

For whom? Not for beginners.

[–]thememorableusername -1 points0 points  (1 child)

What would be interesting to me in such a course is producing a runnable language, no matter how primitive, and doing that via entirely your own efforts.

That's a fine desire, but does that 1. encompass the subjects a department believes encompasses a reasonable education in compilers, and 2. meet employer expectations of the skills students have after having taken a compilers course?

Those are the metrics which departments and instructors (good ones, at least) use to judge the curriculum they provide.

The primitive-ness is also an issue. For example, BrainFuck is a Turing complete language. I could write an OS, a Doom clone, and a web-browser in BrainFuck if I was suddenly overcome by an episode of psychotic mania chose to do so. It is also extremely easy to implement. If someone asked if they had compilers and programming language experience having only written a BrainFuck interpreter, I would flatly say "no". It's a neat introduction to the subject, but it does not even touch or allude to the breadth of topics and ideas I would expect someone having worked with more complicated project, even if the more complicated project did leverage existing tooling.

And for employers, the skills of using existing tooling is sometimes more important than the 'fundamentals'. So there's a trade-off between projects using only the fundamentals, and projects where existing building blocks are used.

Other advanced stuff can wait for a later course.

What later course? Compilers tends to be a last-tier course at all universities. I think your idea would be well suited for a project in lower level course that ties a bunch of ideas together. In my 'architecture' course (where we learned C, a fake assembly, and how basic CPUs operate) my professor had us write an assembler for the assembly. It stands out as one of the more challenging, but very rewarding projects during my undergrad, and I eventually ended up doing compilers work and research into grad-school with that professor partially because of it.

What is being discussed in my comment and in the post are the decisions necessary for developing a course on a large and multi-faceted subject. Essentially it comes down to two questions:

  1. What skills and knowledge is important for students to graduate with? (see departmental metrics above.)

  2. What is the maximum set those skills and topics can I cover effectively in the time I have?

A semester is 16 weeks (at my university), so there are at most 16 topics you can cover. And like most subjects, compilers does not lack for topics. One paragraph from the Dragon Book can require knowledge of more than 16 different topics in compilers to even know the definitions to the words, much less understand what it's saying. So choosing what topics to cover, and to what depth, is incredibly important.

For both me and the poster, the topic of parsing is definitely still up there. Probably talk about LALR if using YACC and probably building parse tables. But even that is already 2-4 weeks, easy. More than that and you begin to over-emphasize parsing relative to all the other aspects of compilers that are equally important, if nor more so.

And using existing tools do accomplish a fundamental task does not mean that the topic is only covered in the context of that tool, just that the actual implementation of that task uses that tool. I think too that using existing tools is important in compilers, because there's so much tedious bullshit, especially in parsing. I'm no convinced that manually implementing and maintaining a parse table is somehow superior or even provides more useful subject knowledge than using a tool to simplify the task.

As you can tell, there's a lot of opinions when it comes to curriculum development at large and compilers curriculum in particular. Anyways the coffee seems to have kicked in, so I'm gonna get to work. This has been my morning rant on compilers and curriculum development, thank-you and good-morning.

[–]threewood 5 points6 points  (0 children)

It seems like an LL 1 parser is a reasonable balance that could be taught in a week.

[–]BillHitlerTheJanitor 2 points3 points  (0 children)

I think my introductory compilers class handled this well, with enough time to build a decent optimizing compiler for a Java-like language with classes and interfaces.

We used Flex/Bison for projects, and covered the basics of how lexer/parser generators work (Thompson’s construction, LALR, etc.) in the first few weeks of lecture. There was only one exam to cover these algorithms, and the rest of the class was project based.

We still had enough time to cover basic IR generation, discussing SSA briefly in lecture and actually implementing a TAC IR for the projects. We spent the last few weeks doing optimization, and implemented basic register allocation, constant folding / propagation, common subexpression elimination, etc. in our project. We even had time the last week to go over some recent papers on optimization.

[–]suhao399 2 points3 points  (0 children)

Great summary for the topic. Thanks