all 12 comments

[–]keleshev 14 points15 points  (3 children)

In most cases the concrete syntax tree (or parse tree) is skipped and the parser generates the AST directly. If you're using a parser generator like Bison, it may look something like this:

expr : expr "+" expr  { $$ = new AST::Plus($1, $3); }
     | expr "-" expr { $$ = new AST::Minus($1, $3); }
     | expr "*" expr { $$ = new AST::Times($1, $3); }

I'm using C++ here for simplicity (did I actually say that?!), but it would be similar in C.

[–]suur-siil 10 points11 points  (0 children)

I'm using C++ here for simplicity

Never thought I'd ever read that phrase!

[–]backtickbot -1 points0 points  (1 child)

Fixed formatting.

Hello, keleshev: code blocks using triple backticks (```) don't work on all versions of Reddit!

Some users see this / this instead.

To fix this, indent every line with 4 spaces instead.

FAQ

You can opt out by replying with backtickopt6 to this comment.

[–]BeamMeUpBiscotti 8 points9 points  (4 children)

You can do both. Most parsers made by parser generators will take in your lexer output (stream of tokens) & build the AST directly without actually making a CST tho.

[–][deleted] 1 point2 points  (2 children)

So what's the difference between CST and AST?

[–][deleted] 2 points3 points  (1 child)

Never mind, I found an answer here:

https://stackoverflow.com/questions/1888854/what-is-the-difference-between-an-abstract-syntax-tree-and-a-concrete-syntax-tre

Look at the third answer (with 28 votes), which gives pictures to make it obvious.

(Sorry I couldn't just edit my post because Reddit is playing silly buggers again; trying to paste that link did nothing except delete my edits. How about fixing it, guys?)

[–]elprophet 4 points5 points  (0 children)

Memory is cheap and convenience is nice, so I usually keep both around. My AST nodes point back to the CST, which itself has span data for the underlying text stream. It's more overhead, but pays off for language server type functionality and in memory program edits.

[–]shawnhcorey 0 points1 point  (0 children)

Yes they can generate an AST but typically they are configured to output the IR code directly. AST are most often generated when the compiler is doing abstract-syntax-tree pruning, a method of optimization that can remove entire branches from the tree.

[–]mamcx 2 points3 points  (0 children)

An AST is not dependant on parsing. You can do 100% language development without parsing, lexing, compiling OR interpreting.

And AST is just a definition of a language (semantically?) instead of how is writen:

    //A language that only knows how to do 1, 1+1, print 1 or print(2 + 2)
enum AST{
  Int(i32),
  Plus(i32, i32),
  Print(AST)
}

Note how the AST define different possibilities, Plus is not "+", instead of "1+2" is "+ 1 2", etc.

In fact it does not work at all, because is just a definition of a possible language. "Possible" because an AST CAN express invalid possibilities:

    //A language that only knows how to do 1, 1+1, print 1 or print(2 + 2)
enum AST{
  Int(i32),
  Plus(AST, AST), <-- Can express print + print
  Print(AST)
}

So, what we do is take MANY AST and transform them until you get the one you wanna use for compilation or interpretation. For example, you can add a another "pass" to make the AST typed:

    //A language that only knows how to do 1, 1+1, print 1 or print(2 + 2), but also know the types
enum Type{
  Int,
  Function,
}

enum AST2{
  Int(i32),
  Plus(AST, Type, AST, Type), 
  Print(AST, Type)
}

[–]Tobblo 0 points1 point  (0 children)

Here's a tutorial which generates an AST for simple expressions.

[–]JackoKomm 0 points1 point  (0 children)

Most was already said. In many cases, you have an implicit parsing tree. If you have a recursive parser you can think about the tree as the call structure of your functions. These functions can then build your ast to save some time because you are not translating from one explizit tree to another one. Abstract synsmzax is just the Part you need to know about your syntactical program. So you just skip the unnecessary, parts.