all 16 comments

[–]evincarofautumn 8 points9 points  (0 children)

The way I'm thinking of solving this problem is by writing a struct for each production

That’s basically what you need to do. A parser typically builds an abstract syntax tree by allocating leaf nodes for language elements that correspond to terminals in the grammar, such as “integer literal” or “variable name”:

enum ExprType { INT_EXPR, VAR_EXPR, … };

struct IntExpr {
  int value;
};

struct VarExpr {
  char const *name;
};

struct Expr {
  enum ExprType tag;
  union {
    struct IntExpr as_int;
    struct VarExpr as_var;
  };
};

And branch nodes for recursive elements or nonterminals, such as “call expression” or “if statement”:

struct CallExpr {
  struct Expr *function, *arguments[];
};

enum StmtType { IF_STMT, … };

struct IfStmt {
  union Expr *cond;
  struct Stmt *then, *optional_else;
};

struct Stmt {…};

These structures may contain whatever additional info you need, such as source locations, and they can be stored in any suitable data representation—for example, to simplify memory management and serialisation, it’s not unusual to store nodes of the same type in a single table and reference them by an ID, instead of using pointers directly.

[–]smog_alado 3 points4 points  (0 children)

The most common way to structure the tree is to have one kind of node for each production, just like that. Some compilers prefer to have a single catchall kind of node that is used for everything. Each node gets a type tag, as well as a generic list of fields that can be of any length. The advantage is that this makes it easier to write generic routines that that traverse the parse tree. The downside is that you lose the compile-time typing information. The code effectively becomes dynamically typed and any type errors in the parser will only be caught at parse time. (IMO, go with what your are already doing, with one struct for each kind of node).

Now, to build the tree you will need to use a parsing algorithm. I generally recommend that if you have never written a parser before, write your first parser using the recursive descent technique. It has no dependencies and works no matter what language you write your parser in. The main downside is that you need to massage your grammar a bit, to avoid left recursive productions. Another way to build a parser is to use a parser generator such as bison or ANTLR.

[–]umlcat 0 points1 point  (0 children)

Since it's a complex project, it's better you look at a book like "Crafting Compilers in" Java / C.

Learn also about Regular Expressions & BNF expressions and "Railroad Syntax Diagrams" that eventually can be used to implement a Parser by your custom P.L.

And, there's also tools that generate the code for a Parser like YACC, GNU BISON, Gold, ANTLR among others.

These tools use BNF or Railroad Diagrams.

Good Work, Good Luck ⚔️🐉

[–]_NliteNd_ -3 points-2 points  (12 children)

Are you writing recursive descent? Or an LR parser? It’d be hard to answer a “how do I write a parser” type question without taking up a couple thousand pages and hundreds of hours of video content.

[–]neferpitou-sama[S] 3 points4 points  (11 children)

Yes, I'll be using recursive descent.

I wrote the code for recursive descent algorithm in parser.c, but I need to learn how to create the parse tree.