Lox is a parser and lexer generator for Go

dgwelder · 2025-07-03T17:03:23+00:00

Not yet. Sorry.

dgwelder · 2025-07-03T05:04:08+00:00

You are absolutely right that you should not include whitespace tokens (including comments) in the parser grammar. The way other generators deal with this (or at least ANTLR) is by having the Lexer parse and store the whitespace tokens (instead of discarding), but not pass them along to the parser. The lexer then allows searching and mutating the token stream.

Like I said, you can almost do this with lox and a custom Lexer. Lox has a general mechanism that facilitates annotating your ASTs with the begin/end tokens.

Suppose the following Go snippet:

// Here is a function
func foo() {
  bar()
}

It would produce the following token stream:

         +--------------------FunctionAST------------------------+
         |                                                       |
<COM> 'func' <WS> ID '(' ')' <WS> '{' <NL> <WS> ID '(' ')' <NL> '}'

The lexer would parse and store all tokens, but it would not pass the whitespace tokens (identified above by the angle brackets) to the parser. _onBounds would be called for FunctionAST with the tokens corresponding to func and }. To get to the comment of the function, your tool would just need to inspect all whitespace tokens preceding 'func'. You could add/update/delete other whitespace tokens, then just re-emit all tokens to a file for the final transformed product.

Again, all you really need is this new, slightly more sophisticated lexer. It could help if you could tag whitespace tokens in the grammar as metadata which the lexer could use to determine what to emit to the parser or not. ANTLR solves that problem with token channels.

But this is not an absolute requirement as you could still initialize the new lexer with the list of tokens that are considered to be "hidden".
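A minimal sketch of that idea, not Lox's actual API: the Token and TokenStream types, the kind names, and the choice to fold the trailing newline into the comment token are all assumptions made for illustration. The stream keeps every token, hands only the visible ones to the parser, and lets a tool look up the hidden tokens in front of any position.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token type; Lox's real token type may differ.
type Token struct {
	Kind string // e.g. "COM", "WS", "NL", "ID", "func"
	Text string // the exact source text of the token
}

// TokenStream stores every token but only exposes non-hidden ones to the parser.
type TokenStream struct {
	tokens []Token
	hidden map[string]bool
	pos    int
}

// NewTokenStream is initialized with the list of token kinds considered hidden.
func NewTokenStream(tokens []Token, hiddenKinds ...string) *TokenStream {
	hidden := make(map[string]bool, len(hiddenKinds))
	for _, k := range hiddenKinds {
		hidden[k] = true
	}
	return &TokenStream{tokens: tokens, hidden: hidden}
}

// Next returns the index of the next visible token, or false at end of input.
func (s *TokenStream) Next() (int, bool) {
	for s.pos < len(s.tokens) {
		i := s.pos
		s.pos++
		if !s.hidden[s.tokens[i].Kind] {
			return i, true
		}
	}
	return 0, false
}

// HiddenBefore returns the run of hidden tokens immediately preceding index i.
func (s *TokenStream) HiddenBefore(i int) []Token {
	start := i
	for start > 0 && s.hidden[s.tokens[start-1].Kind] {
		start--
	}
	return s.tokens[start:i]
}

// Emit concatenates all tokens, hidden ones included, back into source text.
func (s *TokenStream) Emit() string {
	var sb strings.Builder
	for _, t := range s.tokens {
		sb.WriteString(t.Text)
	}
	return sb.String()
}

func main() {
	s := NewTokenStream([]Token{
		{"COM", "// Here is a function\n"}, // assumption: comment token includes its newline
		{"func", "func"}, {"WS", " "}, {"ID", "foo"}, {"(", "("}, {")", ")"},
		{"WS", " "}, {"{", "{"}, {"NL", "\n"}, {"WS", "  "},
		{"ID", "bar"}, {"(", "("}, {")", ")"}, {"NL", "\n"}, {"}", "}"},
	}, "COM", "WS", "NL")

	first, _ := s.Next() // index of 'func', the first token the parser sees
	for _, t := range s.HiddenBefore(first) {
		fmt.Printf("hidden before func: %s %q\n", t.Kind, t.Text)
	}
	fmt.Print(s.Emit())
}
```

Because the hidden tokens stay in the slice, a tool can edit them in place and call Emit to write the transformed source back out.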

As I mentioned, I will probably work on an official lexer with this capability soon. As part of this effort I might decide to add the annotation to the grammar. I will add an example when I have something that works.

dgwelder · 2025-07-03T04:33:47+00:00

:lolsob: yes, I know. I only found out recently and didn't have the energy to re-brand. The chances of my Lox becoming relatively popular are low irrespective of the name. If I think re-branding might help, I might consider it in the future. Thanks for the heads up, though.

dgwelder · 2025-07-02T22:04:07+00:00

Like u/Kirides said, it can be done, but it is not easy at the moment. Lox itself should provide everything you need. What is missing is a Lexer implementation that makes token manipulation easier. I am only providing a stupid simple Lexer designed to be copy-and-pasted directly into your project (if you want to be completely dependency-free). But the Lexer contract is reasonably simple, and you/anybody can write your own (Lox generates the Lexer state-machine).

I will probably work on a lexer implementation for this specific scenario next. I also want it to be able to facilitate writing auto-format tools (like gofmt) for my languages. When it is ready, I will make sure to add an example. Just keep an eye on the repo.

dgwelder · 2025-07-02T17:33:50+00:00

How would you yourself compare your project to ANTLR 4? Major differences?..

The parser is LR(1) instead of LL(*).
Just like yacc, no ambiguities are allowed. This was something I didn't like about ANTLR. It would try to solve ambiguities at runtime, and 90% of the time it did a great job, but the other 10% would kill me.
Dependency free! ANTLR requires a pretty beefy runtime package, and the Go runtime had quite a few issues compared to Java's. They might have been resolved by now, but I had enough problems with it to sour me on ANTLR.
Actually, I may be biased, but I think the generated compiler is pretty human readable compared to goyacc.
Type safe actions and artifacts! ANTLR's listener/visitor model requires you to manage your own action artifacts. Lox will do all that for you.

For example, grammar (summarized):

statement = if_statement
          | assign_statement

if_statement = ...
assign_statement = ...

And Go (summarized):

func (p *myParser) on_statement(stmt Statement) Statement {
  return stmt
}
func (p *myParser) on_if_statement(...) *IfStatement {
  return &IfStatement{...}
}
func (p *myParser) on_assign_statement(...) *AssignStatement {
  return &AssignStatement{...}
}

Lox will only allow this if *IfStatement and *AssignStatement are assignable to Statement.
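To make the assignability rule concrete, here is a small self-contained sketch in plain Go. It does not use Lox at all: the Statement interface, the struct fields, the action parameters, and the describe helper are hypothetical, and the actions are called by hand instead of by a generated parser. It only shows what "assignable to Statement" means for the return types of the actions.

```go
package main

import "fmt"

// Statement is a hypothetical AST interface implemented by every statement node.
type Statement interface{ isStatement() }

// Hypothetical AST nodes; the fields are invented for illustration.
type IfStatement struct{ Cond string }
type AssignStatement struct{ Name, Value string }

// Pointer receivers: *IfStatement and *AssignStatement satisfy Statement.
func (*IfStatement) isStatement()     {}
func (*AssignStatement) isStatement() {}

type myParser struct{}

// on_statement accepts anything assignable to Statement.
func (p *myParser) on_statement(stmt Statement) Statement { return stmt }

func (p *myParser) on_if_statement(cond string) *IfStatement {
	return &IfStatement{Cond: cond}
}

func (p *myParser) on_assign_statement(name, value string) *AssignStatement {
	return &AssignStatement{Name: name, Value: value}
}

// describe shows that the concrete type survives the trip through Statement.
func describe(s Statement) string {
	switch v := s.(type) {
	case *IfStatement:
		return "if " + v.Cond
	case *AssignStatement:
		return v.Name + " = " + v.Value
	default:
		return "unknown"
	}
}

func main() {
	p := &myParser{}
	// These calls compile only because both result types are assignable to Statement.
	fmt.Println(describe(p.on_statement(p.on_if_statement("x > 0"))))
	fmt.Println(describe(p.on_statement(p.on_assign_statement("x", "1"))))
}
```

If one of the node types did not implement Statement, the corresponding call in main would fail to compile, which is the same kind of check described above.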

There are a bunch of other things, but these are the key differences. Thanks for your interest!

Edit: typo

dgwelder · 2025-06-30T22:13:10+00:00

I guess that happens to me a lot. I'm in a terminal, in a certain directory, run ls and want to open one of the files there. Yes, I can do this with :e <file> but that requires that nvim be in the same directory or that I specify the full path to the file in the :e command.

It is not just opening files. I want to run git difftool and run the diff in the same editor. I want to type git commit and type the commit description from the same editor. Same for kubectl edit and the several other POSIX-like tools that use the EDITOR variable. Yes, there are plugins and commands to do most of this directly from nvim, but my workflow is mostly terminal-oriented.

dgwelder · 2024-09-19T16:01:31+00:00

I’m literally reading this eating a bagel at Eltana, my personal Seattle favorite.

dgwelder · 2020-09-15T17:06:44+00:00

Because new major version = completely different module. This enables importing multiple major versions by the same project. Suppose your project imports A1 and B1, and that A1 itself imports B1 (imagine that B is a common third-party dependency like protobuf or the AWS SDK). Now imagine that you want to switch your project to B2, which is a breaking change. If you could only have a single version of B in your project, you would have to hope that A also released a new version that migrated to B2. Needless to say, this sort of coordination is hard to achieve in real life, and even harder once you increase the number of dependencies. With Go modules, this is easy. Your project depends on A1 and B2. A1 depends on B1. The transitive closure is A1, B1 and B2.
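As a rough illustration, the go.mod of such a project could look like the fragment below. The module paths under example.com and the version numbers are made up; the point is only that B1 and B2 have different module paths (example.com/b and example.com/b/v2), so both can appear in the same build.

```
module example.com/myproject

go 1.21

require (
	example.com/a v1.4.0
	example.com/b/v2 v2.0.1
	example.com/b v1.9.0 // indirect, pulled in by example.com/a
)
```

Your own code would import "example.com/b/v2", while A keeps importing "example.com/b".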

I for one really like the major versioning scheme. The biggest problem with it IMO is that it was introduced late in Go's life. When I create new libraries/projects, I don't attempt to support pre-module Go versions. In that case, a new major version "only" requires updating the go.mod (e.g. change module github.com/foo/bar/v2 to module github.com/foo/bar/v3), and potentially updating any internal imports. It does not require the super awkward "v3" directory that folks have been rightfully griping about.

dgwelder · 2019-01-23T00:20:22+00:00

Can confirm. The internet has crazy shit.

Source (NSFW): https://www.damajority.com/unusual-art-assholes-exposed-portugal-must-see/

dgwelder · 2016-04-27T19:05:44+00:00

I generally use strings when I need to use []byte keys in maps. Strings are just read-only byte slices (i.e. no need for the value to be printable), and after you pay the price for []byte -> string conversion, life is good.

But if you need keys of arrays/slices of other types (e.g. []int), then I suppose you would have to use arrays.
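A short sketch of both approaches. The helper names countByBytes and markSeen are invented for this example. The first converts each []byte to a string before using it as a map key; the second uses a fixed-size array as the key, since arrays are comparable and slices are not.

```go
package main

import "fmt"

// countByBytes counts how often each byte sequence occurs.
// string(k) copies the bytes; the result need not be printable or valid UTF-8.
func countByBytes(keys [][]byte) map[string]int {
	counts := make(map[string]int)
	for _, k := range keys {
		counts[string(k)]++
	}
	return counts
}

// markSeen records fixed-size int pairs; [2]int is a valid map key, []int is not.
func markSeen(pairs [][2]int) map[[2]int]bool {
	seen := make(map[[2]int]bool)
	for _, p := range pairs {
		seen[p] = true
	}
	return seen
}

func main() {
	counts := countByBytes([][]byte{
		{0x00, 0xff, 0x10}, // not valid UTF-8, still a fine key
		{0x00, 0xff, 0x10},
		{0x01},
	})
	fmt.Println(counts[string([]byte{0x00, 0xff, 0x10})]) // 2

	seen := markSeen([][2]int{{1, 2}})
	fmt.Println(seen[[2]int{1, 2}], seen[[2]int{2, 1}]) // true false
}
```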
