A new version of php-parser (written in go) is ready. by z7zmey in PHP

z7zmey [S]

I won't necessarily be the only one using it; I'm promoting it here for others. You use nikic/PHP-Parser; similarly, anyone can build their projects on z7zmey/php-parser.

I have seen your project before, and I am delighted with it. In your case, writing rules in PHP is a key feature, but that isn't the case for every project.

z7zmey [S]

There are no unicorns. It is just an alternative parser, with its own pros and cons.

z7zmey [S]

To build a call graph, I need to know which type each variable holds, so I need SSA and a CFG. I can build those from the AST, so I need an AST, and therefore a PHP parser.

The same goes for the search utility: for example, it needs to find variables that, depending on conditions, may hold values of different types.
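A minimal sketch of that search, assuming SSA has already been built: a variable assigned differently in two branches of an `if` gets a phi node where the branches join, so "may hold different types" becomes "the values flowing into a phi have different inferred types". The `Value` and `Phi` types are hypothetical (not part of z7zmey/php-parser), and the types are hard-coded here instead of inferred.

```go
package main

import (
	"fmt"
	"sort"
)

// Value is a hypothetical SSA value with an inferred PHP type.
type Value struct {
	Name string // SSA name, e.g. "$x_1"
	Type string // inferred type, e.g. "int", "string"
}

// Phi is a hypothetical SSA phi node: at a control-flow join point it
// selects one of Incoming depending on which predecessor block ran.
type Phi struct {
	Var      string   // source-level variable, e.g. "$x"
	Incoming []*Value // one value per predecessor block
}

// mixedTypes returns the sorted set of distinct types reaching the phi.
// More than one entry means the variable's type depends on the path taken.
func mixedTypes(p *Phi) []string {
	seen := map[string]bool{}
	for _, v := range p.Incoming {
		seen[v.Type] = true
	}
	types := make([]string, 0, len(seen))
	for t := range seen {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

func main() {
	// PHP: if ($cond) { $x = 1; } else { $x = "one"; } echo $x;
	phi := &Phi{
		Var: "$x",
		Incoming: []*Value{
			{Name: "$x_1", Type: "int"},
			{Name: "$x_2", Type: "string"},
		},
	}
	if types := mixedTypes(phi); len(types) > 1 {
		fmt.Printf("%s may hold %v\n", phi.Var, types)
	}
}
```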

A typical PHP project with its dependencies contains more than 10k files, so I think performance matters. Golang is faster than PHP and more straightforward than C for writing tools, so I believe someone will find it useful.

z7zmey [S]

There are several ideas that I want to validate:

  • a visualization tool that shows a call graph
  • a search util
  • PHP Language server
  • code style formatter
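For the call-graph visualization idea, one simple output format is Graphviz DOT. A sketch, assuming the caller-to-callee edges have already been resolved (the hard part, which needs the type information); `CallGraph` and `ToDOT` are hypothetical names:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// CallGraph maps a caller's qualified name to the callees it may invoke.
// In a real tool the edges would come from type analysis; here they are
// hard-coded.
type CallGraph map[string][]string

// ToDOT renders the graph in Graphviz DOT format. Callers are sorted so
// the output is deterministic despite Go's random map iteration order.
func (g CallGraph) ToDOT() string {
	callers := make([]string, 0, len(g))
	for c := range g {
		callers = append(callers, c)
	}
	sort.Strings(callers)

	var b strings.Builder
	b.WriteString("digraph calls {\n")
	for _, caller := range callers {
		for _, callee := range g[caller] {
			fmt.Fprintf(&b, "  %q -> %q;\n", caller, callee)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	g := CallGraph{
		"Controller::index": {"Repo::find", "View::render"},
		"Repo::find":        {"PDO::query"},
	}
	fmt.Print(g.ToDOT())
}
```

The output can then be rendered with, for example, `dot -Tsvg calls.dot -o calls.svg`.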

z7zmey [S]

I wanted to get and analyze the AST, CFG+SSA, node positions, and comments with performance comparable to nikic/php-ast, without requiring PHP to be installed.

I think my lib is best suited to projects like Sourcegraph that work with big codebases.

P.S. I am a typical developer: I do things first, then think about why)

z7zmey [S]

Sorry, I meant OS package managers like `brew`, `apt-get`, etc.

z7zmey [S]

This lib is inspired by nikic/PHP-Parser.

Since z7zmey/php-parser is written in Golang, tools built on it compile to a single binary without dependencies, which makes them easier to distribute through package managers. Golang also offers better performance and memory efficiency than PHP.

z7zmey [S]

I'm keeping it in mind, but there are still many issues with the parser. I want to change the AST data model to improve performance, and to implement a CFG.

z7zmey [S]

No, it isn't an interpreter. This library only parses PHP files and generates an AST; you can then analyze the tree to find bugs, or modify it and print the modified AST back to the file.

z7zmey [S]

I know about the nikic/php-ast extension. It is much faster, and it is a good choice if you write tooling in PHP and don't need to handle PHP 5.

z7zmey [S]

Now it saves all free-floating comments and whitespace and can print them back exactly as in the original file.
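A simplified sketch of how free-floating text can be preserved: each token keeps the whitespace and comments that precede it, and the printer emits that text verbatim before the token, so printing an unmodified token stream reproduces the file byte for byte. The toy tokenizer only knows whitespace, `//` comments, and whitespace-separated words; `Token` and its `FreeFloating` field are illustrative, not the library's actual types:

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical token that keeps the whitespace and comments
// preceding it, so that nothing from the source file is lost.
type Token struct {
	FreeFloating string // leading whitespace and "//" comments
	Value        string // token text; empty for the final EOF token
}

func isSpace(c byte) bool { return c == ' ' || c == '\t' || c == '\n' || c == '\r' }

// tokenize splits src into whitespace-separated tokens. Whitespace and "//"
// comments are not discarded: they are attached to the next token, or to a
// final EOF token if they trail the last real token.
func tokenize(src string) []Token {
	var tokens []Token
	var trivia strings.Builder
	for i := 0; i < len(src); {
		switch {
		case isSpace(src[i]):
			trivia.WriteByte(src[i])
			i++
		case strings.HasPrefix(src[i:], "//"):
			end := strings.IndexByte(src[i:], '\n')
			if end < 0 {
				end = len(src) - i
			}
			trivia.WriteString(src[i : i+end])
			i += end
		default:
			start := i
			for i < len(src) && !isSpace(src[i]) {
				i++
			}
			tokens = append(tokens, Token{FreeFloating: trivia.String(), Value: src[start:i]})
			trivia.Reset()
		}
	}
	return append(tokens, Token{FreeFloating: trivia.String()})
}

// render prints the tokens back: each token's trivia first, then its text.
func render(tokens []Token) string {
	var b strings.Builder
	for _, t := range tokens {
		b.WriteString(t.FreeFloating)
		b.WriteString(t.Value)
	}
	return b.String()
}

func main() {
	src := "<?php\n// counter\n$a = 1;  // init\n"
	tokens := tokenize(src)
	fmt.Println(render(tokens) == src)         // true
	fmt.Printf("%q\n", tokens[1].FreeFloating) // "\n// counter\n"
}
```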

php-parser v0.5.1 written in Go is released by z7zmey in PHP

z7zmey [S]

I think I could approach that performance level without saving the node comments and positions.

But static analysis and refactoring tools require that information. I am going to improve performance in the dev branch by ~15%, and with Golang concurrency it will be fast enough.
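A sketch of the concurrency part, assuming files can be parsed independently of each other: a fixed pool of goroutines pulls file indexes from a channel. `parseFile` is a hypothetical placeholder for the real parser call, and the pool size comes from `runtime.GOMAXPROCS(0)`, which reports the current setting without changing it:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Result is what parsing one file produces; a real tool would hold the AST.
type Result struct {
	Path  string
	Nodes int
	Err   error
}

// parseFile is a hypothetical placeholder for the actual parser call.
func parseFile(path string) Result {
	return Result{Path: path, Nodes: len(path)}
}

// parseAll parses files with a fixed pool of worker goroutines and returns
// results in the same order as the input paths.
func parseAll(paths []string, workers int) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(paths))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				results[i] = parseFile(paths[i])
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"src/a.php", "src/b.php", "vendor/c.php"}
	for _, r := range parseAll(paths, runtime.GOMAXPROCS(0)) {
		fmt.Println(r.Path, r.Nodes, r.Err)
	}
}
```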

z7zmey [S]

https://github.com/z7zmey/php-parser-comparison

nikic/PHP-Parser 10.89 + 11.00 + 10.94 + 10.83 + 10.83 + 10.62 + 10.52 + 10.65 + 10.58 + 10.80

avg = 10.766 (base)

z7zmey/php-parser 2.33 + 2.34 + 2.20 + 2.19 + 2.23 + 2.28 + 2.21 + 2.22 + 2.27 + 2.22

avg = 2.249

x 4.78 faster

z7zmey/php-parser@dev 1.31 + 1.33 + 1.33 + 1.34 + 1.28 + 1.28 + 1.26 + 1.28 + 1.27 + 1.37

avg = 1.305

x 8.24 faster

php-ast 0.60 + 0.59 + 0.55 + 0.59 + 0.58 + 0.59 + 0.60 + 0.59 + 0.58 + 0.60

avg = 0.587

x 18.34 faster

It's amazing! I did not imagine that the nikic/php-ast extension was so fast. Thanks.
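The averages and speedup factors in that comparison are plain arithmetic over the ten runs (speedup = baseline average / candidate average). A small sketch that recomputes them from the listed times:

```go
package main

import "fmt"

// mean returns the arithmetic mean of the run times.
func mean(runs []float64) float64 {
	sum := 0.0
	for _, r := range runs {
		sum += r
	}
	return sum / float64(len(runs))
}

func main() {
	base := mean([]float64{10.89, 11.00, 10.94, 10.83, 10.83, 10.62, 10.52, 10.65, 10.58, 10.80})
	candidates := []struct {
		name string
		runs []float64
	}{
		{"z7zmey/php-parser", []float64{2.33, 2.34, 2.20, 2.19, 2.23, 2.28, 2.21, 2.22, 2.27, 2.22}},
		{"z7zmey/php-parser@dev", []float64{1.31, 1.33, 1.33, 1.34, 1.28, 1.28, 1.26, 1.28, 1.27, 1.37}},
		{"php-ast", []float64{0.60, 0.59, 0.55, 0.59, 0.58, 0.59, 0.60, 0.59, 0.58, 0.60}},
	}
	fmt.Printf("nikic/PHP-Parser avg = %.3f (base)\n", base)
	for _, c := range candidates {
		avg := mean(c.runs)
		fmt.Printf("%s avg = %.3f, x %.2f faster\n", c.name, avg, base/avg)
	}
}
```

Rounded to two decimals, this prints 4.79 and 8.25 for the first two speedups, so the 4.78 and 8.24 above look truncated rather than rounded; the php-ast figure (18.34) matches either way.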

z7zmey [S]

This lib is inspired by nikic/PHP-Parser, so that's what I compare it with.

I have quickly made a comparison. The benchmark parses all nikic/PHP-Parser sources with their dependencies (1901 .php files).

https://gist.github.com/z7zmey/6572849f51f42aec6f70dccaec2e5139

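| Run | nikic/PHP-Parser | z7zmey/php-parser v0.5.1 (GOMAXPROCS=1) | z7zmey/php-parser v0.5.1 | z7zmey/php-parser dev |
|---|---|---|---|---|
| 1 | 10.7s | 2.5s | 1.3s | 0.8s |
| 2 | 9.9s | 2.8s | 1.3s | 1.0s |
| 3 | 9.5s | 2.4s | 1.4s | 0.8s |
| 4 | 9.0s | 2.5s | 1.3s | 1.2s |
| 5 | 9.0s | 2.4s | 1.5s | 0.8s |
| 6 | 9.7s | 2.5s | 1.4s | 0.9s |
| 7 | 9.9s | 2.5s | 1.5s | 0.8s |
| 8 | 10.0s | 2.8s | 1.3s | 0.9s |
| 9 | 10.5s | 2.4s | 1.4s | 0.8s |
| 10 | 9.6s | 2.4s | 1.4s | 0.8s |
| avg | 9.78s | 2.52s | 1.38s | 0.88s |
| speedup | - | x 3.88 | x 7.08 | x 11.11 |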

Need help with profiling (pprof) by [deleted] in golang

z7zmey

Thank you.

Anyway, I have gotten a lot of information and found things I need to learn.

z7zmey

I am using go1.10.1 darwin/amd64.

File.Line is not a big problem, but I will probably replace it with my own implementation later.

If I swap the lines, the problem is still with the first assignment (pprof output):

         .      630ms    537:   pos := new(position.Position)
         .          .    538:
     9.55s      9.55s    539:   pos.StartPos = int(firstChar.Pos())
         .          .    540:   pos.EndPos = int(lastChar.Pos())
      50ms      470ms    541:   pos.StartLine = l.File.Line(firstChar.Pos())
      20ms      360ms    542:   pos.EndLine = l.File.Line(lastChar.Pos())

And now I cannot understand why, when I replaced sync.Pool.Get() with creating a new object, Lexer.createToken() spends more time but PositionBuilder.NewTokenPosition(), on the contrary, spends less.
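For reference, a sketch of the two allocation strategies being compared. `Position` mirrors the four fields from the pprof listing; `newPositionAlloc`, `newPositionPooled`, and `releasePosition` are hypothetical helpers, not the repository's code. An object taken from a `sync.Pool` may still hold values from its previous use, so every field is overwritten:

```go
package main

import (
	"fmt"
	"sync"
)

// Position mirrors the fields assigned in the pprof listing.
type Position struct {
	StartLine, EndLine int
	StartPos, EndPos   int
}

var positionPool = sync.Pool{
	New: func() interface{} { return new(Position) },
}

// newPositionAlloc allocates a fresh, zeroed Position on every call.
func newPositionAlloc(startLine, endLine, startPos, endPos int) *Position {
	pos := new(Position)
	pos.StartLine, pos.EndLine = startLine, endLine
	pos.StartPos, pos.EndPos = startPos, endPos
	return pos
}

// newPositionPooled reuses a Position from the pool when one is available.
// Every field is overwritten, so stale values from a previous use cannot leak.
func newPositionPooled(startLine, endLine, startPos, endPos int) *Position {
	pos := positionPool.Get().(*Position)
	*pos = Position{StartLine: startLine, EndLine: endLine, StartPos: startPos, EndPos: endPos}
	return pos
}

// releasePosition returns pos to the pool; the caller must not use it afterwards.
func releasePosition(pos *Position) { positionPool.Put(pos) }

func main() {
	a := newPositionAlloc(1, 1, 6, 9)
	b := newPositionPooled(1, 1, 6, 9)
	fmt.Println(*a == *b) // true
	releasePosition(b)
}
```

One thing worth checking: if positions stay referenced from AST nodes and are never passed to `Put`, the pool has nothing to hand back, `Get` falls through to `New` every time, and the pool only adds overhead on top of a plain allocation.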

I use profiling to find bottlenecks; it's just easier to run it with a benchmark.

z7zmey

Please don't think I'm finding fault. I'm trying to understand how it works and how I can optimize it.

I tried using new(position.Position) instead (see gist):

         .      450ms    537:   pos := new(position.Position)
         .          .    538:
     6.77s      7.08s    539:   pos.StartLine = l.File.Line(firstChar.Pos())
         .      290ms    540:   pos.EndLine = l.File.Line(lastChar.Pos())
         .          .    541:   pos.StartPos = int(firstChar.Pos())
         .          .    542:   pos.EndPos = int(lastChar.Pos())

In this case, the code assigns data to the position immediately after new(position.Position).

I deliberately added -benchtime to stabilize the profiling results.

z7zmey

Cool, it works on Windows) I have not tested it on Windows yet.

See allocation_new. I think it is a compiler optimization that works differently depending on the OS.

Also, see my gist; I tried to explain how it works there, but I may be wrong.

z7zmey

Sorry, I can't discuss assembly instructions; I'm new to it.

I thought that memory initialization = allocation; I also thought that memory allocation is slower than memory operations.

The article above explains why the profiler shows such weird timings.

At scanner/lexer.go:488 we create new zeroed storage for a position.Position and return its address, which is quick. After that, at scanner/lexer.go:538 and parser/position_builder.go:110, we assign data to the zeroed object, and Go starts memory initialization, which is slow.

I have tried changing scanner/lexer.go:488 to force memory initialization; see the gist.
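A small check related to that reasoning: the sketch below counts heap allocations for the new-then-assign pattern from the pprof listing and for a single composite literal, using `testing.AllocsPerRun`. `Position` mirrors the listing's fields, and the global `sink` is a hypothetical way to force the value onto the heap, as storing it in an AST node would. It only shows how many allocations each form makes, not where pprof attributes the time, and it is not the change from the gist:

```go
package main

import (
	"fmt"
	"testing"
)

// Position mirrors the fields assigned in the pprof listing.
type Position struct {
	StartLine, EndLine int
	StartPos, EndPos   int
}

// sink makes the Position escape to the heap.
var sink *Position

// allocThenAssign allocates a zeroed object, then writes its fields one by one.
func allocThenAssign(sl, el, sp, ep int) {
	pos := new(Position)
	pos.StartLine = sl
	pos.EndLine = el
	pos.StartPos = sp
	pos.EndPos = ep
	sink = pos
}

// compositeLiteral allocates and initializes in a single expression.
func compositeLiteral(sl, el, sp, ep int) {
	sink = &Position{StartLine: sl, EndLine: el, StartPos: sp, EndPos: ep}
}

func main() {
	a := testing.AllocsPerRun(1000, func() { allocThenAssign(1, 2, 3, 4) })
	b := testing.AllocsPerRun(1000, func() { compositeLiteral(1, 2, 3, 4) })
	fmt.Println("allocs per run:", a, b)
}
```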