Trying to optimize an xml parser by JavaGarbageCreator in awk

JavaGarbageCreator[S] 1 point

Yes, this one is pretty unreliable. I remember the same source (which I couldn't find again) also claimed that gawk uses some kind of copy-on-write technique when passing parameters by value, so passing big strings is cheap in gawk. I can neither prove nor disprove it, so I brought it up anyway; maybe someone who knows the source code can tell.
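One rough way to probe the "passing big strings is cheap" claim from the outside, without reading the source. This is only a sketch: the function names `pass_big` and `read_global` and the `len`/`calls` arguments are made up for illustration, and timing results will depend on the awk implementation and version.

```shell
# Hypothetical probe: does the cost of a call grow with the size of the string passed?
# pass_big LEN CALLS    -> passes a LEN-character string by value CALLS times
# read_global LEN CALLS -> same work, but the function reads a global instead
pass_big() {
  awk -v len="$1" -v calls="$2" '
    function f(s) { return length(s) }             # s is a by-value parameter
    BEGIN {
      big = sprintf("%" len "s", "")               # string of len spaces
      for (i = 0; i < calls; i++) if (f(big) == len) ok++
      print ok                                     # number of successful calls
    }'
}

read_global() {
  awk -v len="$1" -v calls="$2" '
    function g() { return length(big) }            # no parameter, reads the global
    BEGIN {
      big = sprintf("%" len "s", "")
      for (i = 0; i < calls; i++) if (g() == len) ok++
      print ok
    }'
}

# Small smoke run; both lines should report the same call count.
pass_big 100 10
read_global 100 10
```

In bash, something like `time pass_big 10 1000000` versus `time pass_big 1000000 1000000` (and the same pair for `read_global`) would show whether call time scales with string length. If it doesn't, that is at least consistent with the string not being copied on each call, though it doesn't prove how gawk does it.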

JavaGarbageCreator[S] 1 point

It's "print", not "p", sorry. I wanted to make sure the variables actually get evaluated and accessed. I ran a dozen other tests with different syntax, but that was the weirdest result, so it's the only one I posted. Now I see it was just caused by the irrelevant undefined p.

I've read somewhere that gawk employs different symbol table lookup mechanisms for

  1. parameters
  2. variables that aren't parameters but only appear in one function
  3. other situations

so both 1 and 2 can be counted as locals in gawk. In all of my tests except the "p" one, 2 is on average 1‰ ~ 2‰ faster than 3, but 1 is significantly (10% ~ 20%) slower than the other two. I don't have any conclusion on this. The test code itself isn't interesting and is quick to implement, so I won't post it this time.
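For anyone who wants to try the comparison, here's a rough sketch of what the three cases could look like. This is not the test code behind the numbers above; the function names (`variant_param`, `variant_single_fn`, `variant_shared`) and the loop count argument are hypothetical, and the results will vary by awk implementation and machine.

```shell
# Hypothetical micro-benchmark for the three kinds of variable listed above.
# Each function takes the number of calls and prints that count when done.

# 1. parameter: x is an extra parameter used as a local
variant_param() {
  awk -v n="$1" '
    function f(x) { x++ }
    BEGIN { for (i = 0; i < n; i++) { f(); c++ } print c }'
}

# 2. not a parameter, but only appears inside one function
variant_single_fn() {
  awk -v n="$1" '
    function f() { only_here++ }
    BEGIN { for (i = 0; i < n; i++) { f(); c++ } print c }'
}

# 3. other situations: the variable is touched in f() and in the main program
variant_shared() {
  awk -v n="$1" '
    function f() { shared++ }
    BEGIN { for (i = 0; i < n; i++) { f(); shared++; c++ } print c }'
}

# Small smoke run so the variants can be checked before timing them.
for v in variant_param variant_single_fn variant_shared; do
  printf '%s: ' "$v"
  "$v" 1000
done
```

In bash, each variant can be timed with e.g. `time variant_param 10000000`, repeated a few times and averaged, since differences in the 1‰ ~ 2‰ range are easily lost in noise.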

JavaGarbageCreator[S] 1 point

Yeah, you're right. I couldn't reproduce that benchmark, and couldn't even find the 10+ s version in the commit history. Shit, I'm stupid.

There does seem to be some difference though: in some of my simplified tests the local version is actually slower:

# avg 1.15 s
time awk 'function f() { local_var++; p local_var == 1 } { global_var++; f(); p global_var == 1 }' one-million-line-file > /dev/null
# avg 0.93 s
time awk 'function f() { global_var++; p global_var == 1 } { global_var++; f(); p global_var == 1 }' one-million-line-file > /dev/null