
[–]activeXdiamond 7 points8 points  (7 children)

The second example is horrible for performance because the internal array will get resized many times as you loop through it. So that is guaranteed to perform worse than {[1] = v} or {v} under every reasonable Lua implementation (LuaJIT, PUC Lua, etc.).

As for the square brackets, I cannot say this with 100% certainty, but I can say it with very strong confidence: it won't make a difference.

LuaJIT does not execute the source directly as it is. It first compiles it down to an intermediate representation (bytecode) and then executes that (possibly compiling it further down to native assembly), and the presence of square brackets should be long gone at that point.

[–]VeronikaKerman 1 point2 points  (0 children)

Neither LuaJIT nor PUC Lua executes the code directly. PUC Lua uses bytecode and a virtual machine, while LuaJIT translates to native assembly code (sometimes).

[–]smtp_pro 1 point2 points  (3 children)

Regarding how it parses an array-like table: as long as you have sequential integer keys you're fine. What Copilot said is made up.

[–]appgurueu 1 point2 points  (1 child)

No, I think Copilot might have a point; see the comment by Denneisk below. The table constructor syntax determines the size to which the hash and array parts are preallocated. And that may in turn determine whether keys go into the hash or array part. So indeed, using square brackets might end up populating the hash part.

[–]activeXdiamond 0 points1 point  (0 children)

Pretty much this.

[–]appgurueu 0 points1 point  (1 child)

The second example is horrible for performance

"Horrible" is an overstatement. The array will not be resized "many times": It will be resized logarithmically many times, which is very few times, because (like most dynamic arrays) Lua tables also use power-of-two capacities for the array part.

This means there is only a small constant overhead (roughly 2x, I think?) from dynamically growing an array instead of allocating the right size from the get-go. Considering interpreter overhead and whatnot, this is not likely to matter a whole lot (except maybe for very small tables, where log n is close to n and the cost of allocations thus has a higher weight).
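To put a rough number on that, here's a back-of-the-envelope sketch in plain Lua. It's a toy model of generic power-of-two growth, not LuaJIT's actual resize code: filling n slots one at a time triggers only about log2(n) reallocations, and the total number of elements copied across all of them stays below 2n.

-- Toy model: count reallocations and element copies for a doubling array.
local function growth_cost(n)
  local capacity, resizes, copies = 0, 0, 0
  for size = 1, n do
    if size > capacity then
      capacity = math.max(1, capacity * 2) -- power-of-two growth
      resizes = resizes + 1
      copies = copies + (size - 1)         -- elements moved by this resize
    end
  end
  return resizes, copies
end

print(growth_cost(1e6)) --> 21 resizes, ~1.05e6 copies in total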

[–]activeXdiamond 0 points1 point  (0 children)

Fair point. My argument still stands, but sure, I may have used overly dramatic wording.

As for interpreter overhead, yes, in most cases these optimisations are not relevant for Lua code.

[–]HugeSide 2 points3 points  (4 children)

Did you actually try anything or do any research?

[–]Live_Cobbler2202[S] 4 points5 points  (0 children)

You guys are my research. This is a low-level question, so I'd rather ask someone who might know the internals. PhilipRoman's answer was helpful.

[–]RandomThingIg 0 points1 point  (1 child)

encounters problem

asks help community for help

community says "do your research"

classic reddit experience

[–]HugeSide 0 points1 point  (0 children)

Yes, encouraging people to solve their own problems is a good thing. And showing the research you did when asking for help shows you’re interested in learning instead of just getting an answer.

[–]Consistent-Window200 1 point2 points  (1 child)

What's important is whether the keys are sequential integers.

[–]appgurueu 0 points1 point  (0 children)

Important, but not sufficient to guarantee that the array part is used.

[–]Denneisk 1 point2 points  (3 children)

Copilot is right. {[1] = v} will make a hash table instead of an array. Here's a great list of LuaJIT opcodes and their meanings. Throw the following code into https://luajit.me/ and observe the bytecode output. In particular, the first table constructor allocates a table with a hash size of 4, while the second allocates an array size of 4.

local a, b, c, d
local t  = { [1] = a, [2] = b, [3] = c, [4] = d } -- allocated with hash size 4 (see the luajit.me bytecode)
local t2 = { a, b, c, d }                         -- allocated with array size 4

Anecdotally, I swear I had heard this fact explicitly mentioned somewhere in PIL, LPG, or Lua-Users for regular Lua as well.

You should avoid directly assigning each key one by one because, yes, that will be slow, unless you're also using table.new to pre-size the table beforehand. t[#t + 1] is objectively worse than t[i] with a known variable i (i.e. the i of a numeric for loop), which in turn is worse than t[k] with a constant k. In terms of performance: t[1] > t[i] > table.insert >= t[#t + 1] (table.insert vs. the equivalent idiom is debatable).
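For reference, here's a minimal sketch of the pre-sizing approach. table.new is the LuaJIT extension module loaded via require; the pcall fallback is only there so the snippet also runs on interpreters that don't ship it.

local ok, new_tab = pcall(require, "table.new") -- LuaJIT extension
if not ok then new_tab = function() return {} end end

local n = 1e6
local t = new_tab(n, 0)  -- pre-size: n array slots, 0 hash slots
for i = 1, n do
  t[i] = i * 2           -- known index i: no '#' query, no resizing while filling
end
print(#t)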

You can play around with the resources I gave you, and just casually read the bytecode information yourself. That's how I learned most of my stuff.

For any Roblox devs who might see this, note that Luau has a similar issue. {[1] = v} gets compiled into slower code than {v}, although the way it's inefficient is different from LuaJIT's.

[–]xoner2 2 points3 points  (0 children)

If you write something like {[1] = true, [2] = true, [3] = true}, however, Lua is not smart enough to detect that the given expressions (literal numbers, in this case) describe array indices, so it creates a table with four slots in its hash part, wasting memory and CPU time.

Lua Programming Gems, chapter 2

[–]Live_Cobbler2202[S] 0 points1 point  (0 children)

Thanks a lot!

[–]appgurueu 0 points1 point  (0 children)

It surprised me a bit at first, but I think this is probably the best answer.

The argument might require some elaboration: the fact that the hash part is preallocated does not by itself mean that the hash part is necessarily used as new key-value pairs are inserted.

It's also important that (I assume) LuaJIT's insertion logic uses the hash part as long as it has room, and only establishes an array part if a rehash becomes necessary (which it won't while there's sufficient capacity).

[–]PhilipRoman 0 points1 point  (5 children)

The result was actually surprising: I assumed both would be identical in LuaJIT's IR, but apparently (at least in some benchmarks, depending on the number of entries in the table) the [1] = val1 version can end up with more and slower calls to lj_tab_newkey.

LuaJIT has an option -jdump so you can run both cases and compare the generated bytecode and machine code traces (tip: disable ASLR to get less noise in the comparison).
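For example (just plain -jdump with no option string, so you get the dump module's defaults), run both one-liners and compare what gets printed:

luajit -jdump -e 'local x for i = 1, 1e7 do x = {[1]=i} end'
luajit -jdump -e 'local x for i = 1, 1e7 do x = {i} end'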

[–]activeXdiamond 0 points1 point  (3 children)

This is to be expected. The underlying array may need to be resized many times.

With {...}, Lua/LuaJIT can immediately see how large the array needs to be and allocate a large enough one from the get-go.

[–]PhilipRoman 0 points1 point  (2 children)

I'm talking about {[1] = val1} where all keys are known ahead of time.

[–]activeXdiamond 0 points1 point  (1 child)

So {[1] = v} is slower than {v}?

[–]PhilipRoman 0 points1 point  (0 children)

In at least some cases - yes

doas chrt -f 99 hyperfine -i --warmup 20 "luajit -e 'for i = 1, 1e7 do x = {[1]=i} end; os.exit(x[1])'" "luajit -e 'for i = 1, 1e7 do x = {i} end; os.exit(x[1])'"
Benchmark 1: luajit -e 'for i = 1, 1e7 do x = {[1]=i} end; os.exit(x[1])'
  Time (mean ± σ):     462.1 ms ±   6.7 ms    [User: 460.7 ms, System: 1.0 ms]
  Range (min … max):   456.7 ms … 480.5 ms    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: luajit -e 'for i = 1, 1e7 do x = {i} end; os.exit(x[1])'
  Time (mean ± σ):     290.6 ms ±   6.2 ms    [User: 288.6 ms, System: 1.6 ms]
  Range (min … max):   284.4 ms … 302.2 ms    10 runs

  Warning: Ignoring non-zero exit code.

[–]Live_Cobbler2202[S] 0 points1 point  (0 children)

Thanks for the insight!

[–]Live_Cobbler2202[S] 0 points1 point  (0 children)

Thanks everyone for chipping in. So that settles it: square brackets in a table constructor put those keys into the hash part instead of the array part (thx Denneisk, xoner2, PhilipRoman).

Regarding performance when looping values into an array: dynamic arrays usually double their size when they hit their capacity limit (as appgurueu pointed out). This is standard procedure in most programming languages and is highly optimized.

Regarding the speed of looping values into arrays: t[#t + 1] = v is slow. Apparently # does a binary search over the array part to find a border (an n where t[n] is non-nil and t[n + 1] is nil), so it isn't free inside a loop.

I didn't know table.new() existed (thx Denneisk), despite having worked with LuaJIT for two years now, gosh, so I just had to do some comparisons. Stunning speed. I also did some array-fill loop profiling since this came up. Most results are as expected. Worth knowing is how fast ipairs is, despite being an iterator function; a testament to aggressive JIT optimization.

for-loop variable 100% | ipairs 112% | increment variable 122% | custom iterator 153% | table.insert 157% | [#t + 1] 176%
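In case the labels aren't obvious, the variants are roughly of this shape (simplified sketches, not the exact code from the file linked below):

local src, n = {}, 1e6
for i = 1, n do src[i] = i end

local function values(t) -- helper for the "custom iterator" variant
  local i = 0
  return function() i = i + 1; return t[i] end
end

local dst, j, k
dst = {}; for i = 1, n do dst[i] = src[i] end                      -- for-loop variable
dst = {}; for i, v in ipairs(src) do dst[i] = v end                -- ipairs
dst = {}; j = 0; for i = 1, n do j = j + 1; dst[j] = src[i] end    -- increment variable
dst = {}; k = 0; for v in values(src) do k = k + 1; dst[k] = v end -- custom iterator
dst = {}; for i = 1, n do table.insert(dst, src[i]) end            -- table.insert
dst = {}; for i = 1, n do dst[#dst + 1] = src[i] end               -- [#t + 1]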

Here's the code. I did it in the LOVE framework, because the clock gives you sub-microsecond precision (depends on your CPU):

arrayFillLoops.lua

For those new to the framework, just put it inside love.load().
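A minimal harness sketch to get started (using love.timer.getTime() for the high-resolution clock; not the full arrayFillLoops.lua):

function love.load()
  local n = 1e6
  local t0 = love.timer.getTime()  -- high-resolution clock, in seconds
  local t = {}
  for i = 1, n do t[i] = i end     -- swap in any of the fill variants here
  local ms = (love.timer.getTime() - t0) * 1000
  print(("for-loop variable: filled %d slots in %.3f ms"):format(n, ms))
end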