Benchmarking Lua

Just out of curiosity, I decided to see how much I had to worry about optimization in Lua. I downloaded a Lua binary from http://code.google.com/p/luaforwindows/ and ran a few tests.

Fresh boot, AMD dual core at 2.6 GHz. In case you're curious, that's a pretty old computer; it was at the price-performance sweet spot when I bought it, but parts are going to start failing soon.

local time = os.clock(); for i=1, 10^7 do if i==0 then break end end; print(os.clock()-time)
0.422

Note that this basic loop consists of 1 increment, 2 compares, and 1 jump. Every loop that follows is the same, except that some have one fewer compare (the if in the middle of the loop). I can run it more than 20 million times a second. If I were doing this in ToME, I wouldn't notice any difference in framerate until I did it around 2*10^4 times every frame. That would drop my framerate from 30 to 29.
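
The frame-budget estimate can be reproduced with a little arithmetic. This is only a rough sketch, assuming a 30 fps target and the loop timing measured above; the variable names are made up for illustration.

```lua
-- Rough frame-budget arithmetic; variable names are illustrative only.
local iterations = 10^7
local elapsed    = 0.422                 -- seconds, from the run above
local per_second = iterations / elapsed  -- loop iterations per second

-- Dropping from 30 fps to 29 fps means each frame takes this much longer:
local extra_time = 1/29 - 1/30           -- seconds per frame

-- Loop iterations that fit into that extra time:
local budget = per_second * extra_time

print(string.format("%.1f million iterations per second", per_second / 1e6))
print(string.format("about %.0f iterations per frame to lose one fps", budget))
```

This comes out at roughly 27,000 iterations per frame, the same ballpark as the 2*10^4 quoted above.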

local time = os.clock(); local p; for i=1, 10^7 do p = math.sqrt(i) end; print(os.clock()-time)
1.766
Takes about four times as long to do a square root loop. I could only do 5000 square roots every frame before affecting framerate.

local time = os.clock(); local p; for i=1, 10^7 do p = i^0.5 end; print(os.clock()-time)
1.11
Huh, that's funny. For the non math inclined, i^0.5 is the exact same thing as sqrt(i). I repeated this several times to be sure, and sure enough, ^0.5 is faster. And, I might add, easier to type, and easier on the eyes. So adjust that square root estimate to 9000 a frame until we affect framerate.
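
Mathematically the two are identical, but in floating point they could in principle differ in the last bit or so, depending on the underlying C library. A quick sanity check, as a sketch (the range and the variable names are arbitrary choices of mine):

```lua
-- Compare math.sqrt(i) against i^0.5 over a range of inputs and
-- record the largest relative difference seen.
local max_rel_diff = 0
for i = 1, 10^5 do
  local a, b = math.sqrt(i), i^0.5
  local rel = math.abs(a - b) / a
  if rel > max_rel_diff then max_rel_diff = rel end
end
print("largest relative difference:", max_rel_diff)
```

If the printed value is zero or down at rounding-error level, the two forms are interchangeable for game code.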

local time = os.clock(); local p; for i=1, 10^7 do p = math.sin(i) end; print(os.clock()-time)
2.281
Slowest yet. Looks like I could get away with 4000 sines. I'm going to assume tan and cos work at similar speed (maybe not a safe assumption in light of the surprising square root discovery!)

local time = os.clock(); local p; for i=1, 10^7 do p = math.exp(i) end; print(os.clock()-time)
2.484
About the same as trig. Can't imagine wanting to do a lot of this.

local time = os.clock(); local p = "test"; for i=1, 10^7 do if p~="test" then break end end; print(os.clock()-time)
0.422
Comparison of strings isn't any slower than comparison of numbers. That's because of how Lua handles strings. p is a pointer to a string; "test" is a pointer to the exact same string.
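
Interning itself can't be observed directly from plain Lua, but its consequence can: a string assembled at runtime still compares equal to a literal with the same contents. A minimal sketch (as far as I know, Lua 5.1 and LuaJIT intern every string, while newer Lua versions only intern short ones, so treat the pointer story as version-dependent):

```lua
-- Two strings with the same contents, created in different ways.
local a = "test"                        -- literal
local b = table.concat({"te", "st"})    -- assembled at runtime
local same = (a == b)
print(same)
```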

TL;DR: Unless you're writing display code or pathfinding code, there is almost nothing you can do that will noticeably impact the performance of ToME. Don't worry about speed-- worry about having your code do what it's supposed to do, and do it clearly.

Reference implementation? Meh.

See title.

LuaJIT is your friend (regarding performance), DarkGod. :)

Repeated with JIT

Using http://sourceforge.net/p/safelua/wiki/LuaJIT%20binaries/ since I'm a lowly Windows user and amateur modder :)

Respective times: 0.015, 0.094, 0.093, 0.469, 0.279, 0.015

Analysis: about ten times faster with JIT in general, but only 5x faster with math.sin; the difference between math.sqrt and ^0.5 disappears; number and string comparisons behave the same, fluctuating between 0 and 0.32

So, rough rule, you get ten times as many instructions with JIT :)
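
For anyone who wants the per-test speedups rather than a rule of thumb, here is a small sketch that divides the plain Lua timings by the LuaJIT ones. The numbers are copied from the posts above; the labels are my own shorthand. Keep in mind that the 0.015 entries are near the resolution of os.clock(), so their ratios are very rough.

```lua
-- Speedup per test: plain Lua time divided by LuaJIT time.
local labels = {"empty loop", "math.sqrt", "^0.5", "math.sin", "math.exp", "string compare"}
local plain  = {0.422, 1.766, 1.11, 2.281, 2.484, 0.422}
local jit    = {0.015, 0.094, 0.093, 0.469, 0.279, 0.015}

local ratios = {}
for k = 1, #labels do
  ratios[k] = plain[k] / jit[k]
  print(string.format("%-15s %.1fx", labels[k], ratios[k]))
end
```

The ratios vary quite a bit from test to test, so "ten times" is best read as a rough middle value.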

edit: oh, and no longer a fresh boot, which probably matters; I have little doubt that I am part of multiple botnets, since I'm the kind of guy that just downloads binaries instead of compiling from source :)

Looking at the number of brutally basic instructions in the first loop, and doing a little math, it appears that I'm processing about 2.6 billion instructions per second with LuaJIT. Which makes sense, since I'm running a 2.6 GHz processor; I'm just surprised to see perfect efficiency. 0.016 appears to be the limit of precision returned from os.clock(). That's about a 60th of a second, so that's a good amount of precision.
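
The "little math" spelled out, as a sketch. It assumes the count of 4 basic instructions per iteration from the first post and takes the 0.015 s timing at face value; since that timing is about one tick of os.clock(), the result is only a rough figure.

```lua
-- Instructions per second for the first loop under LuaJIT.
local iterations     = 10^7
local instr_per_iter = 4      -- 1 increment, 2 compares, 1 jump
local elapsed        = 0.015  -- seconds, first LuaJIT timing

local rate = iterations * instr_per_iter / elapsed
print(string.format("%.2f billion instructions per second", rate / 1e9))
```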

memoization test

math.randomseed(1234); local time = os.clock(); local p; for i=1, 10^7 do p = math.sin(math.random()*math.pi*2) end; print(os.clock()-time)
0.766

math.randomseed(1234); local sintable = {};local sin = function(x) if not sintable[x] then sintable[x] = math.sin(x) end return sintable[x] end
local time = os.clock(); local p; for i=1, 10^7 do p = sin(math.random()*math.pi*2) end; print(os.clock()-time)
7.36

Conclusion: random is too good, we're not memoizing anything :)
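
One way to check that conclusion is to count cache hits and misses directly. A sketch with a smaller loop; memo_sin, hits and misses are names I made up, and the exact counts depend on the resolution of the random generator in whichever Lua build runs it.

```lua
-- Count how often the memo table actually saves a math.sin call.
math.randomseed(1234)
local cache, hits, misses = {}, 0, 0

local function memo_sin(x)
  local v = cache[x]
  if v == nil then
    v = math.sin(x)       -- not seen before: compute and store
    cache[x] = v
    misses = misses + 1
  else
    hits = hits + 1       -- seen before: reuse the stored value
  end
  return v
end

for i = 1, 10^5 do
  memo_sin(math.random() * math.pi * 2)
end
print("hits:", hits, "misses:", misses)
```

With a fine-grained generator I'd expect almost every call to be a miss, which means the table only adds lookup and storage cost on top of the sine itself.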

math.randomseed(1234); local time = os.clock(); local p; for i=1, 10^7 do a = 360*math.floor(math.random()*math.pi*2*360);p = math.sin(a) end; print(os.clock()-time)
0.9529

math.randomseed(1234); local sintable = {};local sin = function(x) if not sintable[x] then sintable[x] = math.sin(x) end return sintable[x] end
local time = os.clock(); local p; for i=1, 10^7 do a = 360*math.floor(math.random()*math.pi*2*360);p = sin(a) end;print(os.clock()-time)
0.922

Conclusion: don't bother memoizing sin tables, not worth it.

Rule number 1 of optimization: Premature optimization is the root of all evil :)

Oh, and ToME already uses LuaJIT 2.

test of algebraic simplification

for i = 1, 10^8 do p = 2*5 end
0.078

for i = 1, 10^8 do p = 2*5*8 end
0.078

for i = 1, 10^8 do p = 2*5*8*math.pi end
0.078