Execution speed varies wildly (47x) between (but not during) runs

pgimeno
Party member
Posts: 3550
Joined: Sun Oct 18, 2015 2:58 pm

Re: Execution speed varies wildly (47x) between (but not during) runs

Post by pgimeno »

nameless tee wrote: Tue Jan 07, 2020 2:58 am pgimeno: After reading your post I tried to understand the output of luajit -jdump and luajit -jv but without much success.
It appears that -jv is like a summary version of -jdump. -jdump lists, for each trace, the Lua opcodes that it reads; if the trace succeeds, it additionally shows the intermediate representation in three-address code (3AC, used in optimization) and the final machine code. The Lua opcodes are the same ones you get when you use -b -l; they are the disassembled version of the bytecode that the interpreter uses. I don't understand the IR much (haven't bothered to try), but the final compiled code is sometimes useful (if you understand the assembly code for the target processor).

In this case, the output of -jv is probably enough. The preprocessor of course has a lot of NYI instructions that break the traces, that's expected.
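
For anyone who cannot pass command-line flags (for example when the script is launched through LÖVE), the same verbose log can probably be switched on from inside the script. This is only a sketch: it assumes the jit.v module that ships with LuaJIT is reachable through package.path, which may not be the case in every LÖVE build, and enable_trace_log is a made-up helper name.

```lua
-- Sketch: turn on the -jv style trace log from Lua code.
-- Assumption: LuaJIT's bundled jit/v.lua is on package.path.
local function enable_trace_log(outfile)
  local ok, v = pcall(require, "jit.v")
  if not ok then
    -- Not running under LuaJIT, or the jit.* modules were not found.
    return false
  end
  v.on(outfile) -- outfile is optional; without it the log goes to stderr
  return true
end

local enabled = enable_trace_log()
print("trace logging enabled:", enabled)
```

Call it before the code under test runs, so the traces of interest are recorded as they are created.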

Funnily, I get a lot of different traces. It really looks as if LuaJIT introduces some kind of randomness on purpose; another hypothesis is that minor variations of the CPU timings (which are somewhat random as they are influenced by e.g. what's running in the background) make a difference. Here are two sample runs, omitting the preprocessor related lines:

Code:

[TRACE   2 hateful.lua:20 loop]
[TRACE   3 (2/3) hateful.lua:19 -> 2]
[TRACE --- 0x414a3970:25 -- leaving loop in root trace at 0x414a3970:28]
[TRACE   4 0x414a3970:25 loop]
[TRACE   5 (4/3) 0x414a3970:28 -> 4]
[TRACE --- 0x414ac050:194 -- leaving loop in root trace at 0x414ac050:198]
[TRACE   6 (5/2) hateful.lua:28 -> 4]
[TRACE   7 hateful.lua:25 -> 4]
[TRACE   8 (3/1) hateful.lua:24 -> 7]
[TRACE   9 (6/5) hateful.lua:28 -> 2]
[TRACE  10 0x414ac050:194 loop]

Code:

[TRACE --- hateful.lua:20 -- leaving loop in root trace at hateful.lua:19]
[TRACE   2 hateful.lua:20 loop]
[TRACE   3 (2/3) hateful.lua:19 -> 2]
[TRACE   4 0x41e2e328:25 loop]
[TRACE   5 (4/3) 0x41e2e328:28 -> 4]
[TRACE   6 (5/2) hateful.lua:28 -> 4]
[TRACE   7 hateful.lua:25 -> 4]
[TRACE --- 0x41e2e9c8:194 -- leaving loop in root trace at 0x41e2e9c8:198]
[TRACE   8 (3/1) hateful.lua:24 -> 7]
[TRACE   9 (6/5) hateful.lua:28 -> 2]
[TRACE --- 0x41e2e9c8:194 -- leaving loop in root trace at 0x41e2e9c8:198]
15022020.071668
nameless tee wrote: Tue Jan 07, 2020 2:58 am Apparently LuaJIT really hates loops longer than two that use ctypes.
One thing I'd advise is to use numeric indices instead of structs, e.g. v[1], v[2], v[3] (or v[0], v[1], v[2] if you don't want compatibility with tables) instead of v.x, v.y, v.z. AFAIK structs still use hashing.
raidho36
Party member
Posts: 2063
Joined: Mon Jun 17, 2013 12:00 pm

Post by raidho36 »

pgimeno wrote: Tue Jan 07, 2020 9:34 am One thing I'd advise is to use numeric indices instead of structs, e.g. v[1], v[2], v[3] (or v[0], v[1], v[2] if you don't want compatibility with tables) instead of v.x, v.y, v.z. AFAIK structs still use hashing.
I specifically tested this before committing to a C type for a math library: in FFI mode there is no performance difference between using array indices and struct fields. In plain Lua(JIT), integer-mode indices are marginally faster than hashtable-mode indices (integer indices might be put into the hashtable depending on specific circumstances).
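
A comparison along those lines could be set up roughly as follows. This is a hypothetical micro-benchmark, not the test that was actually run: vec3_s, sum_fields, sum_indices and time_it are made-up names, and it falls back to plain tables when the FFI is unavailable. Timings will differ between machines and, as this thread shows, between runs.

```lua
-- Sketch: compare struct field access with array index access.
local has_ffi, ffi = pcall(require, "ffi")

local function sum_fields(v, n)
  local acc = 0
  for _ = 1, n do acc = acc + v.x + v.y + v.z end
  return acc
end

local function sum_indices(v, n)
  local acc = 0
  for _ = 1, n do acc = acc + v[0] + v[1] + v[2] end
  return acc
end

-- Returns elapsed CPU time in seconds and the function's result.
local function time_it(f, v, n)
  local t0 = os.clock()
  local result = f(v, n)
  return os.clock() - t0, result
end

local struct_v, array_v
if has_ffi then
  ffi.cdef("typedef struct { double x, y, z; } vec3_s;")
  struct_v = ffi.new("vec3_s", { x = 1, y = 2, z = 3 })
  array_v = ffi.new("double[3]", { 1, 2, 3 }) -- zero-based: v[0], v[1], v[2]
else
  -- Plain Lua fallback so the sketch still runs without LuaJIT.
  struct_v = { x = 1, y = 2, z = 3 }
  array_v = { [0] = 1, 2, 3 }
end

local n = 1e6
print("fields :", (time_it(sum_fields, struct_v, n)))
print("indices:", (time_it(sum_indices, array_v, n)))
```

Repeating the measurement across several fresh processes matters more than the iteration count here, since the variation under discussion happens between runs.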

I didn't test against stuff like "return struct[ {'x','y','z'}[1] ]" (the inline table is implied to be a reference) because using expressions like that already tanks performance substantially; in that kind of scenario, worrying about the speed of fetching values from a struct is moot. If you're going to micro-optimize, you have to go all the way, and it's not worth ruining your codebase over some minuscule performance boost.
slime
Solid Snayke
Posts: 3132
Joined: Mon Aug 23, 2010 6:45 am
Location: Nova Scotia, Canada

Post by slime »

This guide says to avoid small inner loops when possible (unroll instead), which matches with the performance results here I guess: http://wiki.luajit.org/Numerical-Comput ... ance-Guide

I still wonder if there's something else that can be tuned to make this example better instead of having to unroll loops though, since the difference is so extreme.
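
One knob that does exist for this kind of experiment is the set of JIT parameters exposed through jit.opt.start, which takes the same strings as the -O command-line option. The sketch below is only a starting point for experimenting: tune_jit is a made-up helper name, the values shown are LuaJIT's documented defaults rather than recommendations, and nobody in this thread has checked whether changing them stabilizes the example.

```lua
-- Sketch: set JIT parameters before the hot code runs.
-- Returns false when not running under LuaJIT.
local function tune_jit(...)
  if type(jit) ~= "table" or not jit.opt then
    return false
  end
  jit.opt.start(...)
  return true
end

-- hotloop: loop iterations before a trace is started.
-- loopunroll: unroll limit for loops inside side traces.
local tuned = tune_jit("hotloop=56", "loopunroll=15")
print("JIT parameters applied:", tuned)
```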
SyntheticDreamCorp
Prole
Posts: 3
Joined: Sun Nov 26, 2017 3:55 am

Post by SyntheticDreamCorp »

I think you might be experiencing what Cloudflare did when they tried to accurately benchmark LuaJIT.

I used LuaJIT's dump module to get a full dump of the traces and the resulting IR. Of course, given the nature of LuaJIT, I imagine every run of the program will dump something unique. What's common between dumps, however, is that dumps of the program when it runs slowly all show consistent trace exiting, whereas dumps of the program running fast do not. Examples here:
fast dump
slow dump

meta.aadd seems to be the function where the exit-happy trace comes from. Isolating the traces that contain that function in both the fast dump and the slow dump gives pretty telling results. In the slow variants, the for-loop inside meta.aadd gets its own trace, resulting (I think) in an occasional dip back into the interpreter in the first test2 loop. In the fast variant (this can also be seen in the fast dump), meta.aadd is integrated into one trace, no exits occur, and the loop stays entirely compiled with no return to the interpreter.
Here are the isolated traces:
fast
slow

My best guess as to what's happening is that LuaJIT tries to perform automatic loop unrolling when it compiles a trace, but the code (for whatever reason) sometimes doesn't trigger the optimization. Instead, LuaJIT compiles the loop contained in meta.aadd on its own, and the bytecode between that loop and the rest of the main loop ends up uncompiled (which is why you end up with so many trace exits, and hence slowdowns). I think this is consistent with the finding that unrolling the loop by hand eliminates the slow variants of program execution.
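
To make "unrolling by hand" concrete, here is a sketch of the two shapes for a three-component add. The real meta.aadd from the original post is not reproduced here; add_loop and add_unrolled are hypothetical stand-ins that work on plain tables.

```lua
-- Looped version: the short inner for-loop is a candidate for its own trace.
local function add_loop(a, b)
  for i = 1, 3 do
    a[i] = a[i] + b[i]
  end
  return a
end

-- Hand-unrolled version: no inner loop, so the body can be recorded
-- as part of whatever outer trace calls it.
local function add_unrolled(a, b)
  a[1] = a[1] + b[1]
  a[2] = a[2] + b[2]
  a[3] = a[3] + b[3]
  return a
end

local r1 = add_loop({ 1, 2, 3 }, { 10, 20, 30 })
local r2 = add_unrolled({ 1, 2, 3 }, { 10, 20, 30 })
print(r1[1], r1[2], r1[3])
print(r2[1], r2[2], r2[3])
```

Both functions compute the same result; the only difference is whether the JIT sees a nested loop.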

Along with the info mentioned in the Cloudflare article, a lot of LuaJIT's optimizations happen on a probabilistic basis based on runtime analysis of the traces, which is probably one of (at least two) reasons why the performance of the original code is nondeterministic. It's possible that you could see the function get traced properly if it were used outside of a sterile test program. The joys of JIT compilation.

My understanding of LuaJIT isn't the tightest, so take what I said here with a grain of salt. Optimizing LuaJIT code is much more of an art than a science IMO, so I don't think you'll ever be able to find universal solutions to problems like these, except staying vigilant.
pgimeno
Party member
Posts: 3550
Joined: Sun Oct 18, 2015 2:58 pm

Post by pgimeno »

That was a very interesting read, thank you SyntheticDreamCorp.
"loops are compiled non-deterministically based on the particular address in memory they happen to have been loaded at."
That together with ASLR explains the variable results.
raidho36
Party member
Posts: 2063
Joined: Mon Jun 17, 2013 12:00 pm

Post by raidho36 »

In the same article they say that the patch that fixes it doesn't actually improve stability: it makes some of it better and some of it even worse. That kind of makes sense considering that this is a trace exit problem and not a bad hot-loop detection problem, but I might be misreading this.