ljx

FORK: LuaJIT with native 5.2 and 5.3 support
git clone https://git.neptards.moe/neptards/ljx.git

ext_profiler.html (13135B)


      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
      2 <html>
      3 <head>
      4 <title>Profiler</title>
      5 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      6 <meta name="Author" content="Mike Pall">
      7 <meta name="Copyright" content="Copyright (C) 2005-2016, Mike Pall">
      8 <meta name="Language" content="en">
      9 <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
     10 <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
     11 </head>
     12 <body>
     13 <div id="site">
     14 <a href="http://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
     15 </div>
     16 <div id="head">
     17 <h1>Profiler</h1>
     18 </div>
     19 <div id="nav">
     20 <ul><li>
     21 <a href="luajit.html">LuaJIT</a>
     22 <ul><li>
     23 <a href="http://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
     24 </li><li>
     25 <a href="install.html">Installation</a>
     26 </li><li>
     27 <a href="running.html">Running</a>
     28 </li></ul>
     29 </li><li>
     30 <a href="extensions.html">Extensions</a>
     31 <ul><li>
     32 <a href="ext_ffi.html">FFI Library</a>
     33 <ul><li>
     34 <a href="ext_ffi_tutorial.html">FFI Tutorial</a>
     35 </li><li>
     36 <a href="ext_ffi_api.html">ffi.* API</a>
     37 </li><li>
     38 <a href="ext_ffi_semantics.html">FFI Semantics</a>
     39 </li></ul>
     40 </li><li>
     41 <a href="ext_jit.html">jit.* Library</a>
     42 </li><li>
     43 <a href="ext_c_api.html">Lua/C API</a>
     44 </li><li>
     45 <a class="current" href="ext_profiler.html">Profiler</a>
     46 </li></ul>
     47 </li><li>
     48 <a href="status.html">Status</a>
     49 <ul><li>
     50 <a href="changes.html">Changes</a>
     51 </li></ul>
     52 </li><li>
     53 <a href="faq.html">FAQ</a>
     54 </li><li>
     55 <a href="http://luajit.org/performance.html">Performance <span class="ext">&raquo;</span></a>
     56 </li><li>
     57 <a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
     58 </li><li>
     59 <a href="http://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
     60 </li></ul>
     61 </div>
     62 <div id="main">
     63 <p>
     64 LuaJIT has an integrated statistical profiler with very low overhead. It
     65 allows sampling the currently executing stack and other parameters at
     66 regular intervals.
     67 </p>
     68 <p>
     69 The integrated profiler can be accessed from three levels:
     70 </p>
     71 <ul>
     72 <li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
     73 <a href="#j_p"><tt>-jp</tt></a> command line option.</li>
     74 <li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
     75 <li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
     76 </ul>
     77 
     78 <h2 id="hl_profiler">High-Level Profiler</h2>
     79 <p>
     80 The bundled high-level profiler offers basic profiling functionality. It
     81 generates simple textual summaries or source code annotations. It can be
     82 accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
     83 or from Lua code by loading the underlying <tt>jit.p</tt> module.
     84 </p>
     85 <p>
     86 To cut to the chase &mdash; run this to get a CPU usage profile by
     87 function name:
     88 </p>
     89 <pre class="code">
     90 luajit -jp myapp.lua
     91 </pre>
     92 <p>
     93 It's <em>not</em> a stated goal of the bundled profiler to add every
     94 possible option or to cater for special profiling needs. The low-level
     95 profiler APIs are documented below. They may be used by third-party
     96 authors to implement advanced functionality, e.g. IDE integration or
     97 graphical profilers.
     98 </p>
     99 <p>
    100 Note: Sampling works for both interpreted and JIT-compiled code. The
    101 results for JIT-compiled code may sometimes be surprising. LuaJIT
    102 heavily optimizes and inlines Lua code &mdash; there's no simple
    103 one-to-one correspondence between source code lines and the sampled
    104 machine code.
    105 </p>
    106 
    107 <h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
    108 <p>
    109 The <tt>-jp</tt> command line option starts the high-level profiler.
    110 When the application run by the command line terminates, the profiler
    111 stops and writes the results to <tt>stdout</tt> or to the specified
    112 <tt>output</tt> file.
    113 </p>
    114 <p>
    115 The <tt>options</tt> argument specifies how the profiling is to be
    116 performed:
    117 </p>
    118 <ul>
    119 <li><tt>f</tt> &mdash; Stack dump: function name, otherwise module:line.
    120 This is the default mode.</li>
    121 <li><tt>F</tt> &mdash; Stack dump: ditto, but dump module:name.</li>
    122 <li><tt>l</tt> &mdash; Stack dump: module:line.</li>
    123 <li><tt>&lt;number&gt;</tt> &mdash; Stack dump depth (callee &larr;
    124 caller). Default: 1.</li>
    125 <li><tt>-&lt;number&gt;</tt> &mdash; Inverse stack dump depth (caller
    126 &rarr; callee).</li>
    127 <li><tt>s</tt> &mdash; Split stack dump after first stack level. Implies
    128 depth&nbsp;&ge;&nbsp;2 or depth&nbsp;&le;&nbsp;-2.</li>
    129 <li><tt>p</tt> &mdash; Show full path for module names.</li>
    130 <li><tt>v</tt> &mdash; Show VM states.</li>
    131 <li><tt>z</tt> &mdash; Show <a href="#jit_zone">zones</a>.</li>
    132 <li><tt>r</tt> &mdash; Show raw sample counts. Default: show percentages.</li>
    133 <li><tt>a</tt> &mdash; Annotate excerpts from source code files.</li>
    134 <li><tt>A</tt> &mdash; Annotate complete source code files.</li>
    135 <li><tt>G</tt> &mdash; Produce raw output suitable for graphical tools.</li>
    136 <li><tt>m&lt;number&gt;</tt> &mdash; Minimum sample percentage to be shown.
    137 Default: 3%.</li>
    138 <li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds.
    139 Default: 10ms.<br>
    140 Note: The actual sampling precision is OS-dependent.</li>
    141 </ul>
    142 <p>
    143 The default output for <tt>-jp</tt> is a list of the most CPU consuming
    144 spots in the application. Increasing the stack dump depth with (say)
    145 <tt>-jp=2</tt> may help to point out the main callers or callees of
    146 hotspots. But sample aggregation is still flat per unique stack dump.
    147 </p>
    148 <p>
    149 To get a two-level view (split view) of callers/callees, use
    150 <tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
    151 level are relative to the first level.
    152 </p>
    153 <p>
    154 To see how much time is spent in each line relative to a function, use
    155 <tt>-jp=fl</tt>.
    156 </p>
    157 <p>
    158 To see how much time is spent in different VM states or
    159 <a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
    160 </p>
    161 <p>
    162 Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
    163 views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
    164 spent in a VM state or zone vs. hotspots. This can be used to answer
    165 questions like "Which time consuming functions are only interpreted?" or
    166 "What's the garbage collector overhead for a specific function?".
    167 </p>
    168 <p>
    169 Multiple options can be combined &mdash; but not all combinations make
    170 sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels
    171 deep at 4ms intervals and shows a split view of the CPU-consuming
    172 functions and their callers with a 1% threshold.
    173 </p>
    174 <p>
    175 Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
    176 always flat and at the line level. Obviously, the source code files need
    177 to be readable by the profiler script.
    178 </p>
    179 <p>
    180 The high-level profiler can also be started and stopped from Lua code with:
    181 </p>
    182 <pre class="code">
    183 require("jit.p").start(options, output)
    184 ...
    185 require("jit.p").stop()
    186 </pre>
    187 
    188 <h3 id="jit_zone"><tt>jit.zone</tt> &mdash; Zones</h3>
    189 <p>
    190 Zones can be used to provide information about different parts of an
    191 application to the high-level profiler. E.g. a game could make use of an
    192 <tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
    193 organized as a stack.
    194 </p>
    195 <p>
    196 The <tt>jit.zone</tt> module needs to be loaded explicitly:
    197 </p>
    198 <pre class="code">
    199 local zone = require("jit.zone")
    200 </pre>
    201 <ul>
    202 <li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
    203 <li><tt>zone()</tt> pops the current zone from the zone stack and
    204 returns its name.</li>
    205 <li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
    206 <li><tt>zone:flush()</tt> flushes the zone stack.</li>
    207 </ul>
    208 <p>
    209 To show the time spent in each zone use <tt>-jp=z</tt>. To show the time
    210 spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
    211 </p>
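<p>
As a minimal sketch of the push/pop protocol listed above, assuming a
LuaJIT runtime with the bundled <tt>jit.zone</tt> module on the package
path: <tt>update_ai</tt>, <tt>update_physics</tt> and <tt>frame</tt> are
hypothetical placeholders, not part of LuaJIT.
</p>

```lua
local zone = require("jit.zone")

-- Hypothetical workloads standing in for real subsystems.
local function update_ai()
  local n = 0
  for i = 1, 1e5 do n = n + i % 7 end
  return n
end

local function update_physics()
  local x = 0
  for i = 1, 1e5 do x = x + i * 0.5 end
  return x
end

local function frame()
  zone("FRAME")               -- push the outer zone
  zone("AI")                  -- nested zone: the stack is now FRAME, AI
  update_ai()
  zone()                      -- pops "AI"
  zone("PHYS")
  update_physics()
  local popped = zone()       -- pops and returns "PHYS"
  local current = zone:get()  -- "FRAME" is the current zone again
  zone()                      -- pops "FRAME", leaving the stack empty
  return popped, current
end

local popped, current = frame()
```

<p>
Every push needs a matching pop. If a zoned section can raise an error,
calling <tt>zone:flush()</tt> in the error handler is one way to reset
the stack.
</p>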
    212 
    213 <h2 id="ll_lua_api">Low-level Lua API</h2>
    214 <p>
    215 The <tt>jit.profile</tt> module gives access to the low-level API of the
    216 profiler from Lua code. This module needs to be loaded explicitly:
        </p>
    217 <pre class="code">
    218 local profile = require("jit.profile")
    219 </pre>
    220 <p>
    221 This module can be used to implement your own higher-level profiler.
    222 A typical profiling run starts the profiler, captures stack dumps in
    223 the profiler callback, adds them to a hash table to aggregate the number
    224 of samples, stops the profiler and then analyzes all of the captured
    225 stack dumps. Other parameters can be sampled in the profiler callback,
    226 too. But it's important not to spend too much time in the callback,
    227 since this may skew the statistics.
    228 </p>
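<p>
The profiling run described above could be sketched roughly as follows.
This assumes a LuaJIT runtime that provides <tt>jit.profile</tt>; the
names <tt>counts</tt>, <tt>on_sample</tt> and <tt>busy</tt> are
hypothetical, and the dump format and depth are arbitrary choices for
illustration.
</p>

```lua
local profile = require("jit.profile")

local counts = {}  -- stack dump string -> accumulated sample count

-- Profiler callback: keep it cheap, one stack dump and one table update.
local function on_sample(thread, samples, vmstate)
  local key = profile.dumpstack(thread, "fZ;", 3)
  counts[key] = (counts[key] or 0) + samples
end

-- Hypothetical workload to be profiled.
local function busy()
  local s = 0
  for i = 1, 2e7 do s = s + math.sin(i) end
  return s
end

profile.start("fi5", on_sample)  -- function-level precision, 5ms interval
busy()
profile.stop()

-- Analyze after stopping: sort the unique stack dumps by sample count.
local rows, total = {}, 0
for stack, n in pairs(counts) do
  rows[#rows + 1] = { stack = stack, n = n }
  total = total + n
end
table.sort(rows, function(a, b) return a.n > b.n end)
for _, row in ipairs(rows) do
  print(string.format("%5.1f%%  %s", 100 * row.n / total, row.stack))
end
```

<p>
All sorting and formatting happens after the profiler has stopped, so
the callback itself only does a stack dump and a table update.
</p>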
    229 
    230 <h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
    231 &mdash; Start profiler</h3>
    232 <p>
    233 This function starts the profiler. The <tt>mode</tt> argument is a
    234 string holding options:
    235 </p>
    236 <ul>
    237 <li><tt>f</tt> &mdash; Profile with precision down to the function level.</li>
    238 <li><tt>l</tt> &mdash; Profile with precision down to the line level.</li>
    239 <li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds (default
    240 10ms).<br>
    241 Note: The actual sampling precision is OS-dependent.
    242 </li>
    243 </ul>
    244 <p>
    245 The <tt>cb</tt> argument is a callback function which is called with
    246 three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
    247 called on a separate coroutine; the <tt>thread</tt> argument is the
    248 state that holds the stack to sample for profiling. Note: do
    249 <em>not</em> modify the stack of that state or call functions on it.
    250 </p>
    251 <p>
    252 <tt>samples</tt> gives the number of accumulated samples since the last
    253 callback (usually 1).
    254 </p>
    255 <p>
    256 <tt>vmstate</tt> holds the VM state at the time the profiling timer
    257 triggered. This may or may not correspond to the state of the VM when
    258 the profiling callback is called. The state is either <tt>'N'</tt>
    259 native (compiled) code, <tt>'I'</tt> interpreted code, <tt>'C'</tt>
    260 C&nbsp;code, <tt>'G'</tt> the garbage collector, or <tt>'J'</tt> the JIT
    261 compiler.
    262 </p>
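<p>
As a small illustration of the <tt>samples</tt> and <tt>vmstate</tt>
arguments, the following sketch tallies samples per VM state. It assumes
a LuaJIT runtime that provides <tt>jit.profile</tt>; the
<tt>names</tt> and <tt>by_state</tt> tables and the
<tt>churn</tt> workload are hypothetical.
</p>

```lua
local profile = require("jit.profile")

-- Readable labels for the single-character VM states described above.
local names = {
  N = "native (compiled) code",
  I = "interpreted code",
  C = "C code",
  G = "garbage collector",
  J = "JIT compiler",
}

local by_state = {}  -- VM state character -> accumulated sample count

profile.start("f", function(thread, samples, vmstate)
  -- Weight by 'samples', since more than one sample may have accumulated.
  by_state[vmstate] = (by_state[vmstate] or 0) + samples
end)

-- Hypothetical workload that allocates, to give the collector some work.
local function churn()
  local t
  for i = 1, 2e6 do t = { i, tostring(i) } end
  return t
end
churn()

profile.stop()

for state, n in pairs(by_state) do
  print(string.format("%-24s %d", names[state] or state, n))
end
```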
    263 
    264 <h3 id="profile_stop"><tt>profile.stop()</tt>
    265 &mdash; Stop profiler</h3>
    266 <p>
    267 This function stops the profiler.
    268 </p>
    269 
    270 <h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
    271 &mdash; Dump stack </h3>
    272 <p>
    273 This function allows taking stack dumps in an efficient manner. It
    274 returns a string with a stack dump for the <tt>thread</tt> (coroutine),
    275 formatted according to the <tt>fmt</tt> argument:
    276 </p>
    277 <ul>
    278 <li><tt>p</tt> &mdash; Preserve the full path for module names. Otherwise
    279 only the file name is used.</li>
    280 <li><tt>f</tt> &mdash; Dump the function name if it can be derived. Otherwise
    281 use module:line.</li>
    282 <li><tt>F</tt> &mdash; Ditto, but dump module:name.</li>
    283 <li><tt>l</tt> &mdash; Dump module:line.</li>
    284 <li><tt>Z</tt> &mdash; Zap the following characters for the last dumped
    285 frame.</li>
    286 <li>All other characters are added verbatim to the output string.</li>
    287 </ul>
    288 <p>
    289 The <tt>depth</tt> argument gives the number of frames to dump, starting
    290 at the topmost frame of the thread. A negative number dumps the frames in
    291 inverse order.
    292 </p>
    293 <p>
    294 The first example prints a list of the current module names and line
    295 numbers of up to 10 frames in separate lines. The second example prints
    296 semicolon-separated function names for all frames (up to 100) in inverse
    297 order:
    298 </p>
    299 <pre class="code">
    300 print(profile.dumpstack(thread, "l\n", 10))
    301 print(profile.dumpstack(thread, "fZ;", -100))
    302 </pre>
    303 
    304 <h2 id="ll_c_api">Low-level C API</h2>
    305 <p>
    306 The profiler can be controlled directly from C&nbsp;code, e.g. for
    307 use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
    308 <a href="ext_c_api.html">Lua/C API</a> extensions).
    309 </p>
    310 
    311 <h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
    312 &mdash; Start profiler</h3>
    313 <p>
    314 This function starts the profiler. <a href="#profile_start">See
    315 above</a> for a description of the <tt>mode</tt> argument.
    316 </p>
    317 <p>
    318 The <tt>cb</tt> argument is a callback function with the following
    319 declaration:
    320 </p>
    321 <pre class="code">
    322 typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
    323                                         int samples, int vmstate);
    324 </pre>
    325 <p>
    326 <tt>data</tt> is available for use by the callback. <tt>L</tt> is the
    327 state that holds the stack to sample for profiling. Note: do
    328 <em>not</em> modify this stack or call functions on this stack &mdash;
    329 use a separate coroutine for this purpose. <a href="#profile_start">See
    330 above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
    331 </p>
    332 
    333 <h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
    334 &mdash; Stop profiler</h3>
    335 <p>
    336 This function stops the profiler.
    337 </p>
    338 
    339 <h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
    340 &mdash; Dump stack </h3>
    341 <p>
    342 This function allows taking stack dumps in an efficient manner.
    343 <a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
    344 and <tt>depth</tt>.
    345 </p>
    346 <p>
    347 This function returns a <tt>const&nbsp;char&nbsp;*</tt> pointing to a
    348 private string buffer of the profiler. The <tt>int&nbsp;*len</tt>
    349 argument returns the length of the output string. The buffer is
    350 overwritten on the next call and deallocated when the profiler stops.
    351 You either need to consume the content immediately or copy it for later
    352 use.
    353 </p>
    354 <br class="flush">
    355 </div>
    356 <div id="foot">
    357 <hr class="hide">
    358 Copyright &copy; 2005-2016 Mike Pall
    359 <span class="noprint">
    360 &middot;
    361 <a href="contact.html">Contact</a>
    362 </span>
    363 </div>
    364 </body>
    365 </html>