LJClang -- A LuaJIT-based interface to libclang
===============================================
Philipp Kutin
:max-width: 56em
Introduction
------------
:LuaJIT: http://luajit.org/
:libclang: http://clang.llvm.org/doxygen/group__CINDEX.html
:luaclang-parser: https://github.com/mkottman/luaclang-parser
LJClang is an interface to {libclang}[libclang] for {LuaJIT}[LuaJIT], modeled
after and mostly API-compatible with {luaclang-parser}[luaclang-parser] by
Michal Kottman.
Requirements
------------
:LJDownload: http://luajit.org/download.html
* {LJDownload}[LuaJIT 2.0] (latest Git HEAD of the master branch recommended)
* LLVM/Clang -- read the http://clang.llvm.org/get_started.html[getting
started] guide to find out how to obtain Clang from source. `libclang` is
built and installed along with the Clang compiler.
Building and usage
------------------
:Clang-Win32: http://www.ishani.org/web/articles/code/clang-win32/
Most of LJClang is written in Lua (extensively using LuaJIT's FFI), but due
to currently existing limitations, a support C library has to be built.
In the provided `Makefile`, adjust the libclang include path, and issue `make`
to build `libljclang_support.so`.
NOTE: LJClang has been tested on Ubuntu Linux and Windows (using
{Clang-Win32}[Clang-Win32]), but only minor modifications to the build process
should be necessary to get it working with other OSes or configurations.
From here on, LJClang can be used with LuaJIT by issuing a `require` for
`"ljclang"`. One likely wants to use LJClang from its development directory
without installing it to a system-wide path. Because it expects to find
`libljclang_support.so` and several supporting Lua files, one approach is to
wrap client programs into scripts starting LuaJIT with an environment
containing appropriate `LD_LIBRARY_PATH` and `LUA_PATH` entries. For example,
given the following function in `.bashrc`,
----------
# "LuaJIT with added path of the script directory"
ljwp ()
{
local scriptdir=$(cd "$(dirname "$1")" && pwd)
LUA_PATH=";;$scriptdir/?.lua" LD_LIBRARY_PATH="$scriptdir" luajit "$@"
}
----------
and assuming that LJClang resides in `~/dl/ljclang`, the `extractdecls.lua`
program described below could be run from anywhere like this:
----------
$~/some/other/dir: ljwp ~/dl/ljclang/extractdecls.lua [args...]
----------
Overview
--------
LJClang provides a cursor-based, callback-driven API to the abstract syntax
tree (AST) of C/C++ source files. These are the main classes:
* `Index` -- represents a set of translation units that could be linked together
* `TranslationUnit` -- a source file together with everything included by it
either directly or transitively
* `Cursor` -- an element in the AST in a translation unit such as a `typedef`
declaration or a statement
* `Type` -- the type of an element (for example, that of a variable, structure
member, or a function's input argument or return value)
To make something interesting happen, you usually create a single `Index`
object, parse into it one or many translation units, and define a callback
function to be invoked on each visit of a `Cursor` by libclang.
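
The workflow just described can be sketched as follows. This is a minimal
illustration under stated assumptions, not a canonical program: the helper name
`count_cursor_kinds` and the file name `example.h` are hypothetical, and the
check for a *nil* translation unit assumes that a failed parse returns *nil*,
which this document does not state.

----------
-- Count how many cursors of each kind occur in one translation unit.
-- `cl` is the module returned by require("ljclang").
local function count_cursor_kinds(cl, sourceFile, clangArgs)
    local index = cl.createIndex()
    local tu = index:parse(sourceFile, clangArgs)
    -- Assumption: a failed parse yields nil.
    assert(tu, "failed to parse " .. sourceFile)

    local counts = {}
    local visitor = cl.regCursorVisitor(function(cursor, parent)
        local kind = cursor:kind()
        counts[kind] = (counts[kind] or 0) + 1
        -- Descend into the children of this cursor as well.
        return cl.ChildVisitResult.Recurse
    end)

    tu:cursor():children(visitor)
    return counts
end

-- Usage (needs LJClang and a real header; "example.h" is a placeholder):
--   local cl = require("ljclang")
--   for kind, n in pairs(count_cursor_kinds(cl, "example.h", {})) do
--       print(kind, n)
--   end
----------

Only strings and numbers leave the callback here, so no cursor has to outlive
the invocation in which it was received.
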
Example program
---------------
:CXCursorKind: http://clang.llvm.org/doxygen/group__CINDEX.html#gaaccc432245b4cd9f2d470913f9ef0013
The `extractdecls.lua` script accompanying LJClang can be used to extract
various kinds of C declarations from (usually) headers and print them in
various forms usable as FFI C declarations or descriptive tables with LuaJIT.
----------
Usage: ./extractdecls.lua [our_options...] <file.h> [clang_options...]
-p <filterPattern>
-x <excludePattern1> [-x <excludePattern2>] ...
-s <stripPattern>
-1 <string to print before everything>
-2 <string to print after everything>
-C: print lines like
static const int membname = 123; (enums/macros only)
-R: reverse mapping, only if one-to-one. Print lines like
[123] = "membname"; (enums/macros only)
-f <formatFunc>: user-provided body for formatting function (enums/macros only)
Accepts args `k', `v'; `f' is string.format. Must return a formatted line.
Example: "return f('%s = %s%s,', k, k:find('KEY_') and '65536+' or '', v)"
Incompatible with -C or -R.
-Q: be quiet
-w: extract what? Can be
EnumConstantDecl (default), TypedefDecl, FunctionDecl, MacroDefinition
----------
In fact, the file `ljclang_cursor_kind.lua` is generated by this program and is
used by LJClang to map values of the enumeration {CXCursorKind}[`enum
CXCursorKind`] to their names. The `bootstrap` target in the `Makefile`
extracts the relevant information using these options:
----------
-R -p '^CXCursor_' -x '_First' -x '_Last' -x '_GCCAsmStmt' -x '_MacroInstantiation' -s '^CXCursor_' \
-1 'return { name={' -2 '}, }' -Q
----------
Thus, the enum constant names are filtered to begin with ``++CXCursor_++''
and all ``secondary'' names aliasing the one considered the main one are
rejected. (For example, `CXCursor_AsmStmt` and `CXCursor_GCCAsmStmt` have the
same value.) Finally, the prefix is stripped (`-s`) to yield lines like
----------
[215] = "AsmStmt";
----------
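
To illustrate how such output can be consumed: with the `-1` and `-2` strings
shown above, the printed lines are wrapped into a Lua chunk that returns a
table. The sketch below loads a one-line stand-in for that chunk from a string
(the real file has one line per cursor kind) and looks up a name by its numeric
value. The variable names are hypothetical.

----------
-- Stand-in for the generated file: prefix (-1), one -R line, suffix (-2).
local chunk = 'return { name={' .. '\n[215] = "AsmStmt";\n' .. '}, }'

-- loadstring() exists in LuaJIT and Lua 5.1; later versions use load().
local kinds = assert((loadstring or load)(chunk))()

print(kinds.name[215])
----------
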
Reference
---------
:clang_createIndex: http://clang.llvm.org/doxygen/group__CINDEX.html#func-members
:CXChildVisitResult: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__TRAVERSAL.html#ga99a9058656e696b622fbefaf5207d715
:clang_parseTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#ga2baf83f8c3299788234c8bce55e4472e
:clang_createTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#gaa2e74f6e28c438692fd4f5e3d3abda97
The module returned by `require("ljclang")` contains the following:
`createIndex([excludePch : boolean [, showDiagnostics : boolean]])` -> `Index`::
Binding for {clang_createIndex}[clang_createIndex]. Will create an `Index` into
which you can parse ++TranslationUnit++s. Both input arguments are optional and
default to *false*.
+
NOTE: Loading pre-compiled translation units is not implemented.
[[ChildVisitResult]]
`ChildVisitResult`::
An object containing a mapping of names to values permissible as values
{CXChildVisitResult}[returned] from cursor visitor callbacks: `Break`,
`Continue`, `Recurse`.
[[regCursorVisitor]]
`regCursorVisitor(visitorfunc)` -> `vf_handle`::
Registers a child visitor callback function `visitorfunc` with LJClang,
returning a handle which can be passed to `Cursor:children()`. The callback
function receives two input arguments, `(cursor, parent)` -- with the cursors
of the currently visited entity as well as its parent, and must return a value
from the `ChildVisitResult` enumeration to indicate whether or how libclang
should carry on AST visiting.
+
CAUTION: The `cursor` passed to the visitor callback is only valid during one
particular callback invocation. If it is to be used after the function has
returned, it *must* be copied using the `Cursor` constructor mentioned below.
`Cursor([cur : Cursor])` -> `Cursor`::
A constructor to create a permanent cursor from that received by the visitor
callback.
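
The caution above can be illustrated with a small sketch that keeps cursors
beyond the lifetime of the callback. The helper name `collect_function_decls`
is hypothetical; it uses only the module members described in this section.

----------
-- Collect permanent copies of the function declarations directly below `root`.
-- `cl` is the module returned by require("ljclang").
local function collect_function_decls(cl, root)
    local decls = {}
    local visitor = cl.regCursorVisitor(function(cursor, parent)
        if cursor:haskind("FunctionDecl") then
            -- `cursor` is only valid inside this call, so store a copy.
            decls[#decls + 1] = cl.Cursor(cursor)
        end
        -- Visit the next sibling without descending into this cursor.
        return cl.ChildVisitResult.Continue
    end)
    root:children(visitor)
    return decls
end

-- Usage, with `tu` being a TranslationUnit obtained from Index:parse():
--   for _, decl in ipairs(collect_function_decls(cl, tu:cursor())) do
--       print(decl:name())
--   end
----------
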
`Index`
-------
:TUFlags: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#enum-members
`Index:parse(sourceFile : string, args : table [, opts : table])` -> `TranslationUnit`::
Binding for {clang_parseTranslationUnit}[clang_parseTranslationUnit]. This will
parse a given source file `sourceFile` with the command line arguments `args`,
which would be given to the compiler for compilation, containing e.g. include
paths or defines.
If `sourceFile` is the empty string, the source file is expected to be named in
`args`.
+
The last optional argument `opts` is expected to be a sequence containing
{TUFlags}[`CXTranslationUnit_*`] enum names without the `"CXTranslationUnit_"`
prefix, for example `{ "DetailedPreprocessingRecord" }`.
+
NOTE: Both `args` and `opts` (if given) must not contain an element at index 0.
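
As a sketch of how `args` and `opts` might be assembled, the following
hypothetical helper `parse_with_macros` turns a list of include directories
into `-I` arguments and requests a detailed preprocessing record, which makes
libclang expose preprocessing entities such as macro definitions in the AST.

----------
-- Parse `sourceFile` with the given include directories.
-- `index` is an Index obtained from createIndex().
local function parse_with_macros(index, sourceFile, includeDirs)
    -- Build a 1-based sequence of compiler arguments (nothing at index 0).
    local args = {}
    for _, dir in ipairs(includeDirs) do
        args[#args + 1] = "-I" .. dir
    end
    -- Flag names are given without the "CXTranslationUnit_" prefix.
    local opts = { "DetailedPreprocessingRecord" }
    return index:parse(sourceFile, args, opts)
end

-- Usage (file and directory names are placeholders):
--   local cl = require("ljclang")
--   local tu = parse_with_macros(cl.createIndex(), "example.h", { "/usr/include" })
----------
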
//////////
`Index:load(astFile : string)` -> `TranslationUnit`::
Binding for
{clang_createTranslationUnit}[clang_createTranslationUnit]. This will load
the translation unit from an AST file which was constructed using `clang
-emit-ast`. Useful when repeatedly processing large sets of files (like
frameworks).
//////////
`TranslationUnit`
-----------------
:clang_getTranslationUnitCursor: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gaec6e69127920785e74e4a517423f4391
:clang_getFile: http://clang.llvm.org/doxygen/group__CINDEX__FILES.html#gaa0554e2ea48ecd217a29314d3cbd2085
:clang_getDiagnostic: http://clang.llvm.org/doxygen/group__CINDEX__DIAG.html#ga3f54a79e820c2ac9388611e98029afe5
:code_completion_API: http://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html
:clang_visitChildren: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__TRAVERSAL.html#ga5d0a813d937e1a7dcc35f206ad1f7a91
`TranslationUnit:cursor()` -> `Cursor`::
Binding for
{clang_getTranslationUnitCursor}[clang_getTranslationUnitCursor]. Returns
the `Cursor` representing a given translation unit, which provides access
to information about e.g. functions and types defined in a given file.
//////////
`TranslationUnit:file(fileName : string)` -> `string, number`::
//////////
`TranslationUnit:file(fileName : string)` -> `string`::
Binding for {clang_getFile}[clang_getFile]. Returns the absolute file path
of `fileName`.
+
NOTE: Unlike in luaclang-parser, the last modification date is currently not
returned.
//////////
and a `time_t` last modification time
//////////
`TranslationUnit:diagnostics()` -> `{ Diagnostic* }`::
Binding for {clang_getDiagnostic}[clang_getDiagnostic]. Returns a table
array of `Diagnostic`, which represent warnings and errors. Each diagnostic
is a table indexable by these keys: `text` -- the diagnostic message, and
`category` -- a diagnostic category (also a string).
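
A minimal sketch of reading the diagnostics table described above; the helper
name `format_diagnostics` and the output layout are hypothetical.

----------
-- Turn the diagnostics of a translation unit into printable lines.
local function format_diagnostics(tu)
    local lines = {}
    for i, diag in ipairs(tu:diagnostics()) do
        -- Each diagnostic has a `category` and a `text`, both strings.
        lines[i] = string.format("[%s] %s", diag.category, diag.text)
    end
    return lines
end

-- Usage, with `tu` being a TranslationUnit obtained from Index:parse():
--   for _, line in ipairs(format_diagnostics(tu)) do print(line) end
----------
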
//////////
`TranslationUnit:codeCompleteAt(file : string, line : number, column : number)` -> `{ Completion* }, { Diagnostics* }`::
Binding for {code_completion_API}[code completion API]. Returns the
available code completion options at a given location using prior
content. Each `Completion` is a table consisting of several chunks, each of
which has a text and a {chunk kind}[chunk kind] without the
`CXCompletionChunk_` prefix. If there are any annotations, the
`annotations` key is a table of strings:
completion = {
priority = number, priority of given completion
chunks = {
kind = string, chunk kind
text = string, chunk text
},
[annotations = { string* }]
}
//////////
`Cursor`
--------
:clang_getCursorSemanticParent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gabc327b200d46781cf30cb84d4af3c877
:clang_getCursorLexicalParent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gace7a423874d72b3fdc71d6b0f31830dd
:clang_getCursorSpelling: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gaad1c9b2a1c5ef96cebdbc62f1671c763
:clang_getCursorDisplayName: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gac3eba3224d109a956f9ef96fd4fe5c83
:cursor_kind: http://clang.llvm.org/doxygen/group__CINDEX.html#gaaccc432245b4cd9f2d470913f9ef0013
:clang_Cursor_getArgument: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga673c5529d33eedd0b78aca5ac6fc1d7c
:clang_getCursorResultType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6995a2d6352e7136868574b299005a63
:clang_getCursorExtent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__SOURCE.html#ga79f6544534ab73c78a8494c4c0bc2840
:clang_getCursorReferenced: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gabf059155921552e19fc2abed5b4ff73a
:clang_getCursorDefinition: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gafcfbec461e561bf13f1e8540bbbd655b
:clang_getSpellingLocation: http://clang.llvm.org/doxygen/group__CINDEX__LOCATIONS.html#ga01f1a342f7807ea742aedd2c61c46fa0
:clang_getPresumedLocation: http://clang.llvm.org/doxygen/group__CINDEX__LOCATIONS.html#ga03508d9c944feeb3877515a1b08d36f9
:clang_getEnumConstantDeclValue: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6b8585818420e7512feb4c9d209b4f4d
:clang_getEnumConstantUnsignedDeclValue: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaf7cbd4f2d371dd93e8bc997c951a1aef
:clang_getTypedefDeclUnderlyingType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga8de899fc18dc859b6fe3b97309f4fd52
:clang_Cursor_getTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#ga529f1504710a41ce358d4e8c3161848d
:clang_isCursorDefinition: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#ga6ad05634a73e693217088eaa693f0010
You can compare whether two ++Cursor++s represent the same element using the
standard `==` Lua operator. Comparisons with any other type yield *false*.
`Cursor:children()` -> `{ Cursor* }`::
`Cursor:children(vf_handle)` -> `boolean`::
Binding over {clang_visitChildren}[clang_visitChildren]. This is the main
function for AST traversal. The first form collects the direct descendants of
the given cursor in a table, returning an empty one if none are found. The
second, preferred form accepts a handle of a visitor function previously
registered with <<regCursorVisitor,`regCursorVisitor()`>> instead. Here, the
returned value indicates whether the traversal was aborted prematurely due to
the callback returning +<<ChildVisitResult,ChildVisitResult>>.Break+.
+
NOTE: Currently, the recommended procedure is to encapsulate the logic of one
particular ``analysis'' into one visitor callback, which may run different
portions of code e.g. conditional on the cursor's kind. (Instead of calling
`Cursor:children(visitor_function_handle)` with a different visitor function
while another invocation of it is active.)
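
For small inputs, the first form of `Cursor:children()` is convenient because
it returns ordinary tables. The sketch below walks the tree recursively and
records one indented line per cursor; the helper name `dump_tree` is
hypothetical.

----------
-- Return a list of lines describing `cursor` and all of its descendants.
local function dump_tree(cursor, depth, out)
    depth = depth or 0
    out = out or {}
    out[#out + 1] = string.rep("  ", depth) .. cursor:kind() .. " " .. cursor:name()
    -- Without an argument, children() returns a (possibly empty) table.
    for _, child in ipairs(cursor:children()) do
        dump_tree(child, depth + 1, out)
    end
    return out
end

-- Usage, with `tu` being a TranslationUnit obtained from Index:parse():
--   print(table.concat(dump_tree(tu:cursor()), "\n"))
----------

This builds one table per visited cursor, so for large translation units the
visitor-based form remains the better fit.
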
//////////
Traverses the direct descendants of a given
cursor and collects them in a table. If no child cursors are found, returns
an empty table.
//////////
`Cursor:parent()` -> `Cursor`::
Binding for
{clang_getCursorSemanticParent}[clang_getCursorSemanticParent]. Returns a
cursor to the semantic parent of a given element. For example, for a method
cursor, returns its class. For a global declaration, returns the
translation unit cursor.
`Cursor:lexicalParent()` -> `Cursor`::
Binding for
{clang_getCursorLexicalParent}[clang_getCursorLexicalParent]. Returns a
cursor to the lexical parent of a given element.
`Cursor:name()` -> `string`::
Binding over {clang_getCursorSpelling}[clang_getCursorSpelling]. Returns
the name of the entity referenced by cursor. `Cursor` also has `__tostring`
set to this method.
`Cursor:displayName()` -> `string`::
Binding over
{clang_getCursorDisplayName}[clang_getCursorDisplayName]. Returns the
display name of the entity, which for example is a function signature.
`Cursor:kind()` -> `string`::
Returns the {cursor_kind}[cursor kind] without the `CXCursor_` prefix,
e.g. `"FunctionDecl"`.
`Cursor:haskind(kind : string)` -> `boolean`::
Checks whether the cursor has the kind given by `kind`, which must be the name
of an {CXCursorKind}[`enum CXCursorKind`] constant without the `CXCursor_`
prefix. For instance, `if (cur:haskind("TypedefDecl")) then --[[ do
something ]] end`.
//////////
kindnum
//////////
`Cursor:arguments()` -> `{ Cursor* }`::
Binding of {clang_Cursor_getArgument}[clang_Cursor_getArgument]. Returns a
table array of ++Cursor++s representing arguments of a function or a
method. Returns an empty table if a cursor is not a method or function.
`Cursor:translationUnit()` -> `TranslationUnit`::
Binding for
{clang_Cursor_getTranslationUnit}[clang_Cursor_getTranslationUnit]. Returns
the translation unit that a cursor originated from.
`Cursor:resultType()` -> `Type`::
Binding for {clang_getCursorResultType}[clang_getCursorResultType]. For a
function or a method cursor, returns the return type of the function.
`Cursor:typedefType()` -> `Type`::
If the cursor references a typedef declaration, returns its
{clang_getTypedefDeclUnderlyingType}[underlying type].
//////////
XXX: Make error instead?
Otherwise, returns *nil*.
//////////
`Cursor:type()` -> `Type`::
Returns the `Type` of a given element or *nil* if not available.
`Cursor:location([linesfirst : boolean])` -> `string, number, number, number, number [, number, number]`::
Binding for {clang_getCursorExtent}[clang_getCursorExtent] and
{clang_getSpellingLocation}[clang_getSpellingLocation]. Returns the _file
name_, _starting line_, _starting column_, _ending line_ and _ending
column_ of the given cursor. If the optional argument `linesfirst` is true,
the numbers are ordered like _starting line_, _ending line_, _starting
column_, _ending column_, _starting offset_, _ending offset_ instead. If
`linesfirst` has the string value `'offset'`, only _starting offset_,
_ending offset_ are returned.
`Cursor:presumedLocation([linesfirst : boolean])` -> `string, number, number, number, number`::
Binding for {clang_getCursorExtent}[clang_getCursorExtent] and
{clang_getPresumedLocation}[clang_getPresumedLocation].
//////////
XXX: Better provide an API around CXSourceRange.
This can be used to look up the text a cursor consists of.
//////////
`Cursor:definition()` -> `Cursor`::
Binding for {clang_getCursorDefinition}[clang_getCursorDefinition]. For a
reference or declaration, returns a cursor to the definition of the entity,
otherwise returns *nil*.
`Cursor:referenced()` -> `Cursor`::
Binding for {clang_getCursorReferenced}[clang_getCursorReferenced]. For a
reference type, returns a cursor to the element it references, otherwise
returns *nil*.
`Cursor:access()` -> `string`::
When cursor kind is `"AccessSpecifier"`, returns one of `"private"`,
`"protected"` and `"public"`.
`Cursor:isDefinition()` -> `boolean`::
Binding for {clang_isCursorDefinition}[clang_isCursorDefinition]. Determine
whether the declaration pointed to by this cursor is also a definition of
that entity.
`Cursor:isVirtual()` -> `boolean`::
For a C++ method, returns whether the method is virtual.
`Cursor:isStatic()` -> `boolean`::
For a C++ method, returns whether the method is static.
`Cursor:enumValue([unsigned : boolean])` -> `enum cdata`::
If the cursor represents an enumeration constant (`CXCursor_EnumConstantDecl`),
returns its numeric value as a {clang_getEnumConstantDeclValue}[signed] 64-bit
integer, or a 64-bit {clang_getEnumConstantUnsignedDeclValue}[unsigned]
integer if `unsigned` is true.
+
NOTE: In C99, an enumeration constant must be in the range of values
representable by an `int` (6.7.2.2#2). LJClang does not check for this
constraint.
`Cursor:enumval([unsigned : boolean])` -> `number`::
Returns the cdata obtained from `enumValue()` as a Lua number, converted
using `tonumber()`. Again, no checking of any kind is carried out.
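
As an illustration, the constants of one enumeration can be gathered into a
name-to-value table. The helper name `enum_constants` is hypothetical, and
`enumDecl` is assumed to be a cursor of kind `"EnumDecl"`.

----------
-- Map the names of an enumeration's constants to their values.
local function enum_constants(enumDecl)
    local values = {}
    for _, child in ipairs(enumDecl:children()) do
        if child:haskind("EnumConstantDecl") then
            -- enumval() returns a plain Lua number.
            values[child:name()] = child:enumval()
        end
    end
    return values
end

-- Usage, with `decl` being an "EnumDecl" cursor found during traversal:
--   for name, value in pairs(enum_constants(decl)) do print(name, value) end
----------

Because `enumval()` goes through `tonumber()`, a value that a Lua number cannot
represent exactly would lose precision; `enumValue()` returns the 64-bit cdata
instead.
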
`Type`
------
:clang_getTypeKindSpelling: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6bd7b366d998fc67f4178236398d0666
:clang_getCanonicalType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaa9815d77adc6823c58be0a0e32010f8c
:clang_getPointeeType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaafa3eb34932d8da1358d50ed949ff3ee
:clang_isPODType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga3e7fdbe3d246ed03298bd074c5b3703e
:clang_isConstQualifiedType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga8c3f8029254d5862bcd595d6c8778e5b
:clang_getTypeDeclaration: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga0aad74ea93a2f5dea58fd6fc0db8aad4
:clang_getArrayElementType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga718591f4b07d9d4861557a3ed8b29713
:clang_getArraySize: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga91521260817054f153b5f1295056192d
:CXTypeKind: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaad39de597b13a18882c21860f92b095a
You can compare whether two ++Type++s represent the same type using the standard
`==` Lua operator. Comparisons with any other type yield *false*.
`Type:name()` -> `string`::
Binding of {clang_getTypeKindSpelling}[clang_getTypeKindSpelling]. Returns
one of {CXTypeKind}[`CXTypeKind`] as a string without the `CXType_`
prefix. `Type` also has `__tostring` set to this method.
`Type:canonical()` -> `Type`::
Binding of {clang_getCanonicalType}[clang_getCanonicalType]. Returns the
underlying type with all typedefs removed.
+
NOTE: Unlike luaclang-parser, LJClang does *not* dispatch to
`clang_getPointeeType()` for pointer types.
//////////
XXX: What was the intention of that? Test out stuff...
//////////
`Type:pointee()` -> `Type`::
Binding of {clang_getPointeeType}[clang_getPointeeType]. For a pointer type,
returns the type of the pointee.
`Type:isPod()` -> `boolean`::
Binding of {clang_isPODType}[clang_isPODType]. Returns true if the type is
a ``Plain Old Data'' type.
`Type:isConst()` -> `boolean`::
`Type:isConstQualified()` -> `boolean`::
Binding of
{clang_isConstQualifiedType}[clang_isConstQualifiedType]. Returns true if
the type has a `const` qualifier.
`Type:declaration()` -> `Cursor`::
Binding of {clang_getTypeDeclaration}[clang_getTypeDeclaration]. Returns a
`Cursor` to the declaration of a given type, or *nil*.
`Type:arrayElementType()` -> `Type`::
Binding of {clang_getArrayElementType}[clang_getArrayElementType].
`Type:arraySize()` -> `Type`::
Binding of {clang_getArraySize}[clang_getArraySize].
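
The `Cursor` and `Type` methods can be combined, for example to summarize a
function declaration. In this sketch the helper name `describe_function` is
hypothetical, and `funcCursor` is assumed to be a cursor of kind
`"FunctionDecl"`. Since `Type:name()` yields the type kind rather than the C
spelling, the result contains names such as `Int` or `Pointer`.

----------
-- Build a one-line summary of a function from its cursor.
local function describe_function(funcCursor)
    local parts = {}
    for i, arg in ipairs(funcCursor:arguments()) do
        local ty = arg:type()  -- may be nil if no type is available
        parts[i] = (ty and ty:name() or "?") .. " " .. arg:name()
    end
    return string.format("%s %s(%s)",
        funcCursor:resultType():name(),
        funcCursor:name(),
        table.concat(parts, ", "))
end

-- Usage, with `decl` being a "FunctionDecl" cursor found during traversal:
--   print(describe_function(decl))
----------
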
License
-------
Copyright (C) 2013 Philipp Kutin
(Portions of the documentation copied or adapted from luaclang-parser, Copyright
(C) 2012 Michal Kottman)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.