

LJClang -- A LuaJIT-based interface to libclang
Philipp Kutin
:max-width: 56em


:LuaJIT: http://luajit.org/
:libclang: http://clang.llvm.org/doxygen/group__CINDEX.html
:luaclang-parser: https://github.com/mkottman/luaclang-parser

LJClang is an interface to {libclang}[libclang] for {LuaJIT}[LuaJIT], modeled
after and mostly API-compatible with {luaclang-parser}[luaclang-parser] by
Michal Kottman.


:LJDownload: http://luajit.org/download.html

* {LJDownload}[LuaJIT 2.0] (latest Git HEAD of the master branch recommended)
* LLVM/Clang -- read the http://clang.llvm.org/get_started.html[getting
  started] guide to find out how to obtain Clang from source. `libclang` is
  built and installed along with the Clang compiler.

Building and usage

:Clang-Win32: http://www.ishani.org/web/articles/code/clang-win32/

Most of LJClang is written in Lua (extensively using LuaJIT's FFI), but due
to currently existing limitations, a support C library has to be built.

In the provided `Makefile`, adjust the libclang include path, and issue `make`
to build `libljclang_support.so`.

NOTE: LJClang has been tested on Ubuntu Linux and Windows (using
{Clang-Win32}[Clang-Win32]), but only minor modifications to the build process
should be necessary to get it working with other OSes or configurations.

From here on, LJClang can be used with LuaJIT by issuing a `require` for
`"ljclang"`. One likely wants to use LJClang from its development directory
without installing it to a system-wide path. Because it expects to find
`libljclang_support.so` and several supporting Lua files, one approach is to
wrap client programs into scripts starting LuaJIT with an environment
containing appropriate `LD_LIBRARY_PATH` and `LUA_PATH` entries. For example,
given the following function in `.bashrc`,

# "LuaJIT with added path of the script directory"
ljwp ()
{
    local scriptdir=$(cd "$(dirname "$1")" && pwd)
    LUA_PATH=";;$scriptdir/?.lua" LD_LIBRARY_PATH="$scriptdir" luajit "$@"
}

and assuming that LJClang resides in `~/dl/ljclang`, the `extractdecls.lua`
program described below could be run from anywhere like this:

$~/some/other/dir: ljwp ~/dl/ljclang/extractdecls.lua [args...]


LJClang provides a cursor-based, callback-driven API to the abstract syntax
tree (AST) of C/C++ source files. These are the main classes:

* `Index` -- represents a set of translation units that could be linked together
* `TranslationUnit` -- a source file together with everything included by it
  either directly or transitively
* `Cursor` -- an element in the AST in a translation unit such as a `typedef`
  declaration or a statement
* `Type` -- the type of an element (for example, that of a variable, structure
  member, or a function's input argument or return value)

To make something interesting happen, you usually create a single `Index`
object, parse into it one or many translation units, and define a callback
function to be invoked on each visit of a `Cursor` by libclang.
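
As a rough illustration of that workflow, the following sketch lists the kind and name of every top-level declaration in a file. The helper name `listTopLevel` is hypothetical, and the module is passed in as `cl` so that the function does not depend on how LJClang is loaded. It is assumed here that `Index:parse()` returns *nil* when parsing fails.

```lua
-- Sketch only: `listTopLevel` is a hypothetical helper, not part of LJClang.
-- `cl` is expected to be the module returned by require("ljclang").
local function listTopLevel(cl, fileName, clangArgs)
    local index = cl.createIndex()
    local tu = index:parse(fileName, clangArgs)
    -- Assumption: a failed parse yields nil.
    assert(tu, "could not parse " .. tostring(fileName))

    local lines = {}
    -- Cursor:children() without arguments returns the direct descendants
    -- of the translation unit cursor as a table.
    for _, cur in ipairs(tu:cursor():children()) do
        lines[#lines + 1] = cur:kind() .. " " .. cur:name()
    end
    return lines
end
```

With LJClang on the Lua path, a call might look like `listTopLevel(require("ljclang"), "foo.h", { "-I/usr/include" })`.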

Example program

:CXCursorKind: http://clang.llvm.org/doxygen/group__CINDEX.html#gaaccc432245b4cd9f2d470913f9ef0013

The `extractdecls.lua` script that accompanies LJClang can be used to extract
various kinds of C declarations from (usually) headers and print them in
various forms usable as FFI C declarations or descriptive tables with LuaJIT.

Usage: ./extractdecls.lua [our_options...] <file.h> [clang_options...]
  -p <filterPattern>
  -x <excludePattern1> [-x <excludePattern2>] ...
  -s <stripPattern>
  -1 <string to print before everything>
  -2 <string to print after everything>
  -C: print lines like
       static const int membname = 123;  (enums/macros only)
  -R: reverse mapping, only if one-to-one. Print lines like
       [123] = "membname";  (enums/macros only)
  -f <formatFunc>: user-provided body for formatting function (enums/macros only)
       Accepts args `k', `v'; `f' is string.format. Must return a formatted line.
       Example: "return f('%s = %s%s,', k, k:find('KEY_') and '65536+' or '', v)"
       Incompatible with -C or -R.
  -Q: be quiet
  -w: extract what? Can be
       EnumConstantDecl (default), TypedefDecl, FunctionDecl, MacroDefinition

In fact, the file `ljclang_cursor_kind.lua` is generated by this program and is
used by LJClang to map values of the enumeration {CXCursorKind}[`enum
CXCursorKind`] to their names. The `bootstrap` target in the `Makefile`
extracts the relevant information using these options:

-R -p '^CXCursor_' -x '_First' -x '_Last' -x '_GCCAsmStmt' -x '_MacroInstantiation' -s '^CXCursor_' \
    -1 'return { name={' -2 '}, }' -Q

Thus, the enumeration constant names are filtered to those beginning with
``++CXCursor_++'', and all ``secondary'' names aliasing the one considered the
main one are rejected. (For example, `CXCursor_AsmStmt` and
`CXCursor_GCCAsmStmt` have the same value.) Finally, the prefix is stripped
(`-s`) to yield lines like

[215] = "AsmStmt";
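
Given the `-1` and `-2` strings above, the generated file returns a table whose `name` field maps numeric values to names. A minimal sketch of a lookup on such a table follows. The inline table stands in for the generated file, and the helper `cursorKindName` with its fallback string is hypothetical, not LJClang's own code.

```lua
-- Stand-in for the table returned by the generated ljclang_cursor_kind.lua;
-- only a single entry is reproduced here.
local cursorKinds = { name = {
    [215] = "AsmStmt";
}, }

-- Hypothetical helper: map a numeric CXCursorKind value to its name,
-- falling back to a placeholder string for values missing from the table.
local function cursorKindName(value)
    return cursorKinds.name[value] or ("Unknown(" .. tostring(value) .. ")")
end
```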


:clang_createIndex: http://clang.llvm.org/doxygen/group__CINDEX.html#func-members
:CXChildVisitResult: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__TRAVERSAL.html#ga99a9058656e696b622fbefaf5207d715
:clang_parseTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#ga2baf83f8c3299788234c8bce55e4472e
:clang_createTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#gaa2e74f6e28c438692fd4f5e3d3abda97

The module returned by `require("ljclang")` contains the following:

`createIndex([excludePch : boolean [, showDiagnostics : boolean]])` -> `Index`::

Binding for {clang_createIndex}[clang_createIndex]. Will create an `Index` into
which you can parse ++TranslationUnit++s. Both input arguments are optional and
default to *false*.
NOTE: Loading pre-compiled translation units is not implemented.


`ChildVisitResult`::

    An object containing a mapping of names to values permissible as values
    {CXChildVisitResult}[returned] from cursor visitor callbacks: `Break`,
    `Continue`, `Recurse`.

`regCursorVisitor(visitorfunc)` -> `vf_handle`::

Registers a child visitor callback function `visitorfunc` with LJClang,
returning a handle which can be passed to `Cursor:children()`. The callback
function receives two input arguments, `(cursor, parent)` -- with the cursors
of the currently visited entity as well as its parent, and must return a value
from the `ChildVisitResult` enumeration to indicate whether or how libclang
should carry on AST visiting.

CAUTION: The `cursor` passed to the visitor callback is only valid during one
particular callback invocation. If it is to be used after the function has
returned, it *must* be copied using the `Cursor` constructor mentioned below.
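
The following sketch shows one way to honor that rule. It registers a single visitor that stores permanent copies of all function declaration cursors among the direct children of a given cursor. The factory `makeFunctionCollector` is a hypothetical name, and `cl` is assumed to be the module returned by `require("ljclang")`.

```lua
-- Sketch only: `makeFunctionCollector` is a hypothetical helper.
local function makeFunctionCollector(cl)
    local V = cl.ChildVisitResult
    local found  -- filled by the visitor during one traversal

    -- Register the callback once and reuse its handle for every traversal.
    local handle = cl.regCursorVisitor(function(cursor, parent)
        if cursor:haskind("FunctionDecl") then
            -- `cursor` is only valid during this invocation, so store a copy.
            found[#found + 1] = cl.Cursor(cursor)
        end
        return V.Continue  -- go on with the next sibling, do not descend
    end)

    return function(rootCursor)
        found = {}
        local aborted = rootCursor:children(handle)
        return found, aborted
    end
end
```

Returning `V.Recurse` instead of `V.Continue` would also visit nested declarations, and `V.Break` would stop the traversal early.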

`Cursor([cur : Cursor])` -> `Cursor`::

A constructor to create a permanent cursor from that received by the visitor
callback.


:TUFlags: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#enum-members

`Index:parse(sourceFile : string, args : table [, opts : table])` -> `TranslationUnit`::

Binding for {clang_parseTranslationUnit}[clang_parseTranslationUnit]. This will
parse a given source file `sourceFile` with the command line arguments `args`,
which would be given to the compiler for compilation, containing e.g. include
paths or defines.
If `sourceFile` is the empty string, the source file is expected to be named in
`args`.
The last optional argument `opts` is expected to be a sequence containing
{TUFlags}[`CXTranslationUnit_*`] enum names without the `"CXTranslationUnit_"`
prefix, for example `{ "DetailedPreprocessingRecord" }`.
NOTE: Both `args` and `opts` (if given) must not contain an element at index 0.
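
A short sketch of assembling both tables may help here. The helper `parseHeader` and its `includeDirs` parameter are hypothetical; the only LJClang call used is `Index:parse()` as described above.

```lua
-- Sketch only: `parseHeader` is a hypothetical convenience wrapper.
local function parseHeader(index, fileName, includeDirs)
    local args = {}
    for _, dir in ipairs(includeDirs) do
        -- Appending with #args + 1 fills indices 1..n and never index 0.
        args[#args + 1] = "-I" .. dir
    end
    -- Enum names are given without the "CXTranslationUnit_" prefix.
    local opts = { "DetailedPreprocessingRecord" }
    return index:parse(fileName, args, opts)
end
```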

`Index:load(astFile : string)` -> `TranslationUnit`::

    Binding for
    {clang_createTranslationUnit}[clang_createTranslationUnit]. This will load
    the translation unit from an AST file which was constructed using `clang
    -emit-ast`. Useful when repeatedly processing large sets of files (like


:clang_getTranslationUnitCursor: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gaec6e69127920785e74e4a517423f4391
:clang_getFile: http://clang.llvm.org/doxygen/group__CINDEX__FILES.html#gaa0554e2ea48ecd217a29314d3cbd2085
:clang_getDiagnostic: http://clang.llvm.org/doxygen/group__CINDEX__DIAG.html#ga3f54a79e820c2ac9388611e98029afe5
:code_completion_API: http://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html
:clang_visitChildren: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__TRAVERSAL.html#ga5d0a813d937e1a7dcc35f206ad1f7a91

`TranslationUnit:cursor()` -> `Cursor`::

    Binding for
    {clang_getTranslationUnitCursor}[clang_getTranslationUnitCursor]. Returns
    the `Cursor` representing a given translation unit, which provides access
    to information about e.g. functions and types defined in a given file.

`TranslationUnit:file(fileName : string)` -> `string`::

Binding for {clang_getFile}[clang_getFile]. Returns the absolute file path
of `fileName`.
NOTE: Unlike in luaclang-parser, the `time_t` last modification time is
currently not returned as a second value.

`TranslationUnit:diagnostics()` -> `{ Diagnostic* }`::

    Binding for {clang_getDiagnostic}[clang_getDiagnostic]. Returns a table
    array of `Diagnostic`, which represent warnings and errors. Each diagnostic
    is a table indexable by these keys: `text` -- the diagnostic message, and
    `category` -- a diagnostic category (also a string).
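
As an illustration, the diagnostics of a translation unit could be printed like this. The helper `reportDiagnostics` and its optional `out` parameter are hypothetical; only the `text` and `category` keys described above are used.

```lua
-- Sketch only: `reportDiagnostics` is a hypothetical helper.
-- Writes one line per diagnostic to `out` (io.stderr by default) and
-- returns the number of diagnostics seen.
local function reportDiagnostics(tu, out)
    out = out or io.stderr
    local diags = tu:diagnostics()
    for _, diag in ipairs(diags) do
        out:write(string.format("[%s] %s\n", diag.category, diag.text))
    end
    return #diags
end
```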

`TranslationUnit:codeCompleteAt(file : string, line : number, column : number)` -> `{ Completion* }, { Diagnostics* }`::

    Binding for {code_completion_API}[code completion API]. Returns the
    available code completion options at a given location using prior
    content. Each `Completion` is a table consisting of several chunks, each of
    which has a text and a chunk kind (a `CXCompletionChunkKind` name) without the
    `CXCompletionChunk_` prefix. If there are any annotations, the
    `annotations` key is a table of strings:

        completion = {
             priority = number, priority of given completion
             chunks = {
                 kind = string, chunk kind
                 text = string, chunk text
             },
             [annotations = { string* }]
        }
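
A sketch of consuming one such table follows. It assumes that `chunks` is an array of `{ kind, text }` tables, which the outline above suggests but does not state explicitly, and it relies on the libclang chunk kind `CXCompletionChunk_TypedText`. The helper name `typedText` is hypothetical.

```lua
-- Sketch only: `typedText` is a hypothetical helper.
-- Assumption: completion.chunks is an array of { kind = ..., text = ... }.
-- Returns the text a user would type to select this completion.
local function typedText(completion)
    local parts = {}
    for _, chunk in ipairs(completion.chunks) do
        if chunk.kind == "TypedText" then
            parts[#parts + 1] = chunk.text
        end
    end
    return table.concat(parts)
end
```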


:clang_getCursorSemanticParent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gabc327b200d46781cf30cb84d4af3c877
:clang_getCursorLexicalParent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gace7a423874d72b3fdc71d6b0f31830dd
:clang_getCursorSpelling: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gaad1c9b2a1c5ef96cebdbc62f1671c763
:clang_getCursorDisplayName: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gac3eba3224d109a956f9ef96fd4fe5c83
:cursor_kind: http://clang.llvm.org/doxygen/group__CINDEX.html#gaaccc432245b4cd9f2d470913f9ef0013
:clang_Cursor_getArgument: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga673c5529d33eedd0b78aca5ac6fc1d7c
:clang_getCursorResultType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6995a2d6352e7136868574b299005a63
:clang_getCursorExtent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__SOURCE.html#ga79f6544534ab73c78a8494c4c0bc2840
:clang_getCursorReferenced: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gabf059155921552e19fc2abed5b4ff73a
:clang_getCursorDefinition: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gafcfbec461e561bf13f1e8540bbbd655b
:clang_getSpellingLocation: http://clang.llvm.org/doxygen/group__CINDEX__LOCATIONS.html#ga01f1a342f7807ea742aedd2c61c46fa0
:clang_getPresumedLocation: http://clang.llvm.org/doxygen/group__CINDEX__LOCATIONS.html#ga03508d9c944feeb3877515a1b08d36f9

:clang_getEnumConstantDeclValue: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6b8585818420e7512feb4c9d209b4f4d
:clang_getEnumConstantUnsignedDeclValue: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaf7cbd4f2d371dd93e8bc997c951a1aef
:clang_getTypedefDeclUnderlyingType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga8de899fc18dc859b6fe3b97309f4fd52

:clang_Cursor_getTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#ga529f1504710a41ce358d4e8c3161848d
:clang_isCursorDefinition: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#ga6ad05634a73e693217088eaa693f0010

You can compare whether two ++Cursor++s represent the same element using the
standard `==` Lua operator. Comparisons with any other type yield *false*.

`Cursor:children()` -> `{ Cursor* }`::
`Cursor:children(vf_handle)` -> `boolean`::

Binding over {clang_visitChildren}[clang_visitChildren]. This is the main
function for AST traversal. The first form collects the direct descendants of
the given cursor in a table, returning an empty one if none are found. The
second, preferred form accepts a handle of a visitor function previously
registered with <<regCursorVisitor,`regCursorVisitor()`>> instead. Here, the
returned value indicates whether the traversal was aborted prematurely due to
the callback returning +<<ChildVisitResult,ChildVisitResult>>.Break+.
NOTE: Currently, the recommended procedure is to encapsulate the logic of one
particular ``analysis'' into one visitor callback, which may run different
portions of code, for example conditional on the cursor's kind, rather than
calling `Cursor:children(visitor_function_handle)` with a different visitor
function while another invocation of it is active.


`Cursor:parent()` -> `Cursor`::

	Binding for
	{clang_getCursorSemanticParent}[clang_getCursorSemanticParent]. Returns a
	cursor to the semantic parent of a given element. For example, for a method
	cursor, returns its class. For a global declaration, returns the
	translation unit cursor.

`Cursor:lexicalParent()` -> `Cursor`::

	Binding for
	{clang_getCursorLexicalParent}[clang_getCursorLexicalParent]. Returns a
	cursor to the lexical parent of a given element.

`Cursor:name()` -> `string`::

    Binding over {clang_getCursorSpelling}[clang_getCursorSpelling]. Returns
    the name of the entity referenced by cursor. `Cursor` also has `__tostring`
    set to this method.

`Cursor:displayName()` -> `string`::

    Binding over
    {clang_getCursorDisplayName}[clang_getCursorDisplayName]. Returns the
    display name of the entity, which for example is a function signature.

`Cursor:kind()` -> `string`::

	Returns the {cursor_kind}[cursor kind] without the `CXCursor_` prefix,
	e.g. `"FunctionDecl"`.

`Cursor:haskind(kind : string)` -> `boolean`::

    Checks whether the cursor has kind given by `kind`, which must be a string
    of {CXCursorKind}[`enum CXCursorKind`] names without the `CXCursor_`
    prefix. For instance, `if (cur:haskind("TypedefDecl")) then --[[ do
    something ]] end` .


`Cursor:arguments()` -> `{ Cursor* }`::

	Binding of {clang_Cursor_getArgument}[clang_Cursor_getArgument]. Returns a
	table array of ++Cursor++s representing arguments of a function or a
	method. Returns an empty table if a cursor is not a method or function.

`Cursor:translationUnit()` -> `TranslationUnit`::

    Binding for
    {clang_Cursor_getTranslationUnit}[clang_Cursor_getTranslationUnit]. Returns
    the translation unit that a cursor originated from.

`Cursor:resultType()` -> `Type`::

	Binding for {clang_getCursorResultType}[clang_getCursorResultType]. For a
	function or a method cursor, returns the return type of the function.

`Cursor:typedefType()` -> `Type`::

    If the cursor references a typedef declaration, returns its
    {clang_getTypedefDeclUnderlyingType}[underlying type]. Otherwise, returns
    *nil*.
// XXX: Make error instead?

`Cursor:type()` -> `Type`::

	Returns the `Type` of a given element or *nil* if not available.

`Cursor:location([linesfirst : boolean])` -> `string, number, number, number, number [, number, number]`::

	Binding for {clang_getCursorExtent}[clang_getCursorExtent] and
	{clang_getSpellingLocation}[clang_getSpellingLocation]. Returns the _file
	name_, _starting line_, _starting column_, _ending line_ and _ending
	column_ of the given cursor. If the optional argument `linesfirst` is true,
	the numbers are ordered like _starting line_, _ending line_, _starting
	column_, _ending column_, _starting offset_, _ending offset_ instead. If
	`linesfirst` has the string value `'offset'`, only _starting offset_,
	_ending offset_ are returned.
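
For instance, a compiler-style position string could be built from the default return order. The helper `formatLocation` is a hypothetical name, and the sketch assumes that the line and column values arrive as plain Lua numbers, as the signature above indicates.

```lua
-- Sketch only: `formatLocation` is a hypothetical helper.
-- Uses the default order: file name, starting line, starting column, ...
local function formatLocation(cursor)
    local file, startLine, startCol = cursor:location()
    return string.format("%s:%d:%d", tostring(file), startLine, startCol)
end
```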

`Cursor:presumedLocation([linesfirst : boolean])` -> `string, number, number, number, number`::

Binding for {clang_getCursorExtent}[clang_getCursorExtent] and
{clang_getPresumedLocation}[clang_getPresumedLocation].

// XXX: Better provide an API around CXSourceRange.
This can be used to look up the text a cursor consists of.

`Cursor:definition()` -> `Cursor`::

	Binding for {clang_getCursorDefinition}[clang_getCursorDefinition]. For a
	reference or declaration, returns a cursor to the definition of the entity,
	otherwise returns *nil*.

`Cursor:referenced()` -> `Cursor`::

	Binding for {clang_getCursorReferenced}[clang_getCursorReferenced]. For a
	reference type, returns a cursor to the element it references, otherwise
	returns *nil*.

`Cursor:access()` -> `string`::

	When cursor kind is `"AccessSpecifier"`, returns one of `"private"`,
	`"protected"` and `"public"`.

`Cursor:isDefinition()` -> `boolean`::

    Binding for {clang_isCursorDefinition}[clang_isCursorDefinition]. Determine
    whether the declaration pointed to by this cursor is also a definition of
    that entity.

`Cursor:isVirtual()` -> `boolean`::

	For a C++ method, returns whether the method is virtual.

`Cursor:isStatic()` -> `boolean`::

	For a C++ method, returns whether the method is static.

`Cursor:enumValue([unsigned : boolean])` -> `enum cdata`::

If the cursor represents an enumeration constant (`CXCursor_EnumConstantDecl`),
returns its numeric value as a {clang_getEnumConstantDeclValue}[signed] 64-bit
integer, or as a 64-bit {clang_getEnumConstantUnsignedDeclValue}[unsigned]
integer if `unsigned` is true.
NOTE: In C99, an enumeration constant must be in the range of values
representable by an `int`. LJClang does not check for this.
`Cursor:enumval([unsigned : boolean])` -> `number`::

    Returns the cdata obtained from `enumValue()` as a Lua number, converted
    using `tonumber()`. Again, no checking of any kind is carried out.
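
As a sketch of how these pieces combine, the constants of one enumeration could be gathered into a name-to-value table. The helper `enumConstants` is hypothetical, and `enumCursor` is assumed to be a cursor for an `enum` declaration.

```lua
-- Sketch only: `enumConstants` is a hypothetical helper.
-- `enumCursor` is assumed to refer to an enum declaration.
local function enumConstants(enumCursor)
    local values = {}
    for _, child in ipairs(enumCursor:children()) do
        if child:haskind("EnumConstantDecl") then
            -- enumval() yields a Lua number; no range check is performed.
            values[child:name()] = child:enumval()
        end
    end
    return values
end
```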


:clang_getTypeKindSpelling: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6bd7b366d998fc67f4178236398d0666
:clang_getCanonicalType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaa9815d77adc6823c58be0a0e32010f8c
:clang_getPointeeType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaafa3eb34932d8da1358d50ed949ff3ee
:clang_isPODType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga3e7fdbe3d246ed03298bd074c5b3703e
:clang_isConstQualifiedType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga8c3f8029254d5862bcd595d6c8778e5b
:clang_getTypeDeclaration: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga0aad74ea93a2f5dea58fd6fc0db8aad4
:clang_getArrayElementType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga718591f4b07d9d4861557a3ed8b29713
:clang_getArraySize: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga91521260817054f153b5f1295056192d

:CXTypeKind: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaad39de597b13a18882c21860f92b095a

You can compare whether two ++Type++s represent the same type using the standard
`==` Lua operator. Comparisons with any other type yield *false*.

`Type:name()` -> `string`::

	Binding of {clang_getTypeKindSpelling}[clang_getTypeKindSpelling]. Returns
	one of {CXTypeKind}[`CXTypeKind`] as a string without the `CXType_`
	prefix. `Type` also has `__tostring` set to this method.

`Type:canonical()` -> `Type`::

Binding of {clang_getCanonicalType}[clang_getCanonicalType]. Returns
underlying type with all typedefs removed.
NOTE: Unlike luaclang-parser, LJClang does *not* dispatch to
`clang_getPointeeType()` for pointer types.

// XXX: What was the intention of that? Test out stuff...

`Type:pointee()` -> `Type`::

	Binding of {clang_getPointeeType}[clang_getPointeeType]. For pointer type
	returns the type of the pointee.
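
A small sketch of following pointers by hand is given below. The helper `pointerDepth` is hypothetical, and it assumes that `Type:name()` yields `"Pointer"` for pointer types, which is the libclang kind spelling for `CXType_Pointer`.

```lua
-- Sketch only: `pointerDepth` is a hypothetical helper.
-- Assumption: Type:name() returns "Pointer" for pointer types.
-- Returns the number of pointer levels and the innermost pointee type.
local function pointerDepth(ty)
    local depth = 0
    while ty:name() == "Pointer" do
        ty = ty:pointee()
        depth = depth + 1
    end
    return depth, ty
end
```

A pointer hidden behind a typedef would not be counted this way; calling `Type:canonical()` on the type first is one possible remedy.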

`Type:isPod()` -> `boolean`::

	Binding of {clang_isPODType}[clang_isPODType]. Returns true if the type is
	a ``Plain Old Data'' type.

`Type:isConst()` -> `boolean`::
`Type:isConstQualified()` -> `boolean`::

	Binding of
	{clang_isConstQualifiedType}[clang_isConstQualifiedType]. Returns true if
	the type has a `const` qualifier.

`Type:declaration()` -> `Cursor`::

	Binding of {clang_getTypeDeclaration}[clang_getTypeDeclaration]. Returns a
	`Cursor` to the declaration of a given type, or *nil*.

`Type:arrayElementType()` -> `Type`::

	Binding of {clang_getArrayElementType}[clang_getArrayElementType].

`Type:arraySize()` -> `Type`::

	Binding of {clang_getArraySize}[clang_getArraySize].


Copyright (C) 2013 Philipp Kutin

(Portions of the documentation copied or adapted from luaclang-parser, Copyright
(C) 2012 Michal Kottman)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.