Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
processor (DSP).  We also support Hexagon Vector eXtensions (HVX).  HVX
is a wide vector coprocessor designed for high performance computer vision,
image processing, machine learning, and other workloads.

The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
    HVX extension: v66
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center

*** Tour of the code ***

The qemu-hexagon implementation is a combination of qemu and the Hexagon
architecture library (aka archlib).  The three primary directories with
Hexagon-specific code are

    qemu/target/hexagon
        This has all the instruction and packet semantics
    qemu/target/hexagon/imported
        These files are imported with very little modification from archlib
        *.idef                  Instruction semantics definition
        macros.def              Mapping of macros to instruction attributes
        encode*.def             Encoding patterns for each instruction
        iclass.def              Instruction class definitions used to determine
                                legal VLIW slots for each instruction
    qemu/linux-user/hexagon
        Helpers for loading the ELF file and making Linux system calls,
        signals, etc

We start with scripts that generate a bunch of include files.  This
is a two-step process.  The first step is to use the C preprocessor to expand
macros inside the architecture definition files.  This is done in
target/hexagon/gen_semantics.c.  This step produces
    <BUILD_DIR>/target/hexagon/semantics_generated.pyinc.
That file is consumed by the following python scripts to produce the indicated
header files in <BUILD_DIR>/target/hexagon
        gen_opcodes_def.py              -> opcodes_def_generated.h.inc
        gen_op_regs.py                  -> op_regs_generated.h.inc
        gen_printinsn.py                -> printinsn_generated.h.inc
        gen_op_attribs.py               -> op_attribs_generated.h.inc
        gen_helper_protos.py            -> helper_protos_generated.h.inc
        gen_shortcode.py                -> shortcode_generated.h.inc
        gen_tcg_funcs.py                -> tcg_funcs_generated.c.inc
        gen_tcg_func_table.py           -> tcg_func_table_generated.c.inc
        gen_helper_funcs.py             -> helper_funcs_generated.c.inc

Qemu helper functions have 3 parts
    DEF_HELPER declaration indicates the signature of the helper
    gen_helper_<NAME> will generate a TCG call to the helper function
    The helper implementation

Here's an example of the A2_add instruction.
    Instruction tag        A2_add
    Assembly syntax        "Rd32=add(Rs32,Rt32)"
    Instruction semantics  "{ RdV=RsV+RtV;}"

By convention, the operands are identified by letter
    RdV is the destination register
    RsV, RtV are source registers

The generator uses the operand naming conventions (see large comment in
hex_common.py) to determine the signature of the helper function.  Here are the
results for A2_add

helper_protos_generated.h.inc
    DEF_HELPER_3(A2_add, s32, env, s32, s32)

tcg_funcs_generated.c.inc
    static void generate_A2_add(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        TCGv RdV = tcg_temp_local_new();
        const int RdN = insn->regno[0];
        TCGv RsV = hex_gpr[insn->regno[1]];
        TCGv RtV = hex_gpr[insn->regno[2]];
        gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
        gen_log_reg_write(RdN, RdV);
        ctx_log_reg_write(ctx, RdN);
        tcg_temp_free(RdV);
    }

helper_funcs_generated.c.inc
    int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
    {
        uint32_t slot __attribute__((unused)) = 4;
        int32_t RdV = 0;
        { RdV=RsV+RtV;}
        return RdV;
    }

Note that generate_A2_add updates the disassembly context to be processed
when the packet commits (see "Packet Semantics" below).

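As a rough illustration of how a helper signature can follow from the operand
names, here is a minimal Python sketch.  It is not the logic in hex_common.py:
the regular expression, the function name helper_proto, and the restriction to
32-bit scalar registers with a single destination are all assumptions made for
this example.

```python
import re

# Hypothetical, simplified operand pattern: register class letter (R),
# role letter (d = destination, s/t = sources), and the bit width (32).
OPERAND_RE = re.compile(r"\b(R)([dst])(32)\b")


def helper_proto(tag, syntax):
    """Build a DEF_HELPER line for a tag that only uses 32-bit scalar operands."""
    operands = OPERAND_RE.findall(syntax)
    dests = [op for op in operands if op[1] == "d"]
    srcs = [op for op in operands if op[1] != "d"]
    if len(dests) != 1:
        raise ValueError("this sketch handles exactly one destination register")
    # Return type first, then env, then one s32 per source register.
    args = ["s32", "env"] + ["s32"] * len(srcs)
    # The number in DEF_HELPER_<n> counts the arguments after the return type.
    return "DEF_HELPER_%d(%s, %s)" % (len(args) - 1, tag, ", ".join(args))


print(helper_proto("A2_add", "Rd32=add(Rs32,Rt32)"))
```

For the A2_add syntax string this prints the same DEF_HELPER_3 line shown in
helper_protos_generated.h.inc above.
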
The generator checks for an fGEN_TCG_<tag> macro.  This allows us to generate
TCG code instead of a call to the helper.  If defined, the macro takes 1
argument:
    C semantics (aka short code)

This allows the code generator to override the auto-generated code.  In some
cases this is necessary for correct execution.  We can also override for
faster emulation.  For example, calling a helper for add is more expensive
than generating a TCG add operation.

The gen_tcg.h file has any overrides. For example, we could write
    #define fGEN_TCG_A2_add(SHORTCODE) \
        tcg_gen_add_tl(RdV, RsV, RtV)

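The choice between an override and a helper call can be pictured with the
following Python sketch.  It only models the decision described above; the
function emit_body and the overrides set are hypothetical names for this
example and are not taken from the generator scripts.

```python
def emit_body(tag, shortcode, helper_args, overrides):
    """Return the line a generator might emit for one instruction."""
    if tag in overrides:
        # An fGEN_TCG_<tag> override exists: hand the short code to the macro.
        return "fGEN_TCG_%s(%s);" % (tag, shortcode)
    # No override: fall back to a TCG call to the helper function.
    return "gen_helper_%s(%s);" % (tag, ", ".join(helper_args))


args = ["RdV", "cpu_env", "RsV", "RtV"]
shortcode = "{ RdV=RsV+RtV;}"

# With an override defined for A2_add (hypothetical set of overridden tags).
print(emit_body("A2_add", shortcode, args, overrides={"A2_add"}))
# Without any override, the helper call is emitted instead.
print(emit_body("A2_add", shortcode, args, overrides=set()))
```

The second line printed matches the gen_helper_A2_add call in the generated
function shown earlier; the first is the form used when gen_tcg.h supplies an
override.
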
The instruction semantics C code relies heavily on macros.  In cases where the
C semantics are specified only with macros, we can override the default with
the short semantics option and #define the macros to generate TCG code.  One
example is L2_loadw_locked:
    Instruction tag        L2_loadw_locked
    Assembly syntax        "Rd32=memw_locked(Rs32)"
    Instruction semantics  "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }"

In gen_tcg.h, we use the shortcode
    #define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
        SHORTCODE

There are also cases where we brute force the TCG code generation.
Instructions with multiple definitions are examples.  These require special
handling because qemu helpers can only return a single value.

For HVX vectors, the generator behaves slightly differently.  The wide vectors
won't fit in a TCGv or TCGv_i64, so we use TCGv_ptr variables to pass the
address to helper functions.  Here's an example for an HVX vector-add-word
instruction.
    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        TCGv_ptr VdV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VdV, cpu_env, VdV_off);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        TCGv_ptr VuV = tcg_temp_local_new_ptr();
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        TCGv_ptr VvV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VuV, cpu_env, VuV_off);
        tcg_gen_addi_ptr(VvV, cpu_env, VvV_off);
        TCGv slot = tcg_constant_tl(insn->slot);
        gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);
        tcg_temp_free(slot);
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
        tcg_temp_free_ptr(VdV);
        tcg_temp_free_ptr(VuV);
        tcg_temp_free_ptr(VvV);
    }

Notice that we also generate a variable named <operand>_off for each operand of
the instruction.  This makes it easy to override the instruction semantics with
functions from tcg-op-gvec.h.  Here's the override for this instruction.
    #define fGEN_TCG_V6_vaddw(SHORTCODE) \
        tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
                         sizeof(MMVector), sizeof(MMVector))

Finally, we notice that the override doesn't use the TCGv_ptr variables, so
we don't generate them when an override is present.  Here is what we generate
when the override is present.
    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } });
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
    }

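Both the short code loop and the gvec override compute a lane-by-lane 32-bit
add.  The Python sketch below models that result on plain lists so the
semantics are easy to check by hand.  The 128-byte vector size and the name
vaddw are assumptions for this example, and lanes are treated as unsigned
32-bit patterns with wraparound.

```python
VEC_BYTES = 128          # assumed vector size in bytes for this sketch
WORDS = VEC_BYTES // 4   # number of 32-bit lanes in one vector


def vaddw(vu, vv):
    """Lane-wise 32-bit add, mirroring VdV.w[i] = VuV.w[i] + VvV.w[i]."""
    if len(vu) != WORDS or len(vv) != WORDS:
        raise ValueError("each vector must have %d 32-bit lanes" % WORDS)
    # Mask each sum to 32 bits so an overflowing lane wraps around.
    return [(a + b) & 0xFFFFFFFF for a, b in zip(vu, vv)]


vu = list(range(WORDS))
vv = [0xFFFFFFFF] * WORDS   # adding 0xFFFFFFFF is the same as subtracting 1
vd = vaddw(vu, vv)
print(vd[:4])
```

Each lane is independent, which is why a single vector-wide gvec operation can
replace the per-lane loop in the short code.
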
In addition to instruction semantics, we use a generator to create the decode
tree.  This generation is also a two-step process.  The first step is to run
target/hexagon/gen_dectree_import.c to produce
    <BUILD_DIR>/target/hexagon/iset.py
This file is imported by target/hexagon/dectree.py to produce
    <BUILD_DIR>/target/hexagon/dectree_generated.h.inc

*** Key Files ***

cpu.h

This file contains the definition of the CPUHexagonState struct.  It is the
runtime information for each thread and contains state such as the GPRs and
predicate registers.

macros.h
mmvec/macros.h

The Hexagon arch lib relies heavily on macros for the instruction semantics.
This is a great advantage for qemu because we can override them for different
purposes.  You will also notice there are sometimes two definitions of a macro.
The QEMU_GENERATE variable determines whether we want the macro to generate TCG
code.  If QEMU_GENERATE is not defined, we want the macro to generate vanilla
C code that will work in the helper implementation.

translate.c

The functions in this file generate TCG code for a translation block.  Some
important functions in this file are

    gen_start_packet - initialize the data structures for packet semantics
    gen_commit_packet - commit the register writes, stores, etc for a packet
    decode_and_translate_packet - disassemble a packet and generate code

genptr.c
gen_tcg.h

These files create a function for each instruction.  Together they consist
mostly of fGEN_TCG_<tag> definitions followed by an include of
tcg_funcs_generated.c.inc.

op_helper.c

This file contains the implementations of all the helpers.  There are a few
general purpose helpers, but most of them are generated by including
helper_funcs_generated.c.inc.  There are also several helpers used for debugging.


*** Packet Semantics ***

VLIW packet semantics differ from serial semantics in that all input operands
are read, then the operations are performed, then all the results are written.
For example, this packet performs a swap of registers r0 and r1
    { r0 = r1; r1 = r0 }
Note that the result is different if the instructions are executed serially.

Packet semantics dictate that we defer any changes of state until the entire
packet is committed.  We record the results of each instruction in a side data
structure, and update the visible processor state when we commit the packet.

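The difference between the two execution models can be seen in a small Python
sketch.  This is only a model of the idea, not of the qemu data structures:
registers are a dict, each instruction is a (destination, source) move, and
the name new_value is borrowed from CPUHexagonState purely as a label for the
side structure.

```python
def run_serial(regs, packet):
    """Execute the moves one after another, each seeing earlier results."""
    regs = dict(regs)
    for dst, src in packet:
        regs[dst] = regs[src]
    return regs


def run_packet(regs, packet):
    """Execute the moves with packet semantics: read all, then commit all."""
    regs = dict(regs)
    new_value = {}                    # side structure holding deferred writes
    for dst, src in packet:
        new_value[dst] = regs[src]    # reads only see the pre-packet state
    regs.update(new_value)            # commit: make the writes visible at once
    return regs


swap = [("r0", "r1"), ("r1", "r0")]   # models { r0 = r1; r1 = r0 }
start = {"r0": 1, "r1": 2}
print(run_packet(start, swap))   # {'r0': 2, 'r1': 1}
print(run_serial(start, swap))   # {'r0': 2, 'r1': 2}
```

With packet semantics the registers are swapped; executed serially, the second
move reads the value just written by the first, so both registers end up equal.
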
The data structures are divided between the runtime state and the translation
context.

During the TCG generation (see translate.[ch]), we use the DisasContext to
track what needs to be done during packet commit.  Here are the relevant
fields

    reg_log            list of registers written
    reg_log_idx        index into reg_log
    pred_log           list of predicates written
    pred_log_idx       index into pred_log
    store_width        width of stores (indexed by slot)

During runtime, the following fields in CPUHexagonState (see cpu.h) are used

    new_value             new value of a given register
    reg_written           boolean indicating if register was written
    new_pred_value        new value of a predicate register
    pred_written          boolean indicating if predicate was written
    mem_log_stores        record of the stores (indexed by slot)

For Hexagon Vector eXtensions (HVX), the following fields are used
    VRegs                       Vector registers
    future_VRegs                Registers to be stored during packet commit
    tmp_VRegs                   Temporary registers *not* stored during commit
    VRegs_updated               Mask of predicated vector writes
    QRegs                       Q (vector predicate) registers
    future_QRegs                Registers to be stored during packet commit
    QRegs_updated               Mask of predicated vector writes

*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
internal.h.  This will stream a lot of information as it generates TCG and
executes the code.

To track down nasty issues with Hexagon->TCG generation, we compare the
execution results with those from actual hardware running on a Hexagon Linux
target.  Run qemu with the "-d cpu" option.  Then, we can diff the results and
figure out where qemu and hardware behave differently.

The stacks are located at different locations on qemu and hardware.  We handle
this by changing env->stack_adjust in translate.c.  First, set this to zero and
run qemu.  Then, change env->stack_adjust to the difference between the two
stack locations.  Then rebuild qemu and run again. That will produce a very
clean diff.

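When the logs are long, it can help to jump straight to the first line where
they disagree.  The Python sketch below does that for two lists of lines.  The
function name first_divergence and the sample lines are made up for this
example; they do not reflect the actual "-d cpu" output format.

```python
from itertools import zip_longest


def first_divergence(qemu_lines, hw_lines):
    """Return (index, qemu_line, hw_line) for the first mismatch, or None."""
    # zip_longest pads the shorter log with None, so a truncated log
    # also shows up as a mismatch.
    for i, (q, h) in enumerate(zip_longest(qemu_lines, hw_lines)):
        if q != h:
            return i, q, h
    return None


# Placeholder log lines, not real trace output.
qemu_log = ["pc=0x100 r0=1", "pc=0x104 r0=2", "pc=0x108 r0=3"]
hw_log = ["pc=0x100 r0=1", "pc=0x104 r0=2", "pc=0x108 r0=4"]
print(first_divergence(qemu_log, hw_log))
```

In practice the two lists would come from reading the qemu log and the
hardware log line by line, for example with open(path).read().splitlines().
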
Here are some handy places to set breakpoints

    At the call to gen_start_packet for a given PC (note that the line number
        might change in the future)
        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
    The helper function for each instruction is named helper_<TAG>, so here's
        an example that will set a breakpoint at the start
        br helper_A2_add
    If you have the HEX_DEBUG macro set, the following will be useful
        At the start of execution of a packet for a given PC
            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
        At the end of execution of a packet for a given PC
            br helper_debug_commit_end if env->this_PC == 0xdeadbeef