Class JitTypeModel

java.lang.Object
ghidra.pcode.emu.jit.analysis.JitTypeModel

public class JitTypeModel extends Object
The type analysis for JIT-accelerated emulation.

This implements the Type Assignment phase of the JitCompiler using a very basic "voting" algorithm. The result is an assignment of type to each JitVal. To be clear, at this phase, we're assigning types to variables (and constants) in the use-def graph, not varnodes. Later we do another bit of "voting" to determine the type of each JVM local allocated to a varnode. Perhaps we could be more direct, but in anticipation of future optimizations, we keep this analysis at the per-variable level. This is partly an artifact of exploration before deciding to allocate by varnode instead of by variable.

Types in P-code and the JVM

P-code (and Sleigh) is a relatively type free language. Aside from size, variables have no type; they are just bit vectors. The operators are typed and cast the bits as required. This aligns well with most machine architectures. Registers are just bit vectors, and the instructions interpret them according to some type. In contrast, JVM variables have a type: int, long, float, double, or a reference. Conversions between JVM types must be explicit, so we must attend to certain aspects of p-code types when consuming operands allocated in JVM locals. There are three aspects to consider when translating p-code types to the JVM: behavior, size, and signedness.

Behavior: Integer vs. Float

The JVM has two integral types int and long of 4 and 8 bytes respectively. P-code has one integral type of no specified size. Or rather, it has as many integral types: 1-byte int, 2-byte int, 3-byte int, and so on. We thus describe p-code operands as having a type behavior: integral or floating-point. Note there are two ancillary behaviors any and copy to describe the operands of truly typeless operators, like JitCopyOp.

Size

When paired with a varnode's size, we have enough information to start mapping p-code types to JVM types. For float types, p-code only supports specific sizes defined by IEEE 754: 2-byte half-precision, 4-byte single-precision, 8-byte double-precision, 10-byte extended-precision, 16-byte quadruple-precision, and 32-byte octuple-precision. Some p-code types map precisely to JVM counterparts: The 4- and 8-byte integer types map precisely to the JVM's int and long types. Similarly, the 4- and 8-byte float types map precisely to float and double. TODO: The JIT translator does not currently support integral types greater than 8 bytes (64 bits) in size nor floating-point types other than 4 and 8 bytes (single and double precision) in size.

Signedness

All floating-point types are signed, whether in p-code or in the JVM, so there's little to consider in terms of mapping. Some p-code operators have signed operands, some have unsigned operands, and others have no signedness at all. In contrast, no JVM bytecodes are strictly unsigned. They are either signed or have no signedness. It was a choice of the Java language designers that all variables would be signed, and this is consequence of that choice. In time, "unsigned" operations were introduced in the form of static methods, e.g., Integer.compareUnsigned(int, int) and Long.divideUnsigned(long, long). Note that at the bit level, unsigned multiplication is the same as signed, and so no "unsigned multiply" method was provided. This actually aligns well with p-code in that, for this aspect of signedness, the variables are all the same. Instead the operations apply the type interpretation. Thus, we need not consider signedness when allocating JVM locals.

Conversions and Casts

Conversions between JVM primitive types must be explicit in the emitted bytecode, even if the intent is just to re-cast the bits. This is not the case for p-code. Conversions in p-code need only be explicit when they mutate the actual bits. Consider the following p-code:

 $U00:4 = FLOAT_ADD r0, r1
 r2     = INT_2COMP $U00:4
 

The native translation to bytecode:

 FLOAD  1 # r0
 FLOAD  2 # r1
 FADD
 FSTORE 3 # $U00:4
 LDC    0
 ILOAD  3 # $U00:4
 ISUB
 ISTORE 4 # r2
 

Will cause an error when loading the class. This is because the local variable 3 must be one of int or float, and the bytecode must declare which, so either the FSTORE 3 or the ILOAD 3 will fail the JVM's type checker. To resolve this, we could assign the type float to local variable 3, and change the erroneous ILOAD 3 to:

 FLOAD  3
 INVOKESTATIC Float.floatToRawIntBits(float)
 

At this point, the bit-vector contents of $U00:4 are on the stack, but for all the JVM cares, they are now an int. We must assigned a JVM type to each local we allocate and place bitwise type casts wherever the generated bytecodes would cause type disagreement. We would like to assign JVM types in a way that reduces the number of INVOKESTATIC bytecodes emitted. One could argue that we should instead seek to reduce the number of INVOKESTATIC bytecodes actually executed, but I pray the JVM's JIT compiler can recognize calls to Float.floatToRawIntBits(float) and similar and emit no native code for them, i.e., they ought to have zero run-time cost.

Size conversions cause a similar need for explicit conversions, for two reasons: 1) Any conversion between JVM int and long still requires specific bytecodes. Neither platform supports implicit conversion between float and double. 2) We allocate the smaller JVM integral type to accommodate each p-code integral type, so we must apply masks in some cases to assure values to do not exceed their p-code varnode size. Luckily, p-code also requires explicit conversions between sizes, e.g., using zext. However, we often have to perform temporary conversions in order to meet the type/size requirements of JVM bytecodes.

Consider r2 = INT_MULT r0, r1 where the registers are all 5 bytes. Thus, the registers are allocated as JVM locals of type long. We load r0 and r1 onto the stack, and then we emit an Opcodes.LMUL. Technically, the result is another JVM long, which maps to an 8-byte p-code integer. Thus, we must apply a mask to "convert" the result to a 5-byte p-code integer before storing the result in r2's JVM local.

Type Assignment

Given that only behavior and size require any explicit conversions, we omit signedness from the formal definition of p-code type. It is just the behavior applied to a size, e.g., int3.

We use a fairly straightforward voting algorithm that examines how each variable definition is used. The type of an operand is trivially determined by examining the behavior of each operand, as specified by the p-code opcode; and the size of the input varnode, specified by the specific p-code op instance. For example, the p-code op $U00:4 = FLOAT_ADD r0, r1 has an output operand of float4. Thus, it casts a vote that $U00:4 should be that type. However, the subsequent op r2 = INT_2COMP $U00 casts a vote for int4. We prefer an int when tied, so we assign $U00:4 the type int4.

This become complicated in the face of typeless ops, namely copy and phi. Again, we'd like to reduce the number of casts we have to emit in the bytecode. Consider the op r1 = COPY r0. This should emit a load followed immediately by a store, but The JVM will require both the source and destination locals to have the same type. Otherwise, a cast is necessary. The votes regarding r0 will thus need to incorporate the votes regarding r1 and vice versa.

Our algorithm is a straightforward queued traversal of the use-def graph until convergence. First, we initialize a queue with all values (variables and constants) in the graph and initialize all type assignments to any. We then process each value in the queue until it is empty. A value receives votes from its uses as required by each operand. integer and float behaviors count as 1 vote for that behavior. The any behavior contributes 0 votes. If the behavior is copy, then we know the use is either a copy or phi op, so we fetch its output value. The op casts its vote for the tentative type of that output value. Similar is done for the value's defining op, if applicable. If it's a copy or phi, we start a sub contest where each input/option casts a vote for its tentative type. The defining op's vote is cast according to the winner of the sub contest. Ties favor integer. The final winner is computed and the tentative type assignment is updated. If there are no votes, the tentative assignment is JitTypeBehavior.ANY.

When an update changes the tentative type assignment of a value, then all its neighbors are added back to the queue. Neighbors are those values connected to this one via a copy or phi. When the queue is empty, the tentative type assignments are made final. Any assignment that remains any is treated as if int. TODO: Prove that this algorithm always terminates.

  • Constructor Details

    • JitTypeModel

      public JitTypeModel(JitDataFlowModel dfm)
      Construct the type model
      Parameters:
      dfm - the data flow model whose use-def graph to process
  • Method Details

    • computeNewAssignment

      protected JitTypeBehavior computeNewAssignment(JitVal v)
      Compute the new tentative assignment for the given value

      As discussed in the "voting" section of JitTypeModel, this tallies up the votes among the values's uses and defining op then selects the winner.

      Parameters:
      v - the value
      Returns:
      the new assignment
    • queueNeighbors

      protected void queueNeighbors(JitVal v)
      Re-add the given value's neighbors to the processing queue.

      Neighbors are any values connected to the given one via JitCopyOp or JitPhiOp — or any op with an operand requiring JitTypeBehavior.COPY if additional ones should appear in the future. This is necessary because those ops may change their vote now that this value's tentative type has changed. Note if the value is already in the queue, it need not be added again. Thus, the queue is a SequencedSet.

      Parameters:
      v - the value whose neighbors to re-process
    • analyze

      protected void analyze()
      Perform the analysis

      This queues every value up to be processed at least once and then runs the algorithm to termination. Each value in the queue is removed and a voting contest run to update its type assignment. If the new assignment differs from its old assignment, its neighbors (if any) are re-added to the queue.

    • typeOf

      public JitType typeOf(JitVal v)
      Get the final type assignment for the given value
      Parameters:
      v - the value
      Returns:
      the value's assigned type