Class JitTypeModel
This implements the Type Assignment phase of the JitCompiler using a very basic "voting"
algorithm. The result is an assignment of type to each JitVal. To be clear, at this
phase, we're assigning types to variables (and constants) in the use-def graph, not varnodes.
Later we do another bit of "voting" to determine the type of each JVM local allocated to a
varnode. Perhaps we could be more direct, but in anticipation of future optimizations, we keep
this analysis at the per-variable level. This is partly an artifact of exploration before
deciding to allocate by varnode instead of by variable.
Types in P-code and the JVM
P-code (and Sleigh) is a relatively type free language. Aside from size, variables have no type;
they are just bit vectors. The operators are typed and cast the bits as required. This aligns
well with most machine architectures. Registers are just bit vectors, and the instructions
interpret them according to some type. In contrast, JVM variables have a type: int,
long, float, double, or a reference. Conversions between JVM types must
be explicit, so we must attend to certain aspects of p-code types when consuming operands
allocated in JVM locals. There are three aspects to consider when translating p-code types to the
JVM: behavior, size, and signedness.
Behavior: Integer vs. Float
The JVM has two integral types int and long of 4 and 8 bytes respectively. P-code
has one integral type of no specified size. Or rather, it has as many integral types: 1-byte int,
2-byte int, 3-byte int, and so on. We thus describe p-code operands as having a type
behavior: integral or floating-point. Note there are
two ancillary behaviors any and copy to describe the operands of truly typeless
operators, like JitCopyOp.
Size
When paired with a varnode's size, we have enough information to start mapping p-code types to
JVM types. For float types, p-code only supports specific sizes defined by IEEE 754: 2-byte
half-precision, 4-byte single-precision, 8-byte double-precision, 10-byte extended-precision,
16-byte quadruple-precision, and 32-byte octuple-precision. Some p-code types map precisely to
JVM counterparts: The 4- and 8-byte integer types map precisely to the JVM's int and
long types. Similarly, the 4- and 8-byte float types map precisely to float and
double. TODO: The JIT translator does not currently support integral types greater
than 8 bytes (64 bits) in size nor floating-point types other than 4 and 8 bytes (single and
double precision) in size.
Signedness
All floating-point types are signed, whether in p-code or in the JVM, so there's little to
consider in terms of mapping. Some p-code operators have signed operands, some have unsigned
operands, and others have no signedness at all. In contrast, no JVM bytecodes are strictly
unsigned. They are either signed or have no signedness. It was a choice of the Java language
designers that all variables would be signed, and this is consequence of that choice. In time,
"unsigned" operations were introduced in the form of static methods, e.g.,
Integer.compareUnsigned(int, int) and Long.divideUnsigned(long, long). Note that
at the bit level, unsigned multiplication is the same as signed, and so no "unsigned multiply"
method was provided. This actually aligns well with p-code in that, for this aspect of
signedness, the variables are all the same. Instead the operations apply the type interpretation.
Thus, we need not consider signedness when allocating JVM locals.
Conversions and Casts
Conversions between JVM primitive types must be explicit in the emitted bytecode, even if the intent is just to re-cast the bits. This is not the case for p-code. Conversions in p-code need only be explicit when they mutate the actual bits. Consider the following p-code:
$U00:4 = FLOAT_ADD r0, r1 r2 = INT_2COMP $U00:4
The native translation to bytecode:
FLOAD 1 # r0 FLOAD 2 # r1 FADD FSTORE 3 # $U00:4 LDC 0 ILOAD 3 # $U00:4 ISUB ISTORE 4 # r2
Will cause an error when loading the class. This is because the local variable 3 must be one of
int or float, and the bytecode must declare which, so either the FSTORE 3
or the ILOAD 3 will fail the JVM's type checker. To resolve this, we could assign the
type float to local variable 3, and change the erroneous ILOAD 3 to:
FLOAD 3
INVOKESTATIC Float.floatToRawIntBits(float)
At this point, the bit-vector contents of $U00:4 are on the stack, but for all the JVM
cares, they are now an int. We must assigned a JVM type to each local we allocate and
place bitwise type casts wherever the generated bytecodes would cause type disagreement. We would
like to assign JVM types in a way that reduces the number of INVOKESTATIC bytecodes
emitted. One could argue that we should instead seek to reduce the number of INVOKESTATIC
bytecodes actually executed, but I pray the JVM's JIT compiler can recognize calls to
Float.floatToRawIntBits(float) and similar and emit no native code for them, i.e., they
ought to have zero run-time cost.
Size conversions cause a similar need for explicit conversions, for two reasons: 1) Any
conversion between JVM int and long still requires specific bytecodes. Neither
platform supports implicit conversion between float and double. 2) We allocate
the smaller JVM integral type to accommodate each p-code integral type, so we must apply masks in
some cases to assure values to do not exceed their p-code varnode size. Luckily, p-code also
requires explicit conversions between sizes, e.g., using zext. However,
we often have to perform temporary conversions in order to meet the type/size requirements of JVM
bytecodes.
Consider r2 = INT_MULT r0, r1 where the registers are all 5 bytes. Thus, the registers
are allocated as JVM locals of type long. We load r0 and r1 onto the
stack, and then we emit an Opcodes.LMUL. Technically, the result is another JVM
long, which maps to an 8-byte p-code integer. Thus, we must apply a mask to "convert" the
result to a 5-byte p-code integer before storing the result in r2's JVM local.
Type Assignment
Given that only behavior and size require any explicit conversions, we omit signedness from the
formal definition of p-code type. It is just the behavior applied to a size,
e.g., int3.
We use a fairly straightforward voting algorithm that examines how each variable definition is
used. The type of an operand is trivially determined by examining the behavior of each operand,
as specified by the p-code opcode; and the size of the input varnode, specified by the specific
p-code op instance. For example, the p-code op $U00:4 = FLOAT_ADD r0, r1 has an output
operand of float4. Thus, it casts a vote that $U00:4 should be
that type. However, the subsequent op r2 = INT_2COMP $U00 casts a vote for
int4. We prefer an int when tied, so we assign $U00:4 the
type int4.
This become complicated in the face of typeless ops, namely copy and
phi. Again, we'd like to reduce the number of casts we have to emit in the
bytecode. Consider the op r1 = COPY r0. This should emit a load followed immediately by a
store, but The JVM will require both the source and destination locals to have the same type.
Otherwise, a cast is necessary. The votes regarding r0 will thus need to incorporate the
votes regarding r1 and vice versa.
Our algorithm is a straightforward queued traversal of the use-def graph until convergence.
First, we initialize a queue with all values (variables and constants) in the graph and
initialize all type assignments to any. We then process each value in
the queue until it is empty. A value receives votes from its uses as required by each operand.
integer and float behaviors count as 1
vote for that behavior. The any behavior contributes 0 votes. If the
behavior is copy, then we know the use is either a copy or phi op, so we fetch its output value. The op casts its vote for the
tentative type of that output value. Similar is done for the value's defining op, if applicable.
If it's a copy or phi, we start a sub contest where each input/option casts a vote for its
tentative type. The defining op's vote is cast according to the winner of the sub contest. Ties
favor integer. The final winner is computed and the tentative
type assignment is updated. If there are no votes, the tentative assignment is
JitTypeBehavior.ANY.
When an update changes the tentative type assignment of a value, then all its neighbors are added
back to the queue. Neighbors are those values connected to this one via a copy or phi. When the
queue is empty, the tentative type assignments are made final. Any assignment that remains
any is treated as if int.
TODO: Prove that this algorithm always terminates.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprotected static final recordA contest to determine a type assignment -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected voidanalyze()Perform the analysisprotected JitTypeBehaviorCompute the new tentative assignment for the given valueprotected voidRe-add the given value's neighbors to the processing queue.Get the final type assignment for the given value
-
Constructor Details
-
JitTypeModel
Construct the type model- Parameters:
dfm- the data flow model whose use-def graph to process
-
-
Method Details
-
computeNewAssignment
Compute the new tentative assignment for the given valueAs discussed in the "voting" section of
JitTypeModel, this tallies up the votes among the values's uses and defining op then selects the winner.- Parameters:
v- the value- Returns:
- the new assignment
-
queueNeighbors
Re-add the given value's neighbors to the processing queue.Neighbors are any values connected to the given one via
JitCopyOporJitPhiOp— or any op with an operand requiringJitTypeBehavior.COPYif additional ones should appear in the future. This is necessary because those ops may change their vote now that this value's tentative type has changed. Note if the value is already in the queue, it need not be added again. Thus, the queue is aSequencedSet.- Parameters:
v- the value whose neighbors to re-process
-
analyze
protected void analyze()Perform the analysisThis queues every value up to be processed at least once and then runs the algorithm to termination. Each value in the queue is removed and a voting contest run to update its type assignment. If the new assignment differs from its old assignment, its neighbors (if any) are re-added to the queue.
-
typeOf
Get the final type assignment for the given value- Parameters:
v- the value- Returns:
- the value's assigned type
-