Step-by-Step Java Decompilation Example

August 18, 2010 - 13:12

On this page we present a step-by-step decompilation of the Jasmin code in listing 1, using a stack evaluation method and taking advantage of patterns in how Java source is compiled to Java bytecode.

Listing 1

.method public static sum([I)V
    .limit stack 3
    .limit locals 3
a:
    iconst_0
    istore_1
    iconst_0
    istore_2
b:
    iload_2
    aload_0
    arraylength
    if_icmpge c
    iload_1
    aload_0
    iload_2
    iaload
    iadd
    istore_1
    iinc 2 1
    goto b
c:
    iload_1
    ifge d
    getstatic java.lang.System.out Ljava/io/PrintStream;
    ldc "total is less than zero :("
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    goto e
d:
    getstatic java.lang.System.out Ljava/io/PrintStream;
    new java/lang/StringBuilder
    dup
    invokespecial java/lang/StringBuilder/<init>()V
    ldc "total is "
    invokevirtual java/lang/StringBuilder/append(Ljava/lang/String;)Ljava/lang/StringBuilder;
    iload_1
    invokevirtual java/lang/StringBuilder/append(I)Ljava/lang/StringBuilder;
    invokevirtual java/lang/StringBuilder/toString()Ljava/lang/String;
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
e:
    return
.end method

Firstly, we can see that the method signature, public static sum([I)V, is similar to Java source: it is public and static, the name of the method is sum, and the parameters are shown in parentheses. However, the syntax for parameters is not the same as in Java, and the method return type is at the end of the signature.

In Java bytecode the I denotes the int type and the square bracket preceding it declares a one-dimensional array; notice also that the parameter does not have a name. The capital V denotes the void return type. So we can deduce that the method signature in Java source will be (we've made up the variable name):

public static void sum(int[] numbers)
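
As a cross-check of the descriptor reading above, we can ask the JDK to parse the descriptor string for us. This is only a sketch: java.lang.invoke.MethodType is a standard library class, while the class name DescriptorDemo and the method parse are made up for illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper class; only MethodType is a real JDK API.
class DescriptorDemo {
    // Parses a JVM method descriptor such as "([I)V" into parameter and return types.
    static MethodType parse(String descriptor) {
        // A null loader means class names are resolved on the system class loader.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        MethodType type = parse("([I)V");
        // Parameter type of sum, then its return type.
        System.out.println(type.parameterType(0).getSimpleName());
        System.out.println(type.returnType().getSimpleName());
    }
}
```

The parameter name is not part of the descriptor, which is why we had to invent numbers ourselves.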

The next two lines are directives for the Jasmin assembler rather than bytecode instructions: the maximum stack height and the number of local variables that this method uses.

The first bytecode instruction, iconst_0, pushes a 0 onto the stack and the next bytecode instruction, istore_1, pops an integer from the stack and stores it in local variable slot 1. Local variable slot 0 contains the parameter array (if this were an instance method, local variable slot 0 would contain a reference to the instance of the class, i.e. the this variable in Java). This combination can be thought of as assigning the value of 0 to an integer variable in Java; we must also declare the variable if it hasn't already been declared:

int total = 0;

Decompilation Pattern - Integer Variable Assignment

Similarly, the next two bytecode instructions declare another integer variable and assign it the value 0:

int i = 0;
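
The push-then-store pattern can be sketched as a small helper that emits a declaration the first time a slot is stored to and a plain assignment after that. The class StoreDecompiler, its method istore and the way variable names are passed in are all assumptions made for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers which local variable slots are already declared.
class StoreDecompiler {
    private final Set<Integer> declaredSlots = new HashSet<>();

    // Turns "push <expr>; istore <slot>" into a Java statement.
    // The first store to a slot also declares the variable.
    String istore(int slot, String name, String expr) {
        boolean firstStore = declaredSlots.add(slot);
        return (firstStore ? "int " : "") + name + " = " + expr + ";";
    }

    public static void main(String[] args) {
        StoreDecompiler d = new StoreDecompiler();
        System.out.println(d.istore(1, "total", "0")); // iconst_0, istore_1
        System.out.println(d.istore(2, "i", "0"));     // iconst_0, istore_2
        // A later store to slot 1 is an assignment, not a declaration.
        System.out.println(d.istore(1, "total", "total + numbers[i]"));
    }
}
```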

The next three instructions can be considered together: iload_2, aload_0 and arraylength. When we're decompiling a program we can use the stack to build up expressions by pushing and popping the variables and expressions, instead of their values. The first instruction pushes the integer in local variable slot 2 (our variable i) onto the stack; the second pushes the parameter array onto the stack. So after these instructions our expression stack is:

numbers
i

Next, the arraylength instruction pops an array from the stack and pushes its length. We pop the numbers variable from our expression stack and push the numbers.length expression:

numbers.length
i
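
A minimal sketch of the expression stack for these three instructions, using java.util.ArrayDeque as the stack. The slot-to-name table and the class name ExpressionStackDemo are assumptions; the variable names are the ones we invented earlier on this page.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo: the stack holds Java expressions (as strings), not values.
class ExpressionStackDemo {
    // Slot names: 0 = numbers, 1 = total, 2 = i.
    static final String[] NAMES = {"numbers", "total", "i"};

    static Deque<String> run() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(NAMES[2]);           // iload_2
        stack.push(NAMES[0]);           // aload_0
        String array = stack.pop();     // arraylength pops the array...
        stack.push(array + ".length");  // ...and pushes its length
        return stack;
    }

    public static void main(String[] args) {
        // Printed with the top of the stack first.
        run().forEach(System.out::println);
    }
}
```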

 

The next instruction, if_icmpge, is an integer comparison jump instruction; this instruction pops two integers a, b from the stack (a was pushed first, b was on top) and jumps to the given label if a >= b. This is not, however, to be decompiled as an if statement in the Java source; it is, in fact, the condition of a loop. We know that this is a while loop because the instruction before the target of the jump instruction is a goto instruction; this goto instruction jumps backward, to before the conditional instruction, which indicates that it is a while loop and not an if statement. A do-while loop would typically look different: its condition is tested after the body, so the conditional jump itself goes backward and no goto is needed. We must use the inverse of the condition in our while loop because the bytecode condition makes control jump past the end of the loop when it is true, whereas a Java while condition states when the body should run. Our condition is therefore i < numbers.length.
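
Inverting the branch condition is mechanical. The sketch below maps each two-operand integer compare-and-branch opcode to the Java operator of the inverted condition; the class ConditionInverter and the method loopCondition are names invented for this illustration.

```java
import java.util.Map;

// Hypothetical helper for turning a branch opcode into a Java loop condition.
class ConditionInverter {
    // Branch opcode -> Java operator for the INVERSE of the branch condition.
    static final Map<String, String> INVERSE = Map.of(
        "if_icmpeq", "!=",
        "if_icmpne", "==",
        "if_icmplt", ">=",
        "if_icmpge", "<",
        "if_icmpgt", "<=",
        "if_icmple", ">");

    // left was pushed first; right was on top of the stack.
    static String loopCondition(String opcode, String left, String right) {
        return left + " " + INVERSE.get(opcode) + " " + right;
    }

    public static void main(String[] args) {
        System.out.println(loopCondition("if_icmpge", "i", "numbers.length"));
    }
}
```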

Decompilation Pattern - While Loop

The next three instructions, iload_1, aload_0 and iload_2, push the values of the three variables in our method onto the stack; we push the variables onto our expression stack:

i
numbers
total

 

The next instruction, iaload, pops an integer i and an array reference arr from the stack; it then pushes the value at index i in the array arr onto the stack. We push the expression numbers[i] onto our expression stack:

numbers[i]
total

Decompilation Pattern - Integer Array

Next, the instruction iadd pops two integers from the stack and pushes their sum back onto the stack. We therefore pop the two expressions from our expression stack and push the expression total + numbers[i].

The following instruction, istore_1, pops an integer from the stack and stores it in local variable slot 1. We pop an expression from our expression stack and assign it to our variable total.

total = total + numbers[i];
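
The whole loop body can be replayed on an expression stack in a few lines. This is a sketch under the same assumptions as before (our invented variable names, and a made-up class LoopBodyDemo); it hard-codes the instruction sequence rather than parsing Jasmin.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo: replays the loop body instructions symbolically.
class LoopBodyDemo {
    static String decompileBody() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("total");    // iload_1
        stack.push("numbers");  // aload_0
        stack.push("i");        // iload_2

        // iaload: pop the index, then the array reference; push the element expression.
        String index = stack.pop();
        String array = stack.pop();
        stack.push(array + "[" + index + "]");

        // iadd: pop the right operand, then the left operand; push their sum.
        String right = stack.pop();
        String left = stack.pop();
        stack.push(left + " + " + right);

        // istore_1: pop the expression and assign it to the variable in slot 1.
        return "total = " + stack.pop() + ";";
    }

    public static void main(String[] args) {
        System.out.println(decompileBody());
    }
}
```

Note the order of the pops: the operand on top of the stack is the right-hand one, which matters for non-commutative operations such as subtraction.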

 

The last instruction within the loop is an integer increment instruction iinc 2 1 - it increments the integer value in local variable slot 2 by 1. This is equivalent to i++.
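
A small sketch of how iinc might be rendered for other increments as well. The class IincDemo and the choice of output forms (++, --, += and -=) are our own assumptions, not a fixed rule.

```java
// Hypothetical helper: renders "iinc <slot> <amount>" for a named variable.
class IincDemo {
    static String iinc(String name, int amount) {
        if (amount == 1) return name + "++;";
        if (amount == -1) return name + "--;";
        return amount > 0
            ? name + " += " + amount + ";"
            : name + " -= " + (-amount) + ";";
    }

    public static void main(String[] args) {
        System.out.println(iinc("i", 1)); // iinc 2 1, with slot 2 named i
    }
}
```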

Decompilation Pattern - Integer Increment

So the entire while loop looks like this:

int i = 0;
while(i < numbers.length) {
    total = total + numbers[i];
    i++;
}

 

After the loop, the first instruction is iload_1 which pushes the value in local variable slot 1 onto the stack; we push the variable onto our expression stack:

total

 

Next, we have another conditional jump instruction, ifge d, which pops an integer value from the stack and jumps to the specified label if the value is greater than or equal to zero. This time we are not dealing with a while loop; we know this because the goto instruction preceding the jump target is a forward jump, rather than a backward jump. Therefore, the code between the conditional jump and the goto is the then clause of the if; the code from the goto until the goto's jump target is the else clause. Again, we must invert the condition, giving us total < 0.
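
The rule we have used twice now, looking at the direction of the goto that sits just before the conditional jump's target, can be written as a tiny classifier. This is a sketch: instructions are assumed to be numbered from 0 in the order they appear in listing 1 (labels and directives not counted), and the names BranchClassifier, Kind and classify are invented.

```java
// Hypothetical classifier for the two patterns seen in listing 1.
class BranchClassifier {
    enum Kind { WHILE_LOOP, IF_THEN_ELSE }

    // condIndex:  index of the conditional jump instruction
    // gotoTarget: index targeted by the goto just before the conditional jump's target
    static Kind classify(int condIndex, int gotoTarget) {
        // A goto back to (or before) the condition closes a loop;
        // a forward goto skips over an else clause.
        return gotoTarget <= condIndex ? Kind.WHILE_LOOP : Kind.IF_THEN_ELSE;
    }

    public static void main(String[] args) {
        // if_icmpge c is instruction 7; goto b targets instruction 4.
        System.out.println(classify(7, 4));
        // ifge d is instruction 17; goto e targets instruction 32.
        System.out.println(classify(17, 32));
    }
}
```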

Decompilation Pattern - If-Then-Else

The first instruction in the body of the then clause is getstatic which pushes a reference to the specified static field onto the stack - in this case the field java.lang.System.out. The next instruction, ldc, pushes a constant onto the stack. Our expression stack therefore looks like this:

"total is less than zero :("
java.lang.System.out

 

The next instruction invokevirtual invokes the specified instance method, by first popping the correct number of arguments from the stack and lastly popping the object reference, on which the instance method acts, from the stack. The method specified is java/io/PrintStream/println(Ljava/lang/String;)V - that is, in Java, the void println(String s) in the java.io.PrintStream class. The object reference java.lang.System.out, on the bottom of the stack, is an instance of java.io.PrintStream. We therefore pop the string from the stack, followed by the object reference and build our method call for Java:

java.lang.System.out.println("total is less than zero :(");
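
The popping order for invokevirtual can be sketched as follows. We use java.lang.invoke.MethodType from the JDK to count the parameters in the descriptor; the class InvokeVirtualDemo and its method are assumptions, and the method name is passed separately rather than parsed out of the Jasmin operand.

```java
import java.lang.invoke.MethodType;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo: builds a Java call expression from the expression stack.
class InvokeVirtualDemo {
    // Pops the arguments (last argument first), then the receiver, and builds the call.
    // For a non-void method the caller would push the result back onto the stack.
    static String invokeVirtual(Deque<String> stack, String methodName, String descriptor) {
        int argCount = MethodType.fromMethodDescriptorString(descriptor, null).parameterCount();
        String[] args = new String[argCount];
        for (int k = argCount - 1; k >= 0; k--) {
            args[k] = stack.pop();
        }
        String receiver = stack.pop();
        return receiver + "." + methodName + "(" + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("java.lang.System.out");            // getstatic
        stack.push("\"total is less than zero :(\"");  // ldc
        System.out.println(invokeVirtual(stack, "println", "(Ljava/lang/String;)V") + ";");
    }
}
```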

 

The else clause is similar to the then clause but we build up a longer string using a java.lang.StringBuilder instance. The first instruction, again, pushes java.lang.System.out onto the stack. The next instruction is new which creates an instance of the specified class and pushes a reference to the instance onto the stack. This is followed by dup which duplicates the item at the top of the stack, giving us an expression stack like this:

new java.lang.StringBuilder()
new java.lang.StringBuilder()
java.lang.System.out

 

The invokespecial instruction is then used to call java.lang.StringBuilder's constructor; it pops the object reference from the top of the stack. The next instruction, ldc, pushes the constant "total is " (note the trailing space) onto the stack:

"total is "
new java.lang.StringBuilder()
java.lang.System.out
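
The new, dup, invokespecial sequence can be replayed on the expression stack as well. In this sketch (the class name NewDupInitDemo is invented) the constructor call simply consumes the duplicated entry, leaving the single new expression that the later append calls act on.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo: replays the start of the else clause symbolically.
class NewDupInitDemo {
    static Deque<String> run() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("java.lang.System.out");           // getstatic
        stack.push("new java.lang.StringBuilder()");  // new
        stack.push(stack.peek());                     // dup: copy the top entry
        stack.pop();                                  // invokespecial: constructor consumes one copy
        stack.push("\"total is \"");                  // ldc
        return stack;
    }

    public static void main(String[] args) {
        // Printed with the top of the stack first.
        run().forEach(System.out::println);
    }
}
```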

 

The invokevirtual instruction then pops the string from the stack, followed by the java.lang.StringBuilder instance and invokes the public StringBuilder java.lang.StringBuilder.append(String s) method; it pushes the result back onto the stack:

new java.lang.StringBuilder().append("total is ")
java.lang.System.out

 

Then iload_1 is used to push the integer value in local variable slot 1 onto the stack. In our case, we push the variable total onto the stack and then use invokevirtual to call the append method again:

new java.lang.StringBuilder().append("total is ").append(total)
java.lang.System.out

 

Finally, we invoke the public String java.lang.StringBuilder.toString() method and invoke the public void java.io.PrintStream.println(String s) method to print out the result. We end up with an empty stack and the following code:

java.lang.System.out.println(new java.lang.StringBuilder().append("total is ").append(total).toString());
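
The chained StringBuilder call builds the same string as ordinary concatenation with the + operator. A quick check, using a made-up class ConcatEquivalenceDemo and arbitrary sample values for total:

```java
// Hypothetical demo comparing the decompiled chain with plain concatenation.
class ConcatEquivalenceDemo {
    // The expression exactly as recovered from the bytecode.
    static String viaBuilder(int total) {
        return new StringBuilder().append("total is ").append(total).toString();
    }

    // The equivalent expression a programmer would normally write.
    static String viaConcat(int total) {
        return "total is " + total;
    }

    public static void main(String[] args) {
        System.out.println(viaBuilder(42));
        System.out.println(viaBuilder(42).equals(viaConcat(42)));
    }
}
```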

 

Putting all this together gives us the Java code in listing 2. We can tidy the code up slightly by removing java.lang., converting the while loop to a for loop and turning the StringBuilder into standard string concatenation to obtain listing 3.

Listing 2

public static void sum(int[] numbers) {
	int total = 0;
	
	int i = 0;
	while(i < numbers.length) {
		total = total + numbers[i];
		i++;
	}
	
	if(total < 0) {
		java.lang.System.out.println("total is less than zero :(");
	}else{
		java.lang.System.out.println(new java.lang.StringBuilder().append("total is ").append(total).toString());
	}
	}
}

Listing 3

public static void sum(int[] numbers) {
	int total = 0;
	
	for(int i = 0; i < numbers.length; i++) {
		total += numbers[i];
	}
	
	if(total < 0) {
		System.out.println("total is less than zero :(");
	}else{
		System.out.println("total is " + total);
	}
}
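
As a sanity check that the tidying did not change behaviour, we can run both versions on a few inputs and compare what they print. This harness is a sketch of our own: the class SumListingsCheck, the helper capture and the sample arrays are assumptions, and the two sum methods are renamed copies of listings 2 and 3.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical harness comparing the output of listing 2 and listing 3.
class SumListingsCheck {
    // Listing 2, as decompiled.
    static void sumListing2(int[] numbers) {
        int total = 0;
        int i = 0;
        while (i < numbers.length) {
            total = total + numbers[i];
            i++;
        }
        if (total < 0) {
            java.lang.System.out.println("total is less than zero :(");
        } else {
            java.lang.System.out.println(
                new java.lang.StringBuilder().append("total is ").append(total).toString());
        }
    }

    // Listing 3, tidied up.
    static void sumListing3(int[] numbers) {
        int total = 0;
        for (int i = 0; i < numbers.length; i++) {
            total += numbers[i];
        }
        if (total < 0) {
            System.out.println("total is less than zero :(");
        } else {
            System.out.println("total is " + total);
        }
    }

    // Runs the action while capturing everything written to System.out.
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            action.run();
        } finally {
            System.setOut(original); // always restore the real stream
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        int[][] samples = { {}, {1, 2, 3}, {-5, 2} };
        for (int[] sample : samples) {
            String a = capture(() -> sumListing2(sample));
            String b = capture(() -> sumListing3(sample));
            System.out.print(a);
            System.out.println(a.equals(b) ? "listings agree" : "listings differ");
        }
    }
}
```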