What is decompilation?

July 07, 2010 - 23:38

Compilation is the act of transforming a high-level language into a low-level language such as machine code or bytecode. Decompilation is the reverse: the act of transforming a low-level language into a high-level language. Java source code is compiled into an intermediate language known as Java bytecode.

A Java virtual machine executes Java bytecode in class files conforming to the class file specification, which is part of the Java Virtual Machine Specification and was updated in JSR 202. The open specification allows tools other than Sun’s Java compiler to generate and/or manipulate Java bytecode. Java bytecode can be generated in three ways:

  1. from a Java source program using a Java compiler (such as Sun’s javac),
  2. from a program written in a language other than Java, using a compiler that targets Java bytecode (such as JGNAT), or
  3. by writing a class file by hand.

Java bytecode can also be manipulated by tools such as obfuscators and optimisers, which perform semantics-preserving transformations on the bytecode contained within a Java class file.
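Any tool that generates or manipulates class files relies on the fixed layout laid down by the class file specification. As a minimal sketch of that layout, the Java snippet below reads the first three fields of a class file: the magic number 0xCAFEBABE, the minor version and the major version, all stored big-endian. The class name ClassFileHeader, its nested Header type and the hand-written byte array are illustrative assumptions and not part of any real tool; major version 50 is the version associated with Java SE 6.

```java
import java.nio.ByteBuffer;

public class ClassFileHeader {
    /** The first three fields of a class file (hypothetical holder type). */
    public static final class Header {
        public final int magic;
        public final int minorVersion;
        public final int majorVersion;

        Header(int magic, int minorVersion, int majorVersion) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
        }
    }

    /** Parses the magic number and version from the start of a class file. */
    public static Header read(byte[] classFileBytes) {
        if (classFileBytes.length < 8) {
            throw new IllegalArgumentException("a class file header needs at least 8 bytes");
        }
        // ByteBuffer is big-endian by default, matching the class file format.
        ByteBuffer buffer = ByteBuffer.wrap(classFileBytes);
        int magic = buffer.getInt();            // u4 magic
        int minor = buffer.getShort() & 0xFFFF; // u2 minor_version
        int major = buffer.getShort() & 0xFFFF; // u2 major_version
        return new Header(magic, minor, major);
    }

    public static void main(String[] args) {
        // The first eight bytes of a class file, written by hand for illustration.
        byte[] handWritten = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor version 0
            0x00, 0x32                                          // major version 50
        };
        Header header = read(handWritten);
        System.out.println(String.format("magic=0x%08X version=%d.%d",
                header.magic, header.majorVersion, header.minorVersion));
    }
}
```

A real class file continues with the constant pool, access flags, fields, methods and attributes; this sketch stops at the header.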

Figure I shows the Java bytecode cycle from generation to decompilation to Java source. Java bytecode retains type information about fields, method return values and parameters, but it does not, for example, contain type information for local variables. The type information in the Java class file makes decompiling bytecode easier than decompiling machine code. Decompiling Java bytecode thus requires analysis of most local variable types, flattening of stack-based instructions, and structuring of loops and conditionals.
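To give a feel for what flattening stack instructions involves, here is a minimal sketch in Java. It is not how any particular decompiler works. It simulates the operand stack symbolically: loads push an expression string, arithmetic pops two expressions and pushes the combined one, and stores or returns emit a statement. The mnemonics (iload_N, istore_N, iadd, imul, ireturn) are real JVM instructions, but the class StackFlattener, the method flatten and the placeholder variable names v1, v2 and so on are assumptions made for illustration. Only straight-line integer code is handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class StackFlattener {
    /** Turns a straight-line sequence of integer stack instructions into statements. */
    public static List<String> flatten(List<String> instructions) {
        Deque<String> stack = new ArrayDeque<>();      // symbolic operand stack
        List<String> statements = new ArrayList<>();
        for (String insn : instructions) {
            if (insn.startsWith("iload_")) {
                // Push a placeholder name for local variable slot N.
                stack.push("v" + insn.substring("iload_".length()));
            } else if (insn.startsWith("istore_")) {
                // Pop the expression on top of the stack and assign it to slot N.
                String slot = insn.substring("istore_".length());
                statements.add("v" + slot + " = " + stack.pop() + ";");
            } else if (insn.equals("iadd") || insn.equals("imul")) {
                // The right operand was pushed last, so it is popped first.
                String right = stack.pop();
                String left = stack.pop();
                String op = insn.equals("iadd") ? "+" : "*";
                stack.push("(" + left + " " + op + " " + right + ")");
            } else if (insn.equals("ireturn")) {
                statements.add("return " + stack.pop() + ";");
            } else {
                throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList(
                "iload_1", "iload_2", "iadd", "iload_3", "imul", "istore_1",
                "iload_1", "ireturn");
        // Prints:
        //   v1 = ((v1 + v2) * v3);
        //   return v1;
        flatten(code).forEach(System.out::println);
    }
}
```

The sketch deliberately ignores branches, so it says nothing about the other two tasks named above: inferring local variable types and structuring loops and conditionals.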

Figure I: The Java Bytecode Cycle

The task of bytecode decompilation, however, is much harder than compilation. We show that decompilers often cannot fully perform their intended function. Decompilation has many applications, including legitimate uses, such as recovering lost source code for a crucial application, and illegitimate uses, such as reverse-engineering a proprietary application. Consider the case in which a company has lost the source code for its application and must therefore recover source code from Java class files in order to continue developing the software.

The company must decompile the Java class files and attempt to recover Java source equivalent to the lost original. In this case, in contrast to an illegitimate use, the company is likely to know more about how the Java class files were generated. That knowledge helps in recovering the original source, as a decompiler can be optimised for the compiler used.

If the purpose of decompilation is simply to understand a program, the syntactic correctness of the complete decompiled program may not be a high priority. Correct portions of an otherwise incorrect program can still aid understanding, whereas source recovery requires correct source.