A Survey of Software Watermarking by Code Re-Ordering

August 14, 2010 - 22:08

We survey the proposed software watermarking algorithms based on code re-ordering. This family of static watermarks use semantics-preserving transformations to encode a watermark in a permutation of the existing code. We describe the existing techniques and highlight the short-comings of these algorithms, namely that they are highly susceptible to semantics preserving transformations attacks.

It is well known that information can be hiding in the order of lists1 and this can be exploited for watermarking purposes. Given a list of size n we can store log2n! bits as there are n! permutations of the items in n. See my previous blog post for more information.

The full paper is summarised on this page; the full paper contains more details and examples.

Basic Block Re-Ordering

Davidson and Myhrvold2 proposed one of the first software watermarking algorithms which encodes the watermark by basic block re-ordering. The embedding algorithm was described in a patent issued to Microsoft but the extraction algorithm was not discussed. Collberg et al. 3 proposed a method of watermark extraction and implemented the DM algorithm in Sandmark.

Hattanda et al.4 evaluated the DM watermarking algorithm by watermarking several C programs and analysing metrics such as program size and program performance. In their implementation they found that the size increase of a watermarked program was between 9% and 24% while the performance was 86% to 102% of the original program. The 2% performance increase was due to the re-ordered code containing no redundant jump instructions, and that it showed more locality than the original - greater locality increases performance and increasing locality is a common optimisation performed by compilers5.

Hattanda et al. reported a data-rate of approximately 0.2% of program size (in bytes) based on their implementation that used a partial permutation scheme, which only used 6 basic blocks. Collberg et al.'s implementation was not constrained in this way and the data rate is dependent on the number of basic blocks in program methods.

The DM algorithm is highly unstealthy due to the fact that a normal compiler would not linearise the control flow graph as in DM watermarked programs 6. A simple way to discover a DM watermark is to examine the ratio of goto statements to the total number of instructions - programs with the DM watermark show a high ratio compared to an unwatermarked program 3.

The biggest flaw with the DM watermark is that it is highly fragile; that is, it is not resilient to semantics-preserving transformations. For example, any transformation which re-ordered the basic blocks would eliminate the watermark7.

Equation Re-Ordering

Shirali-Shahrez et al.8 proposed a software watermark scheme based the re-ordering of operations in mathematical equations. The idea involves re-ordering symmetric mathematical operations, such as addition, to preserve program semantics.

The data-rate of this watermarking technique is related to the number of safe-swappable operations within a program. It is likely that there will be many, but the watermark will probably need to be split into pieces as most equations will only encode a small number of bits individually.

Zonglu et al.9 proposed a very similar technique using a re-ordering based on operand coefficients. Neither of these techniques cannot be applied to source-code as the compiler itself may re-order the operands and even when applied to bytecode or machine code this technique is highly susceptible to semantics-preserving transformations. Any re-ordering of the operands after watermarking will remove the watermark.

Constant Pool Re-Ordering

Gong et al.10 proposed a watermarking scheme, CPW, for Java based on the ordering of a class file's constant pool.

The constant pool is an array of variable length elements containing every constant used in the Java class 11. Constants are referenced by an index throughout the bytecode. Some entries in the constant pool are direct references to a constant while other entries are references to other members in the constant pool - these are indirect constants. Each constant in the pool is preceded by a tag denoting it's type, for example class, integer, Utf812.

The CPW scheme involves re-ordering the direct constants corresponding to the Wth permutation of the direct constants where W is an integer watermark.

Gong et al. evaluated their algorithm by embedding watermarks into 4000 random class files downloading from the Internet and concluded that it has a good robustness but small capacity. However, the CPW algorithm is far from robust - any further re-ordering of the constant pool would destroy the watermark.

Conclusion

We have presented a survey of software watermarking schemes based on code re-ordering. This family of watermarks are highly susceptible to semantics-preserving transformation attacks7, and can be unstealthy. The DM watermark is highly unstealthy due to addition of large numbers to goto statements inserted to preserve the original control-flow after re-ordering.

Any further re-ordering of a program watermarked with any of the presented algorithms will likely destroy the watermark. In fact, even re-watermarking the program will likely destroy the watermark.

Academic research continues in this area, with the latest paper published last year9. We believe that further research should instead focus on dynamic software watermarking techniques which, in theory, should be resilient to semantics-preserving transformations; thus providing a robust technique for the protection of software.


Download the full paper which contains more details and examples.

AttachmentSize
PDF icon Draft Paper260.5 KB