# A Survey of Software Watermarking by Register Allocation (for Java Bytecode)

July 31, 2010 - 13:37

In this paper, we examine register allocation based software watermarking algorithms; these algorithms are constraint-based static software watermarking techniques. Figure 1 shows the evolution of this family of algorithms on which we report previous findings, describe some recent additions (including a correction to a published algorithm) and conclude by suggesting a direction for future work.

The full paper is summarised on this page. The full paper includes examples, further detail and diagrams.

This family of static watermarks embed the watermark in a solution to a graph colouring problem; the graph colouring is then used to allocate registers. In graph theory, the simplest form of graph colouring is a way of colouring the vertices of a graph such that no two adjacent vertices share the same colour. Register allocation is the process of allocating program variables to a finite number of CPU registers and a compiler commonly uses graph colouring to obtain the best allocation.

## The QP Algorithm

The QP algorithm 1 is a constraint-based watermarking algorithm based on the concept of graph colouring. In the QP algorithm edges are added to the graph based on the value of the watermark. When edges are added to an interference graph the vertices that become connected must be re-coloured - and they cannot be assigned the same registers. In other words, we add a new constraint to the problem (the extra edges) and when we compute the graph colouring we have a solution which solves the graph colour problem and one which also includes the watermark.

The QP algorithm was proprosed to embed watermarks in any graph colouring solution and can be applied to graph colouring for register allocation to embed watermarks in software.

The first step in the QP algorithm is to convert the message into binary, for example convert a string into binary by using ASCII codes. The next step is to add additional edges to the interference graph, in such a way that the extra edges encode the watermark. The QP algorithm requires that the vertices of the graph are indexed and relies on the ordering of such indices for embedding and extraction. The originally published QP algorithm contained a minor problem which assumed that every vertex in an arbitrary graph could contain one bit of information; Zhu et al. proposed a clarified version of the algorithm 2.

Myles et al.3 implemented the QP algorithm using register allocation; they found that a stricter embedding criteria are required due to the unpredictability of the colouring of vertices and the fact that one vertex can be used multiple times. Another, major, flaw in the QP algorithm is that it is possible to insert two different messages into an interference graph and obtained the same watermark graph 2. The QP algorithm, therefore, is not extractable4 5.

Due to these flaws in the QP algorithm, Myles et al.3 proposed an improvement which they call the QPS algorithm.

## The QPS Algorithm

The key difference between the QP and the QPS algorithms is the selection of vertices. In the QPS algorithm triples of vertices are selected such that they are isolated units that will not effect other vertices in the graph. As watermark bits can only be inserted where there are coloured triples the data-rate of this algorithm is far lower than the QP algorithm.

Zhu et al.4 provide clarified versions of the QPS algorithms which eliminate some ambiguities in the originals.

Myles et al. implemented the QPS algorithm in Sandmark6, an open-source tool for the study of software protection algorithms. They performed a variety of empirical tests to evaluate their algorithm's overall effectiveness, examining five properties: credibility, data-rate, stealthiness, part protection and resilience. The results showed that the QPS algorithm has a very low data-rate and is susceptible to a variety of simple attacks, such as obfuscations. However, they also conclude that the QPS algorithm is quite stealthy and is extremely credible. In other words, the watermarks are hard to detect by an attacker whilst readily detectable by the watermark author.

## The QPI Algorithm

The QPS algorithm is an improvement on the QP algorithm, in terms of extractability, however the QPS algorithm has a very low data-rate. Zhu et al.4 proposed a further improvement which they call the QPI algorithm. The QPI algorithm changes the definition of the nearest vertices and the original QP algorithm used cyclic mod n order for numbers 1,2,...,n, while QPI uses 1 < 2 < ... n.

The QPI extraction algorithm requires the original interference graph and the watermarked graph in order to extract the watermark message. We find the candidate vertices from the original graph, then compare the colours in the watermarked graph.

The QPI algorithm is an improvement on the QPS algorithm with an increased data-rate and it has been shown that QPI algorithm is extractable, unlike the original QP algorithm 4.

## The Colour Change Algorithm

The Colour Change (CC) algorithm 7 is another improvement on the QPS algorithm. In the CC algorithm the colouring function is modified to embed a message, rather than modifying the interference graph; the modification only occurs for 1 bits but not 0 bits. The data-rate of the CC algorithm is higher than that of QPS and QPI because each vertex in the interference graph can store 1 watermark bit.

The first step in the CC algorithm is to colour the interference graph to obtain the colouring function. We then adapt the colouring function to produce a new colouring function which includes the embedded watermark.

Lee and Kaneko 8 experimentally evaluated their CC algorithm and found that, on average, it takes 0.75 extra colours to embed a message in an interefence graph. They suggest using the CC algorithm for programs which contain many variables, as the data-rate is equal to the number of vertices in the interference graph. They introduce a second algorithm - Color Permutation - for use in programs which require a large number of registers.

## The Colour Permutation Algorithm

The Colour Permutation (CP) algorithm 7 uses a similar idea to the CC algorithm, as they both change the colouring function for an intereference graph to encode the watermark bits. The CP algorithm converts the watermark bit string into a natural number $M$, and then chooses the Mth permutation of the lexicographically ordered colours to replace the original colour. The algorithm uses the relationship between the factorial number system and lexicographically ordered permutations to obtain the Mth permutation.

Lee and Kaneko8 experimentally evaluated their CP algorithm and found that the embedded watermark is stealthy and has a high credibility but low-resilience to semantics-preserving attacks. The CP algorithm has the advantage that it will never introduce a new register; however, this means that the program must already use a large amount of registers in order to embed a large message. The CC and CP algorithms cannot embed a watermark consisting of all zeros.

## The Selected Colour Change Algorithm

Li et al.9 proposed a more efficient algorithm based on the CC algorithm which they call Selected Colour Change (SCC). The SCC algorithm is very similar to the CC algorithm, however, the efficiency increase is obtained by only changing the colours of either the 1 or the 0 bits, but not both. If the occurance of 0 bits is higher than 1 bits in the watermark then 0 bits are changed; otherwise 1 bits are changed. The choice of which bits are changed is stored in a virtual vertex v-1 to allow the embedding algorithm to extract the watermark.

Li et al. evaluate their SCC algorithm and compare the results to the QP, QPS and CC algorithms. They conclude that their SCC algorithm is equivilent to the CC algorithm in many ways, including data-rate, stealthiness and cost. They suggest the SCC algorithm is more efficient than the CC algorithm because it doesn't change all the colours in the interference graph; however, it is not clear how much the efficiency increase affects watermark embedding in real-world programs.

## Fingerprinting via Register Allocation

Qu et al.10 proposed a modification to the QP algorithm for fingerprinting. Fingerprinting is a class of watermarking which embeds a unique identifier into each copy of the software (or music, or films, etc) which allows the origin of the stolen intellectual property to be identified.

The generic approach proposed is to generate n solutions to a graph colouring problem and assign a given solution to one customer only. This approach guarantees that each customer will be able to be uniquely identified by the register allocation generated by their unique inteference graph colouring.

## QP Algorithms and Public-Key Cryptography

It is advisable to encrypt the watermark message before embedding to prevent an adversery from extracting a watermark, even if they know the extraction algorithm. For example, if an adversery obtains a program which they know contains a watermark embedding with the QPS algorithm they can simply run the QPS extraction algorithm and obtain the watermark; they could then claim that they inserted the watermark. Jian et al.11 propose a technique based on a combination of RSA public-key encryption and the QPI algorithm4. Simply, they suggest that the watermark bit string is encrypted before being embedded. If an adversery extracts the encrypted watermark they will not be able to decipher it, if the encryption is strong enough.

## Conclusion

The QP algorithm has been shown to be unextractable2 and an implementation of the QPS algorithm has shown that the algorithm has a low data-rate12 and is highly suspceptible to semantics-preserving transformation attacks3. Any other register allocation based software watermarking algorithm will also be susceptible to semantics-preserving transformations. More specifically, any transformation which alters the interference graph or register allocation will easily remove a watermark.

Register allocation based algorithms maybe be stealthy3, as they do not add any extra code, but we do not recommended them for protecting software due to their low-resilience to attacks. However, academic research continues in this area with the latest register allocation based approach9 published this year.

We believe that further research should instead focus on dynamic software watermarking techniques which, in theory, should be resilient to semantics-preserving transformations; thus providing a robust technique for the protection of software.

• 1. Qu, G., and K.. Potkonjak, Computer-Aided Design, 1998. ICCAD 98. Digest of Technical Papers. 1998 IEEE/ACM International Conference on, San Jose, California, United States, IEEE, 1998.
• 2. a. b. c. Zhu, W., and C. Thomborson, MM&Sec '06 Proceedings of the 8th workshop on Multimedia and security: ACM, 2006.
• 3. a. b. c. d. Myles, G., and C. Collberg, Information Security and Cryptology-ICISC 2003: Springer Berlin / Heidelberg, 2004.
• 4. a. b. c. d. e. Zhu, W., and C. Thomborson, Lecture notes in computer science, Berlin, ALLEMAGNE, Springer, 2006.
• 5. Zhu, W., and C. Thomborson, Recognition in software watermarking, , Santa Barbara, California, USA, ACM, 2006.
• 6. Sandmark, , 2004.
• 7. a. b. Lee, H., and K. Kaneko, Software Engineering, Artificial Intelligence, łdots, Phuket, IEEE Computer Society, 2008.
• 8. a. b. Lee, H., and K. Kaneko, Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on, Las Vegas, NV, IEEE Computer Society, 2009.
• 9. a. b. Li, J., and Q. Liu, e-Business and Information System Security (EBISS), 2010 2nd International Conference on, Wuhan, IEEE, 2010.
• 10. Qu, G., and M. Potkonjak, DAC '00 Proceedings of the 37th Annual Design Automation Conference, Los Angeles, California, USA, ACM New York, NY, USA, 2000.