The Importance of Stealth in Software Watermarking

April 02, 2011 - 17:08

The stealthiness of watermarked code is the degree to which the watermarked code can be distinguished from the unwatermarked code. Stealh is an important concept in watermark because if a watermark is unstealthy an attacker could find the watermark. If an attacker can find a watermark it will be easier for them to remove it. The attacker may still have to spend some time figuring out how to remove the watermark but it makes the task easier.

The stealthiness of watermarked code is the degree to which the watermarked code can be distinguished from the unwatermarked code. Stealh is an important concept in watermark because if a watermark is unstealthy an attacker could find the watermark. If an attacker can find a watermark it will be easier for them to remove it. The attacker may still have to spend some time figuring out how to remove the watermark but it makes the task easier. As Collberg puts it, if an attacker knows the location of your watermark, "you've already lost!"1.

There is static stealth and dynamic stealth - static stealth refers to the static properties of the program (such as instruction frequencies), whereas dynamic stealth refers to dynamic properties (such as execution trace). Stealth can also be local or global: local stealth refers to a comparison of watermarked code to non-watermarked code within a program, while global stealth refers to a comparison between the watermarked program and non-watermarked programs2.

Many watermarking algorithms (all static + some dynamic) are susceptible to many semantics-preserving transformations so, in theory, an attacker could apply many transformations to a watermarked program and hope that the watermark is destroyed (although the attacker can never be 100% sure). But applying transformations uniformily across the whole program could have an adverse effect on the performance of the program, especially if the attacker is applying multiple obfuscations. Therefore, an attacker could not destroy the watermark efficiently in this way assuming that the attacker cannot find the location of the watermark i.e. that the watermark is stealthy.

If the attacker could find the (probable) location of the watermark, and apply transformations to that section of the program, performance will not be greatly affected because only the areas which encode the watermark are transformed. Furthermore, the attacker may be able to identify the type of watermark algorithm used and apply a suitable transformation, instead of many transformations. For example, if the attacker identifies the watermark as a "static Arboit"3 watermark, which encodes the watermark in opaque predicates, then applying a transformation which inserts more opaque predicates will destroy the watermark.

Dynamic watermarks are usually embedded in program locations based on execution with a secret input - this means that the watermark code is only executed when the program is run with a secret input (known only to the embedder). If a dynamic algorithm is susceptible to obfuscation attacks and is unstealthy (such as the execution path algorithm) then an attacker could find the watermarked sections and apply obfuscations to those sections. There should be no performance issues because most of the watermark code sections should only get executed with the secret input.

If a dynamic watermark is not susceptible to obfuscation attacks then the location of the watermark will still help an attacker to perform a subtractive attack. Narrowing down the area in this way will allow the attacker to manually inspect the code and probably figure out a way to remove the watermark.

So what makes a watermark unstealthy? Some examples from current watermarking algorithms include:

  • in the execution path algorithm4 there are a large number of unusual constants and a large number of consecutive conditional statements.
  • in the Davidson/Myhrvold5 watermarked methods contain an unusually high ratio of goto instructions.
  • in the Monden6 algorithm dummy methods contain code which is likely to be statistically different from other methods in the program.
  • Allatori inserts large groups of 4 integer push instructions, followed by 4 pop instructions.
  • the dynamic graph watermarking algorithm7 will be unstealthy in a program which does mostly numerical operations, compared to object allocations1.

If all watermarking systems are public knowledge then an attacker could build up a database of watermark signatures, like anti-virus companies. The attacker could then use similar techniques to anti-virus software (e.g. 8) to automatically detect watermarks & the type of the watermark embedded in programs. For example, Teng9 showed that dynamic graph watermarks are susceptible to pattern matching analysis if the watermark/program code ratio is too high. Of course, this technique assumes that the attackers know all of the watermarking algorithms in use - we must assume this worse-case scenario because we cannot be sure that an attacker hasn't figured out our watermarking algorithms.

Intuitively, it is easy to make the assumption that watermarked code (or any computer generated code) is inherently unstealthy because a computer cannot program like a human (or can it?). A computer program will generate code based on a pre-programmed pattern, for example the structure of a watermark encoding, or the code for generating a GUI. If watermark/computer-generated code is inherently unstealthy then it may be possible to find any watermarks by analysing programs looking for tell-tale signs of watermarked/computer-generated code (similar to techniques for finding unseen viruses10). However, it is not clear if all the current watermarking algorithms exhibit any common unstealthy attributes or wether we could detect a newly created watermarking algorithm based on inherent unstealthiness.

What seems likely is that there is a relationship between the amount of effort put into creating a watermark and the stealthiness of a watermark - for example, a programmer could hand craft a custom watermark for every program they create, spending many days worrying about small aspects of stealthiness; on the other hand, an automatic watermark embedder would generate the same code for each watermark, and would not be able to place code as well as the programmer. Ideally, we want to embed stealthy watermarks with little effort (i.e. automatically).