Types of Software Watermark

July 15, 2010 - 15:01

Software watermarks can be broadly divided into two categories: static and dynamic. The former embeds the watermark in the data and/or code of the program, while the latter embeds the watermark in a data structure built at runtime.

Software watermarks can be broadly divided into two categories: static and dynamic1. The former embeds the watermark in the data and/or code of the program, while the latter embeds the watermark in a data structure built at runtime.

A diagram showing a simple static watermarking system.

Figure 1 shows a very simple static watermarking system - The watermark W, the input program P and an optional key K are fed into the embedder. If a key is used then the watermark is encrypted. The embedder outputs a semantically equivalent program P' containing an extra field; this field contains the watermark. A recogniser takes the watermarked program P', looks for the watermark field and outputs the original watermark W.

A diagram showing a simple dynamic watermarking system.

Figure 2 shows a very simple dynamic watermarking system - The watermark W, the input program P and a key K are fed into the embedder. The key, in this case, is a secret input to the program. This input causes a specific path through the program execution to be taken. The embedder outputs a new program P' which, when executed with the secret input, generates the watermark. A recogniser inspects the program P' while running, looking for value of the watermark variable and outputs the original watermark W.

In comparison to a static system, a dynamic watermarking system needs to execute the program in order to retrieve the watermark. The watermark is stored in the semantics of the program rather than the syntax. Dynamic watermarks should, in theory, be resilient to semantics-preserving transformations.

Software watermarking algorithms can be further categorised based on the method of embedding the watermark. Some of these categories include:

  • Abstract interpretation algorithms are static analysis based in which watermarking recovery is handled by an abstract interpretation of the code semantics2.
  • Basic block re-ordering algorithms encode a watermark by re-ordering the basic blocks of a program and inserting jump statements to preserve the original control flow. One of the earliest software watermarking algorithms3 uses this technique.
  • Code replacement algorithms encode a watermark by replacing parts of a program, for example by replacing certain op-codes with equivalent instructions. For example, Hydan, defines sets of functionally-equivalent instructions and encodes  information in machine code by using the appropriate instructions from each set4.
  • Constraint based algorithms encode the watermark in a set of extra constraints to the original algorithmic problem. The solution to this new problem will therefore satisfy the original problem and the watermark constraints5. Register allocation based software watermarks are examples of constraint based algorithms.
  • Dynamic path algorithms encode watermarks based on the dynamic branching behaviour of programs by adding branches to program code6.
  • Graph based algorithms encode the watermark in a graph structure and embed the graph structure in a program. Venkatesan et al.7 describe a static graph watermarking system in which the watermark graph is merged with the program control flow graph.
  • Opaque predicate algorithms rely on being enclosed by a predicate whose outcome is known a priori. It is difficult for automated software analysis to know the value of the predicate; therefore it is not know whether the enclosed code (which may or may not be a watermark) could be removed8.
  • Spread-spectrum algorithms were originally developed for multimedia watermarking 9; in software the idea is to spread the watermark across a program by modifying instruction frequencies 10
  • Thread based algorithms encode watermarks within distinctive behaviour of threads added to programs11.

Some of these could be implemented statically or dynamically, for example there are static7 and dynamic1 graph watermarking algorithms.

Watermarks can be visible or invisible, for example a copyright message would be visible whereas a fingerprint mark may be invisible. Watermarks are either robust or fragile - robust watermarks are resilient to attacks, whereas fragile watermarks are designed to become ineffective if damamged.

Watermarks can also be divided into 4 groups based on their purpose12:

  • Authorship Mark identifying a software author, or authors. These watermarks are generally visible and robust.
  • Fingerprinting Mark identifying the channel of distribution, i.e. the person who leaked the software. The watermarks are generally invisible, robust and consist of a unique identifier such as a customer reference number.
  • Validation Mark to verify that software is genuine and unchanged, for example like digitally signed Java Applets. These watermarks must be visible to the end-user to allow validation and fragile to ensure the software is not tampered with.
  • Licensing Mark used to authenticate software against a license key. The key should become ineffective if the watermark is damaged therefore licensing marks should be fragile.