More on Dynamic Graph Watermarking

April 10, 2011 - 21:31

Graph watermarking techniques encode a watermark in a graph structure which is embedded in a program either statically, or dynamically. Static watermarks can be encoded in a control flow graph while dynamic graphs are encoded in a data structure built at runtime. Like other static watermarking algorithms, static graph watermarking is susceptible to semantics-preserving transformation attacks. Collberg and Thomborson proposed the first dynamic graph based watermarking scheme, CT, to overcome problems with static watermarking schemes.

Graph watermarking techniques encode a watermark in a graph structure which is embedded in a program either statically, or dynamically. Static watermarks can be encoded in a control flow graph1 while dynamic graphs are encoded in a data structure built at runtime2. Like other static watermarking algorithms, static graph watermarking is susceptible to semantics-preserving transformation attacks.

Collberg and Thomborson3 proposed the first dynamic graph based watermarking scheme, CT, to overcome problems with static watermarking schemes. The first CT implementation was part of a system called JavaWizz4 which used PPCT graphs. Later, Collberg and Thomborson2 implemented a version of their algorithm in Sandmark which can use many different graph encoding schemes to encode the watermark, along with several other differences (see graph watermarking paper for more details).

Collberg and Thomborson2 evaluate CT_SMand conclude that it is efficient and resilient to semantics-preserving transformations. The algorithm is vulnerable to a node-splitting attack but the Sandmark implementation uses cycles in the graph to protect against this attack. The greatest vulnerability is it's limited stealthiness - although many object-oriented programs have lots of news and pointer assignments the large number of them used by this algorithm may actually be unstealthy5. Stealthiness is greatly decreased when using cycles in graphs, to protect against node-splitting, because the size of the code is dramatically increased.

Collberg and Nagra5 describe how alias analysis algorithms find it difficult to analyse deeply recursive structures and code with destructive updates - "the algorithms do very well for constructive code, which just builds a linked structure, but poorly for code that both builds and changes structures". However, if we can find just the construtive part of the code we can use program slicing to find the dependent 'destructive' code.

The simplest analysis that an attacker can perform is to look for a class which contains two fields of the same type as the class - this would indicate a potential watermark node class. Sandmark's implementation can find a suitable program class to use as a watermark node (instead of introducing a new class) - two instance variables of the class' type are added to the class. So we can track down nodes generated by the Sandmark implementation easily. Sandmark generates the following class (shown as Jimple code), if no other class is used as a node:

public class Watermark extends java.lang.Object
{
    public Watermark x$0;
    public Watermark x$1;

    public void <init>()
    {
        Watermark r0;

        r0 := @ this: Watermark;
        specialinvoke r0.<java.lang.Object: void <init>()>();
        return;
    }
}

The following is an example of using an existing class (from the jscicalc program) - notice the two watermark fields (the only change from the original):

public class jscicalc.graph.Point extends java.awt.geom.Point2D$Double
{
    private static final long serialVersionUID;
    public jscicalc.graph.Point newwmfield$0;
    public jscicalc.graph.Point newwmfield$1;

    public void <init>()
    {
        jscicalc.graph.Point r0;

        r0 := @ this: jscicalc.graph.Point;
        specialinvoke r0.<java.awt.geom.Point2D$Double: void <init>()>();
        return;
    }

    public void <init>(double, double)
    {
        jscicalc.graph.Point r0;
        double d0, d1;

        r0 := @ this: jscicalc.graph.Point;
        d0 := @ parameter0: double;
        d1 := @ parameter1: double;
        specialinvoke r0.<java.awt.geom.Point2D$Double: void <init>(double,double)>(d0, d1);
        return;
    }
}

Below is another example, from the jscicalc - the right() method before watermarking. The annotation can be seen which helps Sandmark decide where to place the watermark code.

    public void right() 
   {
        jscicalc.CalculatorApplet r0;
        jscicalc.DisplayPanel $r1;

        r0 := @ this: jscicalc.CalculatorApplet;
        staticinvoke <sandmark.watermark.ct.trace.Annotator: void sm$mark()>();
        $r1 = r0.<jscicalc.CalculatorApplet: jscicalc.DisplayPanel displayPanel>;
        virtualinvoke $r1.<jscicalc.DisplayPanel: void right()>();
        return;
    }

The method after watermarking is too long to paste here - download at the end of the article, if you really want to see. The size of the method is hugely increased to 811 lines! Ideally, the watermark should be spread throughout the program more - it is very unstealthy to have such a long method with so many object allocations. Sandmark may have chosen only this method because this is the only appropriate method found from the trace (although other annotated methods were called during execution).

When annotating a program the programmer really needs to think hard about where and where-not to place annotations. Annotations should be placed in areas that are executed deterministically and, ideally, only get hit during execution with the secret input (at least some must be hit during the secret input).

Teng6 showed that pattern analysis methods were good enough to find CT watermarks without the need to any expensive alias analysis, or slicing, etc Teng explored the watermark code/original code ratio and found that many programs cause the ratio to be high enough to be unstealthy.

So the question is, can CT watermarks be stealthy and resilient? Collberg and Nagra 5 suggest that you have to be "very careful" when implementing this algorithm. It seems, as with most watermarking algorithms, that there is a big trade off with stealthy & resilience. If you increase the resilience of the CT watermark you decrease the stealth - so attackers will be able to find the watermark and remove/destroy it anyway! If you decrease the resilience the attacker may not be able to find the watermark as easily but it will be susceptible to node splitting attacks.

The embedding of dynamic watermarks, such as CT, requires greater effort than embedding static watermarks but they seem only marginally better in terms of resilience and similar in terms of stealthiness.

This is an update of a previous, more general article on graph watermarking. More details about the CT algorithm are available there.

1. Venkatesan, R., V. Vazirani, and S. Sinha, "A graph theoretic approach to software watermarking", Information Hiding: Springer, 2001.
2. a. b. c. Collberg, C., C. Thomborson, and G. M. Townsend, "Dynamic graph-based software fingerprinting", ACM Trans. Program. Lang. Syst., vol. 29, pp. 35, 2007.
3. Collberg, C., "Software watermarking: Models and dynamic embeddings", Proceedings of the 26th ACM SIGPLAN-SIGACT, 1999.
4. Palsberg, J., S. Krishnaswamy, M. Kwon, and D. Ma, "Experience with software watermarking", Proceedings of the 16th Annual Computer Security Applications Conference: IEEE Computer Society, 2000.
5. a. b. c. Collberg, C., and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection: Addison-Wesley, 2009.
6. Teng, T., Unauthorized Detection of CT Watermarks Based on Pattern Analysis Methods: The University of Auckland, pp. 104, 2006.