Software watermarking [1, 2, 3] involves embedding a unique identifier within a piece of software to prove ownership. Watermarking does not prevent copying; instead, it discourages software thieves [4] by providing a means to identify the owner of a piece of software and/or the origin of the stolen software. The hidden watermark can be recognised or extracted at a later date, using a recogniser or extractor, to prove ownership of stolen software. It is also possible to embed a unique customer identifier in each distributed copy of the software, which allows the software company to identify the individual who pirated it; this is known as fingerprinting.
A software watermark should allow an author to prove ownership of a piece of copied software, but how can the author demonstrate extraction of a watermark to a judge in a court of law? Suppose Alice develops software, compiled from source code S1 into bytecode B1, and wants to protect it from software 'pirates'. Alice might obfuscate the bytecode to hinder program understanding and decompilation. The obfuscated code B1' will also contain a software watermark identifying the software owner as Alice.
Alice publishes her software online where Bob purchases a copy B2 which he copies and passes off as his own. He has made some changes to the bytecode to remove Alice's name and various identifying marks from the GUI. Alice discovers Bob's version of the software and takes Bob to court where she must demonstrate that she is the owner of the software.
The judge asks both Bob and Alice to demonstrate that they are the authors of the software by extracting watermarks. Alice can extract her watermark from both copies of the software, but Bob claims that her recogniser is fake. He then goes on to demonstrate how his recogniser can 'extract' a watermark from both copies of the software. Who can the judge believe? The judge is most likely not a software developer, so an expert witness will have to be called to examine all copies of the software and the recognisers to determine who is genuine [5].
In another scenario, Bob decompiles the program to source S2, makes some changes to disguise its origin, then compiles and publishes the software as his own as B2'. Alice can extract a watermark from her copy of the software but not from Bob's copy, because he has decompiled and changed the program enough that the watermark is no longer intact. Bob has embedded a watermark of his own in his software, which he can extract on demand. So both parties have a piece of software which they claim is their own. Who should the judge believe? Again, an expert witness must be called to determine the real owner.
The expert witness may have to spend a fair amount of time examining both parties' applications before they are able to report back.
To speed up the process of identifying the real owner, both parties could be asked to show the source code from which their software is compiled. On inspecting the source code, it will be obvious which party is the real author of the software: either Bob won't have a copy of the source code, or he'll have a decompiled copy of it. If Bob cannot show source code for the program he claims is his, then we can be pretty sure the real author is Alice.
If Bob presents decompiled source code S2 then his will 'look' decompiled, whereas Alice's S1 will be well-formatted, ordered, commented and consistent with other software written by human programmers. Source code produced by current decompilers does not look like it was written by a human programmer. Apart from obvious differences such as the lack of comments and coherent variable names, many decompilers produce code which is verbose and contains extraneous instructions. Although some work has been done to improve the output of decompilers [6], it is difficult to produce code which looks like it has been written by a human. Obviously, without a great deal of human re-working, decompiled source code will never contain comments or coherent variable names.
Currently, there are no perfect decompilers [7], so Bob may not even be able to produce source code, or Alice's obfuscations may hinder his decompilation. Bob may also be unwilling to put in the effort required to decompile the program, as it may be quicker to write his own version of the program from scratch.
Static watermarks are inherently flawed due to their low resilience to semantics-preserving transformations. Some dynamic watermarks are also susceptible to semantics-preserving transformation attacks, and they are only able to protect a whole program rather than individual modules or classes. Software watermarking, in theory, has an advantage over other types of digital watermarking (e.g. audio or video): software can be tamperproofed. For example, in Java we can use the Reflection API to count how many fields are in a class; if an attacker splits variables and there are now more fields than we expect, we know that the program has been tampered with.
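The field-counting check described above might look something like the following minimal sketch. The class names (`Account`, `SplitAccount`), the helper methods and the expected count of 2 are all hypothetical, invented purely for illustration; `SplitAccount` stands in for what a variable-splitting transformation could plausibly leave behind, not for the output of any particular tool.

```java
import java.lang.reflect.Field;

public class FieldCountCheck {

    // Hypothetical protected class: shipped with exactly two declared fields.
    static class Account {
        private int balance;
        private String owner;
    }

    // Hypothetical result of an attacker's variable-splitting transformation:
    // 'balance' has been split across two fields.
    static class SplitAccount {
        private int balanceLow;
        private int balanceHigh;
        private String owner;
    }

    // Counts the fields declared by cls, ignoring compiler-generated
    // (synthetic) fields so that only source-level fields are counted.
    static int declaredFieldCount(Class<?> cls) {
        int count = 0;
        for (Field f : cls.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    // Returns true if the class no longer has the number of fields we shipped.
    static boolean looksTampered(Class<?> cls, int expectedFields) {
        return declaredFieldCount(cls) != expectedFields;
    }

    public static void main(String[] args) {
        System.out.println(looksTampered(Account.class, 2));      // prints "false"
        System.out.println(looksTampered(SplitAccount.class, 2)); // prints "true"
    }
}
```

Note that a check like this has to live somewhere in the shipped program, and a call to `getDeclaredFields()` followed by a comparison against a constant is easy for an attacker to spot.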
However, Java is restricted in the type of questions it can ask itself about its own program; for example, a program cannot check whether its 52nd instruction is iconst_0. This makes tamperproofing Java harder than tamperproofing native code, which can inspect its own code more readily. Moreover, tamperproofing code is highly unstealthy, especially in type-safe bytecode languages like Java, because such code is unusual in most real-world programs [8]. If an attacker can find the tamperproofing code (because it is unstealthy) then they can attempt to remove it. So tamperproofing won't help us much with protecting watermarks in Java bytecode.
Software watermarks are actually unnecessary for proving ownership, as the true software author should be able to produce, on demand, a copy of the source code which produced the bytecode*. Even if an adversary could produce a decompiled copy of the source code, it will be obvious (to any software expert) who is the true software owner.
Software watermarks may, however, be useful for tracking copied software (for example, using a web-crawler to look for watermarked software) or for identifying the person who copied the software (via fingerprinting).
- 1. "On the limits of software watermarking", Proc. ACM Symp. on Principles of Programming Languages, 1999.
- 2. "Software watermarking: Protective terminology", Proceedings of the ACSC: Citeseer, 2002.
- 3. "A functional taxonomy for software watermarking", 25th Australasian Computer Science Conference (ACSC2002), Melbourne, Australia, Australian Computer Society, Inc., 2001.
- 4. "Using Software Watermarking to Discourage Piracy", Crossroads - The ACM Student Magazine, 2004.
- 5. Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection: Addison-Wesley, 2009.
- 6. "Programmer-friendly decompiled Java", 14th IEEE International Conference on Program Comprehension, Athens, Greece, IEEE Computer Society, 2006.
- 7. "An evaluation of current java bytecode decompilers", 2009 Ninth IEEE International Working Conference on Source Code Analysis and Manipulation, Edmonton, Alberta, Canada, IEEE, 2009.
- 8. "Dynamic graph-based software fingerprinting", ACM Trans. Program. Lang. Syst., vol. 29, pp. 35, 2007.
- *. Of course, there is always the possibility of Bob copying Alice's source code, which would make it difficult to prove ownership of the software, but we assume that Alice is able to keep her source code safe.