Software watermarking is a software protection technique based on inserting copyright notices or unique identifiers into a program to prove ownership. The basic idea is that if a copyright owner finds a copy of their software (e.g. online), they would be able to prove, in a court of law, that they own that software. Alternatively, a software fingerprint - where the watermark is a unique customer ID for every copy of the program - would allow a software owner to trace the person who copied the software. I've previously discussed the problem of using software watermarking to prove ownership, but here I intend to list the kinds of attacks we can expect against watermarked or fingerprinted programs.
We must assume that an attacker has the same watermarking tools that we have used to watermark our software - whether it's a commercial system they could purchase (like Allatori), an open-source one (like Sandmark), or an internal system which could be leaked. We therefore immediately have the problem of the attacker knowing exactly how our system works. They can study the system, watermark their own programs, compare watermarked and unwatermarked programs, analyse the watermarked programs, and so on; this will help them attack other programs. They may not know which watermarking system we used, but we cannot count on that. Even if they don't, there may be tell-tale signs of a watermark in a program which are similar to other watermarking systems they have studied.
There are several types of attack to consider: additive, distortive, subtractive, collusive, and protocol.
An additive attack involves inserting another watermark into an already watermarked application, which may or may not over-write the existing watermark. Many static watermarking algorithms are susceptible to semantics-preserving transformation attacks, so adding an additional watermark could destroy the original. It may also be the case that neither of the watermarks can then be extracted. With current watermarking algorithms, an additive attack sometimes removes the original watermark if a watermark of the same type is embedded, but not necessarily if a different type of watermark is embedded1.
If the original watermark is not over-written by the new watermark, the attacker could claim ownership of the software if both watermarks are recognisable, resulting in a dispute over ownership. Anyone can insert their own watermark into our program by means of a semantics-preserving transformation, and this may preserve the original watermark if it does not rely on any syntactic properties of the program. The original author will be able to extract his or her watermark from the program, but the attacker can counter this demonstration of ownership by extracting their own watermark. The original author could then demonstrate that the attacker's watermark is not in their copy of the software, showing that the attacker was the one to add the extra watermark. Obviously, this whole situation causes confusion and casts suspicion on the origin of the software.
Distortive attacks involve applying semantics-preserving transformations to a program, such as obfuscations or optimisations, thus removing any watermarks which rely on program syntax. Transformations include, for example, renaming variables, loop transformations, function inlining, etc. Both static and dynamic watermarks can be susceptible to distortive attacks - current static watermarking algorithms are susceptible to distortive attacks, as are some dynamic ones.
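As a minimal sketch of why syntax-based watermarks are fragile (the method and the watermark scheme are hypothetical), suppose a static watermark is encoded in the syntactic shape of a method - say, in its loop structure or instruction ordering. An attacker's optimiser can replace the code with a behaviourally identical form, destroying the watermark without changing what the program computes:

```java
public class Distort {
    // Original method: a static watermark might be encoded in the
    // loop structure, statement ordering, or variable names.
    static int sumOriginal(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // After a semantics-preserving transformation (here, replacing the
    // loop with Gauss's closed form): identical behaviour, but any
    // syntax-based watermark is gone.
    static int sumTransformed(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        // Confirm the transformation preserved the semantics.
        for (int n = 0; n <= 100; n++) {
            if (sumOriginal(n) != sumTransformed(n))
                throw new AssertionError("mismatch at " + n);
        }
        System.out.println("behaviour preserved: " + sumOriginal(100));
    }
}
```

A watermark recogniser that looked for the loop's shape would find nothing in the transformed version, yet no user could tell the two programs apart.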
Myles and Collberg2 conducted an evaluation of dynamic and static versions of the Arboit algorithm by watermarking and obfuscating test files - they found the dynamic version to be only marginally stronger than the static version, and both versions could be defeated by distortive attacks. The dynamic graph watermarking algorithm CT is protected against a node-splitting distortive attack only if cycles are used in the graph - this does, however, reduce stealth.
Usually, it is assumed that the attacker does not know the location of the watermark, which means they would have to apply all distortive attacks uniformly across a program; this is likely to have an adverse effect on performance. However, most watermarking algorithms are unstealthy, so this assumption does not necessarily hold - if the attacker can find the watermark they can apply the attack to only the watermarked code (read more about the importance of stealth in software watermarking).
A subtractive attack involves removing the section, or sections, of code where the watermark is stored while leaving behind a working program. This could be achieved by dead-code elimination, statistical analysis or program slicing. If we can sufficiently hide the watermark then this attack is simple to protect against - if the attacker cannot find the watermark, they cannot remove it.
A defence against subtractive attacks is the use of opaque predicates, which can create bogus dependencies between the original program and the watermark code. For example, variables or methods from the original program can be assigned values generated by the watermarking code, so that the watermark code appears to be part of the original program. Automated tools, such as program slicers, will find it difficult to remove the watermark code because of these bogus dependencies.
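A minimal sketch of this idea (the predicate and the watermark constant are my own illustrative choices): because n*(n+1) is the product of two consecutive integers it is always even, so the branch below never executes - but a slicer or dead-code eliminator that cannot prove this must assume the program's result depends on the watermark constant, and so must keep it:

```java
public class OpaqueWatermark {
    // Watermark payload (hypothetical customer ID).
    static final int WATERMARK = 0xCAFE;

    static int compute(int n) {
        int result = n * 2;
        // Opaquely false predicate: n*(n+1) is always even, so this
        // branch is never taken at runtime. An automated tool that
        // cannot prove this must preserve the bogus dependency on
        // WATERMARK, tying the watermark code to the program.
        if ((n * (n + 1)) % 2 != 0) {
            result ^= WATERMARK;
        }
        return result;
    }

    public static void main(String[] args) {
        if (compute(21) != 42) throw new AssertionError();
        System.out.println(compute(21));
    }
}
```

The strength of the defence rests entirely on how hard the predicate is to resolve statically; this toy predicate would fall to any tool that knows basic number theory, so real systems use predicates built on genuinely hard analysis problems such as aliasing.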
Collusive attacks involve the comparative analysis of two or more copies of a fingerprinted program. In a simple case, the only difference between two copies would be the fingerprint itself, thus revealing the location of the fingerprint in all of the programs. Every copy must be obfuscated differently before distribution in order to avoid this kind of attack; after obfuscation there will be many differences between the copies and the watermark will be harder to find. However, this may cause a problem for debugging customers' programs - for example, bug reports sent in by customers may be specific to their copy of the software. Collberg and Thomborson3 suggest that it will be necessary to store a copy of the keys used to fingerprint and obfuscate every copy of a sold program, in order to recreate an exact copy of the customer's program for debugging purposes.
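To see why identically built copies are vulnerable, here is a naive collusive attack as a sketch (the byte arrays stand in for two customers' program images and are invented): diff the two copies byte-by-byte, and every differing offset is a candidate fingerprint location:

```java
import java.util.ArrayList;
import java.util.List;

public class CollusiveDiff {
    // Return every offset at which two program images differ.
    static List<Integer> diffOffsets(byte[] a, byte[] b) {
        List<Integer> offsets = new ArrayList<>();
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            if (a[i] != b[i]) offsets.add(i);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two hypothetical copies, identical except for a one-byte
        // customer ID embedded at offset 3.
        byte[] copyA = {0x10, 0x20, 0x30, 0x01, 0x50};
        byte[] copyB = {0x10, 0x20, 0x30, 0x02, 0x50};
        System.out.println(diffOffsets(copyA, copyB)); // prints [3]
    }
}
```

On unobfuscated copies the differing offsets cluster exactly around the fingerprint; diversified obfuscation scatters differences throughout the whole file, so this naive comparison no longer pinpoints anything.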
Protocol attacks involve the attacker producing a fake recogniser that can "recognise" a watermark in a program. The attacker would demonstrate, in court to a judge, that he created the stolen program by "recognising" a watermark. The attacker could also accuse the real owner of having the fake recogniser, thus causing confusion over who really owns the program. An expert would have to be brought in to study the recognisers and work out who the real owner is4. I argue that protocol attacks are a big problem for software watermarking, and that watermarks are actually not necessary to prove software ownership, as the real author should always have a copy of the original source code.
It is known that other types of digital watermark are susceptible to additive attacks5, for example watermarks in images6.
Software watermarking has an advantage over these other types of watermarks because tamperproofing can be added to a program. For example, in Java we can use the Reflection API to count how many fields are in a class; if an attacker splits variables and there are now more fields than we expect, we know that the program has been tampered with. However, Java is restricted in the kinds of questions a program can ask about itself; for example, a program cannot check whether its 52nd instruction is iconst_0. This makes Java tamperproofing harder than native-code tamperproofing. Moreover, tamperproofing code is highly unstealthy, especially in type-safe bytecode languages like Java, and unusual in most real-world programs7.
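The Reflection check described above might look like the following sketch (the Licensed class and its three fields are hypothetical stand-ins for a shipped, watermarked class):

```java
public class TamperCheck {
    // Hypothetical watermarked class: we know it shipped with exactly
    // three declared fields.
    static class Licensed {
        private int serial;
        private int checksum;
        private int expiry;
    }

    // If an attacker applies a variable-splitting obfuscation, the
    // declared field count changes and this check fires.
    static boolean tampered() {
        return Licensed.class.getDeclaredFields().length != 3;
    }

    public static void main(String[] args) {
        if (tampered()) throw new AssertionError("program modified");
        System.out.println("field count as expected");
    }
}
```

Note how conspicuous this is: reflective self-inspection of a class's field count has no business purpose, which is exactly why such checks are easy for an attacker to spot.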
If an attacker can find the tamperproofing code (because tamperproofing is unstealthy) then they can attempt to remove it. Continuing the above example, if an attacker found the code that checked for 3 fields in a certain class, they could simply amend the code so that the conditional test always passes. Of course, tamperproofing could be combined with obfuscation, but many obfuscations just provide layers of obscurity and a determined attacker will find a way in.
Tamperproofing dynamic watermarks is easier because it is far more stealthy to check the value of a variable than it is to check a property of the code4.
Static techniques are highly susceptible to semantics-preserving transformation attacks and are therefore easily removed by an adversary. Dynamic techniques are slightly more resilient to distortive attacks but can only protect whole programs, not individual classes or modules.
It is difficult to tamperproof Java bytecode stealthily, and any watermarking system is susceptible to protocol attacks. A protocol attack ensures that there will be a dispute over ownership, as the attacker can show that they are able to extract their own watermark from the program. This confuses the very problem which software watermarking was designed to solve.
It is easier to tamperproof native code, and much commercial software is distributed in this way; however, recently there has been a significant problem of software piracy on the Android market*, whose programs are distributed as bytecode for the Dalvik virtual machine - similar to Java bytecode.
- 1. "Using Software Watermarking to Discourage Piracy", Crossroads - The ACM Student Magazine, 2004.
- 2. G. Myles and C. Collberg, "Software watermarking via opaque predicates: Implementation, analysis, and attacks", Electronic Commerce Research, vol. 6, pp. 155–171, 2006.
- 3. C. Collberg and C. Thomborson, "Dynamic graph-based software fingerprinting", ACM Trans. Program. Lang. Syst., vol. 29, pp. 35, 2007.
- 4. C. Collberg and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, Addison-Wesley, 2009.
- 5. B. Schneier, Secrets and Lies: Digital Security in a Networked World, John Wiley & Sons, pp. 432, 2000.
- 6. "Overview of Watermarks, Fingerprints, and Digital Signatures", JISC Technology Applications Programme (JTAP), 1999.
- 7. "On the limits of software watermarking", Proc. ACM Symp. on Principles of Programming Languages, 1999.
- *. Interestingly, the changes that the software 'thieves' made were so small that it was obvious to the owners, and others, that their software had been copied (see the screenshots in the article). Google's advice was to use their license checking service and to use ProGuard to obfuscate a program. A developer responded: "I did use Proguard, and that didn't stop my app being pirated and uploaded ... and the License Checking Service is what we use to check an app has been paid for. It simply does a request to the Market to check the status (ie has the app been paid for). This works fine, if the hacker hasn't removed that bit of code. Which is easy to do.". Clearly obfuscation doesn't thwart attackers when they only need to make such a small number of changes to an app to profit.