Purity Analysis Kit v0.11

Table of Contents:

A. Licence Information
B. Definition of Purity
C. Download and Installation
D. Requirements
E. Testing everything works fine
F. Usage
G. Programmatic Access to the Analysis Results
H. Limitations / Thorny Issues
I. Getting the Source Code

Warning: The purity analysis tool and this webpage are provided as-they-are, without any guarantee or legal responsibility on behalf of their authors.

A. Licence information

This purity analysis tool is released under the terms of the GNU General Public Licence (GNU GPL), available online from here. Please discontinue using this tool if you disagree with the GNU GPL. See Section I below to find out how you can obtain the source code.

B. Definition of Purity

This tool implements the combined pointer and purity analysis described in our VMCAI'05 paper. Our analysis considers a method pure if it does not mutate any object that existed before the method was invoked. This definition is identical to the one from JML. Notice that a method that is pure according to this definition may still create, mutate, and even return new objects. This definition is useful for the following purposes: (1) a pure method can safely be used in assertions and instrumentation code; (2) If a method does not mutate any pre-existent object, the invariants attached to those objects propagate over calls to that method. In addition, as a special case, our (and JML's) definition of purity allows a pure constructor to mutate fields of the this object. [ However, as explained in Section F. below, you may use the option --no-jml-constructor-purity to instruct the analysis to consider such constructors impure. ]

C. Download and Installation

The purity analysis kit is released by the jppa project hosted on SourceForge. The purity analysis kit consists of a .tgz archive file that you can download from here. It decompresses into a directory named purity-kit, containing the following files/subdirs:

purity.jar - Contains the classes of the purity analysis, together with a significant part of the Flex compiler infrastructure. Click here if you want to find out how to access the Java sources.
suppJars/ - Subdirectory with the two .jar libraries used by the implementation of the purity analysis: jpaul.jar and jutil.jar. These files are not really part of the purity analysis: they are provided here only for your convenience. You may also find them, together with their sources and Javadoc documentation, on the websites of the jpaul and the JUtil projects.
stdLib/ - Subdirectory that contains the implementation of the Java standard library that Flex compiles/analyzes Java applications against.
Clarification: the analyzed application consists of (1) the user code, and (2) the standard library from stdLib/. More precisely, the current tool analyzes only those parts of (1) and (2) that are transitively callable from the main method. Notice that the Java standard library from (2) may be different from the Java standard library used by your JVM.
Currently, the tool uses the Java standard library as implemented by the GNU Classpath project. More specifically, stdLib/ contains the glibj.zip from GNU Classpath v0.08, and some .jars that reconcile GNU Classpath with the object layout used by our compiler. Not sure they are really necessary for the purity analyses, but better safe than sorry :)
jolden/ - Subdirectory containing the Java Olden bechmarks. These applications are not part of the purity analysis: they are provided here only as testcases.
COPYING - The GNU GPL Licence details.
purity-test and flex-self-test - Two testing scripts.
run-purity - A convenient launch-script that takes care of setting CLASSPATH to include all relevant jars. It also asks the JVM for a large heap space (please adjust the -Xmx flag from this script as you feel fit).
README.html - The file you are reading right now.

D. Requirements

Software: JDK 1.5 (Java with generics). The purity tool has been tested on Linux (RedHat, Debian, Fedora Core 3) with the first 5 releases of SUN JDK 1.5.0 (_01 to _05). You should make sure that the CLASSPATH of your JVM contains the following files: purity.jar, suppJars/jutil.jar, and suppJars/jpaul.jar.

Hardware: a fast machine (recommended: P4 @ 3.2GHz or equivalent) with a lot of memory (recommended: 1Gb for medium benchmarks, at least 2Gb for more ambitious ones).

E. Testing

Once you have all files in place, it's a good idea to test right away whether you can successfully run the purity analysis. To do this, execute

cd purity-kit ./purity-test

The purity-test script should analyze two Java Olden benchmarks from jolden/. If everything goes fine, the script generates a very verbose output, full of progress indicators. At the end of the analysis of each application, the analysis prints a short statistic regarding the total number of pure methods. If you want to analyze all of the ten Java Olden benchmarks, execute

./purity-test all

purity-test creates a results/ subdirectory containing files with the analysis trace for each analyzed testcase.

If you want to do a really big test, then run

./flex-self-test

The flex-self-test script performs the purity analysis on the entire code reachable from the Flex top-level class harpoon.Main.SAMain (this includes the code of the analysis and much more).

F. Usage

Reminder: Before using the purity analysis, you should make sure that purity.jar, suppJars/jpaul.jar and suppJars/jutil.jar are on your JVM's CLASSPATH.

The entry point for the purity analysis is the method

public static void harpoon.Main.Purity.main(String[] args)

You may invoke this method in two ways: (1) programmatically (e.g., from another Java class); or (2) by starting your favorite JVM with the class harpoon.Main.Purity as main class.

The elements of the args array parameter correspond to the command line arguments. They should be, in order:

args[0] - The name of the main class of the application you want to analyze.

args[1] - The path where your application resides. As any Java CLASSPATH, this path can contain several elements (directors, .jar files, and .zip files), separated by ":". The path that the analysis uses in order to load analyzed classes consists of (in order):

1	The user-defined path (i.e., the value of `args[1]`).
2	`stdLib/reflect-thunk.jar: stdLib/cpvm.jar`	Reimplement a few classes (e.g., reflection related ones), in order to "reconcile" GNU Classpath 0.08 with the object layout of our compiler.
3	`stdLib/glibj-0.08-extra-purity.jar: stdLib/glibj-0.08-extra.jar`	A few additions to GNU Classpath 0.08; required to analyze class files generated with JDK 1.5.0. GNU Classpath 0.08 implements the Java library up to 1.1, plus some parts of 1.2, but does not contain classes like `java.lang.StringBuilder` (introduced in JDK 1.5). Needless to say, these `.jar`s does not fully upgrade GNU Classpath 0.08 to JDK 1.5; we provided only the classes that are required such that the analysis can analyze itself.
4	`stdLib/glibj-0.08-purity.zip`	The classes from the GNU Classpath 0.08 Java standard library, with some small changes to make them easier to analyze. These modifications concern the removal of caches from commonly-used library methods.

The rest of the arguments are passed verbatim to harpoon.Main.SAMain.main(String[] args). Here are a few possible options you may use:
- --no-jml-constructor-purity
  The analysis considers impure the constructors that mutate the fields of the this object.
- --list-pure-methods <fileName> [<format>]
  The analysis outputs a text file with the names of the pure methods, one method name per line (no empty lines). [ Note: this option requires several entries in the args array: one for the string "--list-pure-methods", one for the file name, and one for the optional argument <format> ]
  By default, the name of each pure method is output in Daikon format, identical to the format produced by HMethod.toString(), which is itself identical to the format produced by javap: access qualifiers, space, fully qualified result type, space, fully qualified class name, dot, method name, open paran, comma-separated list of fully qualified parameter types, closed paran, and (if the method throws any exceptions), space, the keyword throws, space, comma-separated list of fully qualified exception types. Here is an example:
  public java.lang.String java.io.File.getCanonicalPath() throws java.io.IOException public int java.util.Collections$SynchronizedCollection.size()
  This file can be passed to the Daikon invariant detector.
  If the optional argument <format> has the value joe, then the name of each pure methods is output in Joe format (see below).
- --joe-purity-file <fileName>
  The analysis outputs a text file listing the read-only parameters of all methods from the program. For brevity, methods with no read-only parameters are skipped. The read-only information is unsound; it can be used for program understanding, test generations, etc, but not for program optimizations. The output file is meant to be used in the Palulu/Joe projects of Shay Artzi, Carlos Pacheco, and Adam Kiezun. For each method, the tool outputs a line containing the fully qualified method name, and next a space-separated list of the indices of the read-only parameters: RC stands for the receiver (for non-static methods), the rest of the parameters use a 0-based index. Here is an example:
  daikon.DynamicConstants.is_constant(daikon.VarInfo) RC 0
  Yes, this indexing scheme is different than the one used in the method2ReadOnlyParams structure (see Section G below); I've tried to accomodate the needs of different analysis clients.
- --pa2:data-structs speed|space
  Choose between data-structures optimized for speed (default) or for space.
- -r Support/locale-root-set-classpath
  Adds some roots related to the locale information. See the discussion below on Finding the Appropriate Set of Program Roots. Ask C. Scott Ananian (the Flex creator) for more information on the roots. A previous version of the tool had this option by default.
- --purity-output-analyzed-methods <fileName>
  Self explanatory
- --pa2:partial-program --methods-to-analyze <fileName>
  [Experimental] See discussion on Full program analysis below.

Warning: you can take a look at ./purity-test and ./flex-self-test, but they are far more complex than what you actually need. The reason for this complexity is that we also use them for testing while doing development in the Flex project, and the locations of certains elements are different than in the kit.

G. Programmatic Access to the Analysis Results

The purity analysis examines (almost) all methods that may be transitively callable from the main method of the application (i.e., the main method of the main class). After the analysis terminates, all detected pure methods are put in a set stored in a static field in the class harpoon.Analysis.PA2.Mutation.WPMutationAnalysisCompStage:

public static Set<HMethod> pureMethods;

[ We say "almost" because, for efficiency reasons, the analysis skips (i.e., considers unanalyzable) several large library methods, whose analysis takes a long time and does not seem to produce anything useful. Those methods deal with the SecurityManager and the time zones. Let us know if you really care about them: the analysis can analyze them, at the cost of a bigger analysis time. ]

Also, the analysis stores a map from each analyzed method to a list of the indices of the "read-only" parameters. A parameter is read-only if the method does not mutate any object transitively reachable from that parameter. The read-only information is unsound in general; it should be used only for program understanding and testing tasks. This map is stored in another static field of harpoon.Analysis.PA2.Mutation.WPMutationAnalysisCompStage:

public static Map<HMethod,List<Integer>> method2ReadOnlyParams;

The parameter indices start from 0 and take into account all parameters, including those of primitive types, and, in the case of instance methods, the implicit this argument (the first one, i.e., #0).

harpoon.ClassFile.HMethod is the Flex handle for a method from the analyzed code. Full Javadoc available on the Flex website, or directly through this URL.

H. Limitations / Thorny Issues

Scalability problems: The current implementation (the first version of which was completed in September 2005) is much faster than the one used for the benchmarks from the VMCAI'05 paper. Still, even the current implementation has problems when analyzing big programs. Most of these problems stem from the fact that we use an inherently expensive pointer analysis; some other problems are due to the fact that some parts of the Flex compiler infrastructure are not very fast.
The primary goal of the research presented in the VMCAI'05 paper was to support a more flexible notion of purity than previous analyses; in particular, we want to allow pure methods to mutate newly allocated objects (e.g., the objects allocated in order to construct the method result). There are other published purity analyses (although it is unclear whether there is any publicly-available tool behind them). Some of these analyses support a stricter notion of purity and claim to be faster that this analysis. If interested, please search these papers using your favorite search engine and contact their authors directly. Alternatively, if you are an author of such an analysis, have a publicly available implementation, and want it mentioned here, please drop Alex Salcianu an email, and your project will be mentioned here.
Full program analysis: Normally, the current tool requires a full program, i.e., by default, it cannot analyze a library. Here is why: the analysis is inter-procedural and requires a call-graph for the analyzed program. The analysis needs to know all potential callees for a virtual call of the form o.foo(). This requires knowledge of the types of all objects that o may point to (once we know this, we can find out what foo method is invoked for each such class). Currently, Flex uses the Rapid Type Analysis (RTA) call-graph construction algortitm. This algorithm takes place before the pointer analysis: it first determines all classes that may be instantiated in the program; next, for each virtual call of the form o.foo(), it considers all subclasses of the declared type for o that are instantiated in the current program. [ Theoretically, it is possible to design and implement a combined call-graph and pointer analysis; however, the pointer analysis alone is very complex so we never investigated such a combination. ]
There exists experimental support for the analysis without a call-graph, by using very conservative assumptions, namely, that each virtual call can do anything (e.g., arbitrary mutations). The analysis assumes that special methods (e.g., toString) are well-behaved. This is better than no analysis at all! To use this experimental feature, you need the options
--pa2:partial-program --methods-to-analyze filename
filename should be a text file that lists the names of the methods you want to analyze, one per line. Each method name should be fully qualified: full class name, ".", method name, "(", fully qualified parameter types with "," in between, ")". filename can also contain fully qualified class names: in this case, the tool analyzes all methods from these classes.
So far we have explained the need to know all classes that may possibly be instantiated in a program. Discovering this information requires knowledge of all methods that may be invoked by the analyzed program (because each method can instantiates classes that may not be instantiated anywhere else in the program). Detecting all invoked methods and all instantiated classes is complex in the presence of (1) class instantiation via reflection, and (2) native methods - that our compiler infrastructure (that deals only with Java bytecode) does not see. That's how we run into the other difficulty, finding the appropriate set of roots (see next point).
Finding the Appropriate Set of Program Roots: In big lines, a root is (1) a Java class/method that is instantiated/invoked using reflection and dynamic class loading; (2) a Java class/method that is instantiated/invoked by a native method.
In general, Flex (or any other static compiler) is unable to find the roots of type (1) (e.g., the program can load classes with names read from an input file). Theoretically, the roots of type (2) should be easy to find by the compiler, once we know the specific Java standard library we compile against (we just make the compiler know about the Java classes/methods that each native method from the standard library instantiates/invokes). Still, Flex does not always do this (please don't ask me why; hopefully, this "feature" will be fixed at some point). Instead, you can use the additional command line option
-r <rootSetFileName>
and pass a few additional roots to Flex.
The root problem introduces a new limitation: Flex cannot compile / analyze applications against arbitrary implementations of the Java standard library (see next point).
Specific Java Standard Library Only: Currently, Flex compiles / analyzes applications against the GNU Classath 0.08 implementation of the Java standard library. The reasons should be obvious from reading the previous points. If your application uses a part of the standard library that is not in the currently used Java library, you're in trouble ...
[ You may be confused that we don't base our tool on one of the Sun JDK implementations of the standard library (a classes.zip / rt.jar file from a Sun JDK). The reason has to do with the copyright attached to the Sun JDK, and we don't want to say more here. ]
Correctness Issues: Although the underlying theoretical analysis is correct, there are several sources of unsoundness in the current implementation:
- This is a research prototype, not an industrial tool! Please have realistic assumptions before using this tool; due to the lack of resources (i.e., time) it has not been engineered to perfection. Bug reports to Alex Salcianu should consist of a plain-text email with a URL to a place that contains: (1) the file purity-kit.tgz that you're using (such that we make sure we debug the same version of the tool); (2) the application you analyze (including all relevant .jar / .zip files); (3) a text file containing the error trace, i.e., the output of the analysis; (4) the command line / script / Java code you use to invoke the analysis. Of course, there is no time guarantee about the processing of the bug reports.
- Assumptions about special methods: Several methods appear almost everywhere, and complicate the call-graph computation. E.g., many call-graph construction algorithms would normally report almost all toString methods from the program (and many other methods) as being part of a "nest" of mutually recursive methods. The reason is that these methods generally call StringBuffer.append(Object), which calls String.valueOf(Object); this last method returns "null" if its argument is null; otherwise, it dynamically dispatch the toString() method on its Object argument, which (statically) appears to invoke all toString() methods from the program! [ You are right, one could remove some of these problems by inlining String.valueOf until the type of its argument becomes more specific that Object; this will solve some, but not all the problems. ]
  Other similar methods are equals and hashCode. Clearly, running an inter-procedural fixed-point algorithm on groups of dozens, possibly hundreds of methods is too slow. However, notice that we don't usually expect these methods to perform mutation. To speed up the analysis, we assume that these methods are safe with respect to all their parameters (the methods do not mutate and do not create new externally visible paths to any object transitively reachable from the parameters). We assume this property about all overriders of the following methods:
  String java.lang.Object.toString(); boolean java.lang.Object.equals(Object); int java.lang.Object.hashCode(); int java.lang.Comparable.compareTo(Object);
  This assumption allows the analysis to treat calls to such methods very fast, bypassing the usual inter-procedural algorithm. Of course, after the analysis terminates, you can look over its log and see if any special method violated the assumption. In general, this happens because of some caching mechanism (i.e., the methods updates a cache pointed from the this parameter); in that case, you can check manually that the caching mechanism is just a speed issue, with no impact on the semantics of the program.

I. Getting the Source Code

The purity kit is part of the jppa project. To get the jppa sources, please read the jppa developer info. After you donwload the sources, the script Harpoon/Analysis/PA2/Mutation/make-purity-kit creates the purity kit.

Good luck!

Hosted by

Copyright (c) 2006 - Alexandru Salcianu <salcianu@alum.mit.edu>
GNU General Public Licence
File CVS version: $Id: PURITY-README.html,v 1.6 2006/08/15 23:52:14 salcianu Exp $