Table of Contents:
this
object.
[ However, as explained in Section F. below, you may use the option
--no-jml-constructor-purity
to instruct the analysis to
consider such constructors impure. ]
.tgz
archive file that you can download from here.
It decompresses into a directory named purity-kit
,
containing the following files/subdirs:
purity.jar
- Contains the classes of the purity
analysis, together with a significant part of the Flex compiler
infrastructure. Click here
if you want to find out how to access the Java sources.
suppJars/
- Subdirectory with the two .jar
libraries used by the implementation of the purity analysis:
jpaul.jar
and jutil.jar
. These files are
not really part of the purity analysis: they are provided here only
for your convenience. You may also find them, together with their
sources and Javadoc documentation, on the websites of the jpaul and the JUtil
projects.
stdLib/
- Subdirectory that contains the implementation
of the Java standard library that Flex compiles/analyzes Java
applications against.
Clarification: the analyzed application consists of (1) the user
code, and (2) the standard library from stdLib/
. More
precisely, the current tool analyzes only those parts of (1) and
(2) that are transitively callable from the main method. Notice
that the Java standard library from (2) may be different from the
Java standard library used by your JVM.
Currently, the tool uses the Java standard library as implemented by the
GNU Classpath
project. More specifically, stdLib/
contains the
glibj.zip
from GNU Classpath v0.08, and some
.jar
s that reconcile GNU Classpath with the object
layout used by our compiler. Not sure they are really necessary
for the purity analyses, but better safe than sorry :)
jolden/
- Subdirectory containing the Java Olden
bechmarks. These applications are not part of the purity analysis:
they are provided here only as testcases.
COPYING
- The GNU GPL Licence details.
purity-test
and flex-self-test
- Two
testing scripts.
run-purity
- A convenient launch-script that takes
care of setting CLASSPATH
to include all relevant
jar
s. It also asks the JVM for a large heap space
(please adjust the -Xmx
flag from this script as you feel fit).
README.html
- The file you are reading right now.
CLASSPATH
of your JVM contains the following files:
purity.jar
, suppJars/jutil.jar
, and
suppJars/jpaul.jar
.
Hardware: a fast machine (recommended: P4 @ 3.2GHz or
equivalent) with a lot of memory (recommended: 1Gb for medium
benchmarks, at least 2Gb for more ambitious ones).
E. Testing
Once you have all files in place, it's a good idea to test right
away whether you can successfully run the purity analysis. To do
this, execute
cd purity-kit
./purity-test
The purity-test
script should analyze two Java Olden
benchmarks from jolden/
. If everything goes fine, the
script generates a very verbose output, full of progress indicators.
At the end of the analysis of each application, the analysis prints a
short statistic regarding the total number of pure methods. If you
want to analyze all of the ten Java Olden benchmarks, execute
./purity-test all
purity-test
creates a results/
subdirectory containing files with the analysis trace for each
analyzed testcase.
If you want to do a really big test, then run
./flex-self-test
The flex-self-test
script performs the purity analysis on
the entire code reachable from the Flex top-level class
harpoon.Main.SAMain (this includes the code of the analysis and much
more).
F. Usage
Reminder: Before using the purity analysis, you should make
sure that purity.jar
, suppJars/jpaul.jar
and
suppJars/jutil.jar
are on your JVM's CLASSPATH
.
The entry point for the purity analysis is the method
public static void harpoon.Main.Purity.main(String[] args)
You may invoke this method in two ways: (1) programmatically (e.g.,
from another Java class); or (2) by starting your favorite JVM with
the class harpoon.Main.Purity
as main class.
The elements of the args
array parameter correspond to
the command line arguments. They should be, in order:
args[0]
- The name of the main class of the
application you want to analyze.
args[1]
- The path where your application resides.
As any Java CLASSPATH, this path can contain several elements
(directors, .jar
files, and .zip
files),
separated by ":". The path that the analysis uses in order to load
analyzed classes consists of (in order):
1 | The user-defined path (i.e., the value of args[1] ). |
|
2 | stdLib/reflect-thunk.jar: |
Reimplement a few classes (e.g., reflection related ones), in order to "reconcile" GNU Classpath 0.08 with the object layout of our compiler. |
3 | stdLib/glibj-0.08-extra-purity.jar: |
A few additions to GNU Classpath 0.08; required to analyze class
files generated with JDK 1.5.0. GNU Classpath 0.08 implements the
Java library up to 1.1, plus some parts of 1.2, but does not contain
classes like java.lang.StringBuilder (introduced in JDK
1.5). Needless to say, these .jar s does not fully
upgrade GNU Classpath 0.08 to JDK 1.5; we provided only the classes
that are required such that the analysis can analyze itself. |
4 | stdLib/glibj-0.08-purity.zip |
The classes from the GNU Classpath 0.08 Java standard library, with some small changes to make them easier to analyze. These modifications concern the removal of caches from commonly-used library methods. |
harpoon.Main.SAMain.main(String[] args)
. Here are a few
possible options you may use:
--no-jml-constructor-purity
this
object.
--list-pure-methods <fileName> [<format>]
args
array: one for the
string "--list-pure-methods"
, one for
the file name, and one for the optional argument <format> ]
By default, the name of each pure method is output in Daikon format,
identical to the format produced by HMethod.toString()
,
which is itself identical to the format produced by
javap
: access qualifiers, space, fully qualified result
type, space, fully qualified class name, dot, method name, open paran,
comma-separated list of fully qualified parameter types, closed paran,
and (if the method throws any exceptions), space, the keyword
throws
, space, comma-separated list of fully qualified
exception types. Here is an example:
public java.lang.String java.io.File.getCanonicalPath() throws
java.io.IOException
public int java.util.Collections$SynchronizedCollection.size()
This file can be passed to the Daikon invariant detector.
If the optional argument <format> has the value
joe
, then the name of each pure methods is output in Joe
format (see below).
--joe-purity-file <fileName>
RC
stands for the receiver
(for non-static methods), the rest of the parameters use a
0
-based index. Here is an example:
daikon.DynamicConstants.is_constant(daikon.VarInfo) RC 0
Yes, this indexing scheme is different than the one used in the
method2ReadOnlyParams
structure (see Section G below); I've
tried to accomodate the needs of different analysis clients.
--pa2:data-structs speed|space
-r Support/locale-root-set-classpath
--purity-output-analyzed-methods <fileName>
--pa2:partial-program --methods-to-analyze <fileName>
Warning: you can take a look at ./purity-test
and
./flex-self-test
, but they are far more complex than what
you actually need. The reason for this complexity is that we also use
them for testing while doing development in the Flex project, and the
locations of certains elements are different than in the kit.
G. Programmatic Access to the Analysis Results
The purity analysis examines (almost) all methods that may be
transitively callable from the main method of the application (i.e.,
the main
method of the main class). After the analysis
terminates, all detected pure methods are put in a set stored in a
static field in the class
harpoon.Analysis.PA2.Mutation.WPMutationAnalysisCompStage
:
public static Set<HMethod> pureMethods;
[ We say "almost" because, for efficiency reasons, the analysis skips (i.e., considers unanalyzable) several large library methods, whose analysis takes a long time and does not seem to produce anything useful. Those methods deal with the SecurityManager and the time zones. Let us know if you really care about them: the analysis can analyze them, at the cost of a bigger analysis time. ]
Also, the analysis stores a map from each analyzed method to a list of
the indices of the "read-only" parameters. A parameter is read-only
if the method does not mutate any object transitively reachable from
that parameter. The read-only information is unsound in general; it
should be used only for program understanding and testing tasks. This
map is stored in another static field of
harpoon.Analysis.PA2.Mutation.WPMutationAnalysisCompStage
:
public static Map<HMethod,List<Integer>> method2ReadOnlyParams;
The parameter indices start from 0 and take into account all parameters, including those of primitive types, and, in the case of instance methods, the implicit this argument (the first one, i.e., #0).
harpoon.ClassFile.HMethod
is the Flex handle for a method
from the analyzed code. Full Javadoc available on the Flex website, or
directly through this URL.
H. Limitations / Thorny Issues
The primary goal of the research presented in the VMCAI'05 paper was to support a more flexible notion of purity than previous analyses; in particular, we want to allow pure methods to mutate newly allocated objects (e.g., the objects allocated in order to construct the method result). There are other published purity analyses (although it is unclear whether there is any publicly-available tool behind them). Some of these analyses support a stricter notion of purity and claim to be faster that this analysis. If interested, please search these papers using your favorite search engine and contact their authors directly. Alternatively, if you are an author of such an analysis, have a publicly available implementation, and want it mentioned here, please drop Alex Salcianu an email, and your project will be mentioned here.
There exists experimental support for the analysis without a
call-graph, by using very conservative assumptions, namely, that each
virtual call can do anything (e.g., arbitrary mutations). The
analysis assumes that special methods (e.g., toString) are
well-behaved. This is better than no analysis at all! To use this
experimental feature, you need the options
filename should be a text file that lists the names of the
methods you want to analyze, one per line. Each method name should be
fully qualified: full class name, ".", method name, "(", fully
qualified parameter types with "," in between, ")". filename
can also contain fully qualified class names: in this case, the tool
analyzes all methods from these classes.
So far we have explained the need to know all classes that may
possibly be instantiated in a program. Discovering this information
requires knowledge of all methods that may be invoked by the analyzed
program (because each method can instantiates classes that may not be
instantiated anywhere else in the program). Detecting all invoked
methods and all instantiated classes is complex in the presence of (1)
class instantiation via reflection, and (2) native methods - that our
compiler infrastructure (that deals only with Java bytecode) does not
see. That's how we run into the other difficulty, finding the
appropriate set of roots (see next point).
In general, Flex (or any other static compiler) is unable to find the
roots of type (1) (e.g., the program can load classes with names read
from an input file). Theoretically, the roots of type (2) should be
easy to find by the compiler, once we know the specific Java standard
library we compile against (we just make the compiler know about the
Java classes/methods that each native method from the standard library
instantiates/invokes). Still, Flex does not always do this (please
don't ask me why; hopefully, this "feature" will be fixed at some
point). Instead, you can use the additional command line option
and pass a few additional roots to Flex.
The root problem introduces a new limitation: Flex cannot compile /
analyze applications against arbitrary implementations of the Java
standard library (see next point).
[ You may be confused that we don't base our tool on one of the Sun
JDK implementations of the standard library (a
Other similar methods are
This assumption allows the analysis to treat calls to such methods
very fast, bypassing the usual inter-procedural algorithm. Of course,
after the analysis terminates, you can look over its log and see if
any special method violated the assumption. In general, this happens
because of some caching mechanism (i.e., the methods updates a cache
pointed from the o.foo()
. This requires knowledge of the types
of all objects that o
may point to (once we know this, we
can find out what foo
method is invoked for each such
class). Currently, Flex uses the Rapid Type Analysis (RTA) call-graph
construction algortitm. This algorithm takes place before the pointer
analysis: it first determines all classes that may be instantiated in
the program; next, for each virtual call of the form
o.foo()
, it considers all subclasses of the declared type
for o
that are instantiated in the current program.
[ Theoretically, it is possible to design and implement a combined
call-graph and pointer analysis; however, the pointer analysis alone
is very complex so we never investigated such a combination. ]
--pa2:partial-program --methods-to-analyze filename
-r <rootSetFileName>
classes.zip
/ rt.jar
file from a Sun JDK).
The reason has to do with the copyright attached to the Sun JDK, and
we don't want to say more here. ]
purity-kit.tgz
that you're using (such that we make
sure we debug the same version of the tool); (2) the application you
analyze (including all relevant .jar
/ .zip
files); (3) a text file containing the error trace, i.e., the output
of the analysis; (4) the command line / script / Java code you use to
invoke the analysis. Of course, there is no time guarantee about the
processing of the bug reports.
toString
methods from the program (and many
other methods) as being part of a "nest" of mutually recursive
methods. The reason is that these methods generally call
StringBuffer.append(Object)
, which calls
String.valueOf(Object)
; this last method returns
"null"
if its argument is null; otherwise, it dynamically
dispatch the toString()
method on its Object
argument, which (statically) appears to invoke all
toString()
methods from the program! [ You are right,
one could remove some of these problems by inlining
String.valueOf
until the type of its argument becomes
more specific that Object
; this will solve some, but not
all the problems. ]
equals
and
hashCode
. Clearly, running an inter-procedural
fixed-point algorithm on groups of dozens, possibly hundreds of
methods is too slow. However, notice that we don't usually
expect these methods to perform mutation. To speed up the analysis,
we assume that these methods are safe with respect to all their
parameters (the methods do not mutate and do not create new externally
visible paths to any object transitively reachable from the
parameters). We assume this property about all overriders of the
following methods:
String java.lang.Object.toString();
boolean java.lang.Object.equals(Object);
int java.lang.Object.hashCode();
int java.lang.Comparable.compareTo(Object);
this
parameter); in that case, you can
check manually that the caching mechanism is just a speed issue, with
no impact on the semantics of the program.
Harpoon/Analysis/PA2/Mutation/make-purity-kit
creates the
purity kit.
Good luck!