Partitioning Version 1.2 Author: Eyal Amir (eyal.amir@cs.stanford.edu) Date: September 8, 2002 Acknowledgments: 1. This software includes a modification of a software written by Boris Cherkassky (cher@cher.msk.su) and Andrew Goldberg (avg@research.nj.nec.com) for computing maximum flow (Dinitz's algorithm's implementation). IG Systems (igsys@eclipse.com) holds the copyright for that software (See the file COPYRIGHT.PRF for terms of use). 2. Mike Stilman (mstilman@cs.stanford.edu) has debugged some of the code in this software and has written the code for the executables "pto" and "otp". He has also written the first draft of this README. 3. Satish Kumar (tksk@cs.stanford.edu) has written the code for tp_split and the code in the 'direct' subdirectory, which is used to direct a graph for a given query. 4. This software can use the libraries provided by hMETIS version 1.5, which is available from the University of Minnesota at http://www-users.cs.umn.edu/~karypis/metis/hmetis/ In systems in which hMETIS is installed, setting the hMETIS directory in the Makefile for this partitioning software (split) will result in a build that uses hMETIS in algorithm 6. Copyright: See the file COPYRIGHT for terms of use. =========================================================================== Contents: -------- A. Program overview B. Overview of main programs C. Utility programs D. Programs' usage E. Compilation & setup =========================================================================== A. PROGRAM OVERVIEW =================== The purpose of this package is to create partitioned versions of logical theories, Bayesian networks, or any other graph-based representations of knowledge. It can be used to process graphs and triangulate them or find tree decompositions of graphs. [Amir '2001] (available at http://www-formal.stanford.edu/eyal/papers/decomp-uai2001.ps) describes the theoretical basis and some experiments done with an earlier version of this package. In the case of logical representations, this software is used to decompose a problem (a theory, i.e., a set of axioms) into trees of subproblems called partitions. Each partition consists of a set of symbols and axioms that can be expressed with those symbols. The partition tree created by the following programs maintains the "running intersection property", namely, if a symbol appears in two partitions, then is appears in all the partitions on the path between them (in the logical setup this can be weakened to say that the symbols must appear on all the edges on the path between those partitions, but here we produce the stronger outcome mentioned above). To partition a problem (a set of axioms, a Bayesian network, etc.), we need to do the following steps: 1. Convert the problem into a graph decomposition problem. 2. Run "sp" on this graph. 3. Translate the partitioning of the graph back into a partitioning of the problem. We can also use the partitioning to create an order on symbols, after step 2, in those problems where we would rather use an ordering on symbols than a partitioning. Almost all programs mentioned below output their usage upon being called without arguments. These usages appear below as well. =========================================================================== B. OVERVIEW OF MAIN PROGRAMS ============================ 1. CONVERT INTO A GRAPH ----------------------- This package includes two conversion utilities (of those problems into graphs) for SAT in DIMACS format and for Bayesian networks in ".txt" format (examples are in directory "examples"). In addition, SRI's Snark theorem prover can convert theories in any of a set of standard first-order logic (FOL) formats (TPTP, KIF, CycL, etc.) into an index file that can be used by "sp". A. In the SAT case (DIMACS format) we do this using "bin/axioms_to_graph > " (Sample input: "*.cnf" Sample output: "*.sp") This is the first step in creating a partitioning in which we form a graph of symbols from a SAT cnf file. The output of this program is an "index file" in which every line includes a number of an axiom and a symbol that appears in that axiom (the axioms are enumerated automatically). The generated graph can be passed into "sp". EXAMPLE: bin/axioms_to_graph tests/medium.cnf > tests/medium.sp B. In the Bayesian networks case (".txt" format) we use "bin/txt-to-sp > " Currently, the program can handle only binary Bayes-net nodes. All RVs are treated as if they are binary in deriving the partitioning (and ordering). EXAMPLE: bin/txt-to-sp CPCS-360.txt > CPCS-360.sp 2. RUNNING "sp" --------------- SPLIT - "sp" (Sample input: "*.sp" Sample output: "*.out*") This program is used to create a partitioning from a graph of vertices (integers) or an axiom-symbol index. In the latter case, the program will transform the index into a graph (internally) before proceeding to the main subroutine. The partitions created are sets of symbols. This program can also be used to find the minimum vertex separator for two symbols in the graph of symbols. In the case of creating a partitioning: The input consists of the name of the graph or index, as well as a number of options. The program can output each partition, its size and its contents in DIMACS and LISP formats. It also outputs the graph of the partitions in DOT format (a format that the graphical-layout program "dot" can read). Given this output, we can run "dot -Tps " and get a postscript depiction of the partitioning). "sp" implements a number of methods for generating the partitioning. The method is specified with other options in the command line parameters of the program. The algorithms are rougly described as follows: 0. "SPLIT" is a recursive algorithm that finds a minimum vertex separator (not necessarily balanced) at each iteration. For every earlier separator found (in the recursion above our current position) it considers the set of vertices of this separator as a single vertex in the current iteration (this way we make sure the resulting structure is a tree). Details in [Amir & McIlraith KR'2000]. 1. "half_vtx_triang" is a recursive algorithm that finds a minimum 1/2-balanced vertex separator of the last "separator to go" at each iteration into two or three sets. This "separator to go" for the iterations that follow recursively to be the union of this separator together with the part of the previous "separator to go" that is included in the subgraph that is sent to the recursive step. Details in [Amir UAI'2001]. 2. "two-way-2/3_triang" is a recursive algorithm that finds a minimum 2/3-balanced vertex separator of the last "separator to go" at each iteration into exactly two sets. This "separator to go" for the iterations that follow recursively to be the union of this separator together with the part of the previous "separator to go" that is included in the subgraph that is sent to the recursive step. Details in [Amir UAI'2001]. 3. "min_degree" is a heuristic algorithm that iteratively selects vertices and determines the partitions correspondingly. The selection heuristic is "take the vertex that has the lowest degree". After it makes the selection, it adds edges to the graph so that the neighbors of the selected vertex become a clique. Subsequently, it removes the vertex from the graph and goes on to select the next vertex. The partitions correspond to those "cliqued neighbors" (together with the removed vertex). 4. "min_width" does the same as "min_degree", but does not create a clique from the neighbors. 5. "min_discrepancy" (also known as "min_fill") is a heuristic similar to "min_degree", with the difference in the selection criteria for the vertex. In each step it selects the vertex that has the minimum number of edges missing between its neighbors. 6. "logOPT" is similar to the approximation algorithm producing log(OPT) approximation in polynomial time described in [Amir 2001]. The main difference between that algorithm and the one implemented here is that we use hMETIS (which is not guaranteed to approximate the optimal) for finding a separator instead of the procedure of [Layton & Rao '1999]. This procedure is available only if hMETIS is also installed in the system because of redistribution prohibition set by the University of Minnesota. EXAMPLES: sp -a 3 -x -f tests/medium.sp > tests/medium.out3 Uses algorithm 3 to partition tests/medium.sp. The flag "-x" tells "sp" that the input file is an index rather than a graph in DIMACS format. PERFORMANCE SUMMARY (as of August 19, 2003): A summary of the speed and approximation ratios of the algorithms above is the following: Alg.# Name Approximation guarantee Speed Ranking 0 SPLIT -- 3 1 half_vtx_triang 4.5 5 2 two-way-2/3_triang 4 6 (slowest) 3 min_degree -- 1 (fastest) 5 min_fill -- 2 6 logOPT O(log(OPT))* 4 * Currently alg #6 is implemented using hMETIS and not an approximation-guaranteed procedure. 3. TRANSLATE THE PARTITIONS OF THE GRAPH INTO PARTITIONS OF AXIOMS ------------------------------------------------------------------ - "bin/pa" (Sample input: "*.out*","*.cnf" Sample output: "*.pa") The "pa" program produces a tree of partitions of axioms. Currently, it is used solely for SAT problems, and it expects one of the input files to be the original (unpartitioned) SAT problem in DIMACS format. Input to "pa" consists of this partitioning as well as the original axioms in the theory. This program assigns axioms to partitions. An axiom belongs to a partition if all the symbols from an axiom are contained within the partition. "pa" outputs each partition along with its axioms. The output is in a DIMACS-like format, defined by Paul Ruhlen for partitioned SAT problems (ruhlen@cs.stanford.edu). An example output appears in directory examples/ together with an explanation (in the file) of the different lines. format: bin/pa [] EXAMPLE: bin/pa tests/medium.cnf tests/medium.out3 > tests/medium.pa A different program is currently used by the SRI theorem prover, Snark, to decide on the axioms that should be in each partition. =========================================================================== C. UTILITY PROGRAMS =================== THE ENTIRE PIPELINE FOR DIMACS SAT PROBLEMS ------------------------------------------- - "dimacs_split (Sample input: "*.cnf") This script performs all the steps outlined above for partitioning a SAT problem in DIMACS format. Along the way it produces all the intermediate files (it does not remove them). EXAMPLE: dimacs_split tests/medium.cnf 5 -C partitions the cnf problem tests/medium.cnf using algorithm 5. SOME OF THE PIPELINE FOR THEOREM PROVING PROBLEMS ------------------------------------------------- - "tp_split" (Sample input: "*.sp" "*.dat", sample output: "*.out?") This script takes an "sp" input file and a file that includes all the symbols of a query, partitions the set of symbols given by the ".sp" file into partitions, and directs the edges of the tree in the direction of a partition that includes the query. EXAMPLE: tp_split tests/spatial-index2.sp tests/spatial-query 5 partitions the symbols index tests/spatial-index2.sp using algorithm 5 and orients the edges in the direction of the query given in tests/spatial-query. FIND TREEWIDTH -------------- - "bin/find_k" (Sample input: "*.out*") This program can be used to find the size (number of symbols) of the largest partition. Run it on the output of sp (only if the original requested output format is 2 or 3). EXAMPLE: bin/find_k tests/medium.out3 PARTITIONING TO ORDER --------------------- - "bin/pto" (Sample input: "*.out*" Sample output: "*.ord") This program takes a partitioning of symbols generated by "sp" as well as a node in the partitioning. It then creates an ordering of symbols for querying a particular partition. The ordering begins with the outer nodes and converges at the given partition. This ordering is simply a list of symbols. It is guaranteed to maintain the same treewidth. EXAMPLE: bin/pto tests/medium.out3 1 > tests/medium.order3-1 ORDER TO PARTITIONING --------------------- - "bin/otp" (Sample input: "*.ord","*.sp" Sample output: "*.sp") Given any order over the symbols (or the nodes in the graph), we can create a partitioning of the same treewidth as the order. For example, the ordering created by "pto" can be used to recreate a partitioning of equal or smaller treewidth to the original. For each symbol in the ordering, a partition is made from that symbol and its neighbors in the graph of symbols. The symbol is then removed, and a clique is made from its neighbors. The process continues until the end of the sequence. The output of this program is analogous to the output of "sp". EXAMPLE: bin/otp tests/medium.sp tests/medium.order3-1 > tests/medium.new_partitioning GETTING GRAPHS DISPLAYED ------------------------ - "bin/get_dot" (Sample input: "*.out?" Sample output: "dot_file?") - "bin/get_dot_ps" (Sample input: "*.out?" Sample output: "*.ps" and "*.dot") Given any output file from "sp", "tp_split" or alike, we can get a file that can be processed by the AT&T's "dot" graph layout program (can be downloaded for free from http://www.research.att.com/sw/tools/graphviz/). To get this file, either "get_dot" or "get_dot_ps" can be used. The former always produced a file called "dot_file?", where "?" stands for a number. The latter goes the extra step and produces a postscript file of the layout (i.e., runs "dot" to produce this postscript file). Its outputs are the original name of the input file, concatenated with ".dot" and ".ps", respectively. EXAMPLE: bin/get_dot_ps tests/hd.out2 =========================================================================== D. PROGRAMS' USAGE ================== AXIOMS TO GRAPH axioms_to_graph There are no options to this program. SPLIT 'split' version 1.2; September 2002. usage: sp [-a n] [-O n] [-p n] [-o n] [-F n] [-x] [-T] [-h] -f [-m ] Finds a complete decomposition. -a n - algorithm (default is 3=min_degree): APPROXIMATION: 1=half_vtx_triang (4.5-apx); 2=two_way2/3_triang (4-apx); 6=logOPT (logOPT-apx (currently using HMETIS)) HEURISTICS: 0=SPLIT; 3=min_degree; 5=min_discrepancy (min fill); [4 not available] -O n - logOPT options (use '-h' to get full description) -p n - post-processing level [0..3] (default is 3) -o n - output level [0..3] (default is 0) -F n - output format [0=human 1=lisp 2=graph 3=all] (default is 3) -x - first translate axiom-symbol index to graph -T - Check the resulting tree for the running intersection prop. -h - Print this help together with further features. - the minimum size for a partition - the maximum size for a link -O n - logOPT options (add i+j to get both) [1='split-edges' (default=vertices) 2='balance-all' (default=balance separator) 4='2/3-vtx-sep' (default=1/2) 8='2-way' (default=3-way) (overall default=0)] usage: sp -1 [-a n] [-o n] [-x] -f [-u [-v ]] Find a minimum vertex separator. -a n - algorithm to run [0=(u,v)-separator,1=hMETIS-balanced] (default is 1) -n n - number of parts [only for hMETIS-balanced] (default is 2) -b n - imbalance factor in [1..49] [only for hMETIS-balanced] (default is 5) -o n - output level [0..3] (default is 1) -x - first translate axiom-symbol index to graph - first vertex - second vertex PARTITION AXIOMS BY SYMBOL-PARTITIONING pa [] The axioms are the original CNF of the SAT problem. The partitioning is the output of sp. A tree of partitions and their symbols. The optional output level specifies information about processing THE ENTIRE PIPELINE FOR DIMACS SAT PROBLEMS dimacs_split Takes as input a cnf problem, an "sp" algorithm number and options for "sp". FIND TREEWIDTH find_k Takes as input a partitioning generated by "sp" or "otp". PARTITION TO ORDER pto Takes as input a partitioning generated by "sp" or "otp", as well as the index of a partition node in the tree of partitions. ORDER TO PARTITION otp [<-x>] The input is a graph of symbols or an axiom-symbol index. If an index is provided the program must be run with the "-x" flag. The is an order generated by "pto" above. =========================================================================== E. COMPILATION & SETUP ====================== There are several compiled versions available on the website http://www-formal.stanford.edu/eyal/ SUN/Solaris 2.6 : GCC version 2.95.3 make version 3.79.1 SUN/Solaris 2.8 : GCC version 2.95.3 make version 3.79.1 i386/Linux-RedHat 7.2 : GCC version 2.95.3; GNU make version 3.79.1 "make all" compiles all the objects and executables. "make sp" compiles only the sp program "make external" compiles the rest of the programs. The objects from a compile of sp must already exist in order to make external. The directory paths of the Makefile are declared at the top of the file. Please modify these accordingly. Also, change the file SOURCE_ME to point to the correct installation directory of this package. You may want to add that line to your .cshrc file to avoid needing to run "source SOURCE_ME" for every new shell.