TL;DR: In this article, the effective graph resistance is derived from the field of electric circuit analysis where it is defined as the accumulated effective resistance between all pairs of vertices, and the derivation of new expressions is based on the analysis of the associated random walk on the graph and applies tools from Markov chain theory.
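For orientation, the definition in the summary above can be written out as follows. This is a sketch using standard identities, not notation taken from the article: it assumes a connected graph with $N$ vertices, $L$ links, and unit resistance on every link, and the symbols $R_{ij}$ (effective resistance between vertices $i$ and $j$), $\mu_k$ (nonzero Laplacian eigenvalues), and $C_{ij}$ (expected commute time of the simple random walk between $i$ and $j$) are our own labels.

```latex
R_G \;=\; \sum_{1 \le i < j \le N} R_{ij}
    \;=\; N \sum_{k=2}^{N} \frac{1}{\mu_k},
\qquad
C_{ij} \;=\; 2L \, R_{ij}.
```

The second identity is the link to the random walk that the summary refers to: each pairwise effective resistance is proportional to the commute time between the two vertices.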
TL;DR: An open-source toolbox for drawing large-scale undirected graphs based on a previously implemented closed-source algorithm known as VxOrd, which is extended by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation.
Abstract: We document an open-source toolbox for drawing large-scale undirected graphs. This toolbox is based on a previously implemented closed-source algorithm known as VxOrd. Our toolbox, which we call OpenOrd, extends the capabilities of VxOrd to large graph layout by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation. At each level, vertices are grouped using force-directed layout and average-link clustering. The clustered vertices are then re-drawn and the process is repeated. When a suitable drawing of the coarsened graph is obtained, the algorithm is reversed to obtain a drawing of the original graph. This approach results in layouts of large graphs which incorporate both local and global structure. A detailed description of the algorithm is provided in this paper. Examples using datasets with over 600K nodes are given. Code is available at www.cs.sandia.gov/~smartin.
TL;DR: This paper proposes the first external-memory algorithm for core decomposition in massive graphs and demonstrates the efficiency of the algorithm on real networks with up to 52.9 million vertices and 1.65 billion edges.
Abstract: The k-core of a graph is the largest subgraph in which every vertex is connected to at least k other vertices within the subgraph. Core decomposition finds the k-core of the graph for every possible k. Past studies have shown important applications of core decomposition such as in the study of the properties of large networks (e.g., sustainability, connectivity, centrality, etc.), for solving NP-hard problems efficiently in real networks (e.g., maximum clique finding, densest subgraph approximation, etc.), and for large-scale network fingerprinting and visualization. The k-core is a well-accepted concept partly because there exists a simple and efficient algorithm for core decomposition, by recursively removing the lowest degree vertices and their incident edges. However, this algorithm requires random access to the graph and hence assumes the entire graph can be kept in main memory. Nevertheless, real-world networks such as online social networks have become exceedingly large in recent years and still keep growing at a steady rate. In this paper, we propose the first external-memory algorithm for core decomposition in massive graphs. When the memory is large enough to hold the graph, our algorithm achieves performance comparable to that of the in-memory algorithm. When the graph is too large to be kept in memory, our algorithm requires only O(k_max) scans of the graph, where k_max is the largest core number of the graph. We demonstrate the efficiency of our algorithm on real networks with up to 52.9 million vertices and 1.65 billion edges.
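The in-memory baseline mentioned in the abstract above (repeatedly removing a lowest-degree vertex together with its incident edges) can be sketched as follows. This is a minimal illustration of that peeling idea, not the external-memory algorithm the paper proposes; the function name `core_numbers` and the edge-list input format are assumptions made for the example, and the vertex selection is deliberately simple rather than optimized.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    Illustrative in-memory peeling only (quadratic time because of min()).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        # Removing v lowers the degree of its surviving neighbours.
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
example = core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)])
```

In the example, the triangle vertices end up with core number 2 and the pendant vertex with core number 1. The random access to `adj` and `degree` is exactly what the abstract says prevents this approach from scaling beyond main memory.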
TL;DR: It is believed that the algorithm rejects any graph that is ε-far from having second eigenvalue at most λ^{α/O(1)}, and the validity of this belief is proved under an appealing combinatorial conjecture.
Abstract: We consider testing graph expansion in the bounded-degree graph model. Specifically, we refer to algorithms for testing whether the graph has a second eigenvalue bounded above by a given threshold or is far from any graph with such (or related) property.
We present a natural algorithm aimed towards achieving the foregoing task. The algorithm is given a (normalized) eigenvalue bound λ < 1 and a parameter α > 0. The algorithm runs in time N^{0.5+α}/poly(ε), and accepts any graph having (normalized) second eigenvalue at most λ. We believe that the algorithm rejects any graph that is ε-far from having second eigenvalue at most λ^{α/O(1)}, and prove the validity of this belief under an appealing combinatorial conjecture.
TL;DR: In this paper, the authors give a new proof of the graph removal lemma which avoids Szemerédi's regularity lemma and gives a better bound, and which also yields improved bounds for the directed and multicolored analogues of the lemma.
Abstract: Let H be a fixed graph with h vertices. The graph removal lemma states that every graph on n vertices with o(n^h) copies of H can be made H-free by removing o(n^2) edges. We give a new proof which avoids Szemerédi's regularity lemma and gives a better bound. This approach also works to give improved bounds for the directed and multicolored analogues of the graph removal lemma. This answers questions of Alon and Gowers.
TL;DR: This talk surveys recent progress on the design of provably fast algorithms for solving linear equations in the Laplacian matrices of graphs.
Abstract: The Laplacian matrices of graphs are fundamental. In addition to facilitating the application of linear algebra to graph theory, they arise in many practical problems. In this talk we survey recent progress on the design of provably fast algorithms for solving linear equations in the Laplacian matrices of graphs. These algorithms motivate and rely upon fascinating primitives in graph theory, including low-stretch spanning trees, graph sparsifiers, ultra-sparsifiers, and local graph clustering. These are all connected by a definition of what it means for one graph to approximate another. While this definition is dictated by Numerical Linear Algebra, it proves useful and natural from a graph theoretic perspective. Mathematics Subject Classification (2010). Primary 68Q25; Secondary 65F08.
TL;DR: This work provides the first formal algorithmic study of the optimization of human computation for graph search by asking an omniscient human questions of the form "Is there a target node that is reachable from the current node?".
Abstract: We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), we consider the problem of finding the target node(s) by asking an omniscient human questions of the form "Is there a target node that is reachable from the current node?". This general problem has applications in many domains that can utilize human intelligence, including curation of hierarchies, debugging workflows, image segmentation and categorization, interactive search and filter synthesis. To our knowledge, this work provides the first formal algorithmic study of the optimization of human computation for this problem. We study various dimensions of the problem space, providing algorithms and complexity results. We also compare the performance of our algorithm against other algorithms, for the problem of webpage categorization on a real taxonomy. Our framework and algorithms can be used in the design of an optimizer for crowd-sourcing platforms such as Mechanical Turk.
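To make the question format in the abstract above concrete, here is a naive top-down search driven by reachability questions. It is only a baseline sketch, not one of the paper's algorithms, and it rests on assumptions that are ours: there is a single target, the target is reachable from a known root, a node counts as reaching itself, and the "human" is simulated by a function. The names `make_oracle` and `find_target` are hypothetical.

```python
def reachable(graph, src):
    """Set of nodes reachable from `src` (including `src`) in a DAG."""
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def make_oracle(graph, target):
    """Simulate the omniscient human; also record every question asked."""
    asked = []

    def oracle(node):
        asked.append(node)
        return target in reachable(graph, node)

    return oracle, asked


def find_target(graph, root, oracle):
    """Descend from `root`, asking about children until none says yes.

    Assumes exactly one target, reachable from `root`.
    """
    current = root
    while True:
        for child in graph.get(current, ()):
            if oracle(child):
                current = child
                break
        else:
            # No child reaches the target, so `current` must be the target.
            return current


taxonomy = {"root": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": []}
oracle, asked = make_oracle(taxonomy, "d")
found = find_target(taxonomy, "root", oracle)
```

Here `found` is `"d"` after four questions (`a`, `b`, `c`, `d`). Reducing the number of such questions is the kind of optimization the abstract refers to.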
TL;DR: This paper proposes a novel approach for the efficient computation of graph edit distance based on bipartite graph matching by means of the Volgenant-Jonker assignment algorithm, which provides only suboptimal edit distances, but runs in polynomial time.
Abstract: In the field of structural pattern recognition, graphs constitute a very common and powerful way of representing objects. The main drawback of graph representations is that the computation of various graph similarity measures is exponential in the number of involved nodes. Hence, such computations are feasible for rather small graphs only. One of the most flexible graph similarity measures is graph edit distance. In this paper we propose a novel approach for the efficient computation of graph edit distance based on bipartite graph matching by means of the Volgenant-Jonker assignment algorithm. Our proposed algorithm provides only suboptimal edit distances, but runs in polynomial time. The reason for its suboptimality is that edge information is taken into account only in a limited fashion during the process of finding the optimal node assignment between two graphs. In experiments on diverse graph representations we demonstrate a high speed-up of our proposed method over a traditional algorithm for graph edit distance computation and over two other suboptimal approaches that use the Hungarian and Munkres algorithms. Also, we show that classification accuracy remains nearly unaffected by the suboptimal nature of the algorithm.
TL;DR: It is shown that, using O(k log(n)) path measurements, any k-sparse link vector (with no more than k nonzero elements) can be recovered, even though the measurements have to follow the graph path constraints.
Abstract: In this paper, motivated by network inference and tomography applications, we study the problem of compressive sensing for sparse signal vectors over graphs. In particular, we are interested in recovering sparse vectors representing the properties of the edges of a graph. Unlike existing compressive sensing results, the collective additive measurements we are allowed to take must follow connected paths over the underlying graph. For a sufficiently connected graph with n nodes, it is shown that, using O(k log(n)) path measurements, we are able to recover any k-sparse link vector (with no more than k nonzero elements), even though the measurements have to follow the graph path constraints. We mainly show that the computationally efficient ℓ1 minimization can provide theoretical guarantees for inferring such k-sparse vectors with O(k log(n)) path measurements from the graph.
TL;DR: This paper defines a novel D-core framework, extending the classic graph-theoretic notion of k-cores for undirected graphs to directed ones and devise a wealth of novel metrics used to evaluate graph collaboration features of directed graphs.
Abstract: Community detection and evaluation is an important task in graph mining. In many cases, a community is defined as a subgraph characterized by dense connections or interactions among its nodes. A large variety of measures have been proposed to evaluate the quality of such communities, in most cases ignoring the directed nature of edges. In this paper, we introduce novel metrics for evaluating the collaborative nature of directed graphs, a property not captured by the single node metrics or by other established community evaluation metrics. In order to accomplish this objective, we capitalize on the concept of graph degeneracy and define a novel D-core framework, extending the classic graph-theoretic notion of k-cores for undirected graphs to directed ones. Based on the D-core, which essentially can be seen as a measure of the robustness of a community under degeneracy, we devise a wealth of novel metrics used to evaluate graph collaboration features of directed graphs. We applied the D-core approach on large real-world graphs such as Wikipedia and DBLP and report interesting results at the graph level as well as at the node level.
TL;DR: This paper considers the problem of answering threshold-based probabilistic queries over a large uncertain graph database with the possible world semantics and adopts a filtering-and-verification strategy to speed up the search.
Abstract: Retrieving graphs containing a query graph from a large graph database is a key task in many graph-based applications, including chemical compound discovery, protein complex prediction, and structural pattern recognition. However, graph data handled by these applications is often noisy, incomplete, and inaccurate because of the way the data is produced. In this paper, we study subgraph queries over uncertain graphs. Specifically, we consider the problem of answering threshold-based probabilistic queries over a large uncertain graph database with the possible world semantics. We prove that the problem is #P-complete; therefore, we adopt a filtering-and-verification strategy to speed up the search. In the filtering phase, we use a probabilistic inverted index, PIndex, based on subgraph features obtained by an optimal feature selection process. During the verification phase, we develop exact and bound algorithms to validate the remaining candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.
TL;DR: Achlioptas, D'Souza, and Spencer conjectured that under a product rule, with high probability the order of the largest component "jumps" from $o(n)$ to at least $\delta n$ in $o(n)$ steps of the process, a phenomenon known as "explosive percolation"; this paper gives a simple proof that this is not the case.
Abstract: It is widely believed that certain simple modifications of the random graph process lead to discontinuous phase transitions. In particular, starting with the empty graph on $n$ vertices, suppose that at each step two pairs of vertices are chosen uniformly at random, but only one pair is joined, namely, one minimizing the product of the sizes of the components to be joined. Making explicit an earlier belief of Achlioptas and others, in 2009, Achlioptas, D'Souza and Spencer [Science 323 (2009) 1453-1455] conjectured that there exists a $\delta>0$ (in fact, $\delta\ge1/2$) such that with high probability the order of the largest component "jumps" from $o(n)$ to at least $\delta n$ in $o(n)$ steps of the process, a phenomenon known as "explosive percolation." We give a simple proof that this is not the case. Our result applies to all "Achlioptas processes," and more generally to any process where a fixed number of independent random vertices are chosen at each step, and (at least) one edge between these vertices is added to the current graph, according to any (online) rule. We also prove the existence and continuity of the limit of the rescaled size of the giant component in a class of such processes, settling a number of conjectures. Intriguing questions remain, however, especially for the product rule described above.
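The product rule described in the abstract above is easy to simulate, which helps in seeing what "the order of the largest component" tracks over time. The following is a rough sketch under our own assumptions: each candidate pair consists of two distinct vertices, ties are broken in favour of the first pair, and the names `DisjointSets` and `product_rule_process` are hypothetical. It illustrates the process only and says nothing about the paper's proof.

```python
import random


class DisjointSets:
    """Union-find with component sizes and the current largest size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def product_rule_process(n, steps, seed=0):
    """Run `steps` steps on n >= 2 vertices; return largest size per step."""
    rng = random.Random(seed)
    ds = DisjointSets(n)
    history = []

    def score(pair):
        u, v = pair
        return ds.size[ds.find(u)] * ds.size[ds.find(v)]

    for _ in range(steps):
        # Two candidate pairs; only the one with the smaller product is joined.
        pairs = [tuple(rng.sample(range(n), 2)) for _ in range(2)]
        u, v = min(pairs, key=score)
        ds.union(u, v)
        history.append(ds.largest)
    return history


history = product_rule_process(n=1000, steps=1200, seed=1)
```

Plotting `history` divided by `n` against the step count shows how sharply the largest component grows. The abstract's result is that, despite appearances in simulations, the transition is continuous.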
TL;DR: 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation, and compares favorably with similar state-of-the-art localization algorithms.
Abstract: The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend previous work and propose the 3D-ASAP algorithm, for the graph realization problem in $\mathbb{R}^3$, given a sparse and noisy set of distance measurements. 3D-ASAP is a divide-and-conquer, non-incremental and non-iterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1-hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noise-free case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch there corresponds an element of the Euclidean group Euc(3) of rigid transformations in $\mathbb{R}^3$, and the goal is to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3D-SP-ASAP, a faster version of 3D-ASAP, which uses a spectral partitioning algorithm as a preprocessing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3D-ASAP and 3D-SP-ASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably to similar state-of-the-art localization algorithms.
TL;DR: This paper introduces a new automata model for query answering with two modes of acceptance: one captures queries returning nodes, and the other queries returning paths. It also provides additional restrictions for tractability, and shows that some intractable cases can be naturally cast as instances of the constraint satisfaction problem.
Abstract: Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. While queries need to be posed against such data, techniques for querying patterns are generally lacking, and properties of such queries are not well understood. Our goal is to study the basics of querying graph patterns. We first identify key features of patterns, such as node and label variables and edges specified by regular expressions, and define a classification of patterns based on them. We then study standard graph queries on graph patterns, and give precise characterizations of both data and combined complexity for each class of patterns. If complexity is high, we do further analysis of features that lead to intractability, as well as lower complexity restrictions. We introduce a new automata model for query answering with two modes of acceptance: one captures queries returning nodes, and the other queries returning paths. We study properties of such automata, and the key computational tasks associated with them. Finally, we provide additional restrictions for tractability, and show that some intractable cases can be naturally cast as instances of the constraint satisfaction problem.
TL;DR: A skeletal representation is introduced that generalizes the definition of the Reeb graph to arbitrary point clouds sampled from m-dimensional manifolds embedded in the d-dimensional space and yields an effective abstraction of the data.
Abstract: This paper introduces a skeletal representation, called Point Cloud Graph, that generalizes the definition of the Reeb graph to arbitrary point clouds sampled from m-dimensional manifolds embedded in the d-dimensional space. The proposed algorithm is easy to implement and the graph representation yields an effective abstraction of the data. Finally, we present experimental results on point-sampled surfaces and volumetric data that show the robustness of the Point Cloud Graph to non-uniform point distributions and its usefulness for shape comparison.
TL;DR: This paper proposes an integrated approach to concurrently select the discriminative features and the negative graphs in an iterative manner and derives an evaluation criterion to estimate the dependency between subgraph features and class labels based on a set of estimated negative graphs.
Abstract: The problem of graph classification has drawn much attention in the last decade. Conventional approaches to graph classification focus on mining discriminative subgraph features under supervised settings. The feature selection strategies strictly follow the assumption that both positive and negative graphs exist. However, in many real-world applications, the negative graph examples are not available. In this paper we study the problem of how to select useful subgraph features and perform graph classification based upon only positive and unlabeled graphs. This problem is challenging and different from previous works on PU learning, because there are no predefined features in graph data. Moreover, the subgraph enumeration problem is NP-hard. We need to identify a subset of unlabeled graphs that are most likely to be negative graphs. However, the negative graph selection problem and the subgraph feature selection problem are correlated. Before the reliable negative graphs can be resolved, we need to have a set of useful subgraph features. In order to address this problem, we first derive an evaluation criterion to estimate the dependency between subgraph features and class labels based on a set of estimated negative graphs. In order to build accurate models for the PU learning problem on graph data, we propose an integrated approach to concurrently select the discriminative features and the negative graphs in an iterative manner. Experimental results illustrate the effectiveness and efficiency of the proposed method.
TL;DR: A closure concept that turns a claw-free graph into the line graph of a multigraph while preserving its (non-)Hamilton-connectedness is introduced, and it is proved that the closure operation is, in a sense, best possible.
TL;DR: This work presents a framework for object representation based on fuzzy segmented graphs for general graphs and demonstrates improved precision of area measurements of synthetic two-dimensional objects.
TL;DR: In this paper, a system gathers information on important and influential people and builds a social graph, which can be processed to determine the influence of a node in the graph or a subsection of the graph.
Abstract: A system gathers information on important and influential people and builds a social graph. The social graph can be processed to determine the influence of a node in the graph or a subsection of the graph. For the influence in a subsection of the graph, only nodes with a specific type of relationship or concept are included in the influence calculation. For example, for the concept art, only relationships that have to do with art are included in the influence calculation (e.g., museums, artists, musicians). In an implementation, the weight of an edge in the system depends on a property of the edge. For example, the edge weight for an edge tracking donations is stronger if the amount of money donated is higher.
TL;DR: This paper proposes a new approach to decompose a graph into a series of spanning trees which may share common edges, to transform a reachability query over a graph into a set of queries over trees, and demonstrates the efficiency and effectiveness of this method.
Abstract: Let G = (V, E) be a digraph (directed graph) with n nodes and e edges. Digraph G* = (V, E*) is the reflexive, transitive closure of G if (v, u) ∈ E* iff there is a path from v to u in G. Efficient storage of G* is important for supporting reachability queries which are not only common on graph databases, but also serve as fundamental operations used in many graph algorithms. Many strategies have been suggested based on graph labeling, by which each node is assigned certain labels such that the reachability of any two nodes through a path can be determined by their labels. Among them are interval labeling, chain decomposition, and 2-hop labeling. However, due to the very large size of many real-world graphs, the computational cost and size of labels using existing methods would prove too expensive to be practical. In this paper, we propose a new approach to decompose a graph into a series of spanning trees which may share common edges, to transform a reachability query over a graph into a set of queries over trees. We demonstrate both analytically and empirically the efficiency and effectiveness of our method.
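The abstract above reduces reachability over a graph to queries over trees, and it names interval labeling as one existing labeling strategy. As a point of reference, here is a sketch of how a reachability query on a single rooted tree can be answered from interval labels. It does not show the paper's decomposition into spanning trees; the function names and the `children` dictionary format are assumptions made for the example.

```python
def interval_labels(children, root):
    """Assign each tree node a (start, end) interval by iterative DFS.

    `children` maps a node to a list of its child nodes. A descendant's
    interval is always nested inside its ancestor's interval.
    """
    done = object()  # sentinel marking an exhausted child iterator
    labels = {}
    start = {root: 0}
    counter = 1
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, it = stack[-1]
        child = next(it, done)
        if child is done:
            # All descendants visited: close this node's interval.
            labels[node] = (start[node], counter)
            counter += 1
            stack.pop()
        else:
            start[child] = counter
            counter += 1
            stack.append((child, iter(children.get(child, ()))))
    return labels


def reaches(labels, u, v):
    """True iff v lies in the subtree rooted at u (reflexive)."""
    su, eu = labels[u]
    sv, ev = labels[v]
    return su <= sv and ev <= eu


tree = {"a": ["b", "c"], "b": ["d"]}
labels = interval_labels(tree, "a")
```

With these labels, `reaches(labels, "a", "d")` is true and `reaches(labels, "b", "c")` is false, and each query costs two comparisons regardless of tree size.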
TL;DR: In this article, the authors show how to prepare any graph state of up to 12 qubits with (a) the minimum number of controlled-$Z$ gates and (b) the minimum preparation depth.
Abstract: We show how to prepare any graph state of up to 12 qubits with (a) the minimum number of controlled-$Z$ gates and (b) the minimum preparation depth. We assume only one-qubit and controlled-$Z$ gates. The method exploits the fact that any graph state belongs to an equivalence class under local Clifford operations. We extend up to 12 qubits the classification of graph states according to their entanglement properties, and identify each class using only a reduced set of invariants. For any state, we provide a circuit with both properties (a) and (b), if it does exist, or, if it does not, one circuit with property (a) and one with property (b), including the explicit one-qubit gates needed.
TL;DR: In this article, the Cayley graph automatic groups (CGA groups) were introduced, which generalizes the standard notion of an automatic group and are invariant under the change of generators, closed under direct and free products, certain types of amalgamated products, and finite extensions.
Abstract: In this paper we introduce the concept of a Cayley graph automatic group (CGA group or graph automatic group, for short) which generalizes the standard notion of an automatic group. Like the usual automatic groups, graph automatic ones enjoy many nice properties: these groups are invariant under the change of generators, they are closed under direct and free products, certain types of amalgamated products, and finite extensions. Furthermore, the Word Problem in graph automatic groups is decidable in quadratic time. However, the class of graph automatic groups is much wider than the class of automatic groups. For example, we prove that all finitely generated 2-nilpotent groups and Baumslag-Solitar groups B(1,n) are graph automatic, as well as many other metabelian groups.
TL;DR: PrEd, a force‐directed algorithm that improves the existing layout of a graph while preserving its edge crossing properties, has a number of applications including: improving the layouts of planar graph drawing algorithms, interacting with a graph layout, and drawing Euler‐like diagrams.
Abstract: PrEd [Ber00] is a force-directed algorithm that improves the existing layout of a graph while preserving its edge crossing properties. The algorithm has a number of applications including: improving the layouts of planar graph drawing algorithms, interacting with a graph layout, and drawing Euler-like diagrams. The algorithm ensures that nodes do not cross edges during its execution. However, PrEd can be computationally expensive and overly restrictive in terms of node movement.
In this paper, we introduce ImPrEd: an improved version of PrEd that overcomes some of its limitations and widens its range of applicability. ImPrEd also adds features such as flexible or crossable edges, allowing for greater control over the output. Flexible edges, in particular, can improve the distribution of graph elements and the angular resolution of the input graph. They can also be used to generate Euler diagrams with smooth boundaries. As flexible edges increase data set size, we experience a trade-off between execution time and drawing quality. However, when flexible edges are not used, ImPrEd proves to be consistently faster than PrEd.
TL;DR: This paper constructs a gate graph from a large graph so that for any non-local vertex pair in the original graph, their shortest-path distance can be recovered by consecutive "local" walks through the gate vertices in the gate graph.
Abstract: Large graphs are difficult to represent, visualize, and understand. In this paper, we introduce the "gate graph", a new approach to perform graph simplification. A gate graph provides a simplified topological view of the original graph. Specifically, we construct a gate graph from a large graph so that for any "non-local" vertex pair (distance greater than some threshold) in the original graph, their shortest-path distance can be recovered by consecutive "local" walks through the gate vertices in the gate graph. We perform a theoretical investigation on the gate-vertex set discovery problem. We characterize its computational complexity and reveal the upper bound of minimum gate vertex set using VC-dimension theory. We propose an efficient mining algorithm to discover a gate-vertex set with guaranteed logarithmic bound. The detailed experimental results using both real and synthetic graphs demonstrate the effectiveness and efficiency of our approach.
TL;DR: This paper proposes a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph and builds an index on the similarity of the attributed graph by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity.
TL;DR: A graph pattern matching problem, which is to find all patterns in a large data graph that match a user-given graph pattern, is studied, and new two-step R-join (reachability join) algorithms with a filter step (R-semijoin) and a fetch step (R-join), which utilize a new cluster-based join index with graph codes in a relational database context, are proposed.
Abstract: Due to rapid growth of the Internet and new scientific/technological advances, there exist many new applications that model data as graphs, because graphs have sufficient expressiveness to model complicated structures. The dominance of graphs in real-world applications demands new graph processing techniques to access large data graphs effectively and efficiently. In this paper, we study a graph pattern matching problem, which is to find all patterns in a large data graph that match a user-given graph pattern. We propose new two-step R-join (reachability join) algorithms with a filter step (R-semijoin) and a fetch step (R-join) by utilizing a new cluster-based join index with graph codes in a relational database context. We also propose two optimization approaches to further optimize sequences of R-joins/R-semijoins. The first approach is based on R-join order selection followed by R-semijoin enhancement, and the second approach is to interleave R-joins with R-semijoins. We conducted extensive performance studies, and confirm the efficiency of our proposed new approaches.
TL;DR: This paper defines two novel assembly models, namely the accretive graph assembly model and the self-destructible graph assembly model, and identifies a fundamental problem in them: the sequential construction of a given graph.
TL;DR: A method for static analysis that constructs a control flow graph (CFG) from source code by identifying control structures and creating graph nodes and directed graph edges, assigns a first Boolean flow value to a selected node, traverses the CFG backward from the selected node to a target node while computing disjoint predicate expressions representing flow values at the directed graph edges, and uses the resulting disjoint predicate expression to identify a potential program property in the source code.
Abstract: In general, in one aspect, the invention relates to a method for static analysis. The method includes: obtaining source code; constructing a control flow graph (CFG) corresponding to the source code, by identifying control structures within the source code, creating a set of graph nodes of the CFG, and creating a set of directed graph edges of the CFG connecting the set of graph nodes; assigning a first Boolean flow value to a selected node of the set of graph nodes; backward traversing the CFG from the selected node to a target node; computing, by a computer processor and while backward traversing the CFG, disjoint predicate expressions representing flow values at the set of directed graph edges; computing, based on the disjoint predicate expressions, a resulting disjoint predicate expression; and identifying, based on the resulting disjoint predicate expression, a potential program property in the source code.