TL;DR: A new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences is considered and the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank.
Abstract: We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
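The thresholded variant described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization, plain term-frequency cosine similarity, a threshold of 0.1, and a PageRank-style damping factor of 0.85, none of which are stated in the abstract. The names `cosine` and `lexrank` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=100):
    """Score sentences by eigenvector centrality on a thresholded similarity graph.

    threshold and damping are illustrative defaults, not values from the paper.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    trans = []
    for i in range(n):
        # Binary adjacency: keep an edge only if similarity exceeds the threshold.
        row = [1.0 if cosine(bags[i], bags[j]) > threshold else 0.0 for j in range(n)]
        total = sum(row)
        # Row-normalize into transition probabilities (uniform if the row is empty).
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Power iteration on the damped random walk.
        scores = [
            (1.0 - damping) / n + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


example = ["graph ranking sentences", "graph methods", "ranking models", "sentences matter"]
example_scores = lexrank(example)
```

In the toy input, the first sentence shares a word with each of the others, so it ends up with the highest centrality score.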
TL;DR: In GNMF, an affinity graph is constructed to encode the geometrical information and a matrix factorization is sought, which respects the graph structure, and the empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.
Abstract: Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation, which uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information and we seek a matrix factorization, which respects the graph structure. Our empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.
TL;DR: g2o, an open-source C++ framework for optimizing graph-based nonlinear error functions, is presented; the results demonstrate that, while being general, g2o offers performance comparable to implementations of state-of-the-art approaches for the specific problems.
Abstract: Many popular problems in robotics and computer vision including various types of simultaneous localization and mapping (SLAM) or bundle adjustment (BA) can be phrased as least squares optimization of an error function that can be represented by a graph. This paper describes the general structure of such problems and presents g2o, an open-source C++ framework for optimizing graph-based nonlinear error functions. Our system has been designed to be easily extensible to a wide range of problems and a new problem typically can be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA. We provide evaluations on a wide range of real-world and simulated datasets. The results demonstrate that while being general g2o offers a performance comparable to implementations of state-of-the-art approaches for the specific problems.
TL;DR: In this article, a new ACO model that overcomes the difficulties found when working with a huge construction graph is presented. Existing ACO algorithms are not suitable when the graph is so large that it challenges the computer memory and cannot be completely generated or stored in it.
Abstract: Ant Colony Optimization (ACO) has been successfully applied to those combinatorial optimization problems which can be translated into a graph exploration. Artificial ants build solutions step by step adding solution components that are represented by graph nodes. The existing ACO algorithms are suitable when the graph is not very large (thousands of nodes) but are not useful when the graph size can be a challenge for the computer memory and cannot be completely generated or stored in it. In this paper we study a new ACO model that overcomes the difficulties found when working with a huge construction graph. In addition to the description of the model, we analyze in the experimental section one technique used for dealing with this huge graph exploration. The results of the analysis can help to understand the meaning of the new parameters introduced and to decide which parameterization is more suitable for a given problem. For the experiments we use one real problem with capital importance in Software Engineering: refutation of safety properties in concurrent systems. This way, we foster an innovative research line related to the application of ACO to formal methods in Software Engineering.
TL;DR: A robust method for collective disambiguation is presented, by harnessing context from knowledge bases and using a new form of coherence graph that significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
Abstract: Disambiguating named entities in natural-language text maps mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base such as DBpedia or YAGO. This paper presents a robust method for collective disambiguation, by harnessing context from knowledge bases and using a new form of coherence graph. It unifies prior approaches into a comprehensive framework that combines three measures: the prior probability of an entity being mentioned, the similarity between the contexts of a mention and a candidate entity, as well as the coherence among candidate entities for all mentions together. The method builds a weighted graph of mentions and candidate entities, and computes a dense subgraph that approximates the best joint mention-entity mapping. Experiments show that the new method significantly outperforms prior methods in terms of accuracy, with robust behavior across a variety of inputs.
TL;DR: A strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality are observed, but it is shown that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure.
Abstract: We study the structure of the social graph of active Facebook users, the largest social network ever analyzed. We compute numerous features of the graph including the number of users and friendships, the degree distribution, path lengths, clustering, and mixing patterns. Our results center around three main observations. First, we characterize the global structure of the graph, determining that the social network is nearly fully connected, with 99.91% of individuals belonging to a single large connected component, and we confirm the "six degrees of separation" phenomenon on a global scale. Second, by studying the average local clustering coefficient and degeneracy of graph neighborhoods, we show that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure. Third, we characterize the assortativity patterns present in the graph by studying the basic demographic and network properties of users. We observe clear degree assortativity and characterize the extent to which "your friends have more friends than you". Furthermore, we observe a strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality, but we do not find any strong gender homophily. We compare our results with those from smaller social networks and find mostly, but not entirely, agreement on common structural network characteristics.
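Two of the quantities mentioned above, the local clustering coefficient and the comparison between a user's degree and the degrees of their friends, can be illustrated on a toy graph. This is a small sketch; the graph `toy` and the function names are made up for illustration and have nothing to do with the Facebook data.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def mean_neighbor_degree(adj, v):
    """Average degree of v's neighbors, to compare against v's own degree."""
    nbrs = adj[v]
    return sum(len(adj[u]) for u in nbrs) / len(nbrs) if nbrs else 0.0


# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
clustering = {v: local_clustering(toy, v) for v in toy}
friend_degrees = {v: mean_neighbor_degree(toy, v) for v in toy}
```

Vertex 3 has one friend, but that friend has three, which is the small-scale version of the "your friends have more friends than you" effect.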
TL;DR: NN-Descent is presented, a simple yet efficient algorithm for approximate K-NNG construction with arbitrary similarity measures that typically converges to above 90% recall with each point comparing only to several percent of the whole dataset on average.
Abstract: K-Nearest Neighbor Graph (K-NNG) construction is an important operation with many web related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Existing methods for K-NNG construction either do not scale, or are specific to certain similarity measures. We present NN-Descent, a simple yet efficient algorithm for approximate K-NNG construction with arbitrary similarity measures. Our method is based on local search, has minimal space overhead and does not rely on any shared global index. Hence, it is especially suitable for large-scale applications where data structures need to be distributed over the network. We have shown with a variety of datasets and similarity measures that the proposed method typically converges to above 90% recall with each point comparing only to several percent of the whole dataset on average.
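The local-search idea can be sketched as follows: start from random neighbor lists and repeatedly check whether a neighbor of a neighbor is closer than the current worst neighbor. This is a simplified, single-machine illustration under assumed names (`nn_descent`, `dist`, `max_iters`); it leaves out the optimizations a practical implementation would need and should not be read as the authors' algorithm.

```python
import random


def nn_descent(points, k, dist, max_iters=20, seed=0):
    """Approximate K-NN graph by local search; requires len(points) > k."""
    rng = random.Random(seed)
    n = len(points)
    # Random initial K-NN lists, stored as sorted (distance, index) pairs.
    knn = []
    for v in range(n):
        others = rng.sample([u for u in range(n) if u != v], k)
        knn.append(sorted((dist(points[v], points[u]), u) for u in others))
    for _ in range(max_iters):
        # Neighborhoods in both directions: forward and reverse K-NN.
        both = [set(u for _d, u in knn[v]) for v in range(n)]
        for v in range(n):
            for _d, u in knn[v]:
                both[u].add(v)
        updates = 0
        for v in range(n):
            # Candidates are neighbors of neighbors.
            candidates = set()
            for u in both[v]:
                candidates |= both[u]
            candidates.discard(v)
            current = {u for _d, u in knn[v]}
            for c in candidates - current:
                d = dist(points[v], points[c])
                if d < knn[v][-1][0]:
                    # Replace the current worst neighbor and keep the list sorted.
                    knn[v][-1] = (d, c)
                    knn[v].sort()
                    updates += 1
        if updates == 0:
            break
    return [[u for _d, u in row] for row in knn]


line = [float(i) for i in range(30)]
graph = nn_descent(line, 3, lambda a, b: abs(a - b))
```

Because only `dist` touches the data, any similarity measure can be plugged in, which mirrors the "arbitrary similarity measures" claim in the abstract.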
TL;DR: It is shown that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base.
Abstract: We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL's earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.
TL;DR: A graph based algorithm, called graph regularized sparse coding, is proposed, to learn the sparse representations that explicitly take into account the local manifold structure of the data.
Abstract: Sparse coding has received an increasing amount of interest in recent years. It is an unsupervised learning algorithm, which finds a basis set capturing high-level semantics in the data and learns sparse coordinates in terms of the basis set. Originally applied to modeling the human visual cortex, sparse coding has been shown useful for many applications. However, most of the existing approaches to sparse coding fail to consider the geometrical structure of the data space. In many real applications, the data is more likely to reside on a low-dimensional submanifold embedded in the high-dimensional ambient space. It has been shown that the geometrical information of the data is important for discrimination. In this paper, we propose a graph based algorithm, called graph regularized sparse coding, to learn the sparse representations that explicitly take into account the local manifold structure of the data. By using graph Laplacian as a smooth operator, the obtained sparse representations vary smoothly along the geodesics of the data manifold. The extensive experimental results on image classification and clustering have demonstrated the effectiveness of our proposed algorithm.
TL;DR: In this article, a necessary and sufficient condition for consensusability under a common control protocol is given, which explicitly reveals how the intrinsic entropy rate of the agent dynamic and the communication graph jointly affect consensusability.
Abstract: This paper investigates the joint effect of agent dynamic, network topology and communication data rate on consensusability of linear discrete-time multi-agent systems. Neglecting the finite communication data rate constraint and under undirected graphs, a necessary and sufficient condition for consensusability under a common control protocol is given, which explicitly reveals how the intrinsic entropy rate of the agent dynamic and the communication graph jointly affect consensusability. The result is established by solving a discrete-time simultaneous stabilization problem. A lower bound of the optimal convergence rate to consensus, which is shown to be tight for some special cases, is provided as well. Moreover, a necessary and sufficient condition for formationability of multi-agent systems is obtained. As a special case, the discrete-time second-order consensus is discussed where an optimal control gain is designed to achieve the fastest convergence. The effects of undirected graphs on consensusability/formationability and optimal convergence rate are exactly quantified by the ratio of the second smallest to the largest eigenvalues of the graph Laplacian matrix. An extension to directed graphs is also made. The consensus problem under a finite communication data rate is finally investigated.
TL;DR: This State‐of‐the‐Art Report surveys available techniques for the visual analysis of large graphs and discusses various graph algorithmic aspects useful for the different stages of the visual graph analysis process.
Abstract: The analysis of large graphs plays a prominent role in various fields of research and is relevant in many important application areas. Effective visual analysis of graphs requires appropriate visual presentations in combination with respective user interaction facilities and algorithmic graph analysis methods. How to design appropriate graph analysis systems depends on many factors, including the type of graph describing the data, the analytical task at hand and the applicability of graph analysis methods. The most recent surveys of graph visualization and navigation techniques cover techniques that had been introduced until 2000 or concentrate only on graph layouts published until 2002. Recently, new techniques have been developed covering a broader range of graph types, such as time-varying graphs. Also, in accordance with ever growing amounts of graph-structured data becoming available, the inclusion of algorithmic graph analysis and interaction techniques becomes increasingly important. In this State-of-the-Art Report, we survey available techniques for the visual analysis of large graphs. Our review first considers graph visualization techniques according to the type of graphs supported. The visualization techniques form the basis for the presentation of interaction approaches suitable for visual graph exploration. As an important component of visual graph analysis, we discuss various graph algorithmic aspects useful for the different stages of the visual graph analysis process. We also present main open research challenges in this field.
TL;DR: The algorithm incrementally constructs a graph of trajectories through state space while efficiently searching over candidate paths through the graph at each iteration; this process results in a search tree in belief space that provably converges to the optimal path.
Abstract: In this paper we address the problem of motion planning in the presence of state uncertainty, also known as planning in belief space. The work is motivated by planning domains involving nontrivial dynamics, spatially varying measurement properties, and obstacle constraints. To make the problem tractable, we restrict the motion plan to a nominal trajectory stabilized with a linear estimator and controller. This allows us to predict distributions over future states given a candidate nominal trajectory. Using these distributions to ensure a bounded probability of collision, the algorithm incrementally constructs a graph of trajectories through state space, while efficiently searching over candidate paths through the graph at each iteration. This process results in a search tree in belief space that provably converges to the optimal path. We analyze the algorithm theoretically and also provide simulation results demonstrating its utility for balancing information gathering to reduce uncertainty and finding low cost paths.
TL;DR: Experimental results show that the proposed graph-based collective EL method can achieve significant performance improvement over the traditional EL methods; its key benefits come from the global interdependence model of EL decisions and the purely collective nature of the inference algorithm, in which evidence for related EL decisions can be reinforced into high-probability decisions.
Abstract: Entity Linking (EL) is the task of linking name mentions in Web text with their referent entities in a knowledge base. Traditional EL methods usually link name mentions in a document by assuming them to be independent. However, there is often additional interdependence between different EL decisions, i.e., the entities in the same document should be semantically related to each other. In these cases, Collective Entity Linking, in which the name mentions in the same document are linked jointly by exploiting the interdependence between them, can improve the entity linking accuracy. This paper proposes a graph-based collective EL method, which can model and exploit the global interdependence between different EL decisions. Specifically, we first propose a graph-based representation, called Referent Graph, which can model the global interdependence between different EL decisions. Then we propose a collective inference algorithm, which can jointly infer the referent entities of all name mentions by exploiting the interdependence captured in Referent Graph. The key benefit of our method comes from: 1) The global interdependence model of EL decisions; 2) The purely collective nature of the inference algorithm, in which evidence for related EL decisions can be reinforced into high-probability decisions. Experimental results show that our method can achieve significant performance improvement over the traditional EL methods.
TL;DR: In this paper, a graph-theoretic definition of connectivity is provided, as well as an equivalent definition based on algebraic graph theory, which employs the adjacency and Laplacian matrices of the graph and their spectral properties.
Abstract: In this paper, we provide a theoretical framework for controlling graph connectivity in mobile robot networks. We discuss proximity-based communication models composed of disk-based or uniformly-fading-signal-strength communication links. A graph-theoretic definition of connectivity is provided, as well as an equivalent definition based on algebraic graph theory, which employs the adjacency and Laplacian matrices of the graph and their spectral properties. Based on these results, we discuss centralized and distributed algorithms to maintain, increase, and control connectivity in mobile robot networks. The various approaches discussed in this paper range from convex optimization and subgradient-descent algorithms, for the maximization of the algebraic connectivity of the network, to potential fields and hybrid systems that maintain communication links or control the network topology in a least restrictive manner. Common to these approaches is the use of mobility to control the topology of the underlying communication network. We discuss applications of connectivity control to multirobot rendezvous, flocking and formation control, where so far, network connectivity has been considered an assumption.
TL;DR: Searching with jump points is found to speed up A* by an order of magnitude and more, and significant improvement over the current state of the art is reported.
Abstract: Pathfinding in uniform-cost grid environments is a problem commonly found in application areas such as robotics and video games. The state-of-the-art is dominated by hierarchical pathfinding algorithms which are fast and have small memory overheads but usually return suboptimal paths. In this paper we present a novel search strategy, specific to grids, which is fast, optimal and requires no memory overhead. Our algorithm can be described as a macro operator which identifies and selectively expands only certain nodes in a grid map which we call jump points. Intermediate nodes on a path connecting two jump points are never expanded. We prove that this approach always computes optimal solutions and then undertake a thorough empirical analysis, comparing our method with related works from the literature. We find that searching with jump points can speed up A* by an order of magnitude and more and report significant improvement over the current state of the art.
TL;DR: An open-source toolbox for drawing large-scale undirected graphs based on a previously implemented closed-source algorithm known as VxOrd, which is extended by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation.
Abstract: We document an open-source toolbox for drawing large-scale undirected graphs. This toolbox is based on a previously implemented closed-source algorithm known as VxOrd. Our toolbox, which we call OpenOrd, extends the capabilities of VxOrd to large graph layout by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation. At each level, vertices are grouped using force-directed layout and average-link clustering. The clustered vertices are then re-drawn and the process is repeated. When a suitable drawing of the coarsened graph is obtained, the algorithm is reversed to obtain a drawing of the original graph. This approach results in layouts of large graphs which incorporate both local and global structure. A detailed description of the algorithm is provided in this paper. Examples using datasets with over 600K nodes are given. Code is available at www.cs.sandia.gov/~smartin.
TL;DR: In this paper, the authors propose a statistical ranking method called HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications.
Abstract: We propose a technique that we call HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications. We are primarily interested in cardinal data based on scores or ratings though our methods also give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method exploits the graph Helmholtzian, which is the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We shall study the graph Helmholtzian using combinatorial Hodge theory, which provides a way to unravel ranking information from edge flows. In particular, we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the $l_2$-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained; if this is large, then it indicates that the data does not have a good global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provide information on whether inconsistency in the ranking data arises locally or globally. When applied to statistical ranking problems, Hodge decomposition sheds light on whether a given dataset may be globally ranked in a meaningful way or if the data is inherently inconsistent and thus could not have any reasonable global ranking; in the latter case it provides information on the nature of the inconsistencies. An obvious advantage over the NP-hardness of Kemeny optimization is that HodgeRank may be easily computed via a linear least squares regression. We also discuss connections with well-known ordinal ranking techniques such as Kemeny optimization and Borda count from social choice theory.
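The least squares step can be illustrated with a small sketch. It assumes unweighted edges and a connected comparison graph, and it solves the normal equations (graph Laplacian times scores equals the divergence of the edge flow) by pinning one score to zero and applying plain Gaussian elimination. The function name `hodgerank` and the input layout are hypothetical, chosen only for this example.

```python
def hodgerank(n, flows):
    """Least squares global scores from pairwise flows.

    flows maps (i, j) to y, meaning item j beats item i by y on average.
    Minimizes the sum over edges of (s[j] - s[i] - y)^2; assumes a connected graph
    with unit edge weights. Returned scores are centered to mean zero.
    """
    # Normal equations: L s = b, with L the graph Laplacian and b the flow divergence.
    lap = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for (i, j), y in flows.items():
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        b[j] += y
        b[i] -= y
    # Scores are only defined up to a constant, so pin s[0] = 0 and solve the rest.
    m = n - 1
    aug = [lap[r + 1][1:] + [b[r + 1]] for r in range(m)]
    for c in range(m):
        pivot = aug[c][c]
        for r in range(c + 1, m):
            f = aug[r][c] / pivot
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    s = [0.0] * n
    for r in range(m - 1, -1, -1):
        acc = aug[r][m] - sum(aug[r][k] * s[k + 1] for k in range(r + 1, m))
        s[r + 1] = acc / aug[r][r]
    mean = sum(s) / n
    return [x - mean for x in s]


consistent = hodgerank(3, {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0})
cyclic = hodgerank(3, {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0})
```

In the consistent example the flow is a pure gradient, so the scores reproduce the pairwise differences. In the cyclic example the gradient component vanishes and all scores come out equal, which is the situation the abstract describes as having no good global ranking.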
TL;DR: This paper presents a new, polynomial-time MWIS algorithm, and proves that it converges to an optimum, and demonstrates advantages of simultaneously accounting for soft and hard contextual constraints in multitarget tracking.
Abstract: This paper addresses the problem of simultaneous tracking of multiple targets in a video. We first apply object detectors to every video frame. Pairs of detection responses from every two consecutive frames are then used to build a graph of tracklets. The graph helps transitively link the best matching tracklets that do not violate hard and soft contextual constraints between the resulting tracks. We prove that this data association problem can be formulated as finding the maximum-weight independent set (MWIS) of the graph. We present a new, polynomial-time MWIS algorithm, and prove that it converges to an optimum. Similarity and contextual constraints between object detections, used for data association, are learned online from object appearance and motion properties. Long-term occlusions are addressed by iteratively repeating MWIS to hierarchically merge smaller tracks into longer ones. Our results demonstrate advantages of simultaneously accounting for soft and hard contextual constraints in multitarget tracking. We outperform the state of the art on the benchmark datasets.
TL;DR: The fastest known algorithm for computing approximately maximum s-t flows in a capacitated, undirected graph is developed; for a graph with n vertices and m edges it takes $\tilde{O}(mn^{1/3}\epsilon^{-11/3})$ time.
Abstract: We introduce a new approach to computing an approximately maximum s-t flow in a capacitated, undirected graph. This flow is computed by solving a sequence of electrical flow problems. Each electrical flow is given by the solution of a system of linear equations in a Laplacian matrix, and thus may be approximately computed in nearly-linear time. Using this approach, we develop the fastest known algorithm for computing approximately maximum s-t flows. For a graph having n vertices and m edges, our algorithm computes a $(1-\epsilon)$-approximately maximum s-t flow in time $\tilde{O}(mn^{1/3}\epsilon^{-11/3})$. A dual version of our approach gives the fastest known algorithm for computing a $(1+\epsilon)$-approximately minimum s-t cut. It takes $\tilde{O}(m+n^{4/3}\epsilon^{-16/3})$ time. Previously, the best dependence on m and n was achieved by the algorithm of Goldberg and Rao (J. ACM 1998), which can be used to compute approximately maximum s-t flows in time $\tilde{O}(m\sqrt{n}\,\epsilon^{-1})$, and approximately minimum s-t cuts in time $\tilde{O}(m+n^{3/2}\epsilon^{-3})$.
TL;DR: This paper introduces LEMON, a generic open source C++ library providing easy-to-use and efficient implementations of graph and network algorithms and related data structures; benchmarks show that it typically outperforms similar libraries in efficiency.
TL;DR: To fully appreciate the utility of network science, a greater understanding of how network models apply to the brain is needed, and an integrated appraisal of multiple network analyses should be performed to better understand network structure.
Abstract: Although graph theory has been around since the 18th century, the field of network science is more recent and continues to gain popularity, particularly in the field of neuroimaging. The field was propelled forward when Watts and Strogatz introduced their small-world network model, which described a network that provided regional specialization with efficient global information transfer. This model is appealing to the study of brain connectivity, as the brain can be viewed as a system with various interacting regions that produce complex behaviors. In practice, graph metrics such as clustering coefficient, path length, and efficiency measures are often used to characterize system properties. Centrality metrics such as degree, betweenness, closeness, and eigenvector centrality determine critical areas within the network. Community structure is also essential for understanding network organization and topology. Network science has led to a paradigm shift in the neuroscientific community, but it sho...
TL;DR: ReFeX (Recursive Feature eXtraction), a novel algorithm, that recursively combines local features with neighborhood features; and outputs regional features -- capturing "behavioral" information in large graphs, is proposed.
Abstract: Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is anonymized, how can we use information in one to de-anonymize the other? The key step in all such graph mining tasks is to find effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm, that recursively combines local (node-based) features with neighborhood (egonet-based) features; and outputs regional features -- capturing "behavioral" information. We demonstrate how these powerful regional features can be used in within-network and across-network classification and de-anonymization tasks -- without relying on homophily, or the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable and (b) it is effective, capturing regional ("behavioral") information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks like network classification and de-anonymization.
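The recursive combination of local and neighborhood features can be sketched in a few lines. This illustration starts from a single local feature (degree) and, at each round, appends the sum and mean of every existing feature over a node's neighbors. It is a deliberately reduced sketch: the egonet-based features mentioned in the abstract are left out, and the name `refex_features` and the `depth` parameter are assumptions made for the example.

```python
def refex_features(adj, depth=2):
    """Recursive neighborhood aggregation of node features.

    adj maps each node to a set of neighbors. Each round appends, for every
    existing feature, its sum and mean over the node's neighbors.
    """
    # Round 0: one local feature per node, its degree.
    feats = {v: [float(len(nbrs))] for v, nbrs in adj.items()}
    for _ in range(depth):
        new = {}
        for v, nbrs in adj.items():
            agg = []
            for f in range(len(feats[v])):
                vals = [feats[u][f] for u in nbrs]
                total = sum(vals)
                agg += [total, total / len(vals) if vals else 0.0]
            new[v] = feats[v] + agg
        feats = new
    return feats


# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
toy_graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
one_round = refex_features(toy_graph, depth=1)
two_rounds = refex_features(toy_graph, depth=2)
```

Each round triples the feature count, so a node's vector after two rounds already summarizes its two-hop neighborhood without using any class labels.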
TL;DR: The results show that the GSNMF algorithm provides better facial representations and achieves higher recognition rates than nonnegative matrix factorization and is also more robust to partial occlusions than other tested methods.
Abstract: In this paper, a novel graph-preserving sparse nonnegative matrix factorization (GSNMF) algorithm is proposed for facial expression recognition. The GSNMF algorithm is derived from the original NMF algorithm by exploiting both sparse and graph-preserving properties. The latter may contain the class information of the samples. Therefore, GSNMF can be conducted as an unsupervised or a supervised dimension reduction method. A sparse representation of the facial images is obtained by minimizing the -norm of the basis images. Furthermore, according to the graph embedding theory, the neighborhood of the samples is preserved by retaining the graph structure in the mapped space. The GSNMF decomposition transforms the high-dimensional facial expression images into a locality-preserving subspace with sparse representation. To guarantee convergence, we use the projected gradient method to calculate the nonnegative solution of GSNMF. Experiments are conducted on the JAFFE database and the Cohn-Kanade database with unoccluded and partially occluded facial images. The results show that the GSNMF algorithm provides better facial representations and achieves higher recognition rates than nonnegative matrix factorization. Moreover, GSNMF is also more robust to partial occlusions than other tested methods.
TL;DR: iSAM2 is a fully incremental, graph-based version of incremental smoothing and mapping (iSAM), based on a novel graphical model-based interpretation of incremental sparse matrix factorization methods, afforded by the recently introduced Bayes tree data structure.
Abstract: We present iSAM2, a fully incremental, graph-based version of incremental smoothing and mapping (iSAM). iSAM2 is based on a novel graphical model-based interpretation of incremental sparse matrix factorization methods, afforded by the recently introduced Bayes tree data structure. The original iSAM algorithm incrementally maintains the square root information matrix by applying matrix factorization updates. We analyze the matrix updates as simple editing operations on the Bayes tree and the conditional densities represented by its cliques. Based on that insight, we present a new method to incrementally change the variable ordering, which has a large effect on efficiency. The efficiency and accuracy of the new method are based on fluid relinearization, the concept of selectively relinearizing variables as needed. This allows us to obtain a fully incremental algorithm without any need for periodic batch steps. We analyze the properties of the resulting algorithm in detail, and show on various real and simulated datasets that the iSAM2 algorithm compares favorably with other recent mapping algorithms in both quality and efficiency.
TL;DR: In this article, the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms, is explored, and two highly-tuned parallel approaches for BFS on large parallel systems are presented: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix partitioning-based approach that mitigates parallel communication overhead.
Abstract: Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
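The level-synchronous strategy named above can be illustrated with a serial sketch: the search advances one frontier at a time, and every vertex in the current frontier is expanded before the next level begins. This sketch does not model the graph partitioning, inter-process communication, or multithreading that the abstract describes; the function name `level_synchronous_bfs` and the toy graph are hypothetical.

```python
def level_synchronous_bfs(adj, source):
    """Return a dict mapping each reachable vertex to its BFS level.

    adj: dict mapping vertex -> iterable of neighbor vertices.
    Serial illustration only: in a parallel setting, the inner loops over the
    frontier are the part that would be split across processes.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:              # expand the whole current level
            for v in adj.get(u, ()):
                if v not in level:      # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier        # synchronization point between levels
    return level


# Toy undirected graph (illustrative data only).
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = level_synchronous_bfs(graph, 0)
```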
TL;DR: An algorithm to compute shortest paths on continental road networks with arbitrary metrics (cost functions) that supports turn costs, enables real-time queries, and can incorporate a new metric in a few seconds--fast enough to support real-time traffic updates and personalized optimization functions.
Abstract: We present an algorithm to compute shortest paths on continental road networks with arbitrary metrics (cost functions). The approach supports turn costs, enables real-time queries, and can incorporate a new metric in a few seconds--fast enough to support real-time traffic updates and personalized optimization functions. The amount of metric-specific data is a small fraction of the graph itself, which allows us to maintain several metrics in memory simultaneously.
TL;DR: This paper utilizes Büchi automata to produce an automaton (which can be thought of as a graph) whose runs satisfy the temporal-logic specification, and presents a graph algorithm that computes a run corresponding to the optimal robot path.
Abstract: In this paper we present a method for automatically generating optimal robot paths satisfying high-level mission specifications. The motion of the robot in the environment is modeled as a weighted transition system. The mission is specified by an arbitrary linear temporal-logic (LTL) formula over propositions satisfied at the regions of a partitioned environment. The mission specification contains an optimizing proposition, which must be repeatedly satisfied. The cost function that we seek to minimize is the maximum time between satisfying instances of the optimizing proposition. For every environment model, and for every formula, our method computes a robot path that minimizes the cost function. The problem is motivated by applications in robotic monitoring and data-gathering. In this setting, the optimizing proposition is satisfied at all locations where data can be uploaded, and the LTL formula specifies a complex data-collection mission. Our method utilizes Büchi automata to produce an automaton (which can be thought of as a graph) whose runs satisfy the temporal-logic specification. We then present a graph algorithm that computes a run corresponding to the optimal robot path. We present an implementation for a robot performing data collection in a road-network platform.
TL;DR: In this article, the authors use a visual sensor and dead reckoning sensors to process simultaneous localization and mapping (SLAM) in robot navigation, which can be used to autonomously generate and update a map.
Abstract: The invention is related to methods and apparatus that use a visual sensor and dead reckoning sensors to process Simultaneous Localization and Mapping (SLAM). These techniques can be used in robot navigation. Advantageously, such visual techniques can be used to autonomously generate and update a map. Unlike with laser rangefinders, the visual techniques are economically practical in a wide range of applications and can be used in relatively dynamic environments, such as environments in which people move. Certain embodiments contemplate improvements to the front-end processing in a SLAM-based system. Particularly, certain of these embodiments contemplate a novel landmark matching process. Certain of these embodiments also contemplate a novel landmark creation process. Certain embodiments contemplate improvements to the back-end processing in a SLAM-based system. Particularly, certain of these embodiments contemplate algorithms for modifying the SLAM graph in real-time to achieve a more efficient structure.
TL;DR: Graph Cube is introduced, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks and is shown to be a powerful and efficient tool for decision support on large multi-dimensional networks.
Abstract: We consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the so-called multidimensional networks. Data warehouses and OLAP (Online Analytical Processing) technology have proven to be effective tools for decision support on relational data. However, they are not well-equipped to handle the new yet important multidimensional networks. In this paper, we introduce Graph Cube, a new data warehousing model that supports OLAP queries effectively on large multidimensional networks. By taking account of both attribute aggregation and structure summarization of the networks, Graph Cube goes beyond the traditional data cube model involved solely with numeric value based group-by's, thus resulting in a more insightful and structure-enriched aggregate network within every possible multidimensional space. Besides traditional cuboid queries, a new class of OLAP queries, crossboid, is introduced that is uniquely useful in multidimensional networks and has not been studied before. We implement Graph Cube by combining special characteristics of multidimensional networks with the existing well-studied data cube techniques. We perform extensive experimental studies on a series of real world data sets and Graph Cube is shown to be a powerful and efficient tool for decision support on large multidimensional networks.
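The idea of combining attribute aggregation with structure summarization can be illustrated with a small sketch. It assumes a simple count-based aggregate: nodes that share the same values on the chosen dimensions collapse into one group, and edges between groups are counted. The function name `cuboid`, the attribute names, and the toy network are hypothetical and not taken from the paper.

```python
from collections import Counter


def cuboid(node_attrs, edges, dims):
    """Aggregate a multidimensional network along the given dimensions.

    node_attrs: dict mapping node -> dict of attribute values.
    edges: iterable of undirected (u, v) pairs.
    dims: tuple of attribute names to group by.
    Returns (group sizes, edge counts between groups); count is the assumed
    aggregate function.
    """
    def key(node):
        return tuple(node_attrs[node][d] for d in dims)

    node_counts = Counter(key(n) for n in node_attrs)
    # Sort the two endpoints' group keys so (A, B) and (B, A) count together.
    edge_counts = Counter(tuple(sorted((key(u), key(v)))) for u, v in edges)
    return node_counts, edge_counts


# Toy network with two attribute dimensions (illustrative data only).
attrs = {
    1: {"gender": "M", "city": "NY"},
    2: {"gender": "F", "city": "NY"},
    3: {"gender": "M", "city": "LA"},
}
links = [(1, 2), (1, 3), (2, 3)]
groups, group_edges = cuboid(attrs, links, ("gender",))
```

Passing a different `dims` tuple, such as `("gender", "city")`, yields the aggregate network for another multidimensional space over the same data.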
TL;DR: This paper proposes to rank edges using a simple similarity-based heuristic, efficiently computed by comparing the minhash signatures of the nodes incident to each edge, so as to preferentially retain the edges that are likely to be part of the same cluster.
Abstract: In this paper we look at how to sparsify a graph, i.e., how to reduce the edge set while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks and using four state-of-the-art graph clustering and community discovery algorithms reveal that our proposed approach realizes excellent speedups (often in the range 10-50), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies.
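The edge-ranking heuristic described above can be sketched as follows. The sketch makes several assumptions: node identifiers are integers, each node's set is its neighbor list plus the node itself, the hash family is a simple linear one, and a fixed number `keep` of top-ranked edges is retained per node. The function names and the toy graph are hypothetical, and the paper's exact rule for how many edges to keep per node is not reproduced.

```python
import random

PRIME = 2_147_483_647  # modulus for the assumed linear hash family


def minhash_signature(items, hash_params):
    """One minimum per hash function over a non-empty set of integer items."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_params]


def sparsify(adj, num_hashes=64, keep=2, seed=0):
    """Keep, for every node, its `keep` most similar incident edges.

    adj: dict mapping every node -> list of neighbors (undirected graph).
    Similarity of an edge (u, v) is the fraction of matching minhash entries,
    which estimates the Jaccard similarity of the two nodes' neighborhoods.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
              for _ in range(num_hashes)]
    sigs = {u: minhash_signature(set(nbrs) | {u}, params)
            for u, nbrs in adj.items()}

    def similarity(u, v):
        return sum(a == b for a, b in zip(sigs[u], sigs[v])) / num_hashes

    kept = set()
    for u, nbrs in adj.items():
        ranked = sorted(nbrs, key=lambda v: similarity(u, v), reverse=True)
        for v in ranked[:keep]:
            kept.add((min(u, v), max(u, v)))  # store each edge once
    return kept


# Two triangles joined by a single bridge edge 2 - 3 (illustrative data only).
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
retained = sparsify(two_triangles, keep=2)
```

In this toy graph every triangle edge is retained because nodes 0, 1, 4, and 5 have only two incident edges each; whether the bridge survives depends on the estimated similarities at nodes 2 and 3.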