TL;DR: This work proposes a heuristic method that is shown to outperform all other known community detection methods in terms of computation time, while the quality of the detected communities, as measured by modularity, is very good.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.
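The method above is driven by modularity. As a reference for the objective only (not the heuristic search itself), here is a minimal sketch that computes modularity Q = sum over communities c of [L_c / m - (d_c / 2m)^2]. Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of c. It assumes an undirected, unweighted graph given as an edge list and a partition given as a node-to-label dict. The function name `modularity` is illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum_c [ L_c / m - (d_c / 2m)^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting at the bridge gives Q = 5/14, while putting every node in one community gives Q = 0.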
TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.
TL;DR: This note shows that consensus is reached asymptotically for the first two cases if the undirected interaction graph is connected and for the third case if the directed interaction graph has a directed spanning tree and the gain for velocity matching with the group reference velocity is above a certain bound.
Abstract: This note considers consensus algorithms for double-integrator dynamics. We propose and analyze consensus algorithms for double-integrator dynamics in four cases: 1) with a bounded control input, 2) without relative velocity measurements, 3) with a group reference velocity available to each team member, and 4) with a bounded control input when a group reference state is available to only a subset of the team. We show that consensus is reached asymptotically for the first two cases if the undirected interaction graph is connected. We further show that consensus is reached asymptotically for the third case if the directed interaction graph has a directed spanning tree and the gain for velocity matching with the group reference velocity is above a certain bound. We also show that consensus is reached asymptotically for the fourth case if and only if the group reference state flows directly or indirectly to all of the vehicles in the team.
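For orientation, the sketch below simulates the common unconstrained second-order consensus protocol u_i = -sum_j a_ij [(r_i - r_j) + gamma (v_i - v_j)] on a connected undirected graph, using forward Euler. It is not one of the note's four algorithms (those add bounded inputs, drop velocity measurements, or use reference states). It is only a baseline, and the function name, gain `gamma`, and step size are assumptions.

```python
import numpy as np


def simulate_double_integrator_consensus(A, r0, v0, gamma=1.0, dt=0.01, steps=5000):
    """Forward-Euler simulation of r_i'' = -sum_j a_ij [(r_i - r_j) + gamma (v_i - v_j)].

    A: symmetric adjacency matrix of an undirected interaction graph.
    Returns the final positions and velocities.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian
    r = np.array(r0, dtype=float)
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        u = -L @ r - gamma * (L @ v)  # relative position and velocity feedback
        r = r + dt * v
        v = v + dt * u
    return r, v
```

On a connected graph the positions agree asymptotically while all agents move with the initial average velocity, because the Laplacian terms sum to zero across agents.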
TL;DR: An approach based on supervised learning to infer people's motion modes from their GPS logs is proposed; it identifies a set of sophisticated features that are more robust to traffic conditions than those used in previous work.
Abstract: Both recognizing human behavior and understanding a user's mobility from sensor data are critical issues in ubiquitous computing systems. As a kind of user behavior, the transportation modes a user takes, such as walking and driving, can enrich the user's mobility with informative knowledge and provide pervasive computing systems with more context information. In this paper, we propose an approach based on supervised learning to infer people's motion modes from their GPS logs. The contribution of this work lies in the following two aspects. On one hand, we identify a set of sophisticated features that are more robust to traffic conditions than those other researchers have used. On the other hand, we propose a graph-based post-processing algorithm to further improve the inference performance. This algorithm considers both the commonsense constraints of the real world and typical user behavior based on location, in a probabilistic manner. Using the GPS logs collected by 65 people over a period of 10 months, we evaluated our approach via a set of experiments. As a result, based on the change point-based segmentation method and a Decision Tree-based inference model, the new features brought an eight percent improvement in inference accuracy over previous results, and the graph-based post-processing achieved a further four percent enhancement.
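The abstract does not list its features. As a hedged illustration only, the sketch below computes per-distance rates of stops, large velocity changes, and large heading changes for one GPS segment. These are features of the traffic-robust kind the abstract alludes to. The exact definitions, the threshold defaults, and the name `segment_features` are assumptions, not values taken from the paper.

```python
def segment_features(speeds, headings, step_dists,
                     stop_speed=0.6, vel_change=0.7, heading_change=19.0):
    """Per-distance change rates for one GPS segment (illustrative definitions).

    speeds[i]     : speed at point i (m/s)
    headings[i]   : heading at point i (degrees)
    step_dists[i] : distance from point i-1 to point i (m); step_dists[0] is ignored
    All threshold defaults are hypothetical.
    """
    total = sum(step_dists[1:])
    if total <= 0:
        raise ValueError("segment has zero length")
    stops = sum(1 for v in speeds if v < stop_speed)
    vel_changes = 0
    heading_changes = 0
    for i in range(1, len(speeds)):
        prev = speeds[i - 1]
        # relative velocity change, skipped when the previous speed is zero
        if prev > 0 and abs(speeds[i] - prev) / prev > vel_change:
            vel_changes += 1
        turn = abs(headings[i] - headings[i - 1]) % 360
        turn = min(turn, 360 - turn)  # smallest angle between the two headings
        if turn > heading_change:
            heading_changes += 1
    return {
        "stop_rate": stops / total,
        "velocity_change_rate": vel_changes / total,
        "heading_change_rate": heading_changes / total,
    }
```

Normalizing by distance rather than time is one plausible way to make such counts less sensitive to congestion. Whether the paper normalizes this way is not stated in the abstract.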
TL;DR: A novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood; the approach propagates the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness.
Abstract: In many practical data mining applications such as text classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms have aroused considerable interest in the data mining and machine learning fields. In recent years, graph-based semi-supervised learning has become one of the most active research areas in the semi-supervised learning community. In this paper, a novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. Our algorithm, named linear neighborhood propagation (LNP), can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness. A theoretical analysis of the properties of LNP is presented in this paper. Furthermore, we also derive an easy way to extend LNP to out-of-sample data. Promising experimental results are presented for synthetic data, digit, and text classification tasks.
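The two LNP steps can be sketched as follows. First, reconstruct each point from its k nearest neighbours with row-stochastic weights. Second, propagate labels with the fixed-point iteration F = alpha W F + (1 - alpha) Y, solved here in closed form. This is a simplified variant under stated assumptions: the weights come from the LLE-style sum-to-one least-squares problem with negative weights clipped, which stands in for the nonnegative constrained QP that LNP solves. The names `lnp_weights`, `lnp_propagate`, `reg` and `alpha` are hypothetical.

```python
import numpy as np


def lnp_weights(X, k, reg=1e-3):
    """Row-stochastic reconstruction weights from each point's k nearest neighbours.

    Simplification: LLE-style sum-to-one weights with negative entries clipped,
    instead of solving LNP's nonnegative quadratic program exactly.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        nbrs = np.argsort(d)[:k]
        Z = X[i] - X[nbrs]                 # k x dim difference vectors
        G = Z @ Z.T                        # local Gram matrix (often singular)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        w = np.clip(w / w.sum(), 0.0, None)
        W[i, nbrs] = w / w.sum()
    return W


def lnp_propagate(W, labels, n_classes, alpha=0.99):
    """Closed form of F <- alpha W F + (1 - alpha) Y; returns the predicted class per point."""
    n = W.shape[0]
    Y = np.zeros((n, n_classes))
    for i, c in labels.items():
        Y[i, c] = 1.0
    F = np.linalg.solve(np.eye(n) - alpha * W, (1 - alpha) * Y)
    return F.argmax(axis=1)
```

Because W is row-stochastic and alpha < 1, the matrix I - alpha W is invertible, so the iteration's limit can be obtained with a single linear solve.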
TL;DR: This work formally defines the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations, and devises simple and efficient algorithms for solving this problem.
Abstract: The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, and in its basic form the degree of the nodes, can reveal the identities of individuals. To address this issue, we study a specific graph-anonymization problem. We call a graph k-degree anonymous if for every node v, there exist at least k-1 other nodes in the graph with the same degree as v. This definition of anonymity prevents the re-identification of individuals by adversaries with a priori knowledge of the degree of certain nodes. We formally define the graph-anonymization problem that, given a graph G, asks for the k-degree anonymous graph that stems from G with the minimum number of graph-modification operations. We devise simple and efficient algorithms for solving this problem. Our algorithms are based on principles related to the realizability of degree sequences. We apply our methods to a large spectrum of synthetic and real datasets and demonstrate their efficiency and practical utility.
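The definition above can be checked directly. A graph is k-degree anonymous exactly when every degree value that occurs is shared by at least k nodes, because each node then has at least k-1 others with the same degree. A minimal sketch follows; the function name and the edge-list input format are assumptions. It only verifies the property and does not perform the paper's minimum-modification anonymization.

```python
from collections import Counter


def is_k_degree_anonymous(edges, nodes, k):
    """True if every degree value in the graph is shared by at least k nodes."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # size of each degree class (number of nodes with that degree)
    class_sizes = Counter(degree.values())
    return all(size >= k for size in class_sizes.values())
```

For example, a 4-cycle is 4-degree anonymous, while a star with three leaves is not even 2-degree anonymous because its centre has a unique degree.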
TL;DR: A revised version of the metareg command, which performs meta-analysis regression (meta-regression) on study-level summary data, is presented, which involves improvements to the estimation methods and the addition of an option to use a permutation test to estimate p-values.
Abstract: We present a revised version of the metareg command, which performs meta-analysis regression (meta-regression) on study-level summary data. The major revisions involve improvements to the estimation methods and the addition of an option to use a permutation test to estimate p-values, including an adjustment for multiple testing. We have also made additions to the output, added an option to produce a graph, and included support for the predict command. Stata 8.0 or above is required.
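The permutation idea can be sketched for a single covariate. Permute the covariate values across studies, refit the weighted regression, and report the fraction of permutations whose absolute slope is at least as large as the observed one. This is a Python illustration, not metareg's Stata implementation. It assumes inverse-variance weights 1/(v_i + tau2) with tau2 held fixed (the real command estimates the between-study variance from the data). It also enumerates all permutations exactly, which is only feasible for a handful of studies, and it applies no multiple-testing adjustment. Function names are hypothetical.

```python
import itertools


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x (one covariate plus intercept)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def permutation_p_value(x, y, variances, tau2=0.0):
    """Observed slope and exact permutation p-value (covariate permuted across studies)."""
    w = [1.0 / (v + tau2) for v in variances]   # each weight stays with its study
    observed = weighted_slope(x, y, w)
    perms = list(itertools.permutations(x))
    extreme = sum(1 for xp in perms
                  if abs(weighted_slope(xp, y, w)) >= abs(observed) - 1e-12)
    return observed, extreme / len(perms)
```

With five studies lying exactly on y = 2x, only the identity and the reversed permutation reach |slope| = 2, so the p-value is 2/120.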
TL;DR: This work replaces the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes and presents a novel energy criterion that improves the visual quality of the retargeted images and videos.
Abstract: Video, like images, should support content-aware resizing. We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes. In the new formulation, a seam is given by a minimal cut in the graph and we show how to construct a graph such that the resulting cut is a valid seam. That is, the cut is monotonic and connected. In addition, we present a novel energy criterion that improves the visual quality of the retargeted images and videos. The original seam carving operator is focused on removing seams with the least amount of energy, ignoring energy that is introduced into the images and video by applying the operator. To counter this, the new criterion looks forward in time, removing seams that introduce the least amount of energy into the retargeted result. We show how to encode the improved criterion into graph cuts (for images and video) as well as dynamic programming (for images). We apply our technique to images and videos and present results of various applications.
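The abstract notes that the forward-looking criterion can also be encoded with dynamic programming for images. A minimal sketch for one vertical seam in a grayscale image follows. Removing pixel (i, j) makes its left and right neighbours adjacent, at cost C_U = |I(i, j+1) - I(i, j-1)|. A diagonal step additionally creates a new vertical adjacency, so C_L = C_U + |I(i-1, j) - I(i, j-1)| and C_R = C_U + |I(i-1, j) - I(i, j+1)|. Treat these cost terms, the edge-replicated borders, and the function name as assumptions rather than the paper's exact definitions.

```python
import numpy as np


def forward_energy_seam(img):
    """Minimal-cost vertical seam under a forward-energy criterion (grayscale image).

    Returns (seam, cost) where seam[i] is the column removed in row i.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    left = padded[:, :-2]                  # I(i, j-1), borders replicated
    right = padded[:, 2:]                  # I(i, j+1)
    up = np.vstack([img[:1], img[:-1]])    # I(i-1, j), first row replicated
    c_u = np.abs(right - left)             # new horizontal neighbours
    c_l = c_u + np.abs(up - left)          # plus new vertical edge, seam came from j-1
    c_r = c_u + np.abs(up - right)         # plus new vertical edge, seam came from j+1
    M = np.zeros((h, w))
    back = np.zeros((h, w), dtype=int)
    M[0] = c_u[0]
    for i in range(1, h):
        for j in range(w):
            options = [(M[i - 1, j] + c_u[i, j], j)]
            if j > 0:
                options.append((M[i - 1, j - 1] + c_l[i, j], j - 1))
            if j < w - 1:
                options.append((M[i - 1, j + 1] + c_r[i, j], j + 1))
            M[i, j], back[i, j] = min(options)
    seam = [int(np.argmin(M[-1]))]
    for i in range(h - 1, 0, -1):          # backtrack from the bottom row
        seam.append(int(back[i, seam[-1]]))
    seam.reverse()
    return seam, float(M[-1].min())
```

On an image with a vertical step edge, the seam avoids the two columns next to the edge, since removing either one would place dark and bright pixels side by side.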
TL;DR: Graph-theoretic conditions are obtained which address the convergence question for the leaderless version of the widely studied Vicsek consensus problem.
Abstract: This paper presents new graph-theoretic results appropriate for the analysis of a variety of consensus problems cast in dynamically changing environments. The concepts of rooted, strongly rooted, and neighbor-shared are defined, and conditions are derived for compositions of sequences of directed graphs to be of these types. The graph of a stochastic matrix is defined, and it is shown that under certain conditions the graph of a Sarymsakov matrix and a rooted graph are one and the same. As an illustration of the use of the concepts developed in this paper, graph-theoretic conditions are obtained which address the convergence question for the leaderless version of the widely studied Vicsek consensus problem.
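Two of the notions above can be made concrete: whether a directed graph is rooted (some vertex has a directed path to every other vertex), and the composition of two graphs. The sketch assumes the arc convention i -> j. It also assumes that the composition contains i -> j whenever i -> k is in the first graph and k -> j in the second. The paper's exact ordering convention may differ, and the function names are illustrative.

```python
from collections import deque


def is_rooted(n, edges):
    """True if some vertex reaches every vertex along directed arcs (i, j) meaning i -> j."""
    succ = [[] for _ in range(n)]
    for i, j in edges:
        succ[i].append(j)
    for root in range(n):
        seen = {root}
        queue = deque([root])
        while queue:                       # breadth-first search from the candidate root
            u = queue.popleft()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(seen) == n:
            return True
    return False


def compose(first, second):
    """Arc i -> j whenever i -> k is in `first` and k -> j is in `second` (assumed convention)."""
    out = {}
    for k, j in second:
        out.setdefault(k, set()).add(j)
    return {(i, j) for i, k in first for j in out.get(k, ())}
```

With self-arcs at every vertex, two graphs that are each not rooted can compose into a rooted graph, which is the kind of sequence property the paper studies.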
TL;DR: An application of the reduction procedure to autoignition using a large iso-octane mechanism is presented; the procedure is automatic and fast, has moderate CPU and memory requirements, and compares favorably to other existing techniques.
TL;DR: It is found that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than same-gender conversations.
Abstract: We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate, on a planetary scale, the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. We find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than same-gender conversations.
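The abstract does not say how the 6.6 figure was computed. On graphs this large, a common approach is to run breadth-first search from a random sample of source nodes and average the resulting hop distances. A generic sketch of that idea follows; the function name, the sampling scheme, and the choice to count only reachable pairs are assumptions.

```python
import random
from collections import deque


def sampled_average_path_length(adj, num_sources, seed=0):
    """Mean shortest-path length estimated by BFS from randomly sampled sources.

    adj: dict node -> iterable of neighbours (undirected graph).
    Only pairs reachable from the sampled sources are counted.
    """
    rng = random.Random(seed)
    sources = rng.sample(sorted(adj), min(num_sources, len(adj)))
    total = count = 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1   # exclude the source itself
    return total / count
```

Sampling every node reproduces the exact average. For a path on four nodes this is 20/12, about 1.67 hops.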
TL;DR: In this article, the stability of a network partition is defined in terms of the statistical properties of a dynamical process taking place on the graph, and the connection between community detection and Laplacian dynamics enables them to establish dynamically motivated stability measures linked to distinct null models.
Abstract: Most methods proposed to uncover communities in complex networks rely on their structural properties. Here we introduce the stability of a network partition, a measure of its quality defined in terms of the statistical properties of a dynamical process taking place on the graph. The time-scale of the process acts as an intrinsic parameter that uncovers community structures at different resolutions. The stability extends and unifies standard notions for community detection: modularity and spectral partitioning can be seen as limiting cases of our dynamic measure. Similarly, recently proposed multi-resolution methods correspond to linearisations of the stability at short times. The connection between community detection and Laplacian dynamics enables us to establish dynamically motivated stability measures linked to distinct null models. We apply our method to find multi-scale partitions for different networks and show that the stability can be computed
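A minimal sketch of the discrete-time version of this measure follows. For a random walk with transition matrix P = D^-1 A and stationary distribution pi = d / 2m, it computes the clustered autocovariance trace(H^T (diag(pi) P^t - pi pi^T) H), where H is the community indicator matrix. At t = 1 the quantity reduces algebraically to Newman's modularity, which illustrates the link to modularity the abstract describes. Assumptions: the paper also treats continuous-time dynamics, and its stability may additionally take a minimum over times s <= t; this sketch computes only the autocovariance term at one integer time. The function name is illustrative.

```python
import numpy as np


def markov_stability(A, communities, t):
    """Clustered autocovariance of a discrete-time random walk at integer time t.

    A: symmetric adjacency matrix; communities: community label for each node.
    Returns trace(H^T (diag(pi) P^t - pi pi^T) H).
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    pi = d / d.sum()                       # stationary distribution of the walk
    P = A / d[:, None]                     # random-walk transition matrix
    labels = sorted(set(communities))
    H = np.array([[1.0 if c == lab else 0.0 for lab in labels] for c in communities])
    B = np.diag(pi) @ np.linalg.matrix_power(P, t) - np.outer(pi, pi)
    return float(np.trace(H.T @ B @ H))
```

For two triangles joined by a bridge, split at the bridge, this gives 5/14 at t = 1 (the modularity of that split) and 1/2 at t = 0.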
TL;DR: This paper introduces a novel algorithm to order markers on a genetic linkage map based on a simple yet fundamental mathematical property, proves the validity of this property, and shows that the algorithm consistently outperforms the best available methods in the literature.
Abstract: Genetic linkage maps are cornerstones of a wide spectrum of biotechnology applications, including map-assisted breeding, association genetics, and map-assisted gene cloning. During the past several years, the adoption of high-throughput genotyping technologies has been paralleled by a substantial increase in the density and diversity of genetic markers. New genetic mapping algorithms are needed in order to efficiently process these large datasets and accurately construct high-density genetic maps. In this paper, we introduce a novel algorithm to order markers on a genetic linkage map. Our method is based on a simple yet fundamental mathematical property that we prove under rather general assumptions. The validity of this property allows one to determine efficiently the correct order of markers by computing the minimum spanning tree of an associated graph. Our empirical studies obtained on genotyping data for three mapping populations of barley (Hordeum vulgare), as well as extensive simulations on synthetic data, show that our algorithm consistently outperforms the best available methods in the literature, particularly when the input data are noisy or incomplete. The software implementing our algorithm is available in the public domain as a web tool under the name MSTmap.
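The abstract says the marker order can be read off the minimum spanning tree of an associated graph. A minimal sketch follows. It runs Prim's algorithm on a complete pairwise-distance matrix and, if the tree is a path, returns its vertex order. How the distances are derived from genotype data is not given in the abstract, so the distance matrix is an input here. The function name is hypothetical, as is the choice to return None for non-path trees.

```python
def mst_order(dist):
    """Prim's MST on a complete distance matrix; returns the vertex order if the MST is a path."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:                 # add the tree edge that attached u
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    if any(len(a) > 2 for a in adj):
        return None                        # MST is not a path: no unique linear order
    start = next(i for i in range(n) if len(adj[i]) <= 1)
    order, prev = [start], None
    while len(order) < n:                  # walk the path from one endpoint
        cur = order[-1]
        nxt = next(v for v in adj[cur] if v != prev)
        prev = cur
        order.append(nxt)
    return order
```

When the distances are consistent with a line (for instance, absolute differences of map positions), the MST is exactly the path through adjacent markers, so its traversal order recovers the map order (up to reversal).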
TL;DR: The Ontologizer allows users to visualize data as a graph including all significantly overrepresented GO terms and to explore the data by linking GO terms to all genes/proteins annotated to the term and by linking individual terms to child terms.
Abstract: Summary: The Ontologizer is a Java application that can be used to perform statistical analysis for overrepresentation of Gene Ontology (GO) terms in sets of genes or proteins derived from an experiment. The Ontologizer implements the standard approach to statistical analysis based on the one-sided Fisher's exact test, the novel parent–child method, as well as topology-based algorithms. A number of multiple-testing correction procedures are provided. The Ontologizer allows users to visualize data as a graph including all significantly overrepresented GO terms and to explore the data by linking GO terms to all genes/proteins annotated to the term and by linking individual terms to child terms.
Availability: The Ontologizer application is available under the terms of the GNU GPL. It can be started as a WebStart application from the project homepage, where source code is also provided: http://compbio.charite.de/ontologizer
Requirements: Ontologizer requires a Java SE 5.0 compliant Java runtime engine and GraphViz for the optional graph visualization tool.
Contact: sebastian.bauer@charite.de; peter.robinson@charite.de
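For a single GO term, the standard analysis mentioned above is a one-sided Fisher's exact test, which equals the upper tail of a hypergeometric distribution. A minimal sketch follows. The argument names are assumptions, and the sketch covers only the term-for-term test, not the parent-child or topology-based methods or the multiple-testing corrections the Ontologizer provides.

```python
from math import comb


def overrepresentation_p(population, annotated, study, study_annotated):
    """One-sided Fisher's exact test (hypergeometric upper tail) for one GO term.

    population      : number of genes in the population
    annotated       : population genes annotated to the term
    study           : number of genes in the study set
    study_annotated : study genes annotated to the term
    """
    total = comb(population, study)
    upper = min(annotated, study)
    # probability of observing study_annotated or more annotated genes by chance
    return sum(comb(annotated, k) * comb(population - annotated, study - k)
               for k in range(study_annotated, upper + 1)) / total
```

For example, if 5 of 10 population genes carry the term and all 3 study genes are annotated to it, the p-value is 10/120 = 1/12.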
TL;DR: A graph algebra extended from the relational algebra, in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs, is presented, and access methods of the selection operator are investigated.
Abstract: With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph-specific optimizations outperform an SQL-based implementation by orders of magnitude.
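Because subgraph isomorphism is NP-complete, matchers rely on pruning and search-order choices. A minimal backtracking matcher for small labelled patterns follows. It finds non-induced embeddings and prunes candidates by label and by degree. The label-and-degree filter is a crude stand-in for the paper's neighborhood profiles, and visiting high-degree pattern nodes first is a simple stand-in for its search-order optimization. Both are simplifications, not the paper's techniques, and the function name is hypothetical.

```python
def match_pattern(pattern_adj, pattern_labels, graph_adj, graph_labels):
    """Enumerate embeddings of a small labelled pattern into a graph (non-induced).

    Adjacency maps are dicts node -> set of neighbours (undirected).
    """
    p_nodes = sorted(pattern_adj, key=lambda u: -len(pattern_adj[u]))  # high degree first
    candidates = {
        u: [v for v in graph_adj
            if graph_labels[v] == pattern_labels[u]
            and len(graph_adj[v]) >= len(pattern_adj[u])]
        for u in p_nodes
    }
    results = []

    def extend(mapping, used):
        if len(mapping) == len(p_nodes):
            results.append(dict(mapping))
            return
        u = p_nodes[len(mapping)]
        for v in candidates[u]:
            if v in used:
                continue
            # every already-mapped pattern neighbour of u must map to a neighbour of v
            if all(mapping[w] in graph_adj[v] for w in pattern_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(mapping, used)
                del mapping[u]
                used.discard(v)

    extend({}, set())
    return results
```

A triangle pattern has 4 * 3 * 2 = 24 embeddings in the complete graph K4 and none in a 4-cycle.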
TL;DR: This paper proposes a design methodology to stabilize relative equilibria in a model of identical, steered particles moving in the plane at unit speed, and shows how previous results assuming all-to-all communication can be extended to a general communication framework.
Abstract: This paper proposes a design methodology to stabilize relative equilibria in a model of identical, steered particles moving in the plane at unit speed. Relative equilibria either correspond to parallel motion of all particles with fixed relative spacing or to circular motion of all particles around the same circle. Particles exchange relative information according to a communication graph that can be undirected or directed and time-invariant or time-varying. The emphasis of this paper is to show how previous results assuming all-to-all communication can be extended to a general communication framework.
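The particle model above can be simulated directly. Each particle moves as r_k' = exp(i theta_k) in the complex plane, and parallel motion corresponds to all headings theta_k agreeing. The sketch uses the phase-synchronizing coupling theta_k' = K sum_j a_kj sin(theta_j - theta_k) over an undirected communication graph. This coupling is an assumption chosen for illustration, not necessarily the paper's control law, and it does not cover the circular-motion case or directed, time-varying graphs.

```python
import numpy as np


def simulate_parallel_motion(A, theta0, r0, K=1.0, dt=0.05, steps=2000):
    """Unit-speed planar particles r_k' = exp(i theta_k) with assumed phase coupling
    theta_k' = K * sum_j a_kj sin(theta_j - theta_k) over an undirected graph A.
    """
    A = np.asarray(A, dtype=float)
    theta = np.array(theta0, dtype=float)
    r = np.array(r0, dtype=complex)
    for _ in range(steps):
        r = r + dt * np.exp(1j * theta)          # move at unit speed along the heading
        diff = theta[None, :] - theta[:, None]   # diff[k, j] = theta_j - theta_k
        theta = theta + dt * K * (A * np.sin(diff)).sum(axis=1)
    return theta, r
```

With headings initially inside a half circle and a connected ring graph, the magnitude of the average unit heading vector approaches 1, i.e. the group settles into parallel motion.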
TL;DR: This approach provides a generalization of threshold classical secret sharing via insecure quantum channels beyond the current requirement of 100% collaboration by players to just a simple majority in the case of five players.
Abstract: We consider three broad classes of quantum secret sharing with and without eavesdropping and show how a graph state formalism unifies otherwise disparate quantum secret sharing models. In addition to the elegant unification provided by graph states, our approach provides a generalization of threshold classical secret sharing via insecure quantum channels beyond the current requirement of 100% collaboration by players to just a simple majority in the case of five players. Another innovation here is the introduction of embedded protocols within a larger graph state that serves as a one-way quantum-information processing system.
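As background for the graph state formalism mentioned above: the graph state of a graph G on n qubits is the common +1 eigenstate of the stabilizer generators K_a = X_a times the product over neighbours b of a of Z_b. A minimal sketch that writes these generators as Pauli strings follows. It illustrates only the standard definition, not the paper's secret-sharing protocols, and the function name is illustrative.

```python
def graph_state_stabilizers(n, edges):
    """Stabilizer generators K_a = X_a * prod_{b in N(a)} Z_b as Pauli strings."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    gens = []
    for a in range(n):
        ops = ["I"] * n
        ops[a] = "X"           # X on the vertex itself
        for b in nbrs[a]:
            ops[b] = "Z"       # Z on each neighbour
        gens.append("".join(ops))
    return gens
```

For a triangle this gives XZZ, ZXZ and ZZX; for a three-vertex path it gives XZI, ZXZ and IZX.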
TL;DR: This paper constructs an affinity graph to encode the geometrical information, seeks a matrix factorization that respects the graph structure, and demonstrates the success of this novel algorithm by applying it to real-world problems.
Abstract: Recently, non-negative matrix factorization (NMF) has received a lot of attention in information retrieval, computer vision, and pattern recognition. NMF aims to find two non-negative matrices whose product closely approximates the original matrix. The sizes of these two matrices are usually smaller than the original matrix. This results in a compressed version of the original data matrix. The solution of NMF yields a natural parts-based representation for the data. When NMF is applied for data representation, a major disadvantage is that it fails to consider the geometric structure in the data. In this paper, we develop a graph-based approach for parts-based data representation in order to overcome this limitation. We construct an affinity graph to encode the geometrical information and seek a matrix factorization that respects the graph structure. We demonstrate the success of this novel algorithm by applying it to real-world problems.
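One standard way to make a factorization "respect the graph structure" is to add a Laplacian smoothness term. The objective is then ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W for an affinity matrix W over the data points. It can be minimized with multiplicative updates that keep U and V non-negative. A minimal sketch of those updates follows. The regularizer weight `lam`, the random initialization, and the way W is built (left to the caller, e.g. a k-nearest-neighbour graph) are assumptions.

```python
import numpy as np


def gnmf(X, W, k, lam=1.0, iters=300, seed=0, eps=1e-10):
    """Graph-regularised NMF: X (m x n) ~= U V^T with U, V >= 0.

    W: n x n symmetric non-negative affinity matrix over the n data points (columns of X).
    Multiplicative updates for ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(W.sum(axis=1))
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # graph term pulls the representations of neighbouring points together
        V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    return U, V
```

Each update multiplies by a non-negative ratio, so non-negativity is preserved automatically, and the objective does not increase from one iteration to the next.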
TL;DR: This is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
Abstract: We propose a highly compact two-part representation of a given graph G consisting of a graph summary and a set of corrections. The graph summary is an aggregate graph in which each node corresponds to a set of nodes in G, and each edge represents the edges between all pairs of nodes in the two sets. On the other hand, the corrections portion specifies the list of edge-corrections that should be applied to the summary to recreate G. Our representations allow for both lossless and lossy graph compression with bounds on the introduced error. Further, in combination with the MDL principle, they yield highly intuitive coarse-level summaries of the input graph G. We develop algorithms to construct highly compressed graph representations with small sizes and guaranteed accuracy, and validate our approach through an extensive set of experiments with multiple real-life graph data sets. To the best of our knowledge, this is the first work to compute graph summaries using the MDL principle, and use the summaries (along with corrections) to compress graphs with bounded error.
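For a fixed grouping of nodes into supernodes, the size of the summary-plus-corrections representation can be computed pair by pair. For each pair of supernodes, including a supernode with itself, either add a superedge and list the missing edges as deletions, or omit it and list the present edges as additions, whichever is cheaper. A minimal sketch of that cost follows. It assumes unit cost per superedge and per correction, and it evaluates a given grouping rather than searching for one as the paper's algorithms do. The function name is hypothetical.

```python
from itertools import combinations, combinations_with_replacement


def summary_cost(edges, groups):
    """|superedges| + |corrections| for a fixed grouping (lossless, unit costs assumed).

    edges : iterable of undirected (u, v) pairs
    groups: list of disjoint node sets (supernodes) covering all nodes
    """
    edge_set = {frozenset(e) for e in edges}
    superedges = corrections = 0
    for i, j in combinations_with_replacement(range(len(groups)), 2):
        if i == j:
            pairs = [frozenset(p) for p in combinations(sorted(groups[i]), 2)]
        else:
            pairs = [frozenset((u, v)) for u in groups[i] for v in groups[j]]
        present = sum(1 for p in pairs if p in edge_set)
        if pairs and 1 + (len(pairs) - present) < present:
            superedges += 1
            corrections += len(pairs) - present   # deletions for missing edges
        else:
            corrections += present                # additions for present edges
    return superedges + corrections
```

A complete bipartite graph between {0, 1} and {2, 3} costs 1 (one superedge) with that grouping. Removing one edge costs 2 (a superedge plus one deletion). Singleton groups cost one correction per edge.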
TL;DR: This annotated bibliography gives an elementary classification of problems and results related to graph searching and provides a source of bibliographical references on this field.
TL;DR: This paper proposes to use summary measures of the set of possible causal effects to determine variable importance and uses the minimum absolute value of this set, since that is a lower bound on the size of the causal effect.
Abstract: We assume that we have observational data generated from an unknown underlying directed acyclic graph (DAG) model. A DAG is typically not identifiable from observational data, but it is possible to consistently estimate the equivalence class of a DAG. Moreover, for any given DAG, causal effects can be estimated using intervention calculus. In this paper, we combine these two parts. For each DAG in the estimated equivalence class, we use intervention calculus to estimate the causal effects of the covariates on the response. This yields a collection of estimated causal effects for each covariate. We show that the distinct values in this set can be consistently estimated by an algorithm that uses only local information of the graph. This local approach is computationally fast and feasible in high-dimensional problems. We propose to use summary measures of the set of possible causal effects to determine variable importance. In particular, we use the minimum absolute value of this set, since that is a lower bound on the size of the causal effect. We demonstrate the merits of our methods in a simulation study and on a data set about riboflavin production.
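In a linear Gaussian model, the causal effect of a covariate x on the response y, for a given DAG, is the coefficient of x when y is regressed on x together with the parents of x. Each candidate parent set from the equivalence class therefore yields one possible effect. A minimal sketch of that step and of the minimum-absolute-value summary follows. The candidate parent sets must come from the estimated equivalence class, which is not computed here, so they are supplied by the caller. The function names are hypothetical.

```python
import numpy as np


def possible_effects(data, x, y, parent_sets):
    """Possible causal effects of column x on column y, one per candidate parent set of x.

    Uses the linear-Gaussian adjustment: regress y on x and pa(x) (plus an intercept)
    and take the coefficient of x.
    """
    effects = []
    for pa in parent_sets:
        cols = [x] + list(pa)
        design = np.column_stack([data[:, cols], np.ones(len(data))])
        coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
        effects.append(float(coef[0]))
    return effects


def importance(effects):
    """Conservative summary: the minimum absolute possible effect (a lower bound)."""
    return min(abs(e) for e in effects)
```

If z confounds x and y, the empty parent set gives a biased coefficient while the set {z} recovers the true effect, and the minimum absolute value picks the smaller of the two.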
TL;DR: The main result says that the problem of minimizing the size of S, while ensuring that targeting S would influence the whole network into adopting the product, is hard to approximate within a polylogarithmic factor.
Abstract: In this paper, we study the spread of influence through a social network, in a model initially studied by Kempe, Kleinberg and Tardos [14, 15]: We are given a graph modeling a social network, where each node v has a (fixed) threshold t_v, such that the node will adopt a new product if t_v of its neighbors adopt it. Our goal is to find a small set S of nodes such that targeting the product to S would lead to adoption of the product by a large number of nodes in the graph. We show strong inapproximability results for several variants of this problem. Our main result says that the problem of minimizing the size of S, while ensuring that targeting S would influence the whole network into adopting the product, is hard to approximate within a polylogarithmic factor. This implies similar results if only a fixed fraction of the network is ensured to adopt the product. Further, the hardness of approximation result continues to hold when all nodes have majority thresholds, or have constant degree and threshold two. The latter answers a complexity question proposed in [10, 29]. We also give some positive results for more restricted cases, such as when the underlying graph is a tree.
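The forward process in this model is easy to simulate even though choosing a minimum target set is hard. Starting from S, repeatedly activate any node v with at least t_v active neighbours until nothing changes. A minimal sketch follows; the function name and the dict-based inputs are assumptions.

```python
def spread(adj, thresholds, seeds):
    """Deterministic threshold model: a node adopts once at least t_v neighbours have adopted.

    adj: dict node -> iterable of neighbours; thresholds: dict node -> t_v.
    Returns the final set of adopters.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v not in active and sum(1 for u in adj[v] if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

On a four-node path with all thresholds equal to 1, targeting one endpoint converts everyone. Raising the second node's threshold to 2 stops the cascade unless a node on its other side is also targeted.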
TL;DR: This paper formalizes the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data and defines and analyzes the characteristics of heuristics for selectivity-based static BGP optimization.
Abstract: In this paper, we formalize the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data. We define and analyze the characteristics of heuristics for selectivity-based static BGP optimization. The heuristics range from simple triple pattern variable counting to more sophisticated selectivity estimation techniques. Customized summary statistics for RDF data enable the selectivity estimation of joined triple patterns and the development of efficient heuristics. Using the Lehigh University Benchmark (LUBM), we evaluate the performance of the heuristics for the queries provided by the LUBM and discuss some of them in more detail.
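The simplest heuristic mentioned, triple pattern variable counting, orders the patterns of a BGP so that those with fewer unbound variables (presumably more selective) are evaluated first. A minimal sketch follows. Terms starting with "?" are treated as variables. The tie-break (a bound subject assumed more selective than a bound object, which is assumed more selective than a bound predicate) is an assumption, and the function name is illustrative.

```python
def variable_count_order(patterns):
    """Order triple patterns by number of variables, fewest first.

    patterns: list of (subject, predicate, object) strings; '?'-prefixed terms are variables.
    Ties: bound subject first, then bound object, then bound predicate (assumed ranking).
    """
    def key(tp):
        s, p, o = (term.startswith("?") for term in tp)
        return (s + p + o, s, o, p)
    return sorted(patterns, key=key)
```

For example, ("?x", "rdf:type", "ub:Student") with one variable is placed before ("?x", "ub:takesCourse", "?c") with two.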
TL;DR: This is the first comprehensive study of a general mining method aimed at finding the most significant patterns directly; graph classifiers built on the mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
Abstract: With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which can cause all frequency-centric graph mining algorithms to fail. In this paper, we give the first comprehensive study of a general mining method aimed at finding the most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in such a way that the most significant pattern can be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method reveals that the branch-and-bound search widely adopted in the data mining literature is not the best, thus sketching a new picture of scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.
TL;DR: This work presents write integrity testing (WIT), a new technique that provides practical protection from memory errors: it compiles C and C++ programs without modifications, has high coverage with no false positives, and has low overhead.
Abstract: Attacks often exploit memory errors to gain control over the execution of vulnerable programs. These attacks remain a serious problem despite previous research on techniques to prevent them. We present write integrity testing (WIT), a new technique that provides practical protection from these attacks. WIT uses points-to analysis at compile time to compute the control-flow graph and the set of objects that can be written by each instruction in the program. Then it generates code instrumented to prevent instructions from modifying objects that are not in the set computed by the static analysis, and to ensure that indirect control transfers are allowed by the control-flow graph. To improve coverage where the analysis is not precise enough, WIT inserts small guards between the original program objects. We describe an efficient implementation with optimizations to reduce space and time overhead. This implementation can be used in practice because it compiles C and C++ programs without modifications, it has high coverage with no false positives, and it has low overhead. WIT's average runtime overhead is only 7% across a set of CPU-intensive benchmarks and it is negligible when IO is the bottleneck.
TL;DR: This paper uses the first few eigenfunctions of the backward Fokker–Planck diffusion operator as a coarse-grained low dimensional representation for the long-term evolution of a stochastic system and shows that they are optimal under a certain mean squared error criterion.
Abstract: The concise representation of complex high dimensional stochastic systems via a few reduced coordinates is an important problem in computational physics, chemistry, and biology. In this paper we use the first few eigenfunctions of the backward Fokker–Planck diffusion operator as a coarse-grained low dimensional representation for the long-term evolution of a stochastic system and show that they are optimal under a certain mean squared error criterion. We denote the mapping from physical space to these eigenfunctions as the diffusion map. While in high dimensional systems these eigenfunctions are difficult to compute numerically by conventional methods such as finite differences or finite elements, we describe a simple computational data-driven method to approximate them from a large set of simulated data. Our method is based on defining an appropriately weighted graph on the set of simulated data and computing the first few eigenvectors and eigenvalues of the corresponding random walk matrix on this graph...
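The data-driven step described above can be sketched in its basic form. Build a Gaussian-weighted graph on the samples, form the random-walk matrix P = D^-1 W, and take its leading non-trivial eigenvectors as diffusion coordinates. Assumptions: the kernel width `epsilon` is a free parameter, and the abstract's "appropriately weighted" graph may involve an extra density normalization that this sketch omits. Because P is similar to the symmetric matrix D^-1/2 W D^-1/2, a symmetric eigensolver can be used.

```python
import numpy as np


def diffusion_map(X, epsilon, n_coords=2):
    """Leading non-trivial eigenvalues and right eigenvectors of P = D^-1 W.

    X: samples (n x d); W_ij = exp(-||x_i - x_j||^2 / epsilon).
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / epsilon)
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))           # symmetric matrix similar to P
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]            # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]          # right eigenvectors of P
    psi = psi / psi[:, [0]]                   # scale so the trivial eigenvector is all ones
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]
```

The first eigenvalue of P is 1 with a constant eigenvector, so it is dropped; the returned columns satisfy P psi = psi * vals.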
TL;DR: Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain, Instituto de Tecnología Química, CSIC-Universidad Politécnica de Valencia,
Abstract: Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, 46100 Burjassot, València, Spain, Instituto de Tecnología Química, CSIC-Universidad Politécnica de Valencia, Av. de los Naranjos s/n, 46022 València, Spain, and Dipartimento di Chimica, Università della Calabria, via P. Bucci 14/C, 87036 Rende (CS), Italy
TL;DR: The main result is that if the adjacent topology of the graph is frequently connected, then consensus is achievable via local-information-based decentralized controls, provided that the linear dynamic mode is completely controllable.
TL;DR: The paper investigates the synchronization of a network of identical linear time-invariant state-space models under a possibly time-varying and directed interconnection structure, and constructs a dynamic output feedback coupling that achieves synchronization if the decoupled systems have no exponentially unstable mode and if the communication graph is uniformly connected.
Abstract: The paper investigates the synchronization of a network of identical linear state-space models under a possibly time-varying and directed interconnection structure. The main result is the construction of a dynamic output feedback coupling that achieves synchronization if the decoupled systems have no exponentially unstable mode and if the communication graph is uniformly connected. The result can be interpreted as a generalization of classical consensus algorithms. Stronger conditions are shown to be sufficient, and to some extent also necessary, to ensure synchronization with the diffusive static output coupling often considered in the literature.
TL;DR: A local gradient control law is proposed to stabilize a group of robots to a target formation derived from a potential function based on an undirected infinitesimally rigid graph that specifies the target formation.
Abstract: This paper proposes a local gradient control law to stabilize a group of robots to a target formation. The control is derived from a potential function based on an undirected infinitesimally rigid graph that specifies the target formation. It is shown that infinitesimal rigidity is a sufficient condition for local asymptotic stability of the equilibrium manifold describing the target formation.
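A gradient law of this kind can be sketched for a triangle, which is infinitesimally rigid in the plane. The sketch assumes the common potential V = 1/4 sum over graph edges of (||p_i - p_j||^2 - d_ij^2)^2. Its negative gradient gives each robot the control p_i' = -sum over neighbours j of (||p_i - p_j||^2 - d_ij^2)(p_i - p_j), which uses only relative positions of graph neighbours. The exact potential used in the paper may differ. The step size, iteration count, and function name are assumptions.

```python
import numpy as np


def formation_gradient(p, edges, d, dt=0.01, steps=3000):
    """Forward-Euler gradient descent on V = 1/4 sum_{(i,j)} (||p_i - p_j||^2 - d_ij^2)^2.

    p: initial positions (n x 2); edges: list of (i, j); d: desired distance per edge.
    """
    p = np.array(p, dtype=float)
    for _ in range(steps):
        u = np.zeros_like(p)
        for (i, j), dij in zip(edges, d):
            e = p[i] - p[j]
            err = e @ e - dij ** 2        # squared-distance error on this edge
            u[i] -= err * e               # -dV/dp_i contribution
            u[j] += err * e               # -dV/dp_j contribution
        p = p + dt * u
    return p
```

Starting near an equilateral configuration, the inter-robot distances converge to the desired values, consistent with local (not global) stability of the target formation.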