TL;DR: The Motif-based Approximate Personalized PageRank (MAPPR) algorithm is developed, a generalization of the conductance metric for network motifs that finds clusters containing a seed node with minimalmotif conductance, and a theory of node neighborhoods for finding sets that have small motif conductance.
Abstract: Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal \emph{motif conductance}, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks, shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology.
TL;DR: The proposed TAGCN not only inherits the properties of convolutions in CNN for grid-structured data, but it is also consistent with convolution as defined in graph signal processing, exhibiting better performance than existing graph-convolution-approximation methods on a number of data sets.
Abstract: Spectral graph convolutional neural networks (CNNs) require approximation to the convolution to alleviate the computational complexity, resulting in performance loss. This paper proposes the topology adaptive graph convolutional network (TAGCN), a novel graph convolutional network defined in the vertex domain. We provide a systematic way to design a set of fixed-size learnable filters to perform convolutions on graphs. The topologies of these filters are adaptive to the topology of the graph when they scan the graph to perform convolution. The TAGCN not only inherits the properties of convolutions in CNN for grid-structured data, but it is also consistent with convolution as defined in graph signal processing. Since no approximation to the convolution is needed, TAGCN exhibits better performance than existing spectral CNNs on a number of data sets and is also computationally simpler than other recent methods.
TL;DR: In this paper, the authors present a survey of the fundamental graph querying functionalities, such as graph patterns and navigational expressions, which are used in modern graph query languages such as SPARQL, Cypher and Gremlin.
Abstract: We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges, and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter, we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher, and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher, and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.
TL;DR: A family of autoregressive moving average (ARMA) recursions is designed, which are able to approximate any desired graph frequency response, and give exact solutions for specific graph signal denoising and interpolation problems.
Abstract: One of the cornerstones of the field of signal processing on graphs are graph filters, direct analogs of classical filters, but intended for signals defined on graphs. This paper brings forth new insights on the distributed graph filtering problem. We design a family of autoregressive moving average (ARMA) recursions, which are able to approximate any desired graph frequency response, and give exact solutions for specific graph signal denoising and interpolation problems. The philosophy to design the ARMA coefficients independently from the underlying graph renders the ARMA graph filters suitable in static and, particularly, time-varying settings. The latter occur when the graph signal and/or graph topology are changing over time. We show that in case of a time-varying graph signal, our approach extends naturally to a two-dimensional filter, operating concurrently in the graph and regular time domain. We also derive the graph filter behavior, as well as sufficient conditions for filter stability when the graph and signal are time varying. The analytical and numerical results presented in this paper illustrate that ARMA graph filters are practically appealing for static and time-varying settings, as predicted by theoretical derivations.
TL;DR: In this article, the authors study how global parameters of a graph, such as its edge density or chromatic number, can influence its local substructures and show that a sufficiently high average degree can ensure that one of these subgraphs occurs.
Abstract: In this chapter we study how global parameters of a graph, such as its edge density or chromatic number, can influence its local substructures. How many edges, for instance, do we have to give a graph on n vertices to be sure that, no matter how these edges are arranged, the graph will contain a K r subgraph for some given r? Or at least a K r minor? Will some sufficiently high average degree or chromatic number ensure that one of these substructures occurs?
TL;DR: This paper presents a graph kernel based on the Pyramid Match kernel that finds an approximate correspondence between the sets of vectors of the two graphs and evaluates the proposed methods on several benchmark datasets for graph classification and compares their performance to state-of-the-art graph kernels.
Abstract: Graph kernels have emerged as a powerful tool for graph comparison. Most existing graph kernels focus on local properties of graphs and ignore global structure. In this paper, we compare graphs based on their global properties as these are captured by the eigenvectors of their adjacency matrices. We present two algorithms for both labeled and unlabeled graph comparison. These algorithms represent each graph as a set of vectors corresponding to the embeddings of its vertices. The similarity between two graphs is then determined using the Earth Mover's Distance metric. These similarities do not yield a positive semidefinite matrix. To address for this, we employ an algorithm for SVM classification using indefinite kernels. We also present a graph kernel based on the Pyramid Match kernel that finds an approximate correspondence between the sets of vectors of the two graphs. We further improve the proposed kernel using the Weisfeiler-Lehman framework. We evaluate the proposed methods on several benchmark datasets for graph classification and compare their performance to state-of-the-art graph kernels. In most cases, the proposed algorithms outperform the competing methods, while their time complexity remains very attractive.
TL;DR: HARP is a general meta-strategy to improve all of the state-of-the-art neural algorithms for embedding graphs, including DeepWalk, LINE, and Node2vec, and it is demonstrated that applying HARP's hierarchical paradigm yields improved implementations for all three of these methods.
Abstract: We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features. Our proposed method achieves this by compressing the input graph prior to embedding it, effectively avoiding troublesome embedding configurations (i.e. local minima) which can pose problems to non-convex optimization. HARP works by finding a smaller graph which approximates the global structure of its input. This simplified graph is used to learn a set of initial representations, which serve as good initializations for learning representations in the original, detailed graph. We inductively extend this idea, by decomposing a graph in a series of levels, and then embed the hierarchy of graphs from the coarsest one to the original graph. HARP is a general meta-strategy to improve all of the state-of-the-art neural algorithms for embedding graphs, including DeepWalk, LINE, and Node2vec. Indeed, we demonstrate that applying HARP's hierarchical paradigm yields improved implementations for all three of these methods, as evaluated on both classification tasks on real-world graphs such as DBLP, BlogCatalog, CiteSeer, and Arxiv, where we achieve a performance gain over the original implementations by up to 14% Macro F1.
TL;DR: This article mainly study the biology features of biological networks in terms of eccentric topological indices computation by means of graph structure analysis and distance calculating, and illustrates that the bioengineering has the promising application prospects.
TL;DR: The introduction to graph theory richard j trudeau is universally compatible with any devices to read and is set as public so you can download it instantly.
Abstract: introduction to graph theory richard j trudeau is available in our book collection an online access to it is set as public so you can download it instantly. Our book servers spans in multiple countries, allowing you to get the most less latency time to download any of our books like this one. Merely said, the introduction to graph theory richard j trudeau is universally compatible with any devices to read
TL;DR: GraphGAN is proposed, an innovative graph representation learning framework unifying above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game.
Abstract: The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces "fake" samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of generative model, we propose a novel graph softmax to overcome the limitations of traditional softmax function, which can be proven satisfying desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines.
TL;DR: This paper addresses the general case of directed graphs and proposes an alternative approach that builds the graph Fourier basis as the set of orthonormal vectors that minimize a continuous extension of the graph cut size, known as the Lovász extension.
Abstract: The analysis of signals defined over a graph is relevant in many applications, such as social and economic networks, big data or biological networks, and so on. A key tool for analyzing these signals is the so-called graph Fourier transform (GFT). Alternative definitions of GFT have been suggested in the literature, based on the eigen-decomposition of either the graph Laplacian or adjacency matrix. In this paper, we address the general case of directed graphs and we propose an alternative approach that builds the graph Fourier basis as the set of orthonormal vectors that minimize a continuous extension of the graph cut size, known as the Lovasz extension. To cope with the nonconvexity of the problem, we propose two alternative iterative optimization methods, properly devised for handling orthogonality constraints. Finally, we extend the method to minimize a continuous relaxation of the balanced cut size. The formulated problem is again nonconvex, and we propose an efficient solution method based on an explicit–implicit gradient algorithm.
TL;DR: Graphflow is demonstrated, a prototype active graph data-base that evaluates general one-time and continuous subgraph queries and supports the property graph data model and the Cypher++ query language, which extends Neo4j's declarative Cypher language with subgraph-condition-action triggers.
Abstract: Many applications detect the emergence or deletion of certain subgraphs in their input graphs continuously. In order to evaluate such continuous subgraph queries, these applications resort to inefficient or highly specialized solutions because existing graph databases are passive systems that only support one-time subgraph queries. We demonstrate Graphflow, a prototype active graph data-base that evaluates general one-time and continuous subgraph queries. Graphflow supports the property graph data model and the Cypher++ query language, which extends Neo4j's declarative Cypher language with subgraph-condition-action triggers. At the core of Graphflow's query processor are two worst-case optimal join algorithms called Generic Join and our new Delta Generic Join algorithm for one-time and continuous subgraph queries, respectively.
TL;DR: This paper introduces a dummy super node that is connected with all nodes in the graph by a directed edge as the representation of the graph and modify the graph operation to help the dummy super nodes learn graph-level feature.
Abstract: Predicating macroscopic influences of drugs on human body, like efficacy and toxicity, is a central problem of small-molecule based drug discovery. Molecules can be represented as an undirected graph, and we can utilize graph convolution networks to predication molecular properties. However, graph convolutional networks and other graph neural networks all focus on learning node-level representation rather than graph-level representation. Previous works simply sum all feature vectors for all nodes in the graph to obtain the graph feature vector for drug predication. In this paper, we introduce a dummy super node that is connected with all nodes in the graph by a directed edge as the representation of the graph and modify the graph operation to help the dummy super node learn graph-level feature. Thus, we can handle graph-level classification and regression in the same way as node-level classification and regression. In addition, we apply focal loss to address class imbalance in drug datasets. The experiments on MoleculeNet show that our method can effectively improve the performance of molecular properties predication.
TL;DR: A primal-dual optimization algorithm that scales linearly with the number of allowed edges and can be easily parallelized is proposed that is shown to outperform standard graph learning and other baseline methods both on a synthetic and a real dataset.
Abstract: We consider the problem of inferring the hidden structure of high-dimensional time-varying data. In particular, we aim at capturing the dynamic relationships by representing data as valued nodes in a sequence of graphs. Our approach is motivated by the observation that imposing a meaningful graph topology can help solving the generally ill-posed and challenging problem of structure inference. To capture the temporal evolution in the sequence of graphs, we introduce a new prior that asserts that the graph edges change smoothly in time. We propose a primal-dual optimization algorithm that scales linearly with the number of allowed edges and can be easily parallelized. Our new algorithm is shown to outperform standard graph learning and other baseline methods both on a synthetic and a real dataset.
TL;DR: In this article, the authors examine the statistical behavior of the two key filter types, finite impulse response and autoregressive moving average graph filters, when operating on random time-varying graph signals (or random graph processes) over random time varying graphs, and propose two novel ways of exploiting randomness to improve (joint graph-time) noise cancellation, as well as to reduce the computational complexity of graph filtering.
Abstract: Graph filters play a key role in processing the graph spectra of signals supported on the vertices of a graph. However, despite their widespread use, graph filters have been analyzed only in the deterministic setting, ignoring the impact of stochasticity in both the graph topology and the signal itself. To bridge this gap, we examine the statistical behavior of the two key filter types, finite impulse response and autoregressive moving average graph filters, when operating on random time-varying graph signals (or random graph processes) over random time-varying graphs. Our analysis shows that 1) in expectation, the filters behave as the same deterministic filters operating on a deterministic graph, being the expected graph, having as input signal a deterministic signal, being the expected signal, and 2) there are meaningful upper bounds for the variance of the filter output. We conclude this paper by proposing two novel ways of exploiting randomness to improve (joint graph-time) noise cancellation, as well as to reduce the computational complexity of graph filtering. As demonstrated by numerical results, these methods outperform the disjoint average and denoise algorithm and yield a (up to) four times complexity reduction, with a very little difference from the optimal solution.
TL;DR: This work develops the first polynomial-time approximation algorithms to compute the best possible summary of a certain size under both measures of the original graph.
Abstract: We study the problem of graph summarization. Given a large graph we aim at producing a concise lossy representation (a summary) that can be stored in main memory and used to approximately answer queries about the original graph much faster than by using the exact representation. In this work we study a very natural type of summary: the original set of vertices is partitioned into a small number of supernodes connected by superedges to form a complete weighted graph. The superedge weights are the edge densities between vertices in the corresponding supernodes. To quantify the dissimilarity between the original graph and a summary, we adopt the reconstruction error and the cut-norm error. By exposing a connection between graph summarization and geometric clustering problems (i.e., k-means and k-median), we develop the first polynomial-time approximation algorithms to compute the best possible summary of a certain size under both measures. We discuss how to use our summaries to store a (lossy or lossless) compressed graph representation and to approximately answer a large class of queries about the original graph, including adjacency, degree, eigenvector centrality, and triangle and subgraph counting. Using the summary to answer queries is very efficient as the running time to compute the answer depends on the number of supernodes in the summary, rather than the number of nodes in the original graph.
TL;DR: By enforcing the weight of edges between labeled samples of different classes to be zero, this paper explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the low-rank representation (LRR), and proposes a novel semi-supervised graph learning method called semi- supervised low- rank representation.
Abstract: In the literature, most existing graph-based semi-supervised learning methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph In this paper, we argue that it is beneficial to consider the label information in the graph learning stage Specifically, by enforcing the weight of edges between labeled samples of different classes to be zero, we explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the low-rank representation (LRR), and propose a novel semi-supervised graph learning method called semi-supervised low-rank representation This results in a convex optimization problem with linear constraints, which can be solved by the linearized alternating direction method Though we take LRR as an example, our proposed method is in fact very general and can be applied to any self-representation graph learning methods Experiment results on both synthetic and real data sets demonstrate that the proposed graph learning method can better capture the global geometric structure of the data, and therefore is more effective for semi-supervised learning tasks
TL;DR: This paper presents the first practical solution for efficient LCR evaluation, leveraging landmark-based indexes for large graphs, and shows through extensive experiments that their indexes are significantly smaller than state-of-the-art LCR indexing techniques, while supporting up to orders of magnitude faster query evaluation times.
Abstract: Consider a directed edge-labeled graph, such as a social network or a citation network. A fundamental query on such data is to determine if there is a path in the graph from a given source vertex to a given target vertex, using only edges with labels in a restricted subset of the edge labels in the graph. Such label-constrained reachability (LCR) queries play an important role in graph analytics, for example, as a core fragment of the so-called regular path queries which are supported in practical graph query languages such as the W3C's SPARQL 1.1, Neo4j's Cypher, and Oracle's PGQL. Current solutions for LCR evaluation, however, do not scale to large graphs which are increasingly common in a broad range of application domains. In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmark-based indexes for large graphs. We show through extensive experiments that our indexes are significantly smaller than state-of-the-art LCR indexing techniques, while supporting up to orders of magnitude faster query evaluation times. Our complete C++ codebase is available as open source for further research.
TL;DR: A deep node embedding method called IGE (Interaction Graph Embedding) is proposed, which can be specifically designed for different datasets as long as it is differentiable, in which case it can be trained together with prediction networks by back-propagation.
Abstract: Node embedding techniques have gained prominence since they produce continuous and low-dimensional features, which are effective for various tasks. Most existing approaches learn node embeddings by exploring the structure of networks and are mainly focused on static non-attributed graphs. However, many real-world applications, such as stock markets and public review websites, involve bipartite graphs with dynamic and attributed edges, called attributed interaction graphs. Different from conventional graph data, attributed interaction graphs involve two kinds of entities (e.g. investors/stocks and users/businesses) and edges of temporal interactions with attributes (e.g. transactions and reviews). In this paper, we study the problem of node embedding in attributed interaction graphs. Learning embeddings in interaction graphs is highly challenging due to the dynamics and heterogeneous attributes of edges. Different from conventional static graphs, in attributed interaction graphs, each edge can have totally different meanings when the interaction is at different times or associated with different attributes. We propose a deep node embedding method called IGE (Interaction Graph Embedding). IGE is composed of three neural networks: an encoding network is proposed to transform attributes into a fixed-length vector to deal with the heterogeneity of attributes; then encoded attribute vectors interact with nodes multiplicatively in two coupled prediction networks that investigate the temporal dependency by treating incident edges of a node as the analogy of a sentence in word embedding methods. The encoding network can be specifically designed for different datasets as long as it is differentiable, in which case it can be trained together with prediction networks by back-propagation. We evaluate our proposed method and various comparing methods on four real-world datasets. The experimental results prove the effectiveness of the learned embeddings by IGE on both node clustering and classification tasks.
TL;DR: BiRank as mentioned in this paper is a ranking algorithm for bipartite graphs, which iteratively assigns scores to vertices and finally converges to a unique stationary ranking, which smooths the graph under the guidance of the query vector.
Abstract: The bipartite graph is a ubiquitous data structure that can model the relationship between two entity types: for instance, users and items, queries and webpages. In this paper, we study the problem of ranking vertices of a bipartite graph, based on the graph's link structure as well as prior information about vertices (which we term a query vector). We present a new solution, BiRank, which iteratively assigns scores to vertices and finally converges to a unique stationary ranking. In contrast to the traditional random walk-based methods, BiRank iterates towards optimizing a regularization function, which smooths the graph under the guidance of the query vector. Importantly, we establish how BiRank relates to the Bayesian methodology, enabling the future extension in a probabilistic way. To show the rationale and extendability of the ranking methodology, we further extend it to rank for the more generic n-partite graphs. BiRank's generic modeling of both the graph structure and vertex features enables it to model various ranking hypotheses flexibly. To illustrate its functionality, we apply the BiRank and TriRank (ranking for tripartite graphs) algorithms to two real-world applications: a general ranking scenario that predicts the future popularity of items, and a personalized ranking scenario that recommends items of interest to users. Extensive experiments on both synthetic and real-world datasets demonstrate BiRank's soundness (fast convergence), efficiency (linear in the number of graph edges) and effectiveness (achieving state-of-the-art in the two real-world tasks).
TL;DR: This work proposes to clarify the expression of knowledge graph as a whole from data, information, knowledge, and wisdom aspects respectively, and specifies knowledge graph in a progressive manner as four basic forms including data graph, information graph, knowledge graph and wisdom graph.
Abstract: Knowledge graphs have been widely adopted, in large part owing to their schema-less nature. It enables knowledge graphs to grow seamlessly and allows for new relationships and entities as needed. Knowledge graph has become a powerful tool to represent knowledge in the form of a labelled directed graph and to give semantics to textual information. A knowledge graph is a graph constructed by representing each item, entity and user as nodes, and linking those nodes that interact with each other via edges. Knowledge graph has abundant natural semantics and can contain various and more complete information. Its expression mechanism is close to natural language. However, we still lack a unified definition and standard expression form of knowledge graph. We propose to clarify the expression of knowledge graph as a whole. We clarify the architecture of knowledge graph from data, information, knowledge, and wisdom aspects respectively. We also propose to specify knowledge graph in a progressive manner as four basic forms including data graph, information graph, knowledge graph and wisdom graph.
TL;DR: The main goal of quantitative graph theory is the structural quantification of information contained in complex networks by employing a measurement approach based on numerical invariants and comparisons.
TL;DR: The results show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions, and develops novel algorithms to enable fast evaluation.
Abstract: Graph data are prevalent in communication networks, social media, and biological networks. These data, which are often noisy or inexact, can be represented by uncertain graphs, whose edges are associated with probabilities to indicate the chances that they exist. Recently, researchers have studied various algorithms (e.g., clustering, classification, and k-NN) for uncertain graphs. These solutions face two problems: (1) high dimensionality: uncertain graphs are often highly complex, which can affect the mining quality; and (2) low reusability, where an existing mining algorithm has to be redesigned to deal with uncertain graphs. To tackle these problems, we propose a solution called URGE, or UnceRtain Graph Embedding. Given an uncertain graph G, URGE generates G's embedding, or a set of low-dimensional vectors, which carry the proximity information of nodes in G. This embedding enables the dimensionality of G to be reduced, without destroying node proximity information. Due to its simplicity, existing mining solutions can be used on the embedding. We investigate two low- and high-order node proximity measures in the embedding generation process, and develop novel algorithms to enable fast evaluation. To our best knowledge, there is no prior study on the use of embedding for uncertain graphs. We have further performed extensive experiments for clustering, classification, and k-NN on several uncertain graph datasets. Our results show that URGE attains better effectiveness than current uncertain data mining algorithms, as well as state-of-the-art embedding solutions. The embedding and mining performance is also highly efficient in our experiments.
TL;DR: This work introduces the multi-layer coherent subgraph model, which defines clusters of vertices that are densely connected by edges with similar labels in a subset of the graph layers, and introduces the best-first search algorithm MiMAG.
Abstract: Detecting dense subgraphs such as cliques or quasi-cliques is an important graph mining problem. While this task is established for simple graphs, today's applications demand the analysis of more complex graphs: In this work, we consider a frequently observed type of graph where edges represent different types of relations. These multiple edge types can also be viewed as different "layers" of a graph, which is denoted as a "multi-layer graph". Additionally, each edge might be annotated by a label characterizing the given relation in more detail. By simultaneously exploiting all this information, the detection of more interesting subgraphs can be supported. We introduce the multi-layer coherent subgraph model, which defines clusters of vertices that are densely connected by edges with similar labels in a subset of the graph layers. We avoid redundancy in the result by selecting only the most interesting, non-redundant subgraphs for the output. Based on this model, we introduce the best-first search algorithm MiMAG. In thorough experiments, we demonstrate the strengths of MiMAG in comparison with related approaches on synthetic as well as real-world data sets.
TL;DR: A new two-step paradigm for scalable structural graph clustering is developed, aiming to reduce the number of structural similarity computations, and optimization techniques to speed up checking whether two vertices are structure-similar are proposed.
Abstract: We study the problem of structural graph clustering, a fundamental problem in managing and analyzing graph data. Given an undirected unweighted graph, structural graph clustering is to assign vertices to clusters, and to identify the sets of hub vertices and outlier vertices as well, such that vertices in the same cluster are densely connected to each other while vertices in different clusters are loosely connected. In this paper, we develop a new two-step paradigm for scalable structural graph clustering based on our three observations. Then, we present a $\mathsf {pSCAN}$ approach, within the paradigm, aiming to reduce the number of structural similarity computations, and propose optimization techniques to speed up checking whether two vertices are structure-similar. $\mathsf {pSCAN}$ outputs exactly the same clusters as the existing approaches $\mathsf {SCAN}$ and $\mathsf {SCAN\text{++}}$ , and we prove that $\mathsf {pSCAN}$ is worst-case optimal. Moreover, we propose efficient techniques for updating the clusters when the input graph dynamically changes, and we also extend our techniques to other similarity measures, e.g., Jaccard similarity. Performance studies on large real and synthetic graphs demonstrate the efficiency of our new approach and our dynamic cluster maintenance techniques. Noticeably, for the twitter graph with 1 billion edges, our approach takes 25 minutes while the state-of-the-art approach cannot finish even after 24 hours.
TL;DR: An efficient and scalable disk-based graph generator that can generate massive graphs in a short time only using a small amount of memory is proposed, TrillionG, which outperforms the state-of-the-art graph generators by up to orders of magnitude.
Abstract: As many applications encounter exponential growth in graph sizes, a fast and scalable graph generator has become more important than ever before due to lack of large-scale realistic graphs for evaluating the performance of graph processing methods. Although there have been proposed a number of methods to generate synthetic graphs, they are not very efficient in terms of space and time complexities, and so, cannot generate even trillion-scale graphs using a moderate size cluster of commodity machines. Here, we propose an efficient and scalable disk-based graph generator, TrillionG that can generate massive graphs in a short time only using a small amount of memory. It can generate a graph of a trillion edges following the RMAT or Kronecker models within two hours only using 10 PCs. We first generalize existing graph generation models to the scope-based generation model, where RMAT and Kronecker correspond to two extremes. Then, we propose a new graph generation model called the recursive vector model, which compromises two extremes, and so, solves the space and time complexity problems existing in RMAT and Kronecker. We also extend the recursive vector model so as to generate a semantically richer graph database. Through extensive experiments, we have demonstrated that TrillionG outperforms the state-of-the-art graph generators by up to orders of magnitude.
TL;DR: A method for compactly representing the weighted de Bruijn Graph (i.e. with abundance information) with essentially no errors is presented, likely to be of general interest and applicable in most weighted de bruijn Graph‐based systems.
Abstract: Motivation Almost all de novo short-read genome and transcriptome assemblers start by building a representation of the de Bruijn Graph of the reads they are given as input. Even when other approaches are used for subsequent assembly (e.g. when one is using 'long read' technologies like those offered by PacBio or Oxford Nanopore), efficient k -mer processing is still crucial for accurate assembly, and state-of-the-art long-read error-correction methods use de Bruijn Graphs. Because of the centrality of de Bruijn Graphs, researchers have proposed numerous methods for representing de Bruijn Graphs compactly. Some of these proposals sacrifice accuracy to save space. Further, none of these methods store abundance information, i.e. the number of times that each k -mer occurs, which is key in transcriptome assemblers. Results We present a method for compactly representing the weighted de Bruijn Graph (i.e. with abundance information) with essentially no errors. Our representation yields zero errors while increasing the space requirements by less than 18-28% compared to the approximate de Bruijn graph representation in Squeakr. Our technique is based on a simple invariant that all weighted de Bruijn Graphs must satisfy, and hence is likely to be of general interest and applicable in most weighted de Bruijn Graph-based systems. Availability and implementation https://github.com/splatlab/debgr . Contact rob.patro@cs.stonybrook.edu. Supplementary information Supplementary data are available at Bioinformatics online.
TL;DR: In this paper, the authors introduced a graph structure called non-zero component union graph on finite-dimensional vector spaces and showed that the graph is connected and found its domination number, clique number and chromatic number.
Abstract: In this paper, we introduce a graph structure, called non-zero component union graph on finite-dimensional vector spaces. We show that the graph is connected and find its domination number, clique number and chromatic number. It is shown that two non-zero component union graphs are isomorphic if and only if the base vector spaces are isomorphic. In case of finite fields, we study the edge-connectivity and condition under which the graph is Eulerian. Moreover, we provide a lower bound for the independence number of the graph. Finally, we come up with a structural characterization of non-zero component union graph.
TL;DR: This work gives a rigorous formulation of the problem of recovering the arrival order of nodes in a statistical learning framework and ties its feasibility to several sets of permutations associated with the symmetries of the random graph model and graphs generated by it.
Abstract: Many networks in the real world are dynamic in nature: nodes enter, exit, and make and break connections with one another as time passes. Several random graph models of these networks are such that nodes have well-defined arrival times. It is natural to ask if, for a given random graph model, we can recover the arrival order of nodes, given information about the structure of the graph. In this work, we give a rigorous formulation of the problem in a statistical learning framework and tie its feasibility, for a broad class of models, to several sets of permutations associated with the symmetries of the random graph model and graphs generated by it. Moreover, we show how the same quantities are fundamental to the study of the information content of graph structures. We then apply our general results to the special cases of the Erdoos-Renyi and preferential attachment models to derive strong inapproximability results.
TL;DR: An algorithm to enumeratemaximal cliques whose worst-case runtime is near-optimal, and an experimental evaluation showing the practical utility of the algorithm.
Abstract: We consider the enumeration of dense substructures (maximal cliques) from an uncertain graph. For parameter 0 <; α <; 1, we define the notion of an a-maximal clique in an uncertain graph. We present matching upper and lower bounds on the number of a-maximal cliques possible within a (uncertain) graph. We present an algorithm to enumerate a-maximal cliques whose worst-case runtime is near-optimal, and an experimental evaluation showing the practical utility of the algorithm.