TL;DR: A closed graph pattern mining algorithm, CloseGraph, is developed by exploring several interesting pruning methods and shows that it not only dramatically reduces unnecessary subgraphs to be generated but also substantially increases the efficiency of mining, especially in the presence of large graph patterns.
Abstract: Recent research on pattern discovery has progressed form mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in bioinformatics, Web exploration, and etc. However, mining large graph patterns in challenging due to the presence of an exponential number of frequent subgraphs. Instead of mining all the subgraphs, we propose to mine closed frequent graph patterns. A graph g is closed in a database if there exists no proper supergraph of g that has the same support as g. A closed graph pattern mining algorithm, CloseGraph, is developed by exploring several interesting pruning methods. Our performance study shows that CloseGraph not only dramatically reduces unnecessary subgraphs to be generated but also substantially increases the efficiency of mining, especially in the presence of large graph patterns.
TL;DR: This paper introduces two techniques for graph-based anomaly detection, and introduces a new method for calculating the regularity of a graph, with applications to anomaly detection.
Abstract: Anomaly detection is an area that has received much attention in recent years. It has a wide variety of applications, including fraud detection and network intrusion detection. A good deal of research has been performed in this area, often using strings or attribute-value data as the medium from which anomalies are to be extracted. Little work, however, has focused on anomaly detection in graph-based data. In this paper, we introduce two techniques for graph-based anomaly detection. In addition, we introduce a new method for calculating the regularity of a graph, with applications to anomaly detection. We hypothesize that these methods will prove useful both for finding anomalies, and for determining the likelihood of successful anomaly detection within graph-based data. We provide experimental results using both real-world network intrusion data and artificially-created data.
TL;DR: Two algorithms for determining expertise from email were compared: a content-based approach that takes account only of email text, and a graph-based ranking algorithm (HITS) that take account both of text and communication patterns.
Abstract: A common method for finding information in an organization is to use social networks---ask people, following referrals until someone with the right information is found. Another way is to automatically mine documents to determine who knows what. Email documents seem particularly well suited to this task of "expertise location", as people routinely communicate what they know. Moreover, because people explicitly direct email to one another, social networks are likely to be contained in the patterns of communication. Can these patterns be used to discover experts on particular topics? Is this approach better than mining message content alone? To find answers to these questions, two algorithms for determining expertise from email were compared: a content-based approach that takes account only of email text, and a graph-based ranking algorithm (HITS) that takes account both of text and communication patterns. An evaluation was done using email and explicit expertise ratings from two different organizations. The rankings given by each algorithm were compared to the explicit rankings with the precision and recall measures commonly used in information retrieval, as well as the d' measure commonly used in signal-detection theory. Results show that the graph-based algorithm performs better than the content-based algorithm at identifying experts in both cases, demonstrating that the graph-based algorithm effectively extracts more information than is found in content alone.
TL;DR: A method of assigning functions based on a probabilistic analysis of graph neighborhoods in a protein-protein interaction network that exploits the fact that graph neighbors are more likely to share functions than nodes which are not neighbors.
Abstract: Motivation: The development of experimental methods for genome scale analysis of molecular interaction networks has made possible new approaches to inferring protein function. This paper describes a method of assigning functions based on a probabilistic analysis of graph neighborhoods in a protein-protein interaction network. The method exploits the fact that graph neighbors are more likely to share functions than nodes which are not neighbors. A binomial model of local neighbor function labeling probability is combined with a Markov random field propagation algorithm to assign function probabilities for proteins in the network. Results: We applied the method to a protein-protein interaction dataset for the yeast Saccharomyces cerevisiae using the Gene Ontology (GO) terms as function labels. The method reconstructed known GO term assignments with high precision, and produced putative GO assignments to 320 proteins that currently lack GO annotation, which represents about 10% of the unlabeled proteins in S. cere
TL;DR: In this article, a promising approach to graph clustering is based on the intuitive notion of intra-cluster density vs. intercluster sparsity, and a new approach that compares favorably with graph partitioning and geometric clustering.
Abstract: A promising approach to graph clustering is based on the intuitive notion of intra-cluster density vs. inter-cluster sparsity. While both formalizations and algorithms focusing on particular aspects of this rather vague concept have been proposed no conclusive argument on their appropriateness has been given. As a first step towards understanding the consequences of particular con- ceptions, we conducted an experimental evaluation of graph clustering approaches. By combining proven techniques from graph partitioning and geometric clustering, we also introduce a new approach that compares favorably.
TL;DR: A new generic graph model for traffic grooming in heterogeneous WDM mesh networks, based on the auxiliary graph, is proposed which can achieve various objectives using different grooming policies, while taking into account various constraints such as transceivers, wavelengths, wavelength-conversion capabilities, and grooming capabilities.
Abstract: As the operation of our fiber-optic backbone networks migrates from interconnected SONET rings to arbitrary mesh topology, traffic grooming on wavelength-division multiplexing (WDM) mesh networks becomes an extremely important research problem. To address this problem, we propose a new generic graph model for traffic grooming in heterogeneous WDM mesh networks. The novelty of our model is that, by only manipulating the edges of the auxiliary graph created by our model and the weights of these edges, our model can achieve various objectives using different grooming policies, while taking into account various constraints such as transceivers, wavelengths, wavelength-conversion capabilities, and grooming capabilities. Based on the auxiliary graph, we develop an integrated traffic-grooming algorithm (IGABAG) and an integrated grooming procedure (INGPROC) which jointly solve several traffic-grooming subproblems by simply applying the shortest-path computation method. Different grooming policies can be represented by different weight-assignment functions, and the performance of these grooming policies are compared under both nonblocking scenario and blocking scenario. The IGABAG can be applied to both static and dynamic traffic grooming. In static grooming, the traffic-selection scheme is key to achieving good network performance. We propose several traffic-selection schemes based on this model and we evaluate their performance for different network topologies.
TL;DR: The D(k) index is introduced, an adaptive structural summary for general graph structured documents based on the concept of bisimilarity, and is shown to be a more effective structural summary than previous static ones, as a result of its query load sensitivity.
Abstract: To facilitate queries over semi-structured data, various structural summaries have been proposed. Structural summaries are derived directly from the data and serve as indices for evaluating path expressions on semi-structured or XML data. We introduce the D(k) index, an adaptive structural summary for general graph structured documents. Building on previous work, 1-index and A(k) index, the D(k)-index is also based on the concept of bisimilarity. However, as a generalization of the 1-index and A(k)-index, the D(k) index possesses the adaptive ability to adjust its structure according to the current query load. This dynamism also facilitates efficient update algorithms, which are crucial to practical applications of structural indices, but have not been adequately addressed in previous index proposals. Our experiments show that the D(k) index is a more effective structural summary than previous static ones, as a result of its query load sensitivity. In addition, update operations on the D(k) index can be performed more efficiently than on its predecessors.
TL;DR: In this paper, the authors propose an approach to topology control based on the principle of maintaining the number of neighbors of every node equal to or slightly below a specific value k. The approach enforces symmetry on the resulting communication graph, thereby easing the operation of higher layer protocols.
Abstract: We propose an approach to topology control based on the principle of maintaining the number of neighbors of every node equal to or slightly below a specific value k. The approach enforces symmetry on the resulting communication graph, thereby easing the operation of higher layer protocols. To evaluate the performance of our approach, we estimate the value of k that guarantees connectivity of the communication graph with high probability. We then define k-Neigh, a fully distributed, asynchronous, and localized protocol that follows the above approach and uses distance estimation. We prove that k-Neigh terminates at every node after a total of 2n messages have been exchanged (with n nodes in the network) and within strictly bounded time. Finally, we present simulations results which show that our approach is about 20% more energy-efficient than a widely-studied existing protocol.
TL;DR: This paper presents the notion of Semantic Associations as complex relationships between resource entities based on a specific notion of similarity called r-isomorphism, and formalizes these notions for the RDF data model, by introducing a notion of a Property Sequence as a type.
Abstract: This paper presents the notion of Semantic Associations as complex relationships between resource entities. These relationships capture both a connectivity of entities as well as similarity of entities based on a specific notion of similarity called r-isomorphism. It formalizes these notions for the RDF data model, by introducing a notion of a Property Sequence as a type. In the context of a graph model such as that for RDF, Semantic Associations amount to specific certain graph signatures. Specifically, they refer to sequences (i.e. directed paths) here called Property Sequences, between entities, networks of Property Sequences (i.e. undirected paths), or subgraphs of r-isomorphic Property Sequences.The ability to query about the existence of such relationships is fundamental to tasks in analytical domains such as national security and business intelligence, where tasks often focus on finding complex yet meaningful and obscured relationships between entities. However, support for such queries is lacking in contemporary query systems, including those for RDF.
TL;DR: In this article, a general weak law of large numbers for functionals of binomial point processes in d-dimensional space is established, with a limit that depends explicitly on the density of the point process.
Abstract: Using a coupling argument, we establish a general weak law of large numbers for functionals of binomial point processes in d-dimensional space, with a limit that depends explicitly on the (possibly nonuniform) density of the point process. The general result is applied to the minimal spanning tree, the k-nearest neighbors graph, the Voronoi graph and the sphere of influence graph. Functionals of interest include total edge length with arbitrary weighting, number of vertices of specified degree and number of components. We also obtain weak laws of large numbers functionals of marked point processes, including statistics of Boolean models.
TL;DR: This work defines graph algebras and reveals their applicability to automata theory and explores assorted monoids, semigroups, rings, codes, and other algebraic structures to outline theorems and algorithms for finite state automata and grammars.
Abstract: Graph algebras possess the capacity to relate fundamental concepts of computer science, combinatorics, graph theory, operations research, and universal algebra. They are used to identify nontrivial connections across notions, expose conceptual properties, and mediate the application of methods from one area toward questions of the other four. After a concentrated review of the prerequisite mathematical background, Graph Algebras and Automata defines graph algebras and reveals their applicability to automata theory. It proceeds to explore assorted monoids, semigroups, rings, codes, and other algebraic structures and to outline theorems and algorithms for finite state automata and grammars.
TL;DR: Methods characterizing average measures of connectivity, similarity of connection patterns, connectedness and components, paths, walks and cycles, distances, cluster indices, ranges and shortcuts, and node and edge cut sets are introduced and discussed in a neurobiological context.
Abstract: This paper summarizes a set of graph theory methods that are of special relevance to the computational analysis of neural connectivity patterns. Methods characterizing average measures of connectivity, similarity of connection patterns, connectedness and components, paths, walks and cycles, distances, cluster indices, ranges and shortcuts, and node and edge cut sets are introduced and discussed in a neurobiological context. A set of Matlab functions implementing these methods is available for download at http://php.indiana.edu/~osporns/graphmeasures.htm.
TL;DR: Algorithms in C++ makes the case that information on pattern matching algorithms is not well understood except by experts in the area, and that for non-experts useful, practical implementations are nearly impossible to construct from available literature.
Abstract: deeper into graph theory, thereby generating algorithms that are more challenging to the reader. Topics such as Depth-First Search, Hamiltonian Paths, Kruskal's Algorithm and Euclidean Networks are explored in detail. I have studied graph theory and therefore I was able to appreciate the examples and algorithms given in the text. However, I believe the author gives enough of an introduction in the beginning and explanations throughout the text so that a reader without any prior exposure to graph theory can still gain valuable experience in developing algorithms to solve complex problems. This book would be an excellent tool for a graph theory course (assuming the student is familiar with programming) or perhaps an advanced programming course dealing with algorithms or object oriented design methods. I found that the explanations of theorems and proofs in this text were excellent and helped me to further my knowledge and appreciation of graph theory. The object-oriented approach to implementing algorithms in C++ broadened my programming experience and helped to keep my interest in the topic. Occasionally the author assumes that the reader either has read the first volume, or has the text available for review. The first two volumes can be purchased as a bundle, and I suggest the reader consider obtaining both texts. However the programs from both volumes are available for download on the author's website, so it is not necessary to have both books if the reader is comfortable with programming topics such as queues. Overall, I enjoyed Algorithms in C++, and I plan to purchase the first and third volumes to compliment this text. I am certain that I will refer to all three in the future when I am in need of guidance, or perhaps even diversion. Pattern matching in strings is a basic problem in many areas of computer science, but particularly in applications that deal with text searching and genetic sequences. Information retrieval and computational biology are generating dramatic increases both in the size of texts to search and in the sophistication of the searches. The authors are two academics with bioinformatics industry experience. They use this book to make the case that information on pattern matching algorithms is not well understood except by experts in the area, and that for non-experts useful, practical implementations are nearly impossible to construct from available literature. Further , they claim that the only way to truly determine the fastest algorithm …
TL;DR: The DFuse architectural framework, DFuse, consists of a data fusion API and a distributed algorithm for energy-aware role assignment that enables an application to be specified as a coarse-grained dataflow graph, and eases application development and deployment.
Abstract: Simple in-network data aggregation (or fusion) techniques for sensor networks have been the focus of several recent research efforts, but they are insufficient to support advanced fusion applications. We extend these techniques to future sensor networks and ask two related questions: (a) what is the appropriate set of data fusion techniques, and (b) how do we dynamically assign aggregation roles to the nodes of a sensor network. We have developed an architectural framework, DFuse, for answering these two questions. It consists of a data fusion API and a distributed algorithm for energy-aware role assignment. The fusion API enables an application to be specified as a coarse-grained dataflow graph, and eases application development and deployment. The role assignment algorithm maps the graph onto the network, and optimally adapts the mapping at run-time using role migration. Experiments on an iPAQ farm show that, the fusion API has low-overhead, and the role assignment algorithm with role migration significantly increases the network lifetime compared to any static assignment.
TL;DR: In this application, the flow of a temporal graph is utilized to group similar clusters into scenes, while the attention values are used as guidelines to select appropriate subshots in scenes for summarization.
Abstract: We propose a unified approach for summarization based on the analysis of video structures and video highlights. Our approach emphasizes both the content balance and perceptual quality of a summary. Normalized cut algorithm is employed to globally and optimally partition a video into clusters. A motion attention model based on human perception is employed to compute the perceptual quality of shots and clusters. The clusters, together with the computed attention values, form a temporal graph similar to Markov chain that inherently describes the evolution and perceptual importance of video clusters. In our application, the flow of a temporal graph is utilized to group similar clusters into scenes, while the attention values are used as guidelines to select appropriate subshots in scenes for summarization.
TL;DR: A new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network is proposed and a new graph selection criterion from Bayesian approach in general situations is theoretically derived.
Abstract: We propose a new statistical method for constructing a genetic network from microarray gene expression data by using a Bayesian network. An essential point of Bayesian network construction is the estimation of the conditional distribution of each random variable. We consider fitting nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture the nonlinear structures between genes. Selecting the optimal graph, which gives the best representation of the system among genes, is still a problem to be solved. We theoretically derive a new graph selection criterion from Bayes approach in general situations. The proposed method includes previous methods based on Bayesian networks. We demonstrate the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.
TL;DR: In this article, the authors present a method for reporting data network monitoring information, which includes accessing performance metrics values for a network component and generating a trace of graph data points for the performance metric values.
Abstract: A method for reporting data network monitoring information. The method includes accessing performance metric values for a network component and generating a trace of graph data points for the performance metric values. For a range of the trace, a histogram is built and displayed corresponding to the graph data points (step 430). For a user interface, a performance monitoring display is generated including a graph of the trace relative to an x-axis and a y-axis and a representation of the histogram. Using the graphical user interface (GUI), the user can access a selection mechanism by a moving the range selector to define the selected histogram range (steps 440 and 470). The graph data points in the trace corresponds to a histogram previously built from the performance metric values, and the trace is generated by determining and plotting an average value of each of the graph data point histograms. The building of the histogram for the performance monitoring display involves combining the graph data point histograms corresponding to the graph data points in selected histogram range (step 460).
TL;DR: An adaptive and decentralized algorithm that progressively refines the placement of operators by walking through neighbor nodes is described, which can achieve near optimal placement onto various graph topologies despite the risks of local minima.
Abstract: In-network query processing is critical for reducing network traffic when accessing and manipulating sensor data It requires placing a tree of query operators such as filters and aggregations but also correlations onto sensor nodes in order to minimize the amount of data transmitted in the network In this paper, we show that this problem is a variant of the task assignment problem for which polynomial algorithms have been developed These algorithms are however centralized and cannot be used in a sensor network We describe an adaptive and decentralized algorithm that progressively refines the placement of operators by walking through neighbor nodes Simulation results illustrate the potential benefits of our approach They also show that our placement strategy can achieve near optimal placement onto various graph topologies despite the risks of local minima
TL;DR: In this article, the authors propose two methods for inferring semantic similarity between terms from a corpus, one based on word-similarity and the other based on document similarity, giving rise to a system of equations whose equilibrium point they use to obtain a semantic similarity measure.
Abstract: The standard representation of text documents as bags of words suffers from well known limitations, mostly due to its inability to exploit semantic similarity between terms. Attempts to incorporate some notion of term similarity include latent semantic indexing [8], the use of semantic networks [9], and probabilistic methods [5]. In this paper we propose two methods for inferring such similarity from a corpus. The first one defines word-similarity based on document-similarity and viceversa, giving rise to a system of equations whose equilibrium point we use to obtain a semantic similarity measure. The second method models semantic relations by means of a diffusion process on a graph defined by lexicon and co-occurrence information. Both approaches produce valid kernel functions parametrised by a real number. The paper shows how the alignment measure can be used to successfully perform model selection over this parameter. Combined with the use of support vector machines we obtain positive results.
TL;DR: In this article, a method for identifying related pages among a plurality of pages in a linked database such as the World Wide Web is described, in which an initial page is selected from the plurality of web pages and pages linked to the initial page are represented as a graph in a memory.
Abstract: A method is described for identifying related pages among a plurality of pages in a linked database such as the World Wide Web. An initial page is selected from the plurality of pages. Pages linked to the initial page are represented as a graph in a memory. The pages represented in the graph are scored on content, and a set of pages is selected, the selected set of pages having scores greater than a first predetermined threshold. The selected set of pages is scored on connectivity, and a subset of the set of pages that have scores greater than a second predetermined threshold are selected as related pages.
TL;DR: The co-authorship graph obtained from all papers published at SIGMOD between 1975 and 2002 is investigated, finding some interesting facts, for instance, the identity of the authors who, on average, are "closest" to all other authors at a given time.
Abstract: In this paper we investigate the co-authorship graph obtained from all papers published at SIGMOD between 1975 and 2002. We find some interesting facts, for instance, the identity of the authors who, on average, are "closest" to all other authors at a given time. We also show that SIGMOD's co-authorship graph is yet another example of a small world---a graph topology which has received a lot of attention recently. A companion web site for this paper can be found at http://db.cs.ualberta.ca/coauthorship.
TL;DR: It is shown that the Gaussian random fields and harmonic energy minimizing function framework for semi-supervised learning can be viewed in terms of Gaussian processes, with covariance matrices derived from the graph Laplacian, to derive hyperparameter learning with evidence maximization.
Abstract: "We show that the Gaussian random fields and harmonic energy minimizing function framework for semi-supervised learning can be viewed in terms of Gaussian processes, with covariance matrices derived from the graph Laplacian. We derive hyperparameter learning with evidence maximization, and give an empirical study of various ways to parameterize the graph weights."
TL;DR: This paper proposes three theoretical methods for taking into account this distribution P(x) for regularization and provides links to existing graph-based semi-supervised learning algorithms.
Abstract: We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.
TL;DR: This work proposes a new graph embedding scheme called Big-Bang simulation (BBS), which simulates an explosion of particles under force field derived from embedding error and is shown to be significantly more accurate, compared to all other embedding methods including GNP.
Abstract: Embedding of a graph metric in Euclidean space efficiently and accurately is an important problem in general with applications in topology aggregation, closest mirror selection, and application level routing. We propose a new graph embedding scheme called Big-Bang simulation (BBS), which simulates an explosion of particles under force field derived from embedding error. BBS is shown to be significantly more accurate, compared to all other embedding methods including GNP. We report an extensive simulation study of BBS compared with several known embedding scheme and show its advantage for distance estimation (as in the IDMaps project), mirror selection and topology aggregation.
TL;DR: Despite their simplicity, it is shown how the latter networks might be used for solving an NP-complete problem, namely the “3-colorability problem”, in linear time and linear resources (nodes, symbols, rules).
Abstract: In this paper we consider networks of evolutionary processors as language generating and computational devices. When the filters are regular languages one gets the computational power of Turing machines with networks of size at most six, depending on the underlying graph. When the filters are defined by random context conditions, we obtain an incomparability result with the families of regular and context-free languages. Despite their simplicity, we show how the latter networks might be used for solving an NP-complete problem, namely the “3-colorability problem”, in linear time and linear resources (nodes, symbols, rules).
TL;DR: This tree-dependent component analysis (TCA) provides a tractable and flexible approach to weakening the assumption of independence in ICA, and is able to fit models that incorporate tree-structured dependencies among multiple time series.
Abstract: We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. This tree-dependent component analysis (TCA) provides a tractable and flexible approach to weakening the assumption of independence in ICA. In particular, TCA allows the underlying graph to have multiple connected components, and thus the method is able to find "clusters" of components such that components are dependent within a cluster and independent between clusters. Finally, we make use of a notion of graphical models for time series due to Brillinger (1996) to extend these ideas to the temporal setting. In particular, we are able to fit models that incorporate tree-structured dependencies among multiple time series.
TL;DR: Stochastic roadmap simulation (SRS) is introduced as a new computational approach for exploring the kinetics of molecular motion by simultaneously examining multiple pathways and converges to the same distribution as Monte Carlo simulation.
Abstract: Classic molecular motion simulation techniques, such as Monte Carlo (MC) simulation, generate motion pathways one at a time and spend most of their time in the local minima of the energy landscape defined over a molecular conformation space. Their high computational cost prevents them from being used to compute ensemble properties (properties requiring the analysis of many pathways). This paper introduces stochastic roadmap simulation (SRS) as a new computational approach for exploring the kinetics of molecular motion by simultaneously examining multiple pathways. These pathways are compactly encoded in a graph, which is constructed by sampling a molecular conformation space at random. This computation, which does not trace any particular pathway explicitly, circumvents the local-minima problem. Each edge in the graph represents a potential transition of the molecule and is associated with a probability indicating the likelihood of this transition. By viewing the graph as a Markov chain, ensemble properties can be efficiently computed over the entire molecular energy landscape. Furthermore, SRS converges to the same distribution as MC simulation. SRS is applied to two biological problems: computing the probability of folding, an important order parameter that measures the "kinetic distance" of a protein's conformation from its native state; and estimating the expected time to escape from a ligand-protein binding site. Comparison with MC simulations on protein folding shows that SRS produces arguably more accurate results, while reducing computation time by several orders of magnitude. Computational studies on ligand-protein binding also demonstrate SRS as a promising approach to study ligand-protein interactions.
TL;DR: This work represents the 3D human body as a graphical model in which the relationships between the body parts are represented by conditional probability distributions, and exploits a recently introduced generalization of the particle filter to approximate belief propagation in such a graph.
Abstract: The detection and pose estimation of people in images and video is made challenging by the variability of human appearance, the complexity of natural scenes, and the high dimensionality of articulated body models. To cope with these problems we represent the 3D human body as a graphical model in which the relationships between the body parts are represented by conditional probability distributions. We formulate the pose estimation problem as one of probabilistic inference over a graphical model where the random variables correspond to the individual limb parameters (position and orientation). Because the limbs are described by 6-dimensional vectors encoding pose in 3-space, discretization is impractical and the random variables in our model must be continuous-valued. To approximate belief propagation in such a graph we exploit a recently introduced generalization of the particle filter. This framework facilitates the automatic initialization of the body-model from low level cues and is robust to occlusion of body parts and scene clutter.
TL;DR: This approach emphasizes reachability via an algorithm within the implicit graph structure underlying a recommender dataset and allows us to consider questions relating algorithmic parameters to properties of the datasets.
Abstract: We present a novel framework for studying recommendation algorithms in terms of the ‘jumps’ that they make to connect people to artifacts. This approach emphasizes reachability via an algorithm within the implicit graph structure underlying a recommender dataset and allows us to consider questions relating algorithmic parameters to properties of the datasets. For instance, given a particular algorithm ‘jump,’ what is the average path length from a person to an artifact? Or, what choices of minimum ratings and jumps maintain a connected graph? We illustrate the approach with a common jump called the ‘hammock’ using movie recommender datasets.