Journal Article, DOI: 10.48550/arXiv.2210.00415
Metric Distribution to Vector: Constructing Data Representation via Broad-Scale Discrepancies
TL;DR: MetricDistribution2vec is a novel embedding strategy that encodes the metric distribution characteristics of each data instance into a vector representation, which is then used for pattern classification of graph-structured data.
Abstract: Graph embedding provides a feasible methodology for pattern classification of graph-structured data by mapping each data instance into a vector space. Most pioneering works are essentially coding methods that concentrate on a vectorial representation of the inner properties of a graph, such as its topological constitution, node attributes, and link relations. However, classifying a targeted data instance is a qualitative issue that depends on understanding the overall discrepancies across the whole dataset. From a statistical point of view, these discrepancies manifest as a metric distribution over the dataset when a distance metric is adopted to measure pairwise similarity or dissimilarity. Therefore, we present a novel embedding strategy named MetricDistribution2vec that extracts such distribution characteristics into the vectorial representation of each data instance. We demonstrate the application and effectiveness of our representation method in supervised prediction tasks on extensive real-world structural graph datasets. The results show unexpected gains over a wide range of baselines on all the datasets, even when lightweight models are used as classifiers. Moreover, we evaluate the proposed method in few-shot classification scenarios, where the results still show attractive discrimination when inference relies on only a few training samples.
Figures

TABLE 1 Statistics of the benchmark graph datasets. The columns are the name of the dataset, the number of graphs, the number of classes, the average number of nodes, and the average number of edges. Note that each dataset is balanced across class labels, and that NCI-1, NCI-33, NCI-83, and NCI-109 are all sampled randomly from the original datasets because of their large size.
Fig. 6. The visualization of the high-dimensional embedded data derived from MetricDistribution2vec in a plane by t-SNE for 12 datasets. In each subplot, different colored nodes represent different labeled graphs, and similar embedded graphs are clustered nearby on the plot. 
Fig. 1. A classification example in two dimensions to illustrate the metric distribution. In the scatter plot, each point is labeled with its class, denoted by one of two colors (blue or orange). The two classes are separated by a decision boundary, drawn as the dark curve. Dark solid lines denote distances between intra-group points, and dark dotted lines denote distances between inter-group points. The Euclidean distance is used as the metric in this case. The metric distributions for v1, v2, v3, and v4 are also shown. Among these four data instances, v1 and v2 are in one class, while v3 and v4 belong to the other. In each subplot, the histogram reports the distance between the target point and every instance in the dataset (colored according to that instance's class). The red curve shows the overall trend of the metric distribution. Data belonging to the same class clearly have similar metric distributions.
Fig. 4. The illustration of the optimal transportation between frequent fragment decompositions and between vectorial frequent fragment decompositions. The cluster of red lines denotes the transference plan for this transportation scenario. 
Fig. 5. Sensitivity of the classification accuracy of MetricDistribution2vec to the min-sup hyper-parameter, using kNN, Logistic Regression, and SVM (RBF kernel) as classifiers; each classifier is reported with a different curve. The blue dotted horizontal line denotes the best baseline result from Table 3. The colored vertical lines mark the best result, and the corresponding value of min-sup, for MetricDistribution2vec with each classifier. In addition, the number of frequent fragments (fgs) under different min-sup values is shown by the blue histograms.
Fig. 8. This figure shows the similarity between different metric distributions of the same graph under different sampling rates. In each subplot, the horizontal axis denotes the index of each graph, and the vertical axis denotes the distance between different metric distributions. There are four types of symbols on each graph to represent the differences between metric distributions derived by 90% sampling rate and 50%, 20%, 5% sampling rates, respectively.
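
The idea illustrated in the Fig. 1 caption can be sketched in a few lines. The following is only a minimal illustration under stated assumptions, not the authors' MetricDistribution2vec pipeline: it uses 2-D points with the Euclidean distance (as in Fig. 1) rather than graphs, and it summarizes each point's distances to all other points as a normalized histogram. The helper name metric_distribution_vectors, the bin count, and the toy data are hypothetical choices made for this example.

```python
import numpy as np


def metric_distribution_vectors(points, n_bins=8):
    """Hypothetical helper: one normalized distance histogram per point."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Shared bin edges so histograms of different points are comparable.
    edges = np.linspace(0.0, dists.max(), n_bins + 1)
    n = len(points)
    off_diag = ~np.eye(n, dtype=bool)  # mask out each point's zero self-distance
    vectors = np.empty((n, n_bins))
    for i in range(n):
        counts, _ = np.histogram(dists[i, off_diag[i]], bins=edges)
        vectors[i] = counts / counts.sum()
    return vectors


# Toy data (an assumption for illustration): a tight cluster and a spread-out one.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(20, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(20, 2))
vectors = metric_distribution_vectors(np.vstack([class_a, class_b]))
print(vectors.shape)
```

Each row of vectors could then be passed to a lightweight classifier such as kNN. For graph data, the Euclidean distance would need to be replaced by a graph distance, which this sketch does not attempt.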