1. What is Wasserstein distance?
Wasserstein distance, also known as Earth Mover's distance, is a popular metric in machine learning research for measuring the distance between distributions. Unlike other commonly used measures such as KL divergence and Jensen-Shannon (JS) divergence, it remains informative when the two distributions have disjoint supports. Wasserstein distance has been adopted in various models, including Wasserstein Generative Adversarial Networks (WGAN) and the Wasserstein Autoencoder (WAE). In the Wasserstein Adversarially Regularized Graph Autoencoder (WARGA), Wasserstein distance is used to directly regularize the encoded latent distribution toward a target distribution, which gives a more natural interpretation of the regularization than adversarial methods that rely on artificially designed discriminators.
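The disjoint-support point can be illustrated with a small sketch. It assumes two point masses on a 1-D grid and uses hypothetical helper names (`w1_equal_size`, `js_divergence`) that are not from the source. For 1-D samples of equal size, the 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples, which is what the first helper computes.

```python
import numpy as np

def w1_equal_size(x, y):
    """1-Wasserstein distance between two 1-D empirical distributions
    with the same number of samples (sorted-sample formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape
    return float(np.mean(np.abs(x - y)))

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete
    distributions defined on the same grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# A point mass at position 0, compared with point masses at positions 1 and 4.
p = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
q_near = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
q_far = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

js_near = js_divergence(p, q_near)   # saturates at log(2)
js_far = js_divergence(p, q_far)     # also log(2): no sense of "how far"
w_near = w1_equal_size([0.0], [1.0])
w_far = w1_equal_size([0.0], [4.0])
```

In this toy setting the JS divergence takes the same value for both comparisons, while the Wasserstein distance grows with the separation between the supports.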
2. What is the optimization goal of Variational Graph Autoencoder (VGAE)?
The optimization goal of the Variational Graph Autoencoder (VGAE) is to maximize the variational lower bound L, which consists of the expected reconstruction log-likelihood of the adjacency matrix minus the KL divergence between the encoded latent distribution q(z|A,X) and the prior distribution p(z). Maximizing L is therefore equivalent to minimizing the sum of the cross-entropy reconstruction loss and this KL divergence. By doing so, VGAE aims to reconstruct the adjacency matrix A accurately while keeping the encoded latent embeddings z close to the specified prior distribution. This allows VGAE to perform link prediction by capturing the underlying structure and features of the graph.
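A minimal sketch of the two loss terms follows. It assumes a diagonal Gaussian encoder output (`mu`, `logvar`), a standard Gaussian prior, and an inner-product decoder. The function name `vgae_negative_elbo` is hypothetical, and the sketch omits the term reweighting that some implementations apply.

```python
import numpy as np

def vgae_negative_elbo(A, mu, logvar, rng):
    """Return (cross-entropy term, KL term); their sum is the negative
    lower bound that would be minimized."""
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    # Inner-product decoder: logits for every node pair.
    logits = z @ z.T
    # Numerically stable binary cross entropy with logits, summed over pairs.
    bce = np.sum(np.maximum(logits, 0) - logits * A
                 + np.log1p(np.exp(-np.abs(logits))))
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return bce, kl

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # toy 3-node path graph
mu = np.zeros((3, 2))
logvar = np.zeros((3, 2))
bce, kl = vgae_negative_elbo(A, mu, logvar, rng)
loss = bce + kl
```

When the encoder output already matches the prior (zero mean, unit variance), the KL term vanishes and only the reconstruction term remains.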
3. What is the 1-Wasserstein distance?
The 1-Wasserstein distance between two distributions $P_r$ and $P_g$ is the smallest expected distance $\mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert]$ achievable over all joint distributions $\gamma$ whose marginals are $P_r$ and $P_g$. This infimum is intractable in its original form, but it can be reformulated using Kantorovich-Rubinstein duality as the supremum of $\mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]$ over all continuous functions $f$ that are 1-Lipschitz. In Wasserstein GAN, this dual problem is approximated by maximizing the same objective over a parameterized function $f$ that is constrained to be Lipschitz continuous. In practice, the parameters of $f$ are clipped to a fixed range after each update to enforce this constraint.
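The clipping step can be sketched with a deliberately simple critic. The code below assumes a linear function f_w(z) = z · w, plain gradient ascent, and illustrative values for `clip`, `lr`, and `steps`. It is not the WGAN training loop, and a clipped critic yields at best a scaled estimate of the distance.

```python
import numpy as np

def train_linear_critic(real, fake, clip=0.01, lr=0.05, steps=200):
    """Gradient ascent on E_r[f_w] - E_g[f_w] for f_w(z) = z @ w,
    clipping w to [-clip, clip] after every update."""
    w = np.zeros(real.shape[1])
    for _ in range(steps):
        # For a linear critic the gradient w.r.t. w is the difference of means.
        grad = real.mean(axis=0) - fake.mean(axis=0)
        w = np.clip(w + lr * grad, -clip, clip)
    estimate = float((real @ w).mean() - (fake @ w).mean())
    return w, estimate

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # samples standing in for P_r
fake = rng.normal(loc=2.0, scale=1.0, size=(500, 2))  # samples standing in for P_g
w, estimate = train_linear_critic(real, fake)
```

Clipping bounds the weights and therefore the Lipschitz constant of the critic, which is why the maximized objective stays finite.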
4. How does the Graph Autoencoder use GCN for node feature encoding?
The Graph Autoencoder uses a 2-layer GCN as the generator $G_w(A, X)$ to encode the original node features $X \in \mathbb{R}^{N \times c}$, together with the topological structure $A$, into a low-dimensional representation $Z = G_w(A, X) = \tilde{A}\,\mathrm{ReLU}(\tilde{A} X W_1)\,W_2$. The 'weighted' adjacency matrix $\tilde{A}$ is obtained by normalizing the adjacency matrix $A$ with the degree matrix $D$, where $D_{ii}$ is the degree of node $i$ in $A$ (in the usual GCN formulation, $\tilde{A} = D^{-1/2} A D^{-1/2}$). ReLU is the activation function, and the weights are $W_1 \in \mathbb{R}^{c \times d}$ and $W_2 \in \mathbb{R}^{d \times e}$. The output matrix $Z$ contains the latent embedding $z_i$ of each node $v_i \in V$ as a row vector. The target distribution of the latent representation is a standard Gaussian $\mathcal{N}(0, I)$, denoted by $P_r$, and the distribution of the embeddings generated by $G_w$ is denoted by $P_g(z \mid A, X)$. The objective is to minimize the cross-entropy loss over the parameters $w$ of $G$ by reconstructing the adjacency matrix with an inner-product decoder. The sigmoid function $\sigma(t) = 1 / (1 + \exp(-t))$ restricts the decoder output to the range $(0, 1)$.
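The encoder and decoder described above can be sketched in a few lines of NumPy. The sketch assumes the symmetric normalization D^{-1/2} A D^{-1/2} without self-loops (some implementations add them), random untrained weights, and hypothetical function names. It shows only the forward pass.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed form)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0                      # isolated nodes keep a zero row
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def gcn_encode(A, X, W1, W2):
    """Two-layer GCN: Z = A_norm @ ReLU(A_norm @ X @ W1) @ W2."""
    A_norm = normalize_adjacency(A)
    hidden = np.maximum(A_norm @ X @ W1, 0.0)
    return A_norm @ hidden @ W2

def inner_product_decode(Z):
    """Edge probabilities sigma(z_i . z_j) for every node pair."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])               # toy graph with N = 4 nodes
X = rng.standard_normal((4, 3))            # c = 3 input features
W1 = 0.1 * rng.standard_normal((3, 5))     # d = 5 hidden units
W2 = 0.1 * rng.standard_normal((5, 2))     # e = 2 latent dimensions

Z = gcn_encode(A, X, W1, W2)               # one row z_i per node
A_hat = inner_product_decode(Z)            # reconstructed adjacency
```

Each row of `Z` is the embedding of one node, and `A_hat` is symmetric because the decoder depends only on inner products between embeddings.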