TL;DR: The concept vectors produced by the spherical k-means algorithm constitute a powerful “basis” for text data sets: they are localized in the word space, are sparse, and tend towards orthonormality.
Abstract: Unlabeled document collections are becoming increasingly common and available; mining such data sets represents a major contemporary challenge. Using words as features, text documents are often represented as high-dimensional and sparse vectors; a few thousand dimensions and a sparsity of 95 to 99% is typical. In this paper, we study a certain spherical k-means algorithm for clustering such document vectors. The algorithm outputs k disjoint clusters each with a concept vector that is the centroid of the cluster normalized to have unit Euclidean norm. As our first contribution, we empirically demonstrate that, owing to the high-dimensionality and sparsity of the text data, the clusters produced by the algorithm have a certain “fractal-like” and “self-similar” behavior. As our second contribution, we introduce concept decompositions to approximate the matrix of document vectors; these decompositions are obtained by taking the least-squares approximation onto the linear subspace spanned by all the concept vectors. We empirically establish that the approximation errors of the concept decompositions are close to the best possible, namely, to truncated singular value decompositions. As our third contribution, we show that the concept vectors are localized in the word space, are sparse, and tend towards orthonormality. In contrast, the singular vectors are global in the word space and are dense. Nonetheless, we observe the surprising fact that the linear subspaces spanned by the concept vectors and the leading singular vectors are quite close in the sense of small principal angles between them. In conclusion, the concept vectors produced by the spherical k-means algorithm constitute a powerful sparse and localized “basis” for text data sets.
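The clustering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the random-document initialization, the fixed iteration count, and the toy data are all assumptions made for the example.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(u, v):
    """Inner product; for unit vectors this is the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))

def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster document vectors by cosine similarity.

    Returns (labels, concepts); each concept vector is the cluster
    centroid normalized to unit Euclidean norm.
    """
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Assumed initialization: k randomly chosen documents.
    concepts = [list(d) for d in rng.sample(docs, k)]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment: each document joins the concept with the largest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(d, concepts[j])) for d in docs]
        # Update: sum the members of each cluster, then normalize the sum.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # keep the previous concept vector if a cluster empties
                concepts[j] = normalize([sum(col) for col in zip(*members)])
    return labels, concepts

# Made-up word-count vectors over a 4-word vocabulary.
docs = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 1, 4], [0, 0, 2, 2]]
labels, concepts = spherical_kmeans(docs, k=2)
```

Normalizing the summed members gives the same direction as normalizing the centroid, so the division by cluster size is skipped.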
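TL;DR: In this paper, a result of Paley and Wiener is recalled: for any orthonormal set $\{\alpha_p(t)\}$ of real functions in $L^2(0, 1)$, each integral $\int_0^1 \alpha_p(t)\,dx(t)$ exists as a generalized Stieltjes integral for almost all functions $x(\cdot)$ of $C$, and equality (2.5) relates the Wiener integral of a functional $G$ of these integrals to an ordinary multiple integral.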
Abstract: (see, for example, Kaczmarz and Steinhaus [I, pp. 143-144]). Let (2.4) $\{\alpha_p(t)\}$, $p = 1, 2, 3, \ldots$, be any orthonormal set of real functions, each belonging to $L^2(0, 1)$. Paley and Wiener [II] have shown for each index $p = 1, 2, \ldots$ that $\int_0^1 \alpha_p(t)\,dx(t)$ exists as a generalized Stieltjes integral for almost all functions $x(\cdot)$ of $C$ and that the equality (2.5) $\int_C^W G\left[\int_0^1 \alpha_1(t)\,dx(t), \ldots, \int_0^1 \alpha_p(t)\,dx(t)\right] d_W x = \pi^{-p/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} G(u_1, \ldots, u_p)\, e^{-u_1^2 - \cdots - u_p^2}\, du_1 \cdots du_p$.
TL;DR: In this paper, a closed-form solution is presented to the least-squares problem of relating two coordinate systems from measurements of three or more points in both. The method represents rotation by an orthonormal 3 × 3 matrix and requires the computation of the square root of a symmetric matrix; the best scale is equal to the ratio of the root-mean-square deviations of the coordinates in the two systems from their respective centroids.
Abstract: Finding the relationship between two coordinate systems by using pairs of measurements of the coordinates of a number of points in both systems is a classic photogrammetric task. The solution has applications in stereophotogrammetry and in robotics. We present here a closed-form solution to the least-squares problem for three or more points. Currently, various empirical, graphical, and numerical iterative methods are in use. Derivation of a closed-form solution can be simplified by using unit quaternions to represent rotation, as was shown in an earlier paper [J. Opt. Soc. Am. A 4, 629 (1987)]. Since orthonormal matrices are used more widely to represent rotation, we now present a solution in which 3 × 3 matrices are used. Our method requires the computation of the square root of a symmetric matrix. We compare the new result with that obtained by an alternative method in which orthonormality is not directly enforced. In this other method a best-fit linear transformation is found, and then the nearest orthonormal matrix is chosen for the rotation. We note that the best translational offset is the difference between the centroid of the coordinates in one system and the rotated and scaled centroid of the coordinates in the other system. The best scale is equal to the ratio of the root-mean-square deviations of the coordinates in the two systems from their respective centroids. These exact results are to be preferred to approximate methods based on measurements of a few selected points.
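The scale, rotation, and translation described in this abstract can be sketched with NumPy as below. The abstract gives no formulas, so the rotation expression R = M (MᵀM)^(-1/2), with M the sum of outer products of centroid-relative coordinates, is an assumption about how the symmetric square root enters. The function name and test data are hypothetical, and the sketch assumes MᵀM has only positive eigenvalues (for example, points that are not all coplanar).

```python
import numpy as np

def absolute_orientation(left, right):
    """Estimate s, R, t such that right_i ≈ s * R @ left_i + t (points as rows)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    c_left, c_right = left.mean(axis=0), right.mean(axis=0)
    dl, dr = left - c_left, right - c_right  # coordinates relative to the centroids

    # Scale: ratio of root-mean-square deviations from the respective centroids.
    s = np.sqrt((dr ** 2).sum() / (dl ** 2).sum())

    # M = sum_i dr_i dl_i^T; rotation R = M (M^T M)^(-1/2).
    M = dr.T @ dl
    w, V = np.linalg.eigh(M.T @ M)                  # eigendecomposition of a symmetric matrix
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # assumes every eigenvalue is positive
    R = M @ inv_sqrt

    # Translation: right centroid minus the rotated and scaled left centroid.
    t = c_right - s * R @ c_left
    return s, R, t

# Synthetic check: rotate by 30 degrees about z, scale by 2, then translate.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
left = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
right = 2.0 * left @ R_true.T + np.array([1.0, -2.0, 0.5])
s, R, t = absolute_orientation(left, right)
```

With noisy measurements the recovered R remains orthonormal by construction, but this sketch does not guard against a reflection (determinant -1) or a rank-deficient M.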
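TL;DR: This text covers wavelet and multiscale methods in four chapters: basic examples (the Haar system and the Schauder hierarchical basis), multiresolution approximation, approximation and smoothness, and adaptivity, including multilevel preconditioning and nonlinear approximation in Besov spaces.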
Abstract: Introduction. Notations. 1. Basic examples. 1.1 Introduction. 1.2 The Haar system. 1.3 The Schauder hierarchical basis. 1.4 Multivariate constructions. 1.5 Adaptive approximation. 1.6 Multilevel preconditioning. 1.7 Conclusions. 1.8 Historical notes. 2. Multiresolution approximation. 2.1 Introduction. 2.2 Multiresolution analysis. 2.3 Refinable functions. 2.4 Subdivision schemes. 2.5 Computing with refinable functions. 2.6 Wavelets and multiscale algorithms. 2.7 Smoothness analysis. 2.8 Polynomial exactness. 2.9 Duality, orthonormality and interpolation. 2.10 Interpolatory and orthonormal wavelets. 2.11 Wavelets and splines. 2.12 Bounded domains and boundary conditions. 2.13 Point values, cell averages, finite elements. 2.14 Conclusions. 2.15 Historical notes. 3. Approximation and smoothness. 3.1 Introduction. 3.2 Function spaces. 3.3 Direct estimates. 3.4 Inverse estimates. 3.5 Interpolation and approximation spaces. 3.6 Characterization of smoothness classes. 3.7 Lp-unstable approximation and 0 < p < 1. 3.8 Negative smoothness and Lp-spaces. 3.9 Bounded domains. 3.10 Boundary conditions. 3.11 Multilevel preconditioning. 3.12 Conclusions. 3.13 Historical notes. 4. Adaptivity. 4.1 Introduction. 4.2 Nonlinear approximation in Besov spaces. 4.3 Nonlinear wavelet approximation in Lp. 4.4 Adaptive finite element approximation. 4.5 Other types of nonlinear approximations. 4.6 Adaptive approximation of operators. 4.7 Nonlinear approximation and PDE's. 4.8 Adaptive multiscale processing. 4.9 Adaptive space refinement. 4.10 Conclusions. 4.11 Historical notes. References. Index.
TL;DR: In this paper, it was shown that for certain self-similar measures μ with support in the interval 0 ≤ x ≤ 1, the analytic functions {e^{i2πnx} : n = 0, 1, 2, …} contain an orthonormal basis in L^2(μ).
Abstract: We show that for certain self-similar measures μ with support in the interval 0 ≤ x ≤ 1, the analytic functions {e^{i2πnx} : n = 0, 1, 2, …} contain an orthonormal basis in L^2(μ). Moreover, we identify subsets P ⊂ ℕ_0 = {0, 1, 2, ...} such that the functions {e_n : n ∈ P} form an orthonormal basis for L^2(μ). We also give a higher-dimensional affine construction leading to self-similar measures μ with support in ℝ^ν, obtained from a given expansive ν-by-ν matrix and a finite set of translation vectors. We show that the corresponding L^2(μ) has an orthonormal basis of exponentials e^{i2πλ·x}, indexed by vectors λ in ℝ^ν, provided certain geometric conditions (involving the Ruelle transfer operator) hold for the affine system.
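The orthogonality claim can be checked numerically on one concrete case. The abstract does not name a specific measure, so the following is an assumed example: the self-similar measure μ on [0, 1] generated with equal weights by the maps x ↦ x/4 and x ↦ x/4 + 1/2, and the index set P of finite sums of distinct powers of 4. For this μ the Fourier transform satisfies μ̂(t) = ½(1 + e^{iπt}) μ̂(t/4), which gives the infinite product that the sketch truncates.

```python
import cmath

def mu_hat(t, terms=40):
    """Truncated product for the Fourier transform of the assumed measure:
    prod over k = 0..terms-1 of (1 + exp(i*pi*t / 4**k)) / 2."""
    prod = 1.0 + 0.0j
    for k in range(terms):
        prod *= (1.0 + cmath.exp(1j * cmath.pi * t / 4 ** k)) / 2.0
    return prod

def inner(m, n):
    """Inner product <e_m, e_n> in L^2(mu), which equals mu_hat(m - n)."""
    return mu_hat(m - n)

# Assumed index set: finite sums of distinct powers of 4 (first eight elements).
P = [0, 1, 4, 5, 16, 17, 20, 21]

# Largest off-diagonal inner product within P; near zero if the set is orthogonal.
max_off_diagonal = max(abs(inner(m, n)) for m in P for n in P if m != n)

# For contrast, e_0 and e_2 are not orthogonal under this measure.
overlap_0_2 = abs(inner(0, 2))
```

Any two distinct elements of P differ by an odd multiple of a power of 4, which makes one factor of the product vanish. The sketch checks orthogonality only; it says nothing about completeness of the family in L^2(μ).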