TL;DR: This survey describes and classify top-k processing techniques in relational databases including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions, and shows the implications of each dimension on the design of the underlying techniques.
Abstract: Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search, and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in XML domain, and show their connections to relational approaches.
TL;DR: A new spectral-embedding algorithm, namely, multiview spectral embedding (MSE), which can encode different features in different ways, to achieve a physically meaningful embedding and explores the complementary property of different views.
Abstract: In computer vision and multimedia search, it is common to use multiple features from different views to represent an object. For example, to well characterize a natural scene image, it is essential to find a set of visual features to represent its color, texture, and shape information and encode each feature into a vector. Therefore, we have a set of vectors in different spaces to represent the image. Conventional spectral-embedding algorithms cannot deal with such datum directly, so we have to concatenate these vectors together as a new vector. This concatenation is not physically meaningful because each feature has a specific statistical property. Therefore, we develop a new spectral-embedding algorithm, namely, multiview spectral embedding (MSE), which can encode different features in different ways, to achieve a physically meaningful embedding. In particular, MSE finds a low-dimensional embedding wherein the distribution of each view is sufficiently smooth, and MSE explores the complementary property of different views. Because there is no closed-form solution for MSE, we derive an alternating optimization-based iterative algorithm to obtain the low-dimensional embedding. Empirical evaluations based on the applications of image retrieval, video annotation, and document clustering demonstrate the effectiveness of the proposed approach.
TL;DR: In this article, a database search system that retrieves multimedia information in a flexible, user friendly system is presented, which uses a multimedia database consisting of text, picture, audio, and animated data.
Abstract: A database search system that retrieves multimedia information in a flexible, user friendly system. The search system uses a multimedia database consisting of text, picture, audio and animated data. That database is searched through multiple graphical and textual entry paths. Those entry paths include an idea search, a title finder search, a topic tree search, a picture explorer search, a history timeline search, a world atlas search, a researcher's assistant search, and a feature articles search.
TL;DR: Self-Paced Reranking (SPaR) is proposed, a novel reranking approach for multimodal data that offers a unified framework providing theoretical justifications for current reranking methods, and generates a spectrum of new reranking schemes.
Abstract: Reranking has been a focal technique in multimedia retrieval due to its efficacy in improving initial retrieval results. Current reranking methods, however, mainly rely on the heuristic weighting. In this paper, we propose a novel reranking approach called Self-Paced Reranking (SPaR) for multimodal data. As its name suggests, SPaR utilizes samples from easy to more complex ones in a self-paced fashion. SPaR is special in that it has a concise mathematical objective to optimize and useful properties that can be theoretically verified. It on one hand offers a unified framework providing theoretical justifications for current reranking methods, and on the other hand generates a spectrum of new reranking schemes. This paper also advances the state-of-the-art self-paced learning research which potentially benefits applications in other fields. Experimental results validate the efficacy and the efficiency of the proposed method on both image and video search tasks. Notably, SPaR achieves by far the best result on the challenging TRECVID multimedia event search task.
TL;DR: This paper proposes a novel cross-modal hashing approach with a linear time complexity to the training data size, to enable scalable indexing for multimedia search across multiple modals and proves that this new representation preserves the intra-similarity in each modal.
Abstract: Most existing cross-modal hashing methods suffer from the scalability issue in the training phase. In this paper, we propose a novel cross-modal hashing approach with a linear time complexity to the training data size, to enable scalable indexing for multimedia search across multiple modals. Taking both the intra-similarity in each modal and the inter-similarity across different modals into consideration, the proposed approach aims at effectively learning hash functions from large-scale training datasets. More specifically, for each modal, we first partition the training data into $k$ clusters and then represent each training data point with its distances to $k$ centroids of the clusters. Interestingly, such a k-dimensional data representation can reduce the time complexity of the training phase from traditional O(n2) or higher to O(n), where $n$ is the training data size, leading to practical learning on large-scale datasets. We further prove that this new representation preserves the intra-similarity in each modal. To preserve the inter-similarity among data points across different modals, we transform the derived data representations into a common binary subspace in which binary codes from all the modals are "consistent" and comparable. nThe transformation simultaneously outputs the hash functions for all modals, which are used to convert unseen data into binary codes. Given a query of one modal, it is first mapped into the binary codes using the modal's hash functions, followed by matching the database binary codes of any other modals. Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in comparison with the state of the art.