TL;DR: An efficient optimal feature selection algorithm, called OCFS, is proposed; it optimizes the objective function of the Orthogonal Centroid (OC) subspace learning algorithm in a discrete solution space.
Abstract: Text categorization is an important research area in many Information Retrieval (IR) applications. To save storage space and computation time in text categorization, efficient and effective algorithms for reducing the data before analysis are highly desired. Traditional techniques for this purpose can generally be classified into feature extraction and feature selection. Because of its efficiency, the latter is more suitable for text data such as web documents. However, many popular feature selection techniques, such as Information Gain (IG) and the χ² test (CHI), are greedy in nature and thus may not be optimal according to some criterion. Moreover, the performance of these greedy methods may deteriorate when the retained data dimension is extremely low. In this paper, we propose an efficient optimal feature selection algorithm, called Orthogonal Centroid Feature Selection (OCFS), which optimizes the objective function of the Orthogonal Centroid (OC) subspace learning algorithm in a discrete solution space. Experiments on the 20 Newsgroups (20NG), Reuters Corpus Volume 1 (RCV1) and Open Directory Project (ODP) data show that OCFS is consistently better than IG and CHI, especially when the reduced dimension is extremely small, while requiring less computation time.
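For illustration only, the sketch below ranks features with a centroid-based score, assuming the OCFS criterion for feature i takes the form sum over classes j of (n_j / n) * (m_ji - m_i)^2, where m_j is the centroid of class j, n_j its size, and m the global mean. The function names and toy data are hypothetical; this is not the authors' code.

```python
from collections import defaultdict

def ocfs_scores(X, y):
    """Per-feature score: class-size-weighted squared distance between each
    class centroid and the global mean (assumed OCFS-style criterion)."""
    n, d = len(X), len(X[0])
    global_mean = [sum(row[i] for row in X) / n for i in range(d)]
    # Group rows by class label so per-class centroids can be computed.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = [0.0] * d
    for rows in groups.values():
        n_j = len(rows)
        for i in range(d):
            m_ji = sum(r[i] for r in rows) / n_j
            scores[i] += (n_j / n) * (m_ji - global_mean[i]) ** 2
    return scores

def select_top_k(X, y, k):
    """Indices of the k highest-scoring features (ties broken by index)."""
    scores = ocfs_scores(X, y)
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]

# Toy term-frequency matrix: feature 0 separates the classes, feature 1 does not.
X = [[3, 1], [4, 1], [0, 1], [1, 1]]
y = ["a", "a", "b", "b"]
print(select_top_k(X, y, 1))  # [0]
```

Because the assumed score is a sum of independent per-feature terms, taking the k largest scores maximizes that criterion over all k-feature subsets, with only one pass over the data plus a sort.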
TL;DR: This paper shows that the location of the image centroid of Sgr A* should depend on observing frequency because of relativistic and radiative transfer effects, and that the same effects introduce a generic dependence of the source polarization on frequency.
Abstract: The inferred black hole in the Galactic center spans the largest angle on the sky among all known black holes. Forthcoming observational programs plan to localize or potentially resolve the image of Sgr A* to an exquisite precision, comparable to the scale of the black hole horizon. Here we show that the location of the image centroid of Sgr A* should depend on observing frequency because of relativistic and radiative transfer effects. The same effects introduce a generic dependence of the source polarization on frequency. Future detection of the predicted centroid shift and the polarization dependence on frequency can be used to determine the unknown black hole spin and verify the validity of General Relativity.
TL;DR: This paper presents an efficient algorithm to implement a k-means clustering that produces clusters comparable to slower methods, but with much better performance.
Abstract: The k-means algorithm is one of the most widely used methods to partition a dataset into groups of patterns. However, most k-means methods require expensive centroid distance calculations to achieve convergence. In this paper, we present an efficient algorithm to implement k-means clustering that produces clusters comparable to those of slower methods. In our algorithm, we partition the original dataset into blocks; each block, called a unit block (UB), contains at least one pattern. We can locate the centroid of a unit block (CUB) with a simple calculation. All the computed CUBs form a reduced dataset that represents the original dataset. The reduced dataset is then used to compute the final centroids of the original dataset. We only need to examine each UB on the boundary of candidate clusters to find the closest final centroid for every pattern in the UB. In this way, we can dramatically reduce the time needed to calculate the final converged centroids. In our experiments, this algorithm produces clustering results comparable to those of other k-means algorithms, but with much better performance.
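A simplified sketch of the block-reduction idea, assuming 2-D data and square grid cells as the unit blocks (the paper's block construction may differ). Each non-empty block is replaced by its centroid weighted by its point count, weighted Lloyd iterations run on those block centroids, and every original point is then assigned to its nearest final centroid. The paper's optimization of examining only blocks on cluster boundaries is not modeled, and the names `unit_block_reduce`, `nearest` and `weighted_kmeans` are hypothetical.

```python
import math
from collections import defaultdict

def unit_block_reduce(points, cell):
    """Bucket 2-D points into square blocks of side `cell`.

    Returns one (block centroid, point count) pair per non-empty block.
    """
    blocks = defaultdict(list)
    for x, y in points:
        blocks[(math.floor(x / cell), math.floor(y / cell))].append((x, y))
    return [((sum(p[0] for p in b) / len(b), sum(p[1] for p in b) / len(b)), len(b))
            for b in blocks.values()]

def nearest(pt, cents):
    """Index of the centroid closest to pt (squared Euclidean distance)."""
    return min(range(len(cents)),
               key=lambda c: (pt[0] - cents[c][0]) ** 2 + (pt[1] - cents[c][1]) ** 2)

def weighted_kmeans(items, init, iters=20):
    """Lloyd iterations on (point, weight) pairs; a block counts as `weight` points."""
    cents = list(init)
    for _ in range(iters):
        acc = [[0.0, 0.0, 0] for _ in cents]
        for pt, w in items:
            j = nearest(pt, cents)
            acc[j][0] += w * pt[0]
            acc[j][1] += w * pt[1]
            acc[j][2] += w
        # A centroid that attracts no block keeps its previous position.
        cents = [(sx / n, sy / n) if n else cents[j] for j, (sx, sy, n) in enumerate(acc)]
    return cents

points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (5.1, 5.0), (5.3, 5.2), (5.2, 5.4)]
blocks = unit_block_reduce(points, cell=1.0)            # 6 points -> 2 blocks
cents = weighted_kmeans(blocks, [b[0] for b in blocks[:2]])
labels = [nearest(p, cents) for p in points]            # assign original points
print(len(blocks), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

The clustering loop touches only the reduced dataset, so its cost per iteration depends on the number of non-empty blocks rather than the number of points.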
TL;DR: A fast centroid molecular dynamics methodology is proposed in which the effective centroid forces are predetermined through a force-matching algorithm applied to a standard path integral molecular dynamics simulation, which greatly reduces the computational cost of generating centroid trajectories, thus extending the applicability of CMD.
Abstract: A fast centroid molecular dynamics (CMD) methodology is proposed in which the effective centroid forces are predetermined through a force-matching algorithm applied to a standard path integral molecular dynamics simulation. The resulting method greatly reduces the computational cost of generating centroid trajectories, thus extending the applicability of CMD. The method is applied to the study of liquid para-hydrogen at two state points and liquid ortho-deuterium at one state point. The static and dynamical results are compared to those obtained from full adiabatic CMD simulations and found to be in excellent agreement for all three systems; the transport properties are also compared to experiment and found to have a similar level of agreement.
TL;DR: A new adaptive approach to color quantization is proposed that significantly reduces processing time compared with available methods while still maintaining good quality (a PSNR greater than 30 dB).
TL;DR: This work addresses the consensus clustering problem of combining multiple partitions of a set of objects into a single consolidated partition, presents two combining methods based on similarity-based graph partitioning, and evaluates the approach on both artificial and real datasets.
Abstract: We address the consensus clustering problem of combining multiple partitions of a set of objects into a single consolidated partition. The input here is a set of cluster labelings, and we do not access the original data or the clustering algorithms that determine these partitions. After introducing a distribution-based view of partitions, we propose a series of entropy-based distance functions for comparing partitions. Given a candidate partition set, consensus clustering is then formalized as an optimization problem: searching for a centroid partition with the smallest distance to that set. In addition to directly selecting the local centroid candidate, we also present two combining methods based on similarity-based graph partitioning. Under certain conditions, the centroid partition is likely to be top- or middle-ranked in terms of closeness to the true partition. Finally, we evaluate the effectiveness of this approach on both artificial and real datasets, with candidates drawn from either the full space or a subspace.
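To make the idea of a centroid partition concrete, the sketch below uses the variation of information, a standard entropy-based distance between two labelings, VI(P, Q) = 2H(P, Q) − H(P) − H(Q), and returns the candidate with the smallest total distance to all candidates (the direct-selection case). Using VI as a stand-in for the paper's distance functions is an assumption, the graph-partitioning combiners are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy (nats) of a labeling given its label counts."""
    return -sum(c / n * math.log(c / n) for c in counts.values())

def variation_of_information(p, q):
    """VI(P, Q) = 2H(P, Q) - H(P) - H(Q) for two labelings of the same objects."""
    n = len(p)
    h_p = entropy(Counter(p), n)
    h_q = entropy(Counter(q), n)
    h_pq = entropy(Counter(zip(p, q)), n)  # joint entropy from co-occurring labels
    return 2 * h_pq - h_p - h_q

def centroid_partition(candidates):
    """Candidate with the smallest total VI distance to the whole candidate set."""
    return min(candidates,
               key=lambda c: sum(variation_of_information(c, o) for o in candidates))

parts = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
    [5, 5, 7, 7, 7, 9],  # label names differ; only the grouping matters
    [0, 1, 0, 1, 0, 1],
]
print(centroid_partition(parts))  # [0, 0, 1, 1, 2, 2]
```

Because VI depends only on co-occurrence counts, relabeling the clusters of a partition leaves every distance unchanged, which is what a comparison of label-free partitions requires.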
TL;DR: A general algorithm for fitting transistors of arbitrary channel width into a two-dimensional common centroid MOS transistor matrix is presented; the algorithm guarantees that the layout of the transistor unit-circuit is not only a complete common centroid layout but also optimal among all common centroid structures.
Abstract: A general algorithm for fitting transistors of arbitrary channel width into a two-dimensional common centroid MOS transistor matrix is presented. The proposed algorithm guarantees that the layout of the transistor unit-circuit is not only a complete common centroid layout but also optimal among all common centroid structures. A novel channel routing algorithm to implement common centroid routing is also proposed. The feasibility of the algorithm is demonstrated on practical analog transistor unit-circuits.
TL;DR: In this paper, a fault detection and data reduction method for inverter-fed induction motors is developed in which a centroid determination method identifies the location and type of a fault in a three-phase system, specifically in an induction motor and the PWM current-controlled inverter that feeds it.
Abstract: A novel, simpler, and non-invasive fault detection and data reduction method for inverter-fed induction motors is developed in this paper. Using a centroid determination method, the location and type of a fault in a three-phase system, specifically in an induction motor and the PWM current-controlled inverter that feeds it, are identified. MATLAB simulations and dSPACE DSP-based experiments are discussed in detail to demonstrate the effectiveness of the proposed method. The fault monitoring algorithms use pattern symmetry across the positive and negative alpha-beta axes after the three-phase currents are transformed from the a-b-c frame to the alpha-beta plane using the Concordia transform. This allows the detection method to view motor and inverter AC currents in a time-independent domain by using the cycle-by-cycle single-point symmetry of the current spectrum. The proposed method detects a variety of probable faults in real time and requires less computational effort. It is therefore simpler, cost-effective, and free from the hardware complexity associated with traditional fault diagnosis methods.
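A minimal sketch of the Concordia step and a per-cycle centroid, assuming the power-invariant form of the transform and synthetic sinusoidal currents. For balanced currents the alpha-beta trajectory is a circle centred on the origin, so a centroid displaced from the origin signals an asymmetry. The DC offset on phase a used as a "fault" and all function names are hypothetical; the paper's actual symmetry tests and fault classification rules are not reproduced.

```python
import math

def concordia(ia, ib, ic):
    """Power-invariant Concordia (a-b-c to alpha-beta) transform of one current sample."""
    alpha = math.sqrt(2 / 3) * ia - ib / math.sqrt(6) - ic / math.sqrt(6)
    beta = (ib - ic) / math.sqrt(2)
    return alpha, beta

def cycle_centroid(samples):
    """Centroid of the alpha-beta trajectory over one electrical cycle."""
    pts = [concordia(*s) for s in samples]
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def three_phase(n=360, amp=10.0, offset_a=0.0):
    """One cycle of synthetic balanced currents; `offset_a` adds a hypothetical DC fault on phase a."""
    out = []
    for k in range(n):
        t = 2 * math.pi * k / n
        out.append((amp * math.sin(t) + offset_a,
                    amp * math.sin(t - 2 * math.pi / 3),
                    amp * math.sin(t + 2 * math.pi / 3)))
    return out

healthy = cycle_centroid(three_phase())
faulty = cycle_centroid(three_phase(offset_a=2.0))
print(math.hypot(*healthy) < 1e-9, math.hypot(*faulty) > 1.0)  # True True
```

Averaging over a whole cycle removes the time dependence, so a single point per cycle summarizes the trajectory, in the spirit of the cycle-by-cycle view described above.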
TL;DR: In this article, a weighted average, or centroid, of the intensity or hue of the pixels with respect to the horizontal and vertical position of each pixel is calculated for a reference frame in the video data stream.
Abstract: An apparatus and method for stabilizing image frames in a video data stream are described. A weighted average, or centroid, of the intensity or hue associated with pixels vs. the horizontal and vertical position of each pixel is calculated for a reference frame in the video data stream. A corresponding centroid is calculated for a subsequent frame in the stream. The subsequent frame is then translated so that its centroid coincides with the centroid of the reference frame, reducing artifacts caused by shaking of the video capture device. Alternatively, the video frames may be divided into tiles and a centroid calculated for each tile. The centroids of the tiles of a subsequent frame are curve-fit to the centroids of the tiles in the reference frame. An affine transform is then performed on the subsequent frame to reduce artifacts in the image caused by movements of the video capture device.
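The translation-only variant can be sketched as follows, assuming grayscale frames stored as 2-D lists with non-zero total intensity and an integer-pixel shift; the tile-based affine variant is not shown, and the function names are hypothetical.

```python
def intensity_centroid(frame):
    """Intensity-weighted centroid (row, col) of a 2-D list of pixel values."""
    total = sum(sum(row) for row in frame)
    r = sum(i * v for i, row in enumerate(frame) for v in row) / total
    c = sum(j * v for row in frame for j, v in enumerate(row)) / total
    return r, c

def stabilizing_shift(reference, frame):
    """Integer (d_row, d_col) that moves `frame`'s centroid onto the reference centroid."""
    r0, c0 = intensity_centroid(reference)
    r1, c1 = intensity_centroid(frame)
    return round(r0 - r1), round(c0 - c1)

def translate(frame, dr, dc, fill=0):
    """Shift a frame by (dr, dc), filling pixels uncovered by the shift with `fill`."""
    h, w = len(frame), len(frame[0])
    return [[frame[i - dr][j - dc] if 0 <= i - dr < h and 0 <= j - dc < w else fill
             for j in range(w)] for i in range(h)]

ref = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
shaken = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]
dr, dc = stabilizing_shift(ref, shaken)
print((dr, dc), translate(shaken, dr, dc) == ref)  # (-1, -1) True
```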
TL;DR: In this article, a method is presented for determining axial alignment between the centroid of an end effector and the effective center of a specimen held by an end effector coupled to a robot arm.
Abstract: A method determines axial alignment between the centroid of an end effector and the effective center of a specimen held by the end effector. The method is implemented with an end effector that is coupled to a robot arm and has a controllable supination angle. A condition in which two locations of the effective center of the specimen, measured at supination angles displaced by 180°, do not lie on the supination axis indicates that the centroid is offset from the actual effective center of the specimen.
TL;DR: Centroid algorithms for wavefront sensors are compared in detail using Monte Carlo simulation, and the numerical results will be helpful for further improving the measurement accuracy of the wavefront sensor.
Abstract: Using Monte Carlo simulation, the centroid algorithms have been compared in detail. Factors that affect the detection accuracy of the wavefront sensor, such as the detection window size, the threshold, and the weighting power factor, have been studied, and the optimal parameters for each algorithm have been found. The numerical results will be helpful for further improving the measurement accuracy of the wavefront sensor.
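The three factors named above (window size, threshold, weighting power) appear as parameters in this sketch of a thresholded, power-weighted centroid. Subtracting the threshold before weighting is one common convention and is an assumption here, as are the function name and the toy spot; the optimal parameter values reported in the paper are not reproduced.

```python
def window_centroid(img, center, half, threshold=0.0, power=1.0):
    """Weighted centroid (x, y) of a spot inside a square window.

    img: 2-D list of intensities; center: (row, col) of the window centre;
    half: window half-width in pixels. Pixels whose intensity does not exceed
    `threshold` are ignored; the rest are weighted by (I - threshold) ** power.
    """
    r0, c0 = center
    sw = sx = sy = 0.0
    for r in range(max(0, r0 - half), min(len(img), r0 + half + 1)):
        for c in range(max(0, c0 - half), min(len(img[0]), c0 + half + 1)):
            v = img[r][c] - threshold
            if v > 0:
                w = v ** power
                sw += w
                sx += w * c
                sy += w * r
    if sw == 0:
        raise ValueError("no pixel above threshold inside the window")
    return sx / sw, sy / sw

# A slightly asymmetric spot on a uniform background of 1.
img = [[1, 1, 1, 1, 1],
       [1, 1, 3, 1, 1],
       [1, 3, 9, 5, 1],
       [1, 1, 3, 1, 1],
       [1, 1, 1, 1, 1]]
x_raw, _ = window_centroid(img, (2, 2), 2)                 # background included
x_thr, _ = window_centroid(img, (2, 2), 2, threshold=1.0)  # background removed
print(round(x_raw, 3), round(x_thr, 3))  # 2.047 2.111
```

Leaving the background in the window pulls the estimate toward the window centre, which illustrates why the threshold (and, for noisy backgrounds, the window size) affects accuracy.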
TL;DR: A particular interpretation of centroid weight is given and this idea is extended to introduce a new weight, the gravitational weight, to improve the estimation of normal vectors.
Abstract: The weighted normal vector method was applied to estimate curvatures on a surface in the 1990s. However, this estimation method still causes serious problems, for example when two adjacent triangles are coplanar. In this paper, our main goals are to provide a geometric interpretation of weighted normal vectors and then to improve the method to handle this problem. In 2004, we pointed out that normal vector estimation with area weights cannot distinguish between the contributions of two different triangles that have the same area. To deal with this drawback, we presented the centroid weight to improve the estimation of normal vectors. Here, we give a particular interpretation of the centroid weight and extend this idea to introduce a new weight, the gravitational weight.
TL;DR: A triangulation-based method is proposed that triangulates each posture into triangle meshes, from which two important posture features are then extracted: the skeleton and the centroid context.
Abstract: This paper presents a new posture classification system that analyzes different human behaviors directly from video sequences using triangulation. To analyze each posture in the video sequences, we propose a triangulation-based method that triangulates it into triangle meshes, from which two important posture features are then extracted: the skeleton and the centroid context. The first is used for a coarse search and the second for a finer classification that distinguishes postures in more detail. For the first descriptor, we use a DFS (depth-first search) scheme to extract the skeleton features of a posture from its triangulation result. Then, with the help of the skeleton information, we define a new shape descriptor, the centroid context, to describe a posture up to a semantic level. That is, the centroid context is a finer descriptor that describes a posture not only by its whole shape but also by its body parts. Since the two descriptors complement each other, all desired human postures can be compared and classified very accurately. This posture classification ability can help us generate a set of key postures for converting a behavior sequence into a set of symbols. Then, a novel string matching scheme is proposed to analyze different human behaviors. Experimental results show that the proposed method is robust, accurate, and powerful in human behavior analysis.
TL;DR: An analytical study of the error is presented, error correction by bilinear interpolation and an adaptive centroid window is described, and the resulting algorithm is shown to be more accurate than the traditional centroid algorithm.
Abstract: For CCD star sensors, the accuracy of the star image centroid location affects star map identification and determines the effectiveness of the measurement. The centroid algorithm is the traditional method for subpixel location, but it is shown to suffer from both a systematic error and a random error. In this research, an analytical study of the error is presented, and error correction by bilinear interpolation and an adaptive centroid window is described. The experiments show that the accuracy of this algorithm is better than that of the traditional centroid algorithm.
TL;DR: This paper proposes a new approach to optimizing the initial centroids for K-means: starting from the center of the data, it chooses each initial centroid at a distant position so that the centroids are as far apart from one another as possible.
Abstract: The performance of the K-means algorithm depends heavily on the initial starting points; it can be trapped in local minima, leading to incorrect clustering results. A shortcoming of K-means is that it generates the initial centroids randomly, without considering how they are spread across the feature space. In this paper we propose a new approach to optimize the initial centroids for K-means. This approach spreads the initial centroids in the feature space so that the distances among them are as large as possible. Starting from the center of the data, this approach chooses each initial centroid at a position distant from those already chosen. The experimental results show the improved solutions obtained with the proposed approach.
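One reading of the initialization above is sketched below: the first centroid is the data point closest to the data mean, and each further centroid is the point whose distance to its nearest already-chosen centroid is largest (a maximin, or farthest-first, rule). This interpretation and the function names are assumptions, not the paper's exact procedure.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def spread_initial_centroids(points, k):
    """Start near the data mean, then repeatedly add the point farthest
    from its nearest already-chosen centroid."""
    d = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(d)]
    chosen = [min(points, key=lambda p: dist2(p, mean))]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(dist2(p, c) for c in chosen)))
    return chosen

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
print(spread_initial_centroids(pts, 3))  # [(5, 5), (0, 0), (10, 0)]
```

The result is deterministic for a given dataset, unlike random seeding; the resulting centroids would then be passed to ordinary K-means iterations.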
TL;DR: In this paper, the centroid coordinates are used to determine horizontal grid lines and vertical grid lines that are superimposed on the microarray image so that intersections of the grid lines coincide with features of the image.
Abstract: The present invention provides various embodiments that are directed to methods and systems for determining a feature-coordinate grid of a microarray image so that individual features can be located and isolated for statistical analysis. The method receives microarray-image data and determines centroid coordinates for each feature of the microarray image. The methods and systems of the present invention use the centroid coordinates to determine horizontal and vertical grid lines that are superimposed on the microarray image so that intersections of the grid lines coincide with features of the microarray image. The horizontal and vertical grid lines provide the grid lines of the feature-coordinate grid.
TL;DR: Evaluation experiments conducted on two benchmark collections show that the classification accuracy of the DragPushing algorithm is comparable to that of more complex methods, such as support vector machines (SVM), and that the algorithm is computationally very efficient.
Abstract: We present a novel algorithm, DragPushing, for automatic text classification. Using a training data set, the algorithm first calculates the prototype vectors, or centroids, for each of the available document classes. Using misclassified examples, it then iteratively refines these centroids by dragging the centroid of the correct class towards a misclassified example and, at the same time, pushing the centroid of the incorrect class away from that example. The algorithm is simple to implement and computationally very efficient. Evaluation experiments conducted on two benchmark collections show that its classification accuracy is comparable to that of more complex methods, such as support vector machines (SVM).
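A sketch of the drag-and-push refinement, assuming Euclidean nearest-centroid prediction and a fixed step size `eta`; text classifiers commonly use cosine similarity on term vectors instead, and the paper's update weights may differ. The helper names and the toy data are hypothetical.

```python
def centroids(X, y):
    """Mean vector (prototype) for each class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, x):
    """Label of the nearest centroid (squared Euclidean distance)."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))

def drag_push(X, y, eta=0.1, passes=10):
    """Refine class centroids using misclassified training examples.

    For each error, drag the true class's centroid toward x and push the
    wrongly predicted class's centroid away from x.
    """
    cents = centroids(X, y)
    for _ in range(passes):
        errors = 0
        for x, label in zip(X, y):
            guess = predict(cents, x)
            if guess != label:
                errors += 1
                cents[label] = [c + eta * (v - c) for c, v in zip(cents[label], x)]
                cents[guess] = [c - eta * (v - c) for c, v in zip(cents[guess], x)]
        if errors == 0:  # stop once the training set is classified correctly
            break
    return cents

X = [(0, 0), (0, 0), (0, 0), (3, 0), (4, 0)]
y = ["A", "A", "A", "A", "B"]
plain = centroids(X, y)
refined = drag_push(X, y, eta=0.5)
print([predict(plain, x) for x in X])    # ['A', 'A', 'A', 'B', 'B']
print([predict(refined, x) for x in X])  # ['A', 'A', 'A', 'A', 'B']
```

In the toy data the plain class means misclassify (3, 0); one drag-push update moves the centroids to (1.875, 0) and (4.5, 0), after which every training point is classified correctly.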
TL;DR: In this article, a voice recognition device and method are provided that enhance noise adaptation processing in voice recognition and reduce the required memory capacity. The centroid best matched to the environment estimated from the utterance is extracted from memory, and the model is restored from the extracted centroid using the differential vector stored in memory.
Abstract: A voice recognition device and a voice recognition method are provided that enhance noise adaptation processing in voice recognition and reduce the memory capacity used. Acoustic models are clustered to calculate the centroid of each cluster and the differential vector between the centroid and each model; model composition is carried out between each kind of assumed noise model and the calculated centroids, and the centroid of each composite model and the differential vectors are stored in memory. In actual recognition processing, the centroid best matched to the environment estimated by utterance environment estimation is extracted from memory, the model is restored from the extracted centroid using the differential vector stored in memory, and noise adaptation processing is executed on the basis of the restored model.
TL;DR: A complete set of procedures is proposed to locate and check BGA images, and the cyclic redundancy check (CRC) algorithm is used to check the correspondence between each pin and its pin type.
Abstract: A machine vision system for SMT-mounting machine applications usually involves a two-stage algorithm. It first measures the centroid and rotation angle of the SMD, and then checks each pin’s area, position error, and grid coordinate. In this paper, a complete set of procedures is proposed to locate and check the BGA image. During the locating procedure, a threshold for the frame is first calculated using an iterative threshold algorithm. If an object is found under this threshold, the pin area is calculated with a local threshold. After that, whether the object is a pin is decided from the relative positions of its neighbouring pins; then the approximate rotation angle for finding the outer pins is calculated, and the centroid and rotation angle of the BGA component are calculated by the rectangular least-squares algorithm. The checking procedure also measures each pin’s area using the moment algorithm, then calculates the radius of the moving sum from each pin’s area, and finally measures the position error using a moving-sum algorithm and judges each pin’s type by its gray level. The new method uses gray-level statistical information to solve the empty pad problem and uses the symmetry of a circle to deal with the shape problem. Lastly, the cyclic redundancy check (CRC) algorithm is used to check the correspondence between each pin and its pin type. Experimental verification shows that the new method has high accuracy and reduced execution time, and meets the critical timing requirement of a high-speed SMT machine.
TL;DR: Fuzzy c-means with feature partitions uses a generalized metric on feature subsets to increase centroid robustness and is demonstrated on synthetic and real datasets.
TL;DR: In this article, a system and method for performing and accelerating cluster analysis of large data sets is presented, in which the data set is formatted into binary bit sequential (bSQ) format and then structured into a Peano Count tree (P-tree) format, a lossless tree representation of the original data.
Abstract: A system and method for performing and accelerating cluster analysis of large data sets is presented. The data set is formatted into binary bit sequential (bSQ) format and then structured into a Peano Count tree (P-tree) format, a lossless tree representation of the original data. A P-tree algebra is defined and used to formulate a vertical set inner product (VSIP) technique that can efficiently and scalably measure the mean value and total variation of a set about a fixed point in the large dataset. The set can be any projected subspace of any vector space, including oblique subspaces. The VSIPs are used to determine the closeness of a point to a set of points in the large dataset, which makes them very useful in classification, clustering and outlier detection. One advantage is that the number of centroids (k) need not be pre-specified but is effectively determined. The high quality of the centroids makes them useful in partitioning clustering methods such as k-means and k-medoids clustering. The present invention also identifies outliers.
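The property that makes set-level aggregates useful here can be sketched with ordinary Python lists: the total variation of a set about any point a follows from three precomputed quantities, since sum ||x − a||² = sum ||x||² − 2 a · (sum x) + N ||a||². The P-tree and bSQ encodings are not modeled; only this algebraic identity is shown, with hypothetical function names.

```python
def aggregates(points):
    """Precompute count, per-dimension sum, and sum of squared norms of a point set."""
    n = len(points)
    s = [sum(p[i] for p in points) for i in range(len(points[0]))]
    sq = sum(sum(v * v for v in p) for p in points)
    return n, s, sq

def total_variation_about(agg, a):
    """sum_x ||x - a||^2 = sum_x ||x||^2 - 2 a . sum_x x + n ||a||^2, using aggregates only."""
    n, s, sq = agg
    return sq - 2 * sum(ai * si for ai, si in zip(a, s)) + n * sum(ai * ai for ai in a)

pts = [(1, 2), (3, 4), (5, 0)]
agg = aggregates(pts)
brute = sum((x - 7) ** 2 + (y - 1) ** 2 for x, y in pts)  # direct per-point scan
print(total_variation_about(agg, (7, 1)), brute)  # 67 67
```

Once the aggregates exist, evaluating a new point costs time proportional to the dimension, not to the number of points, and the set mean is simply the sum vector divided by the count.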
TL;DR: This work describes the optimal choice of features for subsets of a given size, corresponding to those yielding the smallest misclassification rate, and proposes an algorithm for estimating this optimal subset in practice.
Abstract: Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for our illustrative examples, demonstrating that our proposed algorithms can improve the performance of a nearest centroid classifier.
TL;DR: In this paper, a method is presented for clustering large datasets in which N data instances with n fields are linearly weighted onto an n-dimensional mesh with (for example) m grid points per dimension, and a number of "intelligent agents" are placed randomly on the mesh.
Abstract: A method is presented for clustering large datasets in which N data instances with n fields are linearly weighted onto an n-dimensional mesh with (for example) m grid points per dimension, and a number of “intelligent agents” are placed randomly on the mesh. These agents move along the grid according to special rules that cause them to find the grid points with the largest weight. All clusters can be determined in this fashion and ranked by “strength”; these maxima are then used as the “centroid” of each cluster. If desired, the mesh can be made finer around these “centroids” to obtain finer scaling, and all data points within a specified distance of these centroids are considered to form a cluster.
TL;DR: In this paper, a new method for measuring the centroid gain is presented, which takes advantage of the fact that a SH-WFS-based AO system usually has more measurements than actuators.
Abstract: Shack–Hartmann wavefront sensors (SH WFS) are used by many adaptive optics (AO) systems to measure the wavefront. In this WFS, the centroid of the spots is proportional to the wavefront slope. If the detectors consist of 2×2 quad cells, as is the case in most astronomical AO systems, then the centroid measurement is proportional to the centroid gain. This quantity varies with the strength of the atmospheric turbulence and the angular extent of the beacon. The benefits of knowing the centroid gain and current techniques to measure it are discussed. A new method is presented, which takes advantage of the fact that, in a SH-WFS-based AO system, there are usually more measurements than actuators. Centroids in the null space of the wavefront reconstructor, called slope discrepancy measurements, contain information about the centroid gain. Tests using the W. M. Keck Observatory AO system demonstrate the accuracy of the algorithm.
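A sketch of why the quad-cell signal needs a gain, assuming an ideal 2×2 quad cell with no gaps or noise and a Gaussian spot: for a small shift the normalized difference signal is roughly sqrt(2/π) × shift / sigma, so it scales inversely with the spot width sigma and must be multiplied by a gain to recover the true displacement. The spot model and function names are assumptions; the slope-discrepancy estimator itself is not implemented.

```python
import math

def quad_cell_signal(a, b, c, d):
    """Normalized (x, y) signal from quadrant intensities.

    a: top-left, b: top-right, c: bottom-left, d: bottom-right.
    """
    total = a + b + c + d
    return ((b + d) - (a + c)) / total, ((a + b) - (c + d)) / total

def gaussian_spot_quadrants(dx, dy, sigma):
    """Quadrant energies of a unit-energy Gaussian spot of width sigma centred at (dx, dy)."""
    right = 0.5 * (1 + math.erf(dx / (sigma * math.sqrt(2))))
    top = 0.5 * (1 + math.erf(dy / (sigma * math.sqrt(2))))
    left, bottom = 1 - right, 1 - top
    # The Gaussian is separable, so each quadrant's energy is a product of marginals.
    return left * top, right * top, left * bottom, right * bottom

# Same small displacement, two spot widths: the signal per unit shift (the gain) differs.
shift = 0.05
gains = {}
for sigma in (0.5, 1.0):
    sx, sy = quad_cell_signal(*gaussian_spot_quadrants(shift, 0.0, sigma))
    gains[sigma] = sx / shift
print(gains)
```

Doubling the spot width roughly halves the signal per unit shift, which is why the gain varies with turbulence strength and beacon size as described above.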
TL;DR: This paper presents an intuitive and effective SOM projection method with comparatively low computational complexity for the purpose of cluster visualization; it maps data vectors onto the output space based on their responses to different prototype vectors.
Abstract: The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data, as it performs a topology-preserving projection of the input space onto a low-dimensional grid. To utilize the information provided by the SOM and obtain an approximation of the data structure, a separate data projection method is usually needed. However, most SOM projection methods are computationally expensive when the data set becomes large. In this paper, we present an intuitive and effective SOM projection method with comparatively low computational complexity for the purpose of cluster visualization. This method maps data vectors onto the output space based on their responses to different prototype vectors. High-resolution maps can be obtained with a relatively small network size. The proposed method is demonstrated using both an artificial and a real-world data set.
TL;DR: An improved MPP algorithm is proposed for more effective representation of leaf images, together with a new dynamic matching algorithm that revises the Nearest Neighbor search to reduce matching time.
Abstract: This paper presents an effective and robust leaf image retrieval system based on shape features. Specifically, we propose an improved MPP algorithm for more effective representation of leaf images and a new dynamic matching algorithm that revises the Nearest Neighbor search to reduce matching time. In particular, both leaf shape and leaf arrangement can be sketched in the query for better accuracy and efficiency. In the experiments, we compare our proposed method with other methods, including Centroid Contour Distance (CCD), Fourier Descriptor, Curvature Scale Space Descriptor (CSSD), Moment Invariants, and MPP. Experimental results on one thousand leaf images show that our approach achieves better performance than the other methods.
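Centroid Contour Distance, one of the baseline descriptors above, is commonly formed from the distances between the shape centroid and sampled contour points. The sketch below assumes the centroid is the mean of the contour points and normalizes by the largest distance for scale invariance; sampling density, starting point, and normalization are assumptions, and the function name is hypothetical.

```python
import math

def centroid_contour_distance(contour):
    """Distances from the contour-point centroid to each contour point, divided by
    the maximum distance so the descriptor is invariant to uniform scaling."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    d = [math.hypot(x - cx, y - cy) for x, y in contour]
    m = max(d)
    return [v / m for v in d]

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print([round(v, 3) for v in centroid_contour_distance(square)])
# [1.0, 0.707, 1.0, 0.707, 1.0, 0.707, 1.0, 0.707]
```

The descriptor still depends on where the contour traversal starts, so matching schemes typically compare it under cyclic shifts.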
TL;DR: The procedure is simple, completely automated, efficient and flexible and can be easily implemented on a personal computer and used to test growth or communication strategies among cells.
Abstract: Objective: To describe a simple and quick procedure for modeling samples of tissue with Voronoi diagrams. Study design: Instead of calculating the centers of the so-called Dirichlet domains (i.e., the polygonal areas occupied by individual cells), the centroid of each such area is used to generate Voronoi diagrams. The coordinates of the centroids are calculated by simply averaging the coordinates of the points of the cell contours, which is much simpler and faster than any geometric procedure for locating the Dirichlet centers. Using the centroids as centers, circles are allowed to grow until no space on the surface is available. With this procedure it is easy to control the rate of growth of individual cells or groups of cells according to any rule or rules. It is also possible to simulate the effects of removing one or more cells from the sample. Conclusion: The procedure was successfully applied to modeling some of the changes that can occur in a real sample of human corneal endothelium. The procedure is simple, completely automated, efficient and flexible, and can easily be implemented on a personal computer. It can be used to test growth or communication strategies among cells.
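The two steps of the procedure can be sketched on a pixel grid: the centroid is the average of the contour point coordinates, and each pixel is labeled with the cell whose circle, growing from its centroid at a given rate, reaches it first. With equal rates this gives an ordinary (discrete) Voronoi diagram. The grid discretization, the arrival-time rule (which ignores the fact that a growing cell cannot pass through space already occupied by another), and the function names are assumptions for illustration.

```python
import math

def contour_centroid(contour):
    """Centroid estimated by averaging the contour point coordinates."""
    n = len(contour)
    return (sum(x for x, _ in contour) / n, sum(y for _, y in contour) / n)

def grow_cells(centers, width, height, rates=None):
    """Label each grid pixel with the index of the cell whose circle, growing from
    its center at the given rate, arrives first. Equal rates give a Voronoi diagram."""
    rates = rates or [1.0] * len(centers)
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            arrival = [math.hypot(x - cx, y - cy) / r for (cx, cy), r in zip(centers, rates)]
            row.append(arrival.index(min(arrival)))
        grid.append(row)
    return grid

cell_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
cell_b = [(5, 0), (7, 0), (7, 2), (5, 2)]
centers = [contour_centroid(c) for c in (cell_a, cell_b)]
print(grow_cells(centers, 8, 3)[1])                     # [0, 0, 0, 0, 1, 1, 1, 1]
print(grow_cells(centers, 8, 3, rates=[2.0, 1.0])[1])   # [0, 0, 0, 0, 0, 1, 1, 1]
```

Giving a cell a faster growth rate enlarges its territory, and removing a cell amounts to dropping its centroid from `centers` before relabeling.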
TL;DR: In this article, a drive arrangement for a micromachine is described that includes a plurality of fixed electrodes arranged so as to have a common centroid.
Abstract: A drive arrangement for a micromachine includes a plurality of fixed electrodes arranged so as to have a common centroid.
TL;DR: A framework, KACU, is proposed to speed up the k-means clustering algorithm by integrating a hardware centroid updating mechanism into the procedure of the continuous k-means algorithm.
Abstract: In this paper, we propose a framework, KACU (k-means with hardware centroid updating), to speed up the k-means clustering algorithm by integrating a hardware centroid updating mechanism into the procedure of the continuous k-means algorithm. To facilitate performance measurement, KACU is implemented on a commercial field-programmable gate array (FPGA) device. The experimental results show that KACU achieves considerably higher performance.