Top 8 papers published on the topic of Ward's method in 2014

Journal Article•10.1007/S00357-014-9161-Z•

Ward's Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward's Criterion?

Fionn Murtagh¹, Pierre Legendre²

De Montfort University¹, Université de Montréal²

01 Oct 2014-Journal of Classification

TL;DR: The survey work and case studies will be useful for all those involved in developing software for data analysis using Ward’s hierarchical clustering method.

Abstract: The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. Two algorithms are found in the literature and software, both announcing that they implement the Ward clustering method. When applied to the same distance matrix, they produce different results. One algorithm preserves Ward's criterion, the other does not. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method.
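The difference between the two algorithms can be illustrated with a minimal Python sketch. This is our own illustration, not the authors' code: it runs naive agglomerative clustering driven by the Lance-Williams update for Ward's method, and the names `ward_merge_heights`, `ess` and `pair`, as well as the 1-D toy data, are hypothetical. With `square_input=True` the update receives squared Euclidean distances, so each merge height equals twice the increase in the error sum of squares (ESS) and the heights sum to twice the total ESS. With `square_input=False` the same update receives plain distances, and that link to Ward's criterion is lost.

```python
from itertools import combinations


def pair(a, b):
    """Order a cluster-id pair so it can be used as a dictionary key."""
    return (a, b) if a < b else (b, a)


def ess(points):
    """Error sum of squares of 1-D points around their mean."""
    mean = sum(points) / len(points)
    return sum((x - mean) ** 2 for x in points)


def ward_merge_heights(points, square_input=True):
    """Naive O(n^3) agglomeration with the Lance-Williams update for Ward.

    points: 1-D coordinates (1-D only to keep the sketch short).
    square_input: True feeds squared Euclidean distances to the update;
    False feeds plain Euclidean distances to the same update.
    Returns the merge heights in the order the merges happen.
    """
    size = {i: 1 for i in range(len(points))}
    d = {}
    for i, j in combinations(range(len(points)), 2):
        dist = abs(points[i] - points[j])
        d[(i, j)] = dist ** 2 if square_input else dist
    next_id = len(points)
    heights = []
    while len(size) > 1:
        # Merge the closest pair of active clusters.
        (i, j), h = min(d.items(), key=lambda kv: kv[1])
        heights.append(h)
        del d[(i, j)]
        ni, nj = size.pop(i), size.pop(j)
        new = next_id
        next_id += 1
        # Lance-Williams update for Ward's method: distance from every
        # remaining cluster k to the newly formed cluster.
        for k, nk in size.items():
            dki = d.pop(pair(k, i))
            dkj = d.pop(pair(k, j))
            d[pair(k, new)] = ((ni + nk) * dki + (nj + nk) * dkj - nk * h) / (ni + nj + nk)
        size[new] = ni + nj
    return heights
```

On the points 0, 1, 3 and 7, the squared-input heights sum to 57.5 (up to rounding), twice the total ESS of 28.75, while the unsquared run gives a much smaller sum. In R, `hclust` exposes the two variants as `method = "ward.D"` (update applied to the dissimilarities as given) and `method = "ward.D2"` (dissimilarities squared before updating).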

3,338 citations

Proceedings Article•10.1109/INFOS.2014.7036702•

Clustering of chemical data sets for drug discovery

Mohamed G. Malhat, Hamdy M. Mousa, Ashraf B. El-Sisi

1 Dec 2014

TL;DR: This paper compares K-Means, bisecting K-Means and Ward clustering on chemical data sets, in the context of compound selection, virtual library generation, High-Throughput Screening, Quantitative Structure-Activity Relationship (QSAR) analysis and Absorption, Distribution, Metabolism, Elimination and Toxicity prediction.

Abstract: Clustering algorithms are important tools in chemoinformatics for the drug discovery process, and many of them are available for analyzing large chemical data sets of medium and high dimensionality. The quality of these algorithms depends on the nature of the data sets and the accuracy required by the application. Applications of clustering in the drug discovery process include compound selection, virtual library generation, High-Throughput Screening (HTS), Quantitative Structure-Activity Relationship (QSAR) analysis and Absorption, Distribution, Metabolism, Elimination and Toxicity (ADMET) prediction. Under the Structure-Activity Relationship (SAR) model, compounds with similar structures have similar biological activities, so clustering algorithms must group similar compounds into the same cluster. K-Means, bisecting K-Means and Ward clustering are the most popular clustering algorithms, with a wide range of applications in chemoinformatics. In this paper, a comparative study of these algorithms is presented. The algorithms are applied to homogeneous and heterogeneous chemical data sets, and the results are compared to determine which algorithms are more suitable depending on the nature of the data sets, computation time and accuracy of the produced clusters. Accuracy is evaluated using a standard deviation metric. Experimental results show that K-Means is preferable for a small number of clusters, on both homogeneous and heterogeneous data sets, in terms of time and standard deviation. Bisecting K-Means and Ward are preferable for a large number of clusters, on both kinds of data sets, in terms of standard deviation, but bisecting K-Means is preferable in terms of time.

17 citations

Proceedings Article•10.1109/GRC.2014.6982850•

A method of two stage clustering using agglomerative hierarchical algorithms with one-pass k-means++ or k-median++

Yusuke Tamura¹, Sadaaki Miyamoto¹

University of Tsukuba¹

1 Oct 2014

TL;DR: This paper proposes a two-stage clustering method whose first stage uses one-pass k-median++ and whose second stage uses agglomerative hierarchical clustering, and compares it with a k-means++-based two-stage method to examine the effectiveness of L1 distance in two-stage methods.

Abstract: The aim of this paper is to propose a two-stage method of clustering in which the first stage uses one-pass k-median++ and the second stage uses agglomerative hierarchical clustering. To handle medians in the second stage, we propose two calculation methods: one uses L1 distance as the similarity, and the other uses the error of L1 distance in the manner of the Ward method. We compare the proposed method with a two-stage method from our own work that uses k-means++ in the first stage, to examine the effectiveness of L1 distance in two-stage methods. Numerical experiments were evaluated with two criteria: objective function values and the Rand index.
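The second stage of such a scheme can be pictured as Ward-style merging of weighted first-stage representatives. The minimal sketch below assumes a k-means++ first stage (as in the comparison method) has already reduced the data to (centroid, size) pairs, and merges them greedily with the squared-Euclidean Ward cost n_a n_b / (n_a + n_b) ||c_a - c_b||^2. The paper's two L1 variants for medians are not reproduced, and the names `ward_cost`, `merge` and `second_stage` are hypothetical.

```python
from itertools import combinations


def ward_cost(ca, na, cb, nb):
    """Increase in error sum of squares when merging two weighted centroids."""
    sq = sum((a - b) ** 2 for a, b in zip(ca, cb))
    return na * nb / (na + nb) * sq


def merge(ca, na, cb, nb):
    """Size-weighted centroid of the union of two clusters, plus its size."""
    n = na + nb
    return tuple((na * a + nb * b) / n for a, b in zip(ca, cb)), n


def second_stage(summaries, k):
    """Greedily merge (centroid, size) summaries until k clusters remain."""
    clusters = list(summaries)
    while len(clusters) > k:
        # Pick the pair whose merge increases the error sum of squares least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(*clusters[p[0]], *clusters[p[1]]),
        )
        merged = merge(*clusters[i], *clusters[j])
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return clusters
```

Working on summaries rather than raw points is what makes a two-stage design attractive: the quadratic-cost agglomerative step only sees the first-stage representatives.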

10 citations

Book Chapter•10.1201/B16741-20•

Group Average Linkage Compared to Ward’s Method in Hierarchical Clustering

Maurice Roux

10 Apr 2014

4 citations

Book Chapter•10.1007/978-3-319-06569-4_6•

The Confrontation of Two Clustering Methods in Portfolio Management: Ward’s Method Versus DCA Method

Hoai An Le Thi¹, Pascal Damel¹, Nadège Peltre¹, Nguyen Trong Phuc²

University of Lorraine¹, École Normale Supérieure²

1 Jan 2014

TL;DR: This paper presents a new methodology for clustering assets in portfolio theory based on DCA (Difference of Convex functions), an innovative approach in the nonconvex optimization framework that has been used successfully on various complex industrial systems.

Abstract: This paper presents a new methodology for clustering assets in portfolio theory and compares it with the classical Ward clustering available in SAS software. The method is based on DCA (Difference of Convex functions), an innovative approach in the nonconvex optimization framework that has been used successfully on various complex industrial systems. The clustering is applied in an empirical example of multi-manager portfolio management, to identify the one that seems best to fit the portfolio management objectives of a fund of funds or of funds. Clustering helps to narrow the choice of asset classes and to facilitate optimization of the Markowitz frontier.

3 citations

Journal Article•10.7319/KOGSIS.2014.22.4.175•

Selection of Optimal Variables for Clustering of Seoul using Genetic Algorithm

Hyung Jin Kim, Jaehoon Jung, Jung-Bin Lee, Sangmin Kim, Joon Heo

31 Dec 2014

TL;DR: This study acquired a dataset of 718 attributes from Statistics Korea and analyzed it to select the variables that best differentiate Gangnam from other districts, using a genetic algorithm with Dunn's index and the K-means algorithm.

Abstract: The Korean government proposed a new initiative, 'Government 3.0', under which the administration will open its datasets to the public before they are requested. The City of Seoul is a front runner in the disclosure of government data. If we know which attributes are the governing factors for a given segmentation, the findings can be applied to real-world problems in marketing, business strategy and administrative decision making. For the City of Seoul, however, selecting optimal variables from an open dataset of up to several thousand attributes would require an enormous amount of computation time, because it may involve combinatorial optimization while maximizing dissimilarity measures between clusters. In this study, we acquired a dataset of 718 attributes from Statistics Korea and conducted an analysis to select the most suitable variables, which differentiate Gangnam from other districts, using a genetic algorithm and Dunn's index. We also used the Microsoft Azure cloud computing system to speed up processing. As a result, 28 optimal variables were selected, and the validation showed that these 28 variables effectively separate Gangnam from other districts when clustered with Ward's minimum variance method and the K-means algorithm.

Keywords: Clustering, Dunn's Index, Ward's Minimum Variance, K-means Algorithm, Genetic Algorithm
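Dunn's index, used here as the fitness measure, rewards compact, well-separated clusters. The minimal sketch below uses the common textbook definition: the smallest Euclidean distance between points of different clusters divided by the largest within-cluster diameter. The abstract does not state which variant the authors used or how the genetic algorithm wraps it, so treat this as an assumption; `dunn_index` is a hypothetical helper (it needs Python 3.8+ for `math.dist`).

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn's index for clusters given as lists of coordinate tuples.

    Separation: smallest Euclidean distance between points of different clusters.
    Diameter: largest Euclidean distance between points of the same cluster.
    Higher values indicate compact, well-separated clusters.
    """
    diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # All-singleton clusters have zero diameter; treat them as perfectly compact.
    return math.inf if diameter == 0 else separation / diameter
```

In a variable-selection loop, a genetic algorithm would presumably cluster the districts on each candidate subset of variables and use this index as that subset's fitness; that outer loop is not shown.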

Journal Article•10.1007/S00357-014-9157-8•

Minkowski Generalizations of Ward's Method in Hierarchical Clustering

Alan Lee¹, Bobby Willcox¹

University of Auckland¹

01 Jul 2014-Journal of Classification

TL;DR: This work was motivated by clustering software, such as the R function hclust, which accepts a distance matrix as input and applies Ward’s definition of inter-cluster distance to produce a clustering.

Abstract: In this paper, we consider several generalizations of the popular Ward's method for agglomerative hierarchical clustering. Our work was motivated by clustering software, such as the R function hclust, which accepts a distance matrix as input and applies Ward's definition of inter-cluster distance to produce a clustering. The standard version of Ward's method uses squared Euclidean distance to form the distance matrix. We explore the effect on the clustering of using other definitions of distance, such as the Minkowski distance.
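The Minkowski distance of order p generalizes the Euclidean distance (p = 2) and the Manhattan or L1 distance (p = 1). The short sketch below builds such a distance matrix, the kind of input that hclust-style software accepts; `minkowski` and `distance_matrix` are hypothetical names. How Ward's method should then treat these distances is the subject of the paper and is not modelled here.

```python
from itertools import combinations


def minkowski(p, q, order):
    """Minkowski distance of the given order (order >= 1) between two points."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


def distance_matrix(points, order):
    """Symmetric matrix of pairwise Minkowski distances with a zero diagonal."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = minkowski(points[i], points[j], order)
    return matrix
```

For the points (0, 0) and (3, 4), order 2 gives the Euclidean distance 5 and order 1 gives the Manhattan distance 7.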

Hierarchical speaker clustering methods for the NIST i-vector Challenge

Elie Khoury¹, Laurent El Shafey¹, Marc Ferras¹, Sébastien Marcel¹

Idiap Research Institute¹

1 Jan 2014

TL;DR: The experimental results show that the use of the automatically labeled i-vectors to train supervised methods such as LDA, PLDA or linear logistic regression-based fusion, decreases the minimum decision cost function by up to 22%.

Abstract: The process of manually labeling data is very expensive and sometimes infeasible due to privacy and security issues. This paper investigates the use of two algorithms for clustering unlabeled training i-vectors. This aims at improving speaker recognition performance by using state-of-the-art supervised techniques in the context of the NIST i-vector Machine Learning Challenge 2014. The first algorithm is the well-known Ward clustering, which aims at optimizing an objective function across all clusters. The second one is a cascade clustering, which benefits from the latest advances in speaker modeling and session compensation techniques, and relies on both the cosine similarity and probabilistic linear discriminant analysis (PLDA). Furthermore, this paper investigates multi-clustering fusion, which opens the door to further improvements. The experimental results show that using the automatically labeled i-vectors to train supervised methods such as LDA, PLDA or linear logistic regression-based fusion decreases the minimum decision cost function by up to 22%.