TL;DR: This paper argues that the generalisation of Ward’s linkage method to incorporate Manhattan distances is theoretically sound and provides an example of where this method outperforms the method using Euclidean distances.
Abstract: The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised for use with l1-norm (Manhattan) distances. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the characteristic property of Ward's algorithm, minimising intra-cluster variation and maximising inter-cluster variation, is not violated when using the Manhattan metric.
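The pipeline this abstract describes (bi-gram signatures, a Manhattan distance matrix, then Ward agglomeration) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, bi-grams are counted within words only, and the Ward step uses the standard Lance-Williams update on squared distances, which is an assumption of this sketch that the abstract does not specify.

```python
from collections import Counter
from itertools import combinations


def bigram_profile(text):
    """Relative frequencies of two-letter sequences, counted within words."""
    counts = Counter()
    for word in text.lower().split():
        word = "".join(c for c in word if c.isalpha())
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()}


def manhattan(p, q):
    """l1 distance between two sparse frequency profiles."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def ward_merges(dist):
    """Agglomerate with the Ward Lance-Williams update.

    `dist` is a full symmetric matrix (list of lists) of pairwise distances.
    Assumption of this sketch: the update is applied to squared distances,
    whatever metric produced them. Returns a list of merges
    (cluster_a, cluster_b, merge_height, new_cluster_size); new clusters
    receive ids n, n + 1, ...
    """
    n = len(dist)
    size = {i: 1 for i in range(n)}
    d2 = {(i, j): dist[i][j] ** 2 for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(size) > 1:
        # Pick the closest pair of active clusters.
        (i, j), dij = min(d2.items(), key=lambda kv: kv[1])
        ni, nj = size.pop(i), size.pop(j)
        for k, nk in size.items():
            dik = d2.pop((min(i, k), max(i, k)))
            djk = d2.pop((min(j, k), max(j, k)))
            # Lance-Williams coefficients for Ward's method.
            d2[(k, next_id)] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * dij
            ) / (ni + nj + nk)
        del d2[(i, j)]
        size[next_id] = ni + nj
        merges.append((i, j, dij ** 0.5, ni + nj))
        next_id += 1
    return merges


def cluster_texts(texts):
    """Build the Manhattan distance matrix for `texts` and run Ward on it."""
    profiles = [bigram_profile(t) for t in texts]
    dist = [[manhattan(p, q) for q in profiles] for p in profiles]
    return ward_merges(dist)
```

Swapping `manhattan` for a Euclidean distance in `cluster_texts` gives the comparison the abstract mentions. The merge logic itself only sees the distance matrix.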
TL;DR: This paper examines the requirements of passengers regarding their journeys and journey planners. A framework of aspects for multimodal journey planners is created, and the Ward method is used to divide passengers into 5 user groups within which the variance is minimized.
Abstract: The requirements of passengers regarding their journeys and journey planners were the topic of this paper. First, a framework of aspects for multimodal journey planners was created, and passengers were divided into 5 user groups. In order to obtain real answers, a survey was conducted. Considering the main aspects (route planning, booking and payment, handled data, complementary information, supplementary information), the established user groups showed no significant differences. The connection between single aspects was investigated using correlation analysis. The need to create new user groups therefore arose. The Ward method, a hierarchical clustering method, was used to create groups within which the variance is minimized. The clustering algorithm was implemented in the MatLab environment, working with the original survey answers. As a result, 5 new user groups with distinctive features were presented: an alternative journey planning group, a visualization on the map group, a dynamic data group, a group not interested in mobile payment, and a group not interested in WiFi. These groups showed significant differences regarding the main aspects. Using the new group allocation, passengers were categorized according to their real requirements into homogeneous user groups. In order to apply the results of the Ward user groups, an evaluation of multimodal journey planners was also performed, in which the requirements of the user groups were taken into account.
TL;DR: It is shown that clustering with Ward's method produces better or equivalent cross-validated MSMs for protein folding than other clustering algorithms.
Abstract: Markov state models (MSMs) are a powerful framework for analyzing protein dynamics. MSMs require the decomposition of conformation space into states via clustering, which can be cross-validated when a prediction method is available for the clustering method. We present an algorithm for predicting cluster assignments of new data points with Ward’s minimum variance method. We then show that clustering with Ward’s method produces better or equivalent cross-validated MSMs for protein folding than other clustering algorithms.
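One plausible way to predict the cluster of a new point under Ward's criterion is to assign it to the cluster whose within-cluster variance would grow least if the point were merged in. The sketch below illustrates that idea only. It is not confirmed to be the authors' algorithm, and the function names and the centroid/size representation of clusters are assumptions.

```python
def ward_cost(point, centroid, size):
    """Increase in within-cluster sum of squares when a single point joins
    a cluster of `size` members centred at `centroid`:
    size / (size + 1) * ||point - centroid||^2.
    """
    sq = sum((p - c) ** 2 for p, c in zip(point, centroid))
    return size / (size + 1) * sq


def predict(point, centroids, sizes):
    """Index of the cluster with the smallest Ward merge cost for `point`."""
    costs = [ward_cost(point, c, n) for c, n in zip(centroids, sizes)]
    return min(range(len(costs)), key=costs.__getitem__)
```

Because the cost is scaled by size / (size + 1), this rule can differ from nearest-centroid assignment: a point slightly closer to a large cluster's centroid may still be assigned to a very small cluster.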
TL;DR: The analysis suggests that a unified approach to solving population problems is far from effective and that a differentiated approach is needed to achieve greater results in the implementation of family policy.
Abstract: The predicted negative trends in Russian demography (falling birth rates, population decline) heighten the need to strengthen measures of family and population policy. Our research purpose is to identify groups of Russian regions with similar characteristics in the family sphere using cluster analysis. The findings should make an important contribution to the field of family policy. We used hierarchical cluster analysis based on the Ward method and the Euclidean distance for segmentation of Russian regions. Clustering is based on four variables, which allowed assessing the family institution in the region. The authors used the data of the Federal State Statistics Service from 2010 to 2015. Clustering and profiling of each segment has allowed forming a model of Russian regions depending on the features of the family institution in these regions. The authors revealed four clusters grouping regions with similar problems in the family sphere. This segmentation makes it possible to develop the most relevant family policy measures in each group of regions. Thus, the analysis has shown a high degree of differentiation of the family institution in the regions. This suggests that a unified approach to solving population problems is far from being effective. To achieve greater results in the implementation of family policy, a differentiated approach is needed. Methods of multidimensional data classification can be successfully applied as a relevant analytical toolkit. Further research could develop the adaptation of multidimensional classification methods to the analysis of the population problems in Russian regions. In particular, the algorithms of nonparametric cluster analysis may be of relevance in future studies.
TL;DR: A comprehensive evaluation model for regional clustering is established, based on an analysis of the regional differences in the natural environmental factors and user habit factors that influence the reliability of air-conditioning systems.
Abstract: With an aim to minimize the difference of use reliability of air-conditioning systems in different regions, a clustering method for classifying regions based on factors of the reliability of air-conditioning systems is proposed. This study establishes a comprehensive evaluation model of regional clustering on the basis of analysis and investigation about the regional differences of natural environmental factors and user habit factors that influence the reliability of air-conditioning systems. A judging criterion of refrigeration and heating of air conditioners is set to accurately quantify seven clustering indicators. The Ward method is used to calculate weight coefficients of different distance. The weighted Ward clustering algorithm is used for clustering regions from the perspectives of natural environmental factors and user habit factors, respectively. Two kinds of clustering results are integrated together to obtain the final results by secondary clustering analysis. Finally, the regional clustering ...
TL;DR: In this article, the authors perform a lower-dimensional visualization to determine students' images of the concept of number, using Ward clustering analysis combined with the self-organizing map (SOM).
Abstract: The purpose of the study is to perform a lower-dimensional visualization in order to determine the images that students hold of the concept of number. The Ward clustering analysis combined with the self-organizing map (SOM) was used for this purpose. The conceptual understanding tool, which consisted of the open-ended question “write the first ten things you remember when the term number is mentioned”, was administered to the study group of 212 fifth grade students. The analysis results showed that students mostly explained the concept of number by associating it with mathematics and other sciences, using terms like “addition, subtraction, division, multiplication, fraction, share, equation, cluster, angle, square, rectangle, plus, minus, shape, operation, step, graphics, value, equal, equals”, which are related to “operations-calculations” and “mathematical terms”. The students mostly established relations with the mathematical side of number. At the end of the study, the dataset obtained from the conceptual understanding tool was used for training the SOM, and a visualization approach that revealed the students' images of the number concept was recommended.
TL;DR: An air conditioner reliability influence factor-based regional clustering method is presented, in which a weighted Ward clustering algorithm is applied to working environment factors and user use habit factors and the two results are integrated by secondary clustering.
Abstract: The invention discloses an air conditioner reliability influence factor-based regional clustering method. The method includes the following steps: a system analyzes the regional differences of the working environment factors and user use habit factors that influence the reliability of air conditioners and extracts the key working environment reliability influence factors and the key user use habit reliability influence factors; an air conditioner reliability influence factor-based comprehensive evaluation model for regional clustering analysis is constructed; judgment criteria for air conditioner startup for refrigeration and heating are formulated, and the average consumption tendency indexes of the air conditioners are accurately quantified; a weighted Ward clustering algorithm is adopted to carry out clustering analysis on the working environment influence factors and on the user use habit influence factors, so that a working environment clustering analysis result and a user use habit clustering analysis result are obtained; and a secondary clustering method is adopted to integrate the two clustering analysis results, so that the final regional distribution can be obtained. With the air conditioner reliability influence factor-based regional clustering method of the invention, the difference in use reliability of air conditioners distributed in different areas can be minimized, and more scientific and more precise regional classification results can be obtained.
TL;DR: Estimates of a user's behavior obtained from sensor data using Ward's method, an unsupervised learning technique, are evaluated.
Abstract: The Home Energy Management System (HEMS) is a standard system for reducing power consumption in ordinary homes. The system prevents users from forgetting to turn off home appliances. However, the system is too simple to reduce power consumption further. Therefore, we aim to reduce power consumption by inferring a user's behavior in order to control home appliances. Estimating a user's behavior is, however, difficult for the system. We therefore extend the Multifunction Outlet System with a function to control home appliances. The function uses Ward's method, an unsupervised learning technique, to estimate a user's behavior. In this paper, we evaluate the estimates of a user's behavior obtained from sensor data by Ward's method.
TL;DR: A 3-D representation of high-dimensional data following ESOM projection, with group (cluster) structures visualized using the U-matrix, which employs a geographical map analogy: members of the same cluster are located in valleys, separated by mountain ranges marking cluster borders.