Top 162 papers presented at Soft Computing in 2004

Journal Article•10.1016/J.ASOC.2004.04.001•

Evaluation of Services Using a Fuzzy Analytic Hierarchy Process

Ludmil Mikhailov¹, Petco Tsvetinov²•Institutions (2)

University of Manchester¹, Queensland University of Technology²

1 Dec 2004

TL;DR: The proposed fuzzy prioritisation method uses fuzzy pairwise comparison judgements rather than exact numerical values of the comparison ratios and transforms the initial fuzzy prioritisation problem into a non-linear program. It derives crisp weights directly, which eliminates the need for additional aggregation and ranking procedures.

Abstract: This paper proposes a new approach for tackling the uncertainty and imprecision of the service evaluation process. Identifying suitable service offers, evaluating the offers and choosing the best alternatives are activities that set the scene for the subsequent stages in negotiations and influence in a unique manner the following deliberations. The pre-negotiation problem in negotiations over services is regarded as decision-making under uncertainty, based on multiple criteria of quantitative and qualitative nature, where the decision-maker’s imprecise judgements are represented as fuzzy numbers. A new fuzzy modification of the analytic hierarchy process is applied as an evaluation technique. The proposed fuzzy prioritisation method uses fuzzy pairwise comparison judgements rather than exact numerical values of the comparison ratios and transforms the initial fuzzy prioritisation problem into a non-linear program. Unlike the known fuzzy prioritisation techniques, the proposed method derives crisp weights from consistent and inconsistent fuzzy comparison matrices, which eliminates the need for additional aggregation and ranking procedures. A detailed numerical example illustrating the application of our approach to service evaluation is given.

440 citations

Journal Article•10.1016/J.INS.2003.03.013•

A hybrid decision tree/genetic algorithm method for data mining

Deborah Ribeiro Carvalho, Alex A. Freitas¹•Institutions (1)

University of Kent¹

14 Jun 2004

TL;DR: This paper addresses the well-known classification task of data mining, where the objective is to predict the class which an example belongs to, and proposes a hybrid decision tree/genetic algorithm method specifically designed for discovering rules covering examples belonging to small disjuncts.

Abstract: This paper addresses the well-known classification task of data mining, where the objective is to predict the class which an example belongs to. Discovered knowledge is expressed in the form of high-level, easy-to-interpret classification rules. In order to discover classification rules, we propose a hybrid decision tree/genetic algorithm method. The central idea of this hybrid method involves the concept of small disjuncts in data mining, as follows. In essence, a set of classification rules can be regarded as a logical disjunction of rules, so that each rule can be regarded as a disjunct. A small disjunct is a rule covering a small number of examples. Due to their nature, small disjuncts are error-prone. However, although each small disjunct covers just a few examples, the set of all small disjuncts can cover a large number of examples, so that it is important to develop new approaches to cope with the problem of small disjuncts. In our hybrid approach, we have developed two genetic algorithms (GA) specifically designed for discovering rules covering examples belonging to small disjuncts, whereas a conventional decision tree algorithm is used to produce rules covering examples belonging to large disjuncts. We present results evaluating the performance of the hybrid method on 22 real-world data sets.

174 citations

Journal Article•10.1007/S00500-004-0385-4•

Using Factor Oracles for Machine Improvisation

Gérard Assayag¹, Shlomo Dubnov¹•Institutions (1)

IRCAM¹

1 Sep 2004

TL;DR: The factor oracle, a data structure proposed by Crochemore et al. for string matching, is presented. Its relation to variable Markov models is shown, along with how it can be adapted for learning musical sequences and generating improvisations in a real-time context.

Abstract: We describe variable Markov models we have used for statistical learning of musical sequences, then we present the factor oracle, a data structure proposed by Crochemore et al. for string matching. We show the relation between this structure and the previous models and indicate how it can be adapted for learning musical sequences and generating improvisations in a real-time context.
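
To make the data structure concrete, here is a minimal Python sketch of the standard online factor oracle construction, followed by a toy random walk over it. The abstract does not describe the authors' generation procedure, so `improvise`, its `continuity` parameter and the fixed seed are hypothetical illustrations, not the paper's method.

```python
import random


def build_factor_oracle(seq):
    """Online construction: one state per prefix, plus suffix ("supply") links."""
    n = len(seq)
    trans = [dict() for _ in range(n + 1)]  # trans[state][symbol] -> next state
    supply = [-1] * (n + 1)                 # supply[0] = -1 marks "no link"
    for i, sym in enumerate(seq):
        new = i + 1
        trans[i][sym] = new                 # forward transition along the sequence
        k = supply[i]
        while k > -1 and sym not in trans[k]:
            trans[k][sym] = new             # extra transition shortcutting to the new state
            k = supply[k]
        supply[new] = 0 if k == -1 else trans[k][sym]
    return trans, supply


def accepts(trans, word):
    """Every state is accepting; the oracle accepts at least all factors of seq."""
    state = 0
    for sym in word:
        if sym not in trans[state]:
            return False
        state = trans[state][sym]
    return True


def improvise(seq, supply, length, continuity=0.8, seed=0):
    """Hypothetical generator: mostly replay seq, sometimes jump back via a suffix link."""
    if not seq:
        return []
    rng = random.Random(seed)
    state, out = 0, []
    while len(out) < length:
        # Jump when the end is reached, or at random with probability 1 - continuity.
        if state == len(seq) or (state > 0 and rng.random() > continuity):
            state = supply[state]
        out.append(seq[state])              # symbol on the transition state -> state + 1
        state += 1
    return out
```

Calling `build_factor_oracle("abbbaab")` gives an automaton with eight states; feeding its `supply` list to `improvise` recombines fragments of the original sequence at points that share a common context.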

155 citations

Journal Article•10.1016/J.INS.2003.03.018•

A framework for multi-source data fusion

Ronald R. Yager¹•Institutions (1)

Iona College¹

14 Jun 2004

TL;DR: The paper provides a general framework for data fusion based on a voting-like process that tries to adjudicate conflict among the data, and presents a concept of reasonableness as a means of including in the fusion process any information available other than that provided by the sources.

Abstract: A general view of the multi-source data fusion process is presented. Some of the considerations and information that must go into the development of a multi-source data fusion algorithm are described. Features that play a role in expressing user requirements are also described. We provide a general framework for data fusion based on a voting-like process that tries to adjudicate conflict among the data. We discuss the idea of a compatibility relationship and introduce several important examples of these relationships. We show that our formulation results in some conditions on the fused value implying that the fusion process has the nature of a mean-type aggregation. Situations in which the sources have different credibility weights are considered. We present a concept of reasonableness as a means of including in the fusion process any information available other than that provided by the sources. We consider the situation where we allow our fused values to be granular objects such as linguistic terms or subsets.

103 citations

Journal Article•10.1016/J.INS.2003.03.019•

Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning

William H. Hsu¹•Institutions (1)

National Center for Supercomputing Applications¹

14 Jun 2004

TL;DR: A framework for selection and reordering of input variables to reduce generalization error in classification and probabilistic inference is presented and a generic fitness function for validation of input specification is designed and used to develop two genetic algorithm wrappers.

Abstract: In this paper, we address the automated tuning of input specification for supervised inductive learning and develop combinatorial optimization solutions for two such tuning problems. First, we present a framework for selection and reordering of input variables to reduce generalization error in classification and probabilistic inference. One purpose of selection is to control overfitting using validation set accuracy as a criterion for relevance. Similarly, some inductive learning algorithms, such as greedy algorithms for learning probabilistic networks, are sensitive to the evaluation order of variables. We design a generic fitness function for validation of input specification, then use it to develop two genetic algorithm wrappers: one for the variable selection problem for decision tree inducers and one for the variable ordering problem for Bayesian network structure learning. We evaluate the wrappers, using real-world data for the selection wrapper and synthetic data for both, and discuss their limitations and generalizability to other inducers.

103 citations

Journal Article•10.1016/J.ASOC.2004.03.009•

A fuzzy AprioriTid mining algorithm with reduced computational time

Tzung-Pei Hong¹, Chan-Sheng Kuo², Shyue-Liang Wang³•Institutions (3)

National University of Kaohsiung¹, National Chengchi University², New York Institute of Technology³

1 Dec 2004

TL;DR: A new fuzzy mining algorithm based on the AprioriTid approach to find fuzzy association rules from given quantitative transactions by focusing on the most important linguistic terms for reduced time complexity is proposed.

Abstract: Due to the increasing use of very large databases and data warehouses, mining useful information and helpful knowledge from transactions is evolving into an important research area. Most conventional data mining algorithms identify the relation among transactions with binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. In the past, we proposed a fuzzy mining algorithm based on the Apriori approach to explore interesting knowledge from transactions with quantitative values. This paper proposes another new fuzzy mining algorithm based on the AprioriTid approach to find fuzzy association rules from given quantitative transactions. Each item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as that of the original items. The algorithm therefore focuses on the most important linguistic terms for reduced time complexity. Experimental results from the data in a supermarket of a department store show the feasibility of the proposed mining algorithm.
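
The step of keeping only the linguistic term with the maximum cardinality for each item can be sketched as follows. The three membership functions, their breakpoints and the sample transactions are hypothetical values chosen for illustration; the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical membership functions over purchased quantity (assumed breakpoints 1, 6, 11).
TERMS = {
    "Low": lambda q: max(0.0, min(1.0, (6 - q) / 5)),
    "Middle": lambda q: max(0.0, 1 - abs(q - 6) / 5),
    "High": lambda q: max(0.0, min(1.0, (q - 6) / 5)),
}


def fuzzy_counts(transactions):
    """Sum the membership of every (item, term) fuzzy region over all transactions."""
    counts = defaultdict(float)
    for transaction in transactions:
        for item, quantity in transaction.items():
            for term, membership in TERMS.items():
                counts[(item, term)] += membership(quantity)
    return counts


def max_cardinality_terms(transactions):
    """Keep, for each item, only the term with the largest fuzzy count."""
    best = {}
    for (item, term), count in fuzzy_counts(transactions).items():
        if item not in best or count > best[item][1]:
            best[item] = (term, count)
    return best


def frequent_regions(transactions, min_support):
    """Fuzzy support = count / number of transactions, compared with a threshold."""
    n = len(transactions)
    return {
        (item, term): count / n
        for item, (term, count) in max_cardinality_terms(transactions).items()
        if count / n >= min_support
    }
```

Because each item contributes a single region, later candidate generation works on as many regions as there are original items, which is the source of the reduced computation the abstract describes.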

88 citations

Journal Article•10.1007/S00500-004-0388-1•

Syntactical and semantical aspects of Faust

Yann Orlarey, Dominique Fober, Stéphane Letz

1 Sep 2004

TL;DR: This paper presents some syntactical and semantical aspects of FAUST (Functional AUdio STreams), a programming language for real-time sound processing and synthesis, based on a block-diagram algebra.

Abstract: This paper presents some syntactical and semantical aspects of FAUST (Functional AUdio STreams), a programming language for real-time sound processing and synthesis. The programming model of FAUST combines two approaches: functional programming and block-diagram composition. It is based on a block-diagram algebra. It has a well-defined formal semantics and can be compiled into efficient C/C++ code.

87 citations

Journal Article•10.1007/S00500-003-0310-2•

Genetic algorithms for outlier detection and variable selection in linear regression models

Jussi Tolvi¹•Institutions (1)

University of Turku¹

1 Aug 2004

TL;DR: A genetic algorithm is presented which considers different possible groupings of the data into outlier and non-outlier observations, and a genetic algorithm for simultaneous outlier detection and variable selection is suggested.

Abstract: This article addresses some problems in outlier detection and variable selection in linear regression models. First, in outlier detection there are problems known as smearing and masking. Smearing means that one outlier makes another, non-outlier observation appear as an outlier, and masking means that one outlier prevents another one from being detected. Detecting outliers one by one may therefore give misleading results. In this article a genetic algorithm is presented which considers different possible groupings of the data into outlier and non-outlier observations. In this way all outliers are detected at the same time. Second, it is known that outlier detection and variable selection can influence each other, and that different results may be obtained depending on the order in which these two tasks are performed. It may therefore be useful to consider these tasks simultaneously, and a genetic algorithm for simultaneous outlier detection and variable selection is suggested. Two real data sets are used to illustrate the algorithms, which are shown to work well. In addition, the scalability of the algorithms is considered with an experiment using generated data.

80 citations

Journal Article•10.1016/J.ASOC.2004.03.013•

Hybrid evolutionary algorithm-based real-world flexible job shop scheduling problem: application service provider approach

Ivan Tanev¹, Takashi Uozumi², Yoshiharu Morotome•Institutions (2)

Doshisha University¹, Muroran Institute of Technology²

1 Dec 2004

TL;DR: The results obtained for evolving a schedule of 400 customers’ orders on an experimental model of FPIM indicate that business delays on the order of half an hour can be achieved.

Abstract: This paper presents an approach for scheduling customers’ orders in factories of plastic injection machines (FPIM) as a case of a real-world flexible job shop scheduling problem. The objective of the discussed work is to provide FPIM with high business speed, which implies (a) providing customers with a convenient way for remote online access to the factory’s database and (b) developing an efficient scheduling routine for planning the assignment of the submitted customers’ orders to FPIM machines. Remote online access to the FPIM database, achieved by delivering the software as a Web service in accordance with the application service provider (ASP) paradigm, is proposed. To address the issue of an efficient scheduling routine, a hybrid evolutionary algorithm (HEA) combining priority-dispatching rules (PDRs) with GA is developed. An implementation of HEA as a database stored procedure is discussed. Performance evaluation results are presented. The results obtained for evolving a schedule of 400 customers’ orders on an experimental model of FPIM indicate that business delays on the order of half an hour can be achieved.

78 citations

Journal Article•10.1016/J.INS.2003.03.014•

Soft data mining, computational theory of perceptions, and rough-fuzzy approach

Sankar K. Pal¹•Institutions (1)

Indian Statistical Institute¹

14 Jun 2004

TL;DR: Merits of fuzzy granular computation, in terms of performance and computation time, for the task of case generation in large scale case-based reasoning systems are illustrated through an example.

Abstract: Data mining and knowledge discovery is described from a pattern recognition point of view along with the relevance of soft computing. Key features of the computational theory of perceptions and its significance in pattern recognition and knowledge discovery problems are explained. The role of fuzzy-granulation (f-granulation) in machine and human intelligence, and its modeling through rough-fuzzy integration, are discussed. Merits of fuzzy granular computation, in terms of performance and computation time, for the task of case generation in large-scale case-based reasoning systems are illustrated through an example.

60 citations

Journal Article•10.1016/J.INS.2003.03.015•

CLIP4: hybrid inductive machine learning algorithm that generates inequality rules

Krzysztof J. Cios¹, Lukasz Kurgan•Institutions (1)

University of Colorado Boulder¹

14 Jun 2004

TL;DR: The paper describes a hybrid inductive machine learning algorithm called CLIP4 that first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes. Its unique feature is the generation of rules that involve inequalities.

Abstract: The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes. The unique feature of the algorithm is the generation of rules that involve inequalities. The algorithm works with data that have a large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmarking data sets, and the results are compared with other machine learning algorithms. The benchmarking results in each instance show CLIP4's accuracy, CPU time, and rule complexity. CLIP4 has built-in features like tree pruning, methods for partitioning the data (for data with a large number of examples and attributes, and for data containing noise), a data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and discretization schemes. CLIP4 generates a model of the data that consists of well-generalized rules, and ranks attributes and selectors that can be used for feature selection.

Journal Article•10.1007/S00500-003-0303-1•

Improved generalized neuron model for short-term load forecasting

Devendra K. Chaturvedi, Man Mohan, Ravindra Kumar Singh, Prem Kalra¹•Institutions (1)

Indian Institutes of Technology¹

1 Apr 2004

TL;DR: A new improved generalized neuron model is proposed, which has both summation and product as aggregation functions and which has flexibility at both the aggregation and activation function level to cope with the non-linearity involved in the type of applications dealt with.

Abstract: Conventional neural networks consisting of simple neuron models have various drawbacks, such as long training times for complex problems, large data requirements for training on non-linear complex problems, unknown ANN structure, the relatively large number of hidden nodes required, and the problem of local minima. To make the Artificial Neural Network more efficient and to overcome the above-mentioned problems, a new improved generalized neuron model is proposed in this work. The proposed neuron models have both summation (Σ) and product (Π) as aggregation functions. The generalized neuron models have flexibility at both the aggregation and activation function level to cope with the non-linearity involved in the type of applications dealt with. The training and testing performance of these models has been compared on a short-term load forecasting problem.

Book Chapter•10.1007/978-3-540-39879-0_12•

Restricted Bayesian Network Structure Learning

Peter J. F. Lucas¹•Institutions (1)

Radboud University Nijmegen¹

1 Jan 2004

TL;DR: This paper proposes a restricted, polynomial time structure learning algorithm that is not as restrictive as both other approaches, and allows researchers to determine the right balance between classification performance and quality of the underlying probability distribution.

Abstract: Learning the structure of a Bayesian network from data is a difficult problem, as its associated search space is superexponentially large. As a consequence, researchers have studied learning Bayesian networks with a fixed structure, notably naive Bayesian networks and tree-augmented Bayesian networks, which involves no search at all. There is substantial evidence in the literature that the performance of such restricted networks can be surprisingly good. In this paper, we propose a restricted, polynomial time structure learning algorithm that is not as restrictive as both other approaches, and allows researchers to determine the right balance between classification performance and quality of the underlying probability distribution. The results obtained with this algorithm allow drawing some conclusions with regard to Bayesian-network structure learning in general.

Journal Article•10.1007/S00500-004-0387-2•

Periodic musical sequences and Lyndon words

Marc Chemillier¹•Institutions (1)

IRCAM¹

1 Sep 2004

TL;DR: When one enumerates periodic musical structures, the computation is done up to a cyclic shift, which means that two solutions which are cyclic shifts of one another are considered the same.

Abstract: When one enumerates periodic musical structures, the computation is done up to a cyclic shift. This means that two solutions which are cyclic shifts of one another are considered the same. Lyndon words provide a powerful way to do so. We illustrate this by two examples taken from African traditional music.
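
As an illustration of enumerating sequences up to cyclic shift, the sketch below picks a canonical rotation for a pattern and generates Lyndon words with Duval's algorithm. Encoding a rhythm as a tuple of integers (for example 1 for an onset and 0 for a rest) is an assumption made here for the example, not a convention taken from the paper.

```python
def canonical_rotation(word):
    """Smallest rotation of word: one representative per class of cyclic shifts."""
    word = tuple(word)
    return min(word[i:] + word[:i] for i in range(len(word)))


def is_lyndon(word):
    """A Lyndon word is strictly smaller than each of its proper rotations."""
    word = tuple(word)
    return len(word) > 0 and all(
        word < word[i:] + word[:i] for i in range(1, len(word))
    )


def lyndon_words(k, n):
    """Duval's algorithm: Lyndon words of length <= n over {0..k-1}, in lexicographic order."""
    w = [-1]
    while w:
        w[-1] += 1                      # increment the last symbol
        yield tuple(w)
        m = len(w)
        while len(w) < n:               # extend periodically up to length n
            w.append(w[len(w) - m])
        while w and w[-1] == k - 1:     # drop trailing maximal symbols
            w.pop()
```

Filtering the generator by length, as in `[w for w in lyndon_words(2, 4) if len(w) == 4]`, lists each aperiodic binary pattern of period 4 exactly once, regardless of where the cycle is started.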

Journal Article•10.1007/S00500-003-0315-X•

Domination of ordered weighted averaging operators over t-norms

Radko Mesiar¹, Susanne Saminger²•Institutions (2)

Slovak University of Technology in Bratislava¹, Johannes Kepler University of Linz²

1 Aug 2004

TL;DR: The aim of this contribution is to investigate the domination of OWA operators over t-norms, with the main emphasis on domination over the Łukasiewicz t-norm.

Abstract: The fusion of transitive fuzzy relations preserving the transitivity is linked to the domination of the involved aggregation operator. The aim of this contribution is to investigate the domination of OWA operators over t-norms, with the main emphasis on domination over the Łukasiewicz t-norm. The domination of OWA operators and related operators over continuous Archimedean t-norms will also be discussed.
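
For readers unfamiliar with the notion, an aggregation operator A is usually said to dominate a t-norm T when A(T(x1,y1),…,T(xn,yn)) ≥ T(A(x1,…,xn), A(y1,…,yn)) for all inputs in [0,1]. The sketch below checks that inequality numerically for an OWA operator and the Łukasiewicz t-norm. The random search, its tolerance and the function names are assumptions for illustration; finding no counterexample is evidence only, not a proof.

```python
import random


def owa(weights, values):
    """Ordered weighted average: weights apply to the values sorted in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def t_lukasiewicz(x, y):
    """Lukasiewicz t-norm on [0, 1]."""
    return max(0.0, x + y - 1.0)


def domination_gap(weights, xs, ys, t_norm=t_lukasiewicz):
    """Left side minus right side of the domination inequality; negative means violated."""
    lhs = owa(weights, [t_norm(x, y) for x, y in zip(xs, ys)])
    rhs = t_norm(owa(weights, xs), owa(weights, ys))
    return lhs - rhs


def find_counterexample(weights, trials=10000, seed=0, tol=1e-12):
    """Random search for inputs violating domination; returns None if none is found."""
    rng = random.Random(seed)
    n = len(weights)
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        if domination_gap(weights, xs, ys) < -tol:
            return xs, ys
    return None
```

With weights `[0.5, 0.5]` (the arithmetic mean) or `[0.0, 1.0]` (the minimum) the search finds no violation, whereas the maximum, weights `[1.0, 0.0]`, fails at x = (1, 0), y = (0, 1), where the gap is -1.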

Proceedings Article•

Learning by Exchanging Advice

Eugénio Oliveira, Luís Nunes

1 Jan 2004

Journal Article•10.1007/S00500-003-0312-0•

On the strict logic foundation of fuzzy reasoning

Daowu Pei¹•Institutions (1)

Xi'an Jiaotong University¹

1 Aug 2004

TL;DR: The triple I methods of FMP and FMT for fuzzy reasoning and their consistency are formalized, thus fuzzy reasoning is put completely and rigorously into the logic framework of fuzzy logic.

Abstract: This paper focuses on the logic foundation of fuzzy reasoning. First, a new complete first-order fuzzy predicate calculus system K* corresponding to the formal system L* is built. Based on the many-sort system Kms* corresponding to K*, the triple I methods of FMP and FMT for fuzzy reasoning and their consistency are formalized, thus fuzzy reasoning is put completely and rigorously into the logic framework of fuzzy logic.

Journal Issue•10.1002/INT.V19:11•

Fuzzy sets approaches to statistical parametric and nonparametric tests: Research Articles

Cengiz Kahraman¹, Cafer Erhan Bozdag¹, Da Ruan, Ahmet Fahri Özok¹•Institutions (1)

Istanbul Technical University¹

1 Nov 2004

TL;DR: Some industrial applications of fuzzy parametric tests are reviewed and some new algorithms for fuzzy nonparametric tests, namely a fuzzy sign test and a fuzzy Wilcoxon signed-ranks test are presented.

Abstract: The parametric tests often require that the population distributions be normal or approximately so. Statistical methods that do not require the knowledge of the population distribution or its parameters are called nonparametric tests. In this article, first we review some industrial applications of fuzzy parametric tests. Then we present some new algorithms for fuzzy nonparametric tests, namely a fuzzy sign test and a fuzzy Wilcoxon signed-ranks test. Later, we further give fuzzy parametric tests, fuzzy nonparametric tests, and their numerical applications, and also provide a comparison study on crisp and fuzzy nonparametric tests. When the data are vague, the result of the fuzzy nonparametric tests may be different from that of the crisp nonparametric tests. © 2004 Wiley Periodicals, Inc. Int J Int Syst 19: 1069–1087, 2004.

Journal Article•10.1016/J.INS.2003.03.016•

GeneScout: a data mining system for predicting vertebrate genes in genomic DNA sequences

Michael M. Yin¹, Jason T. L. Wang¹•Institutions (1)

University Heights, Newark¹

14 Jun 2004

TL;DR: The paper presents a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. It contains specially designed hidden Markov models (HMMs) for detecting functional sites including protein translation start sites, mRNA splicing junction donor and acceptor sites, etc.

Abstract: Automated detection or prediction of coding sequences from within genomic DNA has been a major rate-limiting step in the pursuit of vertebrate genes. Programs currently available are far from being powerful enough to elucidate a gene structure completely. In this paper, we present a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. The system contains specially designed hidden Markov models (HMMs) for detecting functional sites including protein translation start sites, mRNA splicing junction donor and acceptor sites, etc. An HMM is also proposed for exon coding potential computation. Our main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to that of analyzing the paths in the graph G. A dynamic programming algorithm is used to find the optimal path in G. The proposed system is trained using an expectation-maximization algorithm and its performance on vertebrate gene prediction is evaluated using the 10-way cross-validation method. Experimental results show that the proposed system performs well and is comparable to existing gene discovery tools.
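
The reduction to a best path in a directed acyclic graph can be illustrated with a generic dynamic program over a topological order. This is a sketch of the general technique only: the node names and edge scores below are hypothetical, and the abstract does not say how GeneScout scores its edges.

```python
from collections import defaultdict, deque


def best_path(edges, source, target):
    """Highest-scoring source-to-target path in a DAG given as (u, v, score) triples.

    Returns (score, path), or None when target cannot be reached from source.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = {source, target}
    for u, v, score in edges:
        succ[u].append((v, score))
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm produces a topological order and detects cycles.
    queue = deque(node for node in nodes if indeg[node] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order, remembering the best predecessor.
    best = {source: 0.0}
    prev = {}
    for u in order:
        if u not in best:
            continue                    # u is not reachable from source
        for v, score in succ[u]:
            candidate = best[u] + score
            if v not in best or candidate > best[v]:
                best[v] = candidate
                prev[v] = u
    if target not in best:
        return None

    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]
```

Each edge is examined once, so the running time is linear in the size of the graph. In a gene-finding setting the nodes would plausibly be candidate functional sites and the scores something like log-probabilities, but that mapping is an assumption here.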

Journal Article•10.1007/S00500-003-0317-8•

Some issues of designing genetic algorithms for traveling salesman problems

Huai-Kuang Tsai¹, Jinn-Moon Yang², Y.-F. Tsai³, Cheng-Yan Kao¹•Institutions (3)

National Taiwan University¹, National Chiao Tung University², College of Business Administration³

1 Nov 2004

TL;DR: It is demonstrated that a robust genetic algorithm for the traveling salesman problem (TSP) should preserve and add good edges efficiently, and at the same time, maintain the population diversity well.

Abstract: This paper demonstrates that a robust genetic algorithm for the traveling salesman problem (TSP) should preserve and add good edges efficiently and, at the same time, maintain population diversity well. We analyzed the strengths and limitations of several well-known genetic operators for TSPs through experiments. To evaluate these factors, we propose a new genetic algorithm integrating two genetic operators and a heterogeneous pairing selection. The former can preserve and add good edges efficiently and the latter is able to maintain population diversity. The proposed approach was evaluated on 15 well-known TSPs whose numbers of cities range from 101 to 13509. Experimental results indicated that our approach, although somewhat slower, performs very robustly and is very competitive with the other approaches we surveyed. We believe that a genetic algorithm can be a stable approach for TSPs if its operators can preserve and add edges efficiently and it maintains population diversity.

Journal Article•10.1007/S00500-003-0290-2•

Digital filter design using multiple Pareto fronts

T. Schnier¹, Xin Yao¹, P. Liu•Institutions (1)

University of Birmingham¹

1 Apr 2004

TL;DR: In this article, clustering and Pareto optimisation are combined into a single evolutionary design algorithm to prevent the system from converging prematurely to a local minimum and to encourage a number of different designs that fulfil the design criteria.

Abstract: Evolutionary approaches have been used in a large variety of design domains, from aircraft engineering to the designs of analog filters. Many of these approaches use measures to improve the variety of solutions in the population. One such measure is clustering. In this paper, clustering and Pareto optimisation are combined into a single evolutionary design algorithm. The population is split into a number of clusters, and parent and offspring selection, as well as fitness calculation, are performed on a per-cluster basis. The objective of this is to prevent the system from converging prematurely to a local minimum and to encourage a number of different designs that fulfil the design criteria. Our approach is demonstrated in the domain of digital filter design. Using a polar coordinate based pole-zero representation, two different lowpass filter design problems are explored. The results are compared to designs created by a human expert. They demonstrate that the evolutionary process is able to create designs that are competitive with those created using a conventional design process by a human expert. They also demonstrate that each evolutionary run can produce a number of different designs with similar fitness values, but very different characteristics.
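
The idea of performing selection on a per-cluster basis can be sketched with a plain non-dominated filter applied within each cluster. The objective vectors are assumed to be minimised, and the `cluster_of` function is a hypothetical stand-in for whatever clustering the algorithm actually uses.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Points not dominated by any other point in the same collection."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def per_cluster_fronts(population, cluster_of):
    """Group individuals by cluster label, then compute one Pareto front per cluster."""
    clusters = {}
    for individual in population:
        clusters.setdefault(cluster_of(individual), []).append(individual)
    return {label: pareto_front(members) for label, members in clusters.items()}
```

With the population `[(1, 5), (2, 2), (3, 3), (5, 1)]`, the point `(3, 3)` is dominated globally by `(2, 2)` but survives on the front of its own cluster when clusters are split at a first objective of 3. This is how separate fronts can keep structurally different designs alive.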

Journal Article•10.1016/J.INS.2003.03.012•

Guest editorial: soft computing data mining

Sankar K. Pal¹, Ashish Ghosh¹•Institutions (1)

Indian Statistical Institute¹

14 Jun 2004

TL;DR: The objective of this issue is to assemble a set of high-quality original contributions that reflect the advances and the state-of-the-art in the area of Data Mining and Knowledge Discovery with Soft Computing Methodologies, thereby presenting a consolidated view to the interested researchers in the aforesaid fields and readers of the journal Information Sciences.

Abstract: Soft computing is a consortium of methodologies (such as fuzzy logic, neural networks, genetic algorithms, and rough sets) that work synergistically and provide, in one form or another, flexible information processing capabilities for handling real-life problems. Its aim is to exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth in order to achieve tractability, robustness, low solution cost, and close resemblance to human-like decision-making. The process of knowledge discovery from databases (KDD), on the other hand, is a real-life problem-solving paradigm and is defined as the non-trivial process of identifying valid, novel, potentially useful and understandable patterns from large databases, where the data is frequently ambiguous, incomplete, noisy, redundant and changes with time. Data mining is one of the fundamental steps in the KDD process and is concerned with the algorithmic means by which patterns or structures are enumerated from the data under acceptable computational efficiency. Soft computing tools, individually or in an integrated manner, are turning out to be strong candidates for performing data mining tasks efficiently. At present, the results of these investigations, integrating soft computing and data mining in both theory and applications, are available in different journals and conference proceedings, mainly in the fields of computer science, information technology, engineering and mathematics. The objective of this issue is to assemble a set of high-quality original contributions that reflect the advances and the state of the art in the area of Data Mining and Knowledge Discovery with Soft Computing Methodologies, thereby presenting a consolidated view to interested researchers in the aforesaid fields in general, and to readers of the journal Information Sciences in particular. The issue has ten articles. The first one is a title article. While the next four articles deal with classificatory rule analysis, the sixth is concerned with association rule mining. The utility of the Self Organizing Map (SOM) in the context of text mining and clustering is discussed in the seventh and eighth articles. The next contribution describes a novel multi-source data fusion strategy. The last article demonstrates an application of data mining in biological data analysis.

Journal Article•10.1007/S00500-003-0271-5•

On the optimal choice of quality metric in image compression: a soft computing approach

Olga Kosheleva¹•Institutions (1)

University of Texas at El Paso¹

1 Feb 2004

TL;DR: It is shown that under certain reasonable symmetry conditions, $L^p$ metrics $d(I,\tilde{I})=\int |I(x)-\tilde{I}(x)|^p\,dx$ are the best, and that the optimal value of $p$ can be selected depending on the expected relative size $r$ of the informative part of the image.

Abstract: Images take a lot of computer space; in many practical situations, we cannot store all original images, so we have to use compression. Moreover, in many such situations, the compression ratio provided by even the best lossless compression is not sufficient, so we have to use lossy compression. In lossy compression, the reconstructed image $\tilde{I}$ is, in general, different from the original image $I$. There exist many different lossy compression methods, and most of these methods have several tunable parameters. In different situations, different methods lead to different reconstruction quality, so it is important to select, in each situation, the best compression method. A natural idea is to select the compression method for which the average value of some metric $d(I,\tilde{I})$ is the smallest possible. The question is then: which quality metric should we choose? In this paper, we show that under certain reasonable symmetry conditions, $L^p$ metrics $d(I,\tilde{I})=\int |I(x)-\tilde{I}(x)|^p\,dx$ are the best, and that the optimal value of $p$ can be selected depending on the expected relative size $r$ of the informative part of the image.
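
To make the role of p concrete, here is a minimal sketch of a discrete analogue of the metric above, with the integral replaced by a sum over pixels. The function name `lp_distance` and the four-pixel "images" are illustrative assumptions, not taken from the paper, and the paper's rule for choosing p from r is not reproduced here.

```python
def lp_distance(original, reconstructed, p):
    """Discrete analogue of d(I, I~) = integral of |I(x) - I~(x)|^p dx.

    Both images are flat lists of pixel intensities of equal length;
    the integral is approximated by a plain sum over pixels.
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(a - b) ** p for a, b in zip(original, reconstructed))


# Hypothetical 4-pixel images, used only for illustration.
original = [10, 10, 10, 10]
spread = [11, 11, 11, 11]  # an error of 1 on every pixel
spike = [10, 10, 10, 14]   # an error of 4 on a single pixel

for p in (1, 2):
    print(p, lp_distance(original, spread, p), lp_distance(original, spike, p))
```

With p = 1 the two reconstructions score the same (4 and 4), while with p = 2 the concentrated error is penalised more heavily (4 versus 16). This is one way to see why the choice of p matters when the informative part of an image is small.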

Journal Article•10.3233/KES-2004-8405•

Apriori algorithm and game-of-life for predictive analysis in materials science

Aparna S. Varde¹, Makiko Takahashi¹, Elke A. Rundensteiner¹, Matthew O. Ward¹, Mohammed Maniruzzaman¹, Richard D. Sisson¹•Institutions (1)

Worcester Polytechnic Institute¹

1 Dec 2004

TL;DR: To the best of the authors' knowledge, this is the first tool performing Web-based predictive analysis in Materials Science; it uses the Apriori Algorithm to derive association rules that represent relationships between input conditions and results of domain experiments.

Abstract: Experimental data in many domains serves as a basis for predicting useful trends. If the data and analysis are available over the Web this promotes E-Business by connecting clientele worldwide. This paper describes such a predictive tool "QuenchMiner™" in the domain "Materials Science". Data mining, more specifically the "Apriori Algorithm", is used to derive association rules that represent relationships between input conditions and results of domain experiments. This enables the tool to answer questions such as "Given cooling medium and agitation during material heat treatment, predict cooling rate". This allows users to perform case studies on the Web and use their results to optimize the involved processes, thus increasing customer satisfaction. Another interesting aspect is predicting material microstructure during heat treatment. Microstructure controls material properties such as hardness. Hence its prediction helps in making decisions about materials selection. Microstructure prediction has similarities to an artificial intelligence process called "Game-of-Life". Some challenges in our work are incorporating domain expert judgement while mining association rules, simulating microstructure evolution under different conditions, and dealing with uncertainty. These challenges and associated research issues are outlined here. To the best of our knowledge, this is the first tool performing Web-based predictive analysis in Materials Science.
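
As a rough illustration of how association rules of the kind described above can be mined, the following is a minimal sketch of the textbook Apriori procedure (level-wise candidate generation, subset pruning, then rule extraction by confidence). The experiment records, item names and thresholds are invented for illustration; this is not QuenchMiner's code or data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singles = {frozenset([item]) for t in transactions for item in t}
    frequent = {c: n for c, n in count(singles).items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs meeting min_confidence."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = n / frequent[lhs]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical heat-treatment experiments (assumed data, for illustration only).
experiments = [
    frozenset({"medium=water", "agitation=high", "cooling=fast"}),
    frozenset({"medium=water", "agitation=high", "cooling=fast"}),
    frozenset({"medium=oil", "agitation=high", "cooling=slow"}),
    frozenset({"medium=water", "agitation=low", "cooling=fast"}),
    frozenset({"medium=oil", "agitation=low", "cooling=slow"}),
]

frequent = apriori(experiments, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

On this toy data, one of the rules found is that water as the cooling medium together with high agitation implies fast cooling. It has the same shape as the abstract's example question about predicting cooling rate from cooling medium and agitation.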

Proceedings Article•

Reducts and Constructs in Attribute Reduction

Robert Susmaga¹•Institutions (1)

Poznań University of Technology¹

1 Apr 2004

TL;DR: This paper focuses on the topological aspects of relative reducts, identifying some of their limitations and introducing alternative definitions that do not suffer from these limitations.

Abstract: One of the main notions in the Rough Sets Theory (RST) is that of a reduct. According to its classic definition, the reduct is a minimal subset of the attributes that retains some important properties of the whole set of attributes. The idea of the reduct proved to be interesting enough to inspire a great deal of research and resulted in introducing various reduct-related ideas and notions. First of all, depending on the character of the attributes involved in the analysis, so called absolute and relative reducts can be defined. The more interesting of these, relative reducts, are minimal subsets of attributes that retain discernibility between objects belonging to different classes. This paper focuses on the topological aspects of such reducts, identifying some of their limitations and introducing alternative definitions that do not suffer from these limitations. The modified subsets of attributes, referred to as constructs, are intended to assist the subsequent inductive process of data generalisation and knowledge acquisition, which, in the context of RST, usually takes the form of decision rule generation. Usefulness of both reducts and constructs in this role is examined and evaluated in a massive computational experiment, which was carried out for a collection of real-life data sets.
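
The definition of a relative reduct given above can be made concrete with a small brute-force sketch: enumerate attribute subsets in order of size and keep the minimal ones that still discern every pair of objects from different classes that the full attribute set discerns. The decision table below is invented, the function names are assumptions, and the paper's constructs are not implemented because the abstract does not define them precisely.

```python
from itertools import combinations


def discerns(rows, decisions, attrs):
    """True if attrs separate every pair of differently classed, distinct objects."""
    for (r1, d1), (r2, d2) in combinations(list(zip(rows, decisions)), 2):
        if d1 != d2 and r1 != r2 and all(r1[a] == r2[a] for a in attrs):
            return False
    return True


def relative_reducts(rows, decisions):
    """Brute-force search for all minimal discerning attribute subsets."""
    n_attrs = len(rows[0])
    found = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(attrs) for r in found):
                continue
            if discerns(rows, decisions, attrs):
                found.append(attrs)
    return found


# Hypothetical decision table: three condition attributes, one decision class.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
decisions = ["A", "A", "B", "B"]

print(relative_reducts(rows, decisions))
```

For this table the sketch reports two relative reducts: attribute 0 alone, and attributes 1 and 2 together. The exhaustive search is exponential in the number of attributes, so it is only suitable for tiny examples.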

Journal Article•10.1007/S00500-003-0287-X•

Evolutionary design and adaptation of high performance digital filters within an embedded reconfigurable fault tolerant hardware platform

B. I. Hounsell¹, Tughrul Arslan¹, R. Thomson¹•Institutions (1)

University of Edinburgh¹

1 Apr 2004

TL;DR: This paper presents an evolvable hardware platform for the automated design and adaptation of multiplierless digital filters based on the Primitive Operator Filter design principle and shows that the functionality of filters evolved on the PLA was maintained despite an increasing number of faults.

Abstract: Finite impulse response filters (FIRs) are crucial devices for robust data communication and manipulation. Multiplierless filters have been shown to produce high performance systems with fast signal processing and reduced area. Furthermore, the distributed architecture inherent in multiplierless filters makes them suitable candidates for fault tolerant design. Alternative approaches to the design of fault tolerant systems have been proposed using evolutionary algorithms (EAs) and the concept of evolvable hardware (EHW). This paper presents an evolvable hardware platform for the automated design and adaptation of multiplierless digital filters. Filters are realised within a dedicated programmable logic array (PLA) based on the Primitive Operator Filter design principle. The platform employs a genetic algorithm to autonomously configure the PLA for a given set of coefficients. The ability of the platform to adapt to increasing numbers of faults was investigated through the “evolution” of a 31-tap low-pass FIR filter. Results show that the functionality of filters evolved on the PLA was maintained despite an increasing number of faults covering up to 25% of the PLA area. Additionally, three PLA initialisation methods were investigated to ascertain which produced the fastest fault recovery times. It was shown that seeding a population of random configuration-strings with the best configuration currently obtained resulted in a 6-fold increase in fault recovery speed over other methods investigated.
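
For readers unfamiliar with the term, "multiplierless" means that each coefficient multiplication is replaced by shifts and additions. The sketch below shows only that general idea on a small integer FIR filter; the coefficient decomposition, function names and sample values are assumptions for illustration, and it does not model the Primitive Operator Filter structure, the PLA, or the genetic algorithm described in the abstract.

```python
def shift_add_multiply(x, terms):
    """Multiply integer x by a constant written as signed powers of two.

    terms is a list of (sign, shift) pairs, e.g. 7 = 8 - 1 -> [(+1, 3), (-1, 0)].
    Only shifts, additions and subtractions are used.
    """
    acc = 0
    for sign, shift in terms:
        if sign > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc


def fir_multiplierless(samples, coeff_terms):
    """Causal FIR filter y[n] = sum_k h[k] * x[n-k] with shift-and-add taps."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, terms in enumerate(coeff_terms):
            if n - k >= 0:
                acc += shift_add_multiply(samples[n - k], terms)
        output.append(acc)
    return output


# Hypothetical 5-tap filter with coefficients [1, 3, 7, 3, 1].
coeff_terms = [
    [(+1, 0)],           # 1
    [(+1, 1), (+1, 0)],  # 3 = 2 + 1
    [(+1, 3), (-1, 0)],  # 7 = 8 - 1
    [(+1, 1), (+1, 0)],  # 3
    [(+1, 0)],           # 1
]

impulse_response = fir_multiplierless([1, 0, 0, 0, 0], coeff_terms)
print(impulse_response)
```

Feeding a unit impulse through the filter returns the coefficient sequence itself, which is a quick check that the shift-and-add taps match ordinary multiplication.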

Journal Article•10.1007/S00500-003-0305-Z•

Fuzzy probabilities for web planning

James J. Buckley¹, K. Reilly¹, Xidong Zheng¹•Institutions (1)

University of Alabama at Birmingham¹

1 Jul 2004

TL;DR: This work applies a new approach to modeling uncertain probabilities to queuing theory and the optimal design of web servers by using fuzzy, finite, regular Markov chains to determine the fuzzy steady state probabilities and then computing the fuzzy numbers for system performance.

Abstract: We apply our new approach to modeling uncertain probabilities to queuing theory and the optimal design of web servers. This involves using fuzzy, finite, regular Markov chains to determine the fuzzy steady state probabilities and then computing the fuzzy numbers for system performance. We first ignore revenues and costs in determining an optimal system and then we incorporate these factors for optimal design. Then we add two new phenomena associated with the web in our optimization models: “burstiness” and “long tailed distributions”.
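
One simple way to see what a fuzzy steady-state probability looks like is a two-state Markov chain whose transition probabilities are triangular fuzzy numbers. The sketch below computes alpha-cuts (intervals) of the steady-state probability of state 0; it is an illustration under stated assumptions and not necessarily the authors' procedure. The chain, the fuzzy numbers and the function names are all hypothetical. In a two-state chain each row has a single free parameter, so every choice of values automatically yields rows that sum to one.

```python
def alpha_cut(triangular, alpha):
    """Interval [lo, hi] of a triangular fuzzy number (a, b, c) at level alpha in [0, 1]."""
    a, b, c = triangular
    return (a + alpha * (b - a), c - alpha * (c - b))


def fuzzy_steady_state_p0(p01, p10, alpha):
    """Alpha-cut of the steady-state probability of state 0.

    The chain is P = [[1 - p01, p01], [p10, 1 - p10]], whose crisp steady state
    satisfies pi_0 = p10 / (p01 + p10). Here p01 and p10 are triangular fuzzy numbers.
    """
    a_lo, a_hi = alpha_cut(p01, alpha)
    b_lo, b_hi = alpha_cut(p10, alpha)
    # pi_0 decreases in p01 and increases in p10, so the extremes lie at the corners.
    return (b_lo / (a_hi + b_lo), b_hi / (a_lo + b_hi))


# Hypothetical server model: state 0 = idle, state 1 = busy.
p_idle_to_busy = (0.1, 0.2, 0.3)
p_busy_to_idle = (0.3, 0.4, 0.5)

for alpha in (0.0, 0.5, 1.0):
    lo, hi = fuzzy_steady_state_p0(p_idle_to_busy, p_busy_to_idle, alpha)
    print(alpha, round(lo, 4), round(hi, 4))
```

At alpha = 1 the interval collapses to the crisp value 0.4 / 0.6, and it widens as alpha decreases. For larger chains the extremes of each steady-state probability generally have to be found by constrained optimisation rather than by inspecting corners.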

Journal Article•10.1007/S00500-003-0293-Z•

Recent advances in exploratory data analysis with neuro-fuzzy methods

R. Kruse, A. Klose

1 May 2004

TL;DR: The papers in this special issue of the Soft Computing Journal deal with quite different aspects of neuro-fuzzy techniques, with a stress on data analysis and rule based systems.

Abstract: Modern information technology makes it possible today to collect, store, transfer, and combine huge amounts of data at very low cost. Thus more and more companies, and scientific and governmental institutions, build up large archives of all kinds of data such as numbers, tables, documents, images, sounds, etc. Although users often have a vague understanding of their data and can usually formulate hypotheses and guess dependencies, turning these (often abundantly available) data into useful information turns out to be rather difficult. In response to these challenges a new area of research has emerged, called “knowledge discovery in databases” or “data mining”, that tries to provide tools to extract valid, useful, understandable, unknown, and unexpected relationships from large databases [1]. This current form of data analysis is an interdisciplinary field, and employs methods from statistics, soft computing, artificial intelligence and machine learning. The stress lies on the development of techniques that produce human-understandable results, and are suited for large, real world datasets. The data in real world applications has in most cases characteristics that challenge classical analysis approaches. Besides being heterogeneous (which in its simplest form can mean that we have to deal with numerical and symbolic features), the data is often of low quality. Algorithms must thus be able to deal with uncertainty and imprecision. The characteristics of the data sources (their quantity, complexity, dimensionality and imperfection), the necessity of extracting understandable patterns from these, and the need to incorporate available background knowledge in that process, make us assume that (neuro-) fuzzy techniques will play a considerable role in the future of data mining [3].
It is the ability of fuzzy sets to transform between computer representations and (naturally linguistic) human concepts that makes them so valuable in meeting advanced data mining demands, and that Zadeh meant when he promoted his idea of computing with words [4]. The inherent imprecision of words is not necessarily a weakness but, on the contrary, can be crucial for modelling complex systems. From our own experience we observed that many practical applications have a certain robustness where full precision is not necessary. In such cases, exaggerated precision can be a waste of resources, and solutions obtained using fuzzy approaches might be easier to understand and to apply, and gain their strengths by explicitly taking into account vagueness, imprecision or uncertainty. One prominent way to use fuzzy systems in data analysis is to induce fuzzy if-then rules from data by neuro-fuzzy techniques. The use of linguistic variables eases the readability and interpretability of the rule base. If we apply such techniques, we must be aware of the trade-off between precision and interpretability. However, the results in data mining are judged not only for their accuracy, but also for their interpretability, as the ultimate goal is to extract human understandable patterns [2]. The papers in this special issue of the Soft Computing Journal deal with quite different aspects of neuro-fuzzy techniques, with a stress on data analysis and rule based systems. Although the intention of this issue cannot be to give a survey of this wide area, it shall give an insight into current trends and research topics in this field. The papers can roughly be divided into three groups:

Proceedings Article•

Locus Oriented Adaptive Genetic Algorithm: Application to the Zero/One Knapsack Problem

Chun Wai Ma¹, Kwok Yip Szeto•Institutions (1)

Hong Kong University of Science and Technology¹

1 Jan 2004

TL;DR: A heuristic argument is given to show how the statistical information inside the population can be used to tune the mutation rate at each individual locus, resulting in higher overall performance.

Abstract: The biological observation that the mutation rates of alleles differ between loci is implemented in a genetic algorithm, so that the mutation rate is both time and locus dependent. The performance of this new locus oriented adaptive genetic algorithm (LOAGA) is evaluated on the zero/one knapsack test problem for various sizes. It is found that LOAGA can solve the single constraint zero/one knapsack with high speed, high success rate, and small memory requirement. A heuristic argument is given to show how the statistical information inside the population can be used to tune the mutation rate at each individual locus, resulting in higher overall performance.
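
The abstract does not state how LOAGA maps population statistics to mutation rates, so the following is only a sketch of the general mechanism of a locus-dependent mutation rate, using an assumed heuristic: loci where the population still disagrees are mutated more often than loci where it has converged. The function names, the diversity formula and the rate bounds are all hypothetical.

```python
import random


def locus_mutation_rates(population, r_min=0.01, r_max=0.2):
    """Per-locus mutation rates from allele statistics of a binary population.

    Assumed heuristic (not taken from the paper): the rate grows with the
    allele diversity 4*f*(1-f), where f is the frequency of allele 1.
    """
    size = len(population)
    rates = []
    for locus in range(len(population[0])):
        f = sum(individual[locus] for individual in population) / size
        diversity = 4.0 * f * (1.0 - f)  # 0 when converged, 1 at a 50/50 split
        rates.append(r_min + (r_max - r_min) * diversity)
    return rates


def mutate(individual, rates, rng):
    """Flip each bit independently with its own locus-specific probability."""
    return [1 - gene if rng.random() < rate else gene
            for gene, rate in zip(individual, rates)]


# Hypothetical population of four 3-bit knapsack selections.
population = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
rates = locus_mutation_rates(population)
print(rates)

rng = random.Random(0)
print(mutate(population[0], rates, rng))
```

Recomputing the rates from the current population in every generation makes the mutation rate time dependent as well as locus dependent, matching the two properties named in the abstract.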

Journal Article•10.1007/S00500-003-0302-2•

Note on the relationship between probabilistic and fuzzy clustering

W. Shitong, Korris Fu-Lai Chung¹, S. Hongbin, Z. Ruiqiang•Institutions (1)

Hong Kong Polytechnic University¹

1 Apr 2004

TL;DR: A new Rényi-information-based clustering algorithm, Algorithm A, has the same clustering track as the well-known fuzzy clustering algorithm FCM; this builds a bridge between probabilistic clustering and fuzzy clustering, and research results on the Rényi entropy measure may help to further understand the essence of fuzzy clustering.

Abstract: In this short communication, a new clustering algorithm A, based on the Rényi entropy measure, is presented. Algorithm A and the well-known fuzzy clustering algorithm FCM have the same clustering track. This fact builds a bridge between probabilistic clustering and fuzzy clustering, and fruitful research results on the Rényi entropy measure may help us to further understand the essence of fuzzy clustering.
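
For reference, the two ingredients named above can be written down in a few lines: the Rényi entropy of order alpha, and the standard fuzzy c-means (FCM) membership formula. This sketch does not reproduce Algorithm A or the equivalence argument of the paper; the function names and the one-dimensional example are assumptions for illustration.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), natural logarithm."""
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM memberships of a 1-D point in each cluster (fuzzifier m > 1).

    u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), where d_i = |point - center_i|.
    Cluster centers are assumed to be distinct.
    """
    dists = [abs(point - c) for c in centers]
    if 0.0 in dists:  # the point sits exactly on a center: crisp membership
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


print(renyi_entropy([0.25, 0.25, 0.25, 0.25], alpha=2.0))
print(fcm_memberships(1.0, [0.0, 4.0]))
```

The memberships of a point always sum to one, which is what allows them to be read as a probability distribution over clusters and makes a comparison with entropy-based clustering natural.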
