Conference
Intelligent Data Analysis
About: Intelligent Data Analysis is an academic conference. It publishes mainly in the areas of computer science and cluster analysis. Over its lifetime, the conference has published 2,217 publications, which have received 39,990 citations.
Topics: Computer science, Cluster analysis, Feature selection, Association rule learning, Artificial neural network
Papers
1 May 1997
TL;DR: This survey identifies the future research areas in feature selection, introduces newcomers to this field, and paves the way for practitioners who search for suitable methods for solving domain-specific real-world applications.
Abstract: Feature selection has been the focus of interest for quite some time and much work has been done. With the creation of huge databases and the consequent requirements for good machine learning techniques, new problems arise and novel approaches to feature selection are in demand. This survey is a comprehensive overview of many existing methods from the 1970's to the present. It identifies four steps of a typical feature selection method, and categorizes the different existing methods in terms of generation procedures and evaluation functions, and reveals hitherto unattempted combinations of generation procedures and evaluation functions. Representative methods are chosen from each category for detailed explanation and discussion via example. Benchmark datasets with different characteristics are used for comparative study. The strengths and weaknesses of different methods are explained. Guidelines for applying feature selection methods are given based on data types and domain characteristics. This survey identifies the future research areas in feature selection, introduces newcomers to this field, and paves the way for practitioners who search for suitable methods for solving domain-specific real-world applications.
3,443 citations
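The split between a generation procedure and an evaluation function described in the abstract above can be illustrated with a minimal sketch. It assumes discrete feature values, uses sequential forward search as the generation procedure, and uses an inconsistency rate as the evaluation function. The function names `inconsistency_rate` and `forward_select` are hypothetical and are not taken from the survey; this is one possible combination, not the survey's recommended method.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Evaluation function (illustrative): fraction of rows whose label
    differs from the majority label among rows that share the same
    values on the features in `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def forward_select(rows, labels):
    """Generation procedure (illustrative): sequential forward search.
    Starts from the empty subset and greedily adds the feature that
    lowers the inconsistency rate the most."""
    n_features = len(rows[0])
    selected = []
    best = inconsistency_rate(rows, labels, selected)
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feature = min(
            (inconsistency_rate(rows, labels, selected + [f]), f)
            for f in candidates
        )
        if score >= best:  # stopping criterion: no further improvement
            break
        selected.append(feature)
        best = score
    return selected, best


# Toy data (made up for illustration): the label equals feature 1.
rows = [(0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
labels = [0, 0, 1, 1, 0, 1]
subset, score = forward_select(rows, labels)
print(subset, score)
```

Greedy forward search is cheap but can miss features that are only useful in combination, which is one reason a survey of this kind compares several generation procedures.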
1 Oct 2002
TL;DR: This paper investigates the assumption that the class imbalance problem affects not only decision tree systems but also other classification systems, such as neural networks and support vector machines.
Abstract: In machine learning problems, differences in prior class probabilities -- or class imbalances -- have been reported to hinder the performance of some standard classifiers, such as decision trees. This paper presents a systematic study aimed at answering three different questions. First, we attempt to understand the nature of the class imbalance problem by establishing a relationship between concept complexity, size of the training set and class imbalance level. Second, we discuss several basic re-sampling or cost-modifying methods previously proposed to deal with the class imbalance problem and compare their effectiveness. The results obtained by such methods on artificial domains are linked to results in real-world domains. Finally, we investigate the assumption that the class imbalance problem does not only affect decision tree systems but also affects other classification systems such as Neural Networks and Support Vector Machines.
3,439 citations
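As a rough illustration of the "basic re-sampling" family mentioned in the abstract above, the following sketch performs random oversampling: minority-class examples are duplicated at random until every class matches the size of the largest class. The function name `random_oversample` and the toy data are assumptions made for this example; the paper's exact procedures and parameters are not reproduced here.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    all classes have as many examples as the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Toy data (made up): 8 majority examples, 2 minority examples.
samples = list(range(10))
labels = ["majority"] * 8 + ["minority"] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling leaves the original data intact but repeats minority examples, so it should be applied to the training split only; undersampling the majority class or adjusting misclassification costs are the usual alternatives.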
1 Oct 2007
TL;DR: This paper introduces FastDTW, an approximation of DTW with linear time and space complexity, and reports a large improvement in accuracy over existing approximate DTW methods.
Abstract: Dynamic Time Warping (DTW) has a quadratic time and space complexity that limits its use to small time series. In this paper we introduce FastDTW, an approximation of DTW that has a linear time and space complexity. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution. We prove the linear time and space complexity of FastDTW both theoretically and empirically. We also analyze the accuracy of FastDTW by comparing it to two other types of existing approximate DTW algorithms: constraints (such as Sakoe-Chiba Bands) and abstraction. Our results show a large improvement in accuracy over existing methods.
1,733 citations
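For context, the sketch below shows the standard dynamic-programming DTW whose quadratic cost the abstract above refers to, with an optional Sakoe-Chiba style band as the constraint baseline. It is not an implementation of FastDTW. The function name `dtw_distance`, the `window` parameter, and the use of absolute difference as the local cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW between two numeric sequences.

    With window=None the full len(a) x len(b) cost matrix is filled
    (quadratic time and space). With an integer window, only cells with
    |i - j| <= window are filled (a Sakoe-Chiba style band).
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide for any path to exist.
    window = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed here)
            cost[i][j] = d + min(
                cost[i - 1][j],      # step in a only
                cost[i][j - 1],      # step in b only
                cost[i - 1][j - 1],  # step in both
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))            # stretched copy
print(dtw_distance([1, 2, 3], [1, 2, 2, 3], window=1))  # banded variant
```

A narrow band reduces the work per row but can exclude the optimal warp path, which is the accuracy trade-off the abstract evaluates when it compares FastDTW with constraint-based approximations.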
1 Aug 2009
TL;DR: Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams, covers the fundamentals needed to understand data streams, and describes important applications such as TCP/IP traffic, GPS data, sensor networks, and customer click streams.
Abstract: Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams. The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges of data mining in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets. This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they also are valid for different areas of machine learning and data mining.
828 citations
13 Sep 2001
TL;DR: This paper considers the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents are available for training, and describes an EM style algorithm for learning HMMs from partially labeled data.
Abstract: Information extraction from HTML documents requires a classifier capable of assigning semantic labels to the words or word sequences to be extracted. If completely labeled documents are available for training, well-known Markov model techniques can be used to learn such classifiers. In this paper, we consider the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents are available for training. We first give a detailed account of the task and its appropriate loss function, and show how it can be minimized given an HMM. We describe an EM-style algorithm for learning HMMs from partially labeled data. We then present an active learning algorithm that selects "difficult" unlabeled tokens and asks the user to label them. We study empirically by how much active learning reduces the required data labeling effort, or increases the quality of the learned model achievable with a given amount of user effort.
674 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 106 |
| 2020 | 146 |
| 2019 | 82 |
| 2018 | 192 |
| 2017 | 161 |
| 2016 | 151 |