Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics
Mark-A. Krogel, Tobias Scheffer
TL;DR: The paper studies how well functional properties of yeast proteins can be predicted by approaches that use all data sources available in this setting: relational gene interaction data (through propositionalization), abstracts of research papers, and unlabeled data.
Abstract: We focus on the problem of predicting functional properties of the proteins corresponding to genes in the yeast genome. Our goal is to study the effectiveness of approaches that utilize all data sources available in this problem setting, including relational data, abstracts of research papers, and unlabeled data. We investigate a propositionalization approach that uses relational gene interaction data. We study the benefit of text classification and information extraction for utilizing a collection of scientific abstracts. We study transduction and co-training for using unlabeled data. We report both positive and negative results for the investigated approaches. The studied tasks are KDD Cup tasks of 2001 and 2002. The solutions we describe achieved the highest score for task 2 in 2001, fourth rank for task 3 in 2001, and, for task 2 in 2002, the highest score on one of the two subtasks and third place overall.
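The abstract does not spell out how the propositionalization works, so the following is only a minimal, generic sketch of the idea: relational gene interaction data is flattened into one fixed-length feature row per gene by aggregating over each gene's interaction partners. The function name `propositionalize`, the gene identifiers, and the `localization` attribute are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from collections import defaultdict


def propositionalize(genes, interactions, partner_attr):
    """Flatten an interaction relation into one feature row per gene.

    genes:        list of gene identifiers (hypothetical names)
    interactions: list of undirected (gene_a, gene_b) pairs
    partner_attr: dict mapping a gene to one categorical attribute value
                  (assumed here to be a localization label)
    """
    # Build an undirected adjacency map from the interaction pairs.
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # One count feature per attribute value seen in the data.
    values = sorted(set(partner_attr.values()))

    table = {}
    for gene in genes:
        partners = neighbors.get(gene, set())
        row = {"n_partners": len(partners)}
        for value in values:
            row[f"partners_with_{value}"] = sum(
                1 for p in partners if partner_attr.get(p) == value
            )
        table[gene] = row
    return table


# Toy data, invented for illustration only.
genes = ["G1", "G2", "G3", "G4"]
interactions = [("G1", "G2"), ("G1", "G3"), ("G2", "G3")]
localization = {"G1": "nucleus", "G2": "cytoplasm", "G3": "nucleus"}

table = propositionalize(genes, interactions, localization)
for gene in genes:
    print(gene, table[gene])
```

Each resulting row is an ordinary attribute-value vector, so any propositional learner can be trained on it. Counting is only one possible aggregate; which aggregates the paper actually uses is not stated in the abstract.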