A Holistic, Decision-Theoretic Framework for Pool-Based Active Learning
BücherAngebote / Angebote:
Efficient labeling is an important topic in machine learning research as classifiers need labeled data. Whereas unlabeled data is easily gathered, labeling is exhausting, time-consuming, or expensive and should, therefore, be reduced to a minimum. Active learning aims to actively select useful, unlabeled instances for label acquisition to reduce the labeling effort while providing labeled training data such that the classifier performs well. This thesis proposes Probabilistic Active Learning, a holistic, decision-theoretic framework for active learning that enables optimization for every performance measure and classifier. Using the holistic mathematical description, we can define an upper baseline for active learning and identify theoretical similarities to other selection strategies. We evaluate our approach on 22 datasets for six different performance measures. Moreover, we show that our approach can be applied to multiple classifiers and can be used for batch selection. In another scenario, called transductive active learning, we provide a set of unlabeled instances and ask the active learning algorithm to return the correct labels only for this set. In contrast to the standard (inductive) scenario, where we aim to build a general classifier, we can either ask an oracle to provide the correct labels or use the classifier to predict the label. We introduce a new cost-based performance measure for transductive active learning and show the superiority of our probabilistic approach.
Folgt in ca. 5 Arbeitstagen