




Abstract—This paper will address the Positive and Unlabeled learning problem (PU learning) and its importance in the growing field of semi-supervised learning. In most real-world classification applications, well-labeled data is expensive or impossible to obtain. We can often label a small subset of data as belonging to the class of interest, but it is frequently impractical to manually label all data we are not interested in. We are left with a small set of positive labeled items of interest and a large set of unknown and unlabeled data. Learning a model for this is the PU learning problem.
In this paper, we explore several applications for PU
learning including examples in biological/medical, business,
security, and signal processing. We then survey the literature
for new and existing solutions to the PU learning problem.
Index Terms—PU learning, positive unlabeled learning, machine learning, artificial intelligence, classification
1 Introduction
In the modern world, big data and machine learning provide
many opportunities for knowledge extraction. Data
accumulation is exploding. By some estimates, 2.5 quintillion
bytes of data are generated daily at our current pace, and some
90% of the data in the world was generated in the last two years
[1]. The Internet of Things (IoT) is only accelerating this growth
[2], [3]. Machine learning algorithms use this data to identify
patterns, make connections, and learn representative models.
The most frequently used machine learning algorithms are
supervised or unsupervised. Supervised learning is used when
the “ground truth” or labels of the data are known. Given the
data and the known labels, a model can be created to predict the
unknown label of a new, unfamiliar data sample. When no labels
are available for training, unsupervised learning methods are
used [4]–[6]. See section 5.1 for additional information and
resources on general Machine Learning.
A common task in machine learning is supervised binary
classification. Given data sample and binary label pairs (𝑥, 𝑦),
where 𝑦 is typically considered either positive (𝑦 = 1) or negative (𝑦 = 0), a classification algorithm learns a model
𝑓(𝑥) from the features of these labeled samples. Given a new
data sample with no label, this model then places that unlabeled
sample into either the positive or the negative class. Several
algorithms work well with this problem, including logistic
regression, support vector machines (SVMs), and artificial
neural networks (ANNs) among others.
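For concreteness, the following is a minimal sketch of this conventional fully supervised setting using scikit-learn; the synthetic dataset and the choice of logistic regression are illustrative only and are not drawn from any of the cited works.

```python
# Minimal supervised binary classification sketch (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data: every sample has a known binary label y in {0, 1}.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn f(x) from labeled (x, y) pairs, then predict labels for unseen samples.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```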
While enormous quantities of data are accumulated in the
modern world, labeling these can be prohibitively expensive and
time-consuming, if even possible, making supervised algorithms
unusable. Semi-supervised learning addresses the problem
where labels are not known for all training samples. Many real-
world classification problems inherently have a small number of
known positive data, no known negative data, and a large
quantity of unlabeled data. This is known as the Positive and
Unlabeled learning problem (PU learning). The difference
between traditional binary classification and PU classification is
illustrated in Figure 1.
Figure 1: Illustration of a Positive Unlabeled binary classification problem.
One application of PU learning involves remote sensing -
including identifying objects in satellite images. In a database of
satellite images, some number of images may be labeled as
containing the item of interest, but there are too many images
for all to be labeled. We would like to classify all the images in
the database as either containing the item of interest or not. This
and many more PU learning applications will be described in
more detail in Section 2.
This paper will address the PU learning problem as follows.
Section 2 will describe several applications for PU learning.
Section 3 will discuss some of the assumptions that are
fundamental to several of the problem solutions. Section 4
surveys the existing methods of solving the PU learning
problem. Section 5 concludes and provides a brief overview into
Machine Learning along with suggested reading and a
discussion of future work and opportunities.
2 Applications
A surprisingly large number of real-world classification
problems naturally fall into the Positive and Unlabeled learning
scenario (PU learning). This section provides a brief survey of
those topics found in the literature and ideas for potential future
applications. Current uses emphasize medical/biological and
business applications. Signal processing and security
applications are underrepresented in the literature and are
discussed in section 2.3.
2.1 Medical and Biological Applications
Many medical situations are a natural fit for the PU learning
problem. One example is identifying [7]–[9] or priority ranking
[10]–[12] genes or gene combinations that influence disease
incidence. Many diseases including cancer, Alzheimer’s
disease, cystic fibrosis, sickle cell anemia, and even anxiety
disorders are influenced by genetics. A small set of genes is known to cause or influence a specified disease; these compose the positive set. Very little is known about how all other genes influence the disease of interest; these remain unlabeled.
In addition to genetic analysis, diseases such as cancer may be
detectable by examining general patient records including blood
test results and patient history [13]–[15].
Virtual screening of drug compounds, that is, identifying which drug compounds could be useful in treating a given disease, is another application for PU learning [11]. Drugs and chemicals
known to be effective against a given disease make up the
positive set. All other compounds in a drug or chemical database
make up the unknown set. [16] uses PU learning to identify
transport proteins from a general protein database. [17]
reconstructs gene regulatory networks from gene expression
data, identifying which gene pairs are most likely to interact.
[18] identifies small non-coding RNA (ncRNA) genes from
intergenic sequences. A related, though more difficult learning
problem – that of learning only from the proportions of the
positive and unlabeled sets – is being used for embryo selection
in assisted reproduction [19]. In this scenario, only the most
viable embryos should be implanted, and a variant of PU
learning can be used to do this. Growing medical fields
concerning gut microbiome analysis and epigenetics could also
benefit from this type of learning.
PU learning for ecological [20] and environmental monitoring [21], [22] is a scenario that has been only lightly touched upon in the literature. [20] discusses using PU learning
to identify species presence. Determining species absence in a
region is difficult and expensive. Geographical regions with
reported sightings of the species of interest make up the positive
set, while all other areas remain in the unknown set.
2.2 Business Applications
Some of the best studied applications for PU learning involve
text classification or document classification [10], [23], [32]–
[37], [24]–[31]. This can include categorizing the subject of a
paper, webpage, or email. One common application is email
spam identification. Users identify some emails as spam, and
these make up the positive class. All other emails are considered
unknown. Search topic identification/classification [10] and
web page text retrieval and classification [11], [15], [30]–[32]
are other important text applications.
Recommender systems can be considered as PU learning
opportunities. For web page recommenders, a user’s browsing
history or bookmarks make up the positive set. All other
webpages constitute the unknown set. From this, web pages of
interest can be recommended [13], [33], [38]. This could also be
useful in recommending movies/TV shows or social media
contacts or posts that a user would enjoy. Every show, post, or
group that is “liked” by a user would make up the positive set,
while all others would be unknown. [39] attempts to predict
which subjects would interest a politician by looking at their past
work. Recommender systems can also suffer from deceptive
reviews. [40] and [41] use PU learning to identify these.
Reject inference for loan approval and other tasks learns
from the applications of both accepted and rejected individuals.
The behavior of rejected individuals is unknown and thus fits
within the Positive and Unlabeled framework. [42] applies PU
learning to reject inference problems which can include
epidemiology, econometrics, and clinical trial evaluation along
with the more standard financial applications.
Direct, or targeted, marketing allows a business to save a
significant amount of money and is a natural match for PU
learning. Known customer profiles compose the positive set, and large unknown databases of customer information make up the unlabeled set [13], [38], [43], [44]. Fraud detection is another
application for which PU learning could be useful [45], [46].
The positive set could be composed of known fraudulent
transactions, while all others would be unlabeled.
2.3 Security and Signal Processing Applications
Security and signal processing applications are severely
underrepresented in the Positive and Unlabeled learning
domain. Initial forays into image classification [10], [15], [22],
[47]–[50], have explored satellite image land-type classification
and facial authentication. Classifying satellite images, radar
images, and others is a natural fit for PU learning. A subset of
objects from the desired class are manually labeled, and all
others are classified using PU learning. This could be used to
identify man-made objects in satellite imagery such as unknown
archeological sites or new military installations and build-ups.
distance metric and SVM to identify negatives in PSoL. [15]
creates an algorithm called SVMC that uses the margin
maximization property of an SVM. [67] proposes NPSVM, a
nonparallel hyperplane SVM, to improve performance. A more recent algorithm, described in [30], builds on the 1-DNF strategy of [32] while introducing an iterative classification step and a voting, or bagging, method for final classification (discussed further below). [68] proposes A-EM, which adds additional
unlabeled samples that are expected to be mostly negative to
better separate the classes.
Step two of this approach involves applying a standard
supervised learning algorithm, possibly iterated, to these
estimated negatives and known positives. The S-EM algorithm
uses an EM (Expectation Maximization) algorithm with a Naive
Bayes classifier. PEBL, Roc-SVM, PSoL, and SVMC use SVMs.
[43] provides an analysis and comparison of several of the
algorithms in this approach. The overall percentage of positive
samples is usually considered unknown or is estimated using domain knowledge. As such, there is no way to independently verify the final classification results of the algorithm, and convergence is usually used as the criterion to stop iterating between labeling and learning.
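The two-step recipe can be sketched as follows. This is a simplified illustration of the general strategy (choose reliable negatives, train a supervised classifier, iterate), not a reproduction of any one of the algorithms cited above; the distance-to-centroid heuristic, thresholds, and choice of an SVM are our own for the example.

```python
# Simplified two-step PU sketch: (1) pick "reliable negatives" from the
# unlabeled set, (2) train a supervised classifier on P vs. reliable N,
# iterating until the negative set stops growing.
import numpy as np
from sklearn.svm import SVC

def two_step_pu(X_pos, X_unl, n_iter=5, neg_fraction=0.2):
    # Step 1: score unlabeled points by distance to the positive centroid
    # and take the farthest ones as an initial reliable-negative set.
    centroid = X_pos.mean(axis=0)
    dist = np.linalg.norm(X_unl - centroid, axis=1)
    n_neg = max(1, int(neg_fraction * len(X_unl)))
    reliable_neg = np.argsort(dist)[-n_neg:]

    clf = None
    for _ in range(n_iter):
        # Step 2: supervised learning on known positives vs. current negatives.
        X_train = np.vstack([X_pos, X_unl[reliable_neg]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
        clf = SVC(probability=True).fit(X_train, y_train)

        # Re-score the unlabeled set and enlarge the negative set with
        # confidently negative samples for the next iteration.
        p_pos = clf.predict_proba(X_unl)[:, 1]
        new_neg = np.where(p_pos < 0.1)[0]
        if set(new_neg) <= set(reliable_neg):
            break
        reliable_neg = np.union1d(reliable_neg, new_neg)
    return clf
```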
4.2 The weighted approach to PU learning
The second approach to positive and unlabeled classification
assigns real valued weights to each unlabeled sample. These
weights represent the likelihood, or conditional probability, that
an unlabeled sample 𝑥 belongs to the positive set or negative set
[20], [33], [69], or both [13], [63]. A standard learning algorithm
then uses the weighted, unlabeled samples as either constantly
weighted negative samples [33], variably weighted negative
samples [20], [69], or as variably weighted positive and negative
samples concurrently [13], [63] to learn a classifier 𝑓(𝑥) ≈ 𝑝(𝑦 = 1 | 𝑥).
Various methods are used to estimate these likelihoods, such as generalized linear models [33], logistic regression [63], a boosted-tree nonlinear logit model [20], soft-margin SVMs with linear kernels and Platt scaling [63], minimum distance to the positive set [69], and validation set experimentation with PrTFIDF [13].
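A minimal sketch of this weighted idea, in the spirit of [63], is shown below: a "labeled vs. unlabeled" model estimates the label frequency on held-out positives, and each unlabeled sample is then used as both a positive and a negative with complementary weights. Details (estimator choice, clipping, validation split) are simplifications of ours, not the exact procedure of [63].

```python
# Weighted PU sketch in the spirit of [63]: unlabeled samples are duplicated
# and treated as both positive and negative with complementary weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def weighted_pu(X_pos, X_unl):
    # Train a "labeled vs. unlabeled" model g(x) ~ p(s = 1 | x).
    X = np.vstack([X_pos, X_unl])
    s = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    X_tr, X_val, s_tr, s_val = train_test_split(X, s, stratify=s, random_state=0)
    g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)

    # Estimate the label frequency c = p(s = 1 | y = 1) on held-out positives.
    c = g.predict_proba(X_val[s_val == 1])[:, 1].mean()

    # Weight each unlabeled sample as positive with weight w(x)
    # and as negative with weight 1 - w(x).
    g_unl = np.clip(g.predict_proba(X_unl)[:, 1], 1e-6, 1 - 1e-6)
    w = np.clip((1 - c) / c * g_unl / (1 - g_unl), 0.0, 1.0)

    # Final classifier: positives with weight 1, plus the unlabeled set
    # used twice with complementary sample weights.
    X_final = np.vstack([X_pos, X_unl, X_unl])
    y_final = np.concatenate([np.ones(len(X_pos)), np.ones(len(X_unl)), np.zeros(len(X_unl))])
    sw = np.concatenate([np.ones(len(X_pos)), w, 1 - w])
    return LogisticRegression(max_iter=1000).fit(X_final, y_final, sample_weight=sw)
```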
Some algorithms, such as [43], use a more general weighting
scheme where weights represent the cost of mislabeling an
unlabeled positive sample as negative and vice versa, rather than
directly weighting each sample’s likelihood. These cost weights
vary according to 𝑝(𝑦 = 1) and are determined experimentally, using an F-score performance measure [43] to determine which values result in the highest performing algorithm. This method avoids the need to estimate the likelihood 𝑝(𝑦 = 1 | 𝑥) or the prior 𝑝(𝑦 = 1) directly, but it is quite slow.
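The cost-weighting idea can be illustrated as below. This sketch treats all unlabeled samples as negatives, searches over asymmetric misclassification costs, and keeps the setting with the best validation F-score; the cost grid, base learner, and use of a plain P-vs-U F-score as the selection criterion are our own simplifications, not the exact procedure of [43].

```python
# Cost-sensitive PU sketch: asymmetric misclassification costs tuned by F-score.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def cost_weighted_pu(X_pos, X_unl, cost_grid=(1, 2, 5, 10, 20)):
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

    best_clf, best_f1 = None, -1.0
    for c_pos in cost_grid:
        # Penalize misclassifying a (rare) positive c_pos times more heavily
        # than misclassifying an unlabeled-as-negative sample.
        clf = LinearSVC(class_weight={1: c_pos, 0: 1.0}, max_iter=5000).fit(X_tr, y_tr)
        f1 = f1_score(y_val, clf.predict(X_val))  # proxy performance score
        if f1 > best_f1:
            best_clf, best_f1 = clf, f1
    return best_clf
```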
A hybrid method called PUDI is used by [8], combining
aspects of the two-step and weighted learning approaches to
identify likely disease genes. Instead of using weights, bins are
used to partition the unlabeled set into four subsets: reliably
negative samples, likely negative samples, weak negative
samples, and likely positive samples. Samples are placed into
each of these bins based on the Euclidean distance between the
feature vectors of the unlabeled sample and a “positive
representative vector” determined by averaging the genes in the
known positive set. Weighted SVMs are then trained on these
sets to create a final classifier.
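The binning step of this hybrid idea can be sketched as follows; the quartile cut-offs are our own choice for illustration and stand in for the distance thresholds used in [8].

```python
# PUDI-style binning sketch (simplified): partition the unlabeled set into
# four subsets by Euclidean distance to a "positive representative vector".
import numpy as np

def pudi_bins(X_pos, X_unl):
    rep = X_pos.mean(axis=0)                    # positive representative vector
    dist = np.linalg.norm(X_unl - rep, axis=1)  # distance of each unlabeled sample
    q1, q2, q3 = np.quantile(dist, [0.25, 0.5, 0.75])
    bins = {
        "likely_positive":   np.where(dist <= q1)[0],
        "weak_negative":     np.where((dist > q1) & (dist <= q2))[0],
        "likely_negative":   np.where((dist > q2) & (dist <= q3))[0],
        "reliably_negative": np.where(dist > q3)[0],
    }
    return bins  # downstream, weighted SVMs are trained on these subsets
```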
A new variant of the weighted approach to PU learning, created by the authors of this paper and described in depth elsewhere [70], builds on [63] by replacing the standard logistic regression with a modified logistic regression that uses an adaptive upper bound. This produces improved results more than 87% of the time as measured by an F-score performance metric.
4.3 The noisy negative approach to PU learning
The third approach to the Positive and Unlabeled learning
problem involves treating the unlabeled set as noisy negatives.
Occasionally the positive set is assumed to be noisy as well,
though the levels of noise are usually class-conditional. A
classifier for this noisy data is then learned using techniques
developed to deal with such scenarios [71]. [33] discusses this
problem and develops a classifier using a weighted logistic
regression algorithm. The authors of [43] offer a classifier they
call the Biased-SVM with a modified cost function to account
for the noisy data. [72] demonstrates that surrogate loss functions with importance reweighting allow any traditional classifier to be modified to work on noisy data. [21] uses a tree-augmented naïve Bayes algorithm called UPTAN, which builds a Bayesian network from dependence information among uncertain attributes, to handle the uncertainty of the noisy data. [73] uses mixture proportion estimation
(MPE) to deal with noisy labels. [74] assumes that most of the
unlabeled data are negative and employs a Laplacian unit-
hyperplane classifier to deal with the resultant noisy data.
4.4 Weakening SCAR to deal with Selection Bias
One of the obstacles in PU learning is dealing with selection
bias. The SCAR assumption described in Section 3 assumes that there is no selection bias, which is often unrealistic in practice. Papers such as [47], [48], [66], [75], and [76] attempt to compensate for this selection bias.
Rather than SCAR, [66] and [77] propose a weaker
assumption, Selected at Random (SAR), which no longer
assumes that positive regions are sampled with consistent
frequency. They introduce an algorithm, SAR-EM, that uses a
propensity score as a function of the attributes and the
expectation maximization (EM) algorithm to solve the PU
learning problem.
Other authors such as [50] attempt to weaken the SCAR
assumption and deal with selection bias by minimizing the
classification risk function using what they call the invariance
of order. That is, they assume that ranking samples by the probability of being labeled positive produces the same order as ranking them by the probability of being positive. They also use a “partial identification”
technique to extract some useful information from a function,
without attempting to identify the entire function.
The authors in [64] also modify the classification risk
function to deal with selection bias. They claim that unbiased
risk estimators will produce negative empirical risks if the
model being used is very flexible. To deal with this, they
introduce a non-negative risk estimator that they argue produces better results in these situations and allows highly flexible deep learning solutions to be used.
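The non-negative risk estimator of [64] can be written down compactly. The sketch below computes this risk for a scoring function evaluated on positive and unlabeled samples with a given class prior; the sigmoid loss is one common choice and the function names are ours.

```python
# Sketch of the non-negative PU risk estimator of [64] for class prior pi.
import numpy as np

def sigmoid_loss(scores, label):
    # l(z, y) = sigmoid(-y * z): small when the score agrees with the label.
    return 1.0 / (1.0 + np.exp(label * scores))

def nn_pu_risk(scores_pos, scores_unl, pi):
    # scores_*: f(x) evaluated on positive and unlabeled samples.
    r_pos = sigmoid_loss(scores_pos, +1).mean()         # positives as positive
    r_pos_as_neg = sigmoid_loss(scores_pos, -1).mean()  # positives as negative
    r_unl_as_neg = sigmoid_loss(scores_unl, -1).mean()  # unlabeled as negative
    neg_risk = r_unl_as_neg - pi * r_pos_as_neg
    # Clamping the negative-class risk at zero is the key modification that
    # keeps flexible models from driving the empirical risk below zero.
    return pi * r_pos + max(0.0, neg_risk)
```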
4.5 Other techniques
In recent years, heuristic optimizations for the Positive and Unlabeled learning problem, such as bagging, have gained traction.
Bootstrap aggregating, or bagging, has become a popular
approach to increase classifier performance [10], [54]. [10]
proposes that “by nature, PU learning problems have a particular
structure that leads to instability of classifiers, which can be
advantageously exploited by a bagging-like procedure”.
Bagging involves repeatedly generating random subsets of the
training data to classify. Each classifier then votes on the correct
classification of a new sample. [10] and [11] both use bagging
SVMs, while [30] adds bagging to the 1-DNF approach
described in [32].
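A minimal sketch of PU bagging along these lines is shown below: random subsets of the unlabeled set act as negatives, a base classifier is trained against the positives, and out-of-bag votes are averaged. The base learner, subset size, and number of rounds are illustrative choices of ours.

```python
# PU bagging sketch: repeated random "negative" subsets from the unlabeled
# data, with out-of-bag votes averaged per unlabeled sample.
import numpy as np
from sklearn.svm import SVC

def pu_bagging(X_pos, X_unl, n_rounds=20, subset_size=None, random_state=0):
    rng = np.random.default_rng(random_state)
    subset_size = subset_size or len(X_pos)
    votes = np.zeros(len(X_unl))
    counts = np.zeros(len(X_unl))

    for _ in range(n_rounds):
        # Bootstrap a random "negative" subset from the unlabeled data.
        idx = rng.choice(len(X_unl), size=subset_size, replace=True)
        X_train = np.vstack([X_pos, X_unl[idx]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        clf = SVC(probability=True).fit(X_train, y_train)

        # Score only the out-of-bag unlabeled samples with this classifier.
        oob = np.setdiff1d(np.arange(len(X_unl)), idx)
        votes[oob] += clf.predict_proba(X_unl[oob])[:, 1]
        counts[oob] += 1

    return votes / np.maximum(counts, 1)  # averaged positive score per sample
```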
Traditional PU learning methods have not usually considered scalability concerns such as computational efficiency and overfitting. [78] proposes an iterative algorithm, USMO (Unlabeled data in Sequential Minimal Optimization), that uses Gram matrices and breaks the problem down into smaller subproblems, which the authors argue is both computationally efficient and theoretically optimal. [79] also focuses on
scalability by introducing a double hinge convex loss function.
Improved computational efficiency is shown experimentally. In
addition to softening the SCAR assumption and selection bias as
described above, [64] also handles overfitting by introducing a
non-negative risk estimator that allows the use of flexible, deep
learning algorithms without the propensity to overfit that
many methods have. [80] proposes a solution that deals with
large-scale datasets by introducing a closed-form classifier
under certain conditions. [81] analyzes some of the theory
behind PU learning and discusses how different convex
representations may be more effective than others due to bias.
A new and promising method takes a generative approach to PU learning, using a deep learning Generative Adversarial Network (GAN) to identify both the positive and negative data distributions [65]. Their algorithm, GenPU, uses
two generative agents, one to generate positive samples and one
for negative samples, and three discriminator agents – one each
for the positive, unlabeled, and negative classes. The authors
provide a theoretical analysis that claims GenPU can recover
both positive and negative data distributions at equilibrium.
5 Conclusion
This paper illustrates the importance of both the Positive and Unlabeled learning problem and its applications. A surprising number of important scenarios naturally contain a small amount of positively labeled data and a large amount of unlabeled data.
Some of these applications, such as text classification, have been extensively studied using PU learning (see section 2.2 above). Other problems, such as disease gene identification, molecular classification, and drug identification, are still in their infancy (see section 2.1). In addition, we have included several problems that have never or only rarely been investigated using PU learning, such as image, sound, and video classification and various security applications (see section 2.3).
In addition to applications, we have surveyed a variety of old
and new algorithms for solving the PU Learning problem. We
have attempted to provide a broad overview of methods in the
literature.
There is significant opportunity for further research in this field. A useful next step would be a benchmarking study that applies the existing algorithms to a variety of known, labeled datasets for comparison.
5.1 Additional Reading on Machine Learning
Machine Learning (ML) is the science of finding patterns in big
data. Many algorithms exist today, but most can be generally
grouped into one of three learning paradigms: supervised
learning, unsupervised learning, and reinforcement learning.
As mentioned briefly in the introduction, the appropriate
type of learning algorithm depends on the data available for
learning. If the training data includes the desired outputs to be
learned (labels), then a supervised learning algorithm is used. If
the desired outputs or labels are unknown, the problem falls into
the unsupervised paradigm. Reinforcement learning is different in that it is generally agent-based, using a stochastic exploration/reward structure to learn a behavior or path in such a way as to maximize a reward.
Many different learning tasks can be accomplished within
these general ML archetypes. Supervised learning algorithms
usually provide real-valued predictions (called regression) or
discrete class identification (classification). For each task,
various algorithms exist to solve it. For example, regression
problems for supervised, real-valued predictions can be learned
using linear regression, simple artificial neural networks
(ANNs), and various deep learning techniques such as recurrent
neural networks (RNNs). Classification problems can be solved
using support vector machines (SVMs), logistic regression, and
various ANNs and deep learning algorithms such as
convolutional neural networks (CNNs).
Unlike supervised learning, unsupervised learning is often used to find hidden or unknown patterns in data. This can be used for dimensionality reduction, as in principal component analysis (PCA), or with complex deep neural network autoencoders that learn more effective encodings of the data. Unsupervised algorithms such
as k-means, spectral clustering, ANNs, and deep learning
algorithms can also be used for clustering and other pattern
identification purposes.
Reinforcement learning algorithms can be used for non-
convex optimization problems using search algorithms such as
genetic algorithms and simulated annealing, and for learning
behavior in areas such as robotics. ANNs and deep learning
techniques can be used for reinforcement learning, along with
more recent algorithms such as generative adversarial networks
(GANs) [65]. GANs also fit well in both supervised and
unsupervised scenarios.
Combinations of many of these supervised, unsupervised,
and even reinforcement learning algorithms are used for solving
the semi-supervised Positive and Unlabeled learning problem
described in this paper. Further approaches and combinations
of methods and algorithms will continue to improve solutions
to the PU learning problem. To learn more about Machine
Learning, we recommend looking into these books [4], [5],
[82], [83] and papers [6], [84]–[87] on the subject. Additional
papers on ML and the PU problem can be found in the
references [88]–[107].
[47] Y. Xu, C. Xu, C. Xu, and D. Tao, “Multi-positive and unlabeled
learning,” in IJCAI , Melbourne, Australia, IJCAI, pp. 3182–88, Aug.
[48] F. Chiaroni, M. C. Rahal, N. Hueber, and F. Dufaux, “Learning with
a generative adversarial network from a positive unlabeled dataset for
image classification,” in ICIP , Athens, Greece, IEEE, pp. 1368–
1372, Oct. 2018.
[49] C. Gong, H. Shi, J. Yang, J. Yang, and J. Yanga, “Multi-Manifold
Positive and Unlabeled Learning for Visual Analysis,” TCSVT, pp. 1–14, Mar. 2019.
[50] M. Kato, T. Teshima, and J. Honda, “Learning from Positive and
Unlabeled Data with a Selection Bias,” ICLR , May 2019.
[51] L. de Carvalho Pagliosa and R. F. de Mello, “Semi-supervised time
series classification on positive and unlabeled problems using cross-
recurrence quantification analysis,” Pattern Recognit., vol. 80, pp. 53–63, Aug. 2018.
[52] M. N. Nguyen, X. L. Li, and S. K. Ng, “Positive unlabeled learning
for time series classification,” IJCAI , pp. 1421–1426, 2011.
[53] X.-L. Li, P. S. Yu, B. Liu, and S.-K. Ng, “Positive Unlabeled
Learning for Data Stream Classification,” in SDM, Sparks, SIAM, pp. 259–270, Apr. 2009.
[54] J. Zhang, Z. Wang, J. Meng, Y. P. Tan, and J. Yuan, “Boosting
positive and unlabeled learning for anomaly detection with multi-
features,” IEEE Trans. Multimed. , vol. 21, no. 5, pp. 1332–1344, May
[55] A. Kumar and B. Raj, “Audio event detection using weakly labeled
data,” in ACM MM, New York, New York, USA, ACM Press, pp. 1038–1047, 2016.
[56] G. Blanchard, G. Lee, and C. Scott, “Semi-supervised novelty
detection,” JMLR , vol. 11, pp. 2973–3009, Nov. 2010.
[57] E. Pedersen et al. , “PV Array Fault Detection using Radial Basis
Networks,” in IISA , Patras, Greece, IEEE, Jul. 2019.
[58] S. Rao, A. Spanias, and C. Tepedelenlioglu, “Solar Array Fault
Detection using Neural Networks,” in ICPS , Taipei, Taiwan, Institute
of Electrical and Electronics Engineers (IEEE), pp. 196–200, May
[59] A. S. Spanias, “Solar energy management as an Internet of Things
(IoT) application,” in IISA, Larnaca, Cyprus, IEEE, vol. 2018-January,
pp. 1–4, Aug. 2017.
[60] V. S. Narayanaswamy, R. Ayyanar, A. Spanias, C. Tepedelenlioglu,
and D. Srinivasan, “Connection Topology Optimization in
Photovoltaic Arrays using Neural Networks,” in ICPS , Taipei,
Taiwan, IEEE, pp. 167–172, May 2019.
[61] R. Ramakrishna, A. Scaglione, A. Spanias, and C. Tepedelenlioglu,
“Distributed Bayesian Estimation with Low-rank Data: Application
to Solar Array Processing,” in ICASSP , Brighton, UK, IEEE, vol.
2019 - May, pp. 4440–4444, 2019.
[62] J. Bekker and J. Davis, “Learning From Positive and Unlabeled Data:
A Survey,” ArXiv , Nov. 2018.
[63] C. Elkan and K. Noto, “Learning classifiers from only positive and
unlabeled data,” in SIGKDD , Las Vegas, ACM, pp. 213–20, Aug.
[64] R. Kiryo, G. Niu, M. C. du Plessis, and M. Sugiyama, “Positive-
Unlabeled Learning with Non-Negative Risk Estimator,” in NIPS ,
Long Beach, Curran Assoc. Inc., pp. 1674–84, Dec. 2017.
[65] M. Hou, B. Chaib-Draa, C. Li, and Q. Zhao, “Generative Adversarial
Positive-Unlabelled Learning,” in IJCAI , Stockholm, Sweden,
International Joint Conferences on Artificial Intelligence, Jul. 2018.
[66] J. Bekker and J. Davis, “Learning from Positive and Unlabeled Data
under the Selected At Random Assumption,” in Proc. Second Int.
Work. Learn. with Imbalanced Domains Theory Appl., Dublin, Ireland, PMLR, vol. 94, pp. 8–22, Sep. 2018.
[67] Y. Zhang, X. C. Ju, and Y. J. Tian, “Nonparallel hyperplane support
vector machine for PU learning,” in ICNC, Xiamen, China, IEEE, pp. 703–708, Aug. 2014.
[68] X.-L. Li and B. Liu, “Learning from Positive and Unlabeled
Examples with Different Data Distributions,” ECML PKDD, pp. 218–229, Jan. 2005.
[69] Z. Liu, W. Shi, D. Li, and Q. Qin, “Partially Supervised
Classification: Based on Weighted Unlabeled Samples Support
Vector Machine,” IJDWM , vol. 2, no. 3, pp. 42–56, 2006.
[70] K. Jaskie, C. Elkan, and A. S. Spanias, “A Modified Logistic
Regression for Positive and Unlabeled Learning,” in ACSSC , Pacific
Grove, California, IEEE, Nov. 2019.
[71] A. K. Menon, B. Van Rooyen, C. S. Oong, and R. C. Williamson,
“Learning from Corrupted Binary Labels via Class-Probability
Estimation,” in ICML , Lille, France, JMLR, vol. 37, pp. 125–34, Jul.
[72] T. Liu and D. Tao, “Classification with Noisy Labels by Importance
Reweighting,” TPAMI , vol. 38, no. 3, pp. 447–461, Mar. 2016.
[73] C. Scott, “A rate of convergence for mixture proportion estimation,
with application to learning from noisy labels,” in AISTATS , San
Diego, JMLR and Microtome, vol. 38, pp. 838–846, May 2015.
[74] Y. H. Shao, W. J. Chen, L. M. Liu, and N. Y. Deng, “Laplacian unit-
hyperplane learning from positive and unlabeled examples,” Inf. Sci.
(Ny). , vol. 314, no. 1, pp. 152–168, Sep. 2015.
[75] A. T. Smith and C. Elkan, “Making generative classifiers robust to
selection bias,” in SIGKDD , San Jose, ACM, pp. 657–666, Aug.
[76] F. He, G. I. Webb, T. Liu, and D. Tao, “Instance-Dependent PU
Learning by Bayesian Optimal Relabeling,” ArXiv , Aug. 2018.
[77] J. Bekker and J. Davis, “Beyond the Selected Completely At Random
Assumption for Learning from Positive and Unlabeled Data,” ArXiv ,
Jun. 2018.
[78] E. Sansone, F. G. B. De Natale, and Z. H. Zhou, “Efficient Training
for Positive Unlabeled Learning,” TPAMI , Jul. 2018.
[79] M. C. du Plessis, G. Niu, and M. Sugiyama, “Convex Formulation
for Learning from Positive and Unlabeled Data,” in ACML , Hong
Kong, China, Springer, pp. 221–236, Nov. 2015.
[80] Y. Kwon, W. Kim, M. Sugiyama, and M. C. Paik, “An analytic
formulation for positive-unlabeled learning via weighted integral
probability metric,” ArXiv , 2019.
[81] M. C. du Plessis, G. Niu, and M. Sugiyama, “Analysis of Learning
from Positive and Unlabeled Data,” NIPS , pp. 703–711, Dec. 2014.
[82] S. Theodoridis, Machine Learning: A Bayesian and Optimization
Perspective. Elsevier Ltd, Mar. 2015.
[83] M. Virvou, E. Alepis, G. A. Tsihrintzis, and L. C. Jain, Eds., Machine
Learning Paradigms, Advances in Learning Analytics. Springer, May
[84] H. Song, J. Thiagarajan, K. Ramamurthy, A. S. Spanias, and P.
Turaga, “Iterative Kernel Fusion for Image Classification,” in
ICASSP , Shanghai, China, IEEE, Mar. 2016.
[85] A. J. Smola and B. Schölkopf, “A tutorial on support vector
regression,” Statistics and Computing , vol. 14, no. 3. pp. 199–222,
Aug-2004.
[86] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing
kernel machines using deep learning,” IEEE Trans. Neural Networks
Learn. Syst. , vol. 29, no. 11, pp. 5528–5540, Feb. 2018.
[87] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind,
“Automatic Differentiation in Machine Learning: a Survey,” JMLR ,
vol. 18, pp. 1–43, Apr. 2018.
[88] M. C. Du Plessis, G. Niu, and M. Sugiyama, “Class-prior estimation
for learning from positive and unlabeled data,” in ACML , Hong
Kong, China, Asian Conference on Machine Learning, pp. 221–236,
Nov. 2015.
[89] J. He, Y. Zhang, X. Li, and Y. Wang, “Naive Bayes Classifier for
Positive Unlabeled Learning with Uncertainty *,” in SDM ,
Columbus, SIAM, p. 12, Apr. 2010.
[90] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data
with Co-Training,” Proc. 11th Annu. Conf. Comput. Learn. Theory ,
pp. 92–100, Jul. 1998.
[91] K. Pelckmans and J. A. K. Suykens, “Transductively learning from
positive examples only,” in ESANN , Bruges, Belgium, ESANN, vol.
08, no. April 2009, pp. 2007–2011, Apr. 2009.
[92] S. Dan and N. S. Cardell, “Estimating logistic regression models
when the dependent variable has no variance,” Commun. Stat. -
Theory and Methods , vol. 21, no. 2, pp. 423–450, Jun. 2007.
[93] S. Jain, M. White, and P. Radivojac, “Estimating the class prior and
posterior from noisy positives and unlabeled data,” NIPS , pp. 2693–
2701, Dec. 2016.
[94] J. Bekker and J. Davis, “Estimating the Class Prior in Positive and
Unlabeled Data Through Decision Tree Induction,” in AAAI , New
Orleans, Louisiana, AAAI Press, pp. 2712–2719, Feb. 2018.
[95] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C.
Williamson, “Estimating the Support of a High-Dimensional
Distribution,” Neural Comput. , vol. 13, no. 7, pp. 1443–1471, Jul.
[96] M. C. Du Plessis and M. Sugiyama, “Class Prior Estimation from
Positive and Unlabeled Data,” IEICE Trans. Inf. Syst. , vol. E96-D,
no. 5, pp. 1358–1362, 2014.
[97] B. Zhang and W. Zuo, “Learning from positive and unlabeled
examples: A survey,” in ISIP WMWA , Moscow, Russia, IEEE, May
[98] H. G. Ramaswamy, C. Scott, and A. Tewari, “Mixture Proportion
Estimation via Kernel Embedding of Distributions,” in ICML , New
York, JMLR, vol. 48, pp. 2052–60, Jun. 2016.
[99] J. T. Zhou, Q. Mao, I. W. Tsang, and S. J. Pan, “Multi-view positive
and unlabeled learning,” in ACML , Singapore, Singapore, JMLR, vol.
25, pp. 555–570, Nov. 2012.
[100] S. Jain, M. White, M. W. Trosset, and P. Radivojac, “Nonparametric
semi-supervised learning of class proportions,” Jan. 2016.
[101] S. S. Khan and M. G. Madden, “One-class classification: Taxonomy
of study and review of techniques,” Knowl. Eng. Rev. , vol. 29, no. 3,
pp. 345–374, Jun. 2014.
[102] D. Ienco and R. G. Pensa, “Positive and unlabeled learning in
categorical data,” Neurocomputing , vol. 196, pp. 113–124, Jul. 2016.
[103] P. Yang, W. Liu, and J. Yang, “Positive unlabeled learning via
wrapper-based adaptive sampling,” in IJCAI , Melbourne, Australia,
IJCAI, pp. 3273–79, Aug. 2017.
[104] C. Hsieh, N. Natarajan, and I. S. Dhillon, “PU Learning for Matrix
Completion,” in ICML , Lille, France, ICML, vol. 37, pp. 2445–53,
Jul. 2015.
[105] M. C. Du Plessis and M. Sugiyama, “Semi-supervised learning of
class balance under class-prior change by distribution matching,”
Neural Networks , vol. 50, pp. 110–119, Feb. 2014.
[106] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, 2nd ed. John Wiley & Sons, Incorporated, Sep. 2002.
[107] G. Niu, M. C. du Plessis, T. Sakai, Y. Ma, and M. Sugiyama,
“Theoretical Comparisons of Positive-Unlabeled Learning against
Positive-Negative Learning,” in NIPS, Barcelona, Spain, ACM, pp. 1207–1215, Dec. 2016.