




Abstract—This paper will address the Positive and Unlabeled learning problem (PU learning) and its importance in the growing field of semi-supervised learning. In most real-world classification applications, well-labeled data is expensive or impossible to obtain. We can often label a small subset of data as belonging to the class of interest, but it is frequently impractical to manually label all data we are not interested in. We are left with a small set of positive labeled items of interest and a large set of unknown and unlabeled data. Learning a model for this is the PU learning problem.
In this paper, we explore several applications for PU
learning including examples in biological/medical, business,
security, and signal processing. We then survey the literature
for new and existing solutions to the PU learning problem.
Index Terms—PU learning, positive unlabeled learning, machine learning, artificial intelligence, classification
1 Introduction
In the modern world, big data and machine learning provide
many opportunities for knowledge extraction. Data
accumulation is exploding. By some estimates, 2.5 quintillion
bytes of data are generated daily at our current pace, and some
90% of the data in the world was generated in the last two years
[1]. The Internet of Things (IoT) is only accelerating this growth
[2], [3]. Machine learning algorithms use this data to identify
patterns, make connections, and learn representative models.
The most frequently used machine learning algorithms are
supervised or unsupervised. Supervised learning is used when
the “ground truth” or labels of the data are known. Given the
data and the known labels, a model can be created to predict the
unknown label of a new, unfamiliar data sample. When no labels
are available for training, unsupervised learning methods are
used [4]–[6]. See section 5.1 for additional information and
resources on general Machine Learning.
A common task in machine learning is supervised binary
classification. Given data sample and binary label pairs (𝑥, 𝑦),
where 𝑦 is typically considered either positive (𝑦 = 1) or negative (𝑦 = 0), a classification algorithm learns a model
𝑓(𝑥) from the features of these labeled samples. Given a new
data sample with no label, this model then places that unlabeled
sample into either the positive or the negative class. Several
algorithms work well with this problem, including logistic
regression, support vector machines (SVMs), and artificial
neural networks (ANNs) among others.
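For concreteness, the following is a minimal sketch of this conventional fully supervised setting using scikit-learn; the synthetic dataset and the choice of logistic regression are illustrative only and are not drawn from any of the cited works.

```python
# Minimal supervised binary classification sketch (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data: every sample has a known binary label y in {0, 1}.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn f(x) from labeled (x, y) pairs, then predict labels for unseen samples.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```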
While enormous quantities of data are accumulated in the
modern world, labeling these can be prohibitively expensive and
time-consuming, if even possible, making supervised algorithms
unusable. Semi-supervised learning addresses the problem
where labels are not known for all training samples. Many real-
world classification problems inherently have a small number of
known positive data, no known negative data, and a large
quantity of unlabeled data. This is known as the Positive and
Unlabeled learning problem (PU learning). The difference
between traditional binary classification and PU classification is
illustrated in Figure 1.
Figure 1: Illustration of a Positive Unlabeled binary classification problem.
One application of PU learning involves remote sensing -
including identifying objects in satellite images. In a database of
satellite images, some number of images may be labeled as
containing the item of interest, but there are too many images
for all to be labeled. We would like to classify all the images in
the database as either containing the item of interest or not. This
and many more PU learning applications will be described in
more detail in Section 2.
This paper will address the PU learning problem as follows.
Section 2 will describe several applications for PU learning.
Section 3 will discuss some of the assumptions that are
fundamental to several of the problem solutions. Section 4
surveys the existing methods of solving the PU learning
problem. Section 5 concludes and provides a brief overview into
Machine Learning along with suggested reading and a
discussion of future work and opportunities.
2 Applications
A surprisingly large number of real-world classification
problems naturally fall into the Positive and Unlabeled learning
scenario (PU learning). This section provides a brief survey of
those topics found in the literature and ideas for potential future
applications. Current uses emphasize medical/biological and
business applications. Signal processing and security
applications are underrepresented in the literature and are
discussed in section 2.3.
2.1 Medical and Biological Applications
Many medical situations are a natural fit for the PU learning
problem. One example is identifying [7]–[9] or priority ranking
[10]–[12] genes or gene combinations that influence disease
incidence. Many diseases including cancer, Alzheimer’s
disease, cystic fibrosis, sickle cell anemia, and even anxiety
disorders are influenced by genetics. A small set of genes is known to cause or influence a specified disease; these compose the positive set. Very little is known about how all other genes influence the disease of interest; these remain unlabeled.
In addition to genetic analysis, diseases such as cancer may be
detectable by examining general patient records including blood
test results and patient history [13]–[15].
Virtual screening of drug compounds, that is, identifying which drug compounds could be useful in treating a given disease, is another application for PU learning [11]. Drugs and chemicals
known to be effective against a given disease make up the
positive set. All other compounds in a drug or chemical database
make up the unknown set. [16] uses PU learning to identify
transport proteins from a general protein database. [17]
reconstructs gene regulatory networks from gene expression
data, identifying which gene pairs are most likely to interact.
[18] identifies small non-coding RNA (ncRNA) genes from
intergenic sequences. A related, though more difficult learning
problem – that of learning only from the proportions of the
positive and unlabeled sets – is being used for embryo selection
in assisted reproduction [19]. In this scenario, only the most
viable embryos should be implanted, and a variant of PU
learning can be used to do this. Growing medical fields
concerning gut microbiome analysis and epigenetics could also
benefit from this type of learning.
PU learning for ecological [20] and environmental monitoring [21], [22] is a scenario that has been only lightly touched upon in the literature. [20] discusses using PU learning
to identify species presence. Determining species absence in a
region is difficult and expensive. Geographical regions with
reported sightings of the species of interest make up the positive
set, while all other areas remain in the unknown set.
2.2 Business Applications
Some of the best studied applications for PU learning involve
text classification or document classification [10], [23], [32]–
[37], [24]–[31]. This can include categorizing the subject of a
paper, webpage, or email. One common application is email
spam identification. Users identify some emails as spam, and
these make up the positive class. All other emails are considered
unknown. Search topic identification/classification [10] and
web page text retrieval and classification [11], [15], [30]–[32]
are other important text applications.
Recommender systems can be considered as PU learning
opportunities. For web page recommenders, a user’s browsing
history or bookmarks make up the positive set. All other
webpages constitute the unknown set. From this, web pages of
interest can be recommended [13], [33], [38]. This could also be
useful in recommending movies/TV shows or social media
contacts or posts that a user would enjoy. Every show, post, or
group that is “liked” by a user would make up the positive set,
while all others would be unknown. [39] attempts to predict
which subjects would interest a politician by looking at their past
work. Recommender systems can also suffer from deceptive
reviews. [40] and [41] use PU learning to identify these.
Reject inference for loan approval and other tasks learns
from the applications of both accepted and rejected individuals.
The behavior of rejected individuals is unknown and thus fits
within the Positive and Unlabeled framework. [42] applies PU
learning to reject inference problems which can include
epidemiology, econometrics, and clinical trial evaluation along
with the more standard financial applications.
Direct, or targeted, marketing allows a business to save a
significant amount of money and is a natural match for PU
learning. Known customer profiles compose the positive set, and large unknown databases of customer information make up the unlabeled set [13], [38], [43], [44]. Fraud detection is another
application for which PU learning could be useful [45], [46].
The positive set could be composed of known fraudulent
transactions, while all others would be unlabeled.
2.3 Security and Signal Processing Applications
Security and signal processing applications are severely
underrepresented in the Positive and Unlabeled learning
domain. Initial forays into image classification [10], [15], [22],
[47]–[50], have explored satellite image land-type classification
and facial authentication. Classifying satellite images, radar
images, and others is a natural fit for PU learning. A subset of
objects from the desired class are manually labeled, and all
others are classified using PU learning. This could be used to
identify man-made objects in satellite imagery such as unknown
archeological sites or new military installations and build-ups.
distance metric and SVM to identify negatives in PSoL. [15]
creates an algorithm called SVMC that uses the margin
maximization property of an SVM. [67] proposes NPSVM, a
nonparallel hyperplane SVM, to improve performance. A more recent algorithm, described in [30], builds on the 1-DNF strategy of [32] while introducing an iterative classification step and a voting, or bagging, method for final classification (discussed further below). [68] proposes A-EM, which adds additional
unlabeled samples that are expected to be mostly negative to
better separate the classes.
Step two of this approach involves applying a standard
supervised learning algorithm, possibly iterated, to these
estimated negatives and known positives. The S-EM algorithm
uses an EM (Expectation Maximization) algorithm with a Naive
Bayes classifier. PEBL, Roc-SVM, PSoL, and SVMC use SVMs.
[43] provides an analysis and comparison of several of the
algorithms in this approach. The overall percentage of positive
samples is usually considered unknown or is estimated using domain knowledge. As such, there is no way to independently verify the final classification results of the algorithm, and convergence is usually used as the criterion to stop iterating between labeling and learning.
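The two-step recipe can be sketched as follows. This is a simplified illustration of the general strategy (choose reliable negatives, train a supervised classifier, iterate), not a reproduction of any one of the algorithms cited above; the distance-to-centroid heuristic, thresholds, and choice of an SVM are our own for the example.

```python
# Simplified two-step PU sketch: (1) pick "reliable negatives" from the
# unlabeled set, (2) train a supervised classifier on P vs. reliable N,
# iterating until the negative set stops growing.
import numpy as np
from sklearn.svm import SVC

def two_step_pu(X_pos, X_unl, n_iter=5, neg_fraction=0.2):
    # Step 1: score unlabeled points by distance to the positive centroid
    # and take the farthest ones as an initial reliable-negative set.
    centroid = X_pos.mean(axis=0)
    dist = np.linalg.norm(X_unl - centroid, axis=1)
    n_neg = max(1, int(neg_fraction * len(X_unl)))
    reliable_neg = np.argsort(dist)[-n_neg:]

    clf = None
    for _ in range(n_iter):
        # Step 2: supervised learning on known positives vs. current negatives.
        X_train = np.vstack([X_pos, X_unl[reliable_neg]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
        clf = SVC(probability=True).fit(X_train, y_train)

        # Re-score the unlabeled set and enlarge the negative set with
        # confidently negative samples for the next iteration.
        p_pos = clf.predict_proba(X_unl)[:, 1]
        new_neg = np.where(p_pos < 0.1)[0]
        if set(new_neg) <= set(reliable_neg):
            break
        reliable_neg = np.union1d(reliable_neg, new_neg)
    return clf
```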
4.2 The weighted approach to PU learning
The second approach to positive and unlabeled classification
assigns real valued weights to each unlabeled sample. These
weights represent the likelihood, or conditional probability, that
an unlabeled sample 𝑥 belongs to the positive set or negative set
[20], [33], [69], or both [13], [63]. A standard learning algorithm
then uses the weighted, unlabeled samples as either constantly
weighted negative samples [33], variably weighted negative
samples [20], [69], or as variably weighted positive and negative
samples concurrently [13], [63] to learn a classifier 𝑓(𝑥) ≈ 𝑝(𝑦 = 1 | 𝑥).
Various methods are used to estimate these likelihoods, such as generalized linear models [33], logistic regression [63], a boosted-tree nonlinear logit model [20], soft-margin SVMs with linear kernels and Platt scaling [63], minimum distance to the positive set [69], and validation set experimentation with PrTFIDF [13].
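A minimal sketch of this weighted idea, in the spirit of [63], is shown below: a "labeled vs. unlabeled" model estimates the label frequency on held-out positives, and each unlabeled sample is then used as both a positive and a negative with complementary weights. Details (estimator choice, clipping, validation split) are simplifications of ours, not the exact procedure of [63].

```python
# Weighted PU sketch in the spirit of [63]: unlabeled samples are duplicated
# and treated as both positive and negative with complementary weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def weighted_pu(X_pos, X_unl):
    # Train a "labeled vs. unlabeled" model g(x) ~ p(s = 1 | x).
    X = np.vstack([X_pos, X_unl])
    s = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    X_tr, X_val, s_tr, s_val = train_test_split(X, s, stratify=s, random_state=0)
    g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)

    # Estimate the label frequency c = p(s = 1 | y = 1) on held-out positives.
    c = g.predict_proba(X_val[s_val == 1])[:, 1].mean()

    # Weight each unlabeled sample as positive with weight w(x)
    # and as negative with weight 1 - w(x).
    g_unl = np.clip(g.predict_proba(X_unl)[:, 1], 1e-6, 1 - 1e-6)
    w = np.clip((1 - c) / c * g_unl / (1 - g_unl), 0.0, 1.0)

    # Final classifier: positives with weight 1, plus the unlabeled set
    # used twice with complementary sample weights.
    X_final = np.vstack([X_pos, X_unl, X_unl])
    y_final = np.concatenate([np.ones(len(X_pos)), np.ones(len(X_unl)), np.zeros(len(X_unl))])
    sw = np.concatenate([np.ones(len(X_pos)), w, 1 - w])
    return LogisticRegression(max_iter=1000).fit(X_final, y_final, sample_weight=sw)
```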
Some algorithms, such as [43], use a more general weighting
scheme where weights represent the cost of mislabeling an
unlabeled positive sample as negative and vice versa, rather than
directly weighting each sample’s likelihood. These cost weights
vary according to 𝑝(𝑦 = 1) and are determined experimentally, using an F-score performance measure [43] to determine which values result in the highest performing algorithm. This method avoids the need to estimate the likelihood 𝑝(𝑦 = 1 | 𝑥) or the prior 𝑝(𝑦 = 1) directly, but it is quite slow.
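The cost-weighting idea can be illustrated as below. This sketch treats all unlabeled samples as negatives, searches over asymmetric misclassification costs, and keeps the setting with the best validation F-score; the cost grid, base learner, and use of a plain P-vs-U F-score as the selection criterion are our own simplifications, not the exact procedure of [43].

```python
# Cost-sensitive PU sketch: asymmetric misclassification costs tuned by F-score.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def cost_weighted_pu(X_pos, X_unl, cost_grid=(1, 2, 5, 10, 20)):
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

    best_clf, best_f1 = None, -1.0
    for c_pos in cost_grid:
        # Penalize misclassifying a (rare) positive c_pos times more heavily
        # than misclassifying an unlabeled-as-negative sample.
        clf = LinearSVC(class_weight={1: c_pos, 0: 1.0}, max_iter=5000).fit(X_tr, y_tr)
        f1 = f1_score(y_val, clf.predict(X_val))  # proxy performance score
        if f1 > best_f1:
            best_clf, best_f1 = clf, f1
    return best_clf
```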
A hybrid method called PUDI is used by [8], combining
aspects of the two-step and weighted learning approaches to
identify likely disease genes. Instead of using weights, bins are
used to partition the unlabeled set into four subsets: reliably
negative samples, likely negative samples, weak negative
samples, and likely positive samples. Samples are placed into
each of these bins based on the Euclidean distance between the
feature vectors of the unlabeled sample and a “positive
representative vector” determined by averaging the genes in the
known positive set. Weighted SVMs are then trained on these
sets to create a final classifier.
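The binning step of this hybrid idea can be sketched as follows; the quartile cut-offs are our own choice for illustration and stand in for the distance thresholds used in [8].

```python
# PUDI-style binning sketch (simplified): partition the unlabeled set into
# four subsets by Euclidean distance to a "positive representative vector".
import numpy as np

def pudi_bins(X_pos, X_unl):
    rep = X_pos.mean(axis=0)                    # positive representative vector
    dist = np.linalg.norm(X_unl - rep, axis=1)  # distance of each unlabeled sample
    q1, q2, q3 = np.quantile(dist, [0.25, 0.5, 0.75])
    bins = {
        "likely_positive":   np.where(dist <= q1)[0],
        "weak_negative":     np.where((dist > q1) & (dist <= q2))[0],
        "likely_negative":   np.where((dist > q2) & (dist <= q3))[0],
        "reliably_negative": np.where(dist > q3)[0],
    }
    return bins  # downstream, weighted SVMs are trained on these subsets
```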
A new variant of the weighted approach to PU learning, created by the authors of this paper and described in depth elsewhere [70], builds on [63] by replacing the standard logistic regression with a modified logistic regression that uses an adaptive upper bound. This produces improved results more than 87% of the time as measured by an F-score performance metric.
4.3 The noisy negative approach to PU learning
The third approach to the Positive and Unlabeled learning
problem involves treating the unlabeled set as noisy negatives.
Occasionally the positive set is assumed to be noisy as well,
though the levels of noise are usually class-conditional. A
classifier for this noisy data is then learned using techniques
developed to deal with such scenarios [71]. [33] discusses this
problem and develops a classifier using a weighted logistic
regression algorithm. The authors of [43] offer a classifier they
call the Biased-SVM with a modified cost function to account
for the noisy data. [72] demonstrates that surrogate loss functions with importance reweighting allow any traditional classifier to be modified to work on noisy data. [21] uses a tree-augmented naïve Bayes algorithm called UPTAN, which builds a Bayesian network from dependence information among uncertain attributes, to handle the uncertainty of the noisy data. [73] uses mixture proportion estimation
(MPE) to deal with noisy labels. [74] assumes that most of the
unlabeled data are negative and employs a Laplacian unit-
hyperplane classifier to deal with the resultant noisy data.
4.4 Weakening SCAR to deal with Selection Bias
One of the obstacles in PU learning is dealing with selection
bias. The SCAR assumption described in Section 3 assumes that there is no selection bias, which is often unrealistic in practice. Papers such as [47], [48], [66], [75], and [76] attempt to compensate for this selection bias.
Rather than SCAR, [66] and [77] propose a weaker
assumption, Selected at Random (SAR), which no longer
assumes that positive regions are sampled with consistent
frequency. They introduce an algorithm, SAR-EM, that uses a
propensity score as a function of the attributes and the
expectation maximization (EM) algorithm to solve the PU
learning problem.
Other authors such as [50] attempt to weaken the SCAR
assumption and deal with selection bias by minimizing the
classification risk function using what they call the invariance
of order. That is, they assume that ranking samples by the probability of being labeled positive produces the same order as ranking them by the probability of being positive. They also use a “partial identification”
technique to extract some useful information from a function,
without attempting to identify the entire function.
The authors in [64] also modify the classification risk
function to deal with selection bias. They claim that unbiased
risk estimators will produce negative empirical risks if the
model being used is very flexible. To deal with this, they
introduce a non-negative risk estimator that they argue produces better results in these situations and allows highly flexible deep learning solutions to be used.
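The non-negative risk estimator of [64] can be written down compactly. The sketch below computes this risk for a scoring function evaluated on positive and unlabeled samples with a given class prior; the sigmoid loss is one common choice and the function names are ours.

```python
# Sketch of the non-negative PU risk estimator of [64] for class prior pi.
import numpy as np

def sigmoid_loss(scores, label):
    # l(z, y) = sigmoid(-y * z): small when the score agrees with the label.
    return 1.0 / (1.0 + np.exp(label * scores))

def nn_pu_risk(scores_pos, scores_unl, pi):
    # scores_*: f(x) evaluated on positive and unlabeled samples.
    r_pos = sigmoid_loss(scores_pos, +1).mean()         # positives as positive
    r_pos_as_neg = sigmoid_loss(scores_pos, -1).mean()  # positives as negative
    r_unl_as_neg = sigmoid_loss(scores_unl, -1).mean()  # unlabeled as negative
    neg_risk = r_unl_as_neg - pi * r_pos_as_neg
    # Clamping the negative-class risk at zero is the key modification that
    # keeps flexible models from driving the empirical risk below zero.
    return pi * r_pos + max(0.0, neg_risk)
```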
4.5 Other techniques
In recent years, heuristic optimizations for the Positive and Unlabeled learning problem, such as bagging, have gained traction.
Bootstrap aggregating, or bagging, has become a popular
approach to increase classifier performance [10], [54]. [10]
proposes that “by nature, PU learning problems have a particular
structure that leads to instability of classifiers, which can be
advantageously exploited by a bagging-like procedure”.
Bagging involves repeatedly generating random subsets of the
training data to classify. Each classifier then votes on the correct
classification of a new sample. [10] and [11] both use bagging
SVMs, while [30] adds bagging to the 1-DNF approach
described in [32].
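A minimal sketch of PU bagging along these lines is shown below: random subsets of the unlabeled set act as negatives, a base classifier is trained against the positives, and out-of-bag votes are averaged. The base learner, subset size, and number of rounds are illustrative choices of ours.

```python
# PU bagging sketch: repeated random "negative" subsets from the unlabeled
# data, with out-of-bag votes averaged per unlabeled sample.
import numpy as np
from sklearn.svm import SVC

def pu_bagging(X_pos, X_unl, n_rounds=20, subset_size=None, random_state=0):
    rng = np.random.default_rng(random_state)
    subset_size = subset_size or len(X_pos)
    votes = np.zeros(len(X_unl))
    counts = np.zeros(len(X_unl))

    for _ in range(n_rounds):
        # Bootstrap a random "negative" subset from the unlabeled data.
        idx = rng.choice(len(X_unl), size=subset_size, replace=True)
        X_train = np.vstack([X_pos, X_unl[idx]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        clf = SVC(probability=True).fit(X_train, y_train)

        # Score only the out-of-bag unlabeled samples with this classifier.
        oob = np.setdiff1d(np.arange(len(X_unl)), idx)
        votes[oob] += clf.predict_proba(X_unl[oob])[:, 1]
        counts[oob] += 1

    return votes / np.maximum(counts, 1)  # averaged positive score per sample
```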
Traditional PU learning methods have not usually considered scalability concerns such as computational efficiency and overfitting. [78] proposes an iterative algorithm, USMO (Unlabeled data in Sequential Minimal Optimization), that uses Gram matrices and breaks the problem down into smaller subproblems, which the authors argue is both computationally efficient and theoretically optimal. [79] also focuses on
scalability by introducing a double hinge convex loss function.
Improved computational efficiency is shown experimentally. In
addition to softening the SCAR assumption and selection bias as
described above, [64] also handles overfitting by introducing a
non-negative risk estimator that allows the use of flexible, deep
learning algorithms without the propensity to overfit that
many methods have. [80] proposes a solution that deals with
large-scale datasets by introducing a closed-form classifier
under certain conditions. [81] analyzes some of the theory
behind PU learning and discusses how different convex
representations may be more effective than others due to bias.
A new and promising method takes a generative approach to PU learning, using a deep learning Generative Adversarial Network (GAN) to identify both the positive and negative data distributions [65]. Their algorithm, GenPU, uses
two generative agents, one to generate positive samples and one
for negative samples, and three discriminator agents – one each
for the positive, unlabeled, and negative classes. The authors
provide a theoretical analysis that claims GenPU can recover
both positive and negative data distributions at equilibrium.
5 Conclusion
This paper illustrates the importance of both the Positive and Unlabeled learning problem and its applications. A surprising number of important scenarios naturally contain a small amount of positively labeled data and a large amount of unlabeled data.
Some of these applications, such as text classification, have been extensively studied using PU learning (see section 2.2 above). Other problems, such as disease gene identification, molecular classification, and drug identification, are still in their infancy (see section 2.1). In addition, we have included several problems that have never or only rarely been investigated using PU learning, such as image, sound, and video classification and various security applications (see section 2.3).
In addition to applications, we have surveyed a variety of old
and new algorithms for solving the PU Learning problem. We
have attempted to provide a broad overview of methods in the
literature.
There is significant opportunity for further research in this field. A useful next step would be a benchmarking study that applies the existing algorithms to a variety of known, labeled datasets for comparison.
5.1 Additional Reading on Machine Learning
Machine Learning (ML) is the science of finding patterns in big
data. Many algorithms exist today, but most can be generally
grouped into one of three learning paradigms: supervised
learning, unsupervised learning, and reinforcement learning.
As mentioned briefly in the introduction, the appropriate
type of learning algorithm depends on the data available for
learning. If the training data includes the desired outputs to be
learned (labels), then a supervised learning algorithm is used. If
the desired outputs or labels are unknown, the problem falls into
the unsupervised paradigm. Reinforcement learning is different in that it is generally agent-based, using a stochastic exploration/reward structure to learn a behavior or path in such a way as to maximize a reward.
Many different learning tasks can be accomplished within
these general ML archetypes. Supervised learning algorithms
usually provide real-valued predictions (called regression) or
discrete class identification (classification). For each task,
various algorithms exist to solve it. For example, regression
problems for supervised, real-valued predictions can be learned
using linear regression, simple artificial neural networks
(ANNs), and various deep learning techniques such as recurrent
neural networks (RNNs). Classification problems can be solved
using support vector machines (SVMs), logistic regression, and
various ANNs and deep learning algorithms such as
convolutional neural networks (CNNs).
Unlike supervised learning, unsupervised learning is often used to find hidden or unknown patterns in data. This can be used for dimensionality reduction, as in principal component analysis (PCA), or with complex deep neural network autoencoders that learn more effective encodings of the data. Unsupervised algorithms such
as k-means, spectral clustering, ANNs, and deep learning
algorithms can also be used for clustering and other pattern
identification purposes.
Reinforcement learning algorithms can be used for non-
convex optimization problems using search algorithms such as
genetic algorithms and simulated annealing, and for learning
behavior in areas such as robotics. ANNs and deep learning
techniques can be used for reinforcement learning, along with
more recent algorithms such as generative adversarial networks
(GANs) [65]. GANs also fit well in both supervised and
unsupervised scenarios.
Combinations of many of these supervised, unsupervised,
and even reinforcement learning algorithms are used for solving
the semi-supervised Positive and Unlabeled learning problem
described in this paper. Further approaches and combinations
of methods and algorithms will continue to improve solutions
to the PU learning problem. To learn more about Machine
Learning, we recommend looking into these books [4], [5],
[82], [83] and papers [6], [84]–[87] on the subject. Additional
papers on ML and the PU problem can be found in the
references [88]–[107].
[47] Y. Xu, C. Xu, C. Xu, and D. Tao, “Multi-positive and unlabeled
learning,” in IJCAI , Melbourne, Australia, IJCAI, pp. 3182–88, Aug.
[48] F. Chiaroni, M. C. Rahal, N. Hueber, and F. Dufaux, “Learning with
a generative adversarial network from a positive unlabeled dataset for
image classification,” in ICIP , Athens, Greece, IEEE, pp. 1368–
1372, Oct. 2018.
[49] C. Gong, H. Shi, J. Yang, J. Yang, and J. Yanga, “Multi-Manifold
Positive and Unlabeled Learning for Visual Analysis,” TCSVT, pp. 1–14, Mar. 2019.
[50] M. Kato, T. Teshima, and J. Honda, “Learning from Positive and
Unlabeled Data with a Selection Bias,” ICLR , May 2019.
[51] L. de Carvalho Pagliosa and R. F. de Mello, “Semi-supervised time
series classification on positive and unlabeled problems using cross-
recurrence quantification analysis,” Pattern Recognit., vol. 80, pp. 53–63, Aug. 2018.
[52] M. N. Nguyen, X. L. Li, and S. K. Ng, “Positive unlabeled learning
for time series classification,” IJCAI , pp. 1421–1426, 2011.
[53] X.-L. Li, P. S. Yu, B. Liu, and S.-K. Ng, “Positive Unlabeled
Learning for Data Stream Classification,” in SDM, Sparks, SIAM, pp. 259–270, Apr. 2009.
[54] J. Zhang, Z. Wang, J. Meng, Y. P. Tan, and J. Yuan, “Boosting
positive and unlabeled learning for anomaly detection with multi-
features,” IEEE Trans. Multimed. , vol. 21, no. 5, pp. 1332–1344, May
[55] A. Kumar and B. Raj, “Audio event detection using weakly labeled
data,” in ACM MM, New York, New York, USA, ACM Press, pp. 1038–1047, 2016.
[56] G. Blanchard, G. Lee, and C. Scott, “Semi-supervised novelty
detection,” JMLR , vol. 11, pp. 2973–3009, Nov. 2010.
[57] E. Pedersen et al. , “PV Array Fault Detection using Radial Basis
Networks,” in IISA , Patras, Greece, IEEE, Jul. 2019.
[58] S. Rao, A. Spanias, and C. Tepedelenlioglu, “Solar Array Fault
Detection using Neural Networks,” in ICPS , Taipei, Taiwan, Institute
of Electrical and Electronics Engineers (IEEE), pp. 196–200, May
[59] A. S. Spanias, “Solar energy management as an Internet of Things
(IoT) application,” in IISA, Larnaca, Cyprus, IEEE, vol. 2018-January,
pp. 1–4, Aug. 2017.
[60] V. S. Narayanaswamy, R. Ayyanar, A. Spanias, C. Tepedelenlioglu,
and D. Srinivasan, “Connection Topology Optimization in
Photovoltaic Arrays using Neural Networks,” in ICPS , Taipei,
Taiwan, IEEE, pp. 167–172, May 2019.
[61] R. Ramakrishna, A. Scaglione, A. Spanias, and C. Tepedelenlioglu,
“Distributed Bayesian Estimation with Low-rank Data: Application
to Solar Array Processing,” in ICASSP , Brighton, UK, IEEE, vol.
2019 - May, pp. 4440–4444, 2019.
[62] J. Bekker and J. Davis, “Learning From Positive and Unlabeled Data:
A Survey,” ArXiv , Nov. 2018.
[63] C. Elkan and K. Noto, “Learning classifiers from only positive and
unlabeled data,” in SIGKDD , Las Vegas, ACM, pp. 213–20, Aug.
[64] R. Kiryo, G. Niu, M. C. du Plessis, and M. Sugiyama, “Positive-
Unlabeled Learning with Non-Negative Risk Estimator,” in NIPS ,
Long Beach, Curran Assoc. Inc., pp. 1674–84, Dec. 2017.
[65] M. Hou, B. Chaib-Draa, C. Li, and Q. Zhao, “Generative Adversarial
Positive-Unlabelled Learning,” in IJCAI , Stockholm, Sweden,
International Joint Conferences on Artificial Intelligence, Jul. 2018.
[66] J. Bekker and J. Davis, “Learning from Positive and Unlabeled Data
under the Selected At Random Assumption,” in Proc. Second Int.
Work. Learn. with Imbalanced Domains Theory Appl., Dublin, Ireland, PMLR, vol. 94, pp. 8–22, Sep. 2018.
[67] Y. Zhang, X. C. Ju, and Y. J. Tian, “Nonparallel hyperplane support
vector machine for PU learning,” in ICNC, Xiamen, China, IEEE, pp. 703–708, Aug. 2014.
[68] X.-L. Li and B. Liu, “Learning from Positive and Unlabeled
Examples with Different Data Distributions,” ECML PKDD, pp. 218–229, Jan. 2005.
[69] Z. Liu, W. Shi, D. Li, and Q. Qin, “Partially Supervised
Classification: Based on Weighted Unlabeled Samples Support
Vector Machine,” IJDWM , vol. 2, no. 3, pp. 42–56, 2006.
[70] K. Jaskie, C. Elkan, and A. S. Spanias, “A Modified Logistic
Regression for Positive and Unlabeled Learning,” in ACSSC , Pacific
Grove, California, IEEE, Nov. 2019.
[71] A. K. Menon, B. Van Rooyen, C. S. Oong, and R. C. Williamson,
“Learning from Corrupted Binary Labels via Class-Probability
Estimation,” in ICML , Lille, France, JMLR, vol. 37, pp. 125–34, Jul.
[72] T. Liu and D. Tao, “Classification with Noisy Labels by Importance
Reweighting,” TPAMI , vol. 38, no. 3, pp. 447–461, Mar. 2016.
[73] C. Scott, “A rate of convergence for mixture proportion estimation,
with application to learning from noisy labels,” in AISTATS , San
Diego, JMLR and Microtome, vol. 38, pp. 838–846, May 2015.
[74] Y. H. Shao, W. J. Chen, L. M. Liu, and N. Y. Deng, “Laplacian unit-
hyperplane learning from positive and unlabeled examples,” Inf. Sci.
(Ny). , vol. 314, no. 1, pp. 152–168, Sep. 2015.
[75] A. T. Smith and C. Elkan, “Making generative classifiers robust to
selection bias,” in SIGKDD , San Jose, ACM, pp. 657–666, Aug.
[76] F. He, G. I. Webb, T. Liu, and D. Tao, “Instance-Dependent PU
Learning by Bayesian Optimal Relabeling,” ArXiv , Aug. 2018.
[77] J. Bekker and J. Davis, “Beyond the Selected Completely At Random
Assumption for Learning from Positive and Unlabeled Data,” ArXiv ,
Jun. 2018.
[78] E. Sansone, F. G. B. De Natale, and Z. H. Zhou, “Efficient Training
for Positive Unlabeled Learning,” TPAMI , Jul. 2018.
[79] M. C. du Plessis, G. Niu, and M. Sugiyama, “Convex Formulation
for Learning from Positive and Unlabeled Data,” in ACML , Hong
Kong, China, Springer, pp. 221–236, Nov. 2015.
[80] Y. Kwon, W. Kim, M. Sugiyama, and M. C. Paik, “An analytic
formulation for positive-unlabeled learning via weighted integral
probability metric,” ArXiv , 2019.
[81] M. C. du Plessis, G. Niu, and M. Sugiyama, “Analysis of Learning
from Positive and Unlabeled Data,” NIPS , pp. 703–711, Dec. 2014.
[82] S. Theodoridis, Machine Learning: A Bayesian and Optimization
Perspective. Elsevier Ltd, Mar. 2015.
[83] M. Virvou, E. Alepis, G. A. Tsihrintzis, and L. C. Jain, Eds., Machine
Learning Paradigms, Advances in Learning Analytics. Springer, May
[84] H. Song, J. Thiagarajan, K. Ramamurthy, A. S. Spanias, and P.
Turaga, “Iterative Kernel Fusion for Image Classification,” in
ICASSP , Shanghai, China, IEEE, Mar. 2016.
[85] A. J. Smola and B. Schölkopf, “A tutorial on support vector
regression,” Statistics and Computing , vol. 14, no. 3. pp. 199–222,
Aug-2004.
[86] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing
kernel machines using deep learning,” IEEE Trans. Neural Networks
Learn. Syst. , vol. 29, no. 11, pp. 5528–5540, Feb. 2018.
[87] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind,
“Automatic Differentiation in Machine Learning: a Survey,” JMLR ,
vol. 18, pp. 1–43, Apr. 2018.
[88] M. C. Du Plessis, G. Niu, and M. Sugiyama, “Class-prior estimation
for learning from positive and unlabeled data,” in ACML , Hong
Kong, China, Asian Conference on Machine Learning, pp. 221–236,
Nov. 2015.
[89] J. He, Y. Zhang, X. Li, and Y. Wang, “Naive Bayes Classifier for
Positive Unlabeled Learning with Uncertainty *,” in SDM ,
Columbus, SIAM, p. 12, Apr. 2010.
[90] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data
with Co-Training,” Proc. 11th Annu. Conf. Comput. Learn. Theory ,
pp. 92–100, Jul. 1998.
[91] K. Pelckmans and J. A. K. Suykens, “Transductively learning from
positive examples only,” in ESANN , Bruges, Belgium, ESANN, vol.
08, no. April 2009, pp. 2007–2011, Apr. 2009.
[92] S. Dan and N. S. Cardell, “Estimating logistic regression models
when the dependent variable has no variance,” Commun. Stat. -
Theory and Methods , vol. 21, no. 2, pp. 423–450, Jun. 2007.
[93] S. Jain, M. White, and P. Radivojac, “Estimating the class prior and
posterior from noisy positives and unlabeled data,” NIPS , pp. 2693–
2701, Dec. 2016.
[94] J. Bekker and J. Davis, “Estimating the Class Prior in Positive and
Unlabeled Data Through Decision Tree Induction,” in AAAI , New
Orleans, Louisiana, AAAI Press, pp. 2712–2719, Feb. 2018.
[95] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C.
Williamson, “Estimating the Support of a High-Dimensional
Distribution,” Neural Comput. , vol. 13, no. 7, pp. 1443–1471, Jul.
[96] M. C. Du Plessis and M. Sugiyama, “Class Prior Estimation from
Positive and Unlabeled Data,” IEICE Trans. Inf. Syst. , vol. E96-D,
no. 5, pp. 1358–1362, 2014.
[97] B. Zhang and W. Zuo, “Learning from positive and unlabeled
examples: A survey,” in ISIP WMWA , Moscow, Russia, IEEE, May
[98] H. G. Ramaswamy, C. Scott, and A. Tewari, “Mixture Proportion
Estimation via Kernel Embedding of Distributions,” in ICML , New
York, JMLR, vol. 48, pp. 2052–60, Jun. 2016.
[99] J. T. Zhou, Q. Mao, I. W. Tsang, and S. J. Pan, “Multi-view positive
and unlabeled learning,” in ACML , Singapore, Singapore, JMLR, vol.
25, pp. 555–570, Nov. 2012.
[100] S. Jain, M. White, M. W. Trosset, and P. Radivojac, “Nonparametric
semi-supervised learning of class proportions,” Jan. 2016.
[101] S. S. Khan and M. G. Madden, “One-class classification: Taxonomy
of study and review of techniques,” Knowl. Eng. Rev. , vol. 29, no. 3,
pp. 345–374, Jun. 2014.
[102] D. Ienco and R. G. Pensa, “Positive and unlabeled learning in
categorical data,” Neurocomputing , vol. 196, pp. 113–124, Jul. 2016.
[103] P. Yang, W. Liu, and J. Yang, “Positive unlabeled learning via
wrapper-based adaptive sampling,” in IJCAI , Melbourne, Australia,
IJCAI, pp. 3273–79, Aug. 2017.
[104] C. Hsieh, N. Natarajan, and I. S. Dhillon, “PU Learning for Matrix
Completion,” in ICML , Lille, France, ICML, vol. 37, pp. 2445–53,
Jul. 2015.
[105] M. C. Du Plessis and M. Sugiyama, “Semi-supervised learning of
class balance under class-prior change by distribution matching,”
Neural Networks , vol. 50, pp. 110–119, Feb. 2014.
[106] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, 2nd ed. John Wiley & Sons, Incorporated, Sep. 2002.
[107] G. Niu, M. C. du Plessis, T. Sakai, Y. Ma, and M. Sugiyama,
“Theoretical Comparisons of Positive-Unlabeled Learning against
Positive-Negative Learning,” in NIPS, Barcelona, Spain, ACM, pp. 1207–1215, Dec. 2016.