We introduce a new observational setting for Positive Unlabeled (PU) data in which observations at prediction time are also labeled. This setting is common in practice; we argue that the additional label information is important for prediction, and call the task "augmented PU prediction". We allow labeling to be feature-dependent. In this scenario, we establish the Bayes classifier and its risk, and compare that risk with the risk of a classifier which, for unlabeled data, is based only on the predictors. We introduce several variants of the empirical Bayes rule for this scenario and investigate their performance. We emphasise the dangers (and the ease) of applying a classical classification rule in the augmented PU scenario: because no prior studies exist, an unaware researcher is prone to skewing the obtained predictions. We conclude that the variant based on a recently proposed variational autoencoder designed for the PU scenario performs on par with or better than the other considered variants and yields an advantage over feature-only methods in terms of accuracy for unlabeled samples.
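
To make the contrast between the augmented rule and a feature-only rule concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard PU convention that only positive observations can be labeled, and it assumes that the class posterior y(x) = P(Y=1 | X=x) and the propensity e(x) = P(S=1 | Y=1, X=x) are known; in practice both would have to be estimated. Under these assumptions Bayes' theorem gives P(Y=1 | X=x, S=0) = (1 - e(x)) y(x) / (1 - e(x) y(x)). The function names and the symbols y, e, and s are illustrative choices, not notation taken from the abstract.

```python
def posterior_unlabeled(y_x: float, e_x: float) -> float:
    """P(Y=1 | X=x, S=0), assuming only positives can be labeled.

    y_x: assumed-known class posterior P(Y=1 | X=x)
    e_x: assumed-known propensity P(S=1 | Y=1, X=x)
    Requires e_x * y_x < 1 so that P(S=0 | X=x) > 0.
    """
    return (1.0 - e_x) * y_x / (1.0 - e_x * y_x)


def augmented_predict(y_x: float, e_x: float, s: int) -> int:
    """Bayes-type rule that uses the labeling indicator s at prediction time."""
    if s == 1:
        # A labeled observation is positive by the PU assumption.
        return 1
    # For an unlabeled observation, threshold the posterior given S=0.
    return int(posterior_unlabeled(y_x, e_x) > 0.5)


def feature_only_predict(y_x: float) -> int:
    """Classical rule that ignores s and thresholds P(Y=1 | X=x)."""
    return int(y_x > 0.5)


if __name__ == "__main__":
    # Illustrative values: y(x) = 0.6, e(x) = 0.5, observation unlabeled.
    print(posterior_unlabeled(0.6, 0.5))
    print(augmented_predict(0.6, 0.5, s=0), feature_only_predict(0.6))
```

With these illustrative values the two rules disagree on the unlabeled observation: being unlabeled lowers the probability of the positive class below 0.5, which the feature-only rule cannot take into account.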

Augmented Prediction for Positive Unlabeled Data Under Positive Selection Bias