Weakly supervised learning faces two challenges that hinder practical deployment: applicability across scenarios with diverse forms of weak supervision, and scalability, which is limited by the complexity of existing algorithms. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation that accommodates various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an algorithm that significantly reduces the computational cost of EM by using a Non-deterministic Finite Automaton (NFA) together with a forward-backward algorithm, lowering the time complexity from the quadratic or factorial cost often required by existing solutions to linear. Learning from arbitrary weak supervision is thereby reduced to modeling that supervision as an NFA. GLWS not only improves the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.
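
To make the NFA and forward-backward idea concrete, the following is a minimal sketch of an E-step for one simple kind of aggregate supervision. It is not the paper's implementation. It assumes a group of instances with independent binary labels and the weak signal "at least one instance in the group is positive"; the function name `posterior_marginals` and this choice of supervision are illustrative assumptions. The NFA has two states (no positive seen yet, at least one positive seen), so one forward pass and one backward pass take time linear in the number of instances.

```python
def posterior_marginals(p_pos):
    """E-step sketch: P(y_i = 1 | at least one y_j = 1) for independent binary labels.

    p_pos[i] is the model's current probability that instance i is positive.
    Hypothetical two-state NFA (an illustrative assumption, not the paper's code):
      state 0 = no positive label read so far
      state 1 = at least one positive label read (the only accepting state)
    """
    n = len(p_pos)

    # forward[i][s]: probability mass of label prefixes of length i that end in state s
    forward = [[0.0, 0.0] for _ in range(n + 1)]
    forward[0][0] = 1.0
    for i, p in enumerate(p_pos):
        forward[i + 1][0] = forward[i][0] * (1.0 - p)          # read 0, stay in state 0
        forward[i + 1][1] = forward[i][0] * p + forward[i][1]  # read 1, or already in state 1

    # backward[i][s]: probability that the suffix starting at position i,
    # entered in state s, ends in the accepting state
    backward = [[0.0, 0.0] for _ in range(n + 1)]
    backward[n][1] = 1.0
    for i in range(n - 1, -1, -1):
        p = p_pos[i]
        backward[i][0] = (1.0 - p) * backward[i + 1][0] + p * backward[i + 1][1]
        backward[i][1] = backward[i + 1][1]  # state 1 is absorbing

    evidence = forward[n][1]  # probability that the weak supervision holds
    if evidence == 0.0:
        raise ValueError("the weak supervision has zero probability under p_pos")

    # Every transition that reads label 1 at position i lands in state 1.
    posteriors = [
        (forward[i][0] + forward[i][1]) * p * backward[i + 1][1] / evidence
        for i, p in enumerate(p_pos)
    ]
    return posteriors, evidence
```

In an EM loop, the returned posteriors would serve as soft targets for the M-step, which updates the classifier that produces `p_pos`. Other supervision types would need a different automaton, with more states where the constraint requires them.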

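This paper introduces a general framework for learning from weak supervision (GLWS). At its core is an Expectation-Maximization (EM) formulation that flexibly handles various sources of weak supervision, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We also propose an algorithm that uses a Non-deterministic Finite Automaton (NFA) and the forward-backward algorithm to significantly simplify the computation required by EM, reducing the time complexity from the quadratic or factorial cost typical of existing solutions to linear. Learning from arbitrary weak supervision is thus converted into modeling it as an NFA. GLWS not only improves the scalability of machine learning models but also shows superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further development and practical application in this field.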

A General Framework for Learning from Weak Supervision