This paper examines various methods of computing uncertainty and diversity for active learning in genetic programming. We found that the model population in genetic programming can be exploited to select informative training data points by using a model ensemble combined with an uncertainty metric. We explored several uncertainty metrics and found that differential entropy performed the best. We also compared two data diversity metrics and found that correlation as a diversity metric performs better than minimum Euclidean distance, although there are some drawbacks that prevent correlation from being used on all problems. Finally, we combined uncertainty and diversity using a Pareto optimization approach to allow both to be considered in a balanced way to guide the selection of informative and unique data points for training.
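
As an illustration of how these pieces could fit together, the following is a minimal sketch and not the paper's implementation. It assumes that ensemble uncertainty at a candidate point is the differential entropy of a Gaussian fitted to the ensemble's predictions, uses minimum Euclidean distance to the existing training set as the diversity score (the simpler of the two diversity metrics compared), and selects the candidates that are non-dominated on both scores. All function names and array shapes are hypothetical.

```python
import numpy as np

def differential_entropy(preds):
    """Per-candidate uncertainty from ensemble predictions.

    preds has shape (n_models, n_candidates). Assumption: predictions at
    each candidate are treated as Gaussian, so the differential entropy
    is 0.5 * log(2 * pi * e * variance).
    """
    var = preds.var(axis=0, ddof=1)
    # Small constant keeps the logarithm finite when all models agree.
    return 0.5 * np.log(2.0 * np.pi * np.e * (var + 1e-12))

def min_euclidean_distance(candidates, train_X):
    """Per-candidate diversity: distance to the nearest training point.

    candidates has shape (n_candidates, n_features),
    train_X has shape (n_train, n_features).
    """
    diffs = candidates[:, None, :] - train_X[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def pareto_front(scores):
    """Indices of non-dominated rows when every column is maximized."""
    front = []
    for i in range(scores.shape[0]):
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if not np.any(at_least_as_good & strictly_better):
            front.append(i)
    return front

def select_points(preds, candidates, train_X):
    """Candidates that are Pareto-optimal in (uncertainty, diversity)."""
    scores = np.column_stack([
        differential_entropy(preds),
        min_euclidean_distance(candidates, train_X),
    ])
    return pareto_front(scores)
```

Under these assumptions, `select_points` returns the indices of candidate points for which no other candidate is at least as uncertain and at least as far from the training data, with one of the two strictly greater. Swapping in a different uncertainty or diversity metric only requires replacing the corresponding scoring function.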

Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression