This paper attempts to answer a "simple question" that arises when building predictive models with machine learning algorithms. Although diagnostic and predictive models for various diseases have been proposed using data from large cohort studies and machine learning algorithms, their generalizability remains a challenge. Several causes have been pointed out, and random partitioning of the dataset is considered to be one of them. In this study, we constructed 33,600 diabetes diagnosis models with "initial state"-dependent randomness using autoML (an automatic machine learning framework) and open diabetes data, and evaluated their prediction accuracy. The results showed that the prediction accuracy follows a distribution that depends on the initial state. Because this distribution may follow a normal distribution, we estimated the expected interval of prediction accuracy using statistical interval estimation so that the accuracy of prediction models can be compared fairly.
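The idea of seed-dependent accuracy and its interval estimate can be illustrated with a minimal sketch. It is not the pipeline used in this study: the synthetic data, the nearest-centroid classifier, the 25% test fraction, and the helper names `make_dataset`, `split_accuracy`, and `normal_interval` are all assumptions made for illustration, and the interval shown (mean ± z × standard deviation under a normality assumption) is only one possible form of interval estimation.

```python
import random
import statistics
from statistics import NormalDist


def make_dataset(n=400, seed=0):
    """Synthetic two-class data (a hypothetical stand-in for the diabetes data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Two overlapping Gaussian features; class 1 is shifted by 1.0.
        x = (rng.gauss(label * 1.0, 1.0), rng.gauss(label * 1.0, 1.0))
        data.append((x, label))
    return data


def split_accuracy(data, seed, test_fraction=0.25):
    """Shuffle with the given seed, split, fit nearest centroids, return test accuracy."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Fit: one centroid per class, computed on the training part only.
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
        )

    correct = sum(predict(x) == y for x, y in test)
    return correct / len(test)


def normal_interval(accuracies, level=0.95):
    """Interval mean +/- z * sd, assuming the accuracies are roughly normal."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


# Same data, same model; only the seed of the split (the "initial state") changes.
data = make_dataset()
accuracies = [split_accuracy(data, seed) for seed in range(50)]
low, high = normal_interval(accuracies)
print(f"min={min(accuracies):.3f} max={max(accuracies):.3f}")
print(f"95% interval under normality: [{low:.3f}, {high:.3f}]")
```

Under this sketch, two models would be compared by their intervals rather than by a single accuracy obtained from one arbitrary seed.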

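This study addresses the instability of prediction accuracy that arises when building predictive models with machine learning algorithms, in particular the challenge posed by random partitioning of the dataset. By constructing and evaluating 33,600 diabetes diagnosis models, the results show that prediction accuracy is affected by the initial state. Statistical interval estimation is therefore used to compare the prediction accuracy of the models fairly, demonstrating the potential of this approach to make model comparison fairer.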

Variation in prediction accuracy due to random data partitioning, and fair evaluation by interval estimation