We propose a distributional framework for assessing socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new relative statistical test based on first- and second-order stochastic dominance of real random variables. We show that the second-order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a \emph{metrics portfolio} for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems, instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
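
To make the notions of first- and second-order stochastic dominance concrete, the following is a minimal sketch on empirical samples, assuming higher metric values are better. It is not the test statistic developed in the paper: the function names (`fsd_gap`, `ssd_gap`, `bootstrap_se`), the threshold grid, and the max-gap summary are illustrative assumptions, and the bootstrap here only estimates the standard error of that summary.

```python
import numpy as np

def empirical_cdf(sample, thresholds):
    """Empirical F(t) = P(X <= t) evaluated at each threshold t."""
    sample = np.asarray(sample, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    return (sample[None, :] <= thresholds[:, None]).mean(axis=1)

def integrated_cdf(sample, thresholds):
    """Empirical integrated CDF, which equals E[(t - X)_+] at each t."""
    sample = np.asarray(sample, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    return np.maximum(thresholds[:, None] - sample[None, :], 0.0).mean(axis=1)

def fsd_gap(x, y, thresholds):
    """max_t [F_x(t) - F_y(t)]; a value <= 0 means x first-order dominates y
    on the grid (hypothetical summary statistic, for illustration only)."""
    return float(np.max(empirical_cdf(x, thresholds) - empirical_cdf(y, thresholds)))

def ssd_gap(x, y, thresholds):
    """Same idea with integrated CDFs; <= 0 means x second-order dominates y."""
    return float(np.max(integrated_cdf(x, thresholds) - integrated_cdf(y, thresholds)))

def bootstrap_se(stat, x, y, thresholds, n_boot=200, seed=0):
    """Bootstrap standard error of stat(x, y, thresholds), resampling each
    sample independently with replacement."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        values.append(stat(xb, yb, thresholds))
    return float(np.std(values, ddof=1))

# Toy scores for two hypothetical models: same mean, but y is more spread out.
grid = np.arange(0, 7, dtype=float)
x = np.array([3.0, 3.0])
y = np.array([2.0, 4.0])
print("FSD gap:", fsd_gap(x, y, grid))  # positive: no first-order dominance
print("SSD gap:", ssd_gap(x, y, grid))  # non-positive: x second-order dominates y
print("bootstrap SE of SSD gap:", bootstrap_se(ssd_gap, x, y, grid))
```

In the toy data the two samples have the same mean, so first-order dominance fails, while the less dispersed sample still dominates at second order. This is the risk-averse preference that mean-risk models encode.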

Risk Assessment and Statistical Significance in the Age of Foundation Models