Purpose: Segmentation of high-speed video (HSV) phase detection (PD) data is vital in nuclear reactors, chemical processing, and electronics cooling, where it is used to detect vapor, liquid, and microlayer phases. Traditional segmentation models struggle with pixel-level accuracy and generalization on multimodal data. MSEG-VCUQ introduces VideoSAM, a hybrid framework that combines convolutional neural networks (CNNs) and transformer-based vision models to improve segmentation accuracy and generalizability across complex multimodal PD tasks.

Methods: VideoSAM combines a U-Net CNN with the Segment Anything Model (SAM) for feature extraction and segmentation across diverse HSV PD modalities, covering fluids such as water, FC-72, nitrogen, and argon under varied heat flux conditions. The framework also incorporates uncertainty quantification (UQ) to assess pixel-based discretization errors, yielding reliable estimates of metrics such as contact line density and dry area fraction under experimental conditions.

Results: VideoSAM outperforms SAM and modality-specific CNN models in segmentation accuracy, particularly in environments with complex phase boundaries, overlapping bubbles, and dynamic liquid-vapor interactions. Its hybrid architecture supports cross-dataset generalization and adapts effectively to varying modalities. The UQ module provides accurate error estimates, which improves the reliability of segmentation outputs for HSV PD research.

Conclusion: MSEG-VCUQ, via VideoSAM, offers a robust solution for HSV PD segmentation that addresses previous limitations by combining deep learning with UQ techniques. The open-source datasets and tools introduced here enable scalable, precise, and adaptable segmentation of multimodal PD datasets, supporting advances in HSV analysis and autonomous experimentation. The code and data used for this paper are publicly available at https://github.com/chikap421/mseg_vcuq
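
To make the two metrics named above concrete, the following is a minimal sketch of how dry area fraction and contact line density might be computed from a binary segmentation mask, together with a crude one-pixel bound on the discretization error. It is an illustration under stated assumptions, not the MSEG-VCUQ implementation: the function names, the definition of dry area fraction as the share of pixels labelled dry, the definition of contact line density as dry/wet boundary length per unit imaged area, and the erosion/dilation bound are all assumptions introduced here.

```python
import numpy as np


def dry_area_fraction(mask):
    """Fraction of pixels labelled dry (True) in a binary mask (assumed definition)."""
    mask = np.asarray(mask, dtype=bool)
    return float(mask.mean())


def contact_line_length_px(mask):
    """Count pixel edges separating dry and wet pixels inside the frame.

    This is a Manhattan-style estimate: it is exact for axis-aligned
    boundaries and overestimates the length of diagonal or curved ones.
    """
    mask = np.asarray(mask, dtype=bool)
    horizontal = np.count_nonzero(mask[:, 1:] != mask[:, :-1])
    vertical = np.count_nonzero(mask[1:, :] != mask[:-1, :])
    return int(horizontal + vertical)


def contact_line_density(mask, pixel_size):
    """Boundary length per unit imaged area (assumed definition), in 1/length units."""
    mask = np.asarray(mask, dtype=bool)
    length = contact_line_length_px(mask) * pixel_size
    area = mask.size * pixel_size ** 2
    return length / area


def _neighbours(mask):
    """Return the mask and its four axis-aligned shifts (frame border replicated)."""
    m = np.pad(np.asarray(mask, dtype=bool), 1, mode="edge")
    return (m[1:-1, 1:-1], m[:-2, 1:-1], m[2:, 1:-1], m[1:-1, :-2], m[1:-1, 2:])


def dry_area_fraction_bounds(mask):
    """Hypothetical one-pixel bound: shrink and grow the dry region by one pixel.

    Assumes the true boundary lies within one pixel of the segmented one.
    """
    c, up, down, left, right = _neighbours(mask)
    eroded = c & up & down & left & right
    dilated = c | up | down | left | right
    return dry_area_fraction(eroded), dry_area_fraction(dilated)


# Usage on a synthetic 6x6 frame with a 2x2 dry patch (illustrative data only).
frame = np.zeros((6, 6), dtype=bool)
frame[2:4, 2:4] = True
print(dry_area_fraction(frame))
print(contact_line_density(frame, pixel_size=0.5))
print(dry_area_fraction_bounds(frame))
```

The gap between the lower and upper bound grows as dry patches get smaller relative to the pixel size, which is one simple way to see why pixel-based discretization matters for these metrics.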

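This study proposes the MSEG-VCUQ framework to address the limited pixel-level accuracy and generalization of traditional segmentation models on multimodal data. The framework combines convolutional neural networks with transformer-based vision models, and uses uncertainty quantification to improve segmentation accuracy and support cross-dataset generalization. Experimental results show that VideoSAM performs better in environments with complex phase boundaries and provides reliable segmentation outputs, advancing high-speed video analysis.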

MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation
  Models, Convolutional Neural Networks, and Uncertainty Quantification for
  High-Speed Video Phase Detection Data
