Large language models (LLMs) not only power a wide range of language tasks but also serve as a general interface across different spaces. To date, however, it has not been demonstrated how to effectively translate the successes of LLMs in computer vision to medical imaging, which involves high-dimensional and multi-modal medical images. In this paper, we report a feasibility study on building a multi-task CT large image-text (LIT) model for lung cancer diagnosis by combining an LLM and a large image model (LIM). Specifically, the LLM and LIM are used as encoders to perceive multi-modal information under task-specific text prompts, synergizing multi-source information with task-specific and patient-specific priors for optimized diagnostic performance. The key components of our LIT model and the associated techniques are evaluated with an emphasis on 3D lung CT analysis. Our initial results show that the LIT model performs multiple medical tasks well, including lung segmentation, lung nodule detection, and lung cancer classification. Active efforts are in progress to develop large image-language models for superior medical imaging in diverse applications and optimal patient outcomes.

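This paper explores how to carry the success of large language models over to specific tasks in medical imaging that involve high-dimensional, multi-modal medical images. It combines a large image model (LIM) with an LLM to build a multi-task CT large image-text (LIT) model for lung cancer diagnosis. The model performs well on medical tasks including lung segmentation, lung nodule detection, and lung cancer classification.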

CT Multi-Task Learning with a Large Image-Text (LIT) Model