Large language models (LLMs) and large visual language models (LVLMs) have
been at the forefront of the artificial intelligence field, particularly for
tasks like text generation, video captioning, and question-answering.
Typically, it is more applicable to train these models on broade