Large Language Models (LLMs) have demonstrated remarkable performance across
various natural language tasks, marking significant strides towards general
artificial intelligence. While progress toward general artificial intelligence
has been driven by increasingly large-scale models, a complementary direction
is to develop lightweight custom models that better serve specific domains,
given the high cost of training and deploying LLMs and the scarcity of
resources. In this paper, we present MindLLM, a novel series of bilingual
lightweight large language models, trained from scratch, alleviating such
burdens by offering models with 1.3 billion and 3 billion parameters. A
thorough account of experiences accrued during large model development is
given, covering every step of the process, including data construction, model
architecture, evaluation, and applications. We hope these insights prove
valuable to fellow academics and developers. MindLLM consistently matches or
surpasses the performance of larger open-source models on some public
benchmarks. We also introduce an innovative instruction tuning framework
tailored for smaller models to enhance their capabilities efficiently.
Moreover, we explore the application of MindLLM in specific vertical domains
such as law and finance, underscoring the agility and adaptability of our
lightweight models.

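MindLLM is a series of bilingual lightweight large language models trained from scratch to ease the burden of training and deploying large language models and to address resource scarcity. The paper shares experience gained during large-model development, introduces an innovative instruction tuning framework suited to smaller models, and explores applications of MindLLM in specific vertical domains such as law and finance.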

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

Scene text editing is a challenging task that involves modifying or inserting
specified texts in an image while maintaining its natural and realistic
appearance. Most previous approaches to this task rely on style-transfer models
that crop out text regions and feed them into image transfer models, such as
GANs. However, these methods are limited in their ability to change text style
and are unable to insert texts into images. Recent advances in diffusion models
have shown promise in overcoming these limitations with text-conditional image
editing. However, our empirical analysis reveals that state-of-the-art
diffusion models struggle with rendering correct text and controlling text
style. To address these problems, we propose DIFFSTE to improve pre-trained
diffusion models with a dual encoder design, which includes a character encoder
for better text legibility and an instruction encoder for better style control.
An instruction tuning framework is introduced to train our model to learn the
mapping from the text instruction to the corresponding image with either the
specified style or the style of the surrounding texts in the background. This
training method further gives our model zero-shot generalization ability
in the following three scenarios: generating text with unseen font variations,
e.g., italic and bold, mixing different fonts to construct a new font, and
using more relaxed forms of natural language as the instructions to guide the
generation task. We evaluate our approach on five datasets and demonstrate its
superior performance in terms of text correctness, image naturalness, and style
controllability. Our code is publicly available.

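DIFFSTE improves pre-trained diffusion models with a dual encoder design. Trained via instruction tuning, it achieves correct text rendering and style control in scene text editing, and it gains zero-shot generalization ability.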