Appearance-based gaze estimation methods built on deep learning are gaining popularity because of their high accuracy and their weaker dependence on environmental constraints. However, existing high-precision models often rely on deeper networks, which leads to large parameter counts, long training times, and slow convergence. To address this issue, this paper proposes FGI-Net (Fusion Global Information), a novel lightweight gaze estimation model. The model fuses global information into the CNN, removing the need to stack many convolution and pooling layers to capture global information indirectly. This reduces model complexity while improving accuracy and convergence speed. To validate the model, extensive experiments are conducted: accuracy is compared against existing classical and lightweight models, convergence speed is compared against models of different architectures, and ablation experiments are performed. Experimental results show that, compared with GazeCaps, the latest gaze estimation model, FGI-Net achieves a smaller angle error (3.74° on MPIIFaceGaze, 5.15° on EyeDiap, 10.50° on Gaze360, and 6.02° on RT-Gene) while reducing parameters and FLOPs by 87.1% and 79.1%, respectively. Moreover, compared with models of other architectures, such as CNN and Transformer, FGI-Net converges quickly to a higher accuracy range in fewer training iterations. When reaching its optimal accuracy on the Gaze360 and EyeDiap datasets, FGI-Net requires 25% and 37.5% fewer training iterations than GazeTR, respectively.
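
The angle errors reported above are presumably the usual mean angular error between predicted and ground-truth 3D gaze directions; the abstract does not state the formula, so the following is only a minimal sketch under that assumption. The function names are hypothetical and are not taken from the FGI-Net code.

```python
import math

def angular_error_deg(pred, truth):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: gaze is given as a 3D direction vector. Datasets that store
    pitch/yaw angles would need a conversion step first (not shown here).
    """
    dot = sum(p * t for p, t in zip(pred, truth))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_truth = math.sqrt(sum(t * t for t in truth))
    if norm_pred == 0.0 or norm_truth == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_pred * norm_truth)))
    return math.degrees(math.acos(cos_sim))

def mean_angular_error_deg(preds, truths):
    """Average the per-sample angular error over a set of predictions."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)
```

Under this metric, a figure such as 3.74° would be the average of `angular_error_deg` over all test samples of a dataset.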

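This study addresses the shortcomings of existing high-precision gaze estimation models in parameter count, training time, and convergence speed by proposing FGI-Net (Fusion Global Information), a novel lightweight gaze estimation model. The model effectively fuses global information, reducing model complexity while improving accuracy and convergence speed. Experimental results show that, compared with existing models on multiple datasets, FGI-Net significantly reduces both angle error and the number of training iterations, demonstrating its superior performance.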

A lightweight gaze estimation model based on fusing global information