Chart summarization is a crucial task for blind and visually impaired
individuals as it is their primary means of accessing and interpreting
graphical data. Crafting high-quality descriptions is challenging because it
requires conveying the chart's essential details precisely to readers who
cannot rely on visual perception. Many chart analysis methods, however, produce
brief, unstructured responses that may contain significant hallucinations,
which undermines their reliability for blind people. To address these challenges, this work
presents three key contributions: (1) We introduce the AltChart dataset,
comprising 10,000 real chart images, each paired with a comprehensive summary
that features long-context, semantically rich annotations. (2) We propose a
new method for pretraining Vision-Language Models (VLMs) to learn fine-grained
chart representations through training with multiple pretext tasks, yielding a
performance gain of ${\sim}2.5\%$. (3) We conduct extensive evaluations of
four leading chart summarization models, analyzing how accessible their
descriptions are. Our dataset and code are publicly available on our project
page: this https URL

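Chart summarization is a crucial task for blind and visually impaired people because it is their primary means of accessing and interpreting graphical data. This work presents three key contributions: it introduces the AltChart dataset, proposes a new method for pretraining vision-language models, and comprehensively evaluates four leading chart summarization models.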

AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks

In recent years, image captioning and segmentation have emerged as crucial
tasks in computer vision, with applications ranging from autonomous driving to
content analysis. Although multiple solutions have emerged to help blind and
visually impaired people move around their environment, few applications help
them understand and reconstruct a scene in their minds through text. Most
existing models focus on helping users move and avoid obstacles, which
restricts the range of environments that blind and visually impaired people
can access.
In this paper, we propose an approach that helps them understand their
surroundings using image captioning. The particularity of our research is that
we offer descriptions with the positions of regions and objects relative to the
user (left, right, front), as well as positional relationships between regions.
We also aim to give them access to theatre plays by applying the solution to
our TS-RGBD dataset.
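
The egocentric position labels described above (left, right, front) can be illustrated with a minimal sketch. This is not the authors' method: the `Region` type, the `front_band` threshold, and the phrase templates are all hypothetical, and the sketch assumes each segmented region is reduced to its horizontal extent in the image.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical segmented region, reduced to its horizontal pixel extent."""
    label: str
    x_min: float
    x_max: float

    @property
    def center(self) -> float:
        return (self.x_min + self.x_max) / 2.0


def egocentric_direction(region: Region, image_width: float,
                         front_band: float = 1 / 3) -> str:
    """Label a region as left, right, or front of the viewer.

    Assumption: the central `front_band` fraction of the image width
    counts as "front"; everything outside it is "left" or "right".
    """
    left_edge = image_width * (1 - front_band) / 2
    right_edge = image_width - left_edge
    if region.center < left_edge:
        return "left"
    if region.center > right_edge:
        return "right"
    return "front"


def relation(a: Region, b: Region) -> str:
    """Horizontal relationship between two regions, based on their centers."""
    if a.center < b.center:
        return f"{a.label} is left of {b.label}"
    if a.center > b.center:
        return f"{a.label} is right of {b.label}"
    return f"{a.label} is aligned with {b.label}"


def describe(regions: list[Region], image_width: float) -> list[str]:
    """Turn regions into short user-relative phrases."""
    phrases = {"left": "to your left", "right": "to your right",
               "front": "in front of you"}
    return [f"{r.label} {phrases[egocentric_direction(r, image_width)]}"
            for r in regions]


if __name__ == "__main__":
    scene = [Region("chair", 0, 100), Region("table", 280, 360),
             Region("curtain", 500, 640)]
    for line in describe(scene, image_width=640):
        print(line)
    print(relation(scene[0], scene[1]))
```

A real system would likely also use depth from the RGB-D input to distinguish near from far regions; the sketch ignores depth to keep the idea of user-relative labelling visible.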

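Using image captioning and segmentation, this work proposes a method that helps blind and visually impaired people understand and reconstruct their environment. The method describes the positions of regions and objects relative to the user (left, right, front), as well as the positional relationships between regions. By applying the solution to the TS-RGBD dataset, it aims to give them access to theatre plays.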

Towards Real Time Egocentric Segment Captioning for The Blind and Visually Impaired in RGB-D Theatre Images

In today's society, where independent living is becoming increasingly
important, everyday life can be extremely constricting for those who are blind.
Blind and visually impaired (BVI) people face challenges because they need
manual support to obtain information about their environment. In this work, we
take a first step towards developing an affordable and high-performing
eye-wearable assistive device, DRISHTI, to provide visual navigation assistance
for BVI people. The system comprises a camera module, an ESP32 processor, a
Bluetooth module, a smartphone, and speakers. Using artificial intelligence,
the system is proposed to detect and understand the nature of the user's path
and the obstacles ahead, and then to inform BVI users about them via audio
output so that they can find their way on their own.
The first step discussed in this paper establishes a proof-of-concept for
achieving the right balance of affordability and performance by testing an
initial software integration of a currency detection algorithm on a low-cost
embedded arrangement. This work lays the foundation for our upcoming work
towards the goal of helping as many BVI people as possible around the globe
move independently.

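This paper introduces DRISHTI, a new wearable assistive device comprising a camera module, an ESP32 processor, a Bluetooth module, a smartphone, and speakers. It is proposed to use artificial intelligence to detect and understand the user's path and the obstacles ahead, and then to give visually impaired users navigation assistance via audio output, with the goal of enabling them to move independently. An initial proof-of-concept on low-cost embedded hardware is used to show that the balance of affordability and performance is feasible.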