May, 2024

母子嵌套多模型

TL;DRM3: Matryoshka Multimodal Models propose nested visual tokens to control visual granularity, offering flexibility in trading off information density v.s. efficiency, and revealing a large gap between fixed-scale representations and the oracle upper bound.