May, 2024
Matryoshka Multimodal Models
TL;DR: M3 (Matryoshka Multimodal Models) proposes nested visual tokens to control visual granularity, offering a flexible trade-off between information density and efficiency, and revealing a large gap between fixed-scale representations and the oracle upper bound.
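To make "nested visual tokens" concrete, here is a minimal sketch of one way to build coarse-to-fine token sets from a single grid of visual tokens. It assumes the coarser sets come from repeated 2x2 average pooling over the token grid. The 8x8 grid size and the names `avg_pool_2x2` and `nested_token_sets` are hypothetical choices for illustration, not taken from the text above.

```python
from typing import List

Token = List[float]        # one visual token = one feature vector
Grid = List[List[Token]]   # H x W grid of visual tokens


def avg_pool_2x2(grid: Grid) -> Grid:
    """Halve each spatial side by averaging every non-overlapping 2x2 block."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("this sketch assumes even grid sides")
    dim = len(grid[0][0])
    pooled: Grid = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            block = [grid[i + di][j + dj] for di in (0, 1) for dj in (0, 1)]
            row.append([sum(tok[d] for tok in block) / 4.0 for d in range(dim)])
        pooled.append(row)
    return pooled


def nested_token_sets(grid: Grid) -> List[Grid]:
    """Return grids from finest to coarsest, ending with a single token.

    Assumption: the grid is square with a power-of-two side length.
    """
    scales = [grid]
    while len(scales[-1]) > 1:
        scales.append(avg_pool_2x2(scales[-1]))
    return scales


# Hypothetical 8x8 grid of 2-dimensional tokens.
grid = [[[float(i), float(j)] for j in range(8)] for i in range(8)]
scales = nested_token_sets(grid)
print([len(s) * len(s[0]) for s in scales])  # token count at each scale
```

Each coarser set is computed entirely from the finer one, so the scales nest like Matryoshka dolls: a consumer can pick a small set when efficiency matters and a large set when fine detail matters.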