This post collects notes on some multimodal papers I have read.

Learning to Prompt for Vision-Language Models

This paper (Learning to Prompt for Vision-Language Models. International Journal of Computer Vision, 130(9), 2337–2348. https://doi.org/10.1007/s11263-022-01653-1) proposes a method for automatically generating the class prompts used with CLIP, instead of writing them by hand.
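To make the idea concrete, here is a minimal sketch of the forward pass, assuming the prompt is a short sequence of shared context vectors followed by a class-name embedding. Everything here is illustrative: `toy_text_encoder` is a hypothetical stand-in (mean pooling) for CLIP's frozen text encoder, the function names are my own, and the temperature value is an assumption. Training is not shown; in practice the context vectors would be the only parameters updated, with a cross-entropy loss on the predicted probabilities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_prompt(context, class_embedding):
    """Prompt = [ctx_1, ..., ctx_M, class token]; the context is shared by all classes."""
    return list(context) + [class_embedding]


def toy_text_encoder(prompt):
    """Hypothetical stand-in for a frozen text encoder: mean-pool the token vectors."""
    dim = len(prompt[0])
    return [sum(token[d] for token in prompt) / len(prompt) for d in range(dim)]


def class_probabilities(image_feature, context, class_embeddings, temperature=0.07):
    """Score an image feature against one prompt per class and normalize with softmax."""
    logits = []
    for class_embedding in class_embeddings:
        text_feature = toy_text_encoder(build_prompt(context, class_embedding))
        logits.append(cosine(image_feature, text_feature) / temperature)
    return softmax(logits)
```

For example, calling `class_probabilities([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])` scores a two-class toy problem with two context vectors; the returned list has one probability per class and favors the first class, whose embedding matches the image feature.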