
ACL 2023 Paper Reading — CLIPTEXT: A New Paradigm for Zero-shot Text ...
December 11, 2023 · CLIP is an effective and scalable method for learning visual concepts from natural-language supervision, and it has achieved surprisingly strong results on a wide range of zero-shot computer vision tasks. CLIP consists of a visual encoder V and a text encoder T.
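A quick note on what "learning from natural-language supervision" means mechanically: V and T are trained with a symmetric contrastive loss over matched (image, text) pairs. A minimal PyTorch sketch; the embedding size and temperature below are illustrative defaults, not values from the post:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of matched (image, text) pairs.

    image_emb, text_emb: [batch, dim] outputs of the visual encoder V and
    the text encoder T; row i of each tensor belongs to the same pair.
    """
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # [batch, batch] similarity matrix; diagonal entries are the positives.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image->text and text->image).
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random embeddings standing in for V(image) and T(text).
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_contrastive_loss(img, txt))
```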
mCLIP: Multilingual CLIP via Cross-lingual Transfer - ACL Anthology
March 26, 2025 · In this paper, we introduce mCLIP, a retrieval-efficient dual-stream multilingual VLP model, trained by aligning the CLIP model and a Multilingual Text Encoder (MTE) through a novel Triangle Cross-modal Knowledge Distillation (TriKD) method. It is parameter-efficient, as only two light projectors on top of the encoders are updated during distillation.
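A minimal sketch of the parameter-efficient setup the abstract describes: both encoders stay frozen and only two light projectors receive gradients. The MSE alignment loss, layer sizes, and class name below are illustrative assumptions; the actual TriKD objective is in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectorAlignment(nn.Module):
    """Frozen CLIP text encoder + frozen multilingual text encoder (MTE);
    only the two linear projectors on top of them are trainable."""

    def __init__(self, clip_text_encoder: nn.Module, mte: nn.Module,
                 clip_dim: int = 512, mte_dim: int = 768, shared_dim: int = 512):
        super().__init__()
        self.clip_text_encoder = clip_text_encoder.eval()
        self.mte = mte.eval()
        for p in self.clip_text_encoder.parameters():
            p.requires_grad = False
        for p in self.mte.parameters():
            p.requires_grad = False
        # The only trainable parameters: one light projector per encoder.
        self.clip_proj = nn.Linear(clip_dim, shared_dim)
        self.mte_proj = nn.Linear(mte_dim, shared_dim)

    def forward(self, english_inputs, multilingual_inputs):
        with torch.no_grad():  # encoders are frozen during distillation
            z_clip = self.clip_text_encoder(english_inputs)
            z_mte = self.mte(multilingual_inputs)
        # Align the two embedding spaces through the projectors only.
        return self.clip_proj(z_clip), self.mte_proj(z_mte)

# Toy usage with linear stand-ins for the real encoders.
model = ProjectorAlignment(nn.Linear(300, 512), nn.Linear(300, 768))
a, b = model(torch.randn(4, 300), torch.randn(4, 300))
loss = F.mse_loss(a, b)  # illustrative alignment loss, not the paper's TriKD
print(loss)
```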
CLIPText: A New Paradigm for Zero-shot Text Classification - ACL …
5 days ago · Specifically, we introduce CLIPText, a novel paradigm for zero-shot text classification, which reformulates zero-shot text classification into a text-image matching problem that CLIP can be applied to. In addition, we further incorporate prompting into CLIPText (Prompt-CLIPText) to better derive knowledge from CLIP.
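The snippet does not spell out the paper's exact text-image matching pipeline, so the sketch below only illustrates the generic prompt-then-match-then-argmax pattern that Prompt-CLIPText builds on; the prompt template, embedding size, and function names are all hypothetical:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(input_emb: torch.Tensor,
                       label_embs: torch.Tensor,
                       labels: list[str]) -> str:
    """Pick the label whose CLIP embedding is most similar to the input's.

    input_emb:  [dim]        CLIP embedding of the text to classify
    label_embs: [num, dim]   CLIP embeddings of one prompt per label
    """
    sims = F.cosine_similarity(input_emb.unsqueeze(0), label_embs, dim=-1)
    return labels[sims.argmax().item()]

labels = ["sports", "politics", "technology"]
# A hypothetical prompt template in the spirit of Prompt-CLIPText.
prompts = [f"A piece of news about {label}." for label in labels]

# Random stand-ins; in practice these come from CLIP's encoders.
input_emb = torch.randn(512)
label_embs = torch.randn(len(labels), 512)
print(zero_shot_classify(input_emb, label_embs, labels))
```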
ACL (Top Venue): Adding Multilingual Text Capability to Multimodal Models - Zhihu
This post follows the paper Cross-lingual and Multilingual CLIP, which uses a Teacher Learning approach to distill a multilingual text encoder from CLIP's text encoder. In short: distillation. A pretrained English teacher is frozen; the student is a model pretrained on multilingual data. The teacher only ever receives English input, while the student receives both Chinese and English, and distillation is performed between the two. One detail worth noting: the student adds an fc1 layer in the first training stage, and another projection (proj) in the CL stage. The text encoder from the original CLIP is used as …
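A minimal sketch of one step of this teacher-learning setup. The frozen teacher, English-only teacher input, and the fc1 projection come from the description above; the MSE objective and everything else are stand-in assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_step(teacher: nn.Module, student: nn.Module, fc1: nn.Linear,
                 optimizer: torch.optim.Optimizer,
                 english_batch, multilingual_batch):
    """One teacher-learning step: pull the student's (projected) embedding
    of a sentence toward the frozen teacher's embedding of its English
    counterpart."""
    with torch.no_grad():                    # teacher stays frozen
        target = teacher(english_batch)      # teacher always sees English
    pred = fc1(student(multilingual_batch))  # student sees Chinese or English
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with linear stand-ins for the real text encoders.
teacher, student = nn.Linear(300, 512), nn.Linear(300, 512)
fc1 = nn.Linear(512, 512)
opt = torch.optim.AdamW(list(student.parameters()) + list(fc1.parameters()))
print(distill_step(teacher, student, fc1, opt,
                   torch.randn(4, 300), torch.randn(4, 300)))
```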
CVPR 2023 | Cross-modal Adaptation: A New CLIP-based Fine-tuning Paradigm
This paper proposes a simple yet effective few-shot fine-tuning algorithm built on the multimodal pretrained model CLIP, called cross-modal adaptation. By adding cross-modal information (e.g., text labels) as extra training samples to the cross-entropy (CE) loss during fine-tuning, a single simple linear classifier can, across eleven image recognition datasets, ...
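A minimal sketch of the idea with precomputed CLIP embeddings: each class's text-label embedding joins the few-shot image embeddings as an extra training sample for one linear classifier under the same CE loss. The features below are random stand-ins, and the class and shot counts are illustrative:

```python
import torch
import torch.nn as nn

num_classes, shots, dim = 10, 16, 512

# Few-shot image embeddings from CLIP's vision encoder (stand-ins here)...
image_feats = torch.randn(shots * num_classes, dim)
image_labels = torch.arange(num_classes).repeat(shots)

# ...plus one CLIP *text* embedding per class name, used as a training
# sample for its own class -- the cross-modal part of the adaptation.
text_feats = torch.randn(num_classes, dim)
text_labels = torch.arange(num_classes)

feats = torch.cat([image_feats, text_feats])
labels = torch.cat([image_labels, text_labels])

classifier = nn.Linear(dim, num_classes)  # the single linear classifier
optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(classifier(feats), labels)
    loss.backward()
    optimizer.step()
```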
GitHub - ghchen18/acl23_mclip: The official code and model for ACL …
We propose mCLIP, a retrieval-efficient dual-stream multilingual VLP model. It is trained by aligning the CLIP model and a Multilingual Text Encoder (MTE) through a novel Triangle Cross-modal Knowledge Distillation (TriKD) method. It is parameter-efficient, as only two light projectors on top of the encoders are updated during distillation.
GitHub - SunzeY/AlphaCLIP: [CVPR 2024] Alpha-CLIP: A CLIP …
🔥 3.93% improvement in zero-shot ImageNet classification accuracy when a foreground alpha map is provided. 🔥 Plug-and-play region focus for any work that uses the CLIP vision encoder. 🔥 A strong visual encoder that serves as a versatile tool whenever a foreground mask is available. Training code for Alpha-CLIP and MaskImageNet data.
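A minimal sketch of how the alpha map can enter the vision tower, assuming (per my reading of the Alpha-CLIP paper) an extra alpha-channel convolution added to the RGB patch embedding and initialized to zero; layer names and sizes are illustrative, see the repo for the real implementation:

```python
import torch
import torch.nn as nn

class AlphaPatchEmbed(nn.Module):
    """RGB patch embedding plus a zero-initialized conv for the alpha map,
    so the model starts out behaving exactly like the original CLIP."""

    def __init__(self, dim: int = 768, patch: int = 32):
        super().__init__()
        self.rgb_conv = nn.Conv2d(3, dim, kernel_size=patch, stride=patch,
                                  bias=False)  # pretrained CLIP weights here
        self.alpha_conv = nn.Conv2d(1, dim, kernel_size=patch, stride=patch,
                                    bias=False)
        nn.init.zeros_(self.alpha_conv.weight)  # a no-op until fine-tuned

    def forward(self, rgb: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
        # rgb: [B, 3, H, W]; alpha: [B, 1, H, W] foreground mask in [0, 1]
        return self.rgb_conv(rgb) + self.alpha_conv(alpha)

x = torch.randn(2, 3, 224, 224)
a = torch.rand(2, 1, 224, 224)
print(AlphaPatchEmbed()(x, a).shape)  # [2, 768, 7, 7] patch grid
```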
Delving into the Openness of CLIP - ACL Anthology
March 23, 2025 · Contrastive Language-Image Pre-training (CLIP) formulates image classification as an image-to-text matching task, i.e., matching images to their natural-language descriptions instead of to discrete category IDs.
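This matching formulation is easy to see end to end. A minimal sketch with the reference openai/CLIP package; the model name, prompt template, class names, and image path are placeholders, not anything from this paper:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Classification = matching the image against natural-language class
# descriptions, not predicting a discrete category ID.
class_names = ["dog", "cat", "car"]
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # any local image

with torch.no_grad():
    image_emb = model.encode_image(image)
    text_emb = model.encode_text(text)
    image_emb /= image_emb.norm(dim=-1, keepdim=True)
    text_emb /= text_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_emb @ text_emb.T).softmax(dim=-1)

print(class_names[probs.argmax().item()])
```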
GitHub - yangbang18/MultiCapCLIP: (ACL'2023) MultiCapCLIP: …
PyTorch implementation of our ACL'23 paper: MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning. Bang Yang, Fenglin Liu, Xian Wu, Yaowei Wang, Xu Sun, and Yuexian Zou. ACL Anthology, arXiv
[2409.15077] TSCLIP: Robust CLIP Fine-Tuning for Worldwide …
March 8, 2025 · In this paper, we propose TSCLIP, a robust fine-tuning approach with the contrastive language-image pre-training (CLIP) model for worldwide cross-regional traffic sign recognition. We first curate a cross-regional traffic sign benchmark dataset by combining data from ten different sources.