VLM Video - Search

About 57,000 results

github.com
https://github.com › Awesome-LLMs-for-Video-Understanding
Awesome-LLMs-for-Video-Understanding - GitHub
This comprehensive survey covers video understanding techniques powered by large language models (Vid-LLMs), training strategies, relevant tasks, datasets, benchmarks, and evaluation …
zhihu.com
https://zhuanlan.zhihu.com
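[VLM Technical Report] "Hunyuan Video: A System for Large Video Generation Models …
Dec 4, 2024 · To this end, the team developed an in-house vision-language model (VLM) that generates multi-dimensional structured descriptions for images and videos. These descriptions are in JSON format and are presented from the following perspectives …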
csdn.net
https://blog.csdn.net › article › details
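Understanding Vision-Language Models (VLM) in Depth in One Article - CSDN Blog
Jan 21, 2025 · Since Google introduced ViT and OpenAI released CLIP, vision-language models (VLMs) have become a research hotspot. With their cross-modal processing and understanding capabilities and their zero-shot learning methods, they have brought major innovation to the CV field …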
csdn.net
https://blog.csdn.net › article › details
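An Introduction to VLM and VLAM (VLA) and Their Development History - CSDN Blog
Dec 28, 2024 · Vision-language models (VLMs) and vision-language-action models (VLAs) are two concepts that have made significant progress in artificial intelligence in recent years. Their development reflects advances in multimodal learning, particularly in combining vision, …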
csdn.net
https://blog.csdn.net › article › details
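Vision-Language Models Explained [VLM] - CSDN Blog
May 20, 2024 · Vision-language models (VLMs) are AI models that can process and understand information in both the visual (image) and language (text) modalities. These models combine computer vision and natural …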
zhihu.com
https://zhuanlan.zhihu.com
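VLM Survey: An Introduction to Vision-Language Modeling (Part 1)
Since the advent of transformers, the VLM field has advanced considerably. Multimodal training mainly takes the following four approaches: 1. contrastive training, which pulls positive samples closer together and pushes negative samples further apart; 2. masking …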
zhihu.com
https://zhuanlan.zhihu.com
VTimeLLM: A VideoLLM with Temporal Awareness - Zhihu - Zhihu Column
Nov 30, 2023 · Paper: VTimeLLM: Empower LLM to Grasp Video Moments. arXiv: arxiv.org/abs/2311.1844. GitHub: github.com/huangb23/VTi. [Submitted on 30 Nov 2023] …
arxiv.org
https://arxiv.org › abs
NaVid: Video-based VLM Plans the Next Step for Vision-and …
Feb 24, 2024 · In this paper, we propose NaVid, a video-based large vision language model (VLM), to mitigate such a generalization gap. NaVid makes the first endeavor to showcase the …
arxiv.org
https://arxiv.org › abs
LongVLM: Efficient Long Video Understanding via Large Language …
Apr 4, 2024 · To tackle this challenge, we introduce LongVLM, a simple yet powerful VideoLLM for long video understanding, building upon the observation that long videos often consist of …
cogvlm2-video.github.io
https://cogvlm2-video.github.io
CogVLM2-Video
CogVLM2-Video not only achieves state-of-the-art performance on public video understanding benchmarks but also excels in video captioning and temporal grounding, providing a powerful …