VLM Video - Search

About 57,000 results

github.com
https://github.com › Awesome-LLMs-for-Video-Understanding
Awesome-LLMs-for-Video-Understanding - GitHub
This comprehensive survey covers video understanding techniques powered by large language models (Vid-LLMs), training strategies, relevant tasks, datasets, benchmarks, and evaluation methods, and discusses the applications of Vid-LLMs across various domains.
zhihu.com
https://zhuanlan.zhihu.com
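[VLM Technical Report] "Hunyuan Video: A System … for Large Video Generation Models
Dec 4, 2024 · To this end, the team developed an in-house vision-language model (VLM) that generates multi-dimensional structured captions for images and videos. These captions are in JSON format and present information from the following perspectives: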
csdn.net
https://blog.csdn.net › article › details
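An In-Depth Guide to Vision-Language Models (VLM) - CSDN Blog
Jan 21, 2025 · Since Google proposed ViT and OpenAI released CLIP, vision-language models (VLMs) have become a research hotspot. Their cross-modal processing and understanding capabilities, together with zero-shot learning, have brought major innovation to the CV field. In this year's CVPR'24 autonomous driving challenge, VLM was also the track with the most participants, with a wide range of application approaches built around improving environment perception and more, and ...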
csdn.net
https://blog.csdn.net › article › details
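Introduction to VLM and VLAM (VLA) and Their Development History - CSDN Blog
Dec 28, 2024 · Vision-language models (VLMs) and vision-language-action models (VLAs) are two concepts that have made significant progress in artificial intelligence in recent years. Their development reflects advances in multimodal learning, especially in combining vision, language, and robot actions.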
csdn.net
https://blog.csdn.net › article › details
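Vision-Language Models Explained [VLM] - CSDN Blog
May 20, 2024 · Vision-language models (VLMs) are AI models that can process and understand two modalities at once: vision (images) and language (text). By combining techniques from computer vision and natural language processing, they perform well on complex tasks such as visual question answering, image captioning, and text-to-image search.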
zhihu.com
https://zhuanlan.zhihu.com
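VLM Survey: An Introduction to Vision-Language Modeling (Part 1)
Since the advent of transformers, the VLM field has also advanced considerably. Multimodal training mainly takes four forms: 1. contrastive training, which pulls positive samples closer together and pushes negative samples further apart; 2. masking, which reconstructs masked image patches given the unmasked text; likewise, it is also possible, given ...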
zhihu.com
https://zhuanlan.zhihu.com
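VTimeLLM: A VideoLLM with Temporal Awareness - Zhihu - Zhihu Column
Nov 30, 2023 · Paper: VTimeLLM: Empower LLM to Grasp Video Moments. arXiv: arxiv.org/abs/2311.1844. GitHub: github.com/huangb23/VTi. [Submitted on 30 Nov 2023] Building on the conventional VLM architecture, it adds input with a fixed number of frames and uses three-stage pretraining to give the model both grounding and chat capabilities.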
arxiv.org
https://arxiv.org › abs
NaVid: Video-based VLM Plans the Next Step for Vision-and …
Feb 24, 2024 · In this paper, we propose NaVid, a video-based large vision language model (VLM), to mitigate such a generalization gap. NaVid makes the first endeavor to showcase the capability of VLMs to achieve state-of-the-art level navigation performance without any maps, odometers, or depth inputs.
arxiv.org
https://arxiv.org › abs
LongVLM: Efficient Long Video Understanding via Large Language …
Apr 4, 2024 · To tackle this challenge, we introduce LongVLM, a simple yet powerful VideoLLM for long video understanding, building upon the observation that long videos often consist of sequential key events, complex actions, and camera movements.
cogvlm2-video.github.io
https://cogvlm2-video.github.io
CogVLM2-Video
CogVLM2-Video not only achieves state-of-the-art performance on public video understanding benchmarks but also excels in video captioning and temporal grounding, providing a powerful tool for subsequent tasks such as video generation and video summarization.