Donut OCR - Search

github.com
https://github.com › clovaai › donut
GitHub - clovaai/donut: Official Implementation of OCR-free …
June 15, 2023 · Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performances on various visual document understanding tasks, such as visual document classification or information extraction (a.k.a. document parsing).
zhihu.com
https://zhuanlan.zhihu.com
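Donut: Understanding Document Images Without an Intermediate OCR Step - Zhihu
We introduce a new OCR-free VDU model to address the problems caused by OCR dependency. Our model is based on a Transformer-only architecture and is called Document understanding transformer (Donut), following the great success of Transformers in vision and language [8,9,29]. We propose a minimal baseline consisting of a simple architecture and a pre-training method.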
huggingface.co
https://huggingface.co › docs › transformers › model_doc › donut
Donut - Hugging Face
To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss).
csdn.net
https://blog.csdn.net › wentinghappyday › article › details
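Donut Model: A Multimodal Large Model for Image Text Reading and Downstream Tasks - CSDN Blog
April 16, 2024 · Compared with the traditional approach, Donut replaces the decoupled OCR + downstream language model pipeline with a single end-to-end (E2E) model. It is faster and also more accurate. The structure is simple: a visual encoder and an NLP language decoder, both based on the Transformer. In the paper, the authors use Swin Transformer as the visual encoder because it performed best (swin_transformer code: https://github.com/huggingface/pytorch-image-models/blob/v0.6.13/timm/models/swin_transformer.py). Encoder and decoder: …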
arxiv.org
https://arxiv.org › abs
[2111.15664] OCR-free Document Understanding Transformer
November 30, 2021 · Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains.
zhihu.com
https://zhuanlan.zhihu.com
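[Paper] Donut: OCR-free Document Understanding Transformer
The Donut model avoids the dependency on OCR by mapping the raw input image directly to the desired output. The paper also provides a synthetic data generator, SynthDoG, which lets the model's pre-training adapt to different languages and domains.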
hugging-face.cn
https://hugging-face.cn › docs › transformers › model_doc › donut
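Donut - Hugging Face Machine Learning Platform
The Donut model was proposed in "OCR-free Document Understanding Transformer" by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, and Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder, and performs document understanding tasks such as document image classification, form understanding, and visual question answering. The abstract from the paper is as follows. Understanding document images (e.g., invoices) is a core but challenging task because it requires …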
ecnu-ica22.github.io
https://ecnu-ica22.github.io › 不需要ocr的文档理解...
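OCR-free Document Understanding Transformer: Donut - ECNU ICALK 702
October 20, 2022 · With the remarkable progress of deep-learning-based optical character recognition (OCR), most existing VDU systems share a similar architecture that relies on a separate OCR module to extract text information from the target document image.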
github.com
https://github.com › WalysonGO › donut-ocr
GitHub - WalysonGO/donut-ocr: Official Implementation of OCR …
Donut 🍩, Document understanding transformer, is a new method of document understanding that utilizes an OCR-free end-to-end Transformer model. Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performances on various visual document understanding tasks, such as visual document classification or ...
segmentfault.com
https://segmentfault.com
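Donut, an OCR-free Document Understanding Transformer Model - SegmentFault
February 14, 2025 · Donut is an end-to-end (i.e., self-contained) visual document understanding (VDU) model for general understanding of document images. Donut's architecture is quite simple, consisting of a Transformer-based visual encoder and a text decoder module.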
