Guided OCR - Search

arxiv.org
https://arxiv.org › abs
LOCR: Location-Guided Transformer for Optical Character …
Mar 4, 2024 · While end-to-end OCR methods offer improved accuracy over layout-based approaches, they often grapple with significant repetition issues, especially with complex layouts in Out-Of-Domain (OOD) documents. To tackle this issue, we propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
aclanthology.org
https://aclanthology.org
[PDF]
LOCR: Location-Guided Transformer for Optical Character …
To tackle this issue, we propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression. We train the model on an original large-scale dataset comprising over 53M text-location pairs from 89K academic document pages, including bounding boxes for words, tables and mathematical symbols.
zhihu.com
https://zhuanlan.zhihu.com
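Natural Scene Text Detection and Recognition - GTC - Zhihu
We have recently introduced several detection models and end-to-end models from the field of natural scene text detection and recognition (STR). Today we look at a recognition model: GTC (Guided Training of CTC). The paper was published in February 2020 by researchers from Nanyang Technological University and SenseTime…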
aclanthology.org
https://aclanthology.org
LOCR: Location-Guided Transformer for Optical Character …
Mar 19, 2025 · We train the model on an original large-scale dataset comprising over 53M text-location pairs from 89K academic document pages, including bounding boxes for words, tables and mathematical symbols. LOCR adeptly handles various formatting elements and generates content in Markdown language.
github.com
https://github.com › amazon-science › visfocus
GitHub - amazon-science/visfocus
In this paper, we present VisFocus, an OCR-free method designed to better exploit the vision encoder's capacity by coupling it directly with the language prompt. To do so, we replace the down-sampling layers with layers that receive the input prompt and allow highlighting relevant parts of the document, while disregarding others.
qbitai.com
https://www.qbitai.com
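New AI tool for reading papers: handles dense multi-column text and mixed Chinese-English documents with text and images | Megvii
Jun 1, 2024 · In addition, to promote research on fine-grained document understanding, the authors built a bilingual Chinese-English benchmark and have open-sourced the data and evaluation code. It covers the following 9 tasks: Page-level OCR, Region-level OCR, Line-level OCR, Color-guided OCR, Region-level translation, Region-level summary, In-document figure caption, Multi-page multi-region OCR, Cross ...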
arxiv.org
https://arxiv.org › html
LOCR: Location-Guided Transformer for Optical Character …
We introduce LOCR, a location-guided document understanding model, together with an original large-scale dataset and an interactive OCR mode to align with human intention (see Figure 1 for an overview).
baai.ac.cn
https://hub.baai.ac.cn › paper
LOCR: Location-Guided Transformer for Optical Character …
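This paper addresses the high repetition rate of OCR on documents with complex layouts and proposes LOCR, a transformer-based model that integrates location guiding. LOCR applies location guiding during autoregression, which lets it handle complex layouts better, generate content in Markdown format, and outperform existing methods on several evaluation metrics. The model is trained on a dataset of over 77M text-location pairs, including bounding boxes for words, tables and mathematical symbols. On the arXiv dataset, LOCR reduces the repetition rate from 4.4% to 0.5%; on OOD quantum physics documents and marketing documents, LOCR's …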
github.com
https://github.com › VinAIResearch › dict-guided
GitHub - VinAIResearch/dict-guided: Dictionary-guided Scene …
We propose a novel dictionary-guided scene text recognition approach that could be used to improve many state-of-the-art models. We also introduce a new benchmark dataset (namely, VinText) for Vietnamese scene text recognition.
github.com
https://github.com › hzauzxb › guidance-ocr › blob › main › readme.md
guidance-ocr/readme.md at main · hzauzxb/guidance-ocr · GitHub
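Compared with dedicated OCR models, current multimodal large models are relatively weak at recognizing text. Using a multimodal large model directly for visual information extraction often produces wrong characters. This project uses OCR results to guide the multimodal large model's output, aiming for higher information-extraction accuracy. For algorithm details, see the blog. Example using Qwen2VL-2B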
