
LOCR: Location-Guided Transformer for Optical Character …
March 4, 2024 · While end-to-end OCR methods offer improved accuracy over layout-based approaches, they often grapple with significant repetition issues, especially with complex layouts in Out-Of-Domain (OOD) documents. To tackle this issue, we propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
To tackle this issue, we propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression. We train the model on an original large-scale dataset comprising over 53M text-location pairs from 89K academic document pages, including bounding boxes for words, tables and mathematical symbols.
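Scene Text Detection and Recognition - GTC - Zhihu
We have recently introduced several detection models and end-to-end models from the field of natural scene text detection and recognition (STR). Today we look at a recognition model: GTC (Guided Training of CTC). The paper was jointly published in February 2020 by researchers from Nanyang Technological University and SenseTime…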
LOCR: Location-Guided Transformer for Optical Character …
March 19, 2025 · We train the model on an original large-scale dataset comprising over 53M text-location pairs from 89K academic document pages, including bounding boxes for words, tables and mathematical symbols. LOCR adeptly handles various formatting elements and generates content in Markdown language.
GitHub - amazon-science/visfocus
In this paper, we present VisFocus, an OCR-free method designed to better exploit the vision encoder's capacity by coupling it directly with the language prompt. To do so, we replace the down-sampling layers with layers that receive the input prompt and allow highlighting relevant parts of the document, while disregarding others.
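A new AI tool for reading papers: it handles multi-column dense text and documents mixing Chinese and English text with images | Megvii
June 1, 2024 · In addition, to promote research on fine-grained document understanding, the authors built a bilingual Chinese-English benchmark and have open-sourced the data and evaluation code. It covers the following 9 tasks: Page-level OCR, Region-level OCR, Line-level OCR, Color-guided OCR, Region-level translation, Region-level summary, In-document figure caption, Multi-page multi-region OCR, Cross ...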
LOCR: Location-Guided Transformer for Optical Character …
We introduce LOCR, a location-guided document understanding model, together with an original large-scale dataset and an interactive OCR mode to align with human intention (see Figure 1 for an overview).
LOCR: Location-Guided Transformer for Optical Character …
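This paper addresses the high repetition rate of OCR on documents with complex layouts and proposes LOCR, a transformer-based model that integrates location guiding. LOCR applies location guiding during autoregression, which lets it handle complex layouts better, generate content in Markdown format, and outperform existing methods on several evaluation metrics. The paper trains on a dataset of over 77M text-location pairs, including bounding boxes for words, tables and mathematical symbols. On the arXiv dataset, LOCR reduces the repetition rate from 4.4% to 0.5%; on OOD quantum physics documents and marketing documents, LOCR's …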
GitHub - VinAIResearch/dict-guided: Dictionary-guided Scene …
We propose a novel dictionary-guided scene text recognition approach that could be used to improve many state-of-the-art models. We also introduce a new benchmark dataset (namely, VinText) for Vietnamese scene text recognition.
guidance-ocr/readme.md at main · hzauzxb/guidance-ocr · GitHub
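Compared with dedicated OCR models, current multimodal large models are relatively weak at recognizing text. Using a multimodal large model directly for visual information extraction often produces wrong characters. This project uses OCR results to guide the output of the multimodal large model, with the aim of achieving higher information extraction accuracy. See the blog for algorithm details. Example using Qwen2VL-2B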