Vit OCR - Search

About 61,000 results

github.com
https://github.com › lukas-blecher › LaTeX-OCR
GitHub - lukas-blecher/LaTeX-OCR: pix2tex: Using a ViT to …
GitHub - lukas-blecher/LaTeX-OCR: pix2tex: Using a ViT to convert images of equations into LaTeX code. The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code. To run the model you need Python 3.7+.
arxiv.org
https://arxiv.org › abs
Vision Transformer for Fast and Efficient Scene Text Recognition
May 18, 2021 · In this paper we propose ViTSTR, an STR with a simple single stage model architecture built on a compute and parameter efficient vision transformer (ViT). On a comparable strong baseline method such as TRBA with accuracy of 84.3%, our small ViTSTR achieves a competitive accuracy of 82.6% (84.2% with data augmentation) at a 2.4x speedup, using ...
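ViTSTR builds on a standard ViT, which reads an image as a sequence of fixed-size patches. The sketch below shows only that input step in plain Python. It assumes a grayscale image stored as a list of rows and uses a toy patch size of 4 (ViT-Base/16 uses 16x16 patches); it is not ViTSTR's own code, and the learned linear projection and position embeddings that follow patchification are omitted.

```python
from typing import List

Image = List[List[float]]  # grayscale image: list of rows of pixel values


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles,
    each flattened row-major, ordered left-to-right, top-to-bottom."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten one tile into a vector; in a ViT this vector would
            # next be linearly projected to the embedding dimension.
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


# An 8 x 8 toy "image" whose pixel value encodes its position.
img = [[r * 8 + c for c in range(8)] for r in range(8)]
seq = patchify(img, 4)
print(len(seq), len(seq[0]))  # 4 patches, 16 values each
```

The resulting token sequence, plus position information, is what the transformer encoder attends over.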
zhihu.com
https://zhuanlan.zhihu.com
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models - Zhihu
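Optical character recognition (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo, or subtitle text superimposed on an image. An OCR system typically consists of two main modules: a text detection module and a text recognition module. Text detection aims to locate all text blocks in a text image, at either the word level or the text-line level. Text detection is usually treated as an object detection problem, to which conventional detection models such as YOLOv5 and DBNet (Liao et al., 2019) can be applied. Text recognition, in turn, aims to understand the content of the text image and convert …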
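The two-module design described in this snippet (detection, then recognition) can be sketched as a small pipeline. The `toy_detect` and `toy_recognize` functions are hypothetical stand-ins for a detector such as DBNet and a recognizer such as TrOCR; only the control flow (find boxes, crop each region, recognize it) is meant to be illustrative, and sorting boxes by top then left is a simplifying assumption about reading order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy grayscale image as rows of pixels


@dataclass
class Box:
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def run_ocr(image: Image,
            detect: Callable[[Image], List[Box]],
            recognize: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Two-stage OCR: locate text regions, then recognize each crop.
    Results are ordered top-to-bottom, then left-to-right."""
    boxes = sorted(detect(image), key=lambda b: (b.top, b.left))
    return [(b, recognize(crop(image, b))) for b in boxes]


# Hypothetical stand-ins so the sketch runs end to end.
def toy_detect(image: Image) -> List[Box]:
    # Pretend the detector found two word-level boxes, out of order.
    return [Box(0, 4, 2, 8), Box(0, 0, 2, 4)]


def toy_recognize(crop_img: Image) -> str:
    # Pretend recognizer: report the crop size instead of real text.
    return f"{len(crop_img)}x{len(crop_img[0])}"


img = [[0] * 8 for _ in range(2)]
for box, text in run_ocr(img, toy_detect, toy_recognize):
    print(box, text)
```

Keeping the two stages behind plain callables makes it easy to swap either model without touching the cropping and ordering logic.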
github.com
https://github.com › roatienza › deep-text-recognition-benchmark
roatienza/deep-text-recognition-benchmark - GitHub
ViTSTR is a simple single-stage model that uses a pre-trained Vision Transformer (ViT) to perform Scene Text Recognition (STR). Its accuracy is comparable to that of state-of-the-art STR models, although it uses significantly fewer parameters and FLOPs.
csdn.net
https://blog.csdn.net › article › details
A 10,000-Word Survey of OCR Work with Multimodal Large Models (OCR, VLM) - CSDN Blog
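Dec 4, 2024 · Proposes a vision vocabulary: a CLIP-like ViT is retrained using a small language model as the decoder (the previous article used a similar approach), with OCR data as positive examples and natural images as negative examples, yielding a ViT with a new "vocabulary";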

zhihu.com
https://zhuanlan.zhihu.com
ECCV 2022 | MGP-STR: A Vision Transformer-Based Multi-Granularity Text Recog …
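Nov 4, 2022 · The paper proposes MGP-STR, a concise and efficient text recognition method. It uses a Vision Transformer (ViT) directly for feature extraction, decodes with an Adaptive Addressing and Aggregation (A³) module designed specifically for text recognition, and uses multi-granularity prediction to implicitly incorporate linguistic information, so no separate language model is needed. Experimental results show that MGP-STR's recog …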
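MGP-STR predicts text at several granularities (character, BPE and WordPiece in the paper) and must merge them into one output. One simple fusion rule, assumed here for illustration rather than quoted from the paper, is to score each candidate string by the product of its per-token probabilities and keep the highest-scoring one. The candidate strings and probabilities below are made-up toy values.

```python
import math
from typing import Dict, List, Tuple

# Each granularity yields a decoded string plus the probability the model
# assigned to each of its tokens (characters, BPE pieces, or WordPieces).
Prediction = Tuple[str, List[float]]


def confidence(token_probs: List[float]) -> float:
    """Sequence confidence as the product of per-token probabilities,
    accumulated in log space to avoid underflow on long sequences."""
    return math.exp(sum(math.log(p) for p in token_probs))


def fuse(predictions: Dict[str, Prediction]) -> Tuple[str, str, float]:
    """Return (granularity, text, score) of the most confident prediction."""
    name, (text, probs) = max(predictions.items(),
                              key=lambda kv: confidence(kv[1][1]))
    return name, text, confidence(probs)


# Hypothetical outputs for one word image.
preds = {
    "char":      ("hcuse", [0.9, 0.4, 0.95, 0.9, 0.9]),
    "bpe":       ("house", [0.85, 0.9]),
    "wordpiece": ("house", [0.8]),
}
print(fuse(preds))
```

Note that a plain product favors predictions with fewer tokens, since every extra token can only lower the score; a real system may need to account for that bias.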
csdn.net
https://blog.csdn.net › Python_cocola › article › details
Huawei Noah's Ark Lab Open-Sources the ViTLP Document Model: Built-in OCR in Pre-training, Layout …
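Dec 3, 2024 · ViTLP, a disruptive Visually guided generative Text-Layout Pre-trained model, has been open-sourced by Huawei Noah's Ark Lab! Without any OCR engine, ViTLP learns text and layout directly from images, handles documents of arbitrary length, and generates interpretable visual grounding information. It performs strongly on OCR and VDU tasks, opening a new era for document intelligence!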
csdn.net
https://blog.csdn.net › article › details
Object Detection Algorithms - Transformer Series - ViT (Vision Transformer) (with Paper …
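Jan 6, 2025 · This resource is a piece of Python code focused on using the Vision Transformer (ViT) algorithm for image classification. Its core function is to train a ViT model on the CIFAR-10 or CIFAR-100 dataset so that it can classify images accurately. In the code, via the argparse module...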
github.com
https://github.com › arcta › ocr-cnn-vit
arcta/ocr-cnn-vit: CNN vs ViT comparison in OCR tasks - GitHub
OCR: ViT vs. CNN. In this experiment we compare ViT- and CNN-based UNets on document-page layout understanding tasks: different attention module architectures, and residual and skip connections. We used an agent with a dynamic visual field (it can zoom and rotate); data generated on …
medium.com
https://medium.com › @mehmet.cagri.calpur › ocr-with-vision...
OCR with Vision Transformers. Vision transformers are ... - Medium
Jan 17, 2024 · Vision transformers are disrupting conventional visual tasks such as classification, segmentation, and image-to-text caption generation. I am particularly interested in one of the newer models,...