
Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text.
GitHub - openai/gpt-2: Code for the paper "Language Models …
Code and models from the paper "Language Models are Unsupervised Multitask Learners". You can read about GPT-2 and its staged release in our original blog post, 6 month follow-up post, and final post. We have also released a dataset for researchers to study their behaviors.
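As a rough illustration of using the released checkpoints, the sketch below samples a continuation from GPT-2. It relies on the Hugging Face `transformers` port rather than the repository's own scripts, so the API shown is an assumption about tooling, not code from the openai/gpt-2 repo, and the prompt text is made up.

```python
# Minimal sketch: sampling from a released GPT-2 checkpoint via the
# Hugging Face `transformers` port (an assumption; not the repo's own code).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # smallest (124M) checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40, do_sample=True, top_k=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```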
[2005.14165] Language Models are Few-Shot Learners - arXiv.org
May 28, 2020 · GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
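To make "few-shot" concrete, here is a hypothetical prompt for the 3-digit arithmetic task mentioned in that abstract: the demonstrations sit entirely in the context window and the model is expected to continue the last line. The specific examples are invented for illustration.

```python
# Illustrative few-shot prompt: no gradient updates, just in-context examples
# followed by a query the model should complete (with "781" in this case).
few_shot_prompt = "\n".join([
    "Q: What is 123 + 456?  A: 579",
    "Q: What is 305 + 298?  A: 603",
    "Q: What is 714 + 188?  A: 902",
    "Q: What is 262 + 519?  A:",
])
print(few_shot_prompt)
```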
GPT-2 Explained - Papers With Code
GPT-2 is a Transformer architecture that was notable for its size (1.5 billion parameters) on its release. The model is pretrained on the WebText dataset, text scraped from 45 million website links.
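A quick back-of-the-envelope check of the 1.5-billion figure, assuming the largest configuration's published hyperparameters (48 layers, 1600-dimensional states, a 50,257-token vocabulary, 1024-token context) and the usual 12·d² per-layer approximation that ignores biases and LayerNorm:

```python
# Rough parameter count for the largest GPT-2 configuration; this is a
# sketch, not an exact accounting of every tensor in the checkpoint.
d_model, n_layers, vocab, context = 1600, 48, 50257, 1024

per_layer = 12 * d_model ** 2             # attention (~4*d^2) + MLP (~8*d^2)
embeddings = (vocab + context) * d_model  # token + position embeddings
total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.2f}B parameters")  # ~1.56B, i.e. the "1.5B" model
```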
[2401.12181] Universal Neurons in GPT2 Language Models
January 22, 2024 · In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neurons are likely to be interpretable.
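A rough sketch of the kind of cross-seed comparison this line of work involves: collect activations from two independently trained models on the same tokens and correlate neurons pairwise. The arrays below are random placeholders rather than real GPT-2 activations, and the 0.5 threshold is an arbitrary choice.

```python
# Placeholder cross-seed neuron comparison: correlate every neuron in
# model A with every neuron in model B over the same token stream.
import numpy as np

rng = np.random.default_rng(0)
acts_a = rng.standard_normal((5000, 64))   # stand-in for model-A activations
acts_b = rng.standard_normal((5000, 64))   # stand-in for model-B activations

# Standardize each neuron; correlations then reduce to a scaled dot product.
za = (acts_a - acts_a.mean(0)) / acts_a.std(0)
zb = (acts_b - acts_b.mean(0)) / acts_b.std(0)
corr = za.T @ zb / len(za)                 # [n_neurons_a, n_neurons_b]

best = np.abs(corr).max(axis=1)            # best |r| partner in B for each A-neuron
print("A-neurons with a cross-seed partner at |r| > 0.5:", int((best > 0.5).sum()))
```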
In this paper, we explore a semi-supervised approach for language understanding tasks using a combination of unsupervised pre-training and supervised fine-tuning. Our goal is to learn a universal representation that transfers with little adaptation to a …
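A minimal sketch of that two-stage recipe, assuming the Hugging Face port of a pretrained Transformer backbone and an invented binary-classification head; the head, example, and training details are illustrative, not the paper's setup.

```python
# Two-stage recipe: unsupervised pre-training (reused backbone) followed by
# supervised fine-tuning with a small task-specific head (assumed here).
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
backbone = GPT2Model.from_pretrained("gpt2")          # pretrained representation
head = torch.nn.Linear(backbone.config.n_embd, 2)     # task head added for fine-tuning

inputs = tokenizer("a surprisingly enjoyable film", return_tensors="pt")
hidden = backbone(**inputs).last_hidden_state         # [1, seq_len, n_embd]
logits = head(hidden[:, -1, :])                       # classify from the last token
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()                                       # updates backbone and head
```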
Language Models are Unsupervised Multitask Learners
Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits …
[2412.12351] Krony-PT: GPT2 compressed with Kronecker …
December 16, 2024 · We introduce Krony-PT, a compression technique of GPT2 (Radford et al., 2019) based on Kronecker Products. We specifically target the MLP layers of each transformer layer, and systematically compress the …
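A toy illustration of the underlying idea (not the Krony-PT code itself): replace a dense MLP weight matrix with a Kronecker product of two much smaller factors. The factor shapes below are arbitrary choices that happen to reconstruct the shape of GPT-2's 768→3072 MLP projection.

```python
# Kronecker-product compression sketch: approximate a large weight W with
# A ⊗ B, cutting the parameter count from millions to a few thousand.
import torch

W = torch.randn(3072, 768)                 # stand-in for a dense MLP weight
A = torch.randn(96, 24, requires_grad=True)
B = torch.randn(32, 32, requires_grad=True)

W_hat = torch.kron(A, B)                   # (96*32) x (24*32) = 3072 x 768
print(W.numel(), "params replaced by", A.numel() + B.numel())

# The factors could be fit to W (or trained end to end) by minimizing
# the reconstruction error:
loss = ((W_hat - W) ** 2).mean()
loss.backward()
```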
The Illustrated GPT-2 (Visualizing Transformer Language Models)
August 12, 2019 · The GPT2 was, however, a very large, transformer-based language model trained on a massive dataset. In this post, we’ll look at the architecture that enabled the model to produce its results. We will go into the depths of its self-attention layer.
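For reference, a compact sketch of the masked (causal) self-attention the post walks through, with small random tensors; it follows the standard decoder-style pattern rather than reproducing GPT-2's exact implementation.

```python
# Masked multi-head self-attention sketch: each position can only attend
# to itself and earlier positions, which is what makes the model autoregressive.
import torch
import torch.nn.functional as F

seq_len, d_model, n_heads = 8, 64, 4
d_head = d_model // n_heads

x = torch.randn(1, seq_len, d_model)
qkv = torch.nn.Linear(d_model, 3 * d_model)(x)          # fused Q, K, V projection
q, k, v = qkv.chunk(3, dim=-1)

def split_heads(t):
    return t.view(1, seq_len, n_heads, d_head).transpose(1, 2)

q, k, v = map(split_heads, (q, k, v))
scores = q @ k.transpose(-2, -1) / d_head ** 0.5        # [1, heads, seq, seq]
mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
scores = scores.masked_fill(~mask, float("-inf"))       # block attention to future tokens
out = F.softmax(scores, dim=-1) @ v                     # weighted sum of value vectors
out = out.transpose(1, 2).reshape(1, seq_len, d_model)  # merge heads back together
print(out.shape)                                        # torch.Size([1, 8, 64])
```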