R1 Model - Search

About 36,500,000 results

github.com
https://github.com › deepseek-ai
deepseek-ai/DeepSeek-R1 - GitHub
Jan 20, 2025 · DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
ollama.com
https://ollama.com › library
deepseek-r1
Below are the models created via fine-tuning against several dense models widely used in the research community using reasoning data generated by DeepSeek-R1. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks.
deepseek.com
https://www.deepseek.com
DeepSeek
🎉 DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1. Available on web, app, and API. Click for details.
arxiv.org
https://arxiv.org › abs
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via ...
Jan 22, 2025 · DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors.
nvidia.com
https://build.nvidia.com › deepseek-ai
deepseek-r1 Model by Deepseek-ai | NVIDIA NIM
State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.
github.com
https://github.com › om-ai-lab
om-ai-lab/VLM-R1: Solve Visual Understanding with Reinforced …
Feb 15, 2025 · In this project, we propose VLM-R1, a stable and generalizable R1-style Large Vision-Language Model. Specifically, for the task of Referring Expression Comprehension (REC), we trained Qwen2.5-VL using both R1 and SFT approaches. The results reveal that, on the in-domain test data, the performance of the SFT model is slightly lower than that of ...
microsoft.com
https://azure.microsoft.com › en-us › blog › deepsee
DeepSeek R1 is now available on Azure AI Foundry and GitHub
Jan 29, 2025 · DeepSeek R1 is now available in the model catalog on Azure AI Foundry and GitHub, joining a diverse portfolio of over 1,800 models, including frontier, open-source, industry-specific, and task-based AI models. As part of Azure AI Foundry, DeepSeek R1 is accessible on a trusted, scalable, and enterprise-ready platform, enabling businesses to ...
arxiv.org
https://arxiv.org › abs
Fin-R1: A Large Language Model for Financial Reasoning through ...
3 days ago · Reasoning large language models are rapidly evolving across various domains. However, their capabilities in handling complex financial tasks still require in-depth exploration. In this paper, we introduce Fin-R1, a reasoning large language model specifically designed for the financial sector. Fin-R1 is built using a two-stage architecture, leveraging a financial reasoning dataset distilled and ...
geeksforgeeks.org
https://www.geeksforgeeks.org
DeepSeek-R1: Technical Overview of its Architecture and Innovations
Feb 3, 2025 · DeepSeek-R1, an innovative AI model from Chinese startup DeepSeek, combines a Mixture of Experts framework and advanced transformer design to achieve exceptional performance and cost-efficiency in handling complex reasoning tasks and long-context comprehension.
zhihu.com
https://zhuanlan.zhihu.com
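deepseek-R1 Explained - Zhihu - Zhihu Column
Group Relative Policy Optimization: reduces the training cost of RL by dropping the critic model, which is usually the same size as the policy model, and estimating the baseline from group scores instead. Specifically, for each question, GRPO samples a group of outputs from the old policy and then optimizes the policy model by maximizing the following objective: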
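The snippet is cut off before the objective, so the objective itself is not reproduced here. As a rough illustration of only the "baseline from group scores" idea, the sketch below scores each sampled output against the mean reward of its own group. The function name `group_relative_advantages`, the division by the group's standard deviation, the use of the population standard deviation, and the `eps` term are assumptions for illustration and are not stated in the snippet.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to its own group (hypothetical helper).

    The group mean plays the role of the baseline that a separate critic
    model would otherwise estimate. Dividing by the group's standard
    deviation is an assumed normalization, not taken from the snippet.
    """
    baseline = mean(rewards)   # baseline estimated from the group scores
    spread = pstdev(rewards)   # population std of the group (assumption)
    # eps avoids division by zero when every output gets the same reward
    return [(r - baseline) / (spread + eps) for r in rewards]


# Example: four outputs sampled for one question, with made-up 0/1 rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Outputs that score above the group mean get a positive value and those below get a negative one, so no separately trained value model is needed to decide which samples to reinforce.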