
deepseek-ai/DeepSeek-R1 - GitHub
Jan 20, 2025 · DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, numerous powerful and interesting reasoning behaviors naturally emerged in DeepSeek-R1-Zero.
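The snippet does not say how the RL rewards are computed. The DeepSeek-R1 paper describes rule-based rewards for R1-Zero: an accuracy reward (for example, a deterministic math answer given in a specified format such as a box) and a format reward that requires the reasoning to sit between `<think>` and `</think>` tags. The sketch below illustrates that idea under loudly stated assumptions: the function names are hypothetical, answers are compared by exact string match, and the two rewards are summed with equal weight. It is not DeepSeek's code.

```python
import re

# Hypothetical rule-based reward in the spirit of DeepSeek-R1-Zero.
# Assumptions: reasoning must open the completion inside <think>...</think>,
# the final answer is written as \boxed{...}, and the rewards are simply summed.
THINK_PATTERN = re.compile(r"^<think>.*?</think>", re.DOTALL)
BOXED_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion starts with a <think>...</think> block."""
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the last \\boxed{...} answer equals the reference exactly."""
    answers = BOXED_PATTERN.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def rule_based_reward(completion: str, reference: str) -> float:
    """Equal-weight sum of the format and accuracy rewards (an assumption)."""
    return format_reward(completion) + accuracy_reward(completion, reference)


if __name__ == "__main__":
    sample = "<think>2 + 2 is 4.</think> The answer is \\boxed{4}."
    print(rule_based_reward(sample, "4"))  # 2.0
```

Exact string matching is the simplest possible checker; a real verifier would normalize equivalent answers (e.g. `0.5` vs `1/2`).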
deepseek-r1
Below are the models created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks.
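As a rough sketch of how such distillation data could be assembled, assume a hypothetical `teacher_generate` callable that wraps DeepSeek-R1 and a hypothetical `is_correct` checker. Only verified reasoning traces are kept, which is a rejection-sampling assumption rather than something this snippet states. The fine-tuning step itself (ordinary supervised fine-tuning of the student on the resulting pairs) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SFTExample:
    """One (prompt, teacher response) pair for supervised fine-tuning."""
    prompt: str
    response: str


def build_distillation_set(
    prompts: Iterable[str],
    teacher_generate: Callable[[str], str],   # hypothetical wrapper around the teacher
    is_correct: Callable[[str, str], bool],   # hypothetical answer checker
    samples_per_prompt: int = 4,
) -> List[SFTExample]:
    """Sample teacher responses and keep the first verified one per prompt."""
    dataset: List[SFTExample] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            response = teacher_generate(prompt)
            if is_correct(prompt, response):
                dataset.append(SFTExample(prompt, response))
                break  # assumption: one verified trace per prompt is enough
    return dataset


if __name__ == "__main__":
    # Stub teacher and checker, for illustration only.
    def fake_teacher(prompt: str) -> str:
        return "<think>2 + 2 = 4</think> \\boxed{4}"

    def checks_four(prompt: str, response: str) -> bool:
        return "\\boxed{4}" in response

    data = build_distillation_set(["What is 2 + 2?"], fake_teacher, checks_four)
    print(len(data))  # 1
```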
DeepSeek
🎉 DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1. Available on web, app, and API.
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via ...
Jan 22, 2025 · DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, numerous powerful and intriguing reasoning behaviors naturally emerge in DeepSeek-R1-Zero.
deepseek-r1 Model by Deepseek-ai | NVIDIA NIM
State-of-the-art, high-efficiency LLM excelling in reasoning, math, and coding.
om-ai-lab/VLM-R1: Solve Visual Understanding with Reinforced …
Feb 15, 2025 · In this project, we propose VLM-R1, a stable and generalizable R1-style Large Vision-Language Model. Specifically, for the task of Referring Expression Comprehension (REC), we trained Qwen2.5-VL using both R1 and SFT approaches. The results reveal that, on the in-domain test data, the performance of the SFT model is slightly lower than that of ...
DeepSeek R1 is now available on Azure AI Foundry and GitHub
Jan 29, 2025 · DeepSeek R1 is now available in the model catalog on Azure AI Foundry and GitHub, joining a diverse portfolio of over 1,800 models, including frontier, open-source, industry-specific, and task-based AI models. As part of Azure AI Foundry, DeepSeek R1 is accessible on a trusted, scalable, and enterprise-ready platform, enabling businesses to ...
Fin-R1: A Large Language Model for Financial Reasoning through ...
3 days ago · Reasoning large language models are rapidly evolving across various domains. However, their capabilities in handling complex financial tasks still require in-depth exploration. In this paper, we introduce Fin-R1, a reasoning large language model specifically designed for the financial sector. Fin-R1 is built using a two-stage architecture, leveraging a financial reasoning dataset distilled and ...
DeepSeek-R1: Technical Overview of its Architecture and Innovations
Feb 3, 2025 · DeepSeek-R1, an innovative AI model from Chinese startup DeepSeek, combines a Mixture of Experts framework and advanced transformer design to achieve exceptional performance and cost-efficiency in handling complex reasoning tasks and long-context comprehension.
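To make "Mixture of Experts" concrete, here is a generic top-k gating sketch: a router scores every expert, only the k highest-scoring experts run, and their outputs are mixed with renormalized weights. This is a textbook softmax top-k gate, not DeepSeek's router; DeepSeek-R1 is built on DeepSeek-V3-Base, whose MoE layers differ in detail (for example, they also use shared experts). The function names and toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Run only the top-k experts and mix their outputs by gate weight."""
    probs = softmax(router_logits)
    # Indices of the k most probable experts (stable for ties).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm          # renormalize over the selected experts
        y = experts[i](x)                 # only selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Toy experts: expert j scales its input by (j + 1).
    experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
    out, chosen = moe_forward([1.0], [0.0, 0.0, 10.0, 10.0], experts, k=2)
    print(chosen, out)
```

The cost saving comes from the `for i in top` loop: with many experts and small k, most expert parameters are untouched for any given token.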
deepseek-R1 Explained - Zhihu - Zhihu Column
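Group Relative Policy Optimization (GRPO): reduces the training cost of RL. It forgoes the critic model, which is typically the same size as the policy model, and instead estimates the baseline from group scores. Specifically, for each question, GRPO samples a group of outputs from the old policy and then optimizes the policy model by maximizing the following objective: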
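The objective itself is cut off in this snippet. In the DeepSeek-R1 paper, each of the G sampled outputs o_i receives a reward r_i; its advantage is A_i = (r_i - mean(r_1, ..., r_G)) / std(r_1, ..., r_G), and the objective averages a PPO-style clipped surrogate over the group minus a β-weighted KL penalty toward a reference policy, estimated as π_ref/π_θ - log(π_ref/π_θ) - 1. The sketch below evaluates that per-group quantity from one log-probability per output. Assumptions: population standard deviation with a small epsilon, and the `clip_eps` and `beta` defaults are illustrative only. It computes the objective value, not gradients, and is not a training loop.

```python
import math
import statistics
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantage: reward normalized by the group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp: float, logp_ref: float) -> float:
    """pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, computed from log-probs (>= 0)."""
    log_ratio = logp_ref - logp
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_objective(
    logp_new: Sequence[float],   # log pi_theta(o_i | q) for each output
    logp_old: Sequence[float],   # log pi_theta_old(o_i | q)
    logp_ref: Sequence[float],   # log pi_ref(o_i | q)
    rewards: Sequence[float],
    clip_eps: float = 0.2,       # illustrative value
    beta: float = 0.04,          # illustrative value
) -> float:
    """Mean over the group of clipped surrogate minus beta * KL estimate."""
    advantages = group_advantages(rewards)
    terms = []
    for lp, lo, lr, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)
        terms.append(surrogate - beta * kl_estimate(lp, lr))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_advantages(rewards))  # approximately [1, -1, -1, 1]
```

Because the baseline is the group mean, no separate value network is needed, which is exactly the saving the snippet attributes to dropping the critic.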