Preview 5B - Search

About 3,010,000 results

github.com
https://github.com › agentica-project › deepscaler
GitHub - agentica-project/deepscaler: Democratizing …
[2025/02/10] We release DeepScaleR-1.5B-Preview, a 1.5B model that surpasses O1-Preview and achieves 43.1% Pass@1 on AIME. We achieve this by iteratively scaling Deepseek's …
github.com
https://github.com › RUCAIBox › Slow_Thinking_with_LLMs
RUCAIBox/Slow_Thinking_with_LLMs - GitHub
🚀 STILL-3-1.5B-Preview: A 1.5B slow-thinking reasoning model continuously evolving through RL. To delve deeper into the potential of reinforcement learning, we applied this training method to …
nvidia.com
https://build.nvidia.com › nvidia
cosmos-1.0-autoregressive-5b Model by NVIDIA | NVIDIA NIM
cosmos-1.0-autoregressive-5b PREVIEW Generates future frames of a physics-aware world state based on simply an image or short video prompt for physical AI development.
csdn.net
https://blog.csdn.net › ViniJack › article › details
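About "DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling Reinforcement Learning…
Feb 22, 2025 · In this blog post, we show step by step how reinforcement learning can turn a small model into a powerful reasoning model. The DeepScaleR-1.5B-Preview model we are releasing was trained on 40,000 high-quality math problems …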
ollama.com
https://ollama.com › library › deepscaler
DeepScaleR - ollama.com
Feb 12, 2025 · DeepScaleR-1.5B-Preview is a language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context …

csdn.net
https://blog.csdn.net › article › details
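LLMs / DeepSeek-R1: DeepScaleR (Democratizing Reinforcement Learning for LLMs / DeepScaleR-1.5B-Preview…
Feb 23, 2025 · DeepScaleR-1.5B-Preview is a language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B. It uses distributed reinforcement learning (RL) and iteratively increases the context length to improve …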
jdon.com
https://www.jdon.com
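Reproducing DeepSeek for $4,500: Performance Beats o1-preview - 极道 - 解道jdon
We are releasing DeepScaleR-1.5B-Preview, a language model fine-tuned from Deepseek-R1-Distilled-Qwen-1.5B using simple reinforcement learning (RL). On AIME2024 it achieves an impressive 43.1% …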
github.com
https://github.com › giterinhub
giterinhub/DeepScaleR-1.5B-Preview - GitHub
DeepScaleR-1.5B-Preview is a language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B using distributed reinforcement learning (RL) to scale up to long context lengths. The …
baidu.com
https://baijiahao.baidu.com
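Replicating DeepSeek for $4,500: 1.5B Beats o1-preview with RL! Full training details …
Feb 11, 2025 · Recently, a research team from UC Berkeley obtained the new DeepScaleR-1.5B-Preview by fine-tuning Deepseek-R1-Distilled-Qwen-1.5B with simple reinforcement learning (RL). On the AIME2024 benchmark …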
junki.cn
https://www.junki.cn › archives
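Claimed: 1.5B Beats o1-preview Using Only RL. Hands-on Deployment of the DeepScaleR Model
Feb 18, 2025 · Recently, a research team from UC Berkeley obtained the new DeepScaleR-1.5B-Preview by fine-tuning Deepseek-R1-Distilled-Qwen-1.5B with simple reinforcement learning (RL). So far, the research team has …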