
CPL: Critical Plan Step Learning Boosts LLM Generalization in …
September 13, 2024 · To address this, we propose searching within the action space on high-level abstract plans to enhance model generalization and introduce Critical Plan Step Learning (CPL), comprising: 1) searching on plan, using Monte Carlo Tree Search (MCTS) to explore diverse plan steps in multi-step reasoning tasks, and 2) learning critical plan steps ...
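How Can AI's Deep Reasoning Ability Be Generalized? - Microsoft Research
October 22, 2024 · Critical Plan Step Learning (CPL), the latest research from Microsoft Research Asia, aims to extend reinforcement learning to broader and more complex problem settings, and it has made breakthrough progress. By applying reinforcement learning to self-generated high-level abstract plans, CPL improves the model's performance not only on mathematical reasoning tasks but also on multi …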
GitHub - tianlwang/CPL-Reasoning: CPL: Critical Plan Step …
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks Environment Setup Create a Python virtual environment and install the dependencies:
Abstract - arXiv.org
In this section, we introduce Critical Plan Step Learning (CPL), which boosts model performance through an iterative process of plan-based search and step-level preference learning. We first introduce our plan-based MCTS, which enables the LLM to explore diverse plan strategies in the vast search space. Next, we present our Step-APO in detail to ...
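The plan-based MCTS mentioned in the snippet above can be illustrated with a minimal sketch. The snippet does not give the paper's actual selection rule, so this example assumes the standard UCT (UCB1) formula; `PlanNode`, `uct_score`, `select_child`, `backpropagate`, and the exploration constant `c` are all hypothetical names chosen for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One candidate plan step in the search tree (hypothetical structure)."""
    step: str
    visits: int = 0
    total_value: float = 0.0
    children: list["PlanNode"] = field(default_factory=list)

    @property
    def mean_value(self) -> float:
        # Average outcome observed below this plan step; 0.0 if never visited.
        return self.total_value / self.visits if self.visits else 0.0


def uct_score(parent: PlanNode, child: PlanNode, c: float = 1.4) -> float:
    """Standard UCT score (an assumption; the paper may use another rule)."""
    if child.visits == 0:
        return math.inf  # always try unvisited plan steps first
    exploration = c * math.sqrt(math.log(parent.visits) / child.visits)
    return child.mean_value + exploration


def select_child(parent: PlanNode) -> PlanNode:
    """Pick the child plan step with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch))


def backpropagate(path: list[PlanNode], reward: float) -> None:
    """Add a final outcome reward to every node on the visited path."""
    for node in path:
        node.visits += 1
        node.total_value += reward
```

In a full search loop, an LLM would propose the candidate plan steps that become `children`, and the reward would come from checking the final answer; both parts are omitted here.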
Critical Planning Step Learning: Enhancing LLM Generalization in ...
September 17, 2024 · A novel method called Critical Planning Step Learning (CPL) that leverages Monte Carlo Tree Search (MCTS) to explore planning steps in multi-step reasoning tasks,...
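Critical Planning Step Learning Boosts the Generalization of Large Language Models in Reasoning Tasks
By introducing Critical Planning Step Learning (CPL) and Step-level Advantage Preference Optimization (Step-APO), and by using Monte Carlo Tree Search (MCTS) to explore planning steps in multi-step reasoning tasks, the approach improves the model's reasoning ability.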
CPL: Critical Planning Step Learning Boosts LLM ... - NASA/ADS
To tackle this challenge, we introduce Critical Planning Step Learning (CPL), which leverages Monte Carlo Tree Search (MCTS) to explore diverse planning steps in multi-step reasoning tasks. Based on long-term outcomes, CPL learns step-level planning preferences to improve the model's planning capabilities and, consequently, its general ...
CPL: Critical Planning Step Learning Boosts LLM Generalization in ...
October 1, 2024 · Critical Planning Step Learning (CPL): CPL leverages Monte Carlo Tree Search (MCTS) to explore diverse planning steps in multi-step reasoning tasks. By analyzing the long-term outcomes of these planning steps, CPL learns which intermediate steps are most critical for effective planning.
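One way to picture "learning which intermediate steps are most critical" is to turn value estimates of sibling plan steps into step-level preference pairs. The sketch below is only an illustration under stated assumptions: the `StepPair` record, the `sibling_pairs` helper, the dictionary of per-step values, and the `min_gap` threshold are hypothetical and are not taken from the paper or its repository.

```python
from typing import NamedTuple


class StepPair(NamedTuple):
    """A step-level preference example (hypothetical record layout)."""
    prefix: tuple[str, ...]  # plan steps already taken
    preferred: str           # sibling step with the higher estimated value
    rejected: str            # sibling step with the lower estimated value
    value_gap: float         # difference between the two value estimates


def sibling_pairs(
    prefix: list[str],
    children: dict[str, float],
    min_gap: float = 0.2,
) -> list[StepPair]:
    """Pair the best and worst sibling steps when their values differ enough.

    `children` maps each candidate next step to a value estimate, for
    example the mean outcome a tree search observed below that step.
    """
    if len(children) < 2:
        return []
    best = max(children, key=children.get)
    worst = min(children, key=children.get)
    gap = children[best] - children[worst]
    if gap < min_gap:
        return []  # siblings are too close to call; skip this decision point
    return [StepPair(tuple(prefix), best, worst, gap)]
```

Decision points where the gap is large are the ones where the choice of step changes the long-term outcome most, which is one plausible reading of "critical" steps; the pairs could then feed a preference-optimization objective.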
CPL: Critical Planning Step Learning Boosts LLM Generalization in ...
September 13, 2024 · To tackle this challenge, we introduce Critical Planning Step Learning (CPL), which leverages Monte Carlo Tree Search (MCTS) to explore diverse planning steps in multi-step reasoning tasks. Based on long-term outcomes, CPL learns step-level planning preferences to improve the model's planning capabilities and, consequently, its general ...