
lm-evaluation-harness/lm_eval/tasks/mmlu/README.md at main
mmlu: the original multiple-choice MMLU benchmark; mmlu_continuation: MMLU with continuation-style prompts; mmlu_generation: MMLU with generation-style prompts; MMLU is the original benchmark ...
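For context, a minimal sketch of running one of these task variants through the harness's Python API; it assumes lm-evaluation-harness is installed as the `lm_eval` package, and the model name is only a placeholder:

```python
# Hedged sketch: assumes the `lm_eval` package (lm-evaluation-harness) and a
# HuggingFace-format checkpoint; "gpt2" is only a placeholder model here.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["mmlu"],   # swap in the continuation/generation variants listed above
    num_fewshot=5,    # MMLU is conventionally reported 5-shot
    batch_size=8,
)
print(results["results"])  # per-subject and aggregate MMLU scores
```

The same run can be launched from the command line with `lm_eval --model hf --model_args pretrained=... --tasks mmlu --num_fewshot 5`; exact task names should be checked against the task README above.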
lm-evaluation-harness/lm_eval/tasks/mmlu_pro/README.md at …
Title: MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. Abstract: In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, …
Measuring Massive Multitask Language Understanding - GitHub
This is the repository for Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021). This repository contains OpenAI API evaluation code, and the test is available for download here ...
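The benchmark data can also be loaded programmatically. A minimal sketch, assuming the community mirror `cais/mmlu` on the Hugging Face Hub (an assumption; the repository itself links a downloadable tarball):

```python
# Hedged sketch: assumes the `datasets` library and the community mirror
# "cais/mmlu" on the Hugging Face Hub; the original repo links a tarball instead.
from datasets import load_dataset

mmlu = load_dataset("cais/mmlu", "all", split="test")
example = mmlu[0]
# Each row carries a question, four answer choices, and the index of the correct one.
print(example["question"], example["choices"], example["answer"])
```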
GitHub - aryopg/mmlu-redux
We fine-tune Llama-3 (8B-Instruct) using the LabelChaos datasets. To balance the distribution, where most instances are labelled as "correct", we adjust the label distribution to: 0.1 (Wrong Ground Truth), 0.1 (Poor Question Clarity), 0.1 (No Correct Answers), 0.1 (Unclear Options), 0.1 (Multiple Correct Answers), and 0.5 (correct). The training involves 2048 steps, with a batch …
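To make the rebalancing concrete, here is a hedged sketch of resampling a labelled pool to the stated target mix; the label strings, field names, and helper are hypothetical and not taken from the mmlu-redux code:

```python
# Hypothetical sketch of rebalancing a labelled pool to the target distribution
# quoted above; labels and field names are illustrative, not from mmlu-redux.
import random

TARGET = {
    "wrong_ground_truth": 0.1,
    "poor_question_clarity": 0.1,
    "no_correct_answers": 0.1,
    "unclear_options": 0.1,
    "multiple_correct_answers": 0.1,
    "correct": 0.5,
}

def rebalance(pool, n_total, seed=0):
    """Sample n_total items so each label roughly hits its target share."""
    rng = random.Random(seed)
    by_label = {}
    for item in pool:
        by_label.setdefault(item["label"], []).append(item)
    sampled = []
    for label, share in TARGET.items():
        candidates = by_label.get(label, [])
        k = min(len(candidates), round(share * n_total))
        sampled.extend(rng.sample(candidates, k))
    rng.shuffle(sampled)
    return sampled
```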
MMLU_Chinese/README.md at master - GitHub
Measuring Massive Multitask Language Understanding | ICLR 2021 - MMLU_Chinese/README.md at master · chaoswork/MMLU_Chinese
GitHub - VILA-Lab/Mobile-MMLU: Mobile-MMLU: A Mobile …
Mobile-MMLU is a comprehensive benchmark designed to evaluate mobile-compatible Large Language Models (LLMs) across 80 diverse fields including Education, Healthcare, and Technology. Our benchmark is redefining mobile intelligence evaluation for a smarter future, with a focus on real-world ...
GitHub - MoonshotAI/Moonlight
Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but its scalability to larger models has not been proven. We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully …
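As a rough illustration only (the snippet is truncated, and this is not the MoonshotAI implementation), a toy Muon-style step: approximately orthogonalize the momentum of a 2-D weight matrix with Newton-Schulz iterations and apply the update together with decoupled weight decay:

```python
# Toy sketch of a Muon-style update (NOT the Moonlight implementation):
# orthogonalize the momentum of a 2-D weight matrix, then step with
# decoupled weight decay. Real implementations use a tuned quintic iteration
# and additional per-matrix scaling.
import numpy as np

def newton_schulz_orthogonalize(m, steps=5, eps=1e-7):
    """Approximately map m to an orthogonal(-ish) matrix via Newton-Schulz."""
    x = m / (np.linalg.norm(m) + eps)     # normalize so singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        a = x @ x.T
        x = 1.5 * x - 0.5 * a @ x         # basic cubic Newton-Schulz iteration
    return x.T if transposed else x

def muon_style_step(w, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    w = w - lr * (update + weight_decay * w)   # decoupled weight decay
    return w, momentum
```

This is only meant to show where the weight-decay term enters the orthogonalized update; the second scaling technique is truncated in the snippet above and is not reflected here.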
TIGER-AI-Lab/MMLU-Pro - GitHub
We introduce MMLU-Pro, an enhanced benchmark designed to evaluate language understanding models across broader and more challenging tasks. Building on the Massive Multitask Language Understanding (MMLU) dataset, MMLU-Pro integrates more challenging, reasoning-focused questions and increases the answer choices per question from four to ten, significantly raising …
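A hedged loading sketch for the ten-option format, assuming the dataset is published as "TIGER-Lab/MMLU-Pro" on the Hugging Face Hub (the dataset ID and field names are assumptions for illustration):

```python
# Hedged sketch: assumes the `datasets` library and the "TIGER-Lab/MMLU-Pro"
# dataset on the Hugging Face Hub; field names are assumptions for illustration.
from datasets import load_dataset
from string import ascii_uppercase

mmlu_pro = load_dataset("TIGER-Lab/MMLU-Pro", split="test")
row = mmlu_pro[0]

# Format a prompt with up to ten lettered options (A-J) instead of MMLU's four.
prompt = row["question"] + "\n" + "\n".join(
    f"{ascii_uppercase[i]}. {opt}" for i, opt in enumerate(row["options"])
)
print(prompt)
print("Gold answer:", row["answer"])
```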
GitHub - deepseek-ai/DeepSeek-V3
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.