Benchmark performance of DeepSeek-R1 (IMAGE)
Caption
DeepSeek-R1 performs strongly across multiple educational and reasoning benchmarks. It achieves 79.8% Pass@1 on AIME 2024, slightly surpassing OpenAI-o1-1217. On MATH-500, it reaches 97.3%, performing on par with OpenAI-o1-1217 and outperforming the other models compared. It also improves significantly on DeepSeek-V3, scoring 90.8% on MMLU, 84.0% on MMLU-Pro, and 71.5% on GPQA Diamond. Although it performs slightly below OpenAI-o1-1217 on some benchmarks, it outperforms other closed-source models, highlighting its strength in educational tasks. The data shown in the figure are taken from Figure 1 of the DeepSeek paper [4], which presents the benchmark performance of DeepSeek-R1.
Credit
Hao Chen (corresponding author)
Usage Restrictions
Please cite the source.
License
Original content