Qwen (Alibaba Cloud) · published Aug 8, 2024

Introducing Qwen2-Math


August 8, 2024 · 28 min · 5758 words · Qwen Team

🚨 This model mainly supports English. We will release bilingual (English and Chinese) math models soon.

Introduction

Over the past year, we have dedicated significant effort to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems. Today, we are delighted to introduce a series of math-specific large language models in our Qwen2 series: Qwen2-Math and Qwen2-Math-Instruct-1.5B/7B/72B. Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, whose mathematical capabilities significantly surpass those of open-source models and even closed-source models (e.g., GPT-4o). We hope that Qwen2-Math can help the community solve complex mathematical problems.

We evaluate our math-specific models on a series of math benchmarks. The results demonstrate that our largest math-specific model, Qwen2-Math-72B-Instruct, outperforms state-of-the-art models including GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B.

Qwen2-Math: Base Models

The base models of Qwen2-Math are initialized with Qwen2-1.5B/7B/72B and then pretrained on a meticulously designed mathematics-specific corpus. This corpus contains large-scale, high-quality mathematical web texts, books, code, exam questions, and mathematical pre-training data synthesized by Qwen2.

We evaluate our Qwen2-Math base models on three widely used English math benchmarks: GSM8K, Math, and MMLU-STEM. In addition, we evaluate on three Chinese math benchmarks: CMATH, GaoKao Math Cloze, and GaoKao Math QA. All evaluations use few-shot chain-of-thought prompting.

Qwen2-Math-Instruct: Instruction-Tuned Models

We first trained a math-specific reward model based on Qwen2-Math-72B. We then combined this dense reward signal with a binary signal indicating whether the model answered correctly. This combined signal is used as supervision for constructing the SFT data through rejection sampling, and again in reinforcement learning with Group Relative Policy Optimization (GRPO) after SFT.

We evaluate Qwen2-Math-Instruct on mathematical benchmarks in both English and Chinese. In addition to widely used benchmarks such as GSM8K and Math, we include more challenging exams to fully inspect the capabilities of Qwen2-Math-Instruct, such as OlympiadBench, CollegeMath, GaoKao, AIME2024, and AMC2023. For Chinese mathematical benchmarks, we use CMATH, Gaokao (Chinese college entrance examination 2024), and CN Middle School 24 (China High School Entrance Examination 2024).

We report greedy, Maj@8, and RM@8 performance on all benchmarks in the zero-shot setting, except for the multiple-choice benchmarks (including MMLU-STEM and the multiple-choice problems in GaoKao and CN Middle School 24), which use a 5-shot setting. Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly for the 1.5B and 7B models. This demonstrates the effectiveness of our math reward model.

In more complex mathematical competition evaluations such as AIME 2024 and AMC 2023, Qwen2-Math-Instruct also performs well across various settings, including Greedy, Maj@64, RM@64, and RM@256.

Case Study

Here we list some test cases, which include some IMO math problems. From the experimental results and case study, we find that Qwen2-Math is capable of solving simple math competition problems. All the solutions are generated by our model without modification. Please note that we do not guarantee the correctness of the claims in the process.
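To make the Maj@N and RM@N settings from the evaluation above concrete, here is a minimal Python sketch of the two selection rules. It assumes Maj@N means majority voting over the final answers of N samples and RM@N means picking the sample with the highest reward-model score (best-of-N). The names `maj_at_n`, `rm_at_n`, and `samples`, and all scores, are hypothetical placeholders for illustration, not output from the actual Qwen2-Math reward model.

```python
from collections import Counter
from typing import List, Tuple

# Each candidate is (final_answer, reward_score). The scores are made-up
# placeholders standing in for a reward model's output (assumption).
Candidate = Tuple[str, float]


def maj_at_n(candidates: List[Candidate]) -> str:
    """Majority voting: return the most frequent final answer."""
    counts = Counter(answer for answer, _ in candidates)
    # most_common(1) returns [(answer, count)] for the top answer;
    # ties are broken by first appearance.
    return counts.most_common(1)[0][0]


def rm_at_n(candidates: List[Candidate]) -> str:
    """Best-of-N: return the answer of the highest-scoring sample."""
    best_answer, _ = max(candidates, key=lambda c: c[1])
    return best_answer


# Eight hypothetical samples for one problem (N = 8).
samples: List[Candidate] = [
    ("3", 0.21), ("3", 0.18), ("3", 0.25), ("4", 0.93),
    ("4", 0.88), ("5", 0.10), ("3", 0.30), ("4", 0.91),
]

print("Maj@8 picks:", maj_at_n(samples))
print("RM@8 picks:", rm_at_n(samples))
```

In this toy input the most frequent answer and the highest-scoring answer differ, which is the situation where the two rules can disagree and where a good reward model can help.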

Problem From IMO Shortlist 2002

Problem

What is the smallest positive integer $t$ such that there exist integers $x_1, x_2, \ldots, x_t$ with
$$ x_1^3 + x_2^3 + \cdots + x_t^3 = 2002^{2002}? $$

Qwen2-Math-72B-Instruct Solution

To determine the smallest positive integer $ t $ such that there exist integers $ x_1, x_2, \ldots, x_t $ satisfying the equation
$$ x_1^3 + x_2^3 + \cdots + x_t^3 = 2002^{2002}, $$
we start by analyzing the properties of cubes and their sums. We need to find the smallest $ t $ for which the sum of $ t $ cubes equals $ 2002^{2002} $.

First, we note that $ 2002 $ can be factored into prime factors:
$$ 2002 = 2 \times 7 \times 11 \times 13. $$
Thus,
$$ 2002^{2002} = (2 \times 7 \times 11 \times 13)^{2002} = 2^{2002} \times 7^{2002} \times 11^{2002} \times 13^{2002}. $$

Next, we consider the properties of cubes modulo 9. Cubes modulo 9 can only be congruent to $ 0, 1, $ or $ -1 $. This is because:
$$ \begin{aligned}
0^3 &\equiv 0 \pmod{9}, \\
1^3 &\equiv 1 \pmod{9}, \\
2^3 &\equiv 8 \equiv -1 \pmod{9}, \\
3^3 &\equiv 27 \equiv 0 \pmod{9}, \\
4^3 &\equiv 64 \equiv 1 \pmod{9}, \\
5^3 &\equiv 125 \equiv -1 \pmod{9}, \\
6^3 &\equiv 216 \equiv 0 \pmod{9}, \\
7^3 &\equiv 343 \equiv 1 \pmod{9}, \\
8^3 &\equiv 512 \equiv -1 \pmod{9}.
\end{aligned} $$

Now, we need to determine $ 2002^{2002} \mod 9 $. First, we find $ 2002 \mod 9 $:
$$ 2002 \div 9 = 222 \text{ remainder } 4 \quad \Rightarrow \quad 2002 \equiv 4 \pmod{9}. $$
Thus,
$$ 2002^{2002} \equiv 4^{2002} \pmod{9}. $$

Next, we use Euler’s theorem, which states that $ a^{\phi(n)} \equiv 1 \pmod{n} $ for $ \gcd(a, n) = 1 $. Here, $ \phi(9) = 6 $, so:
$$ 4^6 \equiv 1 \pmod{9}. $$
We can reduce the exponent $ 2002 \mod 6 $:
$$ 2002 \div 6 = 333 \text{ remainder } 4 \quad \Rightarrow \quad 2002 \equiv 4 \pmod{6}. $$
Thus,
$$ 4^{2002} \equiv 4^4 \pmod{9}. $$
$$ 4^4 = 256. $$
Finally, we find $ 256 \mod 9 $:
$$ 256 \div 9 = 28 \text{ remainder } 4 \quad \Rightarrow \quad 256 \equiv 4 \pmod{9}. $$
Therefore,
$$ 2002^{2002} \equiv 4 \pmod{9}. $$

We need to find the smallest $ t $ such that the sum of $ t $ cubes is congruent to 4 modulo 9. The possible sums of cubes modulo 9 are:
$$ \begin{aligned}
0 &\equiv 0 \pmod{9}, \\
1 &\equiv 1 \pmod{9}, \\
-1 &\equiv 8 \pmod{9}, \\
1 + 1 &\equiv 2 \pmod{9}, \\
1 + (-1) &\equiv 0 \pmod{9}, \\
-1 + (-1) &\equiv -2 \equiv 7 \pmod{9}, \\
1 + 1 + 1 &\equiv 3 \pmod{9}, \\
1 + 1 + (-1) &\equiv 1...
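The modular arithmetic in the solution steps shown above can be checked numerically. The following is a minimal Python sketch, not part of the model's output: it recomputes the cube residues modulo 9, the residue of $2002^{2002}$, and the fewest cube residues whose sum can reach that residue. The helper name `min_cubes_for_residue` is hypothetical. Note that this only yields a lower bound on $t$ from the mod 9 argument; it does not by itself exhibit a representation of $2002^{2002}$ as a sum of cubes.

```python
# Residues that a cube can take modulo 9 (expected: 0, 1, and 8, i.e. -1).
cube_residues = sorted({(x ** 3) % 9 for x in range(9)})

# 2002^2002 mod 9, using Python's built-in three-argument pow
# (modular exponentiation), so the huge power is never materialized.
target = pow(2002, 2002, 9)


def min_cubes_for_residue(residue: int, modulus: int = 9) -> int:
    """Return the smallest t such that some sum of t cube residues
    is congruent to `residue` modulo `modulus`."""
    reachable = {0}  # sums reachable with zero cubes
    for t in range(1, modulus + 1):
        # Extend every reachable sum by one more cube residue.
        reachable = {(s + c) % modulus for s in reachable for c in cube_residues}
        if residue in reachable:
            return t
    raise ValueError("residue not reachable within the search bound")


print("cube residues mod 9:", cube_residues)
print("2002^2002 mod 9:", target)
print("fewest cubes needed mod 9:", min_cubes_for_residue(target))
```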

Excerpt shown — open the source for the full document.
