DeepMind fed an AI 2 million math problems, and it still can't beat a calculator, hahahaha
Li Zi, Guo Yipu | From Aofei Temple
Quantum Bit Report | Public Account QbitAI
Where in life can you escape mathematics?
This is a mental-arithmetic quiz on the back of an off-road vehicle.
This is a word problem that mischievous kids have made a mess of.
And, uh, this is somebody else's Olympiad problem.
△ A problem from the Romanian Master of Mathematics that wiped out the entire Chinese team
Still, barely a day goes by without news that "AI has surpassed humans". So if we hand a neural network the math homework we did in middle school, can it cope?
In another corner of the world, DeepMind seems to have read our minds: it set neural networks a bank of 2 million math problems, and the dataset has been released.
Arithmetic, algebra, probability, calculus... formula or word problem in natural language, anything that can be written down as text is fair game.
For example, this permutations-and-combinations question (translated):
Question: Three letters are picked without replacement from qqqkkklkqkkk. What is the probability of getting the sequence qql?
Answer: 1/110.
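For reference, that answer checks out: the draws are ordered and without replacement, so P(qql) = 4/12 × 3/11 × 1/10 = 1/110. Here is a brute-force check in plain Python (our own sketch, not dataset code):

```python
# Brute-force check of the qql probability (our own sketch, not dataset code).
from itertools import permutations
from fractions import Fraction

letters = "qqqkkklkqkkk"                      # 4 q's, 7 k's, 1 l
draws = list(permutations(letters, 3))        # all ordered draws without replacement
hits = sum("".join(d) == "qql" for d in draws)
print(Fraction(hits, len(draws)))             # 1/110
```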
For example, this composite function:
Question: Let f(x) = 2x + 3, g(x) = 7x − 4, h(x) = −5x − 8. Find g(h(f(x))).
Answer: −70x − 165.
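This one also checks out, and answers of this kind are easy to verify by symbolic substitution, for instance with SymPy (our own check, which the models of course don't get to use):

```python
# Verify g(h(f(x))) for the example above with SymPy.
import sympy as sp

x = sp.symbols("x")
f = 2*x + 3
h_of_f = -5*f - 8            # h(f(x))
g_of_h_of_f = 7*h_of_f - 4   # g(h(f(x)))
print(sp.expand(g_of_h_of_f))   # -70*x - 165
```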
These are all math test questions for AI.
As the news spread, people cheered: heaven's net is wide-meshed, yet nothing slips through.
2 million questions: what kinds of problems are they?
Why the sudden interest in whether AI is any good at math?
DeepMind's answer: AI and humans learn mathematics in very different ways.
Humans rely mainly on reasoning, learning, and the use of rules and symbols, while AI relies on experience and statistical evidence.
Let’s take a familiar example, the machine learning interview joke.
Examiner: What are your strengths?
Me: I’m a machine learning expert.
Examiner: What is 9+10?
Me: 3.
Examiner: That’s too far off. It’s 19.
Me: 16.
Examiner: Wrong, it is 19.
Me: 18.
Examiner: No, 19.
Me: 19.
Examiner: You are admitted.
The AI's answers are arrived at by induction.
DeepMind believes that, lacking human-style reasoning, AI will find it hard to truly learn mathematics; at the same time, mathematics is a valuable domain for studying neural network architectures.
So the team wanted to see just how far induction alone can go in learning mathematics.
What is the scope of the exam?
The starting point was the school mathematics curriculum for students up to age 16 (presumably the UK's).
The team extended that syllabus to cover the following areas:
1. Algebra: solving systems of linear equations in two variables, finding the roots of polynomials, finding the general term of a sequence, etc.
2. Arithmetic: the four basic operations, evaluating expressions with a prescribed order of operations (e.g. with brackets), simplifying expressions involving square roots, etc.
3. Calculus: differentiating polynomials.
4. Comparison: deciding which of two numbers is larger, finding the number in a list closest to a given value, etc.
5. Measurement: converting between units of length, computing time intervals, etc.
6. Numbers: finding divisors, rounding, digits of integers, factorization, primes and composites, etc.
7. Polynomial manipulation: collecting like terms, etc.
8. Probability: for example, the probability of drawing particular colours from a pile of red, white and blue balls.
The 2-million-question bank was generated algorithmically from those school-level templates.
Crucially, the modules listed above can be composed with one another, which is interesting because many mathematical problems are likewise built by combining concepts; a toy sketch of this kind of templated generation and composition follows below.
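As a rough illustration of what "generated algorithmically" and "composed" mean, here is a toy, purely hypothetical generator in the dataset's question/answer text style. The real generation code lives in the deepmind/mathematics_dataset repository and differs in the details.

```python
# Toy, hypothetical illustration of templated generation and module composition;
# not the actual deepmind/mathematics_dataset code.
import random

def linear_function_module(rng):
    # Base module: evaluate a random linear function at a random point.
    a, b, c = rng.randint(1, 9), rng.randint(-99, 99), rng.randint(-9, 9)
    question = f"Let k(c) = {a}*c + {b}. What is k({c})?"
    return question, str(a * c + b)

def inequality_wrapper(rng, inner_module):
    # Composition: wrap the inner question in a true/false comparison.
    question, answer = inner_module(rng)
    wrong = int(answer) + rng.choice([-1, 1]) * rng.randint(1, 5)
    question = question.replace("What is", "Is").replace("?", f" != {wrong}?")
    return question, "True"

rng = random.Random(0)
print(linear_function_module(rng))
print(inequality_wrapper(rng, linear_function_module))
```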
Take the same kind of example: composite functions and derivatives combine into the derivative of a composite function. Remember this from high school?
[f(g(x))]′ = f′(g(x)) · g′(x)
Differentiate the outer function, then the inner one, and multiply the two. A quick sanity check is below; after that, we wait to see how the AI answers.
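Here is that rule on a made-up example, checked with SymPy (sin(x²) is our own choice, not a question from the dataset):

```python
# Chain-rule sanity check on sin(x**2); our own example, not from the dataset.
import sympy as sp

x = sp.symbols("x")
inner = x**2
print(sp.diff(sp.sin(inner), x))          # 2*x*cos(x**2)
print(sp.cos(inner) * sp.diff(inner, x))  # f'(g(x)) * g'(x) built by hand: the same thing
```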
Who are the contestants?
DeepMind's exam had two candidates: a recurrent neural network (RNN) and a Transformer.
The RNN side fielded the LSTM (long short-term memory), in two configurations.
The first is simple: the question is fed into the LSTM one character at a time, and the model emits the answer one character at a time.
The second is more elaborate: an encoder plus decoder with an attention mechanism, the kind of model that is standard in machine translation today. It does not have to process the input strictly in reading order; for 8/(1+3), say, the 1+3 has to be evaluated first. A rough sketch of the simpler setup follows below.
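For the curious, the character-by-character setup might be sketched in PyTorch roughly as follows. This is purely an illustration with made-up sizes and a greedy decoder, not DeepMind's actual model; the attentional variant adds a separate encoder and attention on top.

```python
# Minimal sketch of a character-level LSTM question-answerer (illustrative only).
import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    """One LSTM reads the question character by character, then writes the answer."""
    def __init__(self, vocab_size=96, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, question_ids, answer_len):
        # Consume the question; the final LSTM state summarises it.
        _, state = self.lstm(self.embed(question_ids))
        token = torch.zeros(question_ids.size(0), 1, dtype=torch.long)  # "start" symbol
        answer = []
        for _ in range(answer_len):
            step, state = self.lstm(self.embed(token), state)
            token = self.out(step).argmax(dim=-1)   # greedy pick of the next character
            answer.append(token)
        return torch.cat(answer, dim=1)

model = SimpleLSTM()
fake_question = torch.randint(1, 96, (1, 40))   # 40 random character ids
print(model(fake_question, answer_len=8))       # 8 predicted character ids
```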
The second candidate is the Transformer, a seq2seq model that performs strongly in machine translation.
A quick look at its structure:
An encoder maps the embedded question into a sequence of the same length, and a decoder then generates the predicted answer character by character.
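A minimal character-level seq2seq Transformer along these lines could be set up as below. Again, this is a sketch under assumed hyperparameters and vocabulary (no positional encodings, no training loop), not the paper's exact model; the sample question is merely in the dataset's style.

```python
# Minimal character-level seq2seq Transformer sketch (illustrative only).
import torch
import torch.nn as nn

VOCAB = sorted(set("0123456789+-*/()=?!., abcdefghijklmnopqrstuvwxyz"))
stoi = {ch: i + 1 for i, ch in enumerate(VOCAB)}          # index 0 = padding

def encode(text, length):
    ids = [stoi[ch] for ch in text.lower() if ch in stoi][:length]
    return ids + [0] * (length - len(ids))

class CharTransformer(nn.Module):
    def __init__(self, vocab_size=len(VOCAB) + 1, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.core = nn.Transformer(d_model=d_model, nhead=8,
                                    num_encoder_layers=6, num_decoder_layers=6,
                                    batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, question_ids, answer_ids):
        # The encoder reads the whole question; the decoder predicts the answer
        # one character at a time (masked so it cannot peek ahead).
        src = self.embed(question_ids)
        tgt = self.embed(answer_ids)
        mask = self.core.generate_square_subsequent_mask(answer_ids.size(1))
        return self.out(self.core(src, tgt, tgt_mask=mask))

model = CharTransformer()
q = torch.tensor([encode("What is ((-14)/(-6))/(1162/(-4980))?", 96)])
a = torch.tensor([encode("-10", 8)])
print(model(q, a).shape)   # (1, 8, vocab_size) -> per-character logits
```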
How did the exam go?
With roughly the same number of parameters, the Transformer comes out ahead of the LSTM.
Neither network does much "algorithmic reasoning", but the Transformer architecture is better suited to learning mathematics than the LSTM, likely because:
1. With the same number of parameters it performs more computation;
2. Its architecture is shallower, so gradients propagate better;
3. Its internal memory is sequence-like, which makes it easier to manipulate mathematical objects such as sequences of digits.
For the AI, the easiest problems involved decimals, integers and comparing sizes, along with questions composed from several modules, such as:
Given k(c) = -611*c + 2188857, is k(-103) != 2251790? (Answer: False)
Or this one:
Sort -139/4, 40.8, -555, 607 from smallest to largest.
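Both of these are trivial to verify in a few lines of Python (our own check, just to show how little is being asked):

```python
# Check the two "easy" examples above in plain Python.
from fractions import Fraction

k = lambda c: -611 * c + 2188857
print(k(-103), k(-103) != 2251790)      # 2251790 False

nums = [Fraction(-139, 4), Fraction("40.8"), -555, 607]
print(sorted(nums))                      # [-555, -139/4, 204/5, 607]
```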
The hardest problems were the more number-theoretic ones, such as testing whether a number is prime and factoring integers.
Even so, the Transformer's answers still look somewhat plausible.
For example, asked to factor 235232673 into primes it answered 3, 11, 13, 19, 23, 1487, whereas the correct answer is 3, 13, 19, 317453.
Wrong, but superficially convincing.
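The correct factorization is easy to confirm with SymPy (again our own check, not something the model had access to):

```python
# Confirm the factorization of 235232673.
from sympy import factorint
print(factorint(235232673))   # {3: 1, 13: 1, 19: 1, 317453: 1}
```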
Moreover, when the Transformer is made to do plain arithmetic, it reaches roughly 90% accuracy on questions using only addition and subtraction, or only multiplication and division, but only about 50% once all four operations are mixed into a single question.
It’s really not as good as a calculator!
This suggests that when the AI solves math problems it relies on induction and generalization rather than on algebraic rules.
It doesn't even know how to use the calculator it's running on. What an honest soul.
Now, you can go out and brag:
I am better at math than AI.
One More Thing
Unfortunately, judging by these results, AI won't be sitting our calculus exams for us any time soon.
Then again, this research was never meant to get you through a calculus exam; a company capable of building AlphaGo presumably doesn't know the pain of struggling students.
By understanding that "AI answers math problems through induction and generalization", DeepMind can extend the same principles to richer domains: other problems that hinge on induction and generalization may well be solvable by AI.
How about having AI tackle the open-ended essay questions in the humanities next time?
Portal
☞ Paper: Analysing Mathematical Reasoning Abilities of Neural Models
David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli
https://arxiv.org/abs/1904.01557
☞ Dataset:
https://github.com/deepmind/mathematics_dataset
-over-