Teasing Microsoft's Classical Chinese AI Translator: What on Earth Are "Never Leave You" and "Its Mother"???
By Mengchen and Xiaoxiao, reporting from Aofei Temple
QbitAI (Quantum Bit) report | Official account: QbitAI
I heard that Microsoft has developed an AI to translate classical Chinese?
Let’s try it out right away. Let’s start with the beginning of “Cao Gui’s Discussion on War”:
Surprisingly, it rendered "I" as "the State of Lu" and "the Duke" as "Duke Zhuang of Lu".
Beyond learning classical Chinese vocabulary and grammar, could the AI also have read the "Zuo Zhuan" thoroughly?
How does it handle poetry?
Although the translation is not very literary, the AI correctly understood the meaning of "looking at the same moon".
Wow, this translation really aroused my interest.
If Baidu and Microsoft take the exam together
Since it is not too difficult to translate the correct meaning of words, can AI master the special grammar in classical Chinese?
In order to better evaluate the capabilities of Microsoft Translator, let’s bring in the veteran player Baidu Translator to compete with each other.
Question 1: The Moon in the Qin Dynasty and the Pass in the Han Dynasty
The focus here is the rhetorical device of intertextuality: the line should be understood as "the bright moon and the passes of the Qin and Han dynasties", with both images belonging to both dynasties.
Baidu's answer is:
It seems that Baidu did not fully understand it. Let’s take a look at Microsoft’s answer:
Microsoft correctly understood the intertextuality and takes the lead with 1 point.
Question 2: Spring breeze makes the south bank green again
The key point of this sentence is the flexible use of word classes. "Green" is originally an adjective, but here it is used as a verb.
Baidu comes first:
No problem, next is Microsoft:
Wait a minute, although the translation of "green" as a verb is correct, why is there an extra "but" at the end?
Could it be that... try entering the second half of the poem as well:
As expected. It seems that Microsoft Translator learned the contrastive relationship between the two lines when it was trained on the whole couplet, and then somehow carried that relationship over into the first line on its own.
This time Baidu wins one back, and the score is tied at 1:1.
The last question tests another common grammatical phenomenon in classical Chinese: inversion.
For example, in "Zou Ji's Advice to the King of Qi", there is a line "Who is more handsome, me or Lord Xu in the north of the city?"
As usual, Baidu goes first:
Then there's Microsoft:
It seems that both AIs have learned how to handle inverted sentences, so the final result is a 2:2 tie, with each having its own merits.
Although Microsoft Translator has learned one more thing, intertextuality, it is still a young player after all, and needs more practice in handling the relationship between sentences.
Next, let’s challenge the limits of Microsoft’s classical Chinese translation.
For example, Wikipedia actually has a Classical Chinese edition (维基大典), which happens to have an entry on Microsoft.
Let’s have Microsoft's AI translate the introduction to its own company:
This modern-flavored pseudo-classical Chinese is still too hard on the newborn AI.
Although it was specially trained on modern terms such as "Microsoft" and "computer", it failed to recognize expressions such as "1975" that did not exist in ancient times, and it could not even recognize the name of its former boss, Bill Gates.
The "who" in "who founded it" was also imagined to be a "king", in keeping with the ancient context. Perhaps this is overfitting.
Speaking of modern expressions, this translation tool can actually be used in reverse, translating vernacular Chinese into classical Chinese.
For example, wouldn’t Prime Minister Zhuge Liang’s words “I have never seen such a shameless person!” be more appropriate if they were said in classical Chinese?
So, how was such a model created?
Powered by Transformer, with special attention to training data
This is certainly not the first time that AI has been used to translate classical Chinese.
Baidu was the first to use machine learning to translate classical Chinese, and has also applied for a related patent: "A method and device for converting between vernacular and classical Chinese."
Models for classical Chinese translation have ranged from traditional machine learning to RNNs to Transformers. This time, Microsoft uses a Transformer model.
(Image source: Microsoft Research AI Headlines)
However, training data in classical Chinese translation has always been a difficult problem.
Compared with mainstream languages (modern Chinese, English, etc.), classical Chinese has very little training data. Varied sentence patterns and a mix of traditional and simplified characters also make translation awkward.
This time, Microsoft's classical Chinese translation mainly tackles the data problem in four ways:
- First, to address the lack of data, shared words are used for data synthesis and augmentation. Classical Chinese and modern Chinese have some words with the same meaning. By retrieving and aligning these words and then expanding them into phrases and short sentences, a large amount of usable training data can be synthesized.
- Second, to address inflexibility with varied sentence patterns, the data format is transformed to improve robustness. Classical Chinese is segmented into sentences differently from modern Chinese, so the researchers use data format transformations to expand the training data and let the model learn to translate such variants.
- Third, to address poor character recognition, the model is trained on mixed simplified and traditional Chinese data. So that the model can recognize classical Chinese in both scripts, the researchers mix simplified and traditional data during training, which helps the model translate either script correctly.
- Fourth, for "new words" in modern texts, dedicated datasets and a recognition model were built to prevent "random translation". To avoid confusion when the model encounters words such as "high-speed rail", "computer", and "Internet" in modern texts (for example, translating "high-speed rail" as a piece of iron at a height), the researchers built a model specifically for identifying these new words. In addition to new words, the model was also trained on new genres such as blogs, forums, and microblogs.
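To make the second and third points more concrete, here is a minimal Python sketch of this kind of augmentation. It is an illustration under stated assumptions, not Microsoft's actual pipeline: the tiny S2T character table, the choice of punctuation variants, the function names, and the sample modern-Chinese rendering are all hypothetical.

```python
import re

# Hypothetical, tiny simplified-to-traditional table for illustration only.
# A real system would rely on a full conversion resource.
S2T = {"齐": "齊", "师": "師", "将": "將", "战": "戰"}

# Common Chinese punctuation marks used to segment sentences.
PUNCT = re.compile(r"[，。、；：？！]")


def to_traditional(text, table=S2T):
    """Convert character by character using the (assumed) mapping table."""
    return "".join(table.get(ch, ch) for ch in text)


def punctuation_variants(source):
    """Return the source as written, unpunctuated, and with every mark
    normalized to a comma, so the model sees several segmentations."""
    variants = [source, PUNCT.sub("", source), PUNCT.sub("，", source)]
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def augment(pairs):
    """Expand (classical, modern) pairs with format and script variants."""
    augmented = []
    for source, target in pairs:
        for variant in punctuation_variants(source):
            augmented.append((variant, target))
            traditional = to_traditional(variant)
            if traditional != variant:
                augmented.append((traditional, target))
    return augmented


# Illustrative pair: the opening of "Cao Gui's Discussion on War" with an
# assumed modern-Chinese rendering.
pairs = [("十年春，齐师伐我。公将战。", "鲁庄公十年春天，齐国军队攻打鲁国。庄公准备迎战。")]
augmented = augment(pairs)
for source, target in augmented:
    print(source, "->", target)
```

One sentence pair becomes six here: three punctuation variants, each in simplified and traditional script, all pointing at the same modern translation.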
However, these are just translations between classical Chinese and modern Chinese. How about trying some English?
Bugs in English-Chinese translation can no longer be hidden
This time, Microsoft's classical Chinese translation is integrated directly into Bing Translator. So can it translate between classical Chinese and a foreign language?
Let’s challenge a single English sentence first:
Never gonna give you up
It seems that simple sentences are not difficult for AI. Let's increase the difficulty and try it with a famous English poem "When You Are Old":
Wait, “grey and gloomy”, “shadowy depth of field”, “bending over the wine”…what the hell is all this?
Simple sentences seem fine, but when it comes to long sentences, why does it translate like this?
However, Microsoft also said that this release mainly covers translation between classical Chinese and modern Chinese, which suggests that other languages are first translated into modern Chinese and only then into classical Chinese.
Let’s take a look at how Microsoft’s English-Chinese translation works:
Case solved: Microsoft's English-to-Chinese translation is really not that good, which is probably also the reason for the errors in translating English into classical Chinese.
In comparison, going the other way, from classical Chinese into modern Chinese and then into English, works slightly better.
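If the translator does pivot through modern Chinese, as guessed above, the behavior is easy to picture as two chained steps. The Python sketch below is only an illustration: the lookup tables are toy stand-ins rather than real translator output, and every name in it is hypothetical.

```python
from typing import Callable

Translator = Callable[[str], str]


def pivot(first: Translator, second: Translator) -> Translator:
    """Chain two translators through an intermediate (pivot) language."""
    def translate(text: str) -> str:
        return second(first(text))
    return translate


# Toy stand-ins for the two stages (hypothetical lookup tables).
EN_TO_ZH = {"good morning": "早上好"}
ZH_TO_CLASSICAL = {"早上好": "晨安"}


def en_to_zh(text: str) -> str:
    # Unknown input passes through unchanged, standing in for a bad translation.
    return EN_TO_ZH.get(text.lower(), text)


def zh_to_classical(text: str) -> str:
    return ZH_TO_CLASSICAL.get(text, text)


en_to_classical = pivot(en_to_zh, zh_to_classical)
print(en_to_classical("Good morning"))
print(en_to_classical("When you are old"))
```

The second call shows the weak point of any pivot setup: whatever the first stage gets wrong is handed to the second stage as is, so the final result can be no better than the intermediate one.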
By the way, although the translation of serious English sentences is not very good, the translation of these words... is actually a bit literary?
It seems that I can learn how to curse gracefully from the translation model in the future. [doge]
If you have come up with any interesting translations, please leave a message~
Microsoft Classical Chinese Translation Address:
https://cn.bing.com/translator