Surpassing human intelligence: Alibaba’s AI outperformed a real human on reading comprehension test

Chinese online seller Alibaba Group announced that its artificial intelligence (AI) software surpassed the human brain in a reading comprehension test. The company noted that the deep-learning model, developed by its Institute of Data Science of Technologies, attained an “Exact Match” score of 82.44 on the Stanford Question Answering Dataset, narrowly edging out the human score of 82.304. The test was created by researchers at Stanford University in California and contains more than 100,000 question-and-answer pairs based on more than 500 Wikipedia articles.

“It is our great honor to witness the milestone where machines surpass humans in reading comprehension. That means objective questions such as ‘what causes rain’ can now be answered with high accuracy by machines. The technology underneath can be gradually applied to numerous applications such as customer service, museum tutorials and online responses to medical inquiries from patients, decreasing the need for human input in an unprecedented way,” said Luo Si, chief scientist for natural language processing at the Alibaba Institute.

“We are thrilled to see NLP [natural language processing] research has achieved significant progress over the year. We look forward to sharing our model-building methodology with the wider community and exporting the technology to our clients in the near future,” Si added.

Alibaba noted that it has already used its reading comprehension model in various parts of the business including telephone operations.

Experts believe human comprehension is still richer, broader

The recent developments in reading comprehension technologies add to a growing number of algorithms and AI systems designed to match or even surpass the human brain’s capabilities.

Another deep-learning model, developed by Microsoft Research Asia, also outperformed human scores in the same reading comprehension test. According to a Voice of America report, the software giant’s model reached an Exact Match score of 82.65. Microsoft reported that it has already incorporated the model into its Bing search engine and the Cortana digital assistant. However, a Microsoft official noted that while the innovation was a major milestone for the company, the human brain is still better than machines at understanding the nuances and complexity of language.

“Natural language processing is still an area with lots of challenges that we all need to keep investing in and pushing forward. This milestone is just a start,” said Ming Zhou, assistant managing director at Microsoft Research Asia.

Microsoft also previously reported that a number of its algorithms were able to match human capacity in certain activities such as speech and critical thinking. In 2016, the software giant announced that its speech recognition system proved just as good as a human’s in a speech test. Moreover, Microsoft and Google announced in 2015 that their sorting system outperformed the human brain in categorizing and classifying the content of images.

The system was put to the test by sorting photos into 1,000 categories, 120 of which are breeds of dog. It was touted for its performance on a sorting task that proved too tricky for the human brain. However, experts were quick to note that the computer system was no match for adults, or even small children, at interpreting imagery. This was largely due to the system’s apparent lack of common sense, the researchers said.

“The good news is that on these narrow tasks, for the first time, we see learning systems in the neighborhood of humans. We also see results that show how narrow and brittle these systems are. What we would naturally mean by reading, or language understanding, or vision is really much richer or broader,” Oren Etzioni, CEO of the Allen Institute for AI, told Wired online.
