DeepMind says its new language model can beat others 25 times its size


The AI is called RETRO (for “Retrieval-Enhanced Transformer”), and it matches the performance of neural networks 25 times its size, cutting the time and cost needed to train very large models. The researchers also claim that the database the model relies on makes it easier to analyze what the AI has learned, which could help with filtering out bias and toxic language.

“It’s often useful to be able to find everything instantly instead of memorizing everything, just like for humans,” said DeepMind’s Jack Rae, who leads the company’s research on large language models.

Language models generate text by predicting the next word in a sentence or dialogue. The larger the model, the more information it can learn about the world during training, which makes its predictions better. GPT-3 has 175 billion parameters, the values in a neural network that store data and are adjusted as the model learns. Microsoft’s language model Megatron has 530 billion parameters. But large models also require a lot of computing power to train, putting them out of reach of all but the richest organizations.
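To make "predicting the next word" concrete, here is a toy sketch in Python. It is not how GPT-3 or RETRO works: those models use neural networks with billions of learned parameters, whereas this example simply counts which word follows which in a tiny made-up training text. The function names and the sample text are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)

# "cat" follows "the" twice in the corpus, "mat" only once.
print(predict_next(model, "the"))
```

The stored counts play a role loosely similar to parameters: they are the values adjusted during "training" and consulted at prediction time. A word the model has never seen has no counts, which hints at why models that store more tend to predict better.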

With RETRO, DeepMind has tried to cut the cost of training without cutting how much the AI learns. The researchers trained the model on a large data set of news articles, Wikipedia pages, books, and text from GitHub, an online code repository. The data set contains text in 10 languages, including English, Spanish, German, French, Russian, Chinese, Swahili, and Urdu.

RETRO’s neural network has only 7 billion parameters. But the system makes up for this with a database containing around 2 trillion passages of text. Both the database and the neural network are trained at the same time.

When RETRO generates text, it uses the database to look up and compare passages similar to the one it is writing, which makes its predictions more accurate. Outsourcing some of the neural network’s memory to the database lets RETRO do more with less.
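The look-up step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: it compares passages by simple word overlap (cosine similarity over word counts), and the three-entry `DATABASE`, the function names, and the query are all hypothetical stand-ins. A real system would search a vastly larger database and would presumably use learned representations rather than raw word counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(context, database, k=2):
    """Return the k database passages most similar to the text written so far."""
    query = bag_of_words(context)
    ranked = sorted(
        database,
        key=lambda passage: cosine_similarity(query, bag_of_words(passage)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical stand-in for the retrieval database.
DATABASE = [
    "The Eiffel Tower is a landmark in Paris, the capital of France.",
    "Swahili is spoken widely across East Africa.",
    "GitHub hosts repositories of source code.",
]

# The text "written so far"; the closest passage would inform the next prediction.
neighbors = retrieve("The capital of France is", DATABASE, k=1)
print(neighbors[0])
```

In this sketch the retrieved passage already contains the likely continuation ("Paris"), so the predictor does not need to have memorized that fact itself. That is the sense in which memory is moved out of the network and into the database.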

The idea is not new, but this is the first time a look-up system has been developed for a large language model, and the first time the results of this approach have been shown to rival the performance of the best language AIs around.

Bigger is not always better

RETRO draws on two other studies released by DeepMind this week, one studying how the size of a model affects its performance, and the other studying the potential harm caused by these AIs.

To study the effect of size, DeepMind built a large language model called Gopher, with 280 billion parameters. It beat state-of-the-art models on 82% of the more than 150 common language challenges used for testing. The researchers then compared it with RETRO and found that the 7-billion-parameter model matched Gopher’s performance on most tasks.

The ethics study is a comprehensive survey of well-known problems inherent in large language models. These models pick up biases, misinformation, and toxic language, such as hate speech, from the articles and books they are trained on. As a result, they sometimes spit out harmful statements, blindly mirroring what they encountered in the training text without knowing what it means. “Even models that perfectly mimic the data will be biased,” Rae said.



About the Author: News Center