August 3, 2020, ainerd
In June, OpenAI released an API for accessing its new AI models, including GPT-3, and invited users to try it out. Designed to be easy for everyone to use and to increase the productivity of machine learning teams, the API runs models from the GPT-3 family in the background.
Shortly before that, in late May, OpenAI published a 74-page research paper on arXiv, "Language Models are Few-Shot Learners," describing a gigantic model with 175 billion parameters. Without going too deep into the numbers, it is worth comparing that figure with the parameter counts of past models. The paper introduces the GPT-3 family of models and is a real step in the right direction for language models; more details are available on the project's GitHub page.
If you're not overwhelmed by all this, take a look at the size of the training data and then at some of the more interesting features.
The title itself is interesting: it suggests that a large enough language model can pick up a new task from just a few examples. Architecturally, GPT-3 is a scaled-up GPT-2. It keeps GPT-2's modified initialization, pre-normalization and reversible tokenization, but alternates dense and locally banded sparse attention patterns across its transformer layers, similar to the Sparse Transformer. It shows many improvements over its predecessor.
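The GPT-3 paper reports that the model alternates dense and locally banded sparse attention across its layers. As a minimal sketch of what those two causal patterns could look like, the code below builds boolean attention masks, where `True` means "token i may attend to token j". The function names, the band width, and the even/odd layer ordering are illustrative assumptions, not details taken from the paper.

```python
def dense_causal_mask(n):
    """Dense causal attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def banded_causal_mask(n, band):
    """Locally banded causal attention: token i may attend only to the
    `band` most recent tokens, itself included (band width is hypothetical)."""
    return [[i - band < j <= i for j in range(n)] for i in range(n)]


def layer_masks(n_layers, n, band):
    """Alternate the two patterns layer by layer.
    Assumption: even-numbered layers are dense, odd-numbered layers banded."""
    return [
        dense_causal_mask(n) if k % 2 == 0 else banded_causal_mask(n, band)
        for k in range(n_layers)
    ]


# Print a small banded mask as 1s and 0s to visualise the diagonal band.
for row in banded_causal_mask(5, 2):
    print("".join("1" if allowed else "0" for allowed in row))
```

The banded layers keep attention cost proportional to the band width rather than the full sequence length, while the dense layers still let information flow across the whole context.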
The API builds on OpenAI's work training neural networks on an ever-growing body of text. As the OpenAI researchers point out, recent work has shown substantial gains from pre-training on a large corpus of text and then fine-tuning on a specific task. According to the researchers, GPT-3 approaches the performance of state-of-the-art fine-tuned models without task-specific fine-tuning, can produce high-quality samples, and performs strongly on tasks defined on the fly.
At its core, the model behind the OpenAI API looks at the text it is given and uses the examples in it to predict which words should come next and how best to answer a particular question. Software developers can get started by showing the system a few examples of what they want it to do, rather than training a model from scratch.
If you ask the system a series of questions in a row, it can begin to sense that it is in question-and-answer mode and adjust its responses accordingly. GPT-3 could perhaps generate images or MIDI files (we did not try this), which sounds much more impressive than just predicting the next word in a sequence, but next-word prediction is still what it does. It has no agency of its own, and it could not be used to, for example, predict how proteins fold.
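Putting a few questions and answers in a row is how a prompt signals "question-and-answer mode". A minimal sketch of assembling such a prompt follows; `build_qa_prompt` and the "Q:"/"A:" layout are assumptions chosen for illustration, and the call that would send the prompt to the API is deliberately left out rather than guessed.

```python
def build_qa_prompt(examples, question, instruction=None):
    """Assemble a plain-text Q&A prompt from (question, answer) demonstrations.

    Zero examples gives a zero-shot prompt, one gives a one-shot prompt,
    several give a few-shot prompt. The "Q:"/"A:" format is an assumption.
    """
    lines = []
    if instruction:
        lines.append(instruction)
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # the model is expected to continue the text from here
    return "\n".join(lines)


demos = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
print(build_qa_prompt(demos, "What is the capital of Italy?"))
```

No weights change here: the demonstrations live only in the prompt text, and the model's next-word predictions are what turn the trailing "A:" into an answer.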
In this respect, GPT-3 is not, and never will be, an AGI (artificial general intelligence), but it is still a very important step in the right direction.
GPT-3 scales the GPT-2 architecture up to 175 billion parameters. To study how performance changes with scale, the paper trains eight models of different sizes, from 125 million to 175 billion parameters, all sharing essentially the same architecture, so the models can be compared directly.
OpenAI's most eye-catching revelation is that GPT-3 has 175 billion parameters. That makes it far larger than its predecessor, GPT-2, which had 1.5 billion: GPT-3 has more than 100 times as many parameters, with essentially the same architecture.
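Where do numbers like 175 billion and 1.5 billion come from? A rough sketch is to count the weights in a GPT-2-style decoder-only transformer. The formula below assumes a 4x-wide feed-forward layer, biases on every linear layer, two LayerNorms per block plus a final one, learned position embeddings, an output layer tied to the token embedding, and GPT-2's 50,257-token vocabulary; the layer counts, widths and context lengths are the published configurations of the largest GPT-3 and GPT-2 models.

```python
def transformer_params(n_layers, d_model, vocab_size, n_ctx):
    """Approximate parameter count of a GPT-2-style decoder-only transformer
    (under the assumptions listed above)."""
    d = d_model
    attention = 4 * d * d + 4 * d            # Q, K, V and output projections, with biases
    mlp = 8 * d * d + 5 * d                  # d -> 4d -> d, with biases
    layer_norms = 2 * 2 * d                  # two LayerNorms, each with gain and bias
    per_layer = attention + mlp + layer_norms
    embeddings = vocab_size * d + n_ctx * d  # token + position embeddings
    final_norm = 2 * d
    return n_layers * per_layer + embeddings + final_norm


# Largest GPT-3 model: 96 layers, width 12,288, 2,048-token context.
gpt3 = transformer_params(n_layers=96, d_model=12288, vocab_size=50257, n_ctx=2048)
# Largest GPT-2 model: 48 layers, width 1,600, 1,024-token context.
gpt2_xl = transformer_params(n_layers=48, d_model=1600, vocab_size=50257, n_ctx=1024)

print(f"GPT-3: {gpt3 / 1e9:.1f}B, GPT-2: {gpt2_xl / 1e9:.2f}B, ratio: {gpt3 / gpt2_xl:.0f}x")
```

The estimate lands at about 174.6 billion and 1.56 billion parameters, close to the headline figures, and the ratio shows GPT-3 has roughly a hundred times as many parameters as GPT-2. Almost all of the growth comes from the 12 * d_model^2 weights per layer.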
The behavior that emerges from larger models is exciting: they need less task-specific fine-tuning to perform NLP tasks well. GPT-3 shows that models with more parameters make better use of the examples given in their input, so they can handle tasks from just a few examples, without any fine-tuning, that smaller models such as GPT-2 struggle with.
OpenAI made headlines last year when it released GPT-2, the successor to its original GPT model, so it was no surprise that GPT-2 would get a successor of its own. GPT-2 was a large transformer-based language model trained to predict the next word in 40 GB of internet text. It was a direct scale-up of the original GPT, with more than 10 times the parameters and trained on more than 10 times as much data.