OpenAI on Tuesday announced GPT-4, a new version of its flagship large language model, which the company says demonstrates "human-level performance" on many professional tests.
GPT-4 is a much "larger" model than its predecessors, both in the amount of data it was trained on and in the number of weights in the model file, which makes it more expensive to run.
Many researchers believe that recent advances in artificial intelligence have come from training ever-larger models on thousands of supercomputers, in training runs that can cost tens of millions of dollars per model. GPT-4 is an example of this "scaling up" approach.
OpenAI said it trained the model on Microsoft Azure; Microsoft has invested billions in the startup. Citing "the competitive landscape," OpenAI did not disclose the model's size or the hardware used to train it.
Over the past six months, many of the AI demos that have impressed the technology industry, including ChatGPT and Bing's AI chat, have been powered by OpenAI's GPT large language models, and the latest version previews advances that will eventually reach consumer products such as chatbots. Microsoft said Tuesday that Bing's AI chatbot already uses GPT-4.
OpenAI claims the new model produces fewer factually incorrect answers, goes off topic less often, and even outperforms many humans on standardized tests.
According to OpenAI, GPT-4 scored in the 90th percentile on a simulated bar exam, the 93rd percentile on an SAT reading test, and the 89th percentile on the SAT math test.
Still, OpenAI warns that the new software is far from perfect and performs worse than humans in many scenarios. The company says the model still has a major problem with "hallucination," or making up things that aren't based on facts, is not fully reliable, and tends to insist it is correct even when it is wrong.
“We are still working on correcting many of the known limitations of GPT-4, including social biases, hallucinations, and adversarial prompts, as well as many other issues,” the company said in a blog post.
“In a casual conversation, the difference between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold: GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5,” OpenAI wrote in a blog post.
The new model is expected to be available to paid ChatGPT subscribers, as well as through an API that lets programmers integrate the AI into their own applications. OpenAI will charge approximately 3 cents for about 750 words of prompts and 6 cents for about 750 words of responses.
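For readers curious what that API integration looks like in practice, here is a minimal Python sketch using the openai package's chat-completions interface as it existed around the GPT-4 launch. The API key and prompt text are placeholders, and the cost estimate assumes that the announced "about 750 words" of pricing corresponds to roughly 1,000 billable tokens; that conversion is an assumption, not something stated in the announcement.

```python
import openai

# Placeholder key: replace with your own OpenAI API key.
openai.api_key = "YOUR_API_KEY"

# Ask GPT-4 a question through the chat-completions endpoint.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the GPT-4 announcement in one sentence."},
    ],
)

print(response["choices"][0]["message"]["content"])

# Rough cost estimate from the token counts the API reports, assuming
# ~750 words ≈ 1,000 tokens at ~3 cents (prompt) and ~6 cents (response).
usage = response["usage"]
cost = usage["prompt_tokens"] / 1000 * 0.03 + usage["completion_tokens"] / 1000 * 0.06
print(f"Approximate cost: ${cost:.4f}")
```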