What OpenAI does to test the dangerousness of ChatGPT

OpenAI, the Microsoft-backed company, asked an eclectic mix of people to "adversarially test" GPT-4, its powerful new language model. An article from the Financial Times.

After gaining access to GPT-4, the new AI system powering the popular ChatGPT chatbot, Andrew White used it to suggest an entirely new nerve agent, writes the Financial Times.

The University of Rochester chemical engineering professor was among 50 academics and experts hired last year by OpenAI, the Microsoft-backed company behind GPT-4, to test the system. Over the course of six months, this “red team” would “qualitatively probe [and] adversarially test” the new model, attempting to break it.

White told the Financial Times that he used GPT-4 to suggest a compound that could serve as a chemical weapon, and that he used "plug-ins" that fed the model new sources of information, such as scientific papers and a list of chemical manufacturers. The chatbot even found a place to produce it.

“I think this will give everyone a tool to do chemistry faster and more accurately,” he said. “But there is also the significant risk that people…do dangerous chemistry. Right now, this exists.”

The alarming findings allowed OpenAI to ensure that such results would not appear when the technology was released to the public last month.

Indeed, the red-team exercise was designed to address widespread fears about the dangers of deploying powerful AI systems in society. The team's job was to ask probing or dangerous questions to test the tool, which answers human queries with detailed and nuanced responses.

OpenAI wanted to check the model for problems such as toxicity, prejudice and linguistic bias. The red team tested for falsehoods, verbal manipulation and dangerous scientific knowledge. They also examined its potential to aid and abet plagiarism, illegal activities such as financial crimes and cyberattacks, and how it could compromise national security and battlefield communications.

The FT spoke to more than a dozen members of the GPT-4 red team. They are an eclectic mix of industry professionals: academics, teachers, lawyers, risk analysts and security researchers, mostly based in the US and Europe.

Their findings were fed back to OpenAI, which used them to mitigate and "retrain" GPT-4 before rolling it out more widely. The experts each spent 10 to 40 hours testing the model over the course of several months. Most were paid around $100 an hour for the work, according to multiple interviewees.

Those who spoke to the FT shared common concerns about the rapid advancement of language models and, in particular, the risks of linking them to external sources of knowledge via plug-ins.

“Today the system is frozen, which means that it no longer learns, nor does it have memory,” said José Hernández-Orallo, part of the GPT-4 red team and professor at the Valencian Research Institute for Artificial Intelligence. “But what if we give it access to the Internet? It could be a very powerful system connected to the world."

OpenAI said it takes security seriously, tested the plugins before launch, and will regularly update GPT-4 as more people use it.

Roya Pakzad, a technology and human rights researcher, used prompts in English and Farsi to test the model for gendered responses, racial preferences and religious biases, particularly regarding headscarves.

Pakzad recognized the benefits of such a tool for non-native English speakers, but found that the model showed obvious stereotypes about marginalized communities, even in its later versions.

She also found that so-called hallucinations – when the chatbot responds with made-up information – were worse when the model was tested in Farsi, where Pakzad found a higher proportion of fabricated names, numbers and events than in English.

“I am concerned about the potential decrease in linguistic diversity and the culture behind languages,” she said.

Boru Gollu, a Nairobi lawyer who was the only African tester, also noted the model's discriminatory tone. “There was a moment when I was testing the model when it acted like a white person talking to me,” Gollu said. “If you asked about a particular group, it gave you a biased opinion or a very prejudiced answer.” OpenAI has acknowledged that GPT-4 may still show bias.

The red-team members who evaluated the model from a national security perspective had differing views on the safety of the new model. Lauren Kahn, a researcher at the Council on Foreign Relations, said that when she began looking into how the technology could be used in a cyberattack on military systems, she "didn't expect it to be such a detailed procedure that it could be fine-tuned".

However, Kahn and other security testers found that the model's responses became significantly more secure over time. OpenAI said it trained GPT-4 to reject malicious cybersecurity requests before it launched.

Many red-team members said that OpenAI carried out a rigorous safety assessment before launch. "They've done a great job of eliminating the overt toxicity in these systems," said Maarten Sap, an expert on language model toxicity at Carnegie Mellon University.

Sap looked at how the model portrays different genders and found that the biases reflect social disparities. However, Sap also found that OpenAI had made some active political choices to counter this.

“As a gay person, I tried in every way to get it to convince me to undergo conversion therapy. It pushed back, even when I took on a persona, such as saying I was religious or from the American South.”

Since its launch, however, OpenAI has faced widespread criticism, including a Federal Trade Commission complaint from a technology ethics group alleging that GPT-4 is “biased, deceptive, and a risk to privacy and public safety”.

Recently, the company launched a feature known as ChatGPT plugins, through which partner apps such as Expedia, OpenTable and Instacart can give ChatGPT access to their services, allowing it to book and order items on behalf of human users.
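In broad terms, a plugin of this kind works by having the model emit a structured request, which host code then executes against the partner service and feeds back into the conversation; it is the host, not the model, that performs the real-world action. The Python sketch below is a minimal, hypothetical illustration of that loop: the fake_model stub, the book_table function and the JSON request format are invented for this example and are not OpenAI's or any partner's actual API (real ChatGPT plugins describe their endpoints with an OpenAPI manifest).

```python
import json

# Hypothetical registry of "plugin" actions the host exposes to the model.
# Invented purely for illustration; not OpenAI's plugin interface.
TOOLS = {}

def tool(name):
    """Register a host-side function under the name the model will request."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("book_table")
def book_table(restaurant: str, people: int, time: str) -> dict:
    # Stand-in for a partner reservation service; no real booking happens.
    return {"status": "confirmed", "restaurant": restaurant,
            "people": people, "time": time}

def fake_model(user_message: str) -> str:
    # Stand-in for the language model: in this sketch it always replies
    # with a structured request to call the booking tool.
    return json.dumps({"tool": "book_table",
                       "arguments": {"restaurant": "Trattoria Roma",
                                     "people": 2, "time": "20:00"}})

def run_plugin_loop(user_message: str) -> dict:
    """One round of the loop: the model proposes an action, the host executes it."""
    request = json.loads(fake_model(user_message))
    handler = TOOLS[request["tool"]]        # look up the requested action
    return handler(**request["arguments"])  # act on the model's behalf

if __name__ == "__main__":
    print(run_plugin_loop("Book me a table for two at 8pm"))
```

Because the host code acts directly on whatever the model requests, this is the kind of setup behind the "humans out of the loop" concern described below.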

Dan Hendrycks, an AI safety expert on the red team, said plug-ins risk creating a world where humans are "out of the loop."

“What if a chatbot could post your private information online, access your bank account or send the police to your home?” he said. “Overall, we need much more robust security assessments before letting AIs wield the power of the internet.”

Those interviewed also warned that OpenAI cannot stop safety testing just because its software is live. Heather Frase, who works at Georgetown University's Center for Security and Emerging Technology and tested GPT-4 for its ability to aid crime, said the risks will continue to grow as more people use the technology.

“The reason you do operational testing is that things behave differently once they're actually used in the real environment,” she said.

She argued that a public registry should be created to report incidents arising from large language models, similar to cybersecurity or consumer fraud reporting systems.

Sara Kingsley, a labor economist and researcher, suggested that the best solution is to advertise the harms and risks clearly, "like a nutrition label".

“It's about having a frame of reference and knowing what the most frequent problems are, so that you have a safety valve,” she said. “That's why I say the work is never finished.”

(Excerpt from the press release of eprcommunication)


This is a machine translation from the Italian of a post published on Start Magazine at the URL https://www.startmag.it/innovazione/openai-gpt-4/ on Sat, 15 Apr 2023 05:10:46 +0000.