Tech

The Security Hole at the Heart of ChatGPT and Bing

newsofmax05/25/2023

2 2 minutes read

The Security Hole at the Heart of ChatGPT and Bing

Microsoft’s chief communications officer Caitlin Roulston said the company is blocking suspicious websites and improving its system to filter prompts before they’re included in its AI models. Roulston did not provide further details. Even so, security researchers say indirect injection attacks need to be taken more seriously as companies race to embed generative AI into their services.

“The vast majority of people don’t realize the impact of this threat,” said Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany. Abdelnabi worked on some of the first indirect booster studies against Bingshow how it can be used to scam people. “The attacks are easy to execute and they are not a theoretical threat. Currently, I believe that any function the model can perform can be hacked or exploited to allow any arbitrary attack,” she said.

Hidden attack

Indirect rapid injection attacks similar to escape, a term adopted from breaking software restrictions on previous iPhones. Instead of someone inserting a prompt into ChatGPT or Bing to try and make it work in a different way, indirect attacks rely on data imported from elsewhere. This could be from a website that you have connected the model to or a document being uploaded.

“Prompt injection is easier to exploit or has fewer requirements for successful exploitation than other types of attacks” against the system, said Jose Selvi, principal security consultant at cybersecurity firm NCC Group. machine learning or AI systems. Since prompts only require natural language, attacks may require less technical skill to execute, Selvi said.

The number of security researchers and technicians looking for vulnerabilities in LLM is increasing. Tom Bonner, senior director of adversarial machine learning research at AI security firm Hidden Layer, said indirect prompt injection could be seen as a new type of attack that carries a “fairly broad” risk. Bonner says he used ChatGPT to write malicious code that he uploaded to code analysis software that was using AI. In the malicious code, he includes a prompt that the system will conclude the file is safe. The screenshot shows it saying no malicious code is included in the actual malicious code.

Elsewhere, ChatGPT can access transcripts of YouTube video use plug-ins. Johann Rehberger, security researcher and red team director, edited one of his video recordings to include a reminder designed to control general AI systems. It says the system will output the words “AI injection Successful” and then assume a new personality as a hacker named Genie in ChatGPT and tell a joke.

In another case, using a separate plug-in, Rehberger was able to retrieve previously written text in a chat with ChatGPT. With the introduction of plug-ins, tools and all these integrations, in a sense, where people empower representation of the language model, that’s where indirect prompt injection comes in, says Rehberger. continue to be very popular. “It’s a real problem in the ecosystem.”

“If people build apps for LLMs to read your emails and take some action based on the content of those emails—purchases, content summaries— an attacker could send emails containing malicious attacks. prompt injection,” said William Zhang, a machine learning researcher. engineer at Robust Intelligence, an AI company that studies the safety and security of models.

There is no good fix

The race is coming embedding synthetic AI in products—from to-do list apps to Snapchat—expand where attacks can happen. Zhang said that he has seen previous developers with no expertise in artificial intelligence bringing creative AI into their own technology.

If a chatbot is set up to answer questions about information stored in a database, it could cause problems, he said. “Prompt injection provides a way for users to override developer instructions.” At least in theory, this could mean that the user could delete information from the database or change the information included.