James Sanders: Various vendors tout security products as incorporating artificial intelligence, yet the practical results are often no different from what came before. What is needed to actually deploy AI in security products?
Larry Lunetta: So the AI washing is absolutely real. You can reach back to the late '90s, when it was web washing as the internet became popular, and then roll forward and it's cloud washing.
There's a set of waves that marketers ride, and AI is certainly one now. The interesting thing is that as you go through those phases, the technologies get tougher and tougher to actually execute. Part of the disappointment you alluded to on the AI results front is that when you AI wash, you may have algorithms, you may even have found some data to train some models, but it's not sufficient to deliver a practical result.
There are four or five elements required for a successful AI solution, starting with domain expertise. It doesn't matter how good a data scientist you are: if you're not read into how the domain you're trying to address works, you're going to be building models in a vacuum.
So there's domain expertise. Data is crucial; data scientists will tell you that data is the new oil. I know startups in Silicon Valley that guaranteed access to the data they needed to train their models before they wrote a line of code or hired any people.
So that's a big barrier to entry, and it's not hundreds or thousands of data points; it's millions. Then you need to operate at scale, and you need real-world exposure: AI 1.0 rarely works the way you expect it to, so there are a lot of iterations.
So that's why you're seeing disappointment. But that's also why Aruba is successful with AI. We've been doing this for a long time, and not just in security: we use it to optimize the placement of access points for network coverage and RF optimization, and we use it in security to find attacks on the inside that have eluded the standard security defenses, the things that use rules, signatures and pattern matching.
Those are looking for the known. The challenge is finding things you haven't seen before, and you can only do that with behavioral analytics, which leads you to supervised and then unsupervised machine learning.
So we have a product called IntroSpect that houses those algorithms, and we have many customers, from the Fortune 50 to small organizations, that use it very successfully. One example: in a school district, we found a digital sign that had been installed several years ago and that no one had thought about. When we came in with our machine learning, we found it and discovered it was communicating with a hundred different countries. It had been totally compromised, but nobody was looking for that. That's the value of AI: you can see it through network behavior and other signals, and the models pick it up as abnormal activity.
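The kind of behavioral outlier Lunetta describes (one device talking to far more countries than its peers) can be illustrated with a minimal statistical sketch. The device names, counts and z-score threshold below are invented for the example; production systems like the one described would use far richer features and models.

```python
from statistics import mean, stdev

# Hypothetical per-device behavioral feature: distinct countries each
# device communicated with over a week. Values are illustrative only.
country_counts = {
    "printer-03": 2, "hvac-ctrl": 1, "badge-reader": 1,
    "kiosk-7": 3, "digital-sign": 100, "camera-12": 2,
}

def flag_outliers(features: dict[str, int], threshold: float = 2.0) -> list[str]:
    """Flag devices whose feature deviates from the fleet baseline by more
    than `threshold` standard deviations. Note a single extreme outlier
    inflates the stdev, so the threshold here is deliberately modest."""
    values = list(features.values())
    mu, sigma = mean(values), stdev(values)
    return [name for name, v in features.items()
            if sigma > 0 and abs(v - mu) / sigma > threshold]
```

Running `flag_outliers(country_counts)` surfaces only the compromised sign, not the normal devices, which is the point: no rule or signature was written for "digital sign talks to 100 countries."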
We also use AI in the Internet of Things world, because it's very difficult to find and fingerprint a camera or an MRI machine; conventional techniques don't work very well. So again, we use AI on network traffic to understand the difference between those kinds of devices.
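One simple way to picture traffic-based device fingerprinting is profile matching: compare a device's observed protocol mix against known device profiles. The device types, protocol names and proportions below are invented for illustration and are not Aruba's actual model.

```python
# Hypothetical reference profiles: fraction of traffic per protocol.
KNOWN_PROFILES = {
    "ip-camera":   {"rtsp": 0.7, "http": 0.2, "ntp": 0.1},
    "mri-machine": {"dicom": 0.8, "http": 0.1, "smb": 0.1},
    "printer":     {"ipp": 0.6, "snmp": 0.3, "http": 0.1},
}

def fingerprint(observed: dict[str, float]) -> str:
    """Return the known device type whose protocol mix is closest
    (Euclidean distance) to the observed traffic profile."""
    def dist(profile: dict[str, float]) -> float:
        keys = set(profile) | set(observed)
        return sum((profile.get(k, 0.0) - observed.get(k, 0.0)) ** 2
                   for k in keys) ** 0.5
    return min(KNOWN_PROFILES, key=lambda name: dist(KNOWN_PROFILES[name]))
```

A device that mostly speaks RTSP with some HTTP will match the camera profile even if its exact proportions differ, which is what makes this more robust than exact signature matching.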
So we pick our spots, and the common theme is networks. Who knows networks better than Aruba? That's where the domain expertise comes in.
James Sanders: Considering the value businesses place on data, what privacy concerns do they have about aggregate connection data being used to train machine learning models that are deployed in products?
Larry Lunetta: That goes back to the startup example: it's hard to get the right data, and people are reluctant to part with it. We do not surf customer data. We may extract certain metadata out of the traffic that we see, but it has no personal information associated with it: things like ports and protocols.
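The metadata-only approach can be sketched as a whitelist over flow records: keep connection-level fields, drop anything that could carry personal information. The field names and the sample record below are illustrative, not an actual product schema.

```python
# Fields considered safe, non-personal connection metadata (illustrative).
ALLOWED_FIELDS = {"src_port", "dst_port", "protocol", "bytes", "duration_s"}

def strip_to_metadata(flow: dict) -> dict:
    """Return only the whitelisted, non-personal fields of a flow record;
    usernames, payloads, and any other fields are silently dropped."""
    return {k: v for k, v in flow.items() if k in ALLOWED_FIELDS}

raw_flow = {
    "src_port": 52311, "dst_port": 443, "protocol": "tcp",
    "bytes": 9120, "duration_s": 4.2,
    "username": "jdoe",   # personal information: dropped
    "payload": b"...",    # content: dropped
}
```

A whitelist (rather than a blacklist of known-sensitive fields) is the safer design here, since any field not explicitly approved is excluded by default.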
There are privacy concerns from a couple of perspectives. One is: where does the data go, and what do you do with it? The second, especially in Europe with GDPR, is that we do attach user attribution to some of the data we collect for IntroSpect. It doesn't go to the cloud, but even on-premises there's a lot of sensitivity about monitoring personal behavior.
The tension is that if you don't do that monitoring, the potential for a breach of personal information is higher. We literally sit down with workers' councils, explain what we do, and give them a chance to understand it and weigh in.
A lot of times we obfuscate personal information, which means we carry it, but unless you have permission, you don't see it. In Germany, the workers' councils are involved in the decision to unobfuscate the data, so that at least some level of security analyst is able to see it.
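The carry-it-but-hide-it pattern can be sketched as pseudonymization with gated unmasking: analysts see stable tokens, and the plaintext is released only to an authorized role. The key, role name, and vault design below are all hypothetical; a real deployment would use managed key storage and a proper authorization system.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would live in a key-management
# system and be rotated, not hard-coded.
SECRET_KEY = b"rotate-me-in-production"

class PseudonymVault:
    """Replace identities with stable tokens; keep the reverse mapping,
    but release it only to an explicitly authorized role."""

    def __init__(self) -> None:
        self._mapping: dict[str, str] = {}  # token -> original value

    def obfuscate(self, value: str) -> str:
        # HMAC gives a stable, keyed token: the same user always maps to
        # the same pseudonym, so behavior can still be correlated.
        token = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]
        self._mapping[token] = value
        return token

    def unmask(self, token: str, role: str) -> str:
        if role != "privileged-analyst":  # illustrative role name
            raise PermissionError("unmasking requires explicit authorization")
        return self._mapping[token]
```

Keyed, stable tokens are the important design choice: analysts can still see that the same (pseudonymous) account is behaving oddly across events, without ever seeing who it is until unmasking is approved.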
So privacy and security are sometimes uneasy bedfellows, but I think security vendors make a concerted effort to reduce the amount of personal information we collect and use, and when we do collect it, we try to be very protective of it.
James Sanders: Moving on a bit. How can machine learning and artificial intelligence increase security by detecting and stopping anomalous or malicious traffic that traditional firewalls would mark either as a false positive, keeping work from getting done, or as a false negative, allowing malicious traffic through?
Larry Lunetta: It gets back to the mission of a rule, which is what drives a firewall. You're looking for a set of conditions that, in your mind, indicate an attack is underway. That is, by definition, a priori: you're looking for what you already know about. And a lot of these attacks don't conform to previous behavior.
The attackers are very smart. In fact, they're using AI themselves to shape their behaviors, cloak their malware, and so on, to evade these kinds of defenses. So think about an attack on the inside, which is really what's doing the most damage: the attacker first seeks legitimate credentials. If I click on the wrong email attachment and get spearphished, I'm going to be compromised, but I probably don't know it.
The attacker has a beachhead, and because it's a legitimate credential, they don't have to move fast. They can be very deliberate and work in small steps. Typically a command-and-control channel gets opened so they can communicate with what they've already established and direct the attack. They surveil the network and, if it's ransomware, scan for the valuable data.
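One of the behaviors mentioned above, the command-and-control channel, has a classic behavioral tell: compromised hosts often phone home at near-regular intervals ("beaconing"), and that regularity is something no signature would catch. A minimal sketch of the idea, with invented timestamps and a jitter threshold chosen for illustration:

```python
from statistics import mean, pstdev

def looks_like_beaconing(timestamps: list[float],
                         max_jitter_ratio: float = 0.1) -> bool:
    """True if gaps between a host's outbound connections are suspiciously
    regular: the std-dev of the inter-connection intervals is under
    `max_jitter_ratio` of the mean interval. Human-driven traffic is far
    burstier than an automated beacon."""
    if len(timestamps) < 4:
        return False  # too few connections to judge regularity
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) < max_jitter_ratio * mean(gaps)
```

Connections arriving almost exactly every 60 seconds trip the check; a bursty, human-looking pattern does not. Real detectors also handle attacker-added jitter, so this interval test is only the simplest version of the behavior being described.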
And those behaviors are what often tip you off that something's amiss. But how do you see that among millions or billions of pieces of data? And how do you prevent the false-positive problem, which is rife in the security industry?
If you think about products like SIEMs, you hear repeatedly that there are too many red warnings, too many top-severity alerts that analysts have to follow up on. So white noise is a problem. And then, of course, the catastrophe is the false negative, where things get through. That's where AI and machine learning come in: they can process all that data, distill it down, see the small changes in behavior, and stop the attack before it actually executes.