Microsoft unveiled the Azure Virtual Network TAP, and Vectra announced its first-mover advantage as a development partner and the demonstration of its Cognito platform operating in Azure hybrid cloud environments.
The frenetic pace at which artificial intelligence (AI) has advanced in the past few years has begun to have transformative effects across a wide variety of fields. Coupled with an increasingly (inter)-connected world in which cyberattacks occur with alarming frequency and scale, it is no wonder that the field of cybersecurity has now turned its eye to AI and machine learning (ML) in order to detect and defend against adversaries.
The use of AI in cybersecurity not only expands the scope of what a single security expert is able to monitor, but importantly, it also enables the discovery of attacks that would have otherwise been undetectable by a human. Just as it was nearly inevitable that AI would be used for defensive purposes, it is undeniable that AI systems will soon be put to use for attack purposes.
In the last blog post, we alluded to the No-Free-Lunch (NFL) theorems for search and optimization. While NFL theorems are criminally misunderstood and misrepresented in the service of crude generalizations intended to make a point, I intend to deploy a crude NFL generalization to make just such a point.
You see, NFL theorems (roughly) state that given a universe of problem sets where an algorithm’s goal is to learn a function that maps a set of input data X to a set of target labels Y, for any subset of problems where algorithm A outperforms algorithm B, there will be a subset of problems where B outperforms A. In fact, averaging their results over the space of all possible problems, the performance of algorithms A and B will be the same.
With some hand waving, we can construct an NFL theorem for the cybersecurity domain: Over the set of all possible attack vectors that could be employed by a hacker, no single detection algorithm can outperform all others across the full spectrum of attacks.
In fact, this notion has been formalized and shown mathematically in a result known as the No Free Lunch theorem (Wolpert and Macready 1997).
Deep learning refers to a family of machine learning algorithms that can be used for supervised, unsupervised and reinforcement learning.
These algorithms are becoming popular after many years in the wilderness. The name comes from the realization that the addition of increasing numbers of layers typically in a neural network enables a model to learn increasingly complex representations of the data.
There are numerous techniques for creating algorithms that are capable of learning and adapting over time. Broadly speaking, we can organize these algorithms into one of three categories – supervised, unsupervised, and reinforcement learning.
Supervised learning refers to situations in which each instance of input data is accompanied by a desired or target value for that input. When the target values are a set of finite discrete categories, the learning task is often known as a classification problem. When the targets are one or more continuous variables, the task is called regression.
“The original question ‘Can machines think?’ I believe to be too meaningless to deserve discussion. Nevertheless, I believe that at the end of the century, the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.” – Alan Turing
Can machines think?
The question itself is deceptively simple in so far as the human ability to introspect has made each of us intimately aware of what it means to think.
“We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditions…” – Alan Turing
It is difficult to tell the history of AI without first describing the formalization of computation and what it means for something to compute. The primary impetus towards formalization came down to a question posed by the mathematician David Hilbert in 1928.
Random forest, an ensemble method
The random forest (RF) model, first proposed by Tin Kam Ho in 1995, is a subclass of ensemble learning methods that is applied to classification and regression. An ensemble method constructs a set of classifiers – a group of decision trees, in the case of RF – and determines the label for each data instance by taking the weighted average of each classifier’s output.
The learning algorithm utilizes the divide-and-conquer approach and reduces the inherent variance of a single instance of the model through bootstrapping. Therefore, “ensembling” a group of weaker classifiers boosts the performance and the resulting aggregated classifier is a stronger model.