Machine learning has rapidly gained recognition as a powerful technique across a wide range of applications, and rightly so. Machine learning algorithms, particularly deep neural networks (DNNs), have surpassed earlier image-recognition methods by a wide margin, and when Google switched its text-translation service to a machine learning algorithm, users noticed an immediate, dramatic improvement. Machine learning methods already play a quiet but critical role in applications like email-spam filtering, malware detection, and security-threat detection, and in emerging technologies such as automated driving, machine learning lies at the heart of a worldwide rush to field truly driverless vehicles. Still, there are reasons to move with suitable caution and awareness. Even as machine learning spreads more broadly and penetrates more deeply into everyday life, it introduces a particularly insidious type of security threat.
For the past few years, researchers studying the robustness of machine learning algorithms have recognized that trained machine learning models can be tricked into misclassifying data. Using a number of different techniques, researchers found that they could fool models by manipulating the input data, with methods as simple as adding noise. In more involved methods, they used adversarial neural networks to find subtle alterations to the input data that result in misclassification. This kind of misclassification can have serious consequences in, for example, driverless vehicles, where a stop sign misclassified as a speed-limit sign could lead to damage, injury, or worse.
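As a concrete illustration of the gradient-based flavor of these attacks, the sketch below nudges an image in the direction that most increases a classifier's loss, in the spirit of the fast gradient sign method. It assumes a differentiable PyTorch classifier; the function name, the batched input, and the epsilon value are illustrative, not a description of any specific attack reported above.

```python
# Minimal sketch of a gradient-based adversarial perturbation (FGSM-style),
# assuming a differentiable PyTorch classifier. Names here are illustrative.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Return a copy of `image` (shape [1, C, H, W]) nudged toward misclassification."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel slightly in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

With a small enough epsilon, the altered image typically looks identical to the original to a human viewer, which previews the imperceptibility problem discussed below.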
In most cases, researchers have applied these techniques to white-box models, which offer complete visibility into the inner workings of the neural networks under attack. Although this level of visibility may not have changed the results, questions remained about whether the vulnerabilities found in these white-box models would apply in real-world applications. Those questions soon vanished when attacks began to succeed on black-box models, where access to the model is limited to presenting input data and viewing the inference results.
In these successful black-box attacks, researchers created a parallel model trained to mimic the results the black-box model generated when both models received the same input data. This approach and similar methods require large input data sets and a correspondingly large number of queries to the model under attack. For this and other reasons, questions remained about whether these methods would apply in practical situations, where attackers may face limits on the number of queries they can issue or on the amount of output data or detail they can receive. Even those questions have fallen away recently, as researchers found that they could fool a black-box model into misclassifying data even under those tight constraints.
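The surrogate-model idea described above can be sketched in a few lines. This is a simplified illustration assuming a PyTorch setting, where `black_box_predict` stands in for the victim model's query interface and the surrogate architecture is supplied by the attacker; both are assumptions, not details from any specific published attack.

```python
# Sketch of training a local stand-in ("surrogate") on the labels a black-box
# model returns for attacker-chosen queries. `black_box_predict` and the
# surrogate architecture are hypothetical placeholders.
import torch
import torch.nn.functional as F

def train_surrogate(surrogate, black_box_predict, queries, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    # Labels come only from the victim's observable outputs.
    labels = torch.stack([black_box_predict(x) for x in queries]).argmax(dim=1)
    for _ in range(epochs):
        for x, y in zip(queries, labels):
            opt.zero_grad()
            loss = F.cross_entropy(surrogate(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            opt.step()
    return surrogate  # Adversarial examples crafted on this copy often transfer.
```

The point of the copy is that white-box techniques can then be run against the surrogate, and the resulting adversarial inputs frequently transfer to the original black-box model.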
Particularly disturbing in most of these white-hat attacks is that a hacker can fool a model using input modifications that seem trivial, or even imperceptible, to most humans. A model may classify a slightly altered photo as something entirely different from what a human observer clearly sees in it. Similarly, when words are subtly injected into an audio stream of speech, the result may sound like the original speech to a human, yet the model hears the injected phrase.
By nature, DNNs both enable this kind of vulnerability and complicate its mitigation. Together, the multiple layers of neurons in a DNN classify an input by building a complex association among the numerous features derived from the original input. How this happens at a micro level is not well understood. In fact, the general understanding of how DNNs produce their results is so limited that no generalized algorithm, or even heuristic method, exists for finding the optimal model parameters or architecture. The most experienced researchers say that the way to find the best model is to try as many alternative architectures as possible, tweak their designs, modify them further still, and see which model emerges as the best.
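In practice, that trial-and-error search often looks something like the sketch below, which assembles a few candidate architectures and keeps whichever validates best. The candidate list is arbitrary, and `train_model` and `validation_accuracy` are hypothetical placeholders for whatever training and evaluation pipeline a project already has.

```python
# Sketch of the try-many-architectures approach described above.
import torch.nn as nn

def build_candidate(depth, width, in_dim=784, n_classes=10):
    # Assemble a simple fully connected candidate of the requested size.
    layers, prev = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, n_classes))
    return nn.Sequential(*layers)

# `train_model` and `validation_accuracy` are hypothetical stand-ins for a
# project's existing training and evaluation code.
best_model, best_score = None, 0.0
for depth, width in [(2, 128), (3, 256), (4, 512)]:
    model = build_candidate(depth, width)
    train_model(model)
    score = validation_accuracy(model)
    if score > best_score:
        best_model, best_score = model, score
```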
This lack of understanding of how DNNs produce their results serves as an open door to exploits, or perhaps, more precisely, a potential backdoor for hackers. For example, one of the most efficient approaches for creating an image-recognition model is to use another pre-trained model as the starting point for developing a custom model. Because the micro-level details of a model's operations are not well understood, a hacker could compromise an existing model in a way that has no apparent effect on its normal behavior and seed the modified model into repositories of pre-trained models. Then, if a developer uses the compromised model as a starting point, his or her custom model could provide hackers with an eventual backdoor into a target application and its associated resources.
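The workflow the paragraph describes, starting a custom model from published pre-trained weights, might look like the sketch below. The file name and expected checksum are placeholders; the checksum comparison is simply one basic provenance check a developer could apply before trusting weights pulled from a repository, not a complete defense against a seeded backdoor.

```python
# Sketch of loading downloaded pre-trained weights as a starting point for a
# custom model, with a minimal integrity check. File name and hash are placeholders.
import hashlib
import torch
from torchvision import models

WEIGHTS_FILE = "pretrained_resnet18.pth"            # downloaded from a model zoo
EXPECTED_SHA256 = "<hash published by the model's maintainer>"

with open(WEIGHTS_FILE, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
if digest != EXPECTED_SHA256:
    raise RuntimeError("Weights do not match the published checksum")

model = models.resnet18()                            # same architecture, untrained
model.load_state_dict(torch.load(WEIGHTS_FILE))      # start from the vetted weights
# ...then replace the final layer and fine-tune on the custom task as usual.
```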
Threats to machine learning applications, and the mitigation of those threats, are an aspect of security that is only beginning to emerge. Most likely, the cure lies in the disease: white-hat hackers will secure models using the same techniques black-hat hackers use to compromise them. For now, the immediate lessons for those on the protective side are largely about recognizing these classes of threats. At this early stage of the model-security story, preparing for these threats begins with understanding that the fundamentals needed to remediate security weaknesses in any product development apply equally well to machine-learning model acquisition and custom model development.
Stephen Evanczuk has more than 20 years of experience writing for and about the electronics industry on a wide range of topics, including hardware, software, systems, and applications such as the IoT. He received his Ph.D. in neuroscience on neuronal networks and worked in the aerospace industry on massively distributed secure systems and algorithm acceleration methods. Currently, when he's not writing articles on technology and engineering, he's working on applications of deep learning to recognition and recommendation systems.