[Originally published on PL Perspectives, 6/24/2019]

In the last few years, there have been many anxious ruminations about AI safety. To a significant extent, these fears come from the realization that modern AI systems can be alarmingly brittle. For example, a deep neural net used for image classification can produce different outputs on inputs that are indistinguishable to humans. Machine learning algorithms can reflect harmful biases in their training data. Robots trained in controlled settings routinely fail in the real world.

Fortunately, AI systems are ultimately software, and therefore PL researchers can play a key role in making them more reliable. An emerging body of research uses PL ideas — such as high-level language abstractions and automated reasoning about correctness — to aid the design of robust and accountable AI systems. In this post, I will give a taste of some recent work on this topic.

My focus here is on sequential decision-making problems, which naturally arise in safety-critical domains like autonomous cyber-physical systems. Here, we have an AI agent that is interacting with its environment using a learned policy: a function from environment states to agent actions. A popular approach to discovering policies is reinforcement learning (RL), in which the agent explores the environment strategically, receiving rewards for various actions, and seeks to learn a policy that maximizes the expected reward over a long time horizon. In contemporary work, it is common to represent policies using deep neural networks. Because neural nets are differentiable in their parameters, this representation allows the use of scalable, gradient-based learning techniques.
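
To make the "policy as a differentiable function from states to actions" view concrete, here is a minimal sketch of a neural policy over discrete actions together with a single REINFORCE-style gradient update. The dimensions, architecture, and the batch of rollout data are assumptions invented for illustration, not anything prescribed by the papers discussed below.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 8, 4   # made-up dimensions for illustration

# The policy: a differentiable map from environment states to (logits over) actions.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, NUM_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, returns):
    """One policy-gradient step: raise the log-probability of each taken action
    in proportion to the return that followed it.
    states: float tensor (T, STATE_DIM); actions: long tensor (T,); returns: float tensor (T,)."""
    log_probs = torch.log_softmax(policy(states), dim=-1)          # (T, NUM_ACTIONS)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)  # log-prob of each taken action
    loss = -(chosen * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```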

Verifying learned policies

All functions represented on computers are ultimately programs — even a neural network is just a low-level program! In that case, we can imagine using program verification techniques to make sure that a machine-learned function — specifically, a neural network — is doing what it’s supposed to do. There is an emerging body of work that seeks to do exactly this.

One challenge here is requirement specification: how do we even define what the network should do? So far, most work on neural network verification has focused on the requirement of robustness, which states that a small change to the input of the network does not change its output. While this property is useful in showing that a neural net won’t be fooled by adversarial examples, it is natural to wonder how to go beyond the verification of this single class of requirements.
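
Concretely, local robustness at an input x says that every input within a small ε-ball of x receives the same prediction. Here is a minimal sketch of that specification as an executable check; the classifier f and the radius epsilon are placeholders, and note that random sampling can only falsify the property, whereas a verifier must reason about all perturbations.

```python
import numpy as np

def locally_robust(f, x, epsilon, num_samples=1000, seed=0):
    """Empirically test that f's predicted class is constant on the L-infinity
    ball of radius epsilon around x.  Sampling can only find counterexamples;
    a verifier (SMT solving, abstract interpretation, ...) must instead
    account for *all* perturbations in the ball."""
    rng = np.random.default_rng(seed)
    label = np.argmax(f(x))
    for _ in range(num_samples):
        x_perturbed = x + rng.uniform(-epsilon, epsilon, size=x.shape)
        if np.argmax(f(x_perturbed)) != label:
            return False      # adversarial example found
    return True               # no counterexample found (not a proof!)
```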

One way of going beyond robustness properties is to think about end-to-end verification of larger systems with learned components. Requirements for such end-to-end analysis can often be easier to write than those for individual modules. For example, it is easy to demand that an autonomous car controlled by a neural policy doesn’t crash. This requirement induces an assertion on the neural policy that we do not need to explicitly specify.
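
As a toy illustration of what such an end-to-end requirement looks like, the sketch below checks a "never reach the obstacle" property on the closed-loop system, rather than stating a property directly on the network. The dynamics, obstacle position, and policy interface here are all assumptions made up for the example; a real verifier would prove the property for all initial states and disturbances rather than simulate one trace.

```python
import numpy as np

OBSTACLE_X = 10.0   # made-up obstacle position

def closed_loop_safe(policy, x0=0.0, v0=1.0, dt=0.1, horizon=200):
    """Simulate a toy double-integrator car driven by `policy` and check the
    system-level requirement: the car never reaches the obstacle."""
    x, v = x0, v0
    for _ in range(horizon):
        a = policy(np.array([x, v]))   # the learned component chooses acceleration
        v = v + a * dt                 # double-integrator dynamics
        x = x + v * dt
        if x >= OBSTACLE_X:            # the end-to-end requirement is violated
            return False
    return True

# Example: a simple braking policy as a stand-in for a learned one.
print(closed_loop_safe(lambda s: -0.5 * s[1]))   # True on this single trace
```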

End-to-end verification of systems with learned components is still fairly new. An interesting recent paper in this space is by Sun et al.: “Formal Verification of Neural Network Controlled Autonomous Systems” (HSCC 2019). The paper considers an autonomous robot controlled by a neural network; the requirement is that the robot does not collide with a set of known obstacles. The main idea is to construct a finite-state abstraction of the system — a nontrivial task as the system includes a deep neural network. This is done using a Satisfiability Modulo Convex Programming (SMC) solver, which uses a combination of SAT-solving and convex optimization to check the satisfiability of a boolean combination of convex constraints over the reals.
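
To give a flavor of the kind of query an SMC solver answers, the toy sketch below brute-forces a Boolean formula whose atoms each activate a convex constraint over real variables, checking the feasibility of each Boolean assignment with a convex solver. The formula and constraints are made up; real SMC solvers, like the one used by Sun et al., interleave SAT search with convex feasibility checks and learn conflict clauses rather than enumerating assignments.

```python
import itertools
import cvxpy as cp

x = cp.Variable(2)   # real-valued variables

# Each Boolean atom, when set to true, imposes a convex constraint on x.
atoms = {
    "b1": [cp.norm(x, 2) <= 1.0],   # x lies in the unit disc
    "b2": [x[0] + x[1] >= 0.5],     # x lies in a half-plane
}

def boolean_formula(a):
    return a["b1"] and a["b2"]      # the propositional skeleton (toy example)

for bits in itertools.product([False, True], repeat=len(atoms)):
    assignment = dict(zip(atoms, bits))
    if not boolean_formula(assignment):
        continue
    active = [c for name, cs in atoms.items() if assignment[name] for c in cs]
    feasibility = cp.Problem(cp.Minimize(0), active)   # pure feasibility check
    feasibility.solve()
    if feasibility.status == cp.OPTIMAL:
        print("satisfiable:", assignment, "witness x =", x.value)
        break
else:
    print("unsatisfiable")
```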

Perhaps the biggest challenge in this line of work is scalability — modern neural nets can be simply too large for end-to-end analysis using heavyweight solvers. Also, the separation of verification and learning may have inherent costs. The neural net here is trained a priori, and provable correctness is not part of its training objective. Why should we expect the parameters discovered by this training process to be provably safe? That said, the high-level idea of end-to-end verification of systems with neural components is compelling, and I expect many other methods to build on this paper’s ideas.

Synthesizing abstract representations

One way of overcoming the scalability challenge in neural network verification is to employ the PL idea of abstraction. A neural network, after all, is an extremely low-level representation of a function. Perhaps we can represent learned policies more compactly as higher-level programs, without significantly compromising performance?

Motivated by this idea, we recently proposed a formulation of the RL problem as a form of program synthesis (“Programmatically Interpretable Reinforcement Learning”, Verma et al., ICML 2018). Here, we assume a high-level domain-specific language (DSL) that can describe complex policies obtained by composing a set of “library” control modules (for example, PID controllers). The objective is to learn a policy, represented in this DSL, that achieves maximal reward. (A NeurIPS 2018 paper by Bastani et al. studies the same problem but uses decision trees instead of programs.)
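
As an illustration, here is what a tiny policy DSL of this flavor might look like: library PID controllers composed with a guarded choice over the observation. This mini-DSL and its constants are made up for the example; the DSL in the paper is richer, but the point is the same — a policy expressed this way is a small, inspectable program.

```python
from dataclasses import dataclass

@dataclass
class PID:
    """A library PID controller: maps a scalar tracking error to a control."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_err: float = 0.0

    def act(self, err, dt=0.05):
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

@dataclass
class IfThenElse:
    """Guarded composition: pick a sub-policy based on the observation."""
    cond: object          # predicate over the observation
    then_branch: object
    else_branch: object

    def act(self, err, dt=0.05):
        branch = self.then_branch if self.cond(err) else self.else_branch
        return branch.act(err, dt)

# Example programmatic policy: aggressive correction on sharp deviations,
# gentle correction otherwise.
policy = IfThenElse(
    cond=lambda err: abs(err) > 0.5,
    then_branch=PID(kp=2.0, ki=0.0, kd=0.3),
    else_branch=PID(kp=0.8, ki=0.05, kd=0.1),
)
steering = policy.act(err=0.7)
```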

The main difficulty here is that the space of programs in the DSL is vast and nonsmooth, so a direct search over programs is a nonstarter. Our approach to this problem is based on imitation learning: we first learn a neural policy, and then find a program whose outputs, on a set of iteratively discovered “interesting” inputs, imitate the neural network’s as closely as possible. Once the program is discovered, the neural network is thrown away; thus, its only use is as a “crutch” for accelerating the search.
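
In pseudocode, the loop looks roughly like the sketch below. The callables rollout, program_search, and neural_policy are placeholders for components the paper instantiates; in particular, the search over DSL programs is the hard part and is only named here, not implemented.

```python
def imitation_guided_search(neural_policy, rollout, program_search, n_iters=5):
    """A high-level sketch of imitation-guided program search.
    rollout(policy) -> states visited when `policy` drives the environment.
    program_search(dataset) -> the DSL program that best matches the given
    (state, action) pairs.  Both are assumptions of this sketch."""
    # Seed the dataset with states visited by the neural "oracle" itself.
    dataset = [(s, neural_policy(s)) for s in rollout(neural_policy)]
    program = program_search(dataset)
    for _ in range(n_iters):
        # DAgger-style augmentation: roll out the *current program*, then ask
        # the neural oracle what it would have done in those states.
        new_states = rollout(program)
        dataset += [(s, neural_policy(s)) for s in new_states]
        program = program_search(dataset)
    return program   # the neural net is discarded once the program is found
```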

Unsurprisingly, the programs generated through this process are easier to interpret and analyze than neural nets. Such higher-level representations may have other benefits as well. Deep networks have a proclivity to “memorize” training data. In the RL setting, this means that a learned policy may fail when placed in an environment that is even slightly different from the training environment. In comparison, our programmatic policies appear to be more robust. We see this in the video below, in which learned policies are used to drive a car around a race track. The program learned using the NDPS algorithm (our method) continues to perform well when one switches from the training track to a different “transfer” track. Compare this with what happens to the policy learned using standard deep RL.

There are limitations, of course: the programs learned here are quite small, and there is a nontrivial performance gap between the neural and the programmatic policy in the training environment. Also, the paper does not integrate verification with synthesis/learning. Finally, the policy’s inputs here are low-dimensional sensor readings, and its outputs are controls such as steering angle and acceleration; it is not clear how to extend the method to a setting where the inputs are, say, images.

Follow-up work has already begun to address some of these issues. For example, an upcoming paper by Zhu et al., “An Inductive Synthesis Framework for Verifiable Reinforcement Learning” (PLDI 2019), synthesizes provably safe programmatic policies and uses runtime monitoring to handle the performance gap between programmatic and neural policies. Concretely, the method executes the unverified but higher-performance neural policy rather than the verified but lower-performance programmatic policy, and monitors the system state during execution. If the neural policy is about to take an unsafe action, control is switched to the programmatic policy (the shield), whose actions are guaranteed to be safe. If unsafe actions are proposed only rarely — a reasonable assumption for properly trained neural policies — this approach guarantees safety without significantly compromising performance.
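
The runtime switching logic itself is simple; here is a one-step sketch under assumed interfaces for the two policies, the safety monitor, and the environment (all of the names below are placeholders, not the paper’s API).

```python
def shielded_step(state, neural_policy, programmatic_shield, is_safe, env_step):
    """One control step of a shielded system.  The unverified neural policy
    runs by default; if its proposed action would violate the verified safety
    condition, the verified programmatic policy takes over for this step."""
    action = neural_policy(state)
    if not is_safe(state, action):           # monitor: would this action be unsafe?
        action = programmatic_shield(state)  # fall back to the verified policy
    return env_step(state, action)           # advance the environment
```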

Conclusion

PL ideas such as automated program verification and synthesis can be useful in solving problems facing real-world AI/ML systems. Mainstream machine learning research is primarily focused on the careful, manual design and analysis of individual learning models and algorithms. PL researchers can help automate certain facets of such design and analysis, and also develop methods for reasoning about complex systems obtained by composing different models/algorithms. A synthesis of ideas from PL and machine learning can be invaluable in the design of real-world AI systems.

PL-for-AI can also motivate exciting technical problems in core PL. For example, in the work on learning programmatic policies I mentioned earlier, the goal is to synthesize a program that optimally imitates a neural network. This is a quantitative synthesis problem, and while such problems have been studied in past work, they have always been on the margins of PL research. With the emergence of PL-for-AI, problems like this are likely to move closer to the center of PL.

If you are interested in learning more, here are the slides for a keynote at CAV 2019 that expanded on some of the topics described in this post.