AI Safety as a PL Problem

In the last few years, there has been much anxious rumination about AI safety. To a significant extent, these fears come from the realization that modern AI systems can be alarmingly brittle. For example, a deep neural net used for image classification can produce different outputs on two inputs that humans cannot tell apart. Machine learning algorithms can reflect harmful biases in their training data. Robots trained in controlled settings routinely fail in the real world.

Fortunately, AI systems are ultimately software, and therefore PL researchers can play a key role in making them more reliable. An emerging body of research uses PL ideas — such as high-level language abstractions and automated reasoning about correctness — to aid the design of robust and accountable AI systems. In this post, I will give a taste of some recent work on this topic.

My focus here is on sequential decision-making problems, which naturally arise in safety-critical domains like autonomous cyber-physical systems. Here, we have an AI agent that is interacting with its environment using a learned policy: a function from environment states to agent actions. A popular approach to discovering policies is reinforcement learning (RL), in which the agent explores the environment strategically, receiving rewards for various actions, and seeks to learn a policy that maximizes the expected reward over a long time horizon. In contemporary work, it is common to represent policies using deep neural networks. Because neural nets are differentiable in their parameters, this representation allows the use of scalable, gradient-based learning techniques.
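To make this setup concrete, here is a minimal sketch of a neural-network policy trained with a simple policy-gradient (REINFORCE-style) update. The toy environment, network size, and hyperparameters are all hypothetical choices for illustration, not part of any system discussed in this post.

```python
# Sketch: a neural policy mapping states to a distribution over actions,
# trained by gradient ascent on expected reward (REINFORCE-style).
import torch
import torch.nn as nn
from torch.distributions import Categorical


class ToyEnv:
    """A hypothetical 1-D world: the agent starts at 0 and is rewarded for reaching +5."""

    def reset(self):
        self.pos = 0
        return torch.tensor([float(self.pos)])

    def step(self, action):  # action 0 = move left, 1 = move right
        self.pos += 1 if action == 1 else -1
        done = abs(self.pos) >= 5
        reward = 1.0 if self.pos >= 5 else 0.0
        return torch.tensor([float(self.pos)]), reward, done


# The policy: a differentiable function from environment states to action logits.
policy = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

env = ToyEnv()
for episode in range(200):
    state, done = env.reset(), False
    log_probs, rewards = [], []
    while not done:
        dist = Categorical(logits=policy(state))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(action.item())
        rewards.append(reward)

    # Policy-gradient update: raise the probability of actions in proportion
    # to the total reward that followed them (undiscounted, for brevity).
    returns = torch.tensor([sum(rewards[t:]) for t in range(len(rewards))])
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is only to show the structure that the rest of the discussion assumes: the policy is an opaque, learned neural network, and its behavior is shaped indirectly by a reward signal rather than by explicit programming.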