To trust a system, you need to understand it. However, in learning-enabled systems, interpretability is often at odds with learning performance. For example, deep neural networks can learn efficiently but are opaque black boxes. On the other hand, linear models or shallow decision trees are more interpretable but do not perform well on complex tasks.
Our lab has introduced programmatic interpretability, a new way around this conflict. Here, one learns models represented as programs in neurosymbolic domain-specific languages [ICML 2018; NeurIPS 2020]. These languages are designed to be interpretable by specific groups of users; at the same time, they are more expressive than traditional “shallow” models. Beyond learning such models, our goals include synthesizing programmatic explanations of local decisions made by a more complex model, inferring human-comprehensible properties of models through program analysis, and systematically exploring the tradeoffs between interpretability and model performance.
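To make the idea of a model represented as a program concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption and not the language or learning algorithm from the cited papers: the toy DSL (`Feature`, `Const`, `IfGreater`), the helper names `evaluate`, `show`, and `fit_threshold`, and the brute-force enumeration that stands in for a real program-synthesis procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union


# Hypothetical toy DSL: expressions over a dictionary of named observations.
@dataclass(frozen=True)
class Feature:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class IfGreater:
    left: "Expr"
    right: "Expr"
    then: "Expr"
    otherwise: "Expr"


Expr = Union[Feature, Const, IfGreater]
Observation = Dict[str, float]


def evaluate(expr: Expr, obs: Observation) -> float:
    """Interpret a DSL program on one observation."""
    if isinstance(expr, Feature):
        return obs[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, IfGreater):
        taken = evaluate(expr.left, obs) > evaluate(expr.right, obs)
        return evaluate(expr.then if taken else expr.otherwise, obs)
    raise TypeError(f"unknown expression: {expr!r}")


def show(expr: Expr) -> str:
    """Render a program as text a person can read and audit."""
    if isinstance(expr, Feature):
        return expr.name
    if isinstance(expr, Const):
        return str(expr.value)
    if isinstance(expr, IfGreater):
        return (f"if {show(expr.left)} > {show(expr.right)} "
                f"then {show(expr.then)} else {show(expr.otherwise)}")
    raise TypeError(f"unknown expression: {expr!r}")


def fit_threshold(
    data: Sequence[Tuple[Observation, float]],
    feature: str,
    thresholds: Sequence[float],
) -> Expr:
    """Enumerate one-branch programs and keep the one with least squared error.

    Exhaustive search is a stand-in here; it is not the synthesis method
    used in the cited work.
    """
    candidates: List[Expr] = [
        IfGreater(Feature(feature), Const(t), Const(1.0), Const(-1.0))
        for t in thresholds
    ]

    def squared_error(program: Expr) -> float:
        return sum((evaluate(program, obs) - target) ** 2 for obs, target in data)

    return min(candidates, key=squared_error)


# Made-up data: push right (1.0) only when the angle is clearly positive.
data = [({"angle": 0.3}, 1.0), ({"angle": -0.2}, -1.0), ({"angle": 0.05}, -1.0)]
policy = fit_threshold(data, "angle", thresholds=[0.0, 0.1, 0.2])
print(show(policy))
```

Unlike the weights of a neural network, the learned object here is a short expression that can be printed, read, and analyzed directly, which is the property the text above refers to.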
In: Larochelle, Hugo; Ranzato, Marc'Aurelio; Hadsell, Raia; Balcan, Maria-Florina; Lin, Hsuan-Tien (Eds.): Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
Programmatically Interpretable Reinforcement Learning
In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 5052–5061.