AI models need to be ‘interpretable’ rather than just ‘explainable’

Last November, Apple ran into trouble after customers pointed out on Twitter that its credit card service was discriminating against women. David Heinemeier Hansson, the creator of Ruby on Rails, called Apple Card a sexist program. “Apple’s black box algorithm thinks I deserve 20x the credit limit [my wife] does,” he tweeted.

The success of deep learning in the past decade has increased interest in the field of artificial intelligence. But the rising popularity of AI has also highlighted some of the field’s key problems, including the “black box problem,” the challenge of making sense of the way complex machine learning algorithms make decisions. The Apple Card disaster is one of many manifestations of the black-box problem that have come to light in recent years.

The increased attention to black-box machine learning has given rise to a body of research on explainable AI. Much of the work in the field involves developing techniques that try to explain the decisions made by a machine learning algorithm without breaking open the black box. But explaining AI decisions after they happen can have dangerous implications, argues Cynthia Rudin, professor of computer science at Duke University, in a paper published in the journal Nature Machine Intelligence.

“Rather than trying to create models that are inherently interpretable, there has been a recent explosion of work on ‘explainable ML’, where a second (post hoc) model is created to explain the first black box model. This is problematic. Explanations are often not reliable, and can be misleading, as we discuss below,” Rudin writes.

Such practices can “potentially cause great harm to society,” Rudin warns, especially in critical domains such as healthcare and criminal justice.

Instead, Rudin argues in her paper, developers should opt for AI models that are “inherently interpretable” and “provide their own explanations.” And contrary to what some AI researchers believe, in many cases, interpretable models can produce results that are just as accurate as black-box deep learning algorithms.
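
To make the idea of a model that provides its own explanation concrete, here is a minimal Python sketch of a rule list, one simple kind of interpretable model. It is an illustration only: the features, thresholds, and decisions are invented, and it is neither the method from Rudin’s paper nor anything used by Apple Card.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical applicant record: feature name -> numeric value.
Applicant = Dict[str, float]

# Each rule is (human-readable condition, predicate, decision).
# All features and thresholds below are invented for illustration.
RULES: List[Tuple[str, Callable[[Applicant], bool], str]] = [
    ("missed_payments >= 3",
     lambda a: a["missed_payments"] >= 3, "deny"),
    ("income >= 50000 and debt_ratio < 0.4",
     lambda a: a["income"] >= 50000 and a["debt_ratio"] < 0.4, "approve"),
]
DEFAULT_DECISION = "manual review"


def decide(applicant: Applicant) -> Tuple[str, str]:
    """Return (decision, reason). The reason is the first rule that fired."""
    for text, predicate, decision in RULES:
        if predicate(applicant):
            return decision, f"rule matched: {text}"
    return DEFAULT_DECISION, "no rule matched"


if __name__ == "__main__":
    applicant = {"missed_payments": 0, "income": 62000, "debt_ratio": 0.25}
    print(decide(applicant))
```

The explanation here is not produced by a second model after the fact. It is the rule the model actually used, so it cannot disagree with the decision.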

Two types of black-box AI

As with many things involving artificial intelligence, there’s a bit of confusion surrounding the black-box problem. Rudin differentiates between two types of black-box AI systems: functions that are too complicated for any human to comprehend, and functions that are proprietary.

The first kind of black-box AI includes deep neural networks, the architecture used in deep learning algorithms. DNNs are composed of layers upon layers of interconnected variables that become tuned as the network is trained on numerous examples. As neural networks grow larger and larger, it becomes virtually impossible to trace how their millions (and sometimes, billions) of parameters combine to make decisions. Even when AI engineers have access to those parameters, they won’t be able to precisely deconstruct the decisions of the neural network.
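
A rough sense of how quickly those parameters pile up can be had from a short Python sketch. It assumes a plain fully connected network, where every unit in one layer connects to every unit in the next. The layer sizes are hypothetical, and real deep learning architectures vary widely.

```python
from typing import Sequence


def count_parameters(layer_sizes: Sequence[int]) -> int:
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between layers
        total += n_out         # one bias per unit in the receiving layer
    return total


if __name__ == "__main__":
    # Hypothetical architectures, chosen only to show the growth.
    print(count_parameters([10, 8, 1]))           # 97
    print(count_parameters([784, 512, 512, 10]))  # 669706
```

Even the second network, tiny by modern standards, has 669,706 tunable numbers, and none of them maps to a human-readable rule on its own.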
