The gorilla problem | The Enlightened Economist

I was so keen to read Stuart Russell’s new book on AI, Human Compatible: AI and the Problem of Control, that I ordered it three times over. I’m not disappointed (although I returend two copies). It’s a very interesting and thoughtful book, and has some important implications for welfare economics – an area of the discipline in great need of being revisited, after years – decades – of a general lack of interest.

The book’s theme is how to engineer AI that can be guaranteed to serve human interests, rather than taking control and serving the specific interests programmed into its objective functions and rewards. The control problem is much-debated in the AI literature, in various forms. AI systems aim to achieve a specified objective given what they perceive from data inputs including through sensors. How to control them is becoming an urgent challenge – as the book points out, by 2008 there were more objects than people connected to the internet, giving AI systems ever more extensive access, input and output, to the real world. The potential of AI is their scope to communicate – machines can do better than any number n of humans because they access information that isn’t kept in n separate brains and communicated imperfectly between them. Humans have to spend a ton of time in meetings, machines don’t.

Russell argues that the AI community has been too slow to face up to the probability that machines as currently designed will gain control over humans – keep us at best as pets and at worst create a hostile environment for us, driving us slowly extinct, as we have gorillas (hence the gorilla problem). Some of the solutions proposed by those recognising the problem have been bizarre, such as ‘neural lace’ that permanently connects the human cortex to machines. As the book comments: “If humans need brain surgery merely to survive the threat posed by their own technology, perhaps we’ve made a mistake somewhere along the line.”

He proposes instead three principles to be adopted by the AI community:

the machine’s only objective is to maximise the realisation of human preferences
the machine is initially uncertain about what those preferences are
the ultimate source of information about human preferences is human behaviour.

He notes that AI systems embed uncertainty except in the objective they are set to maximise. The utility function and cost/reward/loss function are assumed to be perfectly known. This is an approach shared of course with economics. There is a great need to study planning and decision making with partial and uncertain information about preferences, Russell argues. There are also difficult social welfare questions. It’s one thing to think about an AI system deciding for an individual but what about groups? Utilitarianism has some well-known issues, much chewed over in social choice theory. But here we are asking AI systems to (implicitly) aggregate over individuals and make interpersonal comparisons. As I noted in my inaugural lecture, we’ve created AI systems that are homo economicus on steroids and it’s far from obvious this is a good idea. In a forthcoming paper with some of my favourite computer scientists, we look at the implications of the use of AI in public decisions for social choice and politics. The principles also require being able to teach machines (and ourselves) a lot about the links between human behaviour, the decision environment and underlying preferences. I’ll need to think about it some more, but these principles seem a good foundation for developing AI systems that serve human purposes in a fundamental way, rather than in a short-term instrumental one.

Anyway, Human Compatible is a terrific book. It doesn’t need any technical knowledge and indeed has appendices that are good explainers of some of the technical stuff. I also like it that the book is rather optimistic, even about the geopolitical AI arms race: “Human-level AI is not a zero sum game and nothing is lost by sharing it. On the other hand, competing to be the first to achieve human level AI without first solving the control problem is a negative sum game. The payoff for everyone is minus infinity.” It’s clear that to solve the control problem cognitive and social scientists and the AI community need to start talking to each other a lot – and soon – if we’re to escape the fate gorillas suffered at our hands.

Human Compatible by Stuart Russell