Exploration-Exploitation in a Contextual Multi-Armed Bandit Task

Eric Schulz, University College London, London, United Kingdom
Emmanouil Konstantinidis, Carnegie Mellon University
Maarten Speekenbrink, University College London

Abstract

We introduce the Contextual Multi-Armed Bandit task as a method for assessing decision making in uncertain environments and test how participants behave in this task. Within an experimental paradigm named "Mining in Space", participants see 4 different planets, each described by 3 different binary elements (the context), and then decide which planet to mine (which arm to play). We find that participants adapt their decisions to the context well and are best described by a Contextual Gaussian Process algorithm that probability matches according to expected outcomes. We conclude that humans are well adapted to contextualized bandit problems, even in potentially non-stationary environments, through probability matching, a heuristic that used to be described as biased behavior. We argue that Contextual Bandit problems can provide further insight into how people make decisions in real-world scenarios.
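
To make the probability-matching choice rule concrete, the following is a minimal sketch, not the authors' implementation. It assumes that some learner (in the paper, a Contextual Gaussian Process; omitted here) has already produced a Gaussian belief about the outcome of each of the 4 planets given the current context. The function names, the posterior means, and the standard deviations are hypothetical and chosen only for illustration. Drawing one sample per arm and playing the arm with the largest draw selects each arm with the probability that it is the best one under the current beliefs.

```python
import random


def probability_match(means, sds, rng):
    """Choose an arm by sampling once from each arm's Gaussian belief.

    Playing the arm with the largest draw selects each arm with the
    probability that it has the highest outcome under the beliefs.
    """
    draws = [rng.gauss(m, s) for m, s in zip(means, sds)]
    return max(range(len(draws)), key=draws.__getitem__)


def choice_probabilities(means, sds, n_samples=10000, seed=0):
    """Estimate by simulation how often each arm is chosen."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_samples):
        counts[probability_match(means, sds, rng)] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    # Hypothetical beliefs about 4 planets for one context (assumed values).
    means = [1.0, 0.0, 0.0, 0.0]
    sds = [1.0, 1.0, 1.0, 1.0]
    print(choice_probabilities(means, sds))
```

Under this rule the planet with the highest expected outcome is chosen most often, but less promising planets are still explored in proportion to how likely they are to be best. When the standard deviations shrink to zero, the rule reduces to always picking the planet with the highest mean.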
