Using Heuristic Models to Understand Human and Optimal Decision-Making on Bandit Problems


We study bandit problems in which a decision-maker repeatedly chooses between two alternatives over a short sequence of trials, receiving reward-or-failure feedback after each choice, and each alternative has a fixed but unknown reward rate. We collect data across several types of bandit problems and use them to analyze five heuristics (four seminal heuristics from machine learning and one new model we develop) as models of both human and optimal decision-making. The new heuristic, which we call t-switch, assumes that decision-making on key trials is controlled by a latent search state followed by a latent stand state. We find that t-switch is best able to mimic optimal decision-making and best accounts for the decision-making of the majority of our experimental participants. We show how these results allow human and optimal decision-making to be characterized and compared in simple, psychologically interpretable ways, and we discuss some theoretical and practical implications.
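To make the task and the search-then-stand idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' model specification. The names `t_switch_choice`, `run_game`, `tau`, and `rates` are hypothetical. It also assumes (a) a "key trial" is one where neither alternative dominates, meaning one alternative has both more successes and more failures than the other; (b) in the search state the rule picks the less-explored alternative and in the stand state the more-explored one; and (c) the switch happens at a fixed trial `tau`. A full model would likely add a noise or accuracy parameter; this sketch is deterministic apart from tie-breaking.

```python
import random


def t_switch_choice(successes, failures, trial, tau, rng):
    """Hypothetical t-switch-style rule for two alternatives (0 and 1).

    Assumptions (not taken from the source): a key trial is one where one
    alternative has strictly more successes AND more failures than the other.
    Before trial `tau` the rule is in a 'search' state and picks the
    less-explored alternative; from `tau` on it 'stands' on the more-explored one.
    """
    s0, s1 = successes
    f0, f1 = failures
    if (s0, f0) == (s1, f1):
        # Identical records: no basis for preference, so choose at random.
        return rng.randrange(2)
    # Dominance: at least as many successes and no more failures.
    if s0 >= s1 and f0 <= f1:
        return 0
    if s1 >= s0 and f1 <= f0:
        return 1
    # Remaining case is a key trial: one alternative has more of both outcomes.
    more_explored = 0 if s0 + f0 > s1 + f1 else 1
    if trial < tau:
        return 1 - more_explored  # latent search state
    return more_explored  # latent stand state


def run_game(rates, n_trials, tau, seed=0):
    """Simulate one two-alternative game with fixed reward rates (hypothetical)."""
    rng = random.Random(seed)
    successes, failures, choices = [0, 0], [0, 0], []
    for trial in range(1, n_trials + 1):
        choice = t_switch_choice(successes, failures, trial, tau, rng)
        # Reward-or-failure feedback drawn from the chosen alternative's rate.
        if rng.random() < rates[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
        choices.append(choice)
    return choices, successes, failures
```

For example, `run_game([0.8, 0.3], n_trials=8, tau=4)` simulates an eight-trial game in which the rule searches on key trials before trial 4 and stands afterwards. Placing the switch in a single parameter means the fitted value of `tau` directly summarizes how long a decision-maker keeps exploring.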
