We study bandit problems in which a decision-maker chooses repeatedly between two alternatives over a short sequence of trials, receiving reward-or-failure feedback, where each alternative has a fixed but unknown reward rate. We collect data across a number of types of bandit problems to evaluate five heuristics (four seminal heuristics from machine learning and one new model we develop) as models of both human and optimal decision-making. We find that the new heuristic, which we call t-switch and which assumes that a latent search state is followed by a latent stand state controlling decisions on key trials, best mimics optimal decision-making and best accounts for the decision-making of the majority of our experimental participants. We show how these results allow human and optimal decision-making to be characterized and compared in simple, psychologically interpretable ways, and we discuss some theoretical and practical implications.
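The abstract describes the t-switch heuristic only at a high level. As a rough illustration, the following is a minimal Python sketch of a two-state switch rule on a two-armed Bernoulli bandit. It is not the authors' specification: the definition of a "key trial", the behaviour assumed for the search and stand states, the switch parameter `tau`, and the names `t_switch_choice` and `play_game` are all hypothetical assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def dominates(a: int, b: int, s: Sequence[int], f: Sequence[int]) -> bool:
    """True if arm a has at least as many successes and at most as many
    failures as arm b, and the two records are not identical."""
    return s[a] >= s[b] and f[a] <= f[b] and (s[a], f[a]) != (s[b], f[b])


def t_switch_choice(trial: int, tau: int, s: Sequence[int], f: Sequence[int],
                    rng: random.Random) -> int:
    """Choose arm 0 or 1 under a HYPOTHETICAL two-state switch rule.

    Assumptions (illustrative only, not the paper's exact model):
    - If one arm's record dominates the other's, choose that arm.
    - If the records are identical, guess at random.
    - Otherwise this is a "key" trial: one arm has both more successes and
      more failures, i.e. it has been tried more often. Before trial tau
      (search state) choose the less-tried arm; from trial tau onward
      (stand state) choose the more-tried arm.
    """
    if dominates(0, 1, s, f):
        return 0
    if dominates(1, 0, s, f):
        return 1
    if (s[0], f[0]) == (s[1], f[1]):
        return rng.randrange(2)
    more_tried = 0 if s[0] + f[0] > s[1] + f[1] else 1
    return 1 - more_tried if trial < tau else more_tried


def play_game(rates: Tuple[float, float], n_trials: int, tau: int,
              seed: int = 0) -> Tuple[List[int], int]:
    """Simulate one game on a two-armed bandit with fixed reward rates.

    Returns the sequence of chosen arms and the total number of successes.
    """
    rng = random.Random(seed)
    s, f = [0, 0], [0, 0]
    choices = []
    for trial in range(1, n_trials + 1):
        arm = t_switch_choice(trial, tau, s, f, rng)
        # Reward-or-failure feedback drawn from the arm's fixed rate.
        if rng.random() < rates[arm]:
            s[arm] += 1
        else:
            f[arm] += 1
        choices.append(arm)
    return choices, sum(s)
```

For example, `play_game((0.8, 0.3), n_trials=8, tau=5, seed=1)` plays one eight-trial game in which the rule is in its assumed search state for trials 1 to 4 and in its stand state from trial 5 onward. Fitting `tau` per participant, as a switch-point model suggests, would be a separate estimation step not shown here.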