Learning a complex skill requires traversing a potentially enormous search space. While reinforcement learning (RL) algorithms can approach human levels of performance in complex tasks, they require much more training to do so. This may be because humans constrain search problems with prior knowledge, allowing them to discover solutions more rapidly. Importantly, the statistics that underlie this learning process are both poorly understood and hard to investigate in the large state spaces found in most complex tasks. To address this, we have designed a parametric variation of the traveling salesman problem, a standard NP-hard search problem. By changing the number and arrangement of targets, we densely sample the space of problem instances and the dilemmas that ensue. I will present two versions of the task in which subjects experience different levels of time pressure, sensorimotor noise, and tolerance for suboptimality. In each case, we find that subjects rapidly learn to solve the task by arriving at a strategy that exploits task invariances and generalizes to previously unencountered stimuli. However, as task dimensionality increases, this strategy becomes increasingly suboptimal, and is discarded only after extensive failure. In contrast, model-free RL agents initially learn the task slowly and fail to generalize, but they subsequently avoid getting stuck with suboptimal solutions. Thus, humans and RL agents improve in complementary ways, showing flexibility in the application and in the modification of their policies, respectively. By releasing this task as a mobile app, we now aim to model variability and learning in human plan construction at high resolution.
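
As a purely illustrative sketch of the kind of search problem involved (not the authors' task, stimuli, or analysis; the uniform target layout, function names, and instance sizes below are assumptions), the snippet generates random target layouts of varying size and compares a greedy nearest-neighbor tour with the brute-force optimal tour. The gap between the two tends to widen as the number of targets grows, loosely analogous to a simple strategy becoming increasingly suboptimal as task dimensionality increases.

```python
# Minimal sketch: parametric TSP instances with a greedy heuristic vs. the
# brute-force optimum. Hypothetical illustration only; not the study's task.
import itertools
import math
import random


def random_instance(n_targets, seed=None):
    """Sample target coordinates uniformly in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_targets)]


def tour_length(points, order):
    """Total length of a closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbor_tour(points):
    """Greedy heuristic: always move to the closest unvisited target."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def optimal_tour(points):
    """Exhaustive optimum; only feasible for small numbers of targets."""
    best = min(
        itertools.permutations(range(1, len(points))),
        key=lambda perm: tour_length(points, (0,) + perm),
    )
    return (0,) + best


if __name__ == "__main__":
    # Print how far the greedy tour exceeds the optimum as instances grow.
    for n in (5, 7, 9):
        pts = random_instance(n, seed=0)
        greedy = tour_length(pts, nearest_neighbor_tour(pts))
        opt = tour_length(pts, optimal_tour(pts))
        print(f"n={n}: greedy/optimal length ratio = {greedy / opt:.3f}")
```

The brute-force step restricts this toy comparison to small instances; it is meant only to make concrete why a fixed heuristic that works well on small layouts can fall progressively short on larger ones.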