Designing More Informative Tests: Separating Execution from Recognition -- by Andrew Caplin, Leo Zhu
Tests are widely used to measure ability, yet performance on a test often reflects more than the ability to execute assigned tasks. It also reflects the ability to recognize which tasks are worth attempting, how they should be prioritized, and how effort should be allocated under uncertainty. This paper studies how tests can be designed to separate these capabilities. We model a test as a sequential decision problem. Tasks differ in difficulty, their ordering is uncertain, and examinees may acquire costly information about that ordering before choosing how to proceed. The testing environment is the informational structure surrounding the realized test: in particular, the examinee's beliefs about how task difficulty has been arranged. Performance is therefore generated by an optimal recognition–execution policy, not by execution skill alone. The analysis delivers two negative results. First, even in the simplest two-task environment, a single score exhibits dimensional collapse: distinct combinations of execution skill and recognition capability generate identical expected scores. Second, with three tasks, the relationship between capabilities and scores becomes environment-dependent: changing beliefs about task ordering can change which actions are considered and how capabilities translate into performance. These results imply that standard scores are not generally informative enough to separate the capabilities that generate performance. This matters because scores are used to summarize what individuals can do and to guide downstream decisions about placement, training, and instruction. If a test does not separately reveal execution and recognition, it provides limited guidance about which capability is strong, which is weak, and where improvement should be directed. We then show how more informative tests can be designed. Under a simple communicability constraint, two canonical environments—ordered and randomized tests—induce distinct relationships between capabil
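To make the idea of dimensional collapse concrete, here is a minimal sketch under loudly stated assumptions. It is a hypothetical toy model, not the authors' specification: it drops costly information acquisition and optimal policy choice, and simply assumes two tasks (one easy, one hard) in unknown order, with time to attempt only one. The names `s` (execution skill), `r` (recognition, the chance of correctly spotting the easy task), `h` (a difficulty discount on the hard task), `expected_score`, and `same_score_profiles` are all illustrative inventions. The sketch shows that many distinct (s, r) pairs produce exactly the same expected score.

```python
from fractions import Fraction

# Hypothetical toy model (NOT the paper's specification): two tasks, one easy and
# one hard, presented in an unknown order; the examinee can attempt only one.
#   s: execution skill, probability of solving the easy task if attempted
#   r: recognition, probability of correctly identifying which task is easy
#   h: assumed difficulty discount; the hard task is solved with probability s * h


def expected_score(s: Fraction, r: Fraction, h: Fraction) -> Fraction:
    """Expected number of tasks solved in the toy two-task model."""
    # With probability r the examinee picks the easy task, otherwise the hard one.
    return r * s + (1 - r) * s * h


def same_score_profiles(target: Fraction, h: Fraction, recognitions):
    """For each recognition level r, find the execution skill s that yields `target`.

    Solves target = s * (h + r * (1 - h)) for s, keeping only feasible s in [0, 1].
    """
    profiles = []
    for r in recognitions:
        s = target / (h + r * (1 - h))
        if 0 <= s <= 1:
            profiles.append((s, r))
    return profiles


if __name__ == "__main__":
    h = Fraction(1, 2)
    target = Fraction(3, 5)
    grid = [Fraction(k, 10) for k in range(11)]
    # Every printed profile has a different (s, r) but the same expected score.
    for s, r in same_score_profiles(target, h, grid):
        score = expected_score(s, r, h)
        print(f"s={float(s):.3f}  r={float(r):.1f}  score={float(score):.3f}")
```

Exact `Fraction` arithmetic is used so that equal scores compare equal without floating-point noise. In this toy setting, an examinee with perfect execution but weak recognition (s = 1, r = 0.2) and one with weaker execution but perfect recognition (s = 0.6, r = 1) both score 0.6 in expectation, so the score alone cannot tell them apart.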
Read on NBER Education