“Picking a millionaire at random to participate in a training program for low skilled workers, or making an idiot into a PhD may be intriguing thought experiments but are usually neither policy relevant nor feasible,” Heckman maintains in this paper.

Until fairly recently, I had not thought much about why someone would estimate an ATT (the average treatment effect on the treated) rather than an ATE (the average treatment effect), or what that choice would mean. Continuing with Heckman’s example, estimating the ATE for a training programme for low-skilled workers is unlikely to be meaningful. In essence, one would be comparing training participants (the treated) with those who did not participate, a control group which may include the millionaire and the idiot with a PhD. In other words, estimating the average effect of the treatment for the average American, say, is not informative if one really aims to assess the benefits of the training programme.

By contrast, I think the ATT is more meaningful in such a context, as this point estimate speaks about programme participants rather than the larger population.
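To make the distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption of mine for illustration, not something taken from Heckman: a hypothetical skill scale from 0 to 1, a made-up rule that the training helps low-skilled workers most, and a made-up enrolment rule under which low-skilled workers are more likely to participate. Because the simulation knows each person's individual effect, it computes the true ATE and ATT directly rather than estimating them from outcomes.

```python
import random


def simulate(n=100_000, seed=0):
    """Return (ATE, ATT) for a simulated population with heterogeneous effects."""
    rng = random.Random(seed)
    effects = []  # individual treatment effect for every person
    treated = []  # whether that person enrols in the training

    for _ in range(n):
        # Hypothetical skill scale: 0 = low skilled, 1 = high skilled.
        skill = rng.random()
        # Assumption: the training helps low-skilled workers the most
        # and does nothing for the most highly skilled.
        effect = 10.0 * (1.0 - skill)
        # Assumption: low-skilled workers are far more likely to enrol.
        enrols = rng.random() < (1.0 - skill) ** 2
        effects.append(effect)
        treated.append(enrols)

    # ATE: average effect over the whole population.
    ate = sum(effects) / n
    # ATT: average effect over those who actually participate.
    att = sum(e for e, t in zip(effects, treated) if t) / sum(treated)
    return ate, att


ate, att = simulate()
print(f"ATE: {ate:.2f}, ATT: {att:.2f}")
```

Under these assumptions the ATT comes out larger than the ATE, because the population average is diluted by people who would gain little from the training and would rarely enrol anyway. With different assumptions about who benefits and who participates, the gap could shrink or reverse.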