I have a new paper on arXiv (link) that proposes a novel machine learning estimator for difference-in-differences with staggered adoptions: fused extended two-way fixed effects (FETWFE). Its main advantage over existing methods is efficiency: unlike those methods, it exploits our knowledge that treatment effects at nearby times are likely to be equal.
The canonical two-way fixed effects estimator for difference-in-differences is biased under staggered adoptions. One fix is to use more parameters: estimate a separate treatment effect for each cohort at each time. But treating each treatment effect as a free parameter wastes our knowledge that treatment effects that are nearby in time are likely to be similar. A natural idea is to restrict some of these parameters to be equal. For example, we can fit a model under the assumption that the treatment effects within each cohort are equal across time, or that every cohort has the same treatment effect at each time since treatment started.
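To make that concrete, here is a minimal sketch in Python of the two extremes: a fully unrestricted specification with a separate dummy for every treated (cohort, time) cell, and a restricted one that forces effects within each cohort to be equal over time. The toy data, variable names, and setup are all made up for illustration; this is not code from the paper.

```python
# Illustrative sketch only (not the paper's code): compare an unrestricted
# cohort-by-time treatment-effect specification with a restricted one.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy panel: units adopt treatment in different cohorts (first treated period),
# with a never-treated group (cohort = inf).
n_units, n_periods = 200, 6
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
cohort = np.repeat(rng.choice([2.0, 4.0, np.inf], size=n_units), n_periods)
treated = (period >= cohort).astype(float)

# True effects drift slowly after adoption, so nearby effects are similar.
true_effect = treated * (1.0 + 0.1 * np.clip(period - cohort, 0, None))
y = rng.normal(size=n_units * n_periods) + true_effect

df = pd.DataFrame({"unit": unit, "period": period, "cohort": cohort,
                   "treated": treated, "y": y})

# Unit and period fixed effects.
fe = pd.concat([pd.get_dummies(df["unit"], prefix="u", drop_first=True, dtype=float),
                pd.get_dummies(df["period"], prefix="t", drop_first=True, dtype=float)],
               axis=1)

# (a) Unrestricted: one dummy per treated (cohort, period) cell.
cell = (df["cohort"].astype(str) + "_" + df["period"].astype(str)).where(df["treated"] == 1, "none")
tau_unrestricted = pd.get_dummies(cell, dtype=float).drop(columns="none")

# (b) Restricted: one dummy per cohort (effects assumed constant within cohort).
coh = df["cohort"].astype(str).where(df["treated"] == 1, "none")
tau_restricted = pd.get_dummies(coh, dtype=float).drop(columns="none")

for name, tau in [("unrestricted", tau_unrestricted), ("restricted", tau_restricted)]:
    X = np.column_stack([np.ones(len(df)), fe.to_numpy(), tau.to_numpy()])
    beta, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
    print(name, dict(zip(tau.columns, np.round(beta[-tau.shape[1]:], 2))))
```

The unrestricted fit is unbiased under staggered adoption but estimates many parameters; the restricted fit is more efficient, but only if the restriction happens to be right.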
But this gets tricky. If we don’t actually know what the right restrictions are, and we just guess them, we could be reintroducing the bias that adding the parameters removed. On the other hand, if we select too few restrictions, our estimator is needlessly inefficient.
Even if we’re not willing to commit to specific restrictions, we still have information about their likely structure that a model with fully free parameters for each treatment effect isn’t using. For example, we know the treatment effects within a cohort at times t and t – 1 are likely to be similar.
My new estimator, fused extended two-way fixed effects, is a bridge-penalized version of Wooldridge’s extended two-way fixed effects estimator. It is closely related to the fused lasso: I regularize the differences between pairs of treatment effects that are likely to be close together.
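Schematically, and in my own notation here rather than the exact formulation in the paper, the estimator solves a penalized least squares problem along the lines of

$$
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 \;+\; \lambda \sum_{(j,k) \in \mathcal{S}} \lvert \beta_j - \beta_k \rvert^q ,
$$

where X contains the extended two-way fixed effects design (fixed effects plus cohort-by-time treatment dummies), S is the set of pairs of treatment effects expected to be close (described next), lambda is a tuning parameter, and q is the bridge exponent. With q = 1 the penalty is of fused lasso type, and penalized differences that are estimated as exactly zero act as selected restrictions.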
The figure below, from my paper, shows the penalty structure. Treatment effects within each cohort are penalized towards each other, and the first treatment effect in each cohort is penalized towards the first treatment effect in the previous cohort. FETWFE automatically selects restrictions and estimates treatment effects in one step.
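As a rough illustration of that structure, here is a sketch that builds a fusion matrix D encoding those pairwise differences and fits the convex q = 1 special case with cvxpy. The toy cohorts, the simulated data, and the choice to fit only the q = 1 case are all my own simplifications for illustration, not the paper's implementation.

```python
# Rough illustration (not the paper's implementation): encode the penalty
# structure in a fusion matrix D, then fit the convex q = 1 special case.
import numpy as np
import cvxpy as cp

# Toy setting: cohorts adopt at periods 2 and 4 in a 6-period panel, so the
# treated (cohort, period) cells, i.e. the treatment-effect coefficients, are:
cells = [(2, 2), (2, 3), (2, 4), (2, 5), (4, 4), (4, 5)]
idx = {cell: j for j, cell in enumerate(cells)}
p = len(cells)

rows = []
# 1) Within each cohort, penalize consecutive-in-time effects towards each other.
for (c, t) in cells:
    if (c, t + 1) in idx:
        d = np.zeros(p)
        d[idx[(c, t + 1)]], d[idx[(c, t)]] = 1.0, -1.0
        rows.append(d)
# 2) Penalize each cohort's first effect towards the previous cohort's first effect.
first = {c: idx[(c, c)] for (c, _) in cells}
cohorts = sorted(first)
for prev, nxt in zip(cohorts, cohorts[1:]):
    d = np.zeros(p)
    d[first[nxt]], d[first[prev]] = 1.0, -1.0
    rows.append(d)
D = np.vstack(rows)

# Toy data: columns of X stand in for the treatment-effect dummies (imagine the
# fixed effects have already been dealt with); nearby true effects are similar.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 1.0, 1.2, 1.2, 0.8, 0.8])
y = X @ beta_true + rng.normal(size=n)

# Fused-lasso-style fit: differences shrunk exactly to zero act as selected
# restrictions (e.g. "cohort 2's effects at periods 2 and 3 are equal").
beta = cp.Variable(p)
lam = 5.0
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta) + lam * cp.norm1(D @ beta))).solve()
print(np.round(beta.value, 2))
```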
FETWFE selects the correct restrictions with probability tending to one, consistently estimates several classes of treatment effects, and yields asymptotically normal estimates of heterogeneous average treatment effects. If you'd rather not split the data, I also prove asymptotic subgaussianity and provide a conservative variance estimator.
Check out the paper on arXiv, and feel free to reach out if you have any questions or comments!