StaggeredTripleDifference methodology validation + opt-in Eq-4.14 overall ATT #504
…rall ATT
Validates the StaggeredTripleDifference source against Ortiz-Villavicencio &
Sant'Anna (2025, arXiv:2505.09942v3) and promotes the methodology-review row to
Complete. Adds an opt-in Eq-4.14 overall ATT (overall_att_es).
Source:
- New _se_from_psi helper factored from _compute_aggregated_se_with_wif
(survey/replicate/simple variance dispatch), reused for the overall SE.
- _aggregate_event_study stashes self._event_study_overall (mirroring
_event_study_vcov; no return-signature change, CallawaySantAnna unaffected):
unweighted mean of post-treatment ES(e) over e >= -anticipation.
- StaggeredTripleDiffResults gains overall_att_es/_se_es/_t_stat_es/_p_value_es/
_conf_int_es (populated only under aggregate in {event_study, all}); rendered
in summary() and to_dict(). Default overall_att unchanged.
- Bootstrap parity: per-draw mean of post-treatment ES(e) draws; cluster-
unidentified NaN guard mirrored for the new scalars.
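
A minimal sketch of the Eq-4.14 overall described in the Source notes above: the unweighted mean of ES(e) over e >= -anticipation, with the analytical SE taken from the averaged per-event-time influence functions. The function name, the list-of-lists layout, and the IF scaling (variance of an estimate ~= sum(psi^2) / n^2) are assumptions for illustration, not the library's code; the real path dispatches through _se_from_psi, which also covers survey and replicate-weight variance.

```python
import math


def overall_att_es_sketch(event_times, es, psi, anticipation=0):
    """Hypothetical helper: unweighted mean of post-treatment ES(e) and a simple SE.

    event_times: list of ints, one per event-study column.
    es:          list of floats, ES(e) for each event time.
    psi:         psi[i][j] is the influence-function value of unit i for
                 event_times[j]; ASSUMED scaled so Var(ES(e)) ~= sum(psi^2) / n^2.
    """
    # Post-treatment columns are those with e >= -anticipation.
    post = [j for j, e in enumerate(event_times) if e >= -anticipation]
    if not post:
        return math.nan, math.nan

    # Point estimate: unweighted mean of the post-treatment effects.
    overall = sum(es[j] for j in post) / len(post)

    # IF of a mean is the mean of the IFs (per unit, across post columns).
    psi_overall = [sum(row[j] for j in post) / len(post) for row in psi]

    # Simple (non-survey) variance under the assumed scaling.
    n = len(psi_overall)
    se = math.sqrt(sum(v * v for v in psi_overall)) / n
    return overall, se


event_times = [-2, -1, 0, 1]
es = [0.1, 0.0, 1.0, 3.0]
psi = [
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, -1.0, -3.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, -1.0, -3.0],
]
print(overall_att_es_sketch(event_times, es, psi))  # (2.0, 1.0)
```

Pre-treatment columns (e < -anticipation) contribute to neither the point estimate nor the SE in this sketch, which is the property the stash is meant to preserve.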
Methodology docs (REGISTRY ## StaggeredTripleDifference):
- Formalize the previously-unlabeled overall-aggregation prose under a Note
documenting both overalls (default CS-simple vs opt-in Eq-4.14).
- Consolidate the duplicate aggregation-weight deviation; fix the P(G=g) vs R
P(S=g) mislabel.
Tests:
- Paper-equation-anchored Verified Components (Thm 4.1/Eq 4.5, Eq 4.1, Eqs
4.11-4.12, Eq 4.13, Eq 4.14/Cor 4.2) + overall_att_es R cross-validation +
bootstrap/survey cross-surface coverage.
Tracker/refs: METHODOLOGY_REVIEW.md row -> Complete with Verified Components / R
Comparison Results; priority queue pruned; references.rst pinned to v3; CHANGELOG.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
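
The bootstrap-parity item in this commit (per-draw mean of post-treatment ES(e) draws, with a NaN guard) can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, the sample standard deviation stands in for whatever spread estimator the library's multiplier bootstrap actually uses, and "any non-finite draw" stands in for the cluster-unidentified condition.

```python
import math
import statistics


def bootstrap_overall_se_sketch(event_times, es_draws, anticipation=0):
    """Hypothetical helper: bootstrap SE for the mean of post-treatment ES(e).

    es_draws: es_draws[b][j] is bootstrap draw b of ES(event_times[j]).
    """
    post = [j for j, e in enumerate(event_times) if e >= -anticipation]
    if not post or len(es_draws) < 2:
        return math.nan

    # Average the post-treatment columns within each draw first, so the
    # spread reflects the joint distribution of the ES(e) draws.
    per_draw = [sum(draw[j] for j in post) / len(post) for draw in es_draws]

    # ASSUMED guard: an unidentified variance shows up as non-finite draws,
    # and the SE is reported as NaN rather than a misleading number.
    if not all(math.isfinite(v) for v in per_draw):
        return math.nan

    # Stand-in spread estimator (sample standard deviation).
    return statistics.stdev(per_draw)


draws = [[0.0, 1.0, 3.0], [0.0, 2.0, 4.0], [0.0, 3.0, 5.0]]
print(bootstrap_overall_se_sketch([-1, 0, 1], draws))  # 1.0
```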
|
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…te=all / cluster-unidentified coverage
CI Codex review findings on PR #504:
- P2: the Eq. 4.14 overall (overall_att_es) terminal warning re-derived its
trigger from the post-bootstrap state, so a bootstrap that NaN'd overall_se_es
for unrelated reasons (e.g. a single-PSU/cluster-unidentified survey design)
was misdiagnosed as an analytical non-finite influence function. Gate the
warning on whether the ANALYTICAL SE was non-finite (captured before the
bootstrap override), and broaden the message to name both causes (non-finite IF
or unidentified variance). The bootstrap path already emits its own
authoritative "variance unidentified" warning.
- P3 (tests): add a direct regression that aggregate="all" populates
overall_att_es and matches the aggregate="event_study" value/inference
bit-for-bit on the same data.
- Add a single-PSU cluster-unidentified bootstrap regression: overall_att_es
keeps its point estimate, SE+inference are NaN-consistent, and the Eq. 4.14
warning (when emitted) is cause-accurate.
No numeric output changes; inference fields remain NaN-consistent in all paths.
Affected suites pass, incl. the shared-mixin CallawaySantAnna regression guard.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
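
The P2 fix above amounts to recording the diagnosis before the bootstrap overwrites the SE. A minimal sketch of that ordering, with a hypothetical function name and message text (the library's actual warning wording and call site are not shown in this PR page):

```python
import math
import warnings


def finalize_overall_se(analytical_se, bootstrap_se=None):
    """Hypothetical helper: pick the reported SE and warn with the right cause."""
    # Capture the analytical state BEFORE any bootstrap override, so a NaN
    # produced by the bootstrap is not blamed on the influence function.
    analytical_nonfinite = not math.isfinite(analytical_se)

    # Bootstrap SE, when present, replaces the analytical SE.
    final_se = analytical_se if bootstrap_se is None else bootstrap_se

    if analytical_nonfinite:
        warnings.warn(
            "overall_att_es: analytical SE is non-finite (non-finite influence "
            "function or unidentified variance); inference fields are NaN.",
            UserWarning,
        )
    return final_se
```

With this ordering, a finite analytical SE followed by a NaN bootstrap SE returns NaN without emitting the Eq. 4.14 warning, leaving the bootstrap path's own "variance unidentified" warning as the only diagnosis.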
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings. Highest remaining item is P3 informational tech debt already tracked in …
Executive Summary
Methodology
No findings. The new …
Code Quality
No findings. The prior P2 warning issue is fixed: the Eq. 4.14 warning is now keyed to an analytical non-finite SE captured before bootstrap override, so bootstrap-side NaNs are not mislabeled as influence-function failures.
References:
Performance
No findings.
Maintainability
No findings.
Tech Debt
P3 - Impact: remaining StaggeredTripleDifference R-parity follow-ups are still tracked in …
Security
No findings.
Documentation/Tests
No findings. The prior …
|
Summary
- … diff_diff source against Ortiz-Villavicencio & Sant'Anna (2025, arXiv:2505.09942v3); PR-A (StaggeredTripleDifference PR-A: Ortiz-Villavicencio & Sant'Anna (2025) paper review #499) added the paper review on file; this PR validates the implementation against it.
- … overall_att_es (paper Eq. 4.14 overall), the unweighted mean of the post-treatment event-study effects ES(e), on StaggeredTripleDiffResults (with overall_se_es / overall_t_stat_es / overall_p_value_es / overall_conf_int_es), populated only under aggregate="event_study" / "all". The default overall_att (Callaway-Sant'Anna simple post-treatment average, the library-wide convention) is unchanged. Computed via a side-channel stash on the shared CallawaySantAnnaAggregationMixin._aggregate_event_study (no return-signature change; CallawaySantAnna unaffected), over post-treatment e >= -anticipation. Analytical SE = the influence function of the mean (per-event-time combined IFs averaged, routed through the same survey-aware variance estimator as the per-e effects via a new _se_from_psi helper); a multiplier-bootstrap SE replaces it under n_bootstrap>0.
- … **Note:** documenting both overalls; consolidated the duplicate aggregation-weight deviation and fixed a P(G=g) vs R P(S=g) mislabel.
- … rel_periods (balance_e-emptied event studies); overall_att_es uses its own replicate-weight effective df.
- docs/references.rst pinned to arXiv:2505.09942v3; autosummary stubs + CHANGELOG updated.
Methodology references
- StaggeredTripleDifference (staggered triple-differences / DDD), built on the shared CallawaySantAnna aggregation + multiplier-bootstrap mixins.
- triplediff::ddd(panel=TRUE) + agg_ddd().
- … docs/methodology/REGISTRY.md ## StaggeredTripleDifference, all verified non-masking against the v3 paper:
  - g_c > max(t, base_period) + anticipation (matches the companion R triplediff; the paper states g_c > max(g,t)); valid cell-by-cell and base-period/anticipation-aware.
  - P(S=g, Q=1) (eligible-treated; matches the paper's Eq. 4.13 where G_i is finite only for Q=1) vs R's P(S=g); drives the larger tolerance on aggregated quantities.
  - … wif=NULL).
  - overall_att = CS-simple post-treatment average (library convention); the paper's Eq. 4.14 overall is available opt-in as overall_att_es.
Validation
- tests/test_methodology_staggered_triple_diff.py: paper-equation-anchored Verified Components (Theorem 4.1 / Eq. 4.5 RA=IPW=DR identification; Eq. 4.1 three-term DDD decomposition; Eqs. 4.11-4.12 optimal-GMM weight normalization + single-group reduction; Eq. 4.13 event-study cohort-share weighting; Eq. 4.14 / Cor. 4.2 overall), overall_att_es R cross-validation, balance_e-empties-event-study fail-soft, and the aggregation return-contract arity.
- tests/test_survey_staggered_ddd.py: overall_att_es under survey weighting (uniform-equivalence, nontrivial-weights-change-SE, full design, replicate weights).
- tests/test_staggered_triple_diff.py: result-surface smoke (summary() / to_dict() expose the new fields; None for the default fit).
- triplediff::ddd(panel=TRUE) + agg_ddd(): group-time ATT(g,t) exact, SE within 1%; Eq. 4.14 overall within 10% (ATT) / 3% (SE). CSV fixtures gitignored / regenerated on-the-fly from benchmarks/R/benchmark_staggered_triplediff.R; JSON golden committed. CallawaySantAnna + StaggeredDiD suites pass (shared-mixin regression guard).
Security / privacy
🤖 Generated with Claude Code