PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits by igerber · Pull Request #506 · igerber/diff-diff

igerber · 2026-05-31T15:17:25Z

Summary

PR-A (paper-review) of the PowerAnalysis methodology validation — the Step-1 fidelity artifact of the locked 2-PR sequence. Adds Bloom (1995) and Burlig, Preonas & Woerman (2020) source audits under docs/methodology/papers/, sourced only from the papers.
Documents (under review — does NOT yet fix) four discrepancies between the authoritative PowerAnalysis surfaces and the source material; reconciliation is deferred to PR-B and tracked in TODO.md.
No estimator logic, formula, or test changes. (One class-docstring panel-variance display was corrected from a self-canceling factor to the form _compute_variance already implements — docstring only.)

Methodology references (required if estimator / math changes)

Method name(s): PowerAnalysis (analytical MDE / power / sample size; simulation-based power)
Paper / source link(s): Bloom (1995) https://doi.org/10.1177/0193841X9501900504 ; Burlig, Preonas & Woerman (2020) https://doi.org/10.1016/j.jdeveco.2020.102458 (reviewed via the open NBER WP 26250)
Any intentional deviations from the source (and why): This PR documents (does not resolve) the deviations, via a reviewer-recognized REGISTRY **Note:** + power.py docstring notes + a references.rst clarification, with reconciliation tracked in TODO.md for PR-B:
1. the analytical path uses the normal (z) approximation (faithful to Bloom, who derives the MDE multiplier from the normal), while Burlig's Eq. 1 uses the t distribution;
2. the REGISTRY SE formula's 1/√(1−R²) and cluster-size m terms are not what power.py implements;
3. the displayed sample-size formula omits the T(1−T) allocation factor the code applies;
4. the panel (1+(T−1)ρ)/T factor is an equicorrelated/Moulton design effect, not Burlig's serial-correlation-robust (SCR) variance (Eq. 2) — an attribution overclaim pending PR-B re-attribution or implementation.

Validation

Tests added/updated: None — this is a documentation-only (paper-review) PR. Source validation, equation-anchored methodology tests, and R parity (pcpanel for the panel path; a normal-based reference for the analytical path) are deferred to PR-B.
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-31T15:21:51Z

Overall Assessment

✅ Looks good

Executive Summary

No executable estimator or inference logic changed; the diff is limited to documentation, docstrings, and a tracked follow-up in TODO.md.
The PR correctly turns four pre-existing PowerAnalysis mismatches into documented, tracked deviations in docs/methodology/REGISTRY.md:3207-3220, diff_diff/power.py:1232-1236, docs/references.rst:246-248, and TODO.md:77.
The corrected panel-variance docstring now matches the implemented formula in _compute_variance() at diff_diff/power.py:1226-1228 and diff_diff/power.py:1331-1343.
No new empty-result, NaN/Inf inference, control-group, or parameter-propagation risk is introduced because no estimator code path changed.
One residual documentation inconsistency remains: the REGISTRY R-equivalents table still points PowerAnalysis to pwr.t.test(), which conflicts with the new Bloom audit and the code’s normal-based critical values.

Methodology

Severity: P3-informational. Impact: The PR documents the existing PowerAnalysis deviations instead of leaving them implicit; under the review policy, these are mitigated because they are now explicitly noted in the Methodology Registry and tracked in TODO.md. Concrete fix: None required in this PR; reconcile or re-attribute in PR-B. References: docs/methodology/REGISTRY.md:3207-3220, diff_diff/power.py:1232-1236, docs/references.rst:246-248, TODO.md:77.

Code Quality

No findings. The only code-file change is documentation text, and the corrected panel formula matches the implementation at diff_diff/power.py:1226-1228 and diff_diff/power.py:1331-1343.

Performance

No findings. No runtime logic changed.

Maintainability

No findings beyond the documentation inconsistency noted below.

Tech Debt

No findings. The only new deferred work introduced by this PR is explicitly tracked at TODO.md:77.

Security

No findings. The diff adds only documentation/TODO content and no new execution or secret-handling surface.

Documentation/Tests

Severity: P2. Impact: docs/methodology/REGISTRY.md:3364 still maps PowerAnalysis to pwr.t.test(), but the new Bloom audit says the analytical path is normal-based and that pwr::pwr.t.test() is not the faithful parity target, consistent with _get_critical_values() using stats.norm.ppf. That leaves the methodology docs internally inconsistent on the recommended analytical reference. Concrete fix: Update the PowerAnalysis row to a normal-based analytical reference or annotate the row as under review until PR-B settles the contract. References: docs/methodology/REGISTRY.md:3364, docs/methodology/papers/bloom-1995-review.md:128-131, diff_diff/power.py:1273-1280.
Severity: P3-informational. Impact: No automated tests changed. That is acceptable for a docs-only PR, but the new audits do not by themselves add executable assurance for the eventual PR-B reconciliation. Concrete fix: In PR-B, add focused analytical tests for the chosen MDE/sample-size formulas and the selected panel-path parity target. I did not run tests for this review.

github-actions · 2026-05-31T15:31:07Z

🔁 AI review rerun (requested by @igerber)

Head SHA: aa64e2675b635304e55d779da41ba6b782ba5e3b

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous re-review P2 on PowerAnalysis is addressed: the methodology docs now explicitly mark the analytical path as under review, document the four source mismatches, and track the follow-up in TODO.md (docs/methodology/REGISTRY.md:L3207-L3220, docs/methodology/REGISTRY.md:L3364, diff_diff/power.py:L1232-L1236, docs/references.rst:L246-L248, TODO.md:L77-L77).
The diff is not limited to PowerAnalysis documentation; it also rolls back vcov_type="conley" support on SunAbraham and WooldridgeDiD, and rewrites the StackedDiD Conley deferral (diff_diff/sun_abraham.py:L520-L555, diff_diff/wooldridge.py:L429-L441, diff_diff/stacked_did.py:L203-L216).
[Newly identified] SunAbraham.set_params(vcov_type="conley") still accepts the now-forbidden value, so sklearn-style mutation bypasses the new constructor guard and falls into a lower-level Conley code path that this PR removed (diff_diff/sun_abraham.py:L543-L555, diff_diff/sun_abraham.py:L1441-L1450, diff_diff/sun_abraham.py:L2150-L2160).
WooldridgeDiD propagates the rollback correctly by revalidating vcov_type in set_params(), so the unmitigated contract regression is specific to SunAbraham (diff_diff/wooldridge.py:L429-L441, diff_diff/wooldridge.py:L487-L521).
I did not run the test suite in this environment; numpy is unavailable, so this review is from static inspection.

Methodology

Severity: P1 [Newly identified]. Impact: the new SunAbraham variance-family contract is inconsistent across public entry points. __init__ now rejects vcov_type="conley", but set_params() still blindly mutates self.vcov_type without revalidation. Because the old estimator-level Conley front-door and Conley argument threading were removed, a caller can still put the estimator into the forbidden "conley" state via sklearn-style mutation and then reach _fit_saturated_regression() / LinearRegression(vcov_type="conley") without any conley_* inputs. That is exactly the changed-surface propagation problem this repo flags as a P1 anti-pattern. Concrete fix: factor the constructor validation into a shared helper and call it from set_params() before mutation, mirroring WooldridgeDiD.set_params(). Add a regression test that failed mutation leaves vcov_type unchanged. References: diff_diff/sun_abraham.py:L543-L555, diff_diff/sun_abraham.py:L1441-L1450, diff_diff/sun_abraham.py:L2150-L2160, diff_diff/wooldridge.py:L487-L521, tests/test_sun_abraham.py:L1656-L1661.
Severity: P3-informational. Impact: the PowerAnalysis source mismatches are now explicitly documented and tracked, so under the review policy they are mitigated rather than blocking. Concrete fix: none required in this PR; PR-B can reconcile or re-attribute the formulas. References: docs/methodology/REGISTRY.md:L3207-L3220, diff_diff/power.py:L1232-L1236, TODO.md:L77-L77.

Code Quality

No additional findings beyond the SunAbraham.set_params() contract regression above.

Performance

No findings. The estimator changes are removals/deferrals, and the PowerAnalysis changes are documentation-only.

Maintainability

Severity: P3-informational. Impact: the StackedDiD Conley deferral is now described as a simple threading follow-up and “same shape as the SunAbraham conley follow-up,” which drops the prior methodology caveat about duplicated units across stacked sub-experiments at distance 0. The feature is still rejected, so this does not change runtime numbers, but it does understate the unresolved design issue for the eventual follow-up PR. Concrete fix: restore a short methodology caveat in the rejection message / REGISTRY / TODO row instead of framing the work as pure plumbing. References: diff_diff/stacked_did.py:L107-L109, diff_diff/stacked_did.py:L208-L216, docs/methodology/REGISTRY.md:L1473-L1478, TODO.md:L107-L109.

Tech Debt

No new untracked tech debt. The PowerAnalysis discrepancies are properly tracked in TODO.md, which is the correct mitigation for this docs-first audit (TODO.md:L77-L77).

Security

No findings.

Documentation/Tests

No separate blocker beyond the missing SunAbraham.set_params() regression noted above. The changed tests only assert constructor-time rejection for the rolled-back Conley surfaces, not mutation-time rejection (tests/test_sun_abraham.py:L1625-L1661, tests/test_wooldridge.py:L1755-L1757).
I did not execute the test suite in this environment because numpy is unavailable.

Path to Approval

Make SunAbraham.set_params() re-run the same vcov_type validation as __init__ before mutating state, so vcov_type="conley" is rejected consistently across both public entry points.
Add regression tests that SunAbraham().set_params(vcov_type="conley") and SunAbraham().set_params(vcov_type="foo") both raise and leave self.vcov_type unchanged.

…rce audits Step-1 fidelity artifact for the PowerAnalysis methodology validation (the paper-review PR of the 2-PR sequence; source validation + code reconciliation + tests are deferred to PR-B). No estimator logic, formula, or test changes. New source audits under docs/methodology/papers/ (sourced only from the papers): - bloom-1995-review.md: the MDE-multiplier framing — MDE = M*SE with M from the NORMAL distribution (Bloom builds 1.65 + 0.84 = 2.49 from z-quantiles, p.548-549), the cross-sectional impact-estimator SE sigma*sqrt((1-R^2)/(n*T*(1-T))) (Eq. 1/2), the T(1-T) allocation factor (optimal at 50/50), one- and two-sided multipliers; documents that Bloom explicitly excludes clustering/multi-site design effects (Note 1). - burlig-preonas-woerman-2020-review.md: the serial-correlation-robust (SCR) panel-DD variance (Eq. 2, verbatim — three covariance terms psi^B/psi^A/psi^X over m pre- and r post-periods, psi^X entering negatively), the McKenzie special case (Eq. 3), the increase-MDE condition (Eq. 4); Eq. 1 uses the t-distribution; pcpanel is the panel parity reference. The audits surfaced discrepancies between the authoritative PowerAnalysis surfaces and the source material. Per the agreed approach these are DOCUMENTED as under-review now (not yet fixed — reconciliation is deferred to PR-B and tracked in TODO.md): - REGISTRY.md ## PowerAnalysis: umbrella **Note:** enumerating the four discrepancies (t-vs-normal-z multiplier; SE R^2/cluster-m terms; missing T(1-T) allocation factor in the displayed sample-size formula; panel (1+(T-1)rho)/T is an equicorrelated/Moulton design effect, NOT Burlig SCR — an attribution overclaim). - REGISTRY.md R-equivalents table: annotate the PowerAnalysis row as under-review (analytical path is normal-based, so pwr.t.test is not the faithful parity target; panel parity ref is Stata pcpanel) — resolves the cross-reference inconsistency the audits introduced. - power.py: docstring notes on PowerAnalysis and simulate_power flagging the panel attribution and normal-vs-t approximation as under review; the class docstring panel-variance display corrected from a self-canceling factor to the implemented (1/N_treated + 1/N_control) * (1+(T-1)rho)/T (docstring-only; no logic change). - references.rst: clarify the analytical panel path uses an equicorrelated approximation, not Burlig's SCR formula. - TODO.md: tracker row (Methodology/Correctness) for the PR-B reconciliation. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-31T15:36:51Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2470be69608d9cf018d21fc4aef17deabe444e0e

Overall Assessment

✅ Looks good

Executive Summary

The current diff is scoped to PowerAnalysis documentation, methodology registry text, source-audit notes, and TODO.md; I did not find any estimator logic, weighting, variance, inference, or default-behavior changes in the executable code paths. The only diff_diff/power.py edits are docstring updates at diff_diff/power.py:L1222-L1236 and diff_diff/power.py:L1902-L1915.
Cross-checking the implementation confirms the four documented discrepancies are real: the analytical path uses normal critical values via stats.norm.ppf, the basic DiD variance is 2*sigma^2*(1/n_T+1/n_C), required-N uses the allocation factor treat_frac*(1-treat_frac), and the panel path uses a Moulton-style ((1 + (T-1)*rho) / T) factor rather than Burlig’s SCR variance. See diff_diff/power.py:L1273-L1350 and diff_diff/power.py:L1530-L1579.
Those discrepancies are now explicitly marked with a **Note:** in the authoritative registry and tracked in TODO.md, so under the review policy they are mitigated P3-informational items rather than blockers. See docs/methodology/REGISTRY.md:L3214-L3227 and TODO.md:L75-L77.
The class docstring’s panel-variance display now matches the implementation, and the R-equivalence/reference text now clearly says the analytical panel path is under review rather than presenting Burlig SCR or pwr.t.test() as a faithful parity target. See diff_diff/power.py:L1226-L1228, docs/methodology/REGISTRY.md:L3371-L3371, and docs/references.rst:L246-L248.
The prior AI-review SunAbraham P1 does not carry forward on this head: current code and tests still treat vcov_type="conley" as supported, so that rollback-specific regression is not part of this diff. See diff_diff/sun_abraham.py:L581-L585, diff_diff/sun_abraham.py:L2251-L2262, tests/test_sun_abraham.py:L1657-L1667, diff_diff/wooldridge.py:L447-L450, and tests/test_wooldridge.py:L1755-L1762.

Methodology

Severity: P3-informational. Impact: The PR correctly documents four existing PowerAnalysis source-material mismatches in the authoritative REGISTRY and tracks the follow-up in TODO.md: Bloom/Burlig t vs implemented z, the registry SE expression vs the implemented basic-DiD variance, the missing allocation factor in the displayed sample-size formula, and the analytical panel path’s Moulton approximation vs Burlig SCR. Because these are now explicitly labeled with **Note:** and tracked for PR-B, they are mitigated under the stated review policy, not P1 blockers. Concrete fix: None required in PR-A; reconcile or re-attribute in PR-B. References: docs/methodology/REGISTRY.md:L3214-L3227, TODO.md:L75-L77, docs/methodology/papers/bloom-1995-review.md:L122-L150, docs/methodology/papers/burlig-preonas-woerman-2020-review.md:L125-L150, diff_diff/power.py:L1273-L1350, diff_diff/power.py:L1530-L1579.

Code Quality

No findings. The docstring correction in diff_diff/power.py aligns the displayed panel formula with _compute_variance() instead of introducing new behavioral changes. See diff_diff/power.py:L1226-L1228 and diff_diff/power.py:L1331-L1343.

Performance

No findings. The diff is documentation-only for runtime purposes.

Maintainability

No findings. The new paper-review files are explicitly marked non-authoritative, and the follow-up ownership is clear in TODO.md, which keeps this docs-first PR from silently becoming the final methodology contract. See docs/methodology/papers/bloom-1995-review.md:L13-L18, docs/methodology/papers/burlig-preonas-woerman-2020-review.md:L13-L18, and TODO.md:L75-L77.

Tech Debt

No findings. The deferred reconciliation is properly tracked under “Tech Debt from Code Reviews,” so the remaining methodology cleanup is visible and non-blocking for this PR. See TODO.md:L73-L77.

Security

No findings.

Documentation/Tests

No findings in the changed diff. I did not run tests because this environment is missing pytest and numpy, so this review is based on static inspection only.

… Eq.2 equicorrelated; tracker -> Complete Reconciles diff_diff/power.py with the Bloom (1995) + Burlig, Preonas & Woerman (2020) source audits (paper reviews added in PR-A #506). Behavior change: the analytical panel-DiD variance was the Moulton (1+(T-1)rho)/T factor (wrong period-scaling, ~4x too small at rho=0/m=r=5, AND opposite rho-sign). Replaced with the within-unit equicorrelated special case of Burlig Eq. 2, sigma^2 (1/n_T+1/n_C)(1/m+1/r)(1-rho), so within-unit correlation now LOWERS the MDE. The MDE multiplier stays the normal-z Bloom multiplier (documented as a deliberate large-sample approximation to Burlig's t). - power.py: equicorrelated variance in _compute_variance + _compute_required_n; input validation for ALL designs (n_pre>=1, n_post>=1, rho in [-1/(T-1), 1)) enforced BEFORE the 2x2-vs-panel router, so invalid two-period shapes no longer fall through silently; the (1-rho) factor applies at T=2 too (Burlig footnote 11, the m=r=1 case), so rho is never silently ignored and rho=0 recovers Bloom's 2*sigma^2; docstrings rewritten; PR-A under-review notes removed. - REGISTRY ## PowerAnalysis equation block rewritten (z not t; unified equicorrelated SE with the 2x2 as the m=r=1 special case; cluster-m and inverted-R^2 terms removed; both reference surfaces; checklist ticked). - New tests/test_methodology_power.py (Bloom Table 1; 2x2 + panel closed forms; literal-equicorrelated Monte-Carlo; sample_size<->mde round-trip; input-guard + rho-at-T=2 + compute_* wrapper validation; base-R qnorm parity). - benchmarks/R/generate_power_golden.R + benchmarks/data/r_power_golden.json. - tests/test_power.py: inverted test_icc_effect + test_extreme_icc to Burlig's sign. - references.rst: + Frison & Pocock (1992), McKenzie (2012) lineage. - docs/tutorials/06_power_analysis.ipynb: corrected rho cells + summary. - METHODOLOGY_REVIEW.md row -> Complete; TODO row removed; CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

igerber force-pushed the feature/power-analysis-paper-review branch from f70b9d6 to aa64e26 Compare May 31, 2026 15:25

igerber force-pushed the feature/power-analysis-paper-review branch from aa64e26 to 2470be6 Compare May 31, 2026 15:32

igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026

igerber merged commit fe20a9a into main May 31, 2026
33 of 34 checks passed

igerber deleted the feature/power-analysis-paper-review branch May 31, 2026 16:55

igerber mentioned this pull request Jun 1, 2026

Bump version to 3.5.0 #516

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits#506

PowerAnalysis methodology review (PR-A): Bloom 1995 + Burlig 2020 source audits#506
igerber merged 1 commit into
mainfrom
feature/power-analysis-paper-review

igerber commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 31, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant