Skip to content

PowerAnalysis methodology review (PR-B): Burlig Eq.2 equicorrelated panel variance + tracker Complete#512

Merged
igerber merged 1 commit into
mainfrom
feature/power-analysis-source-validation
May 31, 2026
Merged

PowerAnalysis methodology review (PR-B): Burlig Eq.2 equicorrelated panel variance + tracker Complete#512
igerber merged 1 commit into
mainfrom
feature/power-analysis-source-validation

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 31, 2026

Summary

  • PowerAnalysis methodology-review (PR-B): corrects the analytical panel-DiD variance. Replaces the Moulton (1+(T−1)ρ)/T design-effect factor — wrong two ways versus the source (period-scaling ~4× too small at ρ=0, m=r=5, and the opposite ρ-sign) — with the within-unit equicorrelated special case of Burlig, Preonas & Woerman (2020) Eq. 2: Var(ATT) = σ²(1/n_T+1/n_C)(1/m+1/r)(1−ρ). Within-unit (serial) correlation now lowers the MDE (the DiD cancels the shared within-unit component). The 2×2 path is the m=r=1 case (footnote 11), so (1−ρ) applies there too; ρ=0 recovers Bloom's 2σ². The MDE multiplier stays the normal-z Bloom multiplier (documented deviation from Burlig's t).
  • Input validation for all designs, enforced before the 2×2-vs-panel router: n_pre≥1, n_post≥1, ρ∈[−1/(T−1),1), σ≥0 (finite), positive group counts, treat_frac∈(0,1) — each raises ValueError (previously invalid two-period shapes / out-of-range ρ fell through silently).
  • Docs/bookkeeping: REGISTRY ## PowerAnalysis equation block + both reference surfaces + checklist rewritten; references.rst adds McKenzie (2012) + Frison & Pocock (1992) as the equicorrelated lineage; tutorial 06_power_analysis.ipynb corrected; METHODOLOGY_REVIEW.md row → Complete + queue pruned; TODO.md row removed; CHANGELOG.md. PR-A under-review Notes removed across REGISTRY / power.py / references.rst.

Methodology references (required if estimator / math changes)

  • Method name(s): PowerAnalysis (analytical MDE / power / sample-size; compute_* wrappers)
  • Paper / source link(s): Bloom (1995), Evaluation Review 19(5):547-556, https://doi.org/10.1177/0193841X9501900504 ; Burlig, Preonas & Woerman (2020), JDE 144:102458, https://doi.org/10.1016/j.jdeveco.2020.102458 ; lineage McKenzie (2012) + Frison & Pocock (1992). Source audits: docs/methodology/papers/bloom-1995-review.md, burlig-preonas-woerman-2020-review.md.
  • Any intentional deviations from the source (and why): (1) the MDE multiplier uses the normal (z) distribution following Bloom — a deliberate large-sample approximation to Burlig Eq.1's t — labelled **Deviation from R:** in REGISTRY (parity reference is normal-based, not pwr.t.test). (2) Only the equicorrelated special case of Burlig Eq.2 is implemented (single ρ); the fully general serial-correlation-robust form (independent ψ^B/ψ^A/ψ^X) is not, and is documented as such.

Validation

  • Tests added/updated: NEW tests/test_methodology_power.py (Bloom Table 1 multipliers; 2×2 + panel closed forms; literal-equicorrelated Monte-Carlo validation of the panel variance; sample_sizemde round-trip; input-guard + ρ-at-T=2 + compute_* wrapper validation; base-R qnorm parity). tests/test_power.py: inverted test_icc_effect + test_extreme_icc to Burlig's sign. R parity infra: NEW benchmarks/R/generate_power_golden.R + benchmarks/data/r_power_golden.json (9 fixtures incl. a 2×2 ρ>0 fixture).
  • Backtest / simulation / notebook evidence: docs/tutorials/06_power_analysis.ipynb corrected (ρ cells + summary), nbmake-verified; pytest tests/test_methodology_power.py tests/test_power.py = 227 passed locally (41 methodology + 186 unit).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • Affected method: PowerAnalysis.sample_size / compute_sample_size analytical 2×2 path.
  • The implementation, methodology tests, and new R goldens all apply the equicorrelated (1−ρ) factor when n_pre = n_post = 1.
  • The authoritative methodology contract still omits that factor in the basic 2×2 required-N formula at docs/methodology/REGISTRY.md:L3262-L3266, so the PR leaves a source-facing methodology mismatch unresolved.
  • I did not find additional P1+ issues in the changed code paths; the new variance formula, routing, and validation logic otherwise line up with the Burlig-equicorrelated special case documented in the paper reviews.
  • Review is based on static inspection; I could not run pytest locally because this environment does not have pytest or numpy.

Methodology

  • P1 Impact: docs/methodology/REGISTRY.md:L3244-L3246 correctly says the 2×2 path is the m=r=1 special case with (1−ρ), but the sample-size equation immediately below drops that factor in the basic 2×2 branch at docs/methodology/REGISTRY.md:L3265-L3266. The code does include (1−ρ) for 2×2 required N at diff_diff/power.py:L1614-L1624, and the new tests/goldens assert that behavior at tests/test_methodology_power.py:L129-L138 and tests/test_methodology_power.py:L493-L502. That leaves the registry internally inconsistent on the exact methodological change this PR makes, and it also makes the “sample-size corrected” claims in METHODOLOGY_REVIEW.md:L1249-L1251 and CHANGELOG.md:L12 inaccurate. Concrete fix: change the basic 2×2 formula to n = 2 M² σ² (1−ρ) / (δ² f(1−f)), or explicitly back out the new 2×2 ρ behavior everywhere else if that was not intended.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the registry mismatch above.
  • Reviewer note: test execution was not possible in this environment because pytest and numpy are unavailable.

Path to Approval

  1. Fix the 2×2 sample-size equation in docs/methodology/REGISTRY.md:L3265-L3266 so it matches the implemented and tested (1−ρ) behavior for PowerAnalysis.sample_size / compute_sample_size.

@igerber igerber force-pushed the feature/power-analysis-source-validation branch from 0032b8d to 2fde00f Compare May 31, 2026 20:24
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2fde00f727d7eba509fd4e4c5c432517eae51d5a


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: PowerAnalysis analytical mde / power / sample_size plus the compute_* wrappers in diff_diff/power.py.
  • The prior re-review blocker is resolved: the REGISTRY.md 2×2 sample-size equation now includes the (1-ρ) factor and matches the implementation and new methodology tests at docs/methodology/REGISTRY.md:L3262-L3267, diff_diff/power.py:L1614-L1636, and tests/test_methodology_power.py:L493-L502.
  • The variance rewrite matches the documented equicorrelated special case of Burlig Eq. 2, and the remaining departures from the paper (z instead of t, equicorrelated-only rather than full SCR) are explicitly labeled in the registry, so they are documented deviations rather than defects at docs/methodology/REGISTRY.md:L3223-L3229 and docs/methodology/REGISTRY.md:L3248-L3254.
  • The new validation closes the earlier silent fall-through of invalid two-period inputs into basic_did, and that validation propagates through the public wrappers at diff_diff/power.py:L1280-L1303, diff_diff/power.py:L1361-L1370, diff_diff/power.py:L1601-L1606, and tests/test_methodology_power.py:L304-L409.
  • I did not find any new unmitigated P0/P1 issues in the changed files. Review is static only because pytest is not installed in this environment.

Methodology

  • P3 Informational — Impact: the methodology contract is now internally consistent across source audit, registry, code, and tests for the changed PowerAnalysis path, including the 2×2 (1-ρ) behavior and the panel (1/m + 1/r)(1-ρ) variance at diff_diff/power.py:L1222-L1243, diff_diff/power.py:L1371-L1395, docs/methodology/REGISTRY.md:L3231-L3267, docs/methodology/papers/bloom-1995-review.md:L122-L154, and docs/methodology/papers/burlig-preonas-woerman-2020-review.md:L125-L151. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 Informational — Impact: the changed documentation and bookkeeping surfaces are aligned with the implemented behavior, including the completed tracker entry and updated reference notes at METHODOLOGY_REVIEW.md:L1238-L1257 and docs/references.rst:L246-L258. Concrete fix: none.
  • P3 Informational — Impact: I could not execute tests/test_methodology_power.py or tests/test_power.py locally because pytest is unavailable here, so this review relies on static inspection rather than runtime verification. Concrete fix: none required for the PR; rely on CI/local test execution.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026
… Eq.2 equicorrelated; tracker -> Complete

Reconciles diff_diff/power.py with the Bloom (1995) + Burlig, Preonas & Woerman
(2020) source audits (paper reviews added in PR-A #506). Behavior change: the
analytical panel-DiD variance was the Moulton (1+(T-1)rho)/T factor (wrong
period-scaling, ~4x too small at rho=0/m=r=5, AND opposite rho-sign). Replaced
with the within-unit equicorrelated special case of Burlig Eq. 2,
sigma^2 (1/n_T+1/n_C)(1/m+1/r)(1-rho), so within-unit correlation now LOWERS the
MDE. The MDE multiplier stays the normal-z Bloom multiplier (documented as a
deliberate large-sample approximation to Burlig's t).

- power.py: equicorrelated variance in _compute_variance + _compute_required_n;
  input validation for ALL designs (n_pre>=1, n_post>=1, rho in [-1/(T-1), 1))
  enforced BEFORE the 2x2-vs-panel router, so invalid two-period shapes no longer
  fall through silently; the (1-rho) factor applies at T=2 too (Burlig footnote
  11, the m=r=1 case), so rho is never silently ignored and rho=0 recovers Bloom's
  2*sigma^2; docstrings rewritten; PR-A under-review notes removed.
- REGISTRY ## PowerAnalysis equation block rewritten (z not t; unified
  equicorrelated SE with the 2x2 as the m=r=1 special case; cluster-m and
  inverted-R^2 terms removed; both reference surfaces; checklist ticked).
- New tests/test_methodology_power.py (Bloom Table 1; 2x2 + panel closed forms;
  literal-equicorrelated Monte-Carlo; sample_size<->mde round-trip; input-guard +
  rho-at-T=2 + compute_* wrapper validation; base-R qnorm parity).
- benchmarks/R/generate_power_golden.R + benchmarks/data/r_power_golden.json.
- tests/test_power.py: inverted test_icc_effect + test_extreme_icc to Burlig's sign.
- references.rst: + Frison & Pocock (1992), McKenzie (2012) lineage.
- docs/tutorials/06_power_analysis.ipynb: corrected rho cells + summary.
- METHODOLOGY_REVIEW.md row -> Complete; TODO row removed; CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@igerber igerber force-pushed the feature/power-analysis-source-validation branch from 2fde00f to bb63f49 Compare May 31, 2026 21:16
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: bb63f49031538bdb947c1cce918877f01651b7d7


Overall Assessment

✅ Looks good

Executive Summary

  • Affected surface: PowerAnalysis analytical mde / power / sample_size and the compute_* wrappers in diff_diff/power.py.
  • Re-review check: the prior methodology issue remains resolved. Code, registry, and methodology tests now consistently use Bloom’s normal-z multiplier plus the Burlig Eq. 2 equicorrelated special case, including the T=2 (1-ρ) path at diff_diff/power.py:L1222-L1243, diff_diff/power.py:L1362-L1395, diff_diff/power.py:L1615-L1637, docs/methodology/REGISTRY.md:L3230-L3280, and tests/test_methodology_power.py:L129-L250.
  • The new validation closes the earlier silent fall-through of invalid two-period inputs into basic_did, and that protection reaches the public wrappers at diff_diff/power.py:L1282-L1304, diff_diff/power.py:L1362-L1366, diff_diff/power.py:L1602-L1607, and tests/test_methodology_power.py:L304-L409.
  • Remaining departures from the source material are explicitly documented: Bloom-style z critical values instead of Burlig’s t, and equicorrelated-only support instead of the full SCR covariance structure, so these are not defects under the registry rules at docs/methodology/REGISTRY.md:L3237-L3243, docs/methodology/REGISTRY.md:L3262-L3268, and METHODOLOGY_REVIEW.md:L1253-L1257.
  • No unmitigated P0/P1 issues found in the changed files. This is a static re-review only; I could not execute tests locally because the environment is missing numpy, so imports fail before pytest.

Methodology

  • P3 Informational — Impact: the estimator contract is now aligned with the source-audit conclusions. Bloom’s review supports the normal-z multiplier, and Burlig’s review supports replacing the old Moulton factor with the equicorrelated Eq. 2 special case; the implementation, registry, and methodology tests all match that contract now. Concrete fix: none. docs/methodology/papers/bloom-1995-review.md:L122-L147, docs/methodology/papers/burlig-preonas-woerman-2020-review.md:L125-L151, diff_diff/power.py:L1222-L1243, diff_diff/power.py:L1362-L1395, docs/methodology/REGISTRY.md:L3228-L3280, tests/test_methodology_power.py:L119-L250
  • P3 Informational — Impact: the remaining methodological deviations are properly disclosed rather than silent: z instead of t, and equicorrelated-only rather than full SCR. Concrete fix: none. docs/methodology/REGISTRY.md:L3237-L3243, docs/methodology/REGISTRY.md:L3262-L3268, METHODOLOGY_REVIEW.md:L1253-L1257

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 Informational — Impact: two small wording leftovers still describe rho with ICC / intra-cluster language, which slightly blurs the now-important distinction between within-unit serial equicorrelation (rho) and survey/design inflation (deff). Concrete fix: rename the remaining ICC / intra-cluster mentions to within-unit equicorrelation or serial equicorrelation. docs/tutorials/06_power_analysis.ipynb:L7-L7, docs/methodology/REGISTRY.md:L3296-L3296
  • P3 Informational — Impact: coverage on the changed path is materially improved: closed-form checks, T=2 routing/validation, Monte Carlo validation, and base-R parity all landed in the PR. Concrete fix: none. tests/test_methodology_power.py:L1-L49, tests/test_methodology_power.py:L129-L250, tests/test_methodology_power.py:L304-L502, tests/test_power.py:L170-L185, tests/test_power.py:L296-L315

@igerber igerber merged commit 2cc3962 into main May 31, 2026
26 checks passed
@igerber igerber deleted the feature/power-analysis-source-validation branch May 31, 2026 22:46
@igerber igerber mentioned this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant