SyntheticControl: in-space placebo permutation inference + reporting-stack integration#511
Conversation
…stack integration Implements ADH 2010 §2.4 in-space placebo inference for the classic SyntheticControl estimator (PR-1 / #501 shipped the estimator with no analytical SE), and wires SyntheticControlResults into the practitioner / DiagnosticReport / BusinessReport reporting stack. Methodology (diff_diff/synthetic_control.py, synthetic_control_results.py): - SyntheticControlResults.in_space_placebo(): reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the other J-1 donors (the real treated unit is excluded from every placebo pool -- its post-period is treatment-contaminated; matches SCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE ratio among the J+1 units. - New fields placebo_p_value (= rank/(n_placebos+1); an upper-tail rank test on the unsigned RMSPE-ratio statistic, ties via >=), rmspe_ratio (treated statistic, set at fit), n_placebos/n_failed (effective reference-set sizes; non-converged placebos excluded from BOTH numerator and denominator). - placebo_p_value is a SEPARATE field from the always-NaN analytical p_value; it carries no SE/t-stat and does not flow through safe_inference; is_significant stays bound to p_value. - Fail-closed contract: a valid fit requires BOTH inner Frank-Wolfe AND outer-V convergence of the SELECTED incumbent; treated fit fails closed and placebos are excluded on either failure. The Powell polish validates the incumbent only when it converges back AT the incumbent's objective level (np.isclose), never on a success at a strictly-worse point. - Edge cases: scale-aware RMSPE-ratio floor (perfect pre-fit -> finite ratio), J<2 -> NaN+warn, J==2 -> degenerate+coarse warn, deterministic given seed. - get_placebo_df() returns the per-unit RMSPE-ratio table (incl. treated row + failed donors). Placebo compute is opt-in; each fit retains a _SyntheticControlFitSnapshot of the pivoted panel (excluded from pickling). Reporting (diagnostic_report.py, practitioner.py, business_report.py, _reporting_helpers.py): SCM routed like TROP (parallel-trends-free, fit-based). New scm_fit PT analogue (design_enforced_pt verdict reading pre_rmspe), a _scm_native diagnostic block surfacing pre_rmspe + donor-weight concentration + the placebo p-value when already computed (never triggering the refit implicitly), a practitioner _handle_synthetic_control with the placebo as the headline significance step, and a BusinessReport fit-based assumption block with ADH 2010 attribution. Also fixes a latent BR bug where headline is_significant was a non-JSON-serializable numpy bool_ when p_value is numpy NaN. Docs: REGISTRY.md §SyntheticControl (new **Note:** labels for donor-pool construction, failure handling, RMSPE-ratio floor, non-analytical-p-value split), REPORTING.md, docs/api/synthetic_control.rst, LLM guides, README, CHANGELOG. TODO.md tracks the ADH-2015 follow-up (in-time placebo / LOO) and a compact/lazy snapshot representation. Tests: ~19 placebo + convergence tests (infeasible donor counts, treated-fit non-convergence, Powell-succeeds-at-worse-point, failed-placebo exclusion, deterministic reruns, pickle behavior) + SCM reporting-routing tests across all three tools. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
|
Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Against the local ADH 2010 registry/docstrings, the new SCM placebo path is source-faithful; the remaining issues are non-blocking follow-ups. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ty reason + n_starts validation CI review (PR #511) flagged two P2s: 1. summary() misdiagnosed the infeasible-placebo reason. The treated-fit fail-closed branch and the J<2 branch both leave n_placebos=0, n_failed=0, so reconstructing the cause from counts narrated a non-converged treated fit as "too few donors or all donor refits failed" (false for that path). Add an explicit `_placebo_status` recorded by in_space_placebo() ({treated_fit_nonconverged, too_few_donors, all_placebos_failed, ran}) and render the specific reason in both summary() and DiagnosticReport._scm_native instead of inferring it from counts. The status is a small string, so it survives pickling alongside _placebo_df / the scalar fields. 2. in_space_placebo(n_starts=...) skipped the positive-integer validation the estimator constructor enforces — int(0)/int(-1)/int(2.5) silently coerced into a degenerate or invalid permutation procedure. Mirror the constructor's check and fail fast with a ValueError. Also relabels the lingering "One-sided rank" code comment to "upper-tail rank on the unsigned RMSPE ratio" (matching the REGISTRY/CHANGELOG wording fix). New regressions: summary() names the treated-fit-failure reason (not the donor-side reason) under identical n_placebos/n_failed counts; invalid n_starts overrides raise and leave placebo state untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — this re-review resolves the prior P2s, and I did not find any unmitigated P0/P1 issues in the changed SCM estimator, placebo, or reporting paths. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…lobal filter mutation CI review (PR #511) P3: the new scm_fit fixture called warnings.filterwarnings("ignore"), which permanently mutates the process-wide warning filters and can silently weaken warning-regression coverage in later tests in the same worker. Wrap only the synthetic_control(...) call in warnings.catch_warnings() + simplefilter("ignore") so the suppression is scoped to the fixture's solver call. (Pre-existing sibling fixtures in this file use the same legacy global-filter pattern but are out of scope for this PR.) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — I did not find any unmitigated P0/P1 issues in the changed SCM placebo-inference or reporting-stack paths. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
SyntheticControlestimator (PR-1 / Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core #501 shipped the estimator with no analytical SE) and wiresSyntheticControlResultsinto the practitioner / DiagnosticReport / BusinessReport reporting stack.SyntheticControlResults.in_space_placebo(): reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the otherJ−1donors (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matchesSCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE ratio among theJ+1units.placebo_p_value(= rank/(n_placebos+1); an upper-tail rank test on the unsigned RMSPE-ratio statistic — direction-agnostic, detects an effect of either sign; ties via≥),rmspe_ratio(treated statistic, set at fit),n_placebos/n_failed(effective reference-set sizes; non-converged placebos excluded from BOTH numerator and denominator).placebo_p_valueis a separate field from the always-NaN analyticalp_value— no SE/t-stat, does not flow throughsafe_inference;is_significantstays bound top_value.np.isclose), never on a success at a strictly-worse point.inf),J<2→ NaN+warn,J==2→ degenerate+coarse warn, deterministic givenseed.get_placebo_df()returns the per-unit RMSPE-ratio table (incl. treated row + failed donors). Placebo compute is opt-in; each fit retains a pickle-excluded_SyntheticControlFitSnapshotof the pivoted panel.scm_fitPT analogue (design_enforced_ptverdict readingpre_rmspe), a_scm_nativeblock surfacingpre_rmspe+ donor-weight concentration + the placebo p-value when already computed (never triggering the refit implicitly), a practitioner_handle_synthetic_controlwith the placebo as the headline significance step, and a BusinessReport fit-based assumption block with ADH 2010 attribution. Also fixes a latent BR bug where headlineis_significantwas a non-JSON-serializable numpybool_whenp_valueis numpyNaN.Methodology references (required if estimator / math changes)
SCtools::generate.placebos(CRAN). Documented indocs/methodology/REGISTRY.md§SyntheticControl.**Note:**(its post-period is treatment-contaminated; matchesSCtools). (2) The reportedrmspe_ratiois on the root scale; ADH §3.4 reports the MSPE ratio (the square) — monotone-equivalent, so rank/p-value are identical (documented). (3)placebo_p_valueis kept separate from the analyticalp_value/is_significant(no analytical SE exists for classic SCM) — documented as the non-analytical-p-value split. (4) Scale-aware RMSPE-ratio floor + non-converged-placebo exclusion from numerator AND denominator — documented**Note:**labels. In-time placebo / leave-one-out (ADH 2015) are out of scope and tracked inTODO.md.Validation
tests/test_methodology_synthetic_control.py(~19 placebo + convergence tests: infeasible donor counts, treated-fit non-convergence, Powell-succeeds-at-worse-point regression, failed-placebo exclusion, deterministic reruns, pickle behavior, custom-V path,get_placebo_dfshape/status);tests/test_diagnostic_report.py+tests/test_business_report.py(SCM-native routing,scm_fitprose, target-parameter, robustness, infeasible/precomputed-rejection);tests/test_practitioner.py(placebo step dispatch). Full local run:428 passed, 1 skipped(pure-Python) across the four affected files.Security / privacy
🤖 Generated with Claude Code