diff --git a/CHANGELOG.md b/CHANGELOG.md
index 02de12ab..023904b5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,7 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **New tutorial: `docs/tutorials/24_staggered_vs_collapsed_power.ipynb` — "Staggered Rollout or a Simple 2×2? A Power-Analysis Decision Guide".** A practitioner walkthrough for geo experiments (framed on a 50-state staggered rollout) on when to reach for Callaway-Sant'Anna vs collapsing to a familiar pre/post 2×2. Shows, with live paired Monte Carlo on `generate_staggered_data`, that the collapsed 2×2 silently targets a *diluted* estimand (reports ~60–94% of the true effect-on-treated as the rollout staggers, with near-zero CI coverage of the truth under a slow rollout), and that CS's minimum-detectable-lift penalty is a *fast-rollout* phenomenon that shrinks to parity as the rollout becomes more staggered. Fully self-contained (runs live, no committed data files); ends with a CS-vs-2×2 decision guide.
-- **New estimator: `SyntheticControl` — classic Synthetic Control Method (Abadie, Diamond & Hainmueller 2010; Abadie & Gardeazabal 2003).** Standalone estimator (`diff_diff/synthetic_control.py`) + `SyntheticControlResults` (`diff_diff/synthetic_control_results.py`) + `synthetic_control()` convenience function, exported from `diff_diff`. Builds a single treated unit's counterfactual as a convex combination of never-treated donor units — **donor (unit) weights only**, no time weights or ridge, distinct from `SyntheticDiD`. The inner simplex-constrained weighted-LS solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`); the diagonal predictor-importance matrix `V` is selected data-driven by minimizing pre-period outcome MSPE (`v_method="nested"`, softmax-on-simplex multistart Nelder-Mead + Powell polish) or supplied by the user (`v_method="custom"`). Predictors are built from `predictors`/`predictor_window`/`predictors_op`, `special_predictors`, and per-period outcome lags (`pre_period_outcomes`), in the R `Synth::dataprep` row order; per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source. Reports the gap path (`α̂_1t = Y_1t − Σ_j w_j Y_jt`), `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. **No analytical standard error** — `se`/`t_stat`/`p_value`/`conf_int` are NaN (in-space placebo permutation inference with the post/pre RMSPE-ratio statistic is planned for a follow-up release; `_placebo_gaps`/`_rmspe_ratio`/`_fit_snapshot` are reserved on the results object). 
Ten validation gates baked in: predictor-period leakage, absorbing post-period suffix + no-anticipation cross-check against the treatment column, post-period canonicalization, donor-pool filtering before period derivation, empty-window rejection, poor-pre-fit `UserWarning` (RMSPE > SD of treated pre-outcomes), duplicate-predictor-label rejection, inner-solve non-convergence warning, order-independent gap-path rebuild, and the `standardize="none"` deviation; plus fail-closed `custom_v` cross-field rules and degenerate single-donor / single-pre-period handling. **R-`Synth` parity** (`tests/test_methodology_synthetic_control.py`, fixtures generated by `benchmarks/R/generate_synth_basque_golden.R` into `tests/data/`): two-tier on the Basque Country study — Tier-1 feeds R's `solution.v` via `custom_v` and reproduces the published donor weights (region 10 Cataluña 0.851 + region 14 Madrid 0.149) to `atol=1e-3` deterministically; Tier-2 (`@pytest.mark.slow`) checks the data-driven nested fit lands in a tolerance band (the nested `V` legitimately differs because the outer objective uses all pre periods, not R's `time.optimize.ssr` window). Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:** standardize="none"` and `**Note:**` labels for the standardization formula, objective window, softmax `V` parametrization, and 1×SD poor-fit threshold), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
+- **`SyntheticControl` in-space placebo permutation inference + reporting-stack integration (ADH 2010 §2.4).** New `SyntheticControlResults.in_space_placebo()` provides the significance test classic SCM lacks an analytical SE for: it reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the **other `J−1` donors** (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matches `SCtools::generate.placebos`), and ranks the treated unit's post/pre **RMSPE ratio** among the `J+1` units. New fields `placebo_p_value` (`= rank/(n_placebos+1)`, an upper-tail rank test on the unsigned RMSPE ratio — direction-agnostic, so it detects an effect of *either* sign rather than a signed/one-directional hypothesis; ties counted via `≥`), `rmspe_ratio` (the treated statistic, set at fit), and `n_placebos`/`n_failed` (effective reference-set sizes; non-converged placebos are excluded from BOTH numerator and denominator, never penalized into the rank). `placebo_p_value` is a **separate field** from the (always-NaN) `p_value` — it is a permutation p-value with no SE/t-stat and does not flow through `safe_inference`; `is_significant` stays bound to `p_value`. Edge cases fail closed: scale-aware RMSPE-ratio floor (a perfect pre-fit gives a finite ratio, not `inf`), `J<2` → NaN+warn, `J==2` → degenerate+coarse warn, deterministic given `seed`. New `get_placebo_df()` returns the per-unit RMSPE-ratio summary table (incl. the treated row and any failed donors) used for the rank. The design keeps the placebo *compute* opt-in — the per-donor refit loop runs only on the explicit `in_space_placebo()` call. To support that opt-in call, every fit retains a `_SyntheticControlFitSnapshot` of the pivoted panel (memory O(units × periods × predictor-vars), like `SyntheticDiD`'s snapshot for `in_time_placebo`; excluded from pickling). A compact/lazy snapshot representation is tracked as a follow-up in `TODO.md`. 
**Reporting-stack integration:** `SyntheticControlResults` is now routed through `DiagnosticReport` (fit-based `scm_fit` parallel-trends analogue → verdict `design_enforced_pt` reading `pre_rmspe`; `_scm_native` surfaces `pre_rmspe` + donor-weight concentration + the placebo p-value when already computed — never triggering the refit loop implicitly), `practitioner_next_steps` (`_handle_synthetic_control` with the placebo as the headline significance step), and `BusinessReport` (fit-based assumption block, ADH 2010 attribution, robustness via `estimator_native_diagnostics`; HonestDiD passthrough rejected like SDiD/TROP). Also fixes a latent BR bug where the headline `is_significant` was a non-JSON-serializable numpy `bool_` when `p_value` is a numpy `NaN`. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new `**Note:**` labels for the donor-pool construction, failure handling, RMSPE-ratio floor, and the non-analytical-p-value split), `docs/methodology/REPORTING.md`, `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
+- **New estimator: `SyntheticControl` — classic Synthetic Control Method (Abadie, Diamond & Hainmueller 2010; Abadie & Gardeazabal 2003).** Standalone estimator (`diff_diff/synthetic_control.py`) + `SyntheticControlResults` (`diff_diff/synthetic_control_results.py`) + `synthetic_control()` convenience function, exported from `diff_diff`. Builds a single treated unit's counterfactual as a convex combination of never-treated donor units — **donor (unit) weights only**, no time weights or ridge, distinct from `SyntheticDiD`. The inner simplex-constrained weighted-LS solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`); the diagonal predictor-importance matrix `V` is selected data-driven by minimizing pre-period outcome MSPE (`v_method="nested"`, softmax-on-simplex multistart Nelder-Mead + Powell polish) or supplied by the user (`v_method="custom"`). Predictors are built from `predictors`/`predictor_window`/`predictors_op`, `special_predictors`, and per-period outcome lags (`pre_period_outcomes`), in the R `Synth::dataprep` row order; per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source. Reports the gap path (`α̂_1t = Y_1t − Σ_j w_j Y_jt`), `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. **No analytical standard error** — `se`/`t_stat`/`p_value`/`conf_int` are NaN; significance comes from in-space placebo permutation inference via `in_space_placebo()` (see the dedicated entry below). 
Ten validation gates baked in: predictor-period leakage, absorbing post-period suffix + no-anticipation cross-check against the treatment column, post-period canonicalization, donor-pool filtering before period derivation, empty-window rejection, poor-pre-fit `UserWarning` (RMSPE > SD of treated pre-outcomes), duplicate-predictor-label rejection, inner-solve non-convergence warning, order-independent gap-path rebuild, and the `standardize="none"` deviation; plus fail-closed `custom_v` cross-field rules and degenerate single-donor / single-pre-period handling. **R-`Synth` parity** (`tests/test_methodology_synthetic_control.py`, fixtures generated by `benchmarks/R/generate_synth_basque_golden.R` into `tests/data/`): two-tier on the Basque Country study — Tier-1 feeds R's `solution.v` via `custom_v` and reproduces the published donor weights (region 10 Cataluña 0.851 + region 14 Madrid 0.149) to `atol=1e-3` deterministically; Tier-2 (`@pytest.mark.slow`) checks the data-driven nested fit lands in a tolerance band (the nested `V` legitimately differs because the outer objective uses all pre periods, not R's `time.optimize.ssr` window). Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:** standardize="none"` and `**Note:**` labels for the standardization formula, objective window, softmax `V` parametrization, and 1×SD poor-fit threshold), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
- **StaggeredTripleDifference methodology-review-tracker promotion: In Progress → Complete**, plus a new opt-in Eq-4.14 overall ATT. Closes the Ortiz-Villavicencio & Sant'Anna (2025, arXiv:2505.09942v3) primary-source review on the tracker (PR-A #499 added the paper review on file; this PR validates the source against it). New paper-equation-anchored Verified Components in `tests/test_methodology_staggered_triple_diff.py` (Theorem 4.1 / Eq. 4.5 RA=IPW=DR identification; Eq. 4.1 three-term DDD decomposition; Eqs. 4.11-4.12 optimal-GMM weight normalization + single-group reduction; Eq. 4.13 event-study cohort-share weighting; Eq. 4.14 / Cor. 4.2 overall) alongside the existing R cross-validation against `triplediff::ddd(panel=TRUE)` + `agg_ddd()`. **New feature — opt-in `overall_att_es` (paper Eq. 4.14 overall):** the unweighted mean of the post-treatment event-study effects ES(e), exposed on `StaggeredTripleDiffResults` (with `overall_se_es` / `overall_t_stat_es` / `overall_p_value_es` / `overall_conf_int_es`) and populated only when `aggregate="event_study"` / `"all"`. The default `overall_att` is unchanged (the Callaway-Sant'Anna simple post-treatment (g,t) average — the library-wide convention). Its analytical SE is the influence function of that mean (the average of the per-event-time combined IFs, routed through the same survey-aware variance estimator as the per-e effects via a new `_se_from_psi` helper); a multiplier-bootstrap SE replaces it under `n_bootstrap>0`. Computed via a side-channel stash on the shared `CallawaySantAnnaAggregationMixin._aggregate_event_study` (no return-signature change; CallawaySantAnna unaffected), over post-treatment `e >= -anticipation` (the library convention, matching `overall_att`). Cross-validated against R `agg_ddd(type="eventstudy")$overall.att` / `overall.se` (SE matches to ~0.1%). 
REGISTRY `## StaggeredTripleDifference`: the previously-unlabeled overall-aggregation prose is formalized under a `**Note:**` documenting both overalls, and the duplicate aggregation-weight deviation is consolidated (fixing a `P(G=g)` vs R `P(S=g)` mislabel). `METHODOLOGY_REVIEW.md` row L69 promoted to **Complete** (`Last Review = 2026-05-30`) with a Verified Components / R Comparison Results detail block; priority queue pruned. `docs/references.rst` Ortiz-Villavicencio entry pinned to arXiv:2505.09942v3.
- **SunAbraham + WooldridgeDiD-OLS `vcov_type="conley"` (Conley 1999 spatial-HAC) threading.** Both estimators now accept `vcov_type="conley"` with the five `conley_*` constructor params (`conley_coords`, `conley_cutoff_km`, `conley_metric`, `conley_kernel`, `conley_lag_cutoff`), reusing the already-`conleyreg`-validated `solve_ols` / `conley.py` machinery — within-period spatial HAC at `conley_lag_cutoff=0`, plus the within-unit Bartlett serial term at `conley_lag_cutoff>0` (the panel-aware path, since `conley_time`/`conley_unit` are always supplied — not pooled cross-sectional), no new variance code. Conley routes through each estimator's within-transform path; the unit auto-cluster is dropped on the conley path (an explicit `cluster=` enables the spatial+cluster product kernel); `survey_design=` / `weights` / `n_bootstrap>0` are rejected, and WooldridgeDiD conley is OLS-path-only (`method ∈ {logit, poisson}` + conley still rejected via the `method != "ols"` guard). `SunAbrahamResults` / `WooldridgeDiDResults` gain a `conley_lag_cutoff` field plus a Conley variance-label line in `summary()` (`SunAbrahamResults` also gains `cluster_name`). FWL-composability — the within-transform conley SE equals the full-dummy conley SE — is pinned in `tests/test_conley_vcov.py` (`TestConleySunAbraham` / `TestConleyWooldridge`). **`StackedDiD` conley remains deferred for a methodology reason** (the stacked design replicates units across sub-experiments, so Conley would see same-unit copies at distance 0; no `conleyreg` anchor; paper-gated) — its prior "same shape as the SunAbraham follow-up" framing is corrected in REGISTRY / TODO / the rejection message.
- **ConleySpatialHAC methodology-review-tracker promotion: In Progress → Complete.** Closes the Conley (1999) *Journal of Econometrics* 92(1) primary-source review on the methodology-review tracker. The paper review on file at `docs/methodology/papers/conley-1999-review.md` was previously merged (2026-05-09); this PR is the F.L.I.P. consolidation — new `tests/test_methodology_conley.py` with paper-equation-numbered Verified Components walk-through (~1600 LoC; 10 classes; 60 tests, 5 of them `@pytest.mark.slow`). Coverage: Eq. 4.2 cross-sectional sandwich (pairwise-distance specialization; the project's paper review identifies Eq. 4.2 page 18 as the real-valued/pairwise form, with Eq. 3.13 reserved for the lattice-indexed form), Eq. 4.2 HC0 + rank-1 limits, Andrews (1991) HAC lag truncation matching `conleyreg::time_dist.cpp`, haversine convention with Earth radius 6371.01 km, Phase 2 panel block-decomposed sandwich at `atol=1e-12`, sparse k-d-tree dense-vs-sparse bit-identity (Wave A #120 numerical correctness), and R `conleyreg` v0.1.9 parity at `atol=1e-6` on 6 fixtures (3 cross-sectional + 3 panel) plus the sparse-forced and time-asymmetric kernel parity contracts. Three dedicated deviations-area classes: `TestConleyLibraryExtensions` (Wave A library extensions — combined spatial+cluster product kernel #119, callable conley_metric validation #123, sparse k-d-tree activation #120, indefiniteness guard), `TestConleyDeviationsFromR` (1-D radial Bartlett vs paper's 2-D separable Eq. 3.14, time-label normalization via `np.unique`, independent temporal kernel deferred), and `TestConleyDeferrals` (5 fail-closed `NotImplementedError`/`TypeError` contracts: LinearRegression + survey_design, DiD/MPD/TWFE + survey_design, Conley + weights, SyntheticDiD + Conley, wild_bootstrap + Conley). 
Methodology-anchored tests extracted from `tests/test_conley_vcov.py`: full classes `TestConleyDirectHelper`, `TestConleyReductions`, `TestConleyReductionsAddendum`, `TestConleyParityR`, `TestConleyParitySpacetime`, `TestConleyPanelHelper`, `TestConleySparseRParityForced`; plus methodology-anchored tests from `TestConleyKernels`, `TestConleyDistanceMetrics`, `TestConleySparse`. File drops 4248 → 3113 lines after extraction. Defensive surface preserved: input validation, NaN/inf guards, dispatch-level validity, estimator-level integration smoke tests, set_params atomicity, sparse-path activation thresholds + density-gate fallback. `METHODOLOGY_REVIEW.md` row L91 promoted to **Complete** with `Last Review = 2026-05-26`; detail block rewritten with Verified Components / Test Coverage / R Comparison Results inline table / Corrections Made / Deviations / Outstanding Concerns. Priority queue at L1386 pruned: PreTrendsPower removed (already Complete since 2026-05-19) and ConleySpatialHAC removed (this PR); substantive-review-blocked renumbered #2-#5 → #1-#4 and consolidation-pass-blocked renumbered #6-#8 → #5-#6.
diff --git a/README.md b/README.md
index a4ce9993..1e738591 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`); `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores (Wave E.3, R `svyrecvar(subset())` form)
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
-- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html) - Abadie, Diamond & Hainmueller (2010) classic synthetic control for a single treated unit (donor-weight counterfactual, nested/custom V; no inference in this release — permutation/placebo planned)
+- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html) - Abadie, Diamond & Hainmueller (2010) classic synthetic control for a single treated unit (donor-weight counterfactual, nested/custom V; in-space placebo permutation inference via `in_space_placebo()`)
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
- [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html) - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-d̲ or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release - repeated cross-sections rejected by the validator. Alias `HAD`.
diff --git a/TODO.md b/TODO.md
index 5d20a4f9..5236208d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -85,7 +85,7 @@ Deferred items from PR reviews that were not addressed before merge.
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| SyntheticControl: `SyntheticControlResults` not wired into the practitioner / DiagnosticReport / BusinessReport routing, so routing SCM results through those tools yields generic parallel-trends/HonestDiD guidance that doesn't fit SCM. Add SCM to the native-routed rejection sets (mirror SDiD/TROP) and surface SCM-native diagnostics (pre-fit / in-space placebo / in-time placebo / leave-one-out). Deferred to PR-2, where it pairs with the placebo-inference layer those reports would surface. | `practitioner.py`, `diagnostic_report.py`, `business_report.py` | SCM PR-1 → PR-2 | Medium |
+| SyntheticControl: in-time placebo + leave-one-out donor-robustness diagnostics are not implemented (ADH 2015, not the ADH 2010 scope of the current estimator), so `_scm_native` surfaces only pre-fit + in-space placebo. The practitioner / DiagnosticReport / BusinessReport routing and the in-space placebo permutation layer landed in PR-2; this remaining row covers adding the two ADH-2015 diagnostics (and surfacing them under `estimator_native_diagnostics`) in a later 2015-sourced PR. | `synthetic_control.py`, `diagnostic_report.py` | ADH-2015 follow-up | Low |
| ContinuousDiD deferred CGBS 2024 extensions: (a) `covariates=` kwarg not implemented (matches R `contdid` v0.1.0); (b) discrete-treatment saturated regression deferred (integer-valued dose currently warned, not routed to per-level coefficients); (c) lowest-dose-as-control per CGBS 2024 Remark 3.1 (when `P(D=0) = 0`) not implemented — estimator requires never-treated controls. REGISTRY `## ContinuousDiD` → Implementation Checklist marks these as deferred `[ ]` items. | `diff_diff/continuous_did.py` | — | Low |
| Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
| TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low |
@@ -163,6 +163,7 @@ Deferred items from PR reviews that were not addressed before merge.
| MPD cluster+hc2_bm path computes CR2 precomputes twice — once via `solve_ols` → `_compute_cr2_bm` for vcov + per-coefficient DOF, then again via `_compute_cr2_bm_contrast_dof` from `MultiPeriodDiD.fit()` for the post-period-average contrast DOF. Both rebuild `H = X bread_inv X'`, the residual-maker `M`, and the per-cluster `A_g = (I - H_gg)^{-1/2}` matrices. O(n²k) redundant work; acceptable for typical cluster-robust DiD panel sizes (n ≤ a few thousand). Fix would plumb the contrast DOF through the existing CR2 vcov path (intrusive API change) or share the precomputes via a cached helper. | `linalg.py::_compute_cr2_bm_contrast_dof`, `estimators.py::MultiPeriodDiD.fit` | follow-up | Low |
| Rust-backend HC2 implementation. Current Rust path only supports HC1; HC2 and CR2 Bell-McCaffrey fall through to the NumPy backend. For large-n fits this is noticeable. | `rust/src/linalg.rs` | Phase 1a | Low |
| CR2 Bell-McCaffrey DOF uses a naive `O(n² k)` per-coefficient loop over cluster pairs. Pustejovsky-Tipton (2018) Appendix B has a scores-based formulation that avoids the full `n × n` `M` matrix. Switch when a user hits a large-`n` cluster-robust design. | `linalg.py::_compute_cr2_bm` | Phase 1a | Low |
+| `SyntheticControl` retains a full `_SyntheticControlFitSnapshot` (pivoted outcome/predictor panels) on EVERY fit to support the opt-in `in_space_placebo()`, so callers who never run the placebo still pay O(units × periods × predictor-vars) memory (same as `SyntheticDiD`'s always-on snapshot for `in_time_placebo`). Store a compact array/index representation instead of per-variable DataFrames, or build the snapshot lazily on first placebo call (would need to retain the source data, ~same cost). | `synthetic_control.py` snapshot build, `synthetic_control_results.py::_SyntheticControlFitSnapshot` | follow-up | Low |
#### Testing/Docs
diff --git a/diff_diff/_reporting_helpers.py b/diff_diff/_reporting_helpers.py
index f5711344..6cc70905 100644
--- a/diff_diff/_reporting_helpers.py
+++ b/diff_diff/_reporting_helpers.py
@@ -618,6 +618,23 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
"reference": "REGISTRY.md Sec. TROP",
}
+ if name == "SyntheticControlResults":
+ return {
+ "name": "SCM ATT (mean post-treatment gap for the single treated unit)",
+ "definition": (
+ "The average over the post-treatment periods of the gap "
+ "``alpha_hat_{1t} = Y_{1t} - sum_j w_j Y_{jt}`` between the single "
+ "treated unit and its donor-weighted synthetic control (Abadie, "
+ "Diamond & Hainmueller 2010). There is no population-averaging or "
+ "sampling estimand — it is the effect on the one treated unit; "
+ "significance is assessed by in-space placebo permutation inference "
+ "(no analytical standard error)."
+ ),
+ "aggregation": "single_unit_gap",
+ "headline_attribute": "att",
+ "reference": "REGISTRY.md Sec. SyntheticControl",
+ }
+
# Default: unrecognized result class. Fall through with a neutral
# block — agents / downstream consumers can still dispatch on
# ``aggregation="unknown"`` and fall back to generic ATT narration.
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
index 6ceeb4a5..f6edb275 100644
--- a/diff_diff/business_report.py
+++ b/diff_diff/business_report.py
@@ -203,6 +203,7 @@ def __init__(
if honest_did_results is not None and type(results).__name__ in {
"SyntheticDiDResults",
"TROPResults",
+ "SyntheticControlResults",
}:
raise ValueError(
f"{type(results).__name__} routes robustness to "
@@ -213,8 +214,9 @@ def __init__(
"object's native diagnostics "
"(SDiD: ``in_time_placebo()``, ``sensitivity_to_zeta_omega()``, "
"``pre_treatment_fit``; TROP: ``effective_rank``, "
- "``loocv_score``) — BusinessReport surfaces these "
- "automatically under ``estimator_native_diagnostics``."
+ "``loocv_score``; SyntheticControl: ``in_space_placebo()``, "
+ "``pre_rmspe``, ``get_placebo_df()``) — BusinessReport surfaces "
+ "these automatically under ``estimator_native_diagnostics``."
)
# Round-44 P1 CI review on PR #318: mirror the SDiD/TROP
@@ -646,10 +648,13 @@ def _extract_headline(self, dr_schema: Optional[Dict[str, Any]]) -> Dict[str, An
if att is None or not np.isfinite(att):
sign = "undefined"
ci_level = int(round((1.0 - display_alpha) * 100))
- is_significant = (
+ # bool(...) coerces away numpy bool_ — when ``p`` is a numpy NaN (e.g.
+ # SyntheticControl, whose analytical p_value is always NaN), ``np.isfinite``
+ # yields a numpy bool that is NOT JSON-serializable in the schema.
+ is_significant = bool(
p is not None and np.isfinite(p) and p < phrasing_alpha if p is not None else False
)
- near_threshold = (
+ near_threshold = bool(
p is not None
and np.isfinite(p)
and (phrasing_alpha - 0.01) < p < (phrasing_alpha + 0.001)
@@ -1002,16 +1007,25 @@ def _lift_robustness(dr: Optional[Dict[str, Any]]) -> Dict[str, Any]:
return {"status": "skipped", "reason": "auto_diagnostics=False"}
bacon = dr.get("bacon") or {}
native = dr.get("estimator_native_diagnostics") or {}
+ native_block = {
+ "status": native.get("status"),
+ "estimator": native.get("estimator"),
+ "pre_treatment_fit": native.get("pre_treatment_fit"),
+ }
+ # Classic SCM exposes pre_rmspe + donor-weight concentration + the (opt-in)
+ # in-space placebo rather than SDiD's pre_treatment_fit; surface those so the
+ # top-level robustness block is not empty for SyntheticControl.
+ if native.get("estimator") == "SyntheticControl":
+ native_block["pre_rmspe"] = native.get("pre_rmspe")
+ native_block["weight_concentration"] = native.get("weight_concentration")
+ native_block["in_space_placebo"] = native.get("in_space_placebo")
return {
"bacon": {
"status": bacon.get("status"),
"forbidden_weight": bacon.get("forbidden_weight"),
"verdict": bacon.get("verdict"),
},
- "estimator_native": {
- "status": native.get("status"),
- "pre_treatment_fit": native.get("pre_treatment_fit"),
- },
+ "estimator_native": native_block,
}
@@ -1153,6 +1167,20 @@ def _describe_assumption(estimator_name: str, results: Any = None) -> Dict[str,
"captured through latent factor loadings."
),
}
+ if estimator_name in {"SyntheticControlResults"}:
+ return {
+ # Distinct from SDiD's "synthetic_fit" weighted-PT analogue: classic
+ # SCM is a donor-weighted level match (matches the DR "scm_fit" method).
+ "parallel_trends_variant": "scm_fit",
+ "no_anticipation": True,
+ "description": (
+ "Classic synthetic control identifies the single treated unit's "
+ "counterfactual via a donor-weighted match to its pre-treatment "
+ "trajectory (a design-enforced fit, not a parallel-trends test); "
+ "significance comes from in-space placebo permutation inference "
+ "rather than an analytical standard error."
+ ),
+ }
if estimator_name == "ContinuousDiDResults":
# Callaway, Goodman-Bacon & Sant'Anna (2024), two-level PT:
# REGISTRY.md §ContinuousDiD > Identification.
@@ -1780,6 +1808,8 @@ def _pt_method_subject(method: Optional[str]) -> str:
return "Pre-treatment event-study coefficients"
if method == "synthetic_fit":
return "The synthetic-control pre-treatment fit"
+ if method == "scm_fit":
+ return "The synthetic-control donor-weighted pre-treatment fit"
if method == "factor":
return "The factor-model pre-treatment fit"
return "Pre-treatment data"
@@ -1806,7 +1836,9 @@ def _pt_method_stat_label(method: Optional[str]) -> Optional[str]:
return "joint p"
if method in {"slope_difference", "hausman"}:
return "p"
- if method in {"synthetic_fit", "factor"}:
+ if method in {"synthetic_fit", "scm_fit", "factor"}:
+ # Design-enforced fit-based paths have no p-value label (SCM's significance
+ # is the in-space placebo, not a PT joint test).
return None
return "joint p"
@@ -1846,6 +1878,13 @@ def _references_for(estimator_name: str) -> List[Dict[str, str]]:
"& Wager, S. (2021). Synthetic Difference in Differences."
),
},
+ "SyntheticControlResults": {
+ "role": "estimator",
+ "citation": (
+ "Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic "
+ "Control Methods for Comparative Case Studies. JASA, 105(490)."
+ ),
+ },
"SunAbrahamResults": {
"role": "estimator",
"citation": (
@@ -2181,11 +2220,20 @@ def _render_summary(schema: Dict[str, Any]) -> str:
"assumption." + sens_tail_see_reliable
)
elif verdict == "design_enforced_pt":
- sentences.append(
- "The synthetic control is designed to match the treated "
- "group's pre-period trajectory (SDiD's weighted-parallel-"
- "trends analogue)."
- )
+ if method == "scm_fit":
+ sentences.append(
+ "The synthetic control is designed to reproduce the treated "
+ "unit's pre-period trajectory via donor weights (classic SCM's "
+ "design-enforced analogue of parallel trends); significance "
+ "comes from in-space placebo permutation inference, not a "
+ "parallel-trends test."
+ )
+ else:
+ sentences.append(
+ "The synthetic control is designed to match the treated "
+ "group's pre-period trajectory (SDiD's weighted-parallel-"
+ "trends analogue)."
+ )
elif verdict == "inconclusive":
# Round-35 P1 CI review on PR #318: a ``verdict=="inconclusive"``
# state means one or more pre-period coefficients had
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
index cdfdf21a..f474e91b 100644
--- a/diff_diff/diagnostic_report.py
+++ b/diff_diff/diagnostic_report.py
@@ -157,6 +157,21 @@
"estimator_native",
}
),
+ "SyntheticControlResults": frozenset(
+ # Classic SCM expresses its identifying assumption as design-enforced
+ # pre-treatment fit (the donor weights match the treated unit's
+ # pre-period trajectory), so — like SDiD — ``parallel_trends`` routes to
+ # the fit-based ``_pt_scm_fit`` (verdict ``design_enforced_pt``, NOT a PT
+ # hypothesis test; see prose). It also exposes ``estimator_native`` (the
+ # in-space placebo permutation inference + weight concentration).
+ # ``sensitivity`` is omitted (no HonestDiD analogue — significance testing
+ # is the native placebo); ``heterogeneity`` is omitted (one treated unit).
+ {
+ "parallel_trends",
+ "design_effect",
+ "estimator_native",
+ }
+ ),
"EfficientDiDResults": frozenset(
{
"parallel_trends",
@@ -195,6 +210,7 @@
# "hausman" — EfficientDiD.hausman_pretest (native PT-All vs PT-Post)
# "synthetic_fit" — SDiD weighted pre-treatment fit (surfaces pre_treatment_fit)
# "factor" — TROP factor-model identification (no PT; renders "N/A" prose)
+# "scm_fit" — classic SCM donor-weighted pre-treatment fit (surfaces pre_rmspe)
_PT_METHOD: Dict[str, str] = {
"DiDResults": "two_x_two",
"MultiPeriodDiDResults": "event_study",
@@ -210,6 +226,7 @@
"ChaisemartinDHaultfoeuilleResults": "event_study",
"SyntheticDiDResults": "synthetic_fit",
"TROPResults": "factor",
+ "SyntheticControlResults": "scm_fit",
}
@@ -257,11 +274,14 @@ class DiagnosticReport:
pre_periods, post_periods : list, optional
Explicit pre- and post-treatment period labels.
run_parallel_trends, run_sensitivity, run_placebo, run_bacon, run_design_effect, run_heterogeneity, run_epv, run_pretrends_power : bool
- Per-check opt-in flags. ``run_placebo`` defaults to ``False`` (opt-in,
- expensive, currently not implemented - placebo key remains reserved
- as ``skipped`` in the schema). All other checks default to ``True``
- and are further gated by estimator-type and instance-level
- applicability (see ``docs/methodology/REPORTING.md``).
+ Per-check opt-in flags. ``run_placebo`` defaults to ``False`` — the generic
+ placebo battery is not implemented in MVP, so the ``placebo`` key remains
+ reserved as ``skipped`` in the schema. (Exception: ``SyntheticControl``'s
+ in-space placebo permutation test IS implemented — run it via
+ ``results.in_space_placebo()``; its result is surfaced under
+ ``estimator_native_diagnostics.in_space_placebo``, not this generic section.)
+ All other checks default to ``True`` and are further gated by estimator-type
+ and instance-level applicability (see ``docs/methodology/REPORTING.md``).
sensitivity_M_grid : tuple of float, default (0.5, 1.0, 1.5, 2.0)
Grid of M values passed to ``HonestDiD.sensitivity``. Yields a
``SensitivityResults`` object with ``breakdown_M`` populated.
@@ -298,7 +318,9 @@ class DiagnosticReport:
Other sections (``design_effect``, ``heterogeneity``, ``epv``) are
read directly from the fitted result object and do not currently
accept precomputed values — there is no expensive call to bypass.
- ``placebo`` is reserved in the schema but opt-in / deferred in MVP.
+ ``placebo`` is reserved in the schema but opt-in / deferred in MVP for the
+ generic battery; ``SyntheticControl`` surfaces its in-space placebo under
+ ``estimator_native_diagnostics`` (run ``results.in_space_placebo()``).
outcome_label, treatment_label : str, optional
Plain-English labels used in prose rendering.
"""
@@ -390,11 +412,18 @@ def __init__(
# native-routing contract documented in REPORTING.md.
# Round-21 P1 CI review on PR #318 flagged this bypass.
_result_name = type(self._results).__name__
- _native_routed_names = {"SyntheticDiDResults", "TROPResults"}
+ _native_routed_names = {"SyntheticDiDResults", "TROPResults", "SyntheticControlResults"}
if _result_name in _native_routed_names:
_incompatible_keys = []
if "sensitivity" in self._precomputed:
_incompatible_keys.append("sensitivity")
+ # All native-routed estimators reject a precomputed ``parallel_trends``
+ # payload — their PT verdict is computed INTERNALLY (SDiD/SCM:
+ # design-enforced pre-treatment fit; TROP: factor-model identification),
+ # not supplied as a user p-value. SyntheticControl is no exception: its
+ # ``scm_fit`` route reads ``pre_rmspe`` from the fit, and the generic
+ # precomputed-PT adapter only understands p-value-style payloads, which
+ # are methodology-incompatible with SCM (it has no PT p-value test).
if "parallel_trends" in self._precomputed:
_incompatible_keys.append("parallel_trends")
# Round-32 P1 CI review on PR #318: ``pretrends_power`` is a
@@ -420,7 +449,9 @@ def __init__(
"Use the native diagnostics on the result object "
"(SDiD: ``in_time_placebo``, ``sensitivity_to_zeta_omega``, "
"``pre_treatment_fit``; TROP: ``effective_rank``, "
- "``loocv_score``) — DR surfaces these automatically."
+ "``loocv_score``; SyntheticControl: ``in_space_placebo()``, "
+ "``pre_rmspe``, ``get_placebo_df()``) — DR surfaces these "
+ "automatically under ``estimator_native_diagnostics``."
)
# Round-44 P1 CI review on PR #318: mirror the SDiD/TROP
@@ -572,6 +603,19 @@ def _compute_applicable_checks(self) -> Tuple[set, Dict[str, str]]:
# shape is stable: ``schema["placebo"]["status"] == "skipped"``
# always holds regardless of estimator. The opt-in execution path
# is deferred to a follow-up; ``REPORTING.md`` documents this.
+ # SyntheticControl is the exception — its in-space placebo permutation test
+ # IS implemented (run via results.in_space_placebo()) and surfaced under
+ # estimator_native_diagnostics, so its generic-section reason points there
+ # rather than claiming placebo is unimplemented (this ``setdefault`` runs first,
+ # so the generic ``setdefault`` below becomes a no-op for SyntheticControl).
+ if type(self._results).__name__ == "SyntheticControlResults":
+ skipped.setdefault(
+ "placebo",
+ "SyntheticControl's placebo battery is the in-space placebo "
+ "permutation test — run results.in_space_placebo(); the result is "
+ "surfaced under estimator_native_diagnostics.in_space_placebo, not "
+ "this generic section.",
+ )
skipped.setdefault(
"placebo",
"Placebo battery runs on opt-in only; not yet implemented in MVP. "
@@ -877,7 +921,7 @@ def _instance_skip_reason(self, check: str) -> Optional[str]:
continue
return "No group/event-study effects available to compute heterogeneity."
if check == "estimator_native":
- if name not in {"SyntheticDiDResults", "TROPResults"}:
+ if name not in {"SyntheticDiDResults", "TROPResults", "SyntheticControlResults"}:
return f"{name} does not expose native validation methods."
return None
return None
@@ -1073,11 +1117,29 @@ def _ran(key: str) -> bool:
# has effectively been performed; treating the sensitivity
# section as not-run would have ``next_steps`` redundantly
# recommend a check the report already executed (round-19
- # CI review on PR #318).
+ # CI review on PR #318). SyntheticControl is deliberately NOT
+ # in this set: its significance procedure is the OPT-IN in-space
+ # placebo (``in_space_placebo()``), which ``_scm_native`` reports
+ # but does not run, so the work is not complete merely because the
+ # native block ran — the practitioner ``placebo`` step must persist.
result_name = type(self._results).__name__
- if result_name in {"SyntheticDiDResults", "TROPResults"} and _ran("estimator_native"):
+ if result_name in {
+ "SyntheticDiDResults",
+ "TROPResults",
+ } and _ran("estimator_native"):
if "sensitivity" not in completed:
completed.append("sensitivity")
+ # SCM's significance step is the opt-in in-space placebo. Mark it
+ # complete only when the caller has run it AND it produced a VALID
+ # reference set (>=1 placebo + a finite p-value) — an attempted-but-
+ # infeasible run (J<2, treated-fit failure, or all donors failed) leaves
+ # placebo_p_value NaN and is NOT complete, so the recommendation persists.
+ if (
+ result_name == "SyntheticControlResults"
+ and int(getattr(self._results, "n_placebos", 0) or 0) > 0
+ and np.isfinite(getattr(self._results, "placebo_p_value", np.nan))
+ ):
+ completed.append("placebo")
if _ran("heterogeneity"):
completed.append("heterogeneity")
ns = practitioner_next_steps(
@@ -1114,6 +1176,8 @@ def _check_parallel_trends(self) -> Dict[str, Any]:
return self._pt_hausman()
if method == "synthetic_fit":
return self._pt_synthetic_fit()
+ if method == "scm_fit":
+ return self._pt_scm_fit()
if method == "factor":
return self._pt_factor()
return {
@@ -2086,6 +2150,10 @@ def _check_estimator_native(self) -> Dict[str, Any]:
TROP: factor-model fit metrics (``effective_rank``, ``loocv_score``,
selected ``lambda_*``).
+
+ SyntheticControl: pre-treatment fit (``pre_rmspe``), donor-weight
+ concentration, and — when already computed — the in-space placebo
+ permutation p-value (``in_space_placebo``).
"""
r = self._results
name = type(r).__name__
@@ -2093,6 +2161,8 @@ def _check_estimator_native(self) -> Dict[str, Any]:
return self._sdid_native(r)
if name == "TROPResults":
return self._trop_native(r)
+ if name == "SyntheticControlResults":
+ return self._scm_native(r)
return {
"status": "not_applicable",
"reason": f"{name} does not expose native validation methods.",
@@ -2172,6 +2242,93 @@ def _trop_native(self, r: Any) -> Dict[str, Any]:
},
}
+ def _scm_native(self, r: Any) -> Dict[str, Any]:
+ """Populate classic-SCM-native diagnostics section.
+
+ Always surfaces the pre-treatment fit (``pre_rmspe``) and donor-weight
+ concentration. The in-space placebo permutation inference (ADH 2010 §2.4)
+ is reported only when the user has ALREADY run ``in_space_placebo()`` —
+ DR never triggers it implicitly, because it refits one synthetic control
+ per donor (potentially many nested V searches) and the placebo layer is
+ opt-in by design. (This differs from SDiD's cheaper in-time-placebo sweep,
+ which ``_sdid_native`` runs inline.) Only the in-space placebo is exposed;
+ in-time placebo and leave-one-out are ADH 2015 (not implemented).
+ """
+ out: Dict[str, Any] = {"status": "ran", "estimator": "SyntheticControl"}
+ out["pre_rmspe"] = _to_python_float(getattr(r, "pre_rmspe", None))
+ out["n_donors"] = _to_python_scalar(getattr(r, "n_donors", None))
+ out["v_method"] = getattr(r, "v_method", None)
+ # Donor-weight concentration (Herfindahl + top weight): SCM places all mass
+ # on the donor simplex, so concentration is the natural "how few donors
+ # drive the synthetic" diagnostic.
+ try:
+ weights = [float(w) for w in r.donor_weights.values()]
+ out["weight_concentration"] = {
+ "herfindahl": _to_python_float(sum(w * w for w in weights)),
+ "top_weight": _to_python_float(max(weights) if weights else None),
+ "n_donors_with_weight": len(weights),
+ }
+ except Exception as exc: # noqa: BLE001
+ out["weight_concentration"] = {
+ "status": "error",
+ "reason": f"donor_weights unavailable: {type(exc).__name__}: {exc}",
+ }
+ # In-space placebo: surface only if already computed (opt-in; see docstring).
+ if getattr(r, "_placebo_df", None) is not None:
+ n_placebos = int(getattr(r, "n_placebos", 0) or 0)
+ placebo_p = getattr(r, "placebo_p_value", np.nan)
+ block = {
+ "placebo_p_value": _to_python_float(placebo_p),
+ "rmspe_ratio": _to_python_float(getattr(r, "rmspe_ratio", None)),
+ "n_placebos": _to_python_scalar(n_placebos),
+ "n_failed": _to_python_scalar(getattr(r, "n_failed", None)),
+ }
+ # Distinguish a valid run from an attempted-but-infeasible one so BR/DR
+ # consumers see an explicit status/reason rather than a bare NaN p-value.
+ # The SPECIFIC cause comes from the results' recorded ``_placebo_status``
+ # (n_placebos/n_failed alone cannot tell a non-converged treated fit
+ # apart from too-few-donors).
+ if n_placebos > 0 and np.isfinite(placebo_p):
+ block["status"] = "ran"
+ else:
+ placebo_status = getattr(r, "_placebo_status", None)
+ _reasons = {
+ "treated_fit_nonconverged": (
+ "in_space_placebo() was run but the treated unit's own SCM "
+ "fit did not converge at fit time, so its RMSPE ratio is not "
+ "a valid optimum to rank against placebos; placebo_p_value "
+ "is NaN."
+ ),
+ "too_few_donors": (
+ "in_space_placebo() was run but fewer than 2 donors are "
+ "available (each placebo is fit against the other donors); "
+ "placebo_p_value is NaN."
+ ),
+ "all_placebos_failed": (
+ "in_space_placebo() was run but every donor refit failed to "
+ "converge, so no placebo entered the reference set; "
+ "placebo_p_value is NaN."
+ ),
+ }
+ block["status"] = "infeasible"
+ block["reason"] = _reasons.get(
+ placebo_status,
+ "in_space_placebo() was run but produced no valid reference set "
+ "(fewer than 2 donors, a non-converged treated fit, or all donor "
+ "refits failed); placebo_p_value is NaN.",
+ )
+ out["in_space_placebo"] = block
+ else:
+ out["in_space_placebo"] = {
+ "status": "not_run",
+ "reason": (
+ "Call results.in_space_placebo() to run in-space placebo "
+ "permutation inference (opt-in; refits one synthetic control "
+ "per donor)."
+ ),
+ }
+ return out
+
# -- Heterogeneity helpers --------------------------------------------
def _collect_effect_scalars(self) -> List[float]:
@@ -2504,6 +2661,31 @@ def _pt_synthetic_fit(self) -> Dict[str, Any]:
"verdict": "design_enforced_pt",
}
+ def _pt_scm_fit(self) -> Dict[str, Any]:
+ """Classic SCM donor-weighted pre-treatment-fit PT analogue.
+
+ Classic synthetic control (Abadie-Diamond-Hainmueller 2010) is not
+ parallel-trends-based: the donor weights are chosen to match the treated
+ unit's pre-period trajectory, so a small ``pre_rmspe`` is the
+ design-enforced substitute for a PT test. Unlike SDiD (``pre_treatment_fit``),
+ this reads ``pre_rmspe`` (always populated on a successful fit). The in-space placebo
+ permutation inference lives in ``estimator_native_diagnostics``.
+ """
+ r = self._results
+ fit = _to_python_float(getattr(r, "pre_rmspe", None))
+ if fit is None:
+ return {
+ "status": "skipped",
+ "method": "scm_fit",
+ "reason": "SyntheticControlResults.pre_rmspe is not populated on this fit.",
+ }
+ return {
+ "status": "ran",
+ "method": "scm_fit",
+ "pre_treatment_fit_rmse": fit,
+ "verdict": "design_enforced_pt",
+ }
+
def _pt_factor(self) -> Dict[str, Any]:
"""TROP has no PT concept — its identification is factor-model-based."""
return {
@@ -3077,7 +3259,10 @@ def _check_headline(check: str, section: Dict[str, Any]) -> Optional[Any]:
if check == "epv":
return section.get("min_epv")
if check == "estimator_native":
- return section.get("pre_treatment_fit")
+ # SDiD reports ``pre_treatment_fit``; classic SCM reports ``pre_rmspe`` (its
+ # design-enforced pre-fit) — fall back so SCM's tabular headline is not None.
+ fit = section.get("pre_treatment_fit")
+ return fit if fit is not None else section.get("pre_rmspe")
return None
@@ -3112,6 +3297,8 @@ def _pt_subject_phrase(method: Optional[str]) -> str:
return "Pre-treatment event-study coefficients"
if method == "synthetic_fit":
return "The synthetic-control pre-treatment fit"
+ if method == "scm_fit":
+ return "The synthetic-control donor-weighted pre-treatment fit"
if method == "factor":
return "The factor-model pre-treatment fit"
return "Pre-treatment data"
@@ -3139,7 +3326,7 @@ def _pt_stat_label(method: Optional[str]) -> Optional[str]:
return "joint p"
if method in {"slope_difference", "hausman"}:
return "p"
- if method in {"synthetic_fit", "factor"}:
+ if method in {"synthetic_fit", "scm_fit", "factor"}:
return None
return "joint p"
@@ -3299,15 +3486,30 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
)
elif verdict == "design_enforced_pt":
rmse = pt.get("pre_treatment_fit_rmse")
- sentences.append(
- f"The synthetic control matches the treated group's "
- f"pre-period trajectory with RMSE = "
- f"{rmse:.3g} (SDiD's design-enforced analogue of parallel "
- f"trends)."
- if isinstance(rmse, (int, float))
- else "SDiD's synthetic control is designed to satisfy the "
- "weighted parallel-trends analogue."
- )
+ if pt.get("method") == "scm_fit":
+ # Classic SCM: a single treated UNIT, donor-weighted level match;
+ # significance is the in-space placebo, not SDiD's weighted PT.
+ sentences.append(
+ f"The synthetic control reproduces the treated unit's "
+ f"pre-period trajectory with pre-RMSPE = {rmse:.3g} (classic "
+ f"SCM's donor-weighted, design-enforced analogue of parallel "
+ f"trends; significance is assessed via in-space placebo "
+ f"permutation, not a parallel-trends test)."
+ if isinstance(rmse, (int, float))
+ else "Classic SCM's donor-weighted synthetic control is designed "
+ "to match the treated unit's pre-period trajectory; significance "
+ "comes from in-space placebo permutation inference."
+ )
+ else:
+ sentences.append(
+ f"The synthetic control matches the treated group's "
+ f"pre-period trajectory with RMSE = "
+ f"{rmse:.3g} (SDiD's design-enforced analogue of parallel "
+ f"trends)."
+ if isinstance(rmse, (int, float))
+ else "SDiD's synthetic control is designed to satisfy the "
+ "weighted parallel-trends analogue."
+ )
elif verdict == "inconclusive":
# Round-35 P1 CI review on PR #318: DR summary / overall
# interpretation must surface the inconclusive state
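A minimal standalone sketch of two pieces of the `_scm_native` / `next_steps` logic in the hunks above: the donor-weight concentration summary (Herfindahl index plus top weight) and the rule that the `placebo` step counts as complete only when the run produced at least one placebo and a finite p-value. The function names `weight_concentration` and `placebo_step_complete`, and the `SimpleNamespace` stand-ins for a fitted results object, are hypothetical and not part of `diff_diff`; the `isinstance` guard is an extra assumption of this sketch.

```python
import math
from types import SimpleNamespace
from typing import Any, Dict, Optional


def weight_concentration(donor_weights: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Herfindahl index (sum of squared weights) and the largest donor weight."""
    weights = [float(w) for w in donor_weights.values()]
    return {
        "herfindahl": sum(w * w for w in weights),
        "top_weight": max(weights) if weights else None,
    }


def placebo_step_complete(results: Any) -> bool:
    """True only when a placebo run left >=1 placebo and a finite p-value."""
    n_placebos = int(getattr(results, "n_placebos", 0) or 0)
    p_value = getattr(results, "placebo_p_value", math.nan)
    # Assumption of this sketch: non-float p-values are treated as "not complete".
    return n_placebos > 0 and isinstance(p_value, float) and math.isfinite(p_value)


# Hypothetical stand-ins for a fitted results object.
valid_run = SimpleNamespace(n_placebos=12, placebo_p_value=1 / 13)
infeasible_run = SimpleNamespace(n_placebos=0, placebo_p_value=math.nan)

print(weight_concentration({"A": 0.5, "B": 0.5}))
print(placebo_step_complete(valid_run), placebo_step_complete(infeasible_run))
```

With equal weights on n donors the Herfindahl index is 1/n; it reaches 1 when a single donor carries all the weight. An attempted-but-infeasible placebo run leaves the p-value NaN, so the recommendation to run the placebo persists.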
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index e12801c4..b9a54e21 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -616,7 +616,7 @@ scm.fit(
) -> SyntheticControlResults
```
-**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are NaN. `att` is the mean post-period gap; use placebo/permutation inference for significance (planned follow-up). Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation).
+**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. `att` is the mean post-period gap. Significance via in-space placebo permutation inference: `results.in_space_placebo()` reassigns treatment to each donor, refits against the other J-1 donors (the real treated unit is excluded from every placebo pool), and sets `placebo_p_value = rank/(n_placebos+1)` from the post/pre RMSPE-ratio. The permutation `placebo_p_value` is a SEPARATE field from the (NaN) `p_value`; `is_significant` stays bound to `p_value`. Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation).
**Usage:**
@@ -1278,7 +1278,10 @@ Returned by `SyntheticControl.fit()`.
| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | Mean post-period gap (reported point estimate) |
-| `se`, `t_stat`, `p_value`, `conf_int` | `float` / tuple | Always NaN — no analytical SE (permutation inference planned) |
+| `se`, `t_stat`, `p_value`, `conf_int` | `float` / tuple | Always NaN — no analytical SE (use `in_space_placebo()`) |
+| `placebo_p_value` | `float` | In-space placebo permutation p-value, `rank/(n_placebos+1)` (NaN until `in_space_placebo()` runs) |
+| `rmspe_ratio` | `float` | Treated unit's post/pre RMSPE ratio = sqrt(post-MSPE/pre-MSPE) (the placebo test statistic; set at fit) |
+| `n_placebos`, `n_failed` | `int` | Placebos in the reference set / excluded for non-convergence |
| `n_obs` | `int` | Treated + donor rows over all periods |
| `n_donors` | `int` | Donor units in the (post-filter) pool |
| `n_pre_periods`, `n_post_periods` | `int` | Period counts |
@@ -1292,7 +1295,7 @@ Returned by `SyntheticControl.fit()`.
| `pre_periods`, `post_periods` | `list` | Calendar-sorted periods |
| `v_method`, `standardize` | `str` | Echoed configuration |
-**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()`
+**Methods:** `in_space_placebo()` (opt-in permutation inference; refits one synthetic control per donor), `get_placebo_df()` (per-unit RMSPE-ratio table incl. the treated row), `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()`
### TripleDifferenceResults
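The `placebo_p_value = rank/(n_placebos+1)` rule documented in the `SyntheticControl` hunks above can be illustrated with a small standalone sketch. It assumes the rank is the treated unit's position among all `n_placebos + 1` ratios sorted from largest to smallest, with ties counted against the treated unit, and it drops non-finite placebo ratios as a stand-in for excluding non-converged placebos. The helper names `rmspe_ratio` and `placebo_p_value` are hypothetical; the library's own `_rmspe_ratio` takes an additional scale argument that this sketch omits.

```python
import math
from typing import Sequence


def rmspe_ratio(pre_gaps: Sequence[float], post_gaps: Sequence[float]) -> float:
    """Post/pre RMSPE ratio = sqrt(post-MSPE / pre-MSPE)."""
    pre_mspe = sum(g * g for g in pre_gaps) / len(pre_gaps)
    post_mspe = sum(g * g for g in post_gaps) / len(post_gaps)
    if pre_mspe == 0.0:
        # Degenerate perfect pre-fit: left undefined in this sketch.
        return math.nan
    return math.sqrt(post_mspe / pre_mspe)


def placebo_p_value(treated_ratio: float, placebo_ratios: Sequence[float]) -> float:
    """Permutation p-value rank / (n_placebos + 1) over valid placebos."""
    valid = [r for r in placebo_ratios if math.isfinite(r)]
    if not valid or not math.isfinite(treated_ratio):
        return math.nan
    # Rank 1 means the treated ratio is the largest in the reference set.
    rank = 1 + sum(1 for r in valid if r >= treated_ratio)
    return rank / (len(valid) + 1)


treated = rmspe_ratio(pre_gaps=[1.0, -1.0], post_gaps=[5.0, -5.0])
print(treated)
print(placebo_p_value(treated, [1.0, 2.0, 0.5, 6.0]))
```

Because the treated unit is included in the denominator, the smallest attainable p-value with J valid placebos is 1/(J+1), so a small donor pool bounds how strong the evidence can look.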
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index f56e63cd..5261b046 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -60,7 +60,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`), both composed with the Wave D Gardner GMM correction; `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores at the meat-helper boundary (Wave E.3, R `svyrecvar(subset())` form) (replicate weights queued as follow-up)
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
-- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, nested or custom predictor-importance V, gap path + pre-RMSPE; no inference in this release (inference fields are NaN — permutation/placebo planned)
+- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, nested or custom predictor-importance V, gap path + pre-RMSPE; no analytical SE (inference fields NaN), significance via in-space placebo permutation inference (`in_space_placebo()`, post/pre RMSPE-ratio, p = rank/(n_placebos+1))
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
- [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html): de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲=0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲>0`, continuous-near-d̲ or mass-point), with multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release (repeated cross-sections rejected by the validator). Alias `HAD`.
diff --git a/diff_diff/practitioner.py b/diff_diff/practitioner.py
index 128f378a..b0d6811f 100644
--- a/diff_diff/practitioner.py
+++ b/diff_diff/practitioner.py
@@ -20,6 +20,7 @@
"estimator_selection",
"estimation",
"sensitivity",
+ "placebo",
"heterogeneity",
"robustness",
}
@@ -37,6 +38,7 @@
"StackedDiDResults": "StackedDiD",
"SyntheticDiDResults": "SyntheticDiD",
"TROPResults": "TROP",
+ "SyntheticControlResults": "SyntheticControl",
"EfficientDiDResults": "EfficientDiD",
"ContinuousDiDResults": "ContinuousDiD",
"TripleDifferenceResults": "TripleDifference (DDD)",
@@ -71,7 +73,7 @@ def practitioner_next_steps(
Steps the caller has already completed. Valid names:
``"target_parameter"``, ``"assumptions"``, ``"parallel_trends"``,
``"estimator_selection"``, ``"estimation"``, ``"sensitivity"``,
- ``"heterogeneity"``, ``"robustness"``.
+ ``"placebo"``, ``"heterogeneity"``, ``"robustness"``.
verbose : bool, default True
If True, print a human-readable summary to stdout.
@@ -656,6 +658,72 @@ def _handle_trop(results: Any):
return steps, warnings
+def _handle_synthetic_control(results: Any):
+ steps = [
+ _step(
+ baker_step=6,
+ label="In-space placebo permutation inference",
+ why=(
+ "Classic SCM has no analytical standard error. Significance "
+ "comes from the in-space placebo test (Abadie-Diamond-Hainmueller "
+ "2010, Section 2.4): reassign treatment to each donor, refit, and "
+ "rank the treated unit's post/pre RMSPE ratio "
+ "(p = rank/(n_placebos+1), excluding non-converged placebos)."
+ ),
+ code=(
+ "placebo_df = results.in_space_placebo()\n"
+ "print(f'placebo p-value: {results.placebo_p_value:.3f} "
+ "(n={results.n_placebos})')\n"
+ "print(placebo_df) # per-unit RMSPE-ratio table used for the rank"
+ ),
+ priority="high",
+ # SCM's significance test IS the placebo; tag it "placebo" (not
+ # "sensitivity") so it survives once the native diagnostics block runs,
+ # mirroring _handle_trop.
+ step_name="placebo",
+ ),
+ _step(
+ baker_step=3,
+ label="Demonstrate pre-treatment fit (SCM identification)",
+ why=(
+ "SCM's identifying assumption is design-enforced fit, not a "
+ "parallel-trends test: it is only credible when the synthetic "
+ "control reproduces the treated unit's pre-period path. Report "
+ "the pre-RMSPE and predictor-balance table; a poor fit means do "
+ "not use SCM (ADH 2010 p. 495)."
+ ),
+ code=(
+ "print(f'pre-treatment RMSPE: {results.pre_rmspe:.4f}')\n"
+ "print(results.predictor_balance)\n"
+ "print(results.get_weights_df()) # donor weight concentration"
+ ),
+ priority="high",
+ # Design-enforced fit IS SCM's parallel-trends analogue (mirrors the
+ # DiagnosticReport ``scm_fit`` PT routing); tagging it "parallel_trends"
+ # keeps it from being auto-suppressed as the completed estimation step.
+ step_name="parallel_trends",
+ ),
+ _step(
+ baker_step=4,
+ label="Curate the donor pool",
+ why=(
+ "Donors exposed to the same/similar intervention or to large "
+ "confounding shocks contaminate the comparison (ADH 2010 "
+ "pp. 498-499). Restrict the donor pool to clean, comparable units."
+ ),
+ code=(
+ "# Exclude contaminated donors explicitly:\n"
+ "# synthetic_control(..., donor_pool=[clean, comparable, units])"
+ ),
+ priority="medium",
+ step_name="estimator_selection",
+ ),
+ _robustness_compare_step("SyntheticDiD or CS"),
+ ]
+ warnings = _check_nan_att(results)
+ return steps, warnings
+
+
def _handle_efficient(results: Any):
steps = [
_parallel_trends_step(staggered=True),
@@ -1301,6 +1369,7 @@ def _handle_generic(results: Any):
"StackedDiDResults": _handle_stacked,
"SyntheticDiDResults": _handle_synthetic,
"TROPResults": _handle_trop,
+ "SyntheticControlResults": _handle_synthetic_control,
"EfficientDiDResults": _handle_efficient,
"ContinuousDiDResults": _handle_continuous,
"TripleDifferenceResults": _handle_triple,
diff --git a/diff_diff/synthetic_control.py b/diff_diff/synthetic_control.py
index 8f349124..164568a2 100644
--- a/diff_diff/synthetic_control.py
+++ b/diff_diff/synthetic_control.py
@@ -17,9 +17,11 @@
time weights and ridge regularization: classic SCM uses **donor weights only** and a
level-matching estimator, plus the outer ``V`` search SyntheticDiD has no analog for.
-Inference: classic SCM has **no analytical standard error** — the paper proposes
-permutation/placebo inference (a later PR). ``se``/``t_stat``/``p_value``/``conf_int``
-are always NaN here; ``att`` (mean post-period gap) is the reported estimate.
+Inference: classic SCM has **no analytical standard error**, so
+``se``/``t_stat``/``p_value``/``conf_int`` are always NaN; ``att`` (mean post-period
+gap) is the reported estimate. Significance comes from in-space placebo permutation
+inference (ADH 2010 §2.4) via ``SyntheticControlResults.in_space_placebo()`` — a
+separate ``placebo_p_value`` field, distinct from the NaN analytical ``p_value``.
Numerics provenance: the standardization divisor and the inner/outer optimization
scheme are NOT specified in ADH (2010) — they are pinned from the R ``Synth`` package
@@ -35,7 +37,10 @@
import pandas as pd
from scipy.optimize import minimize
-from diff_diff.synthetic_control_results import SyntheticControlResults
+from diff_diff.synthetic_control_results import (
+ SyntheticControlResults,
+ _SyntheticControlFitSnapshot,
+)
from diff_diff.utils import _sc_weight_fw, safe_inference, warn_if_not_converged
__all__ = ["SyntheticControl", "synthetic_control"]
@@ -389,6 +394,9 @@ def fit(
# mspe_v is the OUTER-objective value; it is populated only when a nested V
# search actually ran (None on the custom and single-donor paths).
mspe_v: Optional[float] = None
+ # ``outer_converged`` tracks whether the nested V search reached an optimum (trivially
+ # True when there is no outer search: custom V, a single donor, or a single predictor).
+ outer_converged = True
if self.v_method == "custom":
v = self._prepare_custom_v(k)
w, converged = _inner_solve_W(X1s, X0s, v, self.inner_max_iter, self.inner_min_decrease)
@@ -405,7 +413,7 @@ def fit(
w = np.array([1.0])
converged = True
else:
- v, w, converged, mspe_v = _outer_solve_V(
+ v, w, converged, mspe_v, outer_converged = _outer_solve_V(
X1,
X0,
X1s,
@@ -426,6 +434,12 @@ def fit(
pre_rmspe = float(np.sqrt(np.mean(pre_gaps**2)))
att = float(np.mean(post_gaps))
+ # Treated unit's post/pre RMSPE ratio — the in-space placebo test statistic
+ # (ADH 2010 §2.4). Cheap to compute now; the placebo reference distribution
+ # is built on demand by SyntheticControlResults.in_space_placebo().
+ treated_scale = float(np.max(np.abs(Z1))) if Z1.size else 0.0
+ rmspe_ratio = _rmspe_ratio(pre_gaps, post_gaps, treated_scale)
+
# Poor-fit warning (REGISTRY contract: warn when pre-RMSPE exceeds the SD of
# the treated unit's pre-period outcomes). This includes a FLAT treated pre-path
# (pre_sd == 0): any non-trivial RMSPE then means the synthetic cannot reproduce
@@ -490,12 +504,45 @@ def fit(
standardize=self.standardize,
alpha=self.alpha,
mspe_v=mspe_v,
+ rmspe_ratio=rmspe_ratio,
+ )
+ # Retain the panel state needed to refit each donor as a pseudo-treated
+ # unit for in-space placebo inference (ADH 2010 §2.4). Stored as a plain
+ # attribute (not a dataclass field) and excluded from pickling via
+ # SyntheticControlResults.__getstate__ (it holds the full panel).
+ # COPY all caller-owned mutable inputs (the custom_v array, the
+ # optimizer_options dict, the specs list) so post-fit mutation of the
+ # estimator's inputs cannot silently change in_space_placebo() output on an
+ # already-returned results object. (pivots are freshly pivoted here, and the
+ # period/id lists are re-listed below, so those are not caller-aliased.)
+ results._fit_snapshot = _SyntheticControlFitSnapshot(
+ pivots=pivots,
+ specs=list(specs),
+ outcome=outcome,
+ all_periods=list(all_periods),
+ pre_periods=list(pre_periods),
+ post_periods=list(post_periods),
+ donor_ids=list(donor_ids),
+ treated_id=treated_id,
+ standardize=self.standardize,
+ v_method=self.v_method,
+ custom_v=(
+ None if self.custom_v is None else np.array(self.custom_v, dtype=float, copy=True)
+ ),
+ n_starts=self.n_starts,
+ seed=self.seed,
+ optimizer_options=(
+ None if self.optimizer_options is None else dict(self.optimizer_options)
+ ),
+ inner_max_iter=self.inner_max_iter,
+ inner_min_decrease=self.inner_min_decrease,
)
- # Reserved for PR-2 (placebo inference) / PR-3 (conformal). Set as plain
- # attributes so dataclasses.asdict/fields cannot reach them.
- results._placebo_gaps = None
- results._rmspe_ratio = None
- results._fit_snapshot = None
+ # Persist whether the treated unit's own fit reached a valid optimum — both
+ # the inner Frank-Wolfe weight solve AND (on the nested path) the outer V
+ # search. in_space_placebo() fails closed otherwise: ranking a statistic from
+ # a truncated / under-optimized treated fit would not be a valid ADH 2010
+ # §2.4 permutation (and an under-optimized V is anti-conservative).
+ results._fit_converged = bool(converged and outer_converged)
self.results_ = results
self.is_fitted_ = True
@@ -1022,17 +1069,20 @@ def _outer_solve_V(
optimizer_options: Optional[Dict[str, Any]],
inner_max_iter: int,
inner_min_decrease: float,
-) -> Tuple[np.ndarray, np.ndarray, bool, float]:
+) -> Tuple[np.ndarray, np.ndarray, bool, float, bool]:
"""Data-driven V selection: minimize pre-period outcome MSPE of W*(V).
Multistart Nelder-Mead over ``theta`` (V = softmax(theta)) plus a derivative-free
- Powell polish from the best point. Returns ``(v_star, w_star, converged, mspe)``.
+ Powell polish from the best point. Returns
+ ``(v_star, w_star, inner_converged, mspe, outer_converged)``.
"""
k = X1s.shape[0]
if k == 1:
+ # Single predictor: V is fixed (no outer search), so outer convergence is
+ # trivially satisfied.
v = np.array([1.0])
w, converged = _inner_solve_W(X1s, X0s, v, inner_max_iter, inner_min_decrease)
- return v, w, converged, float(np.mean((Z1 - Z0 @ w) ** 2))
+ return v, w, converged, float(np.mean((Z1 - Z0 @ w) ** 2)), True
# Track inner Frank-Wolfe non-convergence across ALL intermediate evaluations so
# the outer search cannot silently rank truncated W*(V) solves (codex). `_inner_solve_W`
@@ -1077,26 +1127,41 @@ def objective(theta: np.ndarray) -> float:
_st["total"] += start_total
_st["nonconv"] += start_nonconv
+ # Track convergence of the SELECTED (lowest-objective) incumbent, NOT "any start
+ # converged" — a non-winning start's success says nothing about whether the V we
+ # actually return is a valid optimum. ``best_success`` follows ``best_x``.
best_x: np.ndarray = starts[0]
best_fun = np.inf
- outer_converged = False
+ best_success = False
for theta0 in starts:
res = minimize(objective, theta0, method="Nelder-Mead", options=nm_options)
- outer_converged = outer_converged or bool(res.success)
if res.fun < best_fun:
best_fun = float(res.fun)
best_x = res.x
+ best_success = bool(res.success)
- # Derivative-free polish from the incumbent (best-of, mirrors R optimx).
+ # Derivative-free polish from the incumbent (best-of, mirrors R optimx). If the
+ # polish improves on the incumbent it becomes the selected solution (carry its
+ # success); if it does not improve but converged AT the incumbent's objective
+ # level, that validates the incumbent as an optimum.
res_p = minimize(objective, best_x, method="Powell", options=powell_options)
- outer_converged = outer_converged or bool(res_p.success)
if res_p.fun < best_fun:
best_fun = float(res_p.fun)
best_x = res_p.x
-
- # Surface a silent under-optimized V: if neither the multistart Nelder-Mead nor
- # the Powell polish reported success (e.g. optimizer_options={"maxiter": 1}), the
- # selected V / donor weights / ATT may be sub-optimal.
+ best_success = bool(res_p.success)
+ elif bool(res_p.success) and np.isclose(res_p.fun, best_fun, rtol=1e-5, atol=1e-8):
+ # Powell did not improve but converged back AT the incumbent (same objective
+ # within tolerance) -> the incumbent is a validated optimum. A Powell run that
+ # "succeeds" at a STRICTLY WORSE objective ended elsewhere and says nothing
+ # about the selected incumbent's validity, so it must NOT flip best_success
+ # (which would silently admit an under-optimized V from a success=False start).
+ best_success = True
+ outer_converged = best_success
+
+ # Surface a silent under-optimized V: if the SELECTED outer solution did not
+ # converge (e.g. optimizer_options={"maxiter": 1}, or the lowest-objective
+ # incumbent came from a truncated run), the selected V / donor weights / ATT may
+ # be sub-optimal.
if not outer_converged:
warnings.warn(
"Outer V-search (Nelder-Mead / Powell) did not converge; the selected "
@@ -1125,7 +1190,7 @@ def objective(theta: np.ndarray) -> float:
v_star = _softmax(best_x)
w_star, converged = _inner_solve_W(X1s, X0s, v_star, inner_max_iter, inner_min_decrease)
mspe = float(np.mean((Z1 - Z0 @ w_star) ** 2))
- return v_star, w_star, converged, mspe
+ return v_star, w_star, converged, mspe, outer_converged
def _compute_gap_path(
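Review note on the `_outer_solve_V` hunk above: the change replaces "any start converged" with "the selected incumbent converged". A minimal, SciPy-free sketch of that selection rule follows. `Run` and `selected_converged` are hypothetical names used only for illustration, and `math.isclose` stands in for `np.isclose` (the two tolerance formulas differ slightly), so this approximates the rule and does not copy the patched function.

```python
import math
from typing import NamedTuple, Sequence


class Run(NamedTuple):
    """Hypothetical stand-in for an optimizer result (objective value + success flag)."""

    fun: float
    success: bool


def selected_converged(
    starts: Sequence[Run], polish: Run, rtol: float = 1e-5, atol: float = 1e-8
) -> bool:
    """Return True only if the solution that would be SELECTED is a validated optimum."""
    best_fun = math.inf
    best_success = False
    # The success flag follows the lowest-objective start, not "any start".
    for res in starts:
        if res.fun < best_fun:
            best_fun, best_success = res.fun, res.success
    if polish.fun < best_fun:
        # The polish run becomes the selected solution, so carry its flag.
        best_success = polish.success
    elif polish.success and math.isclose(polish.fun, best_fun, rel_tol=rtol, abs_tol=atol):
        # The polish converged back at the incumbent's objective level.
        best_success = True
    return best_success


# The winning start (fun=0.5) was truncated; another start converged at a worse value.
starts = [Run(1.0, True), Run(0.5, False)]
worse_polish = selected_converged(starts, Run(0.9, True))
validating_polish = selected_converged(starts, Run(0.5, True))
```

Under the previous "any success" logic the first case would have been reported as converged, which is the silent under-optimization the hunk is meant to close.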
@@ -1141,3 +1206,107 @@ def _compute_gap_path(
synthetic = donor_block @ w
gaps = treated_series - synthetic
return {period: float(g) for period, g in zip(all_periods, gaps)}
+
+
+# =============================================================================
+# in-space placebo permutation inference (ADH 2010 §2.4) — used by
+# SyntheticControlResults.in_space_placebo() via function-level import
+# =============================================================================
+
+
+def _mspe(gap_path: Dict[Any, float], periods: List[Any]) -> float:
+ """Mean squared prediction error of ``gap_path`` over ``periods``."""
+ if not periods:
+ return float("nan")
+ g = np.array([gap_path[p] for p in periods], dtype=float)
+ return float(np.mean(g**2))
+
+
+def _rmspe_ratio(pre_gaps: np.ndarray, post_gaps: np.ndarray, scale: float) -> float:
+ """Post/pre RMSPE ratio — the in-space placebo test statistic (ADH 2010 §2.4).
+
+ Returns ``RMSPE_post / RMSPE_pre = sqrt(MSPE_post / MSPE_pre)`` (the root-scale
+ ratio, matching the ``rmspe_ratio`` name). ADH 2010 §3.4 reports the *MSPE*
+ ratio (the square of this); the two are monotone-equivalent on nonnegative
+ values, so the permutation rank and p-value are identical either way — only the
+ reported statistic's scale differs.
+
+ The pre-period MSPE denominator is floored at a scale-aware
+ ``1e-8 * max(scale, 1)**2`` (squared-outcome units) BEFORE the square root, so a
+ (near-)perfect pre-treatment fit (pre-MSPE → 0) yields a large-but-FINITE ratio
+ rather than ``inf``/``nan`` (which would corrupt the permutation rank).
+ ``scale`` is the unit's largest absolute pre-period outcome. Mirrors the
+ ``_fit_tol`` poor-fit guard in ``fit()``.
+ """
+ pre_mspe = float(np.mean(pre_gaps**2)) if pre_gaps.size else float("nan")
+ post_mspe = float(np.mean(post_gaps**2)) if post_gaps.size else float("nan")
+ floor = 1e-8 * max(float(scale), 1.0) ** 2
+ return float(np.sqrt(post_mspe / max(pre_mspe, floor)))
+
+
+def _placebo_fit_unit(
+ snap: _SyntheticControlFitSnapshot,
+ unit: Any,
+ donor_pool: List[Any],
+ n_starts: int,
+) -> Optional[Tuple[Dict[Any, float], float]]:
+ """Refit a synthetic control for one (pseudo-)treated ``unit`` vs ``donor_pool``.
+
+ Reuses the exact predictor build / standardization / weight solve / gap-path
+ path as ``fit()`` — none of those helpers reads estimator (``self``) state, so
+ the refit is driven entirely by the snapshot. The real treated unit is never in
+ ``donor_pool`` (the caller passes the other ``J−1`` donors). Returns
+ ``(gap_path, rmspe_ratio)``, or ``None`` when the predictors are non-finite or
+ the inner weight solve / outer V search does not converge (the caller excludes
+ such placebos from the permutation reference set).
+ Per-placebo ``UserWarning``s (poor fit, zero-variance row, non-convergence) are
+ suppressed here; the caller surfaces an aggregate count.
+ """
+ X1, X0, _ = _build_predictor_matrix(snap.pivots, snap.specs, unit, donor_pool)
+ # Belt-and-suspenders: fit() already gated non-finite outcomes over the full
+ # treated+donor panel, so a donor reassigned as pseudo-treated has finite cells.
+ if not (np.all(np.isfinite(X1)) and np.all(np.isfinite(X0))):
+ return None
+ X1s, X0s, _ = _standardize(X1, X0, snap.standardize)
+ Y = snap.pivots[snap.outcome]
+ Z1 = Y.loc[snap.pre_periods, unit].to_numpy(dtype=float)
+ Z0 = Y.loc[snap.pre_periods, donor_pool].to_numpy(dtype=float)
+ # ``outer_converged`` is trivially True when there is no outer V search (custom
+ # V or a single-donor degenerate pool).
+ outer_converged = True
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ if snap.v_method == "custom":
+ # custom_v length == k (= len(specs)) is unit-agnostic, so it stays
+ # valid for every placebo; it was finiteness/non-negativity-checked at
+ # construction. Just trace-normalize (matches _prepare_custom_v).
+ v = np.asarray(snap.custom_v, dtype=float).ravel()
+ v = v / v.sum()
+ w, converged = _inner_solve_W(X1s, X0s, v, snap.inner_max_iter, snap.inner_min_decrease)
+ elif len(donor_pool) == 1:
+ # Degenerate: a single donor forces w = [1.0]; V is irrelevant.
+ w, converged = np.array([1.0]), True
+ else:
+ _, w, converged, _, outer_converged = _outer_solve_V(
+ X1,
+ X0,
+ X1s,
+ X0s,
+ Z1,
+ Z0,
+ n_starts,
+ snap.seed,
+ snap.optimizer_options,
+ snap.inner_max_iter,
+ snap.inner_min_decrease,
+ )
+ # Exclude a placebo whose fit is not a valid optimum — a truncated inner W OR an
+ # under-optimized outer V search. An under-optimized placebo V fits the pre-period
+ # worse, shrinking its RMSPE ratio and biasing the permutation p-value
+ # anti-conservatively, so such placebos must not silently enter the rank.
+ if not (converged and outer_converged):
+ return None
+ gap_path = _compute_gap_path(Y, w, unit, donor_pool, snap.all_periods)
+ pre_gaps = np.array([gap_path[p] for p in snap.pre_periods], dtype=float)
+ post_gaps = np.array([gap_path[p] for p in snap.post_periods], dtype=float)
+ scale = float(np.max(np.abs(Z1))) if Z1.size else 0.0
+ return gap_path, _rmspe_ratio(pre_gaps, post_gaps, scale)
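Review note on `_rmspe_ratio` above: the effect of the scale-aware denominator floor is easiest to see on a perfect pre-period fit. The sketch below is a hypothetical pure-Python restatement (`floored_rmspe_ratio` is not part of the patch) that assumes non-empty gap lists. The patched helper works on NumPy arrays and returns NaN for empty input.

```python
import math
from typing import Sequence


def floored_rmspe_ratio(
    pre_gaps: Sequence[float], post_gaps: Sequence[float], scale: float
) -> float:
    """Post/pre RMSPE ratio with the pre-period MSPE floored before the square root."""
    pre_mspe = sum(g * g for g in pre_gaps) / len(pre_gaps)
    post_mspe = sum(g * g for g in post_gaps) / len(post_gaps)
    # Floor in squared-outcome units: 1e-8 * max(scale, 1)^2.
    floor = 1e-8 * max(float(scale), 1.0) ** 2
    return math.sqrt(post_mspe / max(pre_mspe, floor))


# Ordinary case: the floor is inactive, ratio = sqrt(4 / 1).
ordinary = floored_rmspe_ratio([1.0, -1.0], [2.0, -2.0], scale=10.0)
# Perfect pre-fit: the denominator is the floor (about 1e-6), so the ratio is
# large but finite instead of a division by zero.
perfect = floored_rmspe_ratio([0.0, 0.0], [2.0, -2.0], scale=10.0)
```

A finite ratio keeps such a unit comparable in the `>=` rank, whereas `inf` or `nan` would not order reliably.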
diff --git a/diff_diff/synthetic_control_results.py b/diff_diff/synthetic_control_results.py
index e36c8352..8e412057 100644
--- a/diff_diff/synthetic_control_results.py
+++ b/diff_diff/synthetic_control_results.py
@@ -5,12 +5,15 @@
``synthetic_control.py`` to mirror the TROP estimator/results split.
The classic synthetic control of Abadie, Diamond & Hainmueller (2010) produces a
-gap path and donor/predictor weights but **no analytical standard error** — the
-paper proposes permutation/placebo inference instead (a later PR). Accordingly
-``se``/``t_stat``/``p_value``/``conf_int`` are always NaN on this object; the
-point estimate ``att`` (average post-period gap) is the reported quantity.
+gap path and donor/predictor weights but **no analytical standard error**.
+Accordingly ``se``/``t_stat``/``p_value``/``conf_int`` are always NaN on this
+object; the point estimate ``att`` (average post-period gap) is the reported
+quantity. Significance comes from in-space placebo permutation inference via
+:meth:`SyntheticControlResults.in_space_placebo` (a separate ``placebo_p_value``
+field, not the NaN ``p_value``).
"""
+import warnings
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple
@@ -22,6 +25,40 @@
__all__ = ["SyntheticControlResults"]
+@dataclass
+class _SyntheticControlFitSnapshot:
+ """Panel state retained for post-hoc in-space placebo refits.
+
+ Holds everything ``SyntheticControlResults.in_space_placebo()`` needs to
+ refit ANY donor as the pseudo-treated unit without re-reading the original
+ DataFrame. Built in ``SyntheticControl.fit()`` and excluded from pickling by
+ ``SyntheticControlResults.__getstate__`` (it retains the full treated+donor
+ outcome/predictor panel — a privacy/size hazard if serialized).
+
+ ``specs`` is annotated ``List[Any]`` rather than ``List[_PredictorSpec]`` to
+ avoid an import cycle (``_PredictorSpec`` lives in ``synthetic_control.py``,
+ which imports this module). ``donor_ids`` is an ORDERED list so the placebo
+ iteration order — and therefore the rank / p-value — is deterministic.
+ """
+
+ pivots: Dict[str, pd.DataFrame]
+ specs: List[Any]
+ outcome: str
+ all_periods: List[Any]
+ pre_periods: List[Any]
+ post_periods: List[Any]
+ donor_ids: List[Any]
+ treated_id: Any
+ standardize: str
+ v_method: str
+ custom_v: Optional[Any]
+ n_starts: int
+ seed: Optional[int]
+ optimizer_options: Optional[Dict[str, Any]]
+ inner_max_iter: int
+ inner_min_decrease: float
+
+
@dataclass
class SyntheticControlResults:
"""
@@ -82,8 +119,22 @@ class SyntheticControlResults:
``"std"`` (per-row SD scaling) or ``"none"``.
alpha : float
Significance level recorded for downstream (placebo) inference.
+ rmspe_ratio : float
+ The treated unit's post/pre RMSPE ratio = ``sqrt(MSPE_post / MSPE_pre)`` —
+ the in-space placebo test statistic (ADH 2010 §2.4), computed at fit time.
+ placebo_p_value : float
+ In-space placebo permutation p-value (``rank / (n_placebos + 1)``), NaN
+ until :meth:`in_space_placebo` is run. SEPARATE from the (always-NaN)
+ analytical ``p_value``; ``is_significant`` stays bound to ``p_value``.
+ n_placebos, n_failed : int
+ Donor placebos that entered the permutation reference set / were excluded
+ for non-convergence. Both 0 until :meth:`in_space_placebo` is run.
survey_metadata : Any, optional
Reserved; always None in this release.
+
+ Significance for classic SCM comes from :meth:`in_space_placebo` (opt-in
+ in-space placebo permutation inference); :meth:`get_placebo_df` returns the
+ per-unit RMSPE-ratio table used for the rank.
"""
att: float
@@ -108,13 +159,57 @@ class SyntheticControlResults:
alpha: float = 0.05
mspe_v: Optional[float] = None
survey_metadata: Optional[Any] = field(default=None)
-
- # Reserved for PR-2 (placebo inference) / PR-3 (conformal). These are plain
- # (un-annotated) class attributes, NOT dataclass fields, so dataclasses.fields()
- # and dataclasses.asdict() cannot reach them; fit() sets them per instance.
- _placebo_gaps = None
- _rmspe_ratio = None
- _fit_snapshot = None
+ # In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010
+ # Section 2.4), populated by ``in_space_placebo()``. ``rmspe_ratio`` (the
+ # treated unit's post/pre RMSPE ratio) is computed at fit time; the rest stay
+ # at their no-inference defaults until a placebo run. NOTE: the permutation
+ # ``placebo_p_value`` is deliberately SEPARATE from ``p_value`` (which stays
+ # NaN) — it is not an analytical p-value, has no SE / t-stat, and does not
+ # flow through ``safe_inference``. ``is_significant`` likewise stays bound to
+ # the (NaN) ``p_value``, NOT ``placebo_p_value``.
+ placebo_p_value: float = np.nan
+ rmspe_ratio: float = np.nan
+ n_placebos: int = 0
+ n_failed: int = 0
+
+ def __post_init__(self) -> None:
+ # Internal state set per instance by ``fit()`` / ``in_space_placebo()``.
+ # Declared here (not as dataclass fields) so ``dataclasses.fields()`` /
+ # ``dataclasses.asdict()`` cannot reach the retained panel state.
+ # ``_fit_snapshot`` (full panel) and ``_placebo_gaps`` (per-unit gap paths)
+ # are panel-derived and nulled on pickle by ``__getstate__``; ``_placebo_df``
+ # holds the small per-unit aggregate table returned by ``get_placebo_df()``.
+ self._fit_snapshot: Optional[_SyntheticControlFitSnapshot] = None
+ self._placebo_gaps: Optional[Dict[Any, Dict[Any, float]]] = None
+ self._placebo_df: Optional[pd.DataFrame] = None
+ # Whether the treated unit's own fit converged: the inner Frank-Wolfe weight
+ # solve AND (on the nested path) the outer V search. in_space_placebo() fails
+ # closed when this is False: a truncated or under-optimized treated fit makes
+ # the ranked statistic (rmspe_ratio) not a valid SCM optimum.
+ self._fit_converged: bool = True
+ # Explicit reason an in-space placebo run was infeasible/absent, set by
+ # in_space_placebo(). summary() / _scm_native render THIS instead of
+ # reconstructing the cause from counts — n_placebos/n_failed alone cannot
+ # tell a non-converged treated fit ("treated_fit_nonconverged", n_failed=0)
+ # apart from too few donors ("too_few_donors", also n_failed=0). Values:
+ # None (not run), "ran", "treated_fit_nonconverged", "too_few_donors",
+ # "all_placebos_failed". A small string, so it survives pickling.
+ self._placebo_status: Optional[str] = None
+
+ def __getstate__(self) -> Dict[str, Any]:
+ """Exclude panel-derived internal state from pickling.
+
+ ``_fit_snapshot`` retains the full treated+donor panel and ``_placebo_gaps``
+ the per-unit gap paths — both panel-derived, a privacy/size hazard if the
+ pickle is sent elsewhere. The scalar placebo fields (``placebo_p_value``,
+ ``rmspe_ratio``, ``n_placebos``, ``n_failed``) and the small ``_placebo_df``
+ aggregate table survive. An unpickled result keeps all public fields; a
+ diagnostic call that needs the snapshot (``in_space_placebo``) then raises a
+ ValueError directing the user to re-fit. Mirrors ``SyntheticDiDResults``.
+ """
+ state = self.__dict__.copy()
+ state["_fit_snapshot"] = None
+ state["_placebo_gaps"] = None
+ return state
def __repr__(self) -> str:
"""Concise string representation."""
@@ -209,12 +304,70 @@ def summary(self, alpha: Optional[float] = None) -> str:
f"{'ATT (avg gap)':<15} {self.att:>12.4f} {'n/a':>12} " f"{'n/a':>10} {'n/a':>10}",
"-" * 75,
"",
- "Inference: classic SCM has no analytical standard error.",
- "Use permutation / placebo inference for significance testing",
- "(Abadie-Diamond-Hainmueller 2010, Section 2.4).",
- "=" * 75,
]
)
+ # Three states: (1) placebo never run -> point to in_space_placebo();
+ # (2) run with a valid reference set -> show the permutation p-value;
+ # (3) run but infeasible (no placebo entered the rank: J<2, a non-converged
+ # treated fit, or all donors failed) -> say so explicitly rather than implying
+ # it was not run.
+ # ``_placebo_df is not None`` is the "attempted" signal (survives pickling).
+ placebo_attempted = self._placebo_df is not None
+ if placebo_attempted and np.isfinite(self.placebo_p_value):
+ # The classic analytical fields above stay n/a (no SE); this is the
+ # permutation p-value of the post/pre RMSPE ratio, p = rank/(n_placebos+1).
+ lines.extend(
+ [
+ "In-space placebo permutation inference "
+ "(Abadie-Diamond-Hainmueller 2010, Section 2.4):",
+ f"{' RMSPE ratio (post/pre):':<34} {self.rmspe_ratio:>10.4f}",
+ f"{' Permutation p-value:':<34} {self.placebo_p_value:>10.4f}",
+ f"{' Placebos in reference set:':<34} {self.n_placebos:>10d}"
+ + (f" ({self.n_failed} excluded)" if self.n_failed else ""),
+ "",
+ "(Analytical SE is still undefined for classic SCM; the "
+ "p-value above is permutation-based.)",
+ "=" * 75,
+ ]
+ )
+ elif placebo_attempted:
+ # Render the SPECIFIC reason recorded by in_space_placebo(); the count
+ # fields (n_placebos=0, n_failed=0) cannot tell a non-converged treated
+ # fit apart from too-few-donors, so do not reconstruct it from counts.
+ status = getattr(self, "_placebo_status", None)
+ if status == "treated_fit_nonconverged":
+ reason = [
+ "In-space placebo was skipped: the treated unit's own SCM fit "
+ "did not converge at fit time (inner Frank-Wolfe weight solve",
+ "and/or outer V search), so its RMSPE ratio is not a valid "
+ "optimum to rank against placebos. placebo_p_value is undefined",
+ "— re-fit with a larger inner_max_iter / looser "
+ "inner_min_decrease and/or a larger optimizer_options['maxiter']",
+ "/ more n_starts.",
+ ]
+ elif status == "too_few_donors":
+ reason = [
+ "In-space placebo inference requires at least 2 donors (each "
+ "placebo is fit against the other donors); too few were",
+ "available. placebo_p_value is undefined. Inspect get_placebo_df().",
+ ]
+ else: # "all_placebos_failed" (or a legacy unpickle without the status)
+ reason = [
+ "In-space placebo permutation inference was attempted but "
+ "produced no valid reference set",
+ f"(0 placebos entered the rank; {self.n_failed} failed to "
+ "converge). placebo_p_value is undefined — all donor refits",
+ "failed. Inspect get_placebo_df().",
+ ]
+ lines.extend([*reason, "=" * 75])
+ else:
+ lines.extend(
+ [
+ "Inference: classic SCM has no analytical standard error.",
+ "Run in_space_placebo() for in-space permutation inference",
+ "(Abadie-Diamond-Hainmueller 2010, Section 2.4).",
+ "=" * 75,
+ ]
+ )
return "\n".join(lines)
@@ -248,6 +401,13 @@ def to_dict(self) -> Dict[str, Any]:
"treated_unit": self.treated_unit,
"v_method": self.v_method,
"standardize": self.standardize,
+ # In-space placebo permutation inference. rmspe_ratio is set at fit;
+ # placebo_p_value / n_placebos / n_failed stay at their no-inference
+ # defaults (NaN / 0) until in_space_placebo() runs.
+ "rmspe_ratio": self.rmspe_ratio,
+ "placebo_p_value": self.placebo_p_value,
+ "n_placebos": self.n_placebos,
+ "n_failed": self.n_failed,
}
if self.survey_metadata is not None:
sm = self.survey_metadata
@@ -293,3 +453,277 @@ def get_weights_df(self) -> pd.DataFrame:
[{"unit": unit, "weight": w} for unit, w in items],
columns=["unit", "weight"],
)
+
+ _PLACEBO_COLS = ["unit", "pre_mspe", "post_mspe", "rmspe_ratio", "is_treated", "status"]
+
+ def get_placebo_df(self) -> pd.DataFrame:
+ """
+ Get the in-space placebo distribution as a DataFrame (one row per unit).
+
+ This is a per-unit SUMMARY table (one row per unit), enough to reproduce
+ the permutation rank and a ratio-distribution plot — NOT the per-period
+ placebo gap paths needed for the classic "spaghetti" plot (those are
+ retained internally on ``_placebo_gaps`` for the successful placebos).
+ Columns: ``unit``, ``pre_mspe``, ``post_mspe``, ``rmspe_ratio``,
+ ``is_treated``, ``status`` (``"treated"`` / ``"placebo"`` / ``"failed"``).
+ The treated unit is always present as a single ``is_treated=True,
+ status="treated"`` row (its ratio is the original J-donor fit). After a
+ placebo run **that reached the placebo loop** (``>= 2`` donors AND a
+ converged treated fit), the table has ``n_donors + 1`` rows — every donor
+ appears, including those whose refit did not converge (``status="failed"``
+ with NaN metrics, excluded from the rank). In the degenerate / fail-closed
+ cases (fewer than 2 donors, or a treated fit that did not converge) the
+ placebo loop does not run, so only the treated row is returned.
+
+ Populated by :meth:`in_space_placebo`; the summary table is retained on
+ pickling, so it is still returned after a round-trip. Before any placebo
+ run — including on an unpickled result that never ran one — only the
+ treated row is returned.
+
+ Returns
+ -------
+ pandas.DataFrame
+ """
+ if self._placebo_df is not None:
+ return self._placebo_df.copy()
+ from diff_diff.synthetic_control import _mspe
+
+ pre = _mspe(self.gap_path, self.pre_periods)
+ post = _mspe(self.gap_path, self.post_periods)
+ return pd.DataFrame(
+ [
+ {
+ "unit": self.treated_unit,
+ "pre_mspe": pre,
+ "post_mspe": post,
+ "rmspe_ratio": self.rmspe_ratio,
+ "is_treated": True,
+ "status": "treated",
+ }
+ ],
+ columns=self._PLACEBO_COLS,
+ )
+
+ def in_space_placebo(
+ self,
+ n_starts: Optional[int] = None,
+ ) -> pd.DataFrame:
+ """
+ In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010,
+ Section 2.4).
+
+ Reassigns the treatment to each donor in turn, re-estimates a synthetic
+ control for that pseudo-treated donor against the OTHER donors, and ranks
+ the real treated unit's post/pre RMSPE ratio among all units. Populates
+ ``placebo_p_value``, ``n_placebos`` and ``n_failed`` on this object
+ (``rmspe_ratio`` — the treated unit's own ratio — is set at fit time) and
+ returns the placebo distribution via :meth:`get_placebo_df`.
+
+ The real treated unit is **excluded from every placebo's donor pool**: its
+ post-period outcome is treatment-contaminated, so allowing a placebo to
+ load weight on it would bias the placebo gap. The ranking set is therefore
+ the ``J+1`` units ``{treated} ∪ {J placebos}``, with each placebo fit
+ against the other ``J-1`` donors (this matches the standard
+ ``SCtools::generate.placebos`` construction). The post/pre RMSPE ratio
+ normalizes by pre-treatment fit, which obviates the pre-fit-cutoff
+ filtering of ADH Figures 5-7 (journal p. 502), so no pre-fit filter is
+ offered — every converged placebo enters the rank.
+
+ The permutation ``placebo_p_value`` is intentionally distinct from
+ ``p_value`` (which stays NaN — classic SCM has no analytical SE) and from
+ ``is_significant`` (which also stays bound to the NaN ``p_value``).
+
+ A placebo is **excluded** from the reference set (counted in ``n_failed``)
+ when its fit is not a valid optimum — EITHER its inner Frank-Wolfe weight
+ solve did not converge (a truncated ``W`` is unusable) OR its outer ``V``
+ search did not converge (an under-optimized ``V`` fits the pre-period worse,
+ shrinking its RMSPE ratio and biasing the permutation p-value
+ anti-conservatively). Each placebo refit **inherits the original fit's
+ ``optimizer_options`` / ``n_starts``**, so valid inference requires settings
+ adequate for the outer ``V`` search to converge. The production defaults are
+ adequate; with cheaper settings, raise ``n_starts`` here or re-fit with a larger
+ ``optimizer_options['maxiter']`` (otherwise placebos are dropped as failed).
+ The treated unit's own fit is held to the same standard — if its inner OR
+ outer search did not converge, the whole run fails closed (see below).
+
+ Parameters
+ ----------
+ n_starts : int, optional
+ Override the multistart count for each placebo's nested V search.
+ Default None inherits the original fit's ``n_starts``. The placebo
+ loop is the cost driver (one outer V search per donor); lower ``n_starts``
+ for a faster, coarser scan.
+
+ Returns
+ -------
+ pandas.DataFrame
+ The placebo distribution (see :meth:`get_placebo_df`).
+
+ Raises
+ ------
+ ValueError
+ If the fit snapshot is unavailable (e.g. this result was unpickled), or
+ if ``n_starts`` is given but is not a positive integer.
+ """
+ if self._fit_snapshot is None:
+ raise ValueError(
+ "in_space_placebo() requires the fit snapshot on the results "
+ "object. This result appears to have been loaded from "
+ "serialization (which excludes the snapshot) or produced by an "
+ "older estimator version. Re-fit to enable in-space placebo "
+ "inference."
+ )
+ from diff_diff.synthetic_control import _mspe, _placebo_fit_unit
+
+ snap = self._fit_snapshot
+ donors = list(snap.donor_ids)
+ n_donors = len(donors)
+ if n_starts is None:
+ n_starts_eff = snap.n_starts
+ else:
+ # Mirror the estimator constructor's validation (synthetic_control.py)
+ # so a bad override fails fast instead of silently coercing (e.g. via
+ # int(0)/int(-1)) into a degenerate or invalid permutation procedure.
+ if not isinstance(n_starts, (int, np.integer)) or n_starts < 1:
+ raise ValueError(f"n_starts override must be a positive integer, got {n_starts!r}")
+ n_starts_eff = int(n_starts)
+
+ treated_pre = _mspe(self.gap_path, snap.pre_periods)
+ treated_post = _mspe(self.gap_path, snap.post_periods)
+ treated_ratio = self.rmspe_ratio
+
+ rows: List[Dict[str, Any]] = [
+ {
+ "unit": snap.treated_id,
+ "pre_mspe": treated_pre,
+ "post_mspe": treated_post,
+ "rmspe_ratio": treated_ratio,
+ "is_treated": True,
+ "status": "treated",
+ }
+ ]
+
+ # Fail closed when the treated unit's OWN fit did not converge at fit time
+ # (inner Frank-Wolfe weight solve OR outer V search): ranking a statistic
+ # from a truncated / under-optimized treated fit would not be a valid ADH
+ # 2010 §2.4 permutation (placebos already fail closed on non-convergence, so
+ # the treated unit must too). ``_fit_converged`` folds both failure modes, so
+ # the remediation names the knobs for each.
+ if not self._fit_converged:
+ warnings.warn(
+ "In-space placebo skipped: the treated unit's own SCM fit did not "
+ "converge at fit time (inner Frank-Wolfe weight solve and/or outer V "
+ "search), so its RMSPE ratio is not a valid optimum to rank against "
+ "placebos. placebo_p_value is NaN — re-fit with a larger "
+ "inner_max_iter / looser inner_min_decrease (inner) and/or a larger "
+ "optimizer_options['maxiter'] / more n_starts (outer V search).",
+ UserWarning,
+ stacklevel=2,
+ )
+ self.placebo_p_value = np.nan
+ self.n_placebos = 0
+ self.n_failed = 0
+ self._placebo_gaps = {}
+ self._placebo_status = "treated_fit_nonconverged"
+ self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
+ return self._placebo_df.copy()
+
+ if n_donors < 2:
+ warnings.warn(
+ "In-space placebo inference requires at least 2 donors (each "
+ f"placebo is fit against the other donors); only {n_donors} "
+ "available. placebo_p_value is NaN.",
+ UserWarning,
+ stacklevel=2,
+ )
+ self.placebo_p_value = np.nan
+ self.n_placebos = 0
+ self.n_failed = 0
+ self._placebo_gaps = {}
+ self._placebo_status = "too_few_donors"
+ self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
+ return self._placebo_df.copy()
+
+ if n_donors == 2:
+ warnings.warn(
+ "In-space placebo with 2 donors: each placebo is fit against a "
+ "single donor (degenerate weight w=[1]) with no V search, so the "
+ "permutation p-value is coarse (only 2 placebos enter the "
+ "reference set; the smallest attainable p-value is 1/3).",
+ UserWarning,
+ stacklevel=2,
+ )
+
+ placebo_gaps: Dict[Any, Dict[Any, float]] = {}
+ ranked_ratios: List[float] = []
+ n_failed = 0
+
+ for j in donors:
+ pool = [d for d in donors if d != j]
+ fitted = _placebo_fit_unit(snap, j, pool, n_starts_eff)
+ if fitted is None:
+ # Invalid placebo fit (non-converged inner Frank-Wolfe weight solve,
+ # non-converged outer V search, or non-finite predictors): exclude it
+ # from BOTH the numerator and the denominator of the rank (never
+ # penalize an invalid solve into the rank).
+ # Still record the donor with NaN metrics so get_placebo_df()
+ # returns the full treated + every-donor unit set.
+ n_failed += 1
+ rows.append(
+ {
+ "unit": j,
+ "pre_mspe": np.nan,
+ "post_mspe": np.nan,
+ "rmspe_ratio": np.nan,
+ "is_treated": False,
+ "status": "failed",
+ }
+ )
+ continue
+ gap_path_j, ratio_j = fitted
+ placebo_gaps[j] = gap_path_j
+ pre_j = _mspe(gap_path_j, snap.pre_periods)
+ post_j = _mspe(gap_path_j, snap.post_periods)
+ ranked_ratios.append(ratio_j)
+ rows.append(
+ {
+ "unit": j,
+ "pre_mspe": pre_j,
+ "post_mspe": post_j,
+ "rmspe_ratio": ratio_j,
+ "is_treated": False,
+ "status": "placebo",
+ }
+ )
+
+ n_placebos = len(ranked_ratios)
+ if n_placebos == 0:
+ warnings.warn(
+ f"No in-space placebo entered the reference set (all {n_donors} "
+ "donor refits failed to converge); "
+ "placebo_p_value is NaN.",
+ UserWarning,
+ stacklevel=2,
+ )
+ p_value = np.nan
+ else:
+ # Upper-tail rank on the (unsigned) RMSPE ratio, treated unit included
+ # as the "+1". Ties counted via ``>=`` so the p-value is conservative.
+ # (The ratio squares the gaps -> direction-agnostic, NOT a signed test.)
+ rank = 1 + sum(1 for r in ranked_ratios if r >= treated_ratio)
+ p_value = rank / (n_placebos + 1)
+
+ if n_failed > 0:
+ warnings.warn(
+ f"{n_failed} of {n_donors} in-space placebos failed to converge "
+ "and were excluded from the permutation distribution; "
+ f"placebo_p_value uses the remaining {n_placebos}.",
+ UserWarning,
+ stacklevel=2,
+ )
+
+ self.placebo_p_value = float(p_value)
+ self.n_placebos = int(n_placebos)
+ self.n_failed = int(n_failed)
+ self._placebo_gaps = placebo_gaps
+ self._placebo_status = "ran" if n_placebos > 0 else "all_placebos_failed"
+ self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
+ return self._placebo_df.copy()
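Review note on the ranking step in `in_space_placebo()` above: the upper-tail rank with `>=` tie handling can be checked by hand on a toy reference set. The helper below is a hypothetical stand-alone sketch (`permutation_p_value` is not part of the patch) and the ratios are made-up illustration values, not results from any fit.

```python
from typing import Sequence


def permutation_p_value(treated_ratio: float, placebo_ratios: Sequence[float]) -> float:
    """Upper-tail permutation p-value, with the treated unit counted as the '+1'."""
    if not placebo_ratios:
        # No valid reference set: the p-value is undefined.
        return float("nan")
    # Ties count against the treated unit (>=), so the p-value is conservative.
    rank = 1 + sum(1 for r in placebo_ratios if r >= treated_ratio)
    return rank / (len(placebo_ratios) + 1)


placebos = [1.2, 0.8, 2.5, 1.0]
# Treated ratio exceeds every placebo: rank 1 of 5.
p_extreme = permutation_p_value(5.0, placebos)
# Treated ratio ties one placebo: the tie raises the rank to 2 of 5.
p_tied = permutation_p_value(2.5, placebos)
```

With `n_placebos` converged placebos the smallest attainable p-value is `1 / (n_placebos + 1)`, which is why the 2-donor case in the hunk warns that the result is coarse.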
diff --git a/docs/api/synthetic_control.rst b/docs/api/synthetic_control.rst
index a9377898..506c7422 100644
--- a/docs/api/synthetic_control.rst
+++ b/docs/api/synthetic_control.rst
@@ -19,9 +19,11 @@ over the post periods.
- Aggregate / few-unit comparative case studies (states, regions, countries).
**Inference:** classic SCM has **no analytical standard error**. ``se``, ``t_stat``,
-``p_value`` and ``conf_int`` are NaN; ``att`` (the mean post-period gap) is the reported
-estimate. The paper's in-space placebo permutation inference (post/pre RMSPE-ratio,
-``rank/(J+1)`` p-value) is planned for a follow-up release.
+``p_value`` and ``conf_int`` are always NaN; ``att`` (the mean post-period gap) is the
+reported estimate. Significance comes from **in-space placebo permutation inference** via
+:meth:`~diff_diff.SyntheticControlResults.in_space_placebo` (post/pre RMSPE-ratio statistic,
+``placebo_p_value = rank/(n_placebos+1)``). This permutation p-value is a separate field
+from the (NaN) ``p_value``; ``is_significant`` stays bound to ``p_value``.
**Distinct from** :class:`~diff_diff.SyntheticDiD` (Arkhangelsky et al. 2021), which adds
time weights and ridge regularization; classic SCM uses **donor weights only** plus the
@@ -67,6 +69,8 @@ Results container for synthetic control estimation.
.. autosummary::
+ ~SyntheticControlResults.in_space_placebo
+ ~SyntheticControlResults.get_placebo_df
~SyntheticControlResults.summary
~SyntheticControlResults.print_summary
~SyntheticControlResults.to_dict
@@ -155,6 +159,13 @@ Quick estimation with the convenience function::
)
print(f"ATT: {results.att:.3f}, pre-RMSPE: {results.pre_rmspe:.3f}")
+In-space placebo permutation inference (opt-in; refits one synthetic control per donor)::
+
+ placebo_df = results.in_space_placebo() # reassigns treatment to each donor
+ print(f"placebo p-value: {results.placebo_p_value:.3f} "
+ f"(n_placebos={results.n_placebos})") # p = rank/(n_placebos+1)
+ print(placebo_df) # per-unit RMSPE-ratio table used for the permutation rank
+
Supplying a fixed predictor-importance matrix (skips the outer V search)::
import numpy as np
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 736cbd92..26420281 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1984,7 +1984,7 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
- **Outer solve (`v_method="nested"`):** choose diagonal PSD `V` minimizing the pre-period **outcome** MSPE `mean((Z1 − Z0·W*(V))²)`. `v_method="custom"` skips the outer search and uses a user-supplied `custom_v` (trace-normalized).
- **Effect:** gap path `α̂_1t = Y_1t − Σ_j w_j·Y_jt`; `att` = mean post-period gap; `pre_rmspe` = pre-period fit diagnostic.
-**Inference:** **No analytical standard error** (Section 2.4) — `se`/`t_stat`/`p_value`/`conf_int` are NaN. The paper proposes in-space placebo permutation inference with the post/pre RMSPE-ratio statistic and `rank/(J+1)` p-value (deferred to a follow-up PR; `_placebo_gaps`/`_rmspe_ratio` are reserved on the results object).
+**Inference:** **No analytical standard error** (Section 2.4) — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. Significance comes from **in-space placebo permutation inference** via `SyntheticControlResults.in_space_placebo()`: reassign treatment to each donor, refit a synthetic control for it, and rank the treated unit's post/pre RMSPE ratio (`rmspe_ratio` = `RMSPE_post / RMSPE_pre` = `sqrt(MSPE_post / MSPE_pre)`) among all units; `placebo_p_value = rank / (n_placebos + 1)`, where `rank = 1 + #{placebos with ratio ≥ treated ratio}` — an **upper-tail rank test on the (unsigned) RMSPE-ratio statistic**, ties counted conservatively via `≥`. Because the ratio squares the gaps it is direction-agnostic: a large ratio signals an effect of *either* sign, so this is NOT a signed/one-directional ("one-sided") hypothesis test on the treatment-effect direction. ADH 2010 §3.4 reports the *MSPE* ratio (the square of `rmspe_ratio`); the two are monotone-equivalent, so the rank and p-value are identical — only the reported statistic's scale differs. `rmspe_ratio` (the treated statistic) is computed at fit time; `placebo_p_value` / `n_placebos` / `n_failed` are populated by the opt-in `in_space_placebo()` call. `get_placebo_df()` returns the per-unit ratio table used for the rank (the per-period placebo gap paths for a "spaghetti" plot are retained internally, not in this summary).
**Notes / deviations:**
- **Note:** The standardization divisor `divisor = sqrt(apply(cbind(X0,X1), 1, var))` (per-predictor SD over donors+treated, ddof=1) and the inner/outer optimizer are **not specified in ADH 2010** (which defers these numerics to Abadie & Gardeazabal 2003 App. B / the `Synth` software). The divisor is pinned from the R `Synth::synth` source; `solution.v` lives in this scaled predictor space, so the deterministic R-parity test feeds `custom_v` in the same scaled space.
@@ -1994,8 +1994,12 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
- **Deviation from R:** `standardize="none"` disables predictor standardization entirely; R `Synth` always scales by the predictor SD. Provided for diagnostics; changes the geometry of the `V` objective.
- **Note:** predictor rows support only **equal-weight** linear combinations of pre-period values — `mean` (`k_s = 1/T0`), `sum` (`k_s = 1`), and per-period outcome lags (identity, a single `k_s = 1`). ADH (2010) §2.3 defines the general form `Ȳ_i^{K_m} = Σ_s k_s Y_is` with *arbitrary* weights `k_s`; this release does NOT accept user-supplied non-uniform `K_m` weight vectors (and `median` and other non-linear aggregations are intentionally excluded). The supported set still spans the standard `Synth::dataprep` `predictors.op` + `special.predictors` usage; arbitrary-weight `K_m` is a deferred extension.
- **Deviation from R:** predictor/outcome **aggregation fails closed on any non-finite (NaN/inf) cell**, whereas R `Synth::dataprep` hardwires `na.rm=TRUE` (aggregating over the observed cells of a partially-missing window). The fail-closed contract is deliberate: na-dropping silently aggregates different period subsets across units, yielding incomparable predictors with no warning. The analyst must restrict `predictor_window` / `special_predictors` / `pre_period_outcomes` periods (and the outcome panel) to where each variable is observed; both partially- and fully-missing windows raise `ValueError`. Only the row *ordering* matches `dataprep`, not the missing-data handling.
+- **Note (in-space placebo donor pool):** the real treated unit is **excluded from every placebo's donor pool** — when donor `j` is pseudo-treated it is fit against the other `J−1` donors, never the real treated unit (whose post-period is treatment-contaminated). The ranking set is still the `J+1` units {treated} ∪ {J placebos}. ADH 2010 §2.4 does not spell out placebo donor-pool composition; this matches the standard `SCtools::generate.placebos` construction (rotate the pseudo-treated identity through the donor pool; the original treated unit is never re-added as a donor).
+- **Note (placebo failure handling):** a placebo is **excluded from both the numerator and the denominator** of the rank (never penalized into it) and tallied in `n_failed` when its fit is not a valid optimum — EITHER its **inner Frank-Wolfe weight solve** did not converge (a truncated `W` is unusable) OR its **outer `V` search** did not converge (an under-optimized `V` fits the pre-period worse, shrinking the RMSPE ratio and biasing the p-value anti-conservatively, so it must not silently enter the rank). The reported p-value uses the **effective** count `rank / (n_placebos + 1)`, where `n_placebos` is the number of placebos that entered the reference set. Failed donors still appear in `get_placebo_df()` (`status="failed"`, NaN metrics), so once a reference set is produced the table is the full treated + every-donor unit set (`n_donors + 1` rows). In the fail-closed cases the placebo loop does not run and only the treated row is returned: `J < 2` → `placebo_p_value` is NaN with a warning (no placebo distribution; `J == 2` warns the distribution is coarse), and a treated fit whose own **inner OR outer** search did not converge also fails closed (ranking a truncated / under-optimized treated statistic would not be a valid permutation). **Caveat:** each placebo refit inherits the original fit's `optimizer_options` / `n_starts`, so valid inference requires settings adequate for the outer `V` search to converge to a comparable-quality synthetic (production defaults do; cheap settings under-optimize placebo `V` and those placebos are dropped as failed — raise `n_starts` on `in_space_placebo()` or re-fit with a larger `optimizer_options['maxiter']`).
+- **Note (RMSPE-ratio floor):** the reported `rmspe_ratio = sqrt(MSPE_post / MSPE_pre)` floors the pre-period MSPE denominator at a scale-aware `1e-8 · max(|pre-outcomes|, 1)²` (before the square root) so a (near-)perfect pre-fit (`pre-MSPE → 0`) yields a large-but-FINITE ratio rather than `inf`/`nan` (which would corrupt the rank). Ties (`ratio_j ≥ treated_ratio`) are counted, making the p-value conservative. Mirrors the `_fit_tol` poor-fit guard.
+- **Note (placebo p-value is non-analytical):** `placebo_p_value` is deliberately a SEPARATE field from `p_value` (which stays NaN) — it is a permutation p-value with no SE / t-stat, so it does not flow through `safe_inference`. `is_significant` likewise stays bound to the (NaN) `p_value`, NOT `placebo_p_value`; a tool gating on `is_significant` will see `False` even when `placebo_p_value` is small. The reporting stack surfaces the placebo p-value through `estimator_native_diagnostics`, never the analytical headline.
-**Reference implementation:** authors' `Synth` package for R/MATLAB/Stata (`Synth::synth`). **R-parity anchor:** the Basque Country study (Abadie-Gardeazabal 2003, `data("basque")`) — published synthetic = region 10 (Cataluña) 0.851 + region 14 (Madrid) 0.149, `loss.v` 0.0089. Two-tier test (`tests/test_methodology_synthetic_control.py`): Tier-1 feeds R's `solution.v` via `custom_v` → donor weights match to atol 1e-3 (deterministic); Tier-2 checks the nested fit in a band.
+**Reference implementation:** authors' `Synth` package for R/MATLAB/Stata (`Synth::synth`); in-space placebo construction follows `SCtools::generate.placebos`. **R-parity anchor:** the Basque Country study (Abadie-Gardeazabal 2003, `data("basque")`) — published synthetic = region 10 (Cataluña) 0.851 + region 14 (Madrid) 0.149, `loss.v` 0.0089. Two-tier test (`tests/test_methodology_synthetic_control.py`): Tier-1 feeds R's `solution.v` via `custom_v` → donor weights match to atol 1e-3 (deterministic); Tier-2 checks the nested fit in a band.
**Requirements checklist:**
- [x] Donor weights on the unit simplex; exactly one treated unit, block assignment.
@@ -2003,7 +2007,7 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
- [x] Inner simplex-constrained weighted LS via `_sc_weight_fw` with diagonal PSD `V`.
- [x] Outer nested `V` (pre-period outcome MSPE) + user-supplied `custom_v`.
- [x] Gap path + pre-period RMSPE + predictor-balance table.
-- [x] No analytical SE (NaN inference); placebo permutation inference deferred to a follow-up PR.
+- [x] No analytical SE (NaN inference); in-space placebo permutation inference (`in_space_placebo()`, `rank/(n_placebos+1)`) with the real treated unit excluded from every placebo pool, effective-count denominator, and a scale-aware RMSPE-ratio floor.
- [x] Predictor-leakage, absorbing-suffix/no-anticipation, empty-window, duplicate-label, and inner-non-convergence validation gates.
---
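Reviewer note: the RMSPE-ratio floor described in the REGISTRY note above can be illustrated with a small sketch. The function name `floored_rmspe_ratio` and its signature are hypothetical. Reading `max(|pre-outcomes|, 1)` as the largest absolute pre-period outcome, floored at 1, is an assumption about the note's shorthand and not a confirmed detail of the implementation.

```python
import math


def floored_rmspe_ratio(pre_gaps, post_gaps, pre_outcomes, rel_floor=1e-8):
    """Hypothetical sketch of sqrt(MSPE_post / MSPE_pre) with a floored denominator."""
    # Scale-aware floor: rel_floor * max(|pre-outcomes|, 1)^2 (assumed reading).
    scale = max(max(abs(y) for y in pre_outcomes), 1.0)
    floor = rel_floor * scale**2
    # Floor the pre-period MSPE before taking the square root.
    mspe_pre = max(sum(g * g for g in pre_gaps) / len(pre_gaps), floor)
    mspe_post = sum(g * g for g in post_gaps) / len(post_gaps)
    return math.sqrt(mspe_post / mspe_pre)
```

Under these assumptions a perfect pre-period fit (all pre gaps zero) yields a large but finite ratio, so the unit can still be ranked; an ordinary fit is unaffected because its pre-period MSPE sits far above the floor.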
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
index ea16530f..58058312 100644
--- a/docs/methodology/REPORTING.md
+++ b/docs/methodology/REPORTING.md
@@ -263,9 +263,16 @@ a library setting.
pre-period), and routes sensitivity to `in_time_placebo()` +
`sensitivity_to_zeta_omega()`. `TROPResults` surfaces factor-model
diagnostics (`effective_rank`, `loocv_score`, selected `lambda_*`)
- under `estimator_native_diagnostics`. `EfficientDiDResults` PT runs
- through `EfficientDiD.hausman_pretest` (the estimator's native
- PT-All vs PT-Post check).
+ under `estimator_native_diagnostics`. `SyntheticControlResults`
+ routes parallel-trends to the `scm_fit` analogue (`pre_rmspe`,
+ verdict `design_enforced_pt`) and surfaces `pre_rmspe`, donor-weight
+ concentration, and the in-space placebo permutation p-value under
+ `estimator_native_diagnostics` — the placebo block is populated only
+ when the caller has already run `in_space_placebo()` (opt-in; DR never
+ triggers the per-donor refit loop implicitly), and it omits
+ HonestDiD-style `sensitivity` (significance IS the placebo).
+ `EfficientDiDResults` PT runs through `EfficientDiD.hausman_pretest`
+ (the estimator's native PT-All vs PT-Post check).
- **Note:** Pre-trends verdict is a three-bin heuristic, not a field
convention. DR maps the joint p-value as follows:
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
index 940ca1ef..d5668e95 100644
--- a/tests/test_business_report.py
+++ b/tests/test_business_report.py
@@ -24,6 +24,7 @@
from unittest.mock import patch
import numpy as np
+import pandas as pd
import pytest
import diff_diff as dd
@@ -39,6 +40,7 @@
generate_did_data,
generate_factor_data,
generate_staggered_data,
+ synthetic_control,
)
from diff_diff.business_report import BUSINESS_REPORT_SCHEMA_VERSION
@@ -111,6 +113,38 @@ def sdid_fit():
return sdid, fdf
+@pytest.fixture # function-scoped: some tests run in_space_placebo(), which mutates the result
+def scm_fit():
+ rng = np.random.default_rng(0)
+ T, T0, n = 8, 6, 4
+ years = list(range(2000, 2000 + T))
+ donors = {
+ j: rng.normal(10, 2) + rng.normal(0, 0.3) * np.arange(T) + rng.normal(0, 0.15, T)
+ for j in range(n)
+ }
+ treated = 0.6 * donors[0] + 0.4 * donors[1] + rng.normal(0, 0.08, T)
+ treated[T0:] += 3.0
+ rows = []
+ for j in range(n):
+ for t in range(T):
+ rows.append({"unit": f"d{j}", "year": years[t], "y": donors[j][t], "treated": 0})
+ for t in range(T):
+ rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+ df = pd.DataFrame(rows)
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ seed=0,
+ n_starts=1,
+ optimizer_options={"maxiter": 50},
+ inner_min_decrease=1e-3,
+ )
+ return res, df
+
+
@pytest.fixture(scope="module")
def edid_fit():
from diff_diff import EfficientDiD
@@ -726,6 +760,71 @@ class TripleDifferenceResults:
assert "DDD" in desc
assert "Ortiz-Villavicencio" in desc or "2025" in desc
+
+class TestSCMBusinessReport:
+ """Classic SCM routes through BusinessReport with the fit-based assumption
+ block, native-diagnostics robustness, and an ADH 2010 attribution."""
+
+ def test_scm_assumption_block_is_fit_based(self, scm_fit):
+ res, _ = scm_fit
+ assumption = BusinessReport(res, auto_diagnostics=False).to_dict()["assumption"]
+ # Distinct from SDiD's "synthetic_fit" — classic SCM is a donor-weighted match.
+ assert assumption["parallel_trends_variant"] == "scm_fit"
+ desc = assumption["description"].lower()
+ assert "placebo" in desc # significance is placebo-based, not analytical SE
+ assert "donor-weighted" in desc or "donor" in desc
+
+ def test_scm_report_is_json_serializable(self, scm_fit):
+ # SCM's analytical p_value is NaN; the headline is_significant must be a
+ # plain bool (not numpy bool_) so the AI-legible schema serializes.
+ res, _ = scm_fit
+ schema = BusinessReport(res, auto_diagnostics=True).to_dict()
+ json.dumps(schema) # must not raise
+ assert isinstance(schema["headline"]["is_significant"], bool)
+
+ def test_scm_robustness_uses_native_diagnostics(self, scm_fit):
+ # SCM omits HonestDiD "sensitivity", so robustness must come from the
+ # estimator-native diagnostics block, which must be populated with status "ran".
+ res, _ = scm_fit
+ diag = BusinessReport(res, auto_diagnostics=True).to_dict()["diagnostics"]
+ assert diag["status"] == "ran"
+ native = diag["schema"]["estimator_native_diagnostics"]
+ assert native["estimator"] == "SyntheticControl"
+ assert isinstance(native["pre_rmspe"], float)
+
+ def test_scm_appendix_attributes_adh_2010(self, scm_fit):
+ res, _ = scm_fit
+ blob = json.dumps(BusinessReport(res, auto_diagnostics=False).to_dict()).lower()
+ assert "abadie" in blob and "2010" in blob
+
+ def test_scm_rejects_honest_did_passthrough(self, scm_fit):
+ res, _ = scm_fit
+ with pytest.raises(ValueError, match="estimator_native_diagnostics"):
+ BusinessReport(res, honest_did_results=object(), auto_diagnostics=False)
+
+ def test_scm_target_parameter_is_named(self, scm_fit):
+ # BR must name SCM's headline estimand (single-unit mean post-treatment gap),
+ # not fall back to "ATT (unrecognized result class)".
+ res, _ = scm_fit
+ tp = BusinessReport(res, auto_diagnostics=False).to_dict()["target_parameter"]
+ assert "unrecognized" not in tp["name"].lower()
+ assert "scm" in tp["name"].lower() or "synthetic control" in tp["name"].lower()
+
+ def test_scm_robustness_block_surfaces_native_fields(self, scm_fit):
+ # The top-level robustness block must surface SCM-native content (pre_rmspe,
+ # weight concentration, the placebo result) rather than the SDiD-shaped
+ # pre_treatment_fit=None left over from the generic lift.
+ res, _ = scm_fit
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ rob = BusinessReport(res, auto_diagnostics=True).to_dict()["robustness"]
+ native = rob["estimator_native"]
+ assert native["estimator"] == "SyntheticControl"
+ assert isinstance(native["pre_rmspe"], float)
+ assert "weight_concentration" in native
+ assert native["in_space_placebo"]["n_placebos"] == res.n_placebos
+
def test_staggered_triple_diff_assumption_uses_ddd_not_generic_pt(self):
class StaggeredTripleDiffResults:
pass
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
index 32e7a50d..66e594a5 100644
--- a/tests/test_diagnostic_report.py
+++ b/tests/test_diagnostic_report.py
@@ -20,6 +20,7 @@
from unittest.mock import patch
import numpy as np
+import pandas as pd
import pytest
import diff_diff as dd
@@ -34,6 +35,7 @@
generate_did_data,
generate_factor_data,
generate_staggered_data,
+ synthetic_control,
)
from diff_diff.diagnostic_report import (
DIAGNOSTIC_REPORT_SCHEMA_VERSION,
@@ -135,6 +137,48 @@ def sdid_fit():
return sdid, fdf
+def _scm_panel(n_donors=4, T=8, T0=6, effect=3.0, seed=0):
+ """Small balanced SCM panel: treated = convex mix of two donors + post effect."""
+ rng = np.random.default_rng(seed)
+ years = list(range(2000, 2000 + T))
+ donors = {
+ j: rng.normal(10, 2) + rng.normal(0, 0.3) * np.arange(T) + rng.normal(0, 0.15, T)
+ for j in range(n_donors)
+ }
+ treated = 0.6 * donors[0] + 0.4 * donors[1] + rng.normal(0, 0.08, T)
+ treated[T0:] += effect
+ rows = []
+ for j in range(n_donors):
+ for t in range(T):
+ rows.append({"unit": f"d{j}", "year": years[t], "y": donors[j][t], "treated": 0})
+ for t in range(T):
+ rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+ return pd.DataFrame(rows)
+
+
+@pytest.fixture # function-scoped: some tests run in_space_placebo(), which mutates the result
+def scm_fit():
+ df = _scm_panel(n_donors=4)
+ # Cheap optimizer settings keep the pure-Python fixture fast; the report path,
+ # not solver accuracy, is what these tests exercise. Scope the warning
+ # suppression to this call via catch_warnings() so it does not mutate the
+ # global filter state for later tests in the same worker.
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ seed=0,
+ n_starts=1,
+ optimizer_options={"maxiter": 50},
+ inner_min_decrease=1e-3,
+ )
+ return res, df
+
+
# ---------------------------------------------------------------------------
# Schema contract
# ---------------------------------------------------------------------------
@@ -2005,6 +2049,141 @@ def test_sdid_does_not_call_honest_did(self, sdid_fit):
mock.assert_not_called()
+class TestSCMNative:
+ """Classic SCM routes to the fit-based PT analogue + native diagnostics."""
+
+ def test_scm_applicable_checks(self, scm_fit):
+ res, _ = scm_fit
+ applicable = DiagnosticReport(res).applicable_checks
+ assert "estimator_native" in applicable
+ assert "parallel_trends" in applicable # design-enforced fit analogue
+ # SCM is not HonestDiD/heterogeneity territory (one treated unit).
+ assert "sensitivity" not in applicable
+ assert "heterogeneity" not in applicable
+
+ def test_scm_pt_uses_scm_fit_method(self, scm_fit):
+ res, _ = scm_fit
+ pt = DiagnosticReport(res).to_dict()["parallel_trends"]
+ assert pt["method"] == "scm_fit"
+ assert pt["verdict"] == "design_enforced_pt"
+ assert isinstance(pt.get("pre_treatment_fit_rmse"), float)
+
+ def test_scm_native_section_populated(self, scm_fit):
+ res, _ = scm_fit
+ native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+ assert native["status"] == "ran"
+ assert native["estimator"] == "SyntheticControl"
+ assert isinstance(native["pre_rmspe"], float)
+ assert "herfindahl" in native["weight_concentration"]
+ # Placebo is opt-in: NOT auto-run inside the report.
+ assert native["in_space_placebo"]["status"] == "not_run"
+ # In-time placebo / leave-one-out are ADH 2015 (not implemented here).
+ assert "in_time_placebo" not in native and "leave_one_out" not in native
+
+ def test_scm_native_surfaces_placebo_after_optin_run(self, scm_fit):
+ res, _ = scm_fit
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+ block = native["in_space_placebo"]
+ assert block["n_placebos"] == res.n_placebos
+ assert block["placebo_p_value"] == pytest.approx(res.placebo_p_value)
+
+ def test_scm_does_not_call_honest_did(self, scm_fit):
+ """HonestDiD sensitivity should NOT run on SCM (fit-based / native path)."""
+ res, _ = scm_fit
+ with patch("diff_diff.honest_did.HonestDiD.sensitivity_analysis") as mock:
+ DiagnosticReport(res).to_dict()
+ mock.assert_not_called()
+
+ def test_scm_significance_not_marked_done_until_placebo_run(self, scm_fit):
+ """SCM significance is the OPT-IN in-space placebo, which DR never runs.
+
+ Unlike SDiD/TROP (whose native block IS the sensitivity work), SCM's
+ native block only reports the placebo as ``not_run``, so the in-space
+ placebo next-step must persist — significance is not complete merely
+ because the native diagnostics ran.
+ """
+ res, _ = scm_fit # the fixture does NOT run the placebo
+ schema = DiagnosticReport(res).to_dict()
+ labels = " ".join(s.get("label", "") for s in schema.get("next_steps", [])).lower()
+ assert "placebo" in labels # the significance recommendation still surfaces
+
+ def test_scm_placebo_step_completes_after_run(self, scm_fit):
+ """Once the opt-in placebo has been run, DR stops recommending it."""
+ res, _ = scm_fit
+ before = DiagnosticReport(res).to_dict()["next_steps"]
+ assert any("placebo" in s.get("label", "").lower() for s in before)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo() # opt-in significance procedure now done
+ after = DiagnosticReport(res).to_dict()["next_steps"]
+ assert not any("placebo" in s.get("label", "").lower() for s in after)
+
+ def test_scm_rejects_precomputed_parallel_trends_and_sensitivity(self, scm_fit):
+ # Like SDiD/TROP, SCM computes its PT verdict internally (the scm_fit
+ # design-enforced pre-fit), so a precomputed parallel_trends p-value payload
+ # is methodology-incompatible and rejected — as is sensitivity (no HonestDiD
+ # analogue). SCM's PT still runs natively (scm_fit) without a precomputed input.
+ res, _ = scm_fit
+ with pytest.raises(ValueError, match="estimator_native_diagnostics"):
+ DiagnosticReport(res, precomputed={"parallel_trends": {"joint_p_value": 0.5}})
+ with pytest.raises(ValueError, match="estimator_native_diagnostics"):
+ DiagnosticReport(res, precomputed={"sensitivity": {"breakdown_M": 1.0}})
+ # Without a precomputed input, the native scm_fit PT verdict still renders.
+ assert DiagnosticReport(res).to_dict()["parallel_trends"]["method"] == "scm_fit"
+
+ def test_scm_pt_prose_is_scm_specific_not_sdid(self, scm_fit):
+ # The design_enforced_pt prose must describe SCM's donor-weighted, single-
+ # treated-unit, placebo-based identification — NOT SDiD's weighted-PT analogue.
+ res, _ = scm_fit
+ interp = DiagnosticReport(res).to_dict()["overall_interpretation"].lower()
+ assert "in-space placebo" in interp or "donor-weighted" in interp
+ assert "sdid" not in interp
+
+ def test_scm_target_parameter_is_named(self, scm_fit):
+ res, _ = scm_fit
+ tp = DiagnosticReport(res).to_dict()["target_parameter"]
+ assert "unrecognized" not in tp["name"].lower()
+ assert "scm" in tp["name"].lower() or "synthetic control" in tp["name"].lower()
+
+ def test_scm_generic_placebo_section_points_to_native(self, scm_fit):
+ # The generic ``placebo`` section must not claim placebo is "not yet
+ # implemented" for SCM — its in-space placebo IS implemented and surfaced
+ # under estimator_native_diagnostics (avoids contradictory report state).
+ res, _ = scm_fit
+ reason = (DiagnosticReport(res).to_dict()["placebo"].get("reason") or "").lower()
+ assert "not yet implemented" not in reason
+ assert "in_space_placebo" in reason or "estimator_native" in reason
+
+ def test_scm_native_marks_infeasible_placebo(self):
+ # An attempted-but-infeasible placebo (here J<2) surfaces an explicit
+ # status="infeasible" + reason in _scm_native, not a bare NaN p-value.
+ rng = np.random.default_rng(0)
+ T, T0 = 8, 6
+ years = list(range(2000, 2000 + T))
+ d0 = rng.normal(10, 2, T)
+ treated = d0 + rng.normal(0, 0.1, T)
+ treated[T0:] += 3.0
+ rows = [{"unit": "d0", "year": years[t], "y": d0[t], "treated": 0} for t in range(T)]
+ rows += [
+ {"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)}
+ for t in range(T)
+ ]
+ df = pd.DataFrame(rows)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df, "y", "treated", "unit", "year", seed=0, n_starts=1, inner_min_decrease=1e-3
+ )
+ res.in_space_placebo() # only 1 donor -> infeasible
+ native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+ block = native["in_space_placebo"]
+ assert block["status"] == "infeasible"
+ assert np.isnan(block["placebo_p_value"]) and "reason" in block
+
+
# ---------------------------------------------------------------------------
# Error handling
# ---------------------------------------------------------------------------
diff --git a/tests/test_methodology_synthetic_control.py b/tests/test_methodology_synthetic_control.py
index 00507bd8..ae563d01 100644
--- a/tests/test_methodology_synthetic_control.py
+++ b/tests/test_methodology_synthetic_control.py
@@ -24,6 +24,8 @@
from __future__ import annotations
import json
+import pickle
+import warnings
from pathlib import Path
import numpy as np
@@ -742,8 +744,13 @@ def test_result_accessors_render():
assert set(gdf.columns) == {"period", "gap", "phase"}
wdf = res.get_weights_df()
assert list(wdf.columns) == ["unit", "weight"]
- # Reserved PR-2/3 attributes present and None.
- assert res._placebo_gaps is None and res._rmspe_ratio is None and res._fit_snapshot is None
+ # PR-2: fit() populates the placebo refit snapshot and the treated unit's
+ # RMSPE ratio; the placebo reference distribution is not computed until
+ # in_space_placebo() runs (placebo_p_value stays NaN, gaps/df unset).
+ assert res._fit_snapshot is not None
+ assert res._placebo_gaps is None and res._placebo_df is None
+ assert np.isfinite(res.rmspe_ratio)
+ assert np.isnan(res.placebo_p_value) and res.n_placebos == 0
def test_inferred_post_matches_explicit():
@@ -836,3 +843,489 @@ def test_basque_tier2_nested_band():
top2 = [u for u, _ in sorted(res.donor_weights.items(), key=lambda kv: -kv[1])[:2]]
assert set(top2) == {10, 14}
assert res.donor_weights.get(10, 0) + res.donor_weights.get(14, 0) > 0.7
+
+
+# ---------------------------------------------------------------------------
+# In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010 §2.4)
+# ---------------------------------------------------------------------------
+
+
+def _fit_for_placebo(n_donors=4, effect=3.0, **kw):
+ """Fit with cheap settings on a panel carrying a strong post-treatment effect."""
+ df, _, _ = _make_panel(n_donors=n_donors, effect=effect)
+ opts = dict(_FAST)
+ opts.update(kw)
+ with warnings.catch_warnings(): # single-donor / poor-fit fit warnings are not under test
+ warnings.simplefilter("ignore")
+ return synthetic_control(df, "y", "treated", "unit", "year", seed=0, **opts)
+
+
+def test_in_space_placebo_strong_effect_ranks_treated_first():
+ # A 3.0-unit post effect on a treated unit that is a clean convex mix of two
+ # donors -> treated RMSPE ratio is the most extreme -> rank 1 -> p = 1/(J+1).
+ res = _fit_for_placebo(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ pdf = res.in_space_placebo()
+ assert res.n_placebos == 4 and res.n_failed == 0
+ treated_ratio = pdf.loc[pdf["is_treated"], "rmspe_ratio"].iloc[0]
+ placebo_ratios = pdf.loc[~pdf["is_treated"], "rmspe_ratio"]
+ assert (treated_ratio > placebo_ratios).all() # treated is the most extreme unit
+ assert res.placebo_p_value == pytest.approx(1 / (res.n_placebos + 1))
+ # Exactly one treated row; the placebo rows are exactly the donor units.
+ assert int(pdf["is_treated"].sum()) == 1
+ assert pdf.loc[pdf["is_treated"], "unit"].iloc[0] == "treated"
+ assert set(pdf.loc[~pdf["is_treated"], "unit"]) == {"d0", "d1", "d2", "d3"}
+
+
+def test_in_space_placebo_excludes_real_treated_from_donor_pools():
+ # The real treated unit is never in the donor universe, so it cannot serve as
+ # a donor for any placebo (ADH 2010 contamination guard; SCtools convention).
+ res = _fit_for_placebo(n_donors=4)
+ snap = res._fit_snapshot
+ assert snap.treated_id == "treated" and "treated" not in snap.donor_ids
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ # Each donor became a placebo exactly once; the treated unit is not a placebo.
+ assert "treated" not in res._placebo_gaps
+ assert set(res._placebo_gaps) == set(snap.donor_ids)
+
+
+def test_in_space_placebo_p_in_valid_discrete_set():
+ res = _fit_for_placebo(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ valid = {(k + 1) / (res.n_placebos + 1) for k in range(res.n_placebos + 1)}
+ assert any(res.placebo_p_value == pytest.approx(v) for v in valid)
+
+
+def test_in_space_placebo_does_not_touch_analytical_inference():
+ # The permutation p-value is SEPARATE from the analytical fields, which stay
+ # NaN; is_significant stays bound to the (NaN) p_value, not placebo_p_value.
+ res = _fit_for_placebo(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ assert np.isfinite(res.placebo_p_value)
+ assert_nan_inference(
+ {"se": res.se, "t_stat": res.t_stat, "p_value": res.p_value, "conf_int": res.conf_int}
+ )
+ assert res.is_significant is False
+
+
+def test_in_space_placebo_deterministic():
+ res = _fit_for_placebo(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ p1 = res.in_space_placebo()
+ first_p = res.placebo_p_value
+ p2 = res.in_space_placebo()
+ assert res.placebo_p_value == first_p # bit-equal p-value across runs
+ pd.testing.assert_frame_equal(p1, p2) # identical rows AND row order
+
+
+def test_in_space_placebo_requires_two_donors():
+ res = _fit_for_placebo(n_donors=1)
+ with pytest.warns(UserWarning, match="at least 2 donors"):
+ pdf = res.in_space_placebo()
+ assert np.isnan(res.placebo_p_value) and res.n_placebos == 0
+ assert len(pdf) == 1 and bool(pdf["is_treated"].iloc[0])
+
+
+def test_in_space_placebo_two_donors_warns_coarse():
+ res = _fit_for_placebo(n_donors=2)
+ with pytest.warns(UserWarning, match="coarse"):
+ res.in_space_placebo()
+ # 2 placebos -> reference set of 3 -> p in {1/3, 2/3, 1}.
+ assert res.n_placebos == 2
+ assert any(res.placebo_p_value == pytest.approx(v) for v in (1 / 3, 2 / 3, 1.0))
+
+
+def test_in_space_placebo_fails_closed_on_nonconverged_treated_fit():
+ # inner_max_iter=1 truncates the treated unit's own Frank-Wolfe solve, so its
+ # RMSPE ratio is not a valid optimum. in_space_placebo() must fail closed
+ # (NaN p-value + warning), NOT rank a truncated treated statistic.
+ df, _, _ = _make_panel(n_donors=4, effect=3.0)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ seed=0,
+ n_starts=1,
+ inner_max_iter=1,
+ optimizer_options={"maxiter": 5},
+ )
+ assert res._fit_converged is False # treated fit was truncated
+ with pytest.warns(UserWarning, match="did not converge at fit time"):
+ pdf = res.in_space_placebo()
+ assert np.isnan(res.placebo_p_value)
+ assert res.n_placebos == 0 and res.n_failed == 0 # the placebo loop never ran
+ assert len(pdf) == 1 and bool(pdf["is_treated"].iloc[0]) # treated row only
+
+
+def test_in_space_placebo_pickle_drops_snapshot_keeps_scalars():
+ res = _fit_for_placebo(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ restored = pickle.loads(pickle.dumps(res))
+ # Scalars survive; panel-derived state is dropped.
+ assert restored.placebo_p_value == res.placebo_p_value
+ assert restored.rmspe_ratio == res.rmspe_ratio
+ assert restored.n_placebos == res.n_placebos and restored.n_failed == res.n_failed
+ assert restored._fit_snapshot is None and restored._placebo_gaps is None
+ # The small aggregate table survives, so get_placebo_df still works...
+ assert len(restored.get_placebo_df()) == len(res.get_placebo_df())
+ # ...but a re-run of the refit raises (the snapshot is gone).
+ with pytest.raises(ValueError, match="requires the fit snapshot"):
+ restored.in_space_placebo()
+
+
+def test_in_space_placebo_custom_v_path():
+ df, _, _ = _make_panel(n_donors=4)
+ # Default predictors = all pre-period outcomes -> k = number of pre periods (T0).
+ k = 6
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ v_method="custom",
+ custom_v=np.ones(k),
+ inner_min_decrease=1e-3,
+ )
+ pdf = res.in_space_placebo()
+ assert res.n_placebos == 4 and np.isfinite(res.placebo_p_value)
+ assert len(pdf) == 5
+
+
+def test_get_placebo_df_before_run_returns_treated_row_only():
+ res = _fit_for_placebo(n_donors=4)
+ pdf = res.get_placebo_df()
+ assert len(pdf) == 1
+ assert bool(pdf["is_treated"].iloc[0]) and pdf["status"].iloc[0] == "treated"
+ assert set(pdf.columns) == {
+ "unit",
+ "pre_mspe",
+ "post_mspe",
+ "rmspe_ratio",
+ "is_treated",
+ "status",
+ }
+
+
+def test_rmspe_ratio_floors_zero_pre_mspe():
+ # Perfect pre-fit (pre-MSPE == 0) must yield a large-but-finite ratio, not
+ # inf/nan (which would corrupt the permutation rank).
+ from diff_diff.synthetic_control import _rmspe_ratio
+
+ pre = np.zeros(5)
+ assert np.isfinite(_rmspe_ratio(pre, np.array([1.0, 2.0, 3.0]), scale=10.0))
+ # A zero-effect (post all zero) placebo has ratio 0 — the least extreme.
+ assert _rmspe_ratio(pre, np.zeros(3), scale=10.0) == 0.0
+
+
+def test_in_space_placebo_perfect_treated_fit_finite_ratio():
+ # 2-donor panel where the treated unit EQUALS d0 in the pre-period -> the inner
+ # FW solve lands on w=[1, 0], so the treated pre-MSPE is (bit-)exactly 0. The
+ # RMSPE ratio must stay FINITE (scale-aware floor), never inf/nan.
+ rng = np.random.default_rng(2)
+ T, T0 = 8, 6
+ years = list(range(2000, 2000 + T))
+ d0 = rng.normal(10, 2, T)
+ d1 = rng.normal(5, 2, T)
+ treated = d0.copy()
+ treated[T0:] += 5.0 # identical to d0 in the pre-period, clean post effect
+ rows = []
+ for name, series in (("d0", d0), ("d1", d1)):
+ for t in range(T):
+ rows.append({"unit": name, "year": years[t], "y": series[t], "treated": 0})
+ for t in range(T):
+ rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+ df = pd.DataFrame(rows)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ v_method="custom",
+ custom_v=np.ones(T0),
+ inner_min_decrease=1e-3,
+ )
+ assert res.pre_rmspe == pytest.approx(0.0, abs=1e-9)
+ assert np.isfinite(res.rmspe_ratio) and res.rmspe_ratio > 0
+
+
+def test_in_space_placebo_immune_to_post_fit_mutation():
+ # The fit snapshot must COPY caller-owned mutable inputs (custom_v,
+ # optimizer_options), so mutating them after fit() cannot silently change
+ # in_space_placebo() output on an already-returned results object.
+ df, _, _ = _make_panel(n_donors=4)
+ cv = np.ones(6) # k = 6 default pre-period-outcome predictors
+ opts = {"maxiter": 50}
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ v_method="custom",
+ custom_v=cv,
+ optimizer_options=opts,
+ inner_min_decrease=1e-3,
+ )
+ p1 = res.in_space_placebo().copy()
+ pval1 = res.placebo_p_value
+ snap = res._fit_snapshot
+ assert snap.custom_v is not cv and snap.optimizer_options is not opts
+ # Mutate the caller-owned originals AFTER fit -> placebo output must not change.
+ cv[:] = [1e6, 1.0, 1.0, 1.0, 1.0, 1.0]
+ opts["maxiter"] = 1
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ p2 = res.in_space_placebo().copy()
+ assert res.placebo_p_value == pval1
+ pd.testing.assert_frame_equal(p1, p2)
+
+
+def test_get_placebo_df_includes_failed_donors(monkeypatch):
+ # When the treated fit IS valid but some per-donor placebo refits fail to
+ # converge, get_placebo_df() must still list EVERY unit (treated + each donor)
+ # so callers can tell which donors failed -> exactly n_donors + 1 rows.
+ # (A truncated treated fit instead fails the whole placebo run closed, tested
+ # separately; here we simulate isolated donor failures with a converged treated
+ # fit by monkeypatching the per-donor refit to fail for the first two donors.)
+ import importlib
+
+ # diff_diff.synthetic_control the SUBMODULE is shadowed by the re-exported
+ # synthetic_control FUNCTION on the package, so import the module explicitly.
+ sc = importlib.import_module("diff_diff.synthetic_control")
+
+ res = _fit_for_placebo(n_donors=4) # treated fit converges (normal settings)
+ real_fit_unit = sc._placebo_fit_unit
+ calls = {"n": 0}
+
+ def flaky_fit_unit(snap, unit, donor_pool, n_starts):
+ calls["n"] += 1
+ if calls["n"] <= 2: # first two donor refits "fail"
+ return None
+ return real_fit_unit(snap, unit, donor_pool, n_starts)
+
+ monkeypatch.setattr(sc, "_placebo_fit_unit", flaky_fit_unit)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ pdf = res.in_space_placebo()
+ assert len(pdf) == res.n_donors + 1 # treated + every donor, regardless of failures
+ assert res.n_failed == 2 and res.n_placebos == res.n_donors - 2
+ failed = pdf[pdf["status"] == "failed"]
+ assert len(failed) == 2 and failed["rmspe_ratio"].isna().all() # NaN metrics
+
+
+def test_in_space_placebo_fails_closed_on_underoptimized_outer_v():
+ # An under-optimized OUTER V search (optimizer maxiter=1) leaves the treated
+ # fit's V non-optimal even though the inner solve converges. Its RMSPE ratio is
+ # therefore not a valid optimum, so in_space_placebo() must FAIL CLOSED rather
+ # than silently rank an anti-conservatively under-optimized statistic.
+ df, _, _ = _make_panel(n_donors=4, effect=3.0)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ seed=0,
+ n_starts=1,
+ optimizer_options={"maxiter": 1}, # outer V search cannot converge
+ inner_min_decrease=1e-3, # inner still converges -> isolates the outer path
+ )
+ assert res._fit_converged is False # outer V non-convergence -> invalid fit
+ with pytest.warns(UserWarning, match="did not converge at fit time"):
+ res.in_space_placebo()
+ assert np.isnan(res.placebo_p_value)
+ assert res.n_placebos == 0 and res.n_failed == 0 # placebo loop never ran
+
+
+def test_outer_v_convergence_tracks_selected_incumbent(monkeypatch):
+ # _outer_solve_V must report convergence of the SELECTED (lowest-objective)
+ # incumbent, NOT "any start converged". Here the first multistart succeeds with a
+ # HIGH objective while the winning (lowest-objective) start reports success=False;
+ # the fit must be flagged non-converged so in_space_placebo() fails closed.
+ import importlib
+
+ from scipy.optimize import OptimizeResult
+
+ sc = importlib.import_module("diff_diff.synthetic_control")
+ calls = {"n": 0}
+
+ def fake_minimize(fun, x0, **kwargs):
+ calls["n"] += 1
+ x0 = np.asarray(x0, dtype=float)
+ if kwargs.get("method") == "Nelder-Mead":
+ # 1st start: high objective but converged; later: low objective (wins) but NOT converged.
+ if calls["n"] == 1:
+ return OptimizeResult(x=x0, fun=10.0, success=True)
+ return OptimizeResult(x=x0, fun=1.0, success=False)
+ # Powell polish: neither improves on nor converges at the incumbent.
+ return OptimizeResult(x=x0, fun=5.0, success=False)
+
+ monkeypatch.setattr(sc, "minimize", fake_minimize)
+ df, _, _ = _make_panel(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df, "y", "treated", "unit", "year", seed=0, n_starts=2, inner_min_decrease=1e-3
+ )
+ # The winning incumbent came from a success=False run -> selected V is not a
+ # validated optimum, so the fit must not be marked converged.
+ assert res._fit_converged is False
+ with pytest.warns(UserWarning, match="did not converge at fit time"):
+ res.in_space_placebo()
+ assert np.isnan(res.placebo_p_value)
+
+
+def test_outer_v_powell_success_at_worse_point_does_not_validate(monkeypatch):
+ # The Powell polish must validate the SELECTED incumbent only when it converges
+ # back AT the incumbent's objective level. Here the winning (lowest-objective)
+ # Nelder-Mead start reports success=False, and Powell "succeeds" but at a STRICTLY
+ # WORSE objective (it ended elsewhere). Powell's success says nothing about the
+ # selected incumbent, so the fit must stay non-converged and fail closed -- a flag
+ # of "converged" here would silently admit an under-optimized V into the placebo
+ # ranking and produce wrong permutation inference.
+ import importlib
+
+ from scipy.optimize import OptimizeResult
+
+ sc = importlib.import_module("diff_diff.synthetic_control")
+ calls = {"n": 0}
+
+ def fake_minimize(fun, x0, **kwargs):
+ calls["n"] += 1
+ x0 = np.asarray(x0, dtype=float)
+ if kwargs.get("method") == "Nelder-Mead":
+ # Single start: lowest objective wins but reports success=False.
+ return OptimizeResult(x=x0, fun=1.0, success=False)
+ # Powell polish: SUCCEEDS, but at a strictly worse objective than the incumbent.
+ return OptimizeResult(x=x0, fun=5.0, success=True)
+
+ monkeypatch.setattr(sc, "minimize", fake_minimize)
+ df, _, _ = _make_panel(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df, "y", "treated", "unit", "year", seed=0, n_starts=1, inner_min_decrease=1e-3
+ )
+ # Powell's success at a worse point must NOT flip the selected incumbent to converged.
+ assert res._fit_converged is False
+ with pytest.warns(UserWarning, match="did not converge at fit time"):
+ res.in_space_placebo()
+ assert np.isnan(res.placebo_p_value)
+
+
+def test_to_dict_includes_placebo_scalars():
+ res = _fit_for_placebo(n_donors=4)
+ d = res.to_dict()
+ for key in ("placebo_p_value", "rmspe_ratio", "n_placebos", "n_failed"):
+ assert key in d
+ # Before the placebo run: rmspe_ratio is finite (fit-time), placebo_p_value NaN.
+ assert np.isfinite(d["rmspe_ratio"]) and np.isnan(d["placebo_p_value"])
+ assert d["n_placebos"] == 0 and d["n_failed"] == 0
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res.in_space_placebo()
+ d2 = res.to_dict()
+ assert np.isfinite(d2["placebo_p_value"]) and d2["n_placebos"] == 4
+
+
+def test_summary_distinguishes_infeasible_placebo_from_not_run():
+ # summary() must tell "placebo never run" apart from "placebo run but produced
+ # no valid reference set" (J<2 here -> placebo_p_value NaN but it WAS attempted),
+ # and name the SPECIFIC infeasibility reason (too few donors).
+ df, _, _ = _make_panel(n_donors=1)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(df, "y", "treated", "unit", "year", seed=0, **_FAST)
+ before = res.summary()
+ res.in_space_placebo() # infeasible: single donor -> no placebo distribution
+ after = res.summary()
+ assert "Run in_space_placebo()" in before # never run
+ assert np.isnan(res.placebo_p_value) and res._placebo_df is not None # attempted
+ assert res._placebo_status == "too_few_donors"
+ assert "requires at least 2 donors" in after # specific reason, not "not run"
+ assert "Run in_space_placebo()" not in after # not mislabeled as "not run"
+
+
+def test_summary_treated_fit_failure_names_specific_reason():
+ # When the treated unit's OWN fit fails to converge, in_space_placebo() fails
+ # closed with n_placebos=0, n_failed=0 -- the SAME counts as the J<2 case.
+ # CI codex P2: summary() must not reconstruct the reason from those counts and
+ # narrate "too few donors or all donor refits failed" (false here); it must name
+ # the treated-fit non-convergence recorded in _placebo_status.
+ df, _, _ = _make_panel(n_donors=4)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ res = synthetic_control(
+ df,
+ "y",
+ "treated",
+ "unit",
+ "year",
+ seed=0,
+ n_starts=1,
+ optimizer_options={"maxiter": 1}, # outer V cannot converge -> fail closed
+ inner_min_decrease=1e-3,
+ )
+ assert res._fit_converged is False
+ with pytest.warns(UserWarning, match="did not converge at fit time"):
+ res.in_space_placebo()
+ after = res.summary()
+ assert res._placebo_status == "treated_fit_nonconverged"
+ assert res.n_placebos == 0 and res.n_failed == 0 # same counts as J<2
+ assert "treated unit's own SCM fit" in after and "did not converge" in after
+ # Must NOT misdiagnose as the donor-side reason.
+ assert "too few" not in after.lower()
+ assert "all donor refits" not in after.lower()
+
+
+def test_in_space_placebo_rejects_invalid_n_starts():
+ # CI codex P2: the n_starts override must fail fast on non-positive / non-integer
+ # values (mirroring the estimator constructor) rather than silently coercing via
+ # int(...) into a degenerate one-start (or invalid) permutation procedure.
+ res = _fit_for_placebo(n_donors=4)
+ for bad in (0, -1, -5):
+ with pytest.raises(ValueError, match="n_starts override must be a positive integer"):
+ res.in_space_placebo(n_starts=bad)
+ for bad in (2.5, "3"):
+ with pytest.raises(ValueError, match="n_starts override must be a positive integer"):
+ res.in_space_placebo(n_starts=bad) # type: ignore[arg-type]
+ # The placebo state must be untouched by a rejected override.
+ assert res._placebo_status is None and res._placebo_df is None
+
+
+def test_rmspe_ratio_is_root_scale():
+ # The reported statistic is the ROOT-scale ratio RMSPE_post/RMSPE_pre =
+ # sqrt(MSPE_post/MSPE_pre), NOT the MSPE ratio. Hand-worked: pre-MSPE = 4,
+ # post-MSPE = 9 -> sqrt(9/4) = 1.5 (the MSPE ratio would be 9/4 = 2.25).
+ from diff_diff.synthetic_control import _rmspe_ratio
+
+ pre = np.array([2.0, 2.0]) # MSPE = 4
+ post = np.array([3.0, 3.0]) # MSPE = 9
+ assert _rmspe_ratio(pre, post, scale=10.0) == pytest.approx(1.5)
+ # Zero post-effect -> ratio 0; perfect pre-fit -> finite (floored), not inf.
+ assert _rmspe_ratio(pre, np.zeros(2), scale=10.0) == pytest.approx(0.0)
+ assert np.isfinite(_rmspe_ratio(np.zeros(2), post, scale=10.0))
diff --git a/tests/test_practitioner.py b/tests/test_practitioner.py
index 99ccec5d..4153d840 100644
--- a/tests/test_practitioner.py
+++ b/tests/test_practitioner.py
@@ -20,6 +20,7 @@
from diff_diff.results import DiDResults, SyntheticDiDResults
from diff_diff.stacked_did_results import StackedDiDResults
from diff_diff.sun_abraham import SunAbrahamResults
+from diff_diff.synthetic_control_results import SyntheticControlResults
from diff_diff.triple_diff import TripleDifferenceResults
from diff_diff.trop_results import TROPResults
from diff_diff.two_stage_results import TwoStageDiDResults
@@ -114,6 +115,15 @@ def mock_trop_results():
return r
+@pytest.fixture
+def mock_scm_results():
+ r = SyntheticControlResults.__new__(SyntheticControlResults)
+ r.att = 1.2
+ r.se = np.nan
+ r.placebo_p_value = np.nan
+ return r
+
+
@pytest.fixture
def mock_efficient_results():
r = EfficientDiDResults.__new__(EfficientDiDResults)
@@ -336,6 +346,21 @@ def test_trop_results(self, mock_trop_results):
assert "control_group" not in all_code
assert "anticipation" not in all_code
+ def test_synthetic_control_results(self, mock_scm_results):
+ output = practitioner_next_steps(mock_scm_results, verbose=False)
+ assert output["estimator"] == "SyntheticControl"
+ assert len(output["next_steps"]) > 0
+ # The in-space placebo step must surface (it is SCM's significance test)
+ # and must not be auto-suppressed as the completed estimation step.
+ all_code = " ".join(s.get("code", "") for s in output["next_steps"])
+ all_labels = " ".join(s.get("label", "") for s in output["next_steps"]).lower()
+ assert "in_space_placebo" in all_code
+ assert "placebo" in all_labels
+ # SCM is not a staggered DiD: no control-group / anticipation knobs.
+ handler_steps = [s for s in output["next_steps"] if s["baker_step"] > 2]
+ handler_code = " ".join(s.get("code", "") for s in handler_steps)
+ assert "control_group" not in handler_code and "anticipation" not in handler_code
+
def test_efficient_results(self, mock_efficient_results):
output = practitioner_next_steps(mock_efficient_results, verbose=False)
assert len(output["next_steps"]) > 0