From 315fa27eba51f0e292c9dd63c1d08e5421141e8a Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 31 May 2026 15:00:53 -0400
Subject: [PATCH 1/3] SyntheticControl: in-space placebo permutation inference
 + reporting-stack integration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements ADH 2010 §2.4 in-space placebo inference for the classic
SyntheticControl estimator (PR-1 / #501 shipped the estimator with no
analytical SE), and wires SyntheticControlResults into the practitioner /
DiagnosticReport / BusinessReport reporting stack.

Methodology (diff_diff/synthetic_control.py, synthetic_control_results.py):
- SyntheticControlResults.in_space_placebo(): reassigns treatment to each
  donor, refits a synthetic control for that pseudo-treated donor against the
  other J-1 donors (the real treated unit is excluded from every placebo pool
  -- its post-period is treatment-contaminated; matches
  SCtools::generate.placebos), and ranks the treated unit's post/pre RMSPE
  ratio among the J+1 units.
- New fields placebo_p_value (= rank/(n_placebos+1); an upper-tail rank test on
  the unsigned RMSPE-ratio statistic, ties via >=), rmspe_ratio (treated
  statistic, set at fit), n_placebos/n_failed (effective reference-set sizes;
  non-converged placebos excluded from BOTH numerator and denominator).
- placebo_p_value is a SEPARATE field from the always-NaN analytical p_value;
  it carries no SE/t-stat and does not flow through safe_inference;
  is_significant stays bound to p_value.
- Fail-closed contract: a valid fit requires BOTH inner Frank-Wolfe AND outer-V
  convergence of the SELECTED incumbent; treated fit fails closed and placebos
  are excluded on either failure. The Powell polish validates the incumbent
  only when it converges back AT the incumbent's objective level (np.isclose),
  never on a success at a strictly-worse point.
- Edge cases: scale-aware RMSPE-ratio floor (perfect pre-fit -> finite ratio),
  J<2 -> NaN+warn, J==2 -> degenerate+coarse warn, deterministic given seed.
- get_placebo_df() returns the per-unit RMSPE-ratio table (incl. treated row +
  failed donors). Placebo compute is opt-in; each fit retains a
  _SyntheticControlFitSnapshot of the pivoted panel (excluded from pickling).

Reporting (diagnostic_report.py, practitioner.py, business_report.py,
_reporting_helpers.py): SCM routed like TROP (parallel-trends-free, fit-based).
New scm_fit PT analogue (design_enforced_pt verdict reading pre_rmspe), a
_scm_native diagnostic block surfacing pre_rmspe + donor-weight concentration +
the placebo p-value when already computed (never triggering the refit
implicitly), a practitioner _handle_synthetic_control with the placebo as the
headline significance step, and a BusinessReport fit-based assumption block with
ADH 2010 attribution. Also fixes a latent BR bug where headline is_significant
was a non-JSON-serializable numpy bool_ when p_value is numpy NaN.

Docs: REGISTRY.md §SyntheticControl (new **Note:** labels for donor-pool
construction, failure handling, RMSPE-ratio floor, non-analytical-p-value
split), REPORTING.md, docs/api/synthetic_control.rst, LLM guides, README,
CHANGELOG. TODO.md tracks the ADH-2015 follow-up (in-time placebo / LOO) and a
compact/lazy snapshot representation.

Tests: ~19 placebo + convergence tests (infeasible donor counts, treated-fit
non-convergence, Powell-succeeds-at-worse-point, failed-placebo exclusion,
deterministic reruns, pickle behavior) + SCM reporting-routing tests across all
three tools.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                |   3 +-
 README.md                                   |   2 +-
 TODO.md                                     |   3 +-
 diff_diff/_reporting_helpers.py             |  17 +
 diff_diff/business_report.py                |  76 +++-
 diff_diff/diagnostic_report.py              | 224 +++++++++-
 diff_diff/guides/llms-full.txt              |   9 +-
 diff_diff/guides/llms.txt                   |   2 +-
 diff_diff/practitioner.py                   |  71 +++-
 diff_diff/synthetic_control.py              | 213 +++++++++-
 diff_diff/synthetic_control_results.py      | 425 ++++++++++++++++++-
 docs/api/synthetic_control.rst              |  17 +-
 docs/methodology/REGISTRY.md                |  10 +-
 docs/methodology/REPORTING.md               |  13 +-
 tests/test_business_report.py               |  99 +++++
 tests/test_diagnostic_report.py             | 176 ++++++++
 tests/test_methodology_synthetic_control.py | 448 +++++++++++++++++++-
 tests/test_practitioner.py                  |  25 ++
 18 files changed, 1741 insertions(+), 92 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 02de12ab..023904b5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,7 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **New tutorial: `docs/tutorials/24_staggered_vs_collapsed_power.ipynb` — "Staggered Rollout or a Simple 2×2? A Power-Analysis Decision Guide".** A practitioner walkthrough for geo experiments (framed on a 50-state staggered rollout) on when to reach for Callaway-Sant'Anna vs collapsing to a familiar pre/post 2×2. Shows, with live paired Monte Carlo on `generate_staggered_data`, that the collapsed 2×2 silently targets a *diluted* estimand (reports ~60–94% of the true effect-on-treated as the rollout staggers, with near-zero CI coverage of the truth under a slow rollout), and that CS's minimum-detectable-lift penalty is a *fast-rollout* phenomenon that shrinks to parity as the rollout becomes more staggered. Fully self-contained (runs live, no committed data files); ends with a CS-vs-2×2 decision guide.
-- **New estimator: `SyntheticControl` — classic Synthetic Control Method (Abadie, Diamond & Hainmueller 2010; Abadie & Gardeazabal 2003).** Standalone estimator (`diff_diff/synthetic_control.py`) + `SyntheticControlResults` (`diff_diff/synthetic_control_results.py`) + `synthetic_control()` convenience function, exported from `diff_diff`. Builds a single treated unit's counterfactual as a convex combination of never-treated donor units — **donor (unit) weights only**, no time weights or ridge, distinct from `SyntheticDiD`. The inner simplex-constrained weighted-LS solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`); the diagonal predictor-importance matrix `V` is selected data-driven by minimizing pre-period outcome MSPE (`v_method="nested"`, softmax-on-simplex multistart Nelder-Mead + Powell polish) or supplied by the user (`v_method="custom"`). Predictors are built from `predictors`/`predictor_window`/`predictors_op`, `special_predictors`, and per-period outcome lags (`pre_period_outcomes`), in the R `Synth::dataprep` row order; per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source. Reports the gap path (`α̂_1t = Y_1t − Σ_j w_j Y_jt`), `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. **No analytical standard error** — `se`/`t_stat`/`p_value`/`conf_int` are NaN (in-space placebo permutation inference with the post/pre RMSPE-ratio statistic is planned for a follow-up release; `_placebo_gaps`/`_rmspe_ratio`/`_fit_snapshot` are reserved on the results object). Ten validation gates baked in: predictor-period leakage, absorbing post-period suffix + no-anticipation cross-check against the treatment column, post-period canonicalization, donor-pool filtering before period derivation, empty-window rejection, poor-pre-fit `UserWarning` (RMSPE > SD of treated pre-outcomes), duplicate-predictor-label rejection, inner-solve non-convergence warning, order-independent gap-path rebuild, and the `standardize="none"` deviation; plus fail-closed `custom_v` cross-field rules and degenerate single-donor / single-pre-period handling. **R-`Synth` parity** (`tests/test_methodology_synthetic_control.py`, fixtures generated by `benchmarks/R/generate_synth_basque_golden.R` into `tests/data/`): two-tier on the Basque Country study — Tier-1 feeds R's `solution.v` via `custom_v` and reproduces the published donor weights (region 10 Cataluña 0.851 + region 14 Madrid 0.149) to `atol=1e-3` deterministically; Tier-2 (`@pytest.mark.slow`) checks the data-driven nested fit lands in a tolerance band (the nested `V` legitimately differs because the outer objective uses all pre periods, not R's `time.optimize.ssr` window). Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:** standardize="none"` and `**Note:**` labels for the standardization formula, objective window, softmax `V` parametrization, and 1×SD poor-fit threshold), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
+- **`SyntheticControl` in-space placebo permutation inference + reporting-stack integration (ADH 2010 §2.4).** New `SyntheticControlResults.in_space_placebo()` provides the significance test classic SCM lacks an analytical SE for: it reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the **other `J−1` donors** (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matches `SCtools::generate.placebos`), and ranks the treated unit's post/pre **RMSPE ratio** among the `J+1` units. New fields `placebo_p_value` (`= rank/(n_placebos+1)`, an upper-tail rank test on the unsigned RMSPE ratio — direction-agnostic, so it detects an effect of *either* sign rather than a signed/one-directional hypothesis; ties counted via `≥`), `rmspe_ratio` (the treated statistic, set at fit), and `n_placebos`/`n_failed` (effective reference-set sizes; non-converged placebos are excluded from BOTH numerator and denominator, never penalized into the rank). `placebo_p_value` is a **separate field** from the (always-NaN) `p_value` — it is a permutation p-value with no SE/t-stat and does not flow through `safe_inference`; `is_significant` stays bound to `p_value`. Edge cases fail closed: scale-aware RMSPE-ratio floor (a perfect pre-fit gives a finite ratio, not `inf`), `J<2` → NaN+warn, `J==2` → degenerate+coarse warn, deterministic given `seed`. New `get_placebo_df()` returns the per-unit RMSPE-ratio summary table (incl. the treated row and any failed donors) used for the rank. The design keeps the placebo *compute* opt-in — the per-donor refit loop runs only on the explicit `in_space_placebo()` call. To support that opt-in call, every fit retains a `_SyntheticControlFitSnapshot` of the pivoted panel (memory O(units × periods × predictor-vars), like `SyntheticDiD`'s snapshot for `in_time_placebo`; excluded from pickling). A compact/lazy snapshot representation is tracked as a follow-up in `TODO.md`. **Reporting-stack integration:** `SyntheticControlResults` is now routed through `DiagnosticReport` (fit-based `scm_fit` parallel-trends analogue → verdict `design_enforced_pt` reading `pre_rmspe`; `_scm_native` surfaces `pre_rmspe` + donor-weight concentration + the placebo p-value when already computed — never triggering the refit loop implicitly), `practitioner_next_steps` (`_handle_synthetic_control` with the placebo as the headline significance step), and `BusinessReport` (fit-based assumption block, ADH 2010 attribution, robustness via `estimator_native_diagnostics`; HonestDiD passthrough rejected like SDiD/TROP). Also fixes a latent BR bug where the headline `is_significant` was a non-JSON-serializable numpy `bool_` when `p_value` is a numpy `NaN`. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new `**Note:**` labels for the donor-pool construction, failure handling, RMSPE-ratio floor, and the non-analytical-p-value split), `docs/methodology/REPORTING.md`, `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
+- **New estimator: `SyntheticControl` — classic Synthetic Control Method (Abadie, Diamond & Hainmueller 2010; Abadie & Gardeazabal 2003).** Standalone estimator (`diff_diff/synthetic_control.py`) + `SyntheticControlResults` (`diff_diff/synthetic_control_results.py`) + `synthetic_control()` convenience function, exported from `diff_diff`. Builds a single treated unit's counterfactual as a convex combination of never-treated donor units — **donor (unit) weights only**, no time weights or ridge, distinct from `SyntheticDiD`. The inner simplex-constrained weighted-LS solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`); the diagonal predictor-importance matrix `V` is selected data-driven by minimizing pre-period outcome MSPE (`v_method="nested"`, softmax-on-simplex multistart Nelder-Mead + Powell polish) or supplied by the user (`v_method="custom"`). Predictors are built from `predictors`/`predictor_window`/`predictors_op`, `special_predictors`, and per-period outcome lags (`pre_period_outcomes`), in the R `Synth::dataprep` row order; per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source. Reports the gap path (`α̂_1t = Y_1t − Σ_j w_j Y_jt`), `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. **No analytical standard error** — `se`/`t_stat`/`p_value`/`conf_int` are NaN; significance comes from in-space placebo permutation inference via `in_space_placebo()` (see the dedicated entry below). Ten validation gates baked in: predictor-period leakage, absorbing post-period suffix + no-anticipation cross-check against the treatment column, post-period canonicalization, donor-pool filtering before period derivation, empty-window rejection, poor-pre-fit `UserWarning` (RMSPE > SD of treated pre-outcomes), duplicate-predictor-label rejection, inner-solve non-convergence warning, order-independent gap-path rebuild, and the `standardize="none"` deviation; plus fail-closed `custom_v` cross-field rules and degenerate single-donor / single-pre-period handling. **R-`Synth` parity** (`tests/test_methodology_synthetic_control.py`, fixtures generated by `benchmarks/R/generate_synth_basque_golden.R` into `tests/data/`): two-tier on the Basque Country study — Tier-1 feeds R's `solution.v` via `custom_v` and reproduces the published donor weights (region 10 Cataluña 0.851 + region 14 Madrid 0.149) to `atol=1e-3` deterministically; Tier-2 (`@pytest.mark.slow`) checks the data-driven nested fit lands in a tolerance band (the nested `V` legitimately differs because the outer objective uses all pre periods, not R's `time.optimize.ssr` window). Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:** standardize="none"` and `**Note:**` labels for the standardization formula, objective window, softmax `V` parametrization, and 1×SD poor-fit threshold), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
 - **StaggeredTripleDifference methodology-review-tracker promotion: In Progress → Complete**, plus a new opt-in Eq-4.14 overall ATT. Closes the Ortiz-Villavicencio & Sant'Anna (2025, arXiv:2505.09942v3) primary-source review on the tracker (PR-A #499 added the paper review on file; this PR validates the source against it). New paper-equation-anchored Verified Components in `tests/test_methodology_staggered_triple_diff.py` (Theorem 4.1 / Eq. 4.5 RA=IPW=DR identification; Eq. 4.1 three-term DDD decomposition; Eqs. 4.11-4.12 optimal-GMM weight normalization + single-group reduction; Eq. 4.13 event-study cohort-share weighting; Eq. 4.14 / Cor. 4.2 overall) alongside the existing R cross-validation against `triplediff::ddd(panel=TRUE)` + `agg_ddd()`. **New feature — opt-in `overall_att_es` (paper Eq. 4.14 overall):** the unweighted mean of the post-treatment event-study effects ES(e), exposed on `StaggeredTripleDiffResults` (with `overall_se_es` / `overall_t_stat_es` / `overall_p_value_es` / `overall_conf_int_es`) and populated only when `aggregate="event_study"` / `"all"`. The default `overall_att` is unchanged (the Callaway-Sant'Anna simple post-treatment (g,t) average — the library-wide convention). Its analytical SE is the influence function of that mean (the average of the per-event-time combined IFs, routed through the same survey-aware variance estimator as the per-e effects via a new `_se_from_psi` helper); a multiplier-bootstrap SE replaces it under `n_bootstrap>0`. Computed via a side-channel stash on the shared `CallawaySantAnnaAggregationMixin._aggregate_event_study` (no return-signature change; CallawaySantAnna unaffected), over post-treatment `e >= -anticipation` (the library convention, matching `overall_att`). Cross-validated against R `agg_ddd(type="eventstudy")$overall.att` / `overall.se` (SE matches to ~0.1%). REGISTRY `## StaggeredTripleDifference`: the previously-unlabeled overall-aggregation prose is formalized under a `**Note:**` documenting both overalls, and the duplicate aggregation-weight deviation is consolidated (fixing a `P(G=g)` vs R `P(S=g)` mislabel). `METHODOLOGY_REVIEW.md` row L69 promoted to **Complete** (`Last Review = 2026-05-30`) with a Verified Components / R Comparison Results detail block; priority queue pruned. `docs/references.rst` Ortiz-Villavicencio entry pinned to arXiv:2505.09942v3.
 - **SunAbraham + WooldridgeDiD-OLS `vcov_type="conley"` (Conley 1999 spatial-HAC) threading.** Both estimators now accept `vcov_type="conley"` with the five `conley_*` constructor params (`conley_coords`, `conley_cutoff_km`, `conley_metric`, `conley_kernel`, `conley_lag_cutoff`), reusing the already-`conleyreg`-validated `solve_ols` / `conley.py` machinery — within-period spatial HAC at `conley_lag_cutoff=0`, plus the within-unit Bartlett serial term at `conley_lag_cutoff>0` (the panel-aware path, since `conley_time`/`conley_unit` are always supplied — not pooled cross-sectional), no new variance code. Conley routes through each estimator's within-transform path; the unit auto-cluster is dropped on the conley path (an explicit `cluster=` enables the spatial+cluster product kernel); `survey_design=` / `weights` / `n_bootstrap>0` are rejected, and WooldridgeDiD conley is OLS-path-only (`method ∈ {logit, poisson}` + conley still rejected via the `method != "ols"` guard). `SunAbrahamResults` / `WooldridgeDiDResults` gain a `conley_lag_cutoff` field plus a Conley variance-label line in `summary()` (`SunAbrahamResults` also gains `cluster_name`). FWL-composability — the within-transform conley SE equals the full-dummy conley SE — is pinned in `tests/test_conley_vcov.py` (`TestConleySunAbraham` / `TestConleyWooldridge`). **`StackedDiD` conley remains deferred for a methodology reason** (the stacked design replicates units across sub-experiments, so Conley would see same-unit copies at distance 0; no `conleyreg` anchor; paper-gated) — its prior "same shape as the SunAbraham follow-up" framing is corrected in REGISTRY / TODO / the rejection message.
 - **ConleySpatialHAC methodology-review-tracker promotion: In Progress → Complete.** Closes the Conley (1999) *Journal of Econometrics* 92(1) primary-source review on the methodology-review tracker. The paper review on file at `docs/methodology/papers/conley-1999-review.md` was previously merged (2026-05-09); this PR is the F.L.I.P. consolidation — new `tests/test_methodology_conley.py` with paper-equation-numbered Verified Components walk-through (~1600 LoC; 10 classes; 60 tests, 5 of them `@pytest.mark.slow`). Coverage: Eq. 4.2 cross-sectional sandwich (pairwise-distance specialization; the project's paper review identifies Eq. 4.2 page 18 as the real-valued/pairwise form, with Eq. 3.13 reserved for the lattice-indexed form), Eq. 4.2 HC0 + rank-1 limits, Andrews (1991) HAC lag truncation matching `conleyreg::time_dist.cpp`, haversine convention with Earth radius 6371.01 km, Phase 2 panel block-decomposed sandwich at `atol=1e-12`, sparse k-d-tree dense-vs-sparse bit-identity (Wave A #120 numerical correctness), and R `conleyreg` v0.1.9 parity at `atol=1e-6` on 6 fixtures (3 cross-sectional + 3 panel) plus the sparse-forced and time-asymmetric kernel parity contracts. Three dedicated deviations-area classes: `TestConleyLibraryExtensions` (Wave A library extensions — combined spatial+cluster product kernel #119, callable conley_metric validation #123, sparse k-d-tree activation #120, indefiniteness guard), `TestConleyDeviationsFromR` (1-D radial Bartlett vs paper's 2-D separable Eq. 3.14, time-label normalization via `np.unique`, independent temporal kernel deferred), and `TestConleyDeferrals` (5 fail-closed `NotImplementedError`/`TypeError` contracts: LinearRegression + survey_design, DiD/MPD/TWFE + survey_design, Conley + weights, SyntheticDiD + Conley, wild_bootstrap + Conley). Methodology-anchored tests extracted from `tests/test_conley_vcov.py`: full classes `TestConleyDirectHelper`, `TestConleyReductions`, `TestConleyReductionsAddendum`, `TestConleyParityR`, `TestConleyParitySpacetime`, `TestConleyPanelHelper`, `TestConleySparseRParityForced`; plus methodology-anchored tests from `TestConleyKernels`, `TestConleyDistanceMetrics`, `TestConleySparse`. File drops 4248 → 3113 lines after extraction. Defensive surface preserved: input validation, NaN/inf guards, dispatch-level validity, estimator-level integration smoke tests, set_params atomicity, sparse-path activation thresholds + density-gate fallback. `METHODOLOGY_REVIEW.md` row L91 promoted to **Complete** with `Last Review = 2026-05-26`; detail block rewritten with Verified Components / Test Coverage / R Comparison Results inline table / Corrections Made / Deviations / Outstanding Concerns. Priority queue at L1386 pruned: PreTrendsPower removed (already Complete since 2026-05-19) and ConleySpatialHAC removed (this PR); substantive-review-blocked renumbered #2-#5 → #1-#4 and consolidation-pass-blocked renumbered #6-#8 → #5-#6.
diff --git a/README.md b/README.md
index a4ce9993..1e738591 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
 - [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`); `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores (Wave E.3, R `svyrecvar(subset())` form)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
-- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html) - Abadie, Diamond & Hainmueller (2010) classic synthetic control for a single treated unit (donor-weight counterfactual, nested/custom V; no inference in this release — permutation/placebo planned)
+- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html) - Abadie, Diamond & Hainmueller (2010) classic synthetic control for a single treated unit (donor-weight counterfactual, nested/custom V; in-space placebo permutation inference via `in_space_placebo()`)
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
 - [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html) - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-d̲ or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release - repeated cross-sections rejected by the validator. Alias `HAD`.
diff --git a/TODO.md b/TODO.md
index 5d20a4f9..5236208d 100644
--- a/TODO.md
+++ b/TODO.md
@@ -85,7 +85,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| SyntheticControl: `SyntheticControlResults` not wired into the practitioner / DiagnosticReport / BusinessReport routing, so routing SCM results through those tools yields generic parallel-trends/HonestDiD guidance that doesn't fit SCM. Add SCM to the native-routed rejection sets (mirror SDiD/TROP) and surface SCM-native diagnostics (pre-fit / in-space placebo / in-time placebo / leave-one-out). Deferred to PR-2, where it pairs with the placebo-inference layer those reports would surface. | `practitioner.py`, `diagnostic_report.py`, `business_report.py` | SCM PR-1 → PR-2 | Medium |
+| SyntheticControl: in-time placebo + leave-one-out donor-robustness diagnostics are not implemented (ADH 2015, not the ADH 2010 scope of the current estimator), so `_scm_native` surfaces only pre-fit + in-space placebo. The practitioner / DiagnosticReport / BusinessReport routing and the in-space placebo permutation layer landed in PR-2; this remaining row covers adding the two ADH-2015 diagnostics (and surfacing them under `estimator_native_diagnostics`) in a later 2015-sourced PR. | `synthetic_control.py`, `diagnostic_report.py` | ADH-2015 follow-up | Low |
 | ContinuousDiD deferred CGBS 2024 extensions: (a) `covariates=` kwarg not implemented (matches R `contdid` v0.1.0); (b) discrete-treatment saturated regression deferred (integer-valued dose currently warned, not routed to per-level coefficients); (c) lowest-dose-as-control per CGBS 2024 Remark 3.1 (when `P(D=0) = 0`) not implemented — estimator requires never-treated controls. REGISTRY `## ContinuousDiD` → Implementation Checklist marks these as deferred `[ ]` items. | `diff_diff/continuous_did.py` | — | Low |
 | Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
 | TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low |
@@ -163,6 +163,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | MPD cluster+hc2_bm path computes CR2 precomputes twice — once via `solve_ols` → `_compute_cr2_bm` for vcov + per-coefficient DOF, then again via `_compute_cr2_bm_contrast_dof` from `MultiPeriodDiD.fit()` for the post-period-average contrast DOF. Both rebuild `H = X bread_inv X'`, the residual-maker `M`, and the per-cluster `A_g = (I - H_gg)^{-1/2}` matrices. O(n²k) redundant work; acceptable for typical cluster-robust DiD panel sizes (n ≤ a few thousand). Fix would plumb the contrast DOF through the existing CR2 vcov path (intrusive API change) or share the precomputes via a cached helper. | `linalg.py::_compute_cr2_bm_contrast_dof`, `estimators.py::MultiPeriodDiD.fit` | follow-up | Low |
 | Rust-backend HC2 implementation. Current Rust path only supports HC1; HC2 and CR2 Bell-McCaffrey fall through to the NumPy backend. For large-n fits this is noticeable. | `rust/src/linalg.rs` | Phase 1a | Low |
 | CR2 Bell-McCaffrey DOF uses a naive `O(n² k)` per-coefficient loop over cluster pairs. Pustejovsky-Tipton (2018) Appendix B has a scores-based formulation that avoids the full `n × n` `M` matrix. Switch when a user hits a large-`n` cluster-robust design. | `linalg.py::_compute_cr2_bm` | Phase 1a | Low |
+| `SyntheticControl` retains a full `_SyntheticControlFitSnapshot` (pivoted outcome/predictor panels) on EVERY fit to support the opt-in `in_space_placebo()`, so callers who never run the placebo still pay O(units × periods × predictor-vars) memory (same as `SyntheticDiD`'s always-on snapshot for `in_time_placebo`). Store a compact array/index representation instead of per-variable DataFrames, or build the snapshot lazily on first placebo call (would need to retain the source data, ~same cost). | `synthetic_control.py` snapshot build, `synthetic_control_results.py::_SyntheticControlFitSnapshot` | follow-up | Low |
 
 #### Testing/Docs
 
diff --git a/diff_diff/_reporting_helpers.py b/diff_diff/_reporting_helpers.py
index f5711344..6cc70905 100644
--- a/diff_diff/_reporting_helpers.py
+++ b/diff_diff/_reporting_helpers.py
@@ -618,6 +618,23 @@ def describe_target_parameter(results: Any) -> Dict[str, Any]:
             "reference": "REGISTRY.md Sec. TROP",
         }
 
+    if name == "SyntheticControlResults":
+        return {
+            "name": "SCM ATT (mean post-treatment gap for the single treated unit)",
+            "definition": (
+                "The average over the post-treatment periods of the gap "
+                "``alpha_hat_{1t} = Y_{1t} - sum_j w_j Y_{jt}`` between the single "
+                "treated unit and its donor-weighted synthetic control (Abadie, "
+                "Diamond & Hainmueller 2010). There is no population-averaging or "
+                "sampling estimand — it is the effect on the one treated unit; "
+                "significance is assessed by in-space placebo permutation inference "
+                "(no analytical standard error)."
+            ),
+            "aggregation": "single_unit_gap",
+            "headline_attribute": "att",
+            "reference": "REGISTRY.md Sec. SyntheticControl",
+        }
+
     # Default: unrecognized result class. Fall through with a neutral
     # block — agents / downstream consumers can still dispatch on
     # ``aggregation="unknown"`` and fall back to generic ATT narration.
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
index 6ceeb4a5..f6edb275 100644
--- a/diff_diff/business_report.py
+++ b/diff_diff/business_report.py
@@ -203,6 +203,7 @@ def __init__(
         if honest_did_results is not None and type(results).__name__ in {
             "SyntheticDiDResults",
             "TROPResults",
+            "SyntheticControlResults",
         }:
             raise ValueError(
                 f"{type(results).__name__} routes robustness to "
@@ -213,8 +214,9 @@ def __init__(
                 "object's native diagnostics "
                 "(SDiD: ``in_time_placebo()``, ``sensitivity_to_zeta_omega()``, "
                 "``pre_treatment_fit``; TROP: ``effective_rank``, "
-                "``loocv_score``) — BusinessReport surfaces these "
-                "automatically under ``estimator_native_diagnostics``."
+                "``loocv_score``; SyntheticControl: ``in_space_placebo()``, "
+                "``pre_rmspe``, ``get_placebo_df()``) — BusinessReport surfaces "
+                "these automatically under ``estimator_native_diagnostics``."
             )
 
         # Round-44 P1 CI review on PR #318: mirror the SDiD/TROP
@@ -646,10 +648,13 @@ def _extract_headline(self, dr_schema: Optional[Dict[str, Any]]) -> Dict[str, An
         if att is None or not np.isfinite(att):
             sign = "undefined"
         ci_level = int(round((1.0 - display_alpha) * 100))
-        is_significant = (
+        # bool(...) coerces away numpy bool_ — when ``p`` is a numpy NaN (e.g.
+        # SyntheticControl, whose analytical p_value is always NaN), ``np.isfinite``
+        # yields a numpy bool that is NOT JSON-serializable in the schema.
+        is_significant = bool(
             p is not None and np.isfinite(p) and p < phrasing_alpha if p is not None else False
         )
-        near_threshold = (
+        near_threshold = bool(
             p is not None
             and np.isfinite(p)
             and (phrasing_alpha - 0.01) < p < (phrasing_alpha + 0.001)
@@ -1002,16 +1007,25 @@ def _lift_robustness(dr: Optional[Dict[str, Any]]) -> Dict[str, Any]:
         return {"status": "skipped", "reason": "auto_diagnostics=False"}
     bacon = dr.get("bacon") or {}
     native = dr.get("estimator_native_diagnostics") or {}
+    native_block = {
+        "status": native.get("status"),
+        "estimator": native.get("estimator"),
+        "pre_treatment_fit": native.get("pre_treatment_fit"),
+    }
+    # Classic SCM exposes pre_rmspe + donor-weight concentration + the (opt-in)
+    # in-space placebo rather than SDiD's pre_treatment_fit; surface those so the
+    # top-level robustness block is not empty for SyntheticControl.
+    if native.get("estimator") == "SyntheticControl":
+        native_block["pre_rmspe"] = native.get("pre_rmspe")
+        native_block["weight_concentration"] = native.get("weight_concentration")
+        native_block["in_space_placebo"] = native.get("in_space_placebo")
     return {
         "bacon": {
             "status": bacon.get("status"),
             "forbidden_weight": bacon.get("forbidden_weight"),
             "verdict": bacon.get("verdict"),
         },
-        "estimator_native": {
-            "status": native.get("status"),
-            "pre_treatment_fit": native.get("pre_treatment_fit"),
-        },
+        "estimator_native": native_block,
     }
 
 
@@ -1153,6 +1167,20 @@ def _describe_assumption(estimator_name: str, results: Any = None) -> Dict[str,
                 "captured through latent factor loadings."
             ),
         }
+    if estimator_name in {"SyntheticControlResults"}:
+        return {
+            # Distinct from SDiD's "synthetic_fit" weighted-PT analogue: classic
+            # SCM is a donor-weighted level match (matches the DR "scm_fit" method).
+            "parallel_trends_variant": "scm_fit",
+            "no_anticipation": True,
+            "description": (
+                "Classic synthetic control identifies the single treated unit's "
+                "counterfactual via a donor-weighted match to its pre-treatment "
+                "trajectory (a design-enforced fit, not a parallel-trends test); "
+                "significance comes from in-space placebo permutation inference "
+                "rather than an analytical standard error."
+            ),
+        }
     if estimator_name == "ContinuousDiDResults":
         # Callaway, Goodman-Bacon & Sant'Anna (2024), two-level PT:
         # REGISTRY.md §ContinuousDiD > Identification.
@@ -1780,6 +1808,8 @@ def _pt_method_subject(method: Optional[str]) -> str:
         return "Pre-treatment event-study coefficients"
     if method == "synthetic_fit":
         return "The synthetic-control pre-treatment fit"
+    if method == "scm_fit":
+        return "The synthetic-control donor-weighted pre-treatment fit"
     if method == "factor":
         return "The factor-model pre-treatment fit"
     return "Pre-treatment data"
@@ -1806,7 +1836,9 @@ def _pt_method_stat_label(method: Optional[str]) -> Optional[str]:
         return "joint p"
     if method in {"slope_difference", "hausman"}:
         return "p"
-    if method in {"synthetic_fit", "factor"}:
+    if method in {"synthetic_fit", "scm_fit", "factor"}:
+        # Design-enforced fit-based paths have no p-value label (SCM's significance
+        # is the in-space placebo, not a PT joint test).
         return None
     return "joint p"
 
@@ -1846,6 +1878,13 @@ def _references_for(estimator_name: str) -> List[Dict[str, str]]:
                 "& Wager, S. (2021). Synthetic Difference in Differences."
             ),
         },
+        "SyntheticControlResults": {
+            "role": "estimator",
+            "citation": (
+                "Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic "
+                "Control Methods for Comparative Case Studies. JASA, 105(490)."
+            ),
+        },
         "SunAbrahamResults": {
             "role": "estimator",
             "citation": (
@@ -2181,11 +2220,20 @@ def _render_summary(schema: Dict[str, Any]) -> str:
                     "assumption." + sens_tail_see_reliable
                 )
         elif verdict == "design_enforced_pt":
-            sentences.append(
-                "The synthetic control is designed to match the treated "
-                "group's pre-period trajectory (SDiD's weighted-parallel-"
-                "trends analogue)."
-            )
+            if method == "scm_fit":
+                sentences.append(
+                    "The synthetic control is designed to reproduce the treated "
+                    "unit's pre-period trajectory via donor weights (classic SCM's "
+                    "design-enforced analogue of parallel trends); significance "
+                    "comes from in-space placebo permutation inference, not a "
+                    "parallel-trends test."
+                )
+            else:
+                sentences.append(
+                    "The synthetic control is designed to match the treated "
+                    "group's pre-period trajectory (SDiD's weighted-parallel-"
+                    "trends analogue)."
+                )
         elif verdict == "inconclusive":
             # Round-35 P1 CI review on PR #318: a ``verdict=="inconclusive"``
             # state means one or more pre-period coefficients had
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
index cdfdf21a..9173f3c6 100644
--- a/diff_diff/diagnostic_report.py
+++ b/diff_diff/diagnostic_report.py
@@ -157,6 +157,21 @@
             "estimator_native",
         }
     ),
+    "SyntheticControlResults": frozenset(
+        # Classic SCM expresses its identifying assumption as design-enforced
+        # pre-treatment fit (the donor weights match the treated unit's
+        # pre-period trajectory), so — like SDiD — ``parallel_trends`` routes to
+        # the fit-based ``_pt_scm_fit`` (verdict ``design_enforced_pt``, NOT a PT
+        # hypothesis test; see prose). It also exposes ``estimator_native`` (the
+        # in-space placebo permutation inference + weight concentration).
+        # ``sensitivity`` is omitted (no HonestDiD analogue — significance testing
+        # is the native placebo); ``heterogeneity`` is omitted (one treated unit).
+        {
+            "parallel_trends",
+            "design_effect",
+            "estimator_native",
+        }
+    ),
     "EfficientDiDResults": frozenset(
         {
             "parallel_trends",
@@ -195,6 +210,7 @@
 #   "hausman"        — EfficientDiD.hausman_pretest (native PT-All vs PT-Post)
 #   "synthetic_fit"  — SDiD weighted pre-treatment fit (surfaces pre_treatment_fit)
 #   "factor"         — TROP factor-model identification (no PT; renders "N/A" prose)
+#   "scm_fit"        — classic SCM donor-weighted pre-treatment fit (surfaces pre_rmspe)
 _PT_METHOD: Dict[str, str] = {
     "DiDResults": "two_x_two",
     "MultiPeriodDiDResults": "event_study",
@@ -210,6 +226,7 @@
     "ChaisemartinDHaultfoeuilleResults": "event_study",
     "SyntheticDiDResults": "synthetic_fit",
     "TROPResults": "factor",
+    "SyntheticControlResults": "scm_fit",
 }
 
 
@@ -257,11 +274,14 @@ class DiagnosticReport:
     pre_periods, post_periods : list, optional
         Explicit pre- and post-treatment period labels.
     run_parallel_trends, run_sensitivity, run_placebo, run_bacon, run_design_effect, run_heterogeneity, run_epv, run_pretrends_power : bool
-        Per-check opt-in flags. ``run_placebo`` defaults to ``False`` (opt-in,
-        expensive, currently not implemented - placebo key remains reserved
-        as ``skipped`` in the schema). All other checks default to ``True``
-        and are further gated by estimator-type and instance-level
-        applicability (see ``docs/methodology/REPORTING.md``).
+        Per-check opt-in flags. ``run_placebo`` defaults to ``False`` — the generic
+        placebo battery is not implemented in MVP, so the ``placebo`` key remains
+        reserved as ``skipped`` in the schema. (Exception: ``SyntheticControl``'s
+        in-space placebo permutation test IS implemented — run it via
+        ``results.in_space_placebo()``; its result is surfaced under
+        ``estimator_native_diagnostics.in_space_placebo``, not this generic section.)
+        All other checks default to ``True`` and are further gated by estimator-type
+        and instance-level applicability (see ``docs/methodology/REPORTING.md``).
     sensitivity_M_grid : tuple of float, default (0.5, 1.0, 1.5, 2.0)
         Grid of M values passed to ``HonestDiD.sensitivity``. Yields a
         ``SensitivityResults`` object with ``breakdown_M`` populated.
@@ -298,7 +318,9 @@ class DiagnosticReport:
         Other sections (``design_effect``, ``heterogeneity``, ``epv``) are
         read directly from the fitted result object and do not currently
         accept precomputed values — there is no expensive call to bypass.
-        ``placebo`` is reserved in the schema but opt-in / deferred in MVP.
+        ``placebo`` is reserved in the schema but opt-in / deferred in MVP for the
+        generic battery; ``SyntheticControl`` surfaces its in-space placebo under
+        ``estimator_native_diagnostics`` (run ``results.in_space_placebo()``).
     outcome_label, treatment_label : str, optional
         Plain-English labels used in prose rendering.
     """
@@ -390,11 +412,18 @@ def __init__(
         # native-routing contract documented in REPORTING.md.
         # Round-21 P1 CI review on PR #318 flagged this bypass.
         _result_name = type(self._results).__name__
-        _native_routed_names = {"SyntheticDiDResults", "TROPResults"}
+        _native_routed_names = {"SyntheticDiDResults", "TROPResults", "SyntheticControlResults"}
         if _result_name in _native_routed_names:
             _incompatible_keys = []
             if "sensitivity" in self._precomputed:
                 _incompatible_keys.append("sensitivity")
+            # All native-routed estimators reject a precomputed ``parallel_trends``
+            # payload — their PT verdict is computed INTERNALLY (SDiD/SCM:
+            # design-enforced pre-treatment fit; TROP: factor-model identification),
+            # not supplied as a user p-value. SyntheticControl is no exception: its
+            # ``scm_fit`` route reads ``pre_rmspe`` from the fit, and the generic
+            # precomputed-PT adapter only understands p-value-style payloads, which
+            # are methodology-incompatible with SCM (it has no PT p-value test).
             if "parallel_trends" in self._precomputed:
                 _incompatible_keys.append("parallel_trends")
             # Round-32 P1 CI review on PR #318: ``pretrends_power`` is a
@@ -420,7 +449,9 @@ def __init__(
                     "Use the native diagnostics on the result object "
                     "(SDiD: ``in_time_placebo``, ``sensitivity_to_zeta_omega``, "
                     "``pre_treatment_fit``; TROP: ``effective_rank``, "
-                    "``loocv_score``) — DR surfaces these automatically."
+                    "``loocv_score``; SyntheticControl: ``in_space_placebo()``, "
+                    "``pre_rmspe``, ``get_placebo_df()``) — DR surfaces these "
+                    "automatically under ``estimator_native_diagnostics``."
                 )
 
         # Round-44 P1 CI review on PR #318: mirror the SDiD/TROP
@@ -572,6 +603,19 @@ def _compute_applicable_checks(self) -> Tuple[set, Dict[str, str]]:
         # shape is stable: ``schema["placebo"]["status"] == "skipped"``
         # always holds regardless of estimator. The opt-in execution path
         # is deferred to a follow-up; ``REPORTING.md`` documents this.
+        # SyntheticControl is the exception — its in-space placebo permutation test
+        # IS implemented (run via results.in_space_placebo()) and surfaced under
+        # estimator_native_diagnostics, so its generic-section reason points there
+        # rather than claiming placebo is unimplemented (``setdefault`` wins over
+        # the generic message below).
+        if type(self._results).__name__ == "SyntheticControlResults":
+            skipped.setdefault(
+                "placebo",
+                "SyntheticControl's placebo battery is the in-space placebo "
+                "permutation test — run results.in_space_placebo(); the result is "
+                "surfaced under estimator_native_diagnostics.in_space_placebo, not "
+                "this generic section.",
+            )
         skipped.setdefault(
             "placebo",
             "Placebo battery runs on opt-in only; not yet implemented in MVP. "
@@ -877,7 +921,7 @@ def _instance_skip_reason(self, check: str) -> Optional[str]:
                     continue
             return "No group/event-study effects available to compute heterogeneity."
         if check == "estimator_native":
-            if name not in {"SyntheticDiDResults", "TROPResults"}:
+            if name not in {"SyntheticDiDResults", "TROPResults", "SyntheticControlResults"}:
                 return f"{name} does not expose native validation methods."
             return None
         return None
@@ -1073,11 +1117,29 @@ def _ran(key: str) -> bool:
             # has effectively been performed; treating the sensitivity
             # section as not-run would have ``next_steps`` redundantly
             # recommend a check the report already executed (round-19
-            # CI review on PR #318).
+            # CI review on PR #318). SyntheticControl is deliberately NOT
+            # in this set: its significance procedure is the OPT-IN in-space
+            # placebo (``in_space_placebo()``), which ``_scm_native`` reports
+            # but does not run, so the work is not complete merely because the
+            # native block ran — the practitioner ``placebo`` step must persist.
             result_name = type(self._results).__name__
-            if result_name in {"SyntheticDiDResults", "TROPResults"} and _ran("estimator_native"):
+            if result_name in {
+                "SyntheticDiDResults",
+                "TROPResults",
+            } and _ran("estimator_native"):
                 if "sensitivity" not in completed:
                     completed.append("sensitivity")
+            # SCM's significance step is the opt-in in-space placebo. Mark it
+            # complete only when the caller has run it AND it produced a VALID
+            # reference set (>=1 placebo + a finite p-value) — an attempted-but-
+            # infeasible run (J<2, treated-fit failure, or all donors failed) leaves
+            # placebo_p_value NaN and is NOT complete, so the recommendation persists.
+            if (
+                result_name == "SyntheticControlResults"
+                and int(getattr(self._results, "n_placebos", 0) or 0) > 0
+                and np.isfinite(getattr(self._results, "placebo_p_value", np.nan))
+            ):
+                completed.append("placebo")
             if _ran("heterogeneity"):
                 completed.append("heterogeneity")
             ns = practitioner_next_steps(
@@ -1114,6 +1176,8 @@ def _check_parallel_trends(self) -> Dict[str, Any]:
             return self._pt_hausman()
         if method == "synthetic_fit":
             return self._pt_synthetic_fit()
+        if method == "scm_fit":
+            return self._pt_scm_fit()
         if method == "factor":
             return self._pt_factor()
         return {
@@ -2086,6 +2150,10 @@ def _check_estimator_native(self) -> Dict[str, Any]:
 
         TROP: factor-model fit metrics (``effective_rank``, ``loocv_score``,
         selected ``lambda_*``).
+
+        SyntheticControl: pre-treatment fit (``pre_rmspe``), donor-weight
+        concentration, and — when already computed — the in-space placebo
+        permutation p-value (``in_space_placebo``).
         """
         r = self._results
         name = type(r).__name__
@@ -2093,6 +2161,8 @@ def _check_estimator_native(self) -> Dict[str, Any]:
             return self._sdid_native(r)
         if name == "TROPResults":
             return self._trop_native(r)
+        if name == "SyntheticControlResults":
+            return self._scm_native(r)
         return {
             "status": "not_applicable",
             "reason": f"{name} does not expose native validation methods.",
@@ -2172,6 +2242,71 @@ def _trop_native(self, r: Any) -> Dict[str, Any]:
             },
         }
 
+    def _scm_native(self, r: Any) -> Dict[str, Any]:
+        """Populate classic-SCM-native diagnostics section.
+
+        Always surfaces the pre-treatment fit (``pre_rmspe``) and donor-weight
+        concentration. The in-space placebo permutation inference (ADH 2010 §2.4)
+        is reported only when the user has ALREADY run ``in_space_placebo()`` —
+        DR never triggers it implicitly, because it refits one synthetic control
+        per donor (potentially many nested V searches) and the placebo layer is
+        opt-in by design. (This differs from SDiD's cheaper in-time-placebo sweep,
+        which ``_sdid_native`` runs inline.) Only the in-space placebo is exposed;
+        in-time placebo and leave-one-out are ADH 2015 (not implemented).
+        """
+        out: Dict[str, Any] = {"status": "ran", "estimator": "SyntheticControl"}
+        out["pre_rmspe"] = _to_python_float(getattr(r, "pre_rmspe", None))
+        out["n_donors"] = _to_python_scalar(getattr(r, "n_donors", None))
+        out["v_method"] = getattr(r, "v_method", None)
+        # Donor-weight concentration (Herfindahl + top weight): SCM places all mass
+        # on the donor simplex, so concentration is the natural "how few donors
+        # drive the synthetic" diagnostic.
+        try:
+            weights = [float(w) for w in r.donor_weights.values()]
+            out["weight_concentration"] = {
+                "herfindahl": _to_python_float(sum(w * w for w in weights)),
+                "top_weight": _to_python_float(max(weights) if weights else None),
+                "n_donors_with_weight": len(weights),
+            }
+        except Exception as exc:  # noqa: BLE001
+            out["weight_concentration"] = {
+                "status": "error",
+                "reason": f"donor_weights unavailable: {type(exc).__name__}: {exc}",
+            }
+        # In-space placebo: surface only if already computed (opt-in; see docstring).
+        if getattr(r, "_placebo_df", None) is not None:
+            n_placebos = int(getattr(r, "n_placebos", 0) or 0)
+            placebo_p = getattr(r, "placebo_p_value", np.nan)
+            block = {
+                "placebo_p_value": _to_python_float(placebo_p),
+                "rmspe_ratio": _to_python_float(getattr(r, "rmspe_ratio", None)),
+                "n_placebos": _to_python_scalar(n_placebos),
+                "n_failed": _to_python_scalar(getattr(r, "n_failed", None)),
+            }
+            # Distinguish a valid run from an attempted-but-infeasible one (J<2,
+            # treated-fit non-convergence, or zero successful donor refits) so BR/DR
+            # consumers see an explicit status/reason rather than a bare NaN p-value.
+            if n_placebos > 0 and np.isfinite(placebo_p):
+                block["status"] = "ran"
+            else:
+                block["status"] = "infeasible"
+                block["reason"] = (
+                    "in_space_placebo() was run but produced no valid reference set "
+                    "(fewer than 2 donors, a non-converged treated fit, or all donor "
+                    "refits failed); placebo_p_value is NaN."
+                )
+            out["in_space_placebo"] = block
+        else:
+            out["in_space_placebo"] = {
+                "status": "not_run",
+                "reason": (
+                    "Call results.in_space_placebo() to run in-space placebo "
+                    "permutation inference (opt-in; refits one synthetic control "
+                    "per donor)."
+                ),
+            }
+        return out
+
     # -- Heterogeneity helpers --------------------------------------------
 
     def _collect_effect_scalars(self) -> List[float]:
@@ -2504,6 +2639,31 @@ def _pt_synthetic_fit(self) -> Dict[str, Any]:
             "verdict": "design_enforced_pt",
         }
 
+    def _pt_scm_fit(self) -> Dict[str, Any]:
+        """Classic SCM donor-weighted pre-treatment-fit PT analogue.
+
+        Classic synthetic control (Abadie-Diamond-Hainmueller 2010) is not
+        parallel-trends-based: the donor weights are chosen to match the treated
+        unit's pre-period trajectory, so a small ``pre_rmspe`` is the
+        design-enforced substitute for a PT test. Unlike SDiD this reads
+        ``pre_rmspe`` (always populated on a successful fit). The in-space placebo
+        permutation inference lives in ``estimator_native_diagnostics``.
+        """
+        r = self._results
+        fit = _to_python_float(getattr(r, "pre_rmspe", None))
+        if fit is None:
+            return {
+                "status": "skipped",
+                "method": "scm_fit",
+                "reason": "SyntheticControlResults.pre_rmspe is not populated on this fit.",
+            }
+        return {
+            "status": "ran",
+            "method": "scm_fit",
+            "pre_treatment_fit_rmse": fit,
+            "verdict": "design_enforced_pt",
+        }
+
     def _pt_factor(self) -> Dict[str, Any]:
         """TROP has no PT concept — its identification is factor-model-based."""
         return {
@@ -3077,7 +3237,10 @@ def _check_headline(check: str, section: Dict[str, Any]) -> Optional[Any]:
     if check == "epv":
         return section.get("min_epv")
     if check == "estimator_native":
-        return section.get("pre_treatment_fit")
+        # SDiD reports ``pre_treatment_fit``; classic SCM reports ``pre_rmspe`` (its
+        # design-enforced pre-fit) — fall back so SCM's tabular headline is not None.
+        fit = section.get("pre_treatment_fit")
+        return fit if fit is not None else section.get("pre_rmspe")
     return None
 
 
@@ -3112,6 +3275,8 @@ def _pt_subject_phrase(method: Optional[str]) -> str:
         return "Pre-treatment event-study coefficients"
     if method == "synthetic_fit":
         return "The synthetic-control pre-treatment fit"
+    if method == "scm_fit":
+        return "The synthetic-control donor-weighted pre-treatment fit"
     if method == "factor":
         return "The factor-model pre-treatment fit"
     return "Pre-treatment data"
@@ -3139,7 +3304,7 @@ def _pt_stat_label(method: Optional[str]) -> Optional[str]:
         return "joint p"
     if method in {"slope_difference", "hausman"}:
         return "p"
-    if method in {"synthetic_fit", "factor"}:
+    if method in {"synthetic_fit", "scm_fit", "factor"}:
         return None
     return "joint p"
 
@@ -3299,15 +3464,30 @@ def _render_overall_interpretation(schema: Dict[str, Any], labels: Dict[str, str
                 )
         elif verdict == "design_enforced_pt":
             rmse = pt.get("pre_treatment_fit_rmse")
-            sentences.append(
-                f"The synthetic control matches the treated group's "
-                f"pre-period trajectory with RMSE = "
-                f"{rmse:.3g} (SDiD's design-enforced analogue of parallel "
-                f"trends)."
-                if isinstance(rmse, (int, float))
-                else "SDiD's synthetic control is designed to satisfy the "
-                "weighted parallel-trends analogue."
-            )
+            if pt.get("method") == "scm_fit":
+                # Classic SCM: a single treated UNIT, donor-weighted level match;
+                # significance is the in-space placebo, not SDiD's weighted PT.
+                sentences.append(
+                    f"The synthetic control reproduces the treated unit's "
+                    f"pre-period trajectory with pre-RMSPE = {rmse:.3g} (classic "
+                    f"SCM's donor-weighted, design-enforced analogue of parallel "
+                    f"trends; significance is assessed via in-space placebo "
+                    f"permutation, not a parallel-trends test)."
+                    if isinstance(rmse, (int, float))
+                    else "Classic SCM's donor-weighted synthetic control is designed "
+                    "to match the treated unit's pre-period trajectory; significance "
+                    "comes from in-space placebo permutation inference."
+                )
+            else:
+                sentences.append(
+                    f"The synthetic control matches the treated group's "
+                    f"pre-period trajectory with RMSE = "
+                    f"{rmse:.3g} (SDiD's design-enforced analogue of parallel "
+                    f"trends)."
+                    if isinstance(rmse, (int, float))
+                    else "SDiD's synthetic control is designed to satisfy the "
+                    "weighted parallel-trends analogue."
+                )
         elif verdict == "inconclusive":
             # Round-35 P1 CI review on PR #318: DR summary / overall
             # interpretation must surface the inconclusive state
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index e12801c4..b9a54e21 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -616,7 +616,7 @@ scm.fit(
 ) -> SyntheticControlResults
 ```
 
-**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are NaN. `att` is the mean post-period gap; use placebo/permutation inference for significance (planned follow-up). Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation).
+**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. `att` is the mean post-period gap. Significance via in-space placebo permutation inference: `results.in_space_placebo()` reassigns treatment to each donor, refits against the other J-1 donors (the real treated unit is excluded from every placebo pool), and sets `placebo_p_value = rank/(n_placebos+1)` from the post/pre RMSPE-ratio. The permutation `placebo_p_value` is a SEPARATE field from the (NaN) `p_value`; `is_significant` stays bound to `p_value`. Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation).
 
 **Usage:**
 
@@ -1278,7 +1278,10 @@ Returned by `SyntheticControl.fit()`.
 | Attribute | Type | Description |
 |-----------|------|-------------|
 | `att` | `float` | Mean post-period gap (reported point estimate) |
-| `se`, `t_stat`, `p_value`, `conf_int` | `float` / tuple | Always NaN — no analytical SE (permutation inference planned) |
+| `se`, `t_stat`, `p_value`, `conf_int` | `float` / tuple | Always NaN — no analytical SE (use `in_space_placebo()`) |
+| `placebo_p_value` | `float` | In-space placebo permutation p-value, `rank/(n_placebos+1)` (NaN until `in_space_placebo()` runs) |
+| `rmspe_ratio` | `float` | Treated unit's post/pre RMSPE ratio = sqrt(post-MSPE/pre-MSPE) (the placebo test statistic; set at fit) |
+| `n_placebos`, `n_failed` | `int` | Placebos in the reference set / excluded for non-convergence |
 | `n_obs` | `int` | Treated + donor rows over all periods |
 | `n_donors` | `int` | Donor units in the (post-filter) pool |
 | `n_pre_periods`, `n_post_periods` | `int` | Period counts |
@@ -1292,7 +1295,7 @@ Returned by `SyntheticControl.fit()`.
 | `pre_periods`, `post_periods` | `list` | Calendar-sorted periods |
 | `v_method`, `standardize` | `str` | Echoed configuration |
 
-**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()`
+**Methods:** `in_space_placebo()` (opt-in permutation inference; refits one synthetic control per donor), `get_placebo_df()` (per-unit RMSPE-ratio table incl. the treated row), `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()`
 
 ### TripleDifferenceResults
 
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index f56e63cd..5261b046 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -60,7 +60,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
 - [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`), both composed with the Wave D Gardner GMM correction; `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores at the meat-helper boundary (Wave E.3, R `svyrecvar(subset())` form) (replicate weights queued as follow-up)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
-- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, nested or custom predictor-importance V, gap path + pre-RMSPE; no inference in this release (inference fields are NaN — permutation/placebo planned)
+- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, nested or custom predictor-importance V, gap path + pre-RMSPE; no analytical SE (inference fields NaN), significance via in-space placebo permutation inference (`in_space_placebo()`, post/pre RMSPE-ratio, p = rank/(n_placebos+1))
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
 - [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html): de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲=0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲>0`, continuous-near-d̲ or mass-point), with multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release (repeated cross-sections rejected by the validator). Alias `HAD`.
diff --git a/diff_diff/practitioner.py b/diff_diff/practitioner.py
index 128f378a..b0d6811f 100644
--- a/diff_diff/practitioner.py
+++ b/diff_diff/practitioner.py
@@ -20,6 +20,7 @@
     "estimator_selection",
     "estimation",
     "sensitivity",
+    "placebo",
     "heterogeneity",
     "robustness",
 }
@@ -37,6 +38,7 @@
     "StackedDiDResults": "StackedDiD",
     "SyntheticDiDResults": "SyntheticDiD",
     "TROPResults": "TROP",
+    "SyntheticControlResults": "SyntheticControl",
     "EfficientDiDResults": "EfficientDiD",
     "ContinuousDiDResults": "ContinuousDiD",
     "TripleDifferenceResults": "TripleDifference (DDD)",
@@ -71,7 +73,7 @@ def practitioner_next_steps(
         Steps the caller has already completed. Valid names:
         ``"target_parameter"``, ``"assumptions"``, ``"parallel_trends"``,
         ``"estimator_selection"``, ``"estimation"``, ``"sensitivity"``,
-        ``"heterogeneity"``, ``"robustness"``.
+        ``"placebo"``, ``"heterogeneity"``, ``"robustness"``.
     verbose : bool, default True
         If True, print a human-readable summary to stdout.
 
@@ -656,6 +658,72 @@ def _handle_trop(results: Any):
     return steps, warnings
 
 
+def _handle_synthetic_control(results: Any):
+    steps = [
+        _step(
+            baker_step=6,
+            label="In-space placebo permutation inference",
+            why=(
+                "Classic SCM has no analytical standard error. Significance "
+                "comes from the in-space placebo test (Abadie-Diamond-Hainmueller "
+                "2010, Section 2.4): reassign treatment to each donor, refit, and "
+                "rank the treated unit's post/pre RMSPE ratio "
+                "(p = rank/(n_placebos+1), excluding non-converged placebos)."
+            ),
+            code=(
+                "placebo_df = results.in_space_placebo()\n"
+                "print(f'placebo p-value: {results.placebo_p_value:.3f} "
+                "(n={results.n_placebos})')\n"
+                "print(placebo_df)  # per-unit RMSPE-ratio table used for the rank"
+            ),
+            priority="high",
+            # SCM's significance test IS the placebo; tag it "placebo" (not
+            # "sensitivity") so it survives once the native diagnostics block runs,
+            # mirroring _handle_trop.
+            step_name="placebo",
+        ),
+        _step(
+            baker_step=3,
+            label="Demonstrate pre-treatment fit (SCM identification)",
+            why=(
+                "SCM's identifying assumption is design-enforced fit, not a "
+                "parallel-trends test: it is only credible when the synthetic "
+                "control reproduces the treated unit's pre-period path. Report "
+                "the pre-RMSPE and predictor-balance table; a poor fit means do "
+                "not use SCM (ADH 2010 p. 495)."
+            ),
+            code=(
+                "print(f'pre-treatment RMSPE: {results.pre_rmspe:.4f}')\n"
+                "print(results.predictor_balance)\n"
+                "print(results.get_weights_df())  # donor weight concentration"
+            ),
+            priority="high",
+            # Design-enforced fit IS SCM's parallel-trends analogue (mirrors the
+            # DiagnosticReport ``scm_fit`` PT routing); tagging it "parallel_trends"
+            # keeps it from being auto-suppressed as the completed estimation step.
+            step_name="parallel_trends",
+        ),
+        _step(
+            baker_step=4,
+            label="Curate the donor pool",
+            why=(
+                "Donors exposed to the same/similar intervention or to large "
+                "confounding shocks contaminate the comparison (ADH 2010 "
+                "pp. 498-499). Restrict the donor pool to clean, comparable units."
+            ),
+            code=(
+                "# Exclude contaminated donors explicitly:\n"
+                "# synthetic_control(..., donor_pool=[clean, comparable, units])"
+            ),
+            priority="medium",
+            step_name="estimator_selection",
+        ),
+        _robustness_compare_step("SyntheticDiD or CS"),
+    ]
+    warnings = _check_nan_att(results)
+    return steps, warnings
+
+
 def _handle_efficient(results: Any):
     steps = [
         _parallel_trends_step(staggered=True),
@@ -1301,6 +1369,7 @@ def _handle_generic(results: Any):
     "StackedDiDResults": _handle_stacked,
     "SyntheticDiDResults": _handle_synthetic,
     "TROPResults": _handle_trop,
+    "SyntheticControlResults": _handle_synthetic_control,
     "EfficientDiDResults": _handle_efficient,
     "ContinuousDiDResults": _handle_continuous,
     "TripleDifferenceResults": _handle_triple,
diff --git a/diff_diff/synthetic_control.py b/diff_diff/synthetic_control.py
index 8f349124..164568a2 100644
--- a/diff_diff/synthetic_control.py
+++ b/diff_diff/synthetic_control.py
@@ -17,9 +17,11 @@
 time weights and ridge regularization: classic SCM uses **donor weights only** and a
 level-matching estimator, plus the outer ``V`` search SyntheticDiD has no analog for.
 
-Inference: classic SCM has **no analytical standard error** — the paper proposes
-permutation/placebo inference (a later PR). ``se``/``t_stat``/``p_value``/``conf_int``
-are always NaN here; ``att`` (mean post-period gap) is the reported estimate.
+Inference: classic SCM has **no analytical standard error**, so
+``se``/``t_stat``/``p_value``/``conf_int`` are always NaN; ``att`` (mean post-period
+gap) is the reported estimate. Significance comes from in-space placebo permutation
+inference (ADH 2010 §2.4) via ``SyntheticControlResults.in_space_placebo()`` — a
+separate ``placebo_p_value`` field, distinct from the NaN analytical ``p_value``.
 
 Numerics provenance: the standardization divisor and the inner/outer optimization
 scheme are NOT specified in ADH (2010) — they are pinned from the R ``Synth`` package
@@ -35,7 +37,10 @@
 import pandas as pd
 from scipy.optimize import minimize
 
-from diff_diff.synthetic_control_results import SyntheticControlResults
+from diff_diff.synthetic_control_results import (
+    SyntheticControlResults,
+    _SyntheticControlFitSnapshot,
+)
 from diff_diff.utils import _sc_weight_fw, safe_inference, warn_if_not_converged
 
 __all__ = ["SyntheticControl", "synthetic_control"]
@@ -389,6 +394,9 @@ def fit(
         # mspe_v is the OUTER-objective value; it is populated only when a nested V
         # search actually ran (None on the custom and single-donor paths).
         mspe_v: Optional[float] = None
+        # ``outer_converged`` tracks whether the nested V search reached an optimum
+        # (trivially True when there is no outer search: custom V or a single donor).
+        outer_converged = True
         if self.v_method == "custom":
             v = self._prepare_custom_v(k)
             w, converged = _inner_solve_W(X1s, X0s, v, self.inner_max_iter, self.inner_min_decrease)
@@ -405,7 +413,7 @@ def fit(
             w = np.array([1.0])
             converged = True
         else:
-            v, w, converged, mspe_v = _outer_solve_V(
+            v, w, converged, mspe_v, outer_converged = _outer_solve_V(
                 X1,
                 X0,
                 X1s,
@@ -426,6 +434,12 @@ def fit(
         pre_rmspe = float(np.sqrt(np.mean(pre_gaps**2)))
         att = float(np.mean(post_gaps))
 
+        # Treated unit's post/pre RMSPE ratio — the in-space placebo test statistic
+        # (ADH 2010 §2.4). Cheap to compute now; the placebo reference distribution
+        # is built on demand by SyntheticControlResults.in_space_placebo().
+        treated_scale = float(np.max(np.abs(Z1))) if Z1.size else 0.0
+        rmspe_ratio = _rmspe_ratio(pre_gaps, post_gaps, treated_scale)
+
         # Poor-fit warning (REGISTRY contract: warn when pre-RMSPE exceeds the SD of
         # the treated unit's pre-period outcomes). This includes a FLAT treated pre-path
         # (pre_sd == 0): any non-trivial RMSPE then means the synthetic cannot reproduce
@@ -490,12 +504,45 @@ def fit(
             standardize=self.standardize,
             alpha=self.alpha,
             mspe_v=mspe_v,
+            rmspe_ratio=rmspe_ratio,
+        )
+        # Retain the panel state needed to refit each donor as a pseudo-treated
+        # unit for in-space placebo inference (ADH 2010 §2.4). Stored as a plain
+        # attribute (not a dataclass field) and excluded from pickling via
+        # SyntheticControlResults.__getstate__ (it holds the full panel).
+        # COPY all caller-owned mutable inputs (the custom_v array, the
+        # optimizer_options dict, the specs list) so post-fit mutation of the
+        # estimator's inputs cannot silently change in_space_placebo() output on an
+        # already-returned results object. (pivots are freshly pivoted here, and the
+        # period/id lists are re-listed below, so those are not caller-aliased.)
+        results._fit_snapshot = _SyntheticControlFitSnapshot(
+            pivots=pivots,
+            specs=list(specs),
+            outcome=outcome,
+            all_periods=list(all_periods),
+            pre_periods=list(pre_periods),
+            post_periods=list(post_periods),
+            donor_ids=list(donor_ids),
+            treated_id=treated_id,
+            standardize=self.standardize,
+            v_method=self.v_method,
+            custom_v=(
+                None if self.custom_v is None else np.array(self.custom_v, dtype=float, copy=True)
+            ),
+            n_starts=self.n_starts,
+            seed=self.seed,
+            optimizer_options=(
+                None if self.optimizer_options is None else dict(self.optimizer_options)
+            ),
+            inner_max_iter=self.inner_max_iter,
+            inner_min_decrease=self.inner_min_decrease,
         )
-        # Reserved for PR-2 (placebo inference) / PR-3 (conformal). Set as plain
-        # attributes so dataclasses.asdict/fields cannot reach them.
-        results._placebo_gaps = None
-        results._rmspe_ratio = None
-        results._fit_snapshot = None
+        # Persist whether the treated unit's own fit reached a valid optimum — both
+        # the inner Frank-Wolfe weight solve AND (on the nested path) the outer V
+        # search. in_space_placebo() fails closed otherwise: ranking a statistic from
+        # a truncated / under-optimized treated fit would not be a valid ADH 2010
+        # §2.4 permutation (and an under-optimized V is anti-conservative).
+        results._fit_converged = bool(converged and outer_converged)
 
         self.results_ = results
         self.is_fitted_ = True
@@ -1022,17 +1069,20 @@ def _outer_solve_V(
     optimizer_options: Optional[Dict[str, Any]],
     inner_max_iter: int,
     inner_min_decrease: float,
-) -> Tuple[np.ndarray, np.ndarray, bool, float]:
+) -> Tuple[np.ndarray, np.ndarray, bool, float, bool]:
     """Data-driven V selection: minimize pre-period outcome MSPE of W*(V).
 
     Multistart Nelder-Mead over ``theta`` (V = softmax(theta)) plus a derivative-free
-    Powell polish from the best point. Returns ``(v_star, w_star, converged, mspe)``.
+    Powell polish from the best point. Returns
+    ``(v_star, w_star, inner_converged, mspe, outer_converged)``.
     """
     k = X1s.shape[0]
     if k == 1:
+        # Single predictor: V is fixed (no outer search), so outer convergence is
+        # trivially satisfied.
         v = np.array([1.0])
         w, converged = _inner_solve_W(X1s, X0s, v, inner_max_iter, inner_min_decrease)
-        return v, w, converged, float(np.mean((Z1 - Z0 @ w) ** 2))
+        return v, w, converged, float(np.mean((Z1 - Z0 @ w) ** 2)), True
 
     # Track inner Frank-Wolfe non-convergence across ALL intermediate evaluations so
     # the outer search cannot silently rank truncated W*(V) solves (codex). `_inner_solve_W`
@@ -1077,26 +1127,41 @@ def objective(theta: np.ndarray) -> float:
     _st["total"] += start_total
     _st["nonconv"] += start_nonconv
 
+    # Track convergence of the SELECTED (lowest-objective) incumbent, NOT "any start
+    # converged" — a non-winning start's success says nothing about whether the V we
+    # actually return is a valid optimum. ``best_success`` follows ``best_x``.
     best_x: np.ndarray = starts[0]
     best_fun = np.inf
-    outer_converged = False
+    best_success = False
     for theta0 in starts:
         res = minimize(objective, theta0, method="Nelder-Mead", options=nm_options)
-        outer_converged = outer_converged or bool(res.success)
         if res.fun < best_fun:
             best_fun = float(res.fun)
             best_x = res.x
+            best_success = bool(res.success)
 
-    # Derivative-free polish from the incumbent (best-of, mirrors R optimx).
+    # Derivative-free polish from the incumbent (best-of, mirrors R optimx). If the
+    # polish improves on the incumbent it becomes the selected solution (carry its
+    # success); if it does not improve but converged AT the incumbent's objective
+    # level, that validates the incumbent as an optimum.
     res_p = minimize(objective, best_x, method="Powell", options=powell_options)
-    outer_converged = outer_converged or bool(res_p.success)
     if res_p.fun < best_fun:
         best_fun = float(res_p.fun)
         best_x = res_p.x
-
-    # Surface a silent under-optimized V: if neither the multistart Nelder-Mead nor
-    # the Powell polish reported success (e.g. optimizer_options={"maxiter": 1}), the
-    # selected V / donor weights / ATT may be sub-optimal.
+        best_success = bool(res_p.success)
+    elif bool(res_p.success) and np.isclose(res_p.fun, best_fun, rtol=1e-5, atol=1e-8):
+        # Powell did not improve but converged back AT the incumbent (same objective
+        # within tolerance) -> the incumbent is a validated optimum. A Powell run that
+        # "succeeds" at a STRICTLY WORSE objective ended elsewhere and says nothing
+        # about the selected incumbent's validity, so it must NOT flip best_success
+        # (which would silently admit an under-optimized V from a success=False start).
+        best_success = True
+    outer_converged = best_success
+
+    # Surface a silent under-optimized V: if the SELECTED outer solution did not
+    # converge (e.g. optimizer_options={"maxiter": 1}, or the lowest-objective
+    # incumbent came from a truncated run), the selected V / donor weights / ATT may
+    # be sub-optimal.
     if not outer_converged:
         warnings.warn(
             "Outer V-search (Nelder-Mead / Powell) did not converge; the selected "
@@ -1125,7 +1190,7 @@ def objective(theta: np.ndarray) -> float:
     v_star = _softmax(best_x)
     w_star, converged = _inner_solve_W(X1s, X0s, v_star, inner_max_iter, inner_min_decrease)
     mspe = float(np.mean((Z1 - Z0 @ w_star) ** 2))
-    return v_star, w_star, converged, mspe
+    return v_star, w_star, converged, mspe, outer_converged
 
 
 def _compute_gap_path(
@@ -1141,3 +1206,107 @@ def _compute_gap_path(
     synthetic = donor_block @ w
     gaps = treated_series - synthetic
     return {period: float(g) for period, g in zip(all_periods, gaps)}
+
+
+# =============================================================================
+# in-space placebo permutation inference (ADH 2010 §2.4) — used by
+# SyntheticControlResults.in_space_placebo() via function-level import
+# =============================================================================
+
+
+def _mspe(gap_path: Dict[Any, float], periods: List[Any]) -> float:
+    """Mean squared prediction error of ``gap_path`` over ``periods``."""
+    if not periods:
+        return float("nan")
+    g = np.array([gap_path[p] for p in periods], dtype=float)
+    return float(np.mean(g**2))
+
+
+def _rmspe_ratio(pre_gaps: np.ndarray, post_gaps: np.ndarray, scale: float) -> float:
+    """Post/pre RMSPE ratio — the in-space placebo test statistic (ADH 2010 §2.4).
+
+    Returns ``RMSPE_post / RMSPE_pre = sqrt(MSPE_post / MSPE_pre)`` (the root-scale
+    ratio, matching the ``rmspe_ratio`` name). ADH 2010 §3.4 reports the *MSPE*
+    ratio (the square of this); the two are monotone-equivalent on nonnegative
+    values, so the permutation rank and p-value are identical either way — only the
+    reported statistic's scale differs.
+
+    The pre-period MSPE denominator is floored at a scale-aware
+    ``1e-8 * max(scale, 1)**2`` (squared-outcome units) BEFORE the square root, so a
+    (near-)perfect pre-treatment fit (pre-MSPE → 0) yields a large-but-FINITE ratio
+    rather than ``inf``/``nan`` (which would corrupt the permutation rank).
+    ``scale`` is the magnitude of the unit's pre-period outcomes. Mirrors the
+    ``_fit_tol`` poor-fit guard in ``fit()``.
+    """
+    pre_mspe = float(np.mean(pre_gaps**2)) if pre_gaps.size else float("nan")
+    post_mspe = float(np.mean(post_gaps**2)) if post_gaps.size else float("nan")
+    floor = 1e-8 * max(float(scale), 1.0) ** 2
+    return float(np.sqrt(post_mspe / max(pre_mspe, floor)))
+
+
+def _placebo_fit_unit(
+    snap: _SyntheticControlFitSnapshot,
+    unit: Any,
+    donor_pool: List[Any],
+    n_starts: int,
+) -> Optional[Tuple[Dict[Any, float], float]]:
+    """Refit a synthetic control for one (pseudo-)treated ``unit`` vs ``donor_pool``.
+
+    Reuses the exact predictor build / standardization / weight solve / gap-path
+    path as ``fit()`` — none of those helpers reads estimator (``self``) state, so
+    the refit is driven entirely by the snapshot. The real treated unit is never in
+    ``donor_pool`` (the caller passes the other ``J−1`` donors). Returns
+    ``(gap_path, rmspe_ratio)``, or ``None`` when the weight solve does not converge
+    (the caller excludes such placebos from the permutation reference set).
+    Per-placebo ``UserWarning``s (poor fit, zero-variance row, non-convergence) are
+    suppressed here; the caller surfaces an aggregate count.
+    """
+    X1, X0, _ = _build_predictor_matrix(snap.pivots, snap.specs, unit, donor_pool)
+    # Belt-and-suspenders: fit() already gated non-finite outcomes over the full
+    # treated+donor panel, so a donor reassigned as pseudo-treated has finite cells.
+    if not (np.all(np.isfinite(X1)) and np.all(np.isfinite(X0))):
+        return None
+    X1s, X0s, _ = _standardize(X1, X0, snap.standardize)
+    Y = snap.pivots[snap.outcome]
+    Z1 = Y.loc[snap.pre_periods, unit].to_numpy(dtype=float)
+    Z0 = Y.loc[snap.pre_periods, donor_pool].to_numpy(dtype=float)
+    # ``outer_converged`` is trivially True when there is no outer V search (custom
+    # V or a single-donor degenerate pool).
+    outer_converged = True
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        if snap.v_method == "custom":
+            # custom_v length == k (= len(specs)) is unit-agnostic, so it stays
+            # valid for every placebo; it was finiteness/non-negativity-checked at
+            # construction. Just trace-normalize (matches _prepare_custom_v).
+            v = np.asarray(snap.custom_v, dtype=float).ravel()
+            v = v / v.sum()
+            w, converged = _inner_solve_W(X1s, X0s, v, snap.inner_max_iter, snap.inner_min_decrease)
+        elif len(donor_pool) == 1:
+            # Degenerate: a single donor forces w = [1.0]; V is irrelevant.
+            w, converged = np.array([1.0]), True
+        else:
+            _, w, converged, _, outer_converged = _outer_solve_V(
+                X1,
+                X0,
+                X1s,
+                X0s,
+                Z1,
+                Z0,
+                n_starts,
+                snap.seed,
+                snap.optimizer_options,
+                snap.inner_max_iter,
+                snap.inner_min_decrease,
+            )
+    # Exclude a placebo whose fit is not a valid optimum — a truncated inner W OR an
+    # under-optimized outer V search. An under-optimized placebo V fits the pre-period
+    # worse, shrinking its RMSPE ratio and biasing the permutation p-value
+    # anti-conservatively, so such placebos must not silently enter the rank.
+    if not (converged and outer_converged):
+        return None
+    gap_path = _compute_gap_path(Y, w, unit, donor_pool, snap.all_periods)
+    pre_gaps = np.array([gap_path[p] for p in snap.pre_periods], dtype=float)
+    post_gaps = np.array([gap_path[p] for p in snap.post_periods], dtype=float)
+    scale = float(np.max(np.abs(Z1))) if Z1.size else 0.0
+    return gap_path, _rmspe_ratio(pre_gaps, post_gaps, scale)
diff --git a/diff_diff/synthetic_control_results.py b/diff_diff/synthetic_control_results.py
index e36c8352..dff829f0 100644
--- a/diff_diff/synthetic_control_results.py
+++ b/diff_diff/synthetic_control_results.py
@@ -5,12 +5,15 @@
 ``synthetic_control.py`` to mirror the TROP estimator/results split.
 
 The classic synthetic control of Abadie, Diamond & Hainmueller (2010) produces a
-gap path and donor/predictor weights but **no analytical standard error** — the
-paper proposes permutation/placebo inference instead (a later PR). Accordingly
-``se``/``t_stat``/``p_value``/``conf_int`` are always NaN on this object; the
-point estimate ``att`` (average post-period gap) is the reported quantity.
+gap path and donor/predictor weights but **no analytical standard error**.
+Accordingly ``se``/``t_stat``/``p_value``/``conf_int`` are always NaN on this
+object; the point estimate ``att`` (average post-period gap) is the reported
+quantity. Significance comes from in-space placebo permutation inference via
+:meth:`SyntheticControlResults.in_space_placebo` (a separate ``placebo_p_value``
+field, not the NaN ``p_value``).
 """
 
+import warnings
 from dataclasses import dataclass, field
 from typing import Any, Dict, List, Optional, Tuple
 
@@ -22,6 +25,40 @@
 __all__ = ["SyntheticControlResults"]
 
 
+@dataclass
+class _SyntheticControlFitSnapshot:
+    """Panel state retained for post-hoc in-space placebo refits.
+
+    Holds everything ``SyntheticControlResults.in_space_placebo()`` needs to
+    refit ANY donor as the pseudo-treated unit without re-reading the original
+    DataFrame. Built in ``SyntheticControl.fit()`` and excluded from pickling by
+    ``SyntheticControlResults.__getstate__`` (it retains the full treated+donor
+    outcome/predictor panel — a privacy/size hazard if serialized).
+
+    ``specs`` is annotated ``List[Any]`` rather than ``List[_PredictorSpec]`` to
+    avoid an import cycle (``_PredictorSpec`` lives in ``synthetic_control.py``,
+    which imports this module). ``donor_ids`` is an ORDERED list so the placebo
+    iteration order — and therefore the rank / p-value — is deterministic.
+    """
+
+    pivots: Dict[str, pd.DataFrame]
+    specs: List[Any]
+    outcome: str
+    all_periods: List[Any]
+    pre_periods: List[Any]
+    post_periods: List[Any]
+    donor_ids: List[Any]
+    treated_id: Any
+    standardize: str
+    v_method: str
+    custom_v: Optional[Any]
+    n_starts: int
+    seed: Optional[int]
+    optimizer_options: Optional[Dict[str, Any]]
+    inner_max_iter: int
+    inner_min_decrease: float
+
+
 @dataclass
 class SyntheticControlResults:
     """
@@ -82,8 +119,22 @@ class SyntheticControlResults:
         ``"std"`` (per-row SD scaling) or ``"none"``.
     alpha : float
         Significance level recorded for downstream (placebo) inference.
+    rmspe_ratio : float
+        The treated unit's post/pre RMSPE ratio = ``sqrt(MSPE_post / MSPE_pre)`` —
+        the in-space placebo test statistic (ADH 2010 §2.4), computed at fit time.
+    placebo_p_value : float
+        In-space placebo permutation p-value (``rank / (n_placebos + 1)``), NaN
+        until :meth:`in_space_placebo` is run. SEPARATE from the (always-NaN)
+        analytical ``p_value``; ``is_significant`` stays bound to ``p_value``.
+    n_placebos, n_failed : int
+        Donor placebos that entered the permutation reference set / were excluded
+        for non-convergence. Both 0 until :meth:`in_space_placebo` is run.
     survey_metadata : Any, optional
         Reserved; always None in this release.
+
+    Significance for classic SCM comes from :meth:`in_space_placebo` (opt-in
+    in-space placebo permutation inference); :meth:`get_placebo_df` returns the
+    per-unit RMSPE-ratio table used for the rank.
     """
 
     att: float
@@ -108,13 +159,49 @@ class SyntheticControlResults:
     alpha: float = 0.05
     mspe_v: Optional[float] = None
     survey_metadata: Optional[Any] = field(default=None)
-
-    # Reserved for PR-2 (placebo inference) / PR-3 (conformal). These are plain
-    # (un-annotated) class attributes, NOT dataclass fields, so dataclasses.fields()
-    # and dataclasses.asdict() cannot reach them; fit() sets them per instance.
-    _placebo_gaps = None
-    _rmspe_ratio = None
-    _fit_snapshot = None
+    # In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010
+    # Section 2.4), populated by ``in_space_placebo()``. ``rmspe_ratio`` (the
+    # treated unit's post/pre RMSPE ratio) is computed at fit time; the rest stay
+    # at their no-inference defaults until a placebo run. NOTE: the permutation
+    # ``placebo_p_value`` is deliberately SEPARATE from ``p_value`` (which stays
+    # NaN) — it is not an analytical p-value, has no SE / t-stat, and does not
+    # flow through ``safe_inference``. ``is_significant`` likewise stays bound to
+    # the (NaN) ``p_value``, NOT ``placebo_p_value``.
+    placebo_p_value: float = np.nan
+    rmspe_ratio: float = np.nan
+    n_placebos: int = 0
+    n_failed: int = 0
+
+    def __post_init__(self) -> None:
+        # Internal state set per instance by ``fit()`` / ``in_space_placebo()``.
+        # Declared here (not as dataclass fields) so ``dataclasses.fields()`` /
+        # ``dataclasses.asdict()`` cannot reach the retained panel state.
+        # ``_fit_snapshot`` (full panel) and ``_placebo_gaps`` (per-unit gap paths)
+        # are panel-derived and nulled on pickle by ``__getstate__``; ``_placebo_df``
+        # holds the small per-unit aggregate table returned by ``get_placebo_df()``.
+        self._fit_snapshot: Optional[_SyntheticControlFitSnapshot] = None
+        self._placebo_gaps: Optional[Dict[Any, Dict[Any, float]]] = None
+        self._placebo_df: Optional[pd.DataFrame] = None
+        # Whether the treated unit's own inner Frank-Wolfe weight solve converged.
+        # in_space_placebo() fails closed when this is False: a truncated treated
+        # fit makes the ranked statistic (rmspe_ratio) not a valid SCM optimum.
+        self._fit_converged: bool = True
+
+    def __getstate__(self) -> Dict[str, Any]:
+        """Exclude panel-derived internal state from pickling.
+
+        ``_fit_snapshot`` retains the full treated+donor panel and ``_placebo_gaps``
+        the per-unit gap paths — both panel-derived, a privacy/size hazard if the
+        pickle is sent elsewhere. The scalar placebo fields (``placebo_p_value``,
+        ``rmspe_ratio``, ``n_placebos``, ``n_failed``) and the small ``_placebo_df``
+        aggregate table survive. An unpickled result keeps all public fields; a
+        diagnostic call that needs the snapshot (``in_space_placebo``) then raises a
+        ValueError directing the user to re-fit. Mirrors ``SyntheticDiDResults``.
+        """
+        state = self.__dict__.copy()
+        state["_fit_snapshot"] = None
+        state["_placebo_gaps"] = None
+        return state
 
     def __repr__(self) -> str:
         """Concise string representation."""
@@ -209,12 +296,51 @@ def summary(self, alpha: Optional[float] = None) -> str:
                 f"{'ATT (avg gap)':<15} {self.att:>12.4f} {'n/a':>12} " f"{'n/a':>10} {'n/a':>10}",
                 "-" * 75,
                 "",
-                "Inference: classic SCM has no analytical standard error.",
-                "Use permutation / placebo inference for significance testing",
-                "(Abadie-Diamond-Hainmueller 2010, Section 2.4).",
-                "=" * 75,
             ]
         )
+        # Three states: (1) placebo never run -> point to in_space_placebo();
+        # (2) run with a valid reference set -> show the permutation p-value;
+        # (3) run but infeasible (no placebo entered the rank, e.g. J<2 or all
+        # donors failed) -> say so explicitly rather than implying it was not run.
+        # ``_placebo_df is not None`` is the "attempted" signal (survives pickling).
+        placebo_attempted = self._placebo_df is not None
+        if placebo_attempted and np.isfinite(self.placebo_p_value):
+            # The classic analytical fields above stay n/a (no SE); this is the
+            # permutation p-value of the post/pre RMSPE ratio, p = rank/(n_placebos+1).
+            lines.extend(
+                [
+                    "In-space placebo permutation inference "
+                    "(Abadie-Diamond-Hainmueller 2010, Section 2.4):",
+                    f"{'  RMSPE ratio (post/pre):':<34} {self.rmspe_ratio:>10.4f}",
+                    f"{'  Permutation p-value:':<34} {self.placebo_p_value:>10.4f}",
+                    f"{'  Placebos in reference set:':<34} {self.n_placebos:>10d}"
+                    + (f"  ({self.n_failed} excluded)" if self.n_failed else ""),
+                    "",
+                    "(Analytical SE is still undefined for classic SCM; the "
+                    "p-value above is permutation-based.)",
+                    "=" * 75,
+                ]
+            )
+        elif placebo_attempted:
+            lines.extend(
+                [
+                    "In-space placebo permutation inference was attempted but "
+                    "produced no valid reference set",
+                    f"(0 placebos entered the rank; {self.n_failed} failed to "
+                    "converge). placebo_p_value is undefined — too few donors or "
+                    "all donor refits failed. Inspect get_placebo_df().",
+                    "=" * 75,
+                ]
+            )
+        else:
+            lines.extend(
+                [
+                    "Inference: classic SCM has no analytical standard error.",
+                    "Run in_space_placebo() for in-space permutation inference",
+                    "(Abadie-Diamond-Hainmueller 2010, Section 2.4).",
+                    "=" * 75,
+                ]
+            )
 
         return "\n".join(lines)
 
@@ -248,6 +374,13 @@ def to_dict(self) -> Dict[str, Any]:
             "treated_unit": self.treated_unit,
             "v_method": self.v_method,
             "standardize": self.standardize,
+            # In-space placebo permutation inference. rmspe_ratio is set at fit;
+            # placebo_p_value / n_placebos / n_failed stay at their no-inference
+            # defaults (NaN / 0) until in_space_placebo() runs.
+            "rmspe_ratio": self.rmspe_ratio,
+            "placebo_p_value": self.placebo_p_value,
+            "n_placebos": self.n_placebos,
+            "n_failed": self.n_failed,
         }
         if self.survey_metadata is not None:
             sm = self.survey_metadata
@@ -293,3 +426,265 @@ def get_weights_df(self) -> pd.DataFrame:
             [{"unit": unit, "weight": w} for unit, w in items],
             columns=["unit", "weight"],
         )
+
+    _PLACEBO_COLS = ["unit", "pre_mspe", "post_mspe", "rmspe_ratio", "is_treated", "status"]
+
+    def get_placebo_df(self) -> pd.DataFrame:
+        """
+        Get the in-space placebo distribution as a DataFrame (one row per unit).
+
+        This is a per-unit SUMMARY table (one row per unit), enough to reproduce
+        the permutation rank and a ratio-distribution plot — NOT the per-period
+        placebo gap paths needed for the classic "spaghetti" plot (those are
+        retained internally on ``_placebo_gaps`` for the successful placebos).
+        Columns: ``unit``, ``pre_mspe``, ``post_mspe``, ``rmspe_ratio``,
+        ``is_treated``, ``status`` (``"treated"`` / ``"placebo"`` / ``"failed"``).
+        The treated unit is always present as a single ``is_treated=True,
+        status="treated"`` row (its ratio is the original J-donor fit). After a
+        placebo run **that produced a reference set** (``>= 2`` donors AND a
+        converged treated fit), the table has ``n_donors + 1`` rows — every donor
+        appears, including those whose refit did not converge (``status="failed"``
+        with NaN metrics, excluded from the rank). In the degenerate / fail-closed
+        cases (fewer than 2 donors, or a treated fit that did not converge) the
+        placebo loop does not run, so only the treated row is returned.
+
+        Populated by :meth:`in_space_placebo`; the summary table is retained on
+        pickling, so it is still returned after a round-trip. Before any placebo
+        run — including on an unpickled result that never ran one — only the
+        treated row is returned.
+
+        Returns
+        -------
+        pandas.DataFrame
+        """
+        if self._placebo_df is not None:
+            return self._placebo_df.copy()
+        from diff_diff.synthetic_control import _mspe
+
+        pre = _mspe(self.gap_path, self.pre_periods)
+        post = _mspe(self.gap_path, self.post_periods)
+        return pd.DataFrame(
+            [
+                {
+                    "unit": self.treated_unit,
+                    "pre_mspe": pre,
+                    "post_mspe": post,
+                    "rmspe_ratio": self.rmspe_ratio,
+                    "is_treated": True,
+                    "status": "treated",
+                }
+            ],
+            columns=self._PLACEBO_COLS,
+        )
+
+    def in_space_placebo(
+        self,
+        n_starts: Optional[int] = None,
+    ) -> pd.DataFrame:
+        """
+        In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010,
+        Section 2.4).
+
+        Reassigns the treatment to each donor in turn, re-estimates a synthetic
+        control for that pseudo-treated donor against the OTHER donors, and ranks
+        the real treated unit's post/pre RMSPE ratio among all units. Populates
+        ``placebo_p_value``, ``n_placebos`` and ``n_failed`` on this object
+        (``rmspe_ratio`` — the treated unit's own ratio — is set at fit time) and
+        returns the placebo distribution via :meth:`get_placebo_df`.
+
+        The real treated unit is **excluded from every placebo's donor pool**: its
+        post-period outcome is treatment-contaminated, so allowing a placebo to
+        load weight on it would bias the placebo gap. The ranking set is therefore
+        the ``J+1`` units ``{treated} ∪ {J placebos}``, with each placebo fit
+        against the other ``J-1`` donors (this matches the standard
+        ``SCtools::generate.placebos`` construction). The post/pre RMSPE ratio
+        normalizes by pre-treatment fit, which obviates the pre-fit-cutoff
+        filtering of ADH Figures 5-7 (journal p. 502), so no pre-fit filter is
+        offered — every converged placebo enters the rank.
+
+        The permutation ``placebo_p_value`` is intentionally distinct from
+        ``p_value`` (which stays NaN — classic SCM has no analytical SE) and from
+        ``is_significant`` (which also stays bound to the NaN ``p_value``).
+
+        A placebo is **excluded** from the reference set (counted in ``n_failed``)
+        when its fit is not a valid optimum — EITHER its inner Frank-Wolfe weight
+        solve did not converge (a truncated ``W`` is unusable) OR its outer ``V``
+        search did not converge (an under-optimized ``V`` fits the pre-period worse,
+        shrinking its RMSPE ratio and biasing the permutation p-value
+        anti-conservatively). Each placebo refit **inherits the original fit's
+        ``optimizer_options`` / ``n_starts``**, so valid inference requires settings
+        adequate for the outer ``V`` search to converge: production defaults do;
+        with cheap settings, raise ``n_starts`` here or re-fit with a larger
+        ``optimizer_options['maxiter']`` (otherwise placebos are dropped as failed).
+        The treated unit's own fit is held to the same standard — if its inner OR
+        outer search did not converge, the whole run fails closed (see below).
+
+        Parameters
+        ----------
+        n_starts : int, optional
+            Override the multistart count for each placebo's nested V search.
+            Default None inherits the original fit's ``n_starts``. The placebo
+            loop is the cost driver (one outer V search per donor); lower it for a
+            faster, coarser scan.
+
+        Returns
+        -------
+        pandas.DataFrame
+            The placebo distribution (see :meth:`get_placebo_df`).
+
+        Raises
+        ------
+        ValueError
+            If the fit snapshot is unavailable (e.g. this result was unpickled).
+        """
+        if self._fit_snapshot is None:
+            raise ValueError(
+                "in_space_placebo() requires the fit snapshot on the results "
+                "object. This result appears to have been loaded from "
+                "serialization (which excludes the snapshot) or produced by an "
+                "older estimator version. Re-fit to enable in-space placebo "
+                "inference."
+            )
+        from diff_diff.synthetic_control import _mspe, _placebo_fit_unit
+
+        snap = self._fit_snapshot
+        donors = list(snap.donor_ids)
+        n_donors = len(donors)
+        n_starts_eff = snap.n_starts if n_starts is None else int(n_starts)
+
+        treated_pre = _mspe(self.gap_path, snap.pre_periods)
+        treated_post = _mspe(self.gap_path, snap.post_periods)
+        treated_ratio = self.rmspe_ratio
+
+        rows: List[Dict[str, Any]] = [
+            {
+                "unit": snap.treated_id,
+                "pre_mspe": treated_pre,
+                "post_mspe": treated_post,
+                "rmspe_ratio": treated_ratio,
+                "is_treated": True,
+                "status": "treated",
+            }
+        ]
+
+        # Fail closed when the treated unit's OWN fit did not converge at fit time
+        # (inner Frank-Wolfe weight solve OR outer V search): ranking a statistic
+        # from a truncated / under-optimized treated fit would not be a valid ADH
+        # 2010 §2.4 permutation (placebos already fail-closed on non-convergence, so
+        # the treated unit must too). ``_fit_converged`` folds both failure modes, so
+        # the remediation names the knobs for each.
+        if not self._fit_converged:
+            warnings.warn(
+                "In-space placebo skipped: the treated unit's own SCM fit did not "
+                "converge at fit time (inner Frank-Wolfe weight solve and/or outer V "
+                "search), so its RMSPE ratio is not a valid optimum to rank against "
+                "placebos. placebo_p_value is NaN — re-fit with a larger "
+                "inner_max_iter / looser inner_min_decrease (inner) and/or a larger "
+                "optimizer_options['maxiter'] / more n_starts (outer V search).",
+                UserWarning,
+                stacklevel=2,
+            )
+            self.placebo_p_value = np.nan
+            self.n_placebos = 0
+            self.n_failed = 0
+            self._placebo_gaps = {}
+            self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
+            return self._placebo_df.copy()
+
+        if n_donors < 2:
+            warnings.warn(
+                "In-space placebo inference requires at least 2 donors (each "
+                f"placebo is fit against the other donors); only {n_donors} "
+                "available. placebo_p_value is NaN.",
+                UserWarning,
+                stacklevel=2,
+            )
+            self.placebo_p_value = np.nan
+            self.n_placebos = 0
+            self.n_failed = 0
+            self._placebo_gaps = {}
+            self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
+            return self._placebo_df.copy()
+
+        if n_donors == 2:
+            warnings.warn(
+                "In-space placebo with 2 donors: each placebo is fit against a "
+                "single donor (degenerate weight w=[1]) with no V search, so the "
+                "permutation p-value is coarse (only 2 placebos enter the "
+                "reference set; the smallest attainable p-value is 1/3).",
+                UserWarning,
+                stacklevel=2,
+            )
+
+        placebo_gaps: Dict[Any, Dict[Any, float]] = {}
+        ranked_ratios: List[float] = []
+        n_failed = 0
+
+        for j in donors:
+            pool = [d for d in donors if d != j]
+            fitted = _placebo_fit_unit(snap, j, pool, n_starts_eff)
+            if fitted is None:
+                # Non-converged inner Frank-Wolfe weight solve (a truncated W is
+                # unusable for ranking): exclude from BOTH the numerator and the
+                # denominator (never penalize a truncated solve into the rank).
+                # Still record the donor with NaN metrics so get_placebo_df()
+                # returns the full treated + every-donor unit set.
+                n_failed += 1
+                rows.append(
+                    {
+                        "unit": j,
+                        "pre_mspe": np.nan,
+                        "post_mspe": np.nan,
+                        "rmspe_ratio": np.nan,
+                        "is_treated": False,
+                        "status": "failed",
+                    }
+                )
+                continue
+            gap_path_j, ratio_j = fitted
+            placebo_gaps[j] = gap_path_j
+            pre_j = _mspe(gap_path_j, snap.pre_periods)
+            post_j = _mspe(gap_path_j, snap.post_periods)
+            ranked_ratios.append(ratio_j)
+            rows.append(
+                {
+                    "unit": j,
+                    "pre_mspe": pre_j,
+                    "post_mspe": post_j,
+                    "rmspe_ratio": ratio_j,
+                    "is_treated": False,
+                    "status": "placebo",
+                }
+            )
+
+        n_placebos = len(ranked_ratios)
+        if n_placebos == 0:
+            warnings.warn(
+                "No in-space placebo entered the reference set (all donors "
+                f"failed to converge or were filtered out of {n_donors}); "
+                "placebo_p_value is NaN.",
+                UserWarning,
+                stacklevel=2,
+            )
+            p_value = np.nan
+        else:
+            # One-sided rank, treated unit included as the "+1". Ties counted via
+            # ``>=`` so the p-value is conservative.
+            rank = 1 + sum(1 for r in ranked_ratios if r >= treated_ratio)
+            p_value = rank / (n_placebos + 1)
+
+        if n_failed > 0:
+            warnings.warn(
+                f"{n_failed} of {n_donors} in-space placebos failed to converge "
+                "and were excluded from the permutation distribution; "
+                f"placebo_p_value uses the remaining {n_placebos}.",
+                UserWarning,
+                stacklevel=2,
+            )
+
+        self.placebo_p_value = float(p_value)
+        self.n_placebos = int(n_placebos)
+        self.n_failed = int(n_failed)
+        self._placebo_gaps = placebo_gaps
+        self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
+        return self._placebo_df.copy()
diff --git a/docs/api/synthetic_control.rst b/docs/api/synthetic_control.rst
index a9377898..506c7422 100644
--- a/docs/api/synthetic_control.rst
+++ b/docs/api/synthetic_control.rst
@@ -19,9 +19,11 @@ over the post periods.
 - Aggregate / few-unit comparative case studies (states, regions, countries).
 
 **Inference:** classic SCM has **no analytical standard error**. ``se``, ``t_stat``,
-``p_value`` and ``conf_int`` are NaN; ``att`` (the mean post-period gap) is the reported
-estimate. The paper's in-space placebo permutation inference (post/pre RMSPE-ratio,
-``rank/(J+1)`` p-value) is planned for a follow-up release.
+``p_value`` and ``conf_int`` are always NaN; ``att`` (the mean post-period gap) is the
+reported estimate. Significance comes from **in-space placebo permutation inference** via
+:meth:`~diff_diff.SyntheticControlResults.in_space_placebo` (post/pre RMSPE-ratio statistic,
+``placebo_p_value = rank/(n_placebos+1)``). This permutation p-value is a separate field
+from the (NaN) ``p_value``; ``is_significant`` stays bound to ``p_value``.
 
 **Distinct from** :class:`~diff_diff.SyntheticDiD` (Arkhangelsky et al. 2021), which adds
 time weights and ridge regularization; classic SCM uses **donor weights only** plus the
@@ -67,6 +69,8 @@ Results container for synthetic control estimation.
 
    .. autosummary::
 
+      ~SyntheticControlResults.in_space_placebo
+      ~SyntheticControlResults.get_placebo_df
       ~SyntheticControlResults.summary
       ~SyntheticControlResults.print_summary
       ~SyntheticControlResults.to_dict
@@ -155,6 +159,13 @@ Quick estimation with the convenience function::
     )
     print(f"ATT: {results.att:.3f}, pre-RMSPE: {results.pre_rmspe:.3f}")
 
+In-space placebo permutation inference (opt-in; refits one synthetic control per donor)::
+
+    placebo_df = results.in_space_placebo()       # reassigns treatment to each donor
+    print(f"placebo p-value: {results.placebo_p_value:.3f} "
+          f"(n_placebos={results.n_placebos})")    # p = rank/(n_placebos+1)
+    print(placebo_df)   # per-unit RMSPE-ratio table used for the permutation rank
+
 Supplying a fixed predictor-importance matrix (skips the outer V search)::
 
     import numpy as np
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 736cbd92..26420281 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1984,7 +1984,7 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
 - **Outer solve (`v_method="nested"`):** choose diagonal PSD `V` minimizing the pre-period **outcome** MSPE `mean((Z1 − Z0·W*(V))²)`. `v_method="custom"` skips the outer search and uses a user-supplied `custom_v` (trace-normalized).
 - **Effect:** gap path `α̂_1t = Y_1t − Σ_j w_j·Y_jt`; `att` = mean post-period gap; `pre_rmspe` = pre-period fit diagnostic.
 
-**Inference:** **No analytical standard error** (Section 2.4) — `se`/`t_stat`/`p_value`/`conf_int` are NaN. The paper proposes in-space placebo permutation inference with the post/pre RMSPE-ratio statistic and `rank/(J+1)` p-value (deferred to a follow-up PR; `_placebo_gaps`/`_rmspe_ratio` are reserved on the results object).
+**Inference:** **No analytical standard error** (Section 2.4) — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. Significance comes from **in-space placebo permutation inference** via `SyntheticControlResults.in_space_placebo()`: reassign treatment to each donor, refit a synthetic control for it, and rank the treated unit's post/pre RMSPE ratio (`rmspe_ratio` = `RMSPE_post / RMSPE_pre` = `sqrt(MSPE_post / MSPE_pre)`) among all units; `placebo_p_value = rank / (n_placebos + 1)`, where `rank = 1 + #{placebos with ratio ≥ treated ratio}` — an **upper-tail rank test on the (unsigned) RMSPE-ratio statistic**, ties counted conservatively via `≥`. Because the ratio squares the gaps it is direction-agnostic: a large ratio signals an effect of *either* sign, so this is NOT a signed/one-directional ("one-sided") hypothesis test on the treatment-effect direction. ADH 2010 §3.4 reports the *MSPE* ratio (the square of `rmspe_ratio`); the two are monotone-equivalent, so the rank and p-value are identical — only the reported statistic's scale differs. `rmspe_ratio` (the treated statistic) is computed at fit time; `placebo_p_value` / `n_placebos` / `n_failed` are populated by the opt-in `in_space_placebo()` call. `get_placebo_df()` returns the per-unit ratio table used for the rank (the per-period placebo gap paths for a "spaghetti" plot are retained internally, not in this summary).
 
 **Notes / deviations:**
 - **Note:** The standardization divisor `divisor = sqrt(apply(cbind(X0,X1), 1, var))` (per-predictor SD over donors+treated, ddof=1) and the inner/outer optimizer are **not specified in ADH 2010** (which defers these numerics to Abadie & Gardeazabal 2003 App. B / the `Synth` software). The divisor is pinned from the R `Synth::synth` source; `solution.v` lives in this scaled predictor space, so the deterministic R-parity test feeds `custom_v` in the same scaled space.
@@ -1994,8 +1994,12 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
 - **Deviation from R:** `standardize="none"` disables predictor standardization entirely; R `Synth` always scales by the predictor SD. Provided for diagnostics; changes the geometry of the `V` objective.
 - **Note:** predictor rows support only **equal-weight** linear combinations of pre-period values — `mean` (`k_s = 1/T0`), `sum` (`k_s = 1`), and per-period outcome lags (identity, a single `k_s = 1`). ADH (2010) §2.3 defines the general form `Ȳ_i^{K_m} = Σ_s k_s Y_is` with *arbitrary* weights `k_s`; this release does NOT accept user-supplied non-uniform `K_m` weight vectors (and `median` and other non-linear aggregations are intentionally excluded). The supported set still spans the standard `Synth::dataprep` `predictors.op` + `special.predictors` usage; arbitrary-weight `K_m` is a deferred extension.
 - **Deviation from R:** predictor/outcome **aggregation fails closed on any non-finite (NaN/inf) cell**, whereas R `Synth::dataprep` hardwires `na.rm=TRUE` (aggregating over the observed cells of a partially-missing window). The fail-closed contract is deliberate: na-dropping silently aggregates different period subsets across units, yielding incomparable predictors with no warning. The analyst must restrict `predictor_window` / `special_predictors` / `pre_period_outcomes` periods (and the outcome panel) to where each variable is observed; both partially- and fully-missing windows raise `ValueError`. Only the row *ordering* matches `dataprep`, not the missing-data handling.
+- **Note (in-space placebo donor pool):** the real treated unit is **excluded from every placebo's donor pool** — when donor `j` is pseudo-treated it is fit against the other `J−1` donors, never the real treated unit (whose post-period is treatment-contaminated). The ranking set is still the `J+1` units {treated} ∪ {J placebos}. ADH 2010 §2.4 does not spell out placebo donor-pool composition; this matches the standard `SCtools::generate.placebos` construction (rotate the pseudo-treated identity through the donor pool; the original treated unit is never re-added as a donor).
+- **Note (placebo failure handling):** a placebo is **excluded from both the numerator and the denominator** of the rank (never penalized into it) and tallied in `n_failed` when its fit is not a valid optimum — EITHER its **inner Frank-Wolfe weight solve** did not converge (a truncated `W` is unusable) OR its **outer `V` search** did not converge (an under-optimized `V` fits the pre-period worse, shrinking the RMSPE ratio and biasing the p-value anti-conservatively, so it must not silently enter the rank). The reported p-value uses the **effective** count `rank / (n_placebos + 1)`, where `n_placebos` is the number of placebos that entered the reference set. Failed donors still appear in `get_placebo_df()` (`status="failed"`, NaN metrics), so once a reference set is produced the table is the full treated + every-donor unit set (`n_donors + 1` rows). In the fail-closed cases the placebo loop does not run and only the treated row is returned: `J < 2` → `placebo_p_value` is NaN with a warning (no placebo distribution; `J == 2` warns the distribution is coarse), and a treated fit whose own **inner OR outer** search did not converge also fails closed (ranking a truncated / under-optimized treated statistic would not be a valid permutation). **Caveat:** each placebo refit inherits the original fit's `optimizer_options` / `n_starts`, so valid inference requires settings adequate for the outer `V` search to converge to a comparable-quality synthetic (production defaults do; cheap settings under-optimize placebo `V` and those placebos are dropped as failed — raise `n_starts` on `in_space_placebo()` or re-fit with a larger `optimizer_options['maxiter']`).
+- **Note (RMSPE-ratio floor):** the reported `rmspe_ratio = sqrt(MSPE_post / MSPE_pre)` floors the pre-period MSPE denominator at a scale-aware `1e-8 · max(|pre-outcomes|, 1)²` (before the square root) so a (near-)perfect pre-fit (`pre-MSPE → 0`) yields a large-but-FINITE ratio rather than `inf`/`nan` (which would corrupt the rank). Ties (`ratio_j ≥ treated_ratio`) are counted, making the p-value conservative. Mirrors the `_fit_tol` poor-fit guard.
+- **Note (placebo p-value is non-analytical):** `placebo_p_value` is deliberately a SEPARATE field from `p_value` (which stays NaN) — it is a permutation p-value with no SE / t-stat, so it does not flow through `safe_inference`. `is_significant` likewise stays bound to the (NaN) `p_value`, NOT `placebo_p_value`; a tool gating on `is_significant` will see `False` even when `placebo_p_value` is small. The reporting stack surfaces the placebo p-value through `estimator_native_diagnostics`, never the analytical headline.
 
-**Reference implementation:** authors' `Synth` package for R/MATLAB/Stata (`Synth::synth`). **R-parity anchor:** the Basque Country study (Abadie-Gardeazabal 2003, `data("basque")`) — published synthetic = region 10 (Cataluña) 0.851 + region 14 (Madrid) 0.149, `loss.v` 0.0089. Two-tier test (`tests/test_methodology_synthetic_control.py`): Tier-1 feeds R's `solution.v` via `custom_v` → donor weights match to atol 1e-3 (deterministic); Tier-2 checks the nested fit in a band.
+**Reference implementation:** authors' `Synth` package for R/MATLAB/Stata (`Synth::synth`); in-space placebo construction follows `SCtools::generate.placebos`. **R-parity anchor:** the Basque Country study (Abadie-Gardeazabal 2003, `data("basque")`) — published synthetic = region 10 (Cataluña) 0.851 + region 14 (Madrid) 0.149, `loss.v` 0.0089. Two-tier test (`tests/test_methodology_synthetic_control.py`): Tier-1 feeds R's `solution.v` via `custom_v` → donor weights match to atol 1e-3 (deterministic); Tier-2 checks the nested fit in a band.
 
 **Requirements checklist:**
 - [x] Donor weights on the unit simplex; exactly one treated unit, block assignment.
@@ -2003,7 +2007,7 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
 - [x] Inner simplex-constrained weighted LS via `_sc_weight_fw` with diagonal PSD `V`.
 - [x] Outer nested `V` (pre-period outcome MSPE) + user-supplied `custom_v`.
 - [x] Gap path + pre-period RMSPE + predictor-balance table.
-- [x] No analytical SE (NaN inference); placebo permutation inference deferred to a follow-up PR.
+- [x] No analytical SE (NaN inference); in-space placebo permutation inference (`in_space_placebo()`, `rank/(n_placebos+1)`) with the real treated unit excluded from every placebo pool, effective-count denominator, and a scale-aware RMSPE-ratio floor.
 - [x] Predictor-leakage, absorbing-suffix/no-anticipation, empty-window, duplicate-label, and inner-non-convergence validation gates.
 
 ---
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
index ea16530f..58058312 100644
--- a/docs/methodology/REPORTING.md
+++ b/docs/methodology/REPORTING.md
@@ -263,9 +263,16 @@ a library setting.
   pre-period), and routes sensitivity to `in_time_placebo()` +
   `sensitivity_to_zeta_omega()`. `TROPResults` surfaces factor-model
   diagnostics (`effective_rank`, `loocv_score`, selected `lambda_*`)
-  under `estimator_native_diagnostics`. `EfficientDiDResults` PT runs
-  through `EfficientDiD.hausman_pretest` (the estimator's native
-  PT-All vs PT-Post check).
+  under `estimator_native_diagnostics`. `SyntheticControlResults`
+  routes parallel-trends to the `scm_fit` analogue (`pre_rmspe`,
+  verdict `design_enforced_pt`) and surfaces `pre_rmspe`, donor-weight
+  concentration, and the in-space placebo permutation p-value under
+  `estimator_native_diagnostics` — the placebo block is populated only
+  when the caller has already run `in_space_placebo()` (opt-in; DR never
+  triggers the per-donor refit loop implicitly), and it omits
+  HonestDiD-style `sensitivity` (significance IS the placebo).
+  `EfficientDiDResults` PT runs through `EfficientDiD.hausman_pretest`
+  (the estimator's native PT-All vs PT-Post check).
 
 - **Note:** Pre-trends verdict is a three-bin heuristic, not a field
   convention. DR maps the joint p-value as follows:
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
index 940ca1ef..d5668e95 100644
--- a/tests/test_business_report.py
+++ b/tests/test_business_report.py
@@ -24,6 +24,7 @@
 from unittest.mock import patch
 
 import numpy as np
+import pandas as pd
 import pytest
 
 import diff_diff as dd
@@ -39,6 +40,7 @@
     generate_did_data,
     generate_factor_data,
     generate_staggered_data,
+    synthetic_control,
 )
 from diff_diff.business_report import BUSINESS_REPORT_SCHEMA_VERSION
 
@@ -111,6 +113,38 @@ def sdid_fit():
     return sdid, fdf
 
 
+@pytest.fixture  # function-scoped: some tests run in_space_placebo(), which mutates the result
+def scm_fit():
+    rng = np.random.default_rng(0)
+    T, T0, n = 8, 6, 4
+    years = list(range(2000, 2000 + T))
+    donors = {
+        j: rng.normal(10, 2) + rng.normal(0, 0.3) * np.arange(T) + rng.normal(0, 0.15, T)
+        for j in range(n)
+    }
+    treated = 0.6 * donors[0] + 0.4 * donors[1] + rng.normal(0, 0.08, T)
+    treated[T0:] += 3.0
+    rows = []
+    for j in range(n):
+        for t in range(T):
+            rows.append({"unit": f"d{j}", "year": years[t], "y": donors[j][t], "treated": 0})
+    for t in range(T):
+        rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+    df = pd.DataFrame(rows)
+    res = synthetic_control(
+        df,
+        "y",
+        "treated",
+        "unit",
+        "year",
+        seed=0,
+        n_starts=1,
+        optimizer_options={"maxiter": 50},
+        inner_min_decrease=1e-3,
+    )
+    return res, df
+
+
 @pytest.fixture(scope="module")
 def edid_fit():
     from diff_diff import EfficientDiD
@@ -726,6 +760,71 @@ class TripleDifferenceResults:
         assert "DDD" in desc
         assert "Ortiz-Villavicencio" in desc or "2025" in desc
 
+
+class TestSCMBusinessReport:
+    """Classic SCM routes through BusinessReport with the fit-based assumption
+    block, native-diagnostics robustness, and an ADH 2010 attribution."""
+
+    def test_scm_assumption_block_is_fit_based(self, scm_fit):
+        res, _ = scm_fit
+        assumption = BusinessReport(res, auto_diagnostics=False).to_dict()["assumption"]
+        # Distinct from SDiD's "synthetic_fit" — classic SCM is a donor-weighted match.
+        assert assumption["parallel_trends_variant"] == "scm_fit"
+        desc = assumption["description"].lower()
+        assert "placebo" in desc  # significance is placebo-based, not analytical SE
+        assert "donor-weighted" in desc or "donor" in desc
+
+    def test_scm_report_is_json_serializable(self, scm_fit):
+        # SCM's analytical p_value is NaN; the headline is_significant must be a
+        # plain bool (not numpy bool_) so the AI-legible schema serializes.
+        res, _ = scm_fit
+        schema = BusinessReport(res, auto_diagnostics=True).to_dict()
+        json.dumps(schema)  # must not raise
+        assert isinstance(schema["headline"]["is_significant"], bool)
+
+    def test_scm_robustness_uses_native_diagnostics(self, scm_fit):
+        # SCM omits HonestDiD "sensitivity", so robustness must come from the
+        # estimator-native diagnostics block, which must be populated and ran.
+        res, _ = scm_fit
+        diag = BusinessReport(res, auto_diagnostics=True).to_dict()["diagnostics"]
+        assert diag["status"] == "ran"
+        native = diag["schema"]["estimator_native_diagnostics"]
+        assert native["estimator"] == "SyntheticControl"
+        assert isinstance(native["pre_rmspe"], float)
+
+    def test_scm_appendix_attributes_adh_2010(self, scm_fit):
+        res, _ = scm_fit
+        blob = json.dumps(BusinessReport(res, auto_diagnostics=False).to_dict()).lower()
+        assert "abadie" in blob and "2010" in blob
+
+    def test_scm_rejects_honest_did_passthrough(self, scm_fit):
+        res, _ = scm_fit
+        with pytest.raises(ValueError, match="estimator_native_diagnostics"):
+            BusinessReport(res, honest_did_results=object(), auto_diagnostics=False)
+
+    def test_scm_target_parameter_is_named(self, scm_fit):
+        # BR must name SCM's headline estimand (single-unit mean post-treatment gap),
+        # not fall back to "ATT (unrecognized result class)".
+        res, _ = scm_fit
+        tp = BusinessReport(res, auto_diagnostics=False).to_dict()["target_parameter"]
+        assert "unrecognized" not in tp["name"].lower()
+        assert "scm" in tp["name"].lower() or "synthetic control" in tp["name"].lower()
+
+    def test_scm_robustness_block_surfaces_native_fields(self, scm_fit):
+        # The top-level robustness block must surface SCM-native content (pre_rmspe,
+        # weight concentration, the placebo result) rather than the SDiD-shaped
+        # pre_treatment_fit=None left over from the generic lift.
+        res, _ = scm_fit
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res.in_space_placebo()
+            rob = BusinessReport(res, auto_diagnostics=True).to_dict()["robustness"]
+        native = rob["estimator_native"]
+        assert native["estimator"] == "SyntheticControl"
+        assert isinstance(native["pre_rmspe"], float)
+        assert "weight_concentration" in native
+        assert native["in_space_placebo"]["n_placebos"] == res.n_placebos
+
     def test_staggered_triple_diff_assumption_uses_ddd_not_generic_pt(self):
         class StaggeredTripleDiffResults:
             pass
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
index 32e7a50d..af1ef4c3 100644
--- a/tests/test_diagnostic_report.py
+++ b/tests/test_diagnostic_report.py
@@ -20,6 +20,7 @@
 from unittest.mock import patch
 
 import numpy as np
+import pandas as pd
 import pytest
 
 import diff_diff as dd
@@ -34,6 +35,7 @@
     generate_did_data,
     generate_factor_data,
     generate_staggered_data,
+    synthetic_control,
 )
 from diff_diff.diagnostic_report import (
     DIAGNOSTIC_REPORT_SCHEMA_VERSION,
@@ -135,6 +137,45 @@ def sdid_fit():
     return sdid, fdf
 
 
+def _scm_panel(n_donors=4, T=8, T0=6, effect=3.0, seed=0):
+    """Small balanced SCM panel: treated = convex mix of two donors + post effect."""
+    rng = np.random.default_rng(seed)
+    years = list(range(2000, 2000 + T))
+    donors = {
+        j: rng.normal(10, 2) + rng.normal(0, 0.3) * np.arange(T) + rng.normal(0, 0.15, T)
+        for j in range(n_donors)
+    }
+    treated = 0.6 * donors[0] + 0.4 * donors[1] + rng.normal(0, 0.08, T)
+    treated[T0:] += effect
+    rows = []
+    for j in range(n_donors):
+        for t in range(T):
+            rows.append({"unit": f"d{j}", "year": years[t], "y": donors[j][t], "treated": 0})
+    for t in range(T):
+        rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+    return pd.DataFrame(rows)
+
+
+@pytest.fixture  # function-scoped: some tests run in_space_placebo(), which mutates the result
+def scm_fit():
+    warnings.filterwarnings("ignore")
+    df = _scm_panel(n_donors=4)
+    # Cheap optimizer settings keep the pure-Python fixture fast; the report path,
+    # not solver accuracy, is what these tests exercise.
+    res = synthetic_control(
+        df,
+        "y",
+        "treated",
+        "unit",
+        "year",
+        seed=0,
+        n_starts=1,
+        optimizer_options={"maxiter": 50},
+        inner_min_decrease=1e-3,
+    )
+    return res, df
+
+
 # ---------------------------------------------------------------------------
 # Schema contract
 # ---------------------------------------------------------------------------
@@ -2005,6 +2046,141 @@ def test_sdid_does_not_call_honest_did(self, sdid_fit):
             mock.assert_not_called()
 
 
+class TestSCMNative:
+    """Classic SCM routes to the fit-based PT analogue + native diagnostics."""
+
+    def test_scm_applicable_checks(self, scm_fit):
+        res, _ = scm_fit
+        applicable = DiagnosticReport(res).applicable_checks
+        assert "estimator_native" in applicable
+        assert "parallel_trends" in applicable  # design-enforced fit analogue
+        # SCM is not HonestDiD/heterogeneity territory (one treated unit).
+        assert "sensitivity" not in applicable
+        assert "heterogeneity" not in applicable
+
+    def test_scm_pt_uses_scm_fit_method(self, scm_fit):
+        res, _ = scm_fit
+        pt = DiagnosticReport(res).to_dict()["parallel_trends"]
+        assert pt["method"] == "scm_fit"
+        assert pt["verdict"] == "design_enforced_pt"
+        assert isinstance(pt.get("pre_treatment_fit_rmse"), float)
+
+    def test_scm_native_section_populated(self, scm_fit):
+        res, _ = scm_fit
+        native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+        assert native["status"] == "ran"
+        assert native["estimator"] == "SyntheticControl"
+        assert isinstance(native["pre_rmspe"], float)
+        assert "herfindahl" in native["weight_concentration"]
+        # Placebo is opt-in: NOT auto-run inside the report.
+        assert native["in_space_placebo"]["status"] == "not_run"
+        # In-time placebo / leave-one-out are ADH 2015 (not implemented here).
+        assert "in_time_placebo" not in native and "leave_one_out" not in native
+
+    def test_scm_native_surfaces_placebo_after_optin_run(self, scm_fit):
+        res, _ = scm_fit
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res.in_space_placebo()
+            native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+        block = native["in_space_placebo"]
+        assert block["n_placebos"] == res.n_placebos
+        assert block["placebo_p_value"] == pytest.approx(res.placebo_p_value)
+
+    def test_scm_does_not_call_honest_did(self, scm_fit):
+        """HonestDiD sensitivity should NOT run on SCM (fit-based / native path)."""
+        res, _ = scm_fit
+        with patch("diff_diff.honest_did.HonestDiD.sensitivity_analysis") as mock:
+            DiagnosticReport(res).to_dict()
+            mock.assert_not_called()
+
+    def test_scm_significance_not_marked_done_until_placebo_run(self, scm_fit):
+        """SCM significance is the OPT-IN in-space placebo, which DR never runs.
+
+        Unlike SDiD/TROP (whose native block IS the sensitivity work), SCM's
+        native block only reports the placebo as ``not_run``, so the in-space
+        placebo next-step must persist — significance is not complete merely
+        because the native diagnostics ran.
+        """
+        res, _ = scm_fit  # the fixture does NOT run the placebo
+        schema = DiagnosticReport(res).to_dict()
+        labels = " ".join(s.get("label", "") for s in schema.get("next_steps", [])).lower()
+        assert "placebo" in labels  # the significance recommendation still surfaces
+
+    def test_scm_placebo_step_completes_after_run(self, scm_fit):
+        """Once the opt-in placebo has been run, DR stops recommending it."""
+        res, _ = scm_fit
+        before = DiagnosticReport(res).to_dict()["next_steps"]
+        assert any("placebo" in s.get("label", "").lower() for s in before)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res.in_space_placebo()  # opt-in significance procedure now done
+        after = DiagnosticReport(res).to_dict()["next_steps"]
+        assert not any("placebo" in s.get("label", "").lower() for s in after)
+
+    def test_scm_rejects_precomputed_parallel_trends_and_sensitivity(self, scm_fit):
+        # Like SDiD/TROP, SCM computes its PT verdict internally (the scm_fit
+        # design-enforced pre-fit), so a precomputed parallel_trends p-value payload
+        # is methodology-incompatible and rejected — as is sensitivity (no HonestDiD
+        # analogue). SCM's PT still runs natively (scm_fit) without a precomputed input.
+        res, _ = scm_fit
+        with pytest.raises(ValueError, match="estimator_native_diagnostics"):
+            DiagnosticReport(res, precomputed={"parallel_trends": {"joint_p_value": 0.5}})
+        with pytest.raises(ValueError, match="estimator_native_diagnostics"):
+            DiagnosticReport(res, precomputed={"sensitivity": {"breakdown_M": 1.0}})
+        # Without a precomputed input, the native scm_fit PT verdict still renders.
+        assert DiagnosticReport(res).to_dict()["parallel_trends"]["method"] == "scm_fit"
+
+    def test_scm_pt_prose_is_scm_specific_not_sdid(self, scm_fit):
+        # The design_enforced_pt prose must describe SCM's donor-weighted, single-
+        # treated-unit, placebo-based identification — NOT SDiD's weighted-PT analogue.
+        res, _ = scm_fit
+        interp = DiagnosticReport(res).to_dict()["overall_interpretation"].lower()
+        assert "in-space placebo" in interp or "donor-weighted" in interp
+        assert "sdid" not in interp
+
+    def test_scm_target_parameter_is_named(self, scm_fit):
+        res, _ = scm_fit
+        tp = DiagnosticReport(res).to_dict()["target_parameter"]
+        assert "unrecognized" not in tp["name"].lower()
+        assert "scm" in tp["name"].lower() or "synthetic control" in tp["name"].lower()
+
+    def test_scm_generic_placebo_section_points_to_native(self, scm_fit):
+        # The generic ``placebo`` section must not claim placebo is "not yet
+        # implemented" for SCM — its in-space placebo IS implemented and surfaced
+        # under estimator_native_diagnostics (avoids contradictory report state).
+        res, _ = scm_fit
+        reason = (DiagnosticReport(res).to_dict()["placebo"].get("reason") or "").lower()
+        assert "not yet implemented" not in reason
+        assert "in_space_placebo" in reason or "estimator_native" in reason
+
+    def test_scm_native_marks_infeasible_placebo(self):
+        # An attempted-but-infeasible placebo (here J<2) surfaces an explicit
+        # status="infeasible" + reason in _scm_native, not a bare NaN p-value.
+        rng = np.random.default_rng(0)
+        T, T0 = 8, 6
+        years = list(range(2000, 2000 + T))
+        d0 = rng.normal(10, 2, T)
+        treated = d0 + rng.normal(0, 0.1, T)
+        treated[T0:] += 3.0
+        rows = [{"unit": "d0", "year": years[t], "y": d0[t], "treated": 0} for t in range(T)]
+        rows += [
+            {"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)}
+            for t in range(T)
+        ]
+        df = pd.DataFrame(rows)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res = synthetic_control(
+                df, "y", "treated", "unit", "year", seed=0, n_starts=1, inner_min_decrease=1e-3
+            )
+            res.in_space_placebo()  # only 1 donor -> infeasible
+            native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+        block = native["in_space_placebo"]
+        assert block["status"] == "infeasible"
+        assert np.isnan(block["placebo_p_value"]) and "reason" in block
+
+
 # ---------------------------------------------------------------------------
 # Error handling
 # ---------------------------------------------------------------------------
diff --git a/tests/test_methodology_synthetic_control.py b/tests/test_methodology_synthetic_control.py
index 00507bd8..074ad896 100644
--- a/tests/test_methodology_synthetic_control.py
+++ b/tests/test_methodology_synthetic_control.py
@@ -24,6 +24,8 @@
 from __future__ import annotations
 
 import json
+import pickle
+import warnings
 from pathlib import Path
 
 import numpy as np
@@ -742,8 +744,13 @@ def test_result_accessors_render():
     assert set(gdf.columns) == {"period", "gap", "phase"}
     wdf = res.get_weights_df()
     assert list(wdf.columns) == ["unit", "weight"]
-    # Reserved PR-2/3 attributes present and None.
-    assert res._placebo_gaps is None and res._rmspe_ratio is None and res._fit_snapshot is None
+    # PR-2: fit() populates the placebo refit snapshot and the treated unit's
+    # RMSPE ratio; the placebo reference distribution is not computed until
+    # in_space_placebo() runs (placebo_p_value stays NaN, gaps/df unset).
+    assert res._fit_snapshot is not None
+    assert res._placebo_gaps is None and res._placebo_df is None
+    assert np.isfinite(res.rmspe_ratio)
+    assert np.isnan(res.placebo_p_value) and res.n_placebos == 0
 
 
 def test_inferred_post_matches_explicit():
@@ -836,3 +843,440 @@ def test_basque_tier2_nested_band():
     top2 = [u for u, _ in sorted(res.donor_weights.items(), key=lambda kv: -kv[1])[:2]]
     assert set(top2) == {10, 14}
     assert res.donor_weights.get(10, 0) + res.donor_weights.get(14, 0) > 0.7
+
+
+# ---------------------------------------------------------------------------
+# In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010 §2.4)
+# ---------------------------------------------------------------------------
+
+
+def _fit_for_placebo(n_donors=4, effect=3.0, **kw):
+    """Fit with cheap settings on a panel carrying a strong post-treatment effect."""
+    df, _, _ = _make_panel(n_donors=n_donors, effect=effect)
+    opts = dict(_FAST)
+    opts.update(kw)
+    with warnings.catch_warnings():  # single-donor / poor-fit fit warnings are not under test
+        warnings.simplefilter("ignore")
+        return synthetic_control(df, "y", "treated", "unit", "year", seed=0, **opts)
+
+
+def test_in_space_placebo_strong_effect_ranks_treated_first():
+    # A 3.0-unit post effect on a treated unit that is a clean convex mix of two
+    # donors -> treated RMSPE ratio is the most extreme -> rank 1 -> p = 1/(J+1).
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        pdf = res.in_space_placebo()
+    assert res.n_placebos == 4 and res.n_failed == 0
+    treated_ratio = pdf.loc[pdf["is_treated"], "rmspe_ratio"].iloc[0]
+    placebo_ratios = pdf.loc[~pdf["is_treated"], "rmspe_ratio"]
+    assert (treated_ratio > placebo_ratios).all()  # treated is the most extreme unit
+    assert res.placebo_p_value == pytest.approx(1 / (res.n_placebos + 1))
+    # Exactly one treated row; the placebo rows are exactly the donor units.
+    assert int(pdf["is_treated"].sum()) == 1
+    assert pdf.loc[pdf["is_treated"], "unit"].iloc[0] == "treated"
+    assert set(pdf.loc[~pdf["is_treated"], "unit"]) == {"d0", "d1", "d2", "d3"}
+
+
+def test_in_space_placebo_excludes_real_treated_from_donor_pools():
+    # The real treated unit is never in the donor universe, so it cannot serve as
+    # a donor for any placebo (ADH 2010 contamination guard; SCtools convention).
+    res = _fit_for_placebo(n_donors=4)
+    snap = res._fit_snapshot
+    assert snap.treated_id == "treated" and "treated" not in snap.donor_ids
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_space_placebo()
+    # Each donor became a placebo exactly once; the treated unit is not a placebo.
+    assert "treated" not in res._placebo_gaps
+    assert set(res._placebo_gaps) == set(snap.donor_ids)
+
+
+def test_in_space_placebo_p_in_valid_discrete_set():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_space_placebo()
+    valid = {(k + 1) / (res.n_placebos + 1) for k in range(res.n_placebos + 1)}
+    assert any(res.placebo_p_value == pytest.approx(v) for v in valid)
+
+
+def test_in_space_placebo_does_not_touch_analytical_inference():
+    # The permutation p-value is SEPARATE from the analytical fields, which stay
+    # NaN; is_significant stays bound to the (NaN) p_value, not placebo_p_value.
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_space_placebo()
+    assert np.isfinite(res.placebo_p_value)
+    assert_nan_inference(
+        {"se": res.se, "t_stat": res.t_stat, "p_value": res.p_value, "conf_int": res.conf_int}
+    )
+    assert res.is_significant is False
+
+
+def test_in_space_placebo_deterministic():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        p1 = res.in_space_placebo()
+        first_p = res.placebo_p_value
+        p2 = res.in_space_placebo()
+    assert res.placebo_p_value == first_p  # bit-equal p-value across runs
+    pd.testing.assert_frame_equal(p1, p2)  # identical rows AND row order
+
+
+def test_in_space_placebo_requires_two_donors():
+    res = _fit_for_placebo(n_donors=1)
+    with pytest.warns(UserWarning, match="at least 2 donors"):
+        pdf = res.in_space_placebo()
+    assert np.isnan(res.placebo_p_value) and res.n_placebos == 0
+    assert len(pdf) == 1 and bool(pdf["is_treated"].iloc[0])
+
+
+def test_in_space_placebo_two_donors_warns_coarse():
+    res = _fit_for_placebo(n_donors=2)
+    with pytest.warns(UserWarning, match="coarse"):
+        res.in_space_placebo()
+    # 2 placebos -> reference set of 3 -> p in {1/3, 2/3, 1}.
+    assert res.n_placebos == 2
+    assert any(res.placebo_p_value == pytest.approx(v) for v in (1 / 3, 2 / 3, 1.0))
+
+
+def test_in_space_placebo_fails_closed_on_nonconverged_treated_fit():
+    # inner_max_iter=1 truncates the treated unit's own Frank-Wolfe solve, so its
+    # RMSPE ratio is not a valid optimum. in_space_placebo() must fail closed
+    # (NaN p-value + warning), NOT rank a truncated treated statistic.
+    df, _, _ = _make_panel(n_donors=4, effect=3.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            seed=0,
+            n_starts=1,
+            inner_max_iter=1,
+            optimizer_options={"maxiter": 5},
+        )
+    assert res._fit_converged is False  # treated fit was truncated
+    with pytest.warns(UserWarning, match="did not converge at fit time"):
+        pdf = res.in_space_placebo()
+    assert np.isnan(res.placebo_p_value)
+    assert res.n_placebos == 0 and res.n_failed == 0  # the placebo loop never ran
+    assert len(pdf) == 1 and bool(pdf["is_treated"].iloc[0])  # treated row only
+
+
+def test_in_space_placebo_pickle_drops_snapshot_keeps_scalars():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_space_placebo()
+    restored = pickle.loads(pickle.dumps(res))
+    # Scalars survive; panel-derived state is dropped.
+    assert restored.placebo_p_value == res.placebo_p_value
+    assert restored.rmspe_ratio == res.rmspe_ratio
+    assert restored.n_placebos == res.n_placebos and restored.n_failed == res.n_failed
+    assert restored._fit_snapshot is None and restored._placebo_gaps is None
+    # The small aggregate table survives, so get_placebo_df still works...
+    assert len(restored.get_placebo_df()) == len(res.get_placebo_df())
+    # ...but a re-run of the refit raises (the snapshot is gone).
+    with pytest.raises(ValueError, match="requires the fit snapshot"):
+        restored.in_space_placebo()
+
+
+def test_in_space_placebo_custom_v_path():
+    df, _, _ = _make_panel(n_donors=4)
+    # Default predictors = all pre-period outcomes -> k = number of pre periods (T0).
+    k = 6
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=np.ones(k),
+            inner_min_decrease=1e-3,
+        )
+        pdf = res.in_space_placebo()
+    assert res.n_placebos == 4 and np.isfinite(res.placebo_p_value)
+    assert len(pdf) == 5
+
+
+def test_get_placebo_df_before_run_returns_treated_row_only():
+    res = _fit_for_placebo(n_donors=4)
+    pdf = res.get_placebo_df()
+    assert len(pdf) == 1
+    assert bool(pdf["is_treated"].iloc[0]) and pdf["status"].iloc[0] == "treated"
+    assert set(pdf.columns) == {
+        "unit",
+        "pre_mspe",
+        "post_mspe",
+        "rmspe_ratio",
+        "is_treated",
+        "status",
+    }
+
+
+def test_rmspe_ratio_floors_zero_pre_mspe():
+    # Perfect pre-fit (pre-MSPE == 0) must yield a large-but-finite ratio, not
+    # inf/nan (which would corrupt the permutation rank).
+    from diff_diff.synthetic_control import _rmspe_ratio
+
+    pre = np.zeros(5)
+    assert np.isfinite(_rmspe_ratio(pre, np.array([1.0, 2.0, 3.0]), scale=10.0))
+    # A zero-effect (post all zero) placebo has ratio 0 — the least extreme.
+    assert _rmspe_ratio(pre, np.zeros(3), scale=10.0) == 0.0
+
+
+def test_in_space_placebo_perfect_treated_fit_finite_ratio():
+    # 2-donor panel where the treated unit EQUALS d0 in the pre-period -> the inner
+    # FW solve lands on w=[1, 0], so the treated pre-MSPE is (bit-)exactly 0. The
+    # RMSPE ratio must stay FINITE (scale-aware floor), never inf/nan.
+    rng = np.random.default_rng(2)
+    T, T0 = 8, 6
+    years = list(range(2000, 2000 + T))
+    d0 = rng.normal(10, 2, T)
+    d1 = rng.normal(5, 2, T)
+    treated = d0.copy()
+    treated[T0:] += 5.0  # identical to d0 in the pre-period, clean post effect
+    rows = []
+    for name, series in (("d0", d0), ("d1", d1)):
+        for t in range(T):
+            rows.append({"unit": name, "year": years[t], "y": series[t], "treated": 0})
+    for t in range(T):
+        rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+    df = pd.DataFrame(rows)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=np.ones(T0),
+            inner_min_decrease=1e-3,
+        )
+    assert res.pre_rmspe == pytest.approx(0.0, abs=1e-9)
+    assert np.isfinite(res.rmspe_ratio) and res.rmspe_ratio > 0
+
+
+def test_in_space_placebo_immune_to_post_fit_mutation():
+    # The fit snapshot must COPY caller-owned mutable inputs (custom_v,
+    # optimizer_options), so mutating them after fit() cannot silently change
+    # in_space_placebo() output on an already-returned results object.
+    df, _, _ = _make_panel(n_donors=4)
+    cv = np.ones(6)  # k = 6 default pre-period-outcome predictors
+    opts = {"maxiter": 50}
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=cv,
+            optimizer_options=opts,
+            inner_min_decrease=1e-3,
+        )
+        p1 = res.in_space_placebo().copy()
+        pval1 = res.placebo_p_value
+    snap = res._fit_snapshot
+    assert snap.custom_v is not cv and snap.optimizer_options is not opts
+    # Mutate the caller-owned originals AFTER fit -> placebo output must not change.
+    cv[:] = [1e6, 1.0, 1.0, 1.0, 1.0, 1.0]
+    opts["maxiter"] = 1
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        p2 = res.in_space_placebo().copy()
+    assert res.placebo_p_value == pval1
+    pd.testing.assert_frame_equal(p1, p2)
+
+
+def test_get_placebo_df_includes_failed_donors(monkeypatch):
+    # When the treated fit IS valid but some per-donor placebo refits fail to
+    # converge, get_placebo_df() must still list EVERY unit (treated + each donor)
+    # so callers can tell which donors failed -> exactly n_donors + 1 rows.
+    # (A truncated treated fit instead fails the whole placebo run closed, tested
+    # separately; here we simulate isolated donor failures with a converged treated
+    # fit by monkeypatching the per-donor refit to fail for the first two donors.)
+    import importlib
+
+    # diff_diff.synthetic_control the SUBMODULE is shadowed by the re-exported
+    # synthetic_control FUNCTION on the package, so import the module explicitly.
+    sc = importlib.import_module("diff_diff.synthetic_control")
+
+    res = _fit_for_placebo(n_donors=4)  # treated fit converges (normal settings)
+    real_fit_unit = sc._placebo_fit_unit
+    calls = {"n": 0}
+
+    def flaky_fit_unit(snap, unit, donor_pool, n_starts):
+        calls["n"] += 1
+        if calls["n"] <= 2:  # first two donor refits "fail"
+            return None
+        return real_fit_unit(snap, unit, donor_pool, n_starts)
+
+    monkeypatch.setattr(sc, "_placebo_fit_unit", flaky_fit_unit)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        pdf = res.in_space_placebo()
+    assert len(pdf) == res.n_donors + 1  # treated + every donor, regardless of failures
+    assert res.n_failed == 2 and res.n_placebos == res.n_donors - 2
+    failed = pdf[pdf["status"] == "failed"]
+    assert len(failed) == 2 and failed["rmspe_ratio"].isna().all()  # NaN metrics
+
+
+def test_in_space_placebo_fails_closed_on_underoptimized_outer_v():
+    # An under-optimized OUTER V search (optimizer maxiter=1) leaves the treated
+    # fit's V non-optimal even though the inner solve converges. Its RMSPE ratio is
+    # therefore not a valid optimum, so in_space_placebo() must FAIL CLOSED rather
+    # than silently rank an anti-conservatively under-optimized statistic.
+    df, _, _ = _make_panel(n_donors=4, effect=3.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            seed=0,
+            n_starts=1,
+            optimizer_options={"maxiter": 1},  # outer V search cannot converge
+            inner_min_decrease=1e-3,  # inner still converges -> isolates the outer path
+        )
+    assert res._fit_converged is False  # outer V non-convergence -> invalid fit
+    with pytest.warns(UserWarning, match="did not converge at fit time"):
+        res.in_space_placebo()
+    assert np.isnan(res.placebo_p_value)
+    assert res.n_placebos == 0 and res.n_failed == 0  # placebo loop never ran
+
+
+def test_outer_v_convergence_tracks_selected_incumbent(monkeypatch):
+    # _outer_solve_V must report convergence of the SELECTED (lowest-objective)
+    # incumbent, NOT "any start converged". Here the first multistart succeeds with a
+    # HIGH objective while the winning (lowest-objective) start reports success=False;
+    # the fit must be flagged non-converged so in_space_placebo() fails closed.
+    import importlib
+
+    from scipy.optimize import OptimizeResult
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    calls = {"n": 0}
+
+    def fake_minimize(fun, x0, **kwargs):
+        calls["n"] += 1
+        x0 = np.asarray(x0, dtype=float)
+        if kwargs.get("method") == "Nelder-Mead":
+            # 1st start: high objective but converged; later: low objective (wins) but NOT.
+            if calls["n"] == 1:
+                return OptimizeResult(x=x0, fun=10.0, success=True)
+            return OptimizeResult(x=x0, fun=1.0, success=False)
+        # Powell polish: neither improves on nor converges at the incumbent.
+        return OptimizeResult(x=x0, fun=5.0, success=False)
+
+    monkeypatch.setattr(sc, "minimize", fake_minimize)
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df, "y", "treated", "unit", "year", seed=0, n_starts=2, inner_min_decrease=1e-3
+        )
+    # The winning incumbent came from a success=False run -> selected V is not a
+    # validated optimum, so the fit must not be marked converged.
+    assert res._fit_converged is False
+    with pytest.warns(UserWarning, match="did not converge at fit time"):
+        res.in_space_placebo()
+    assert np.isnan(res.placebo_p_value)
+
+
+def test_outer_v_powell_success_at_worse_point_does_not_validate(monkeypatch):
+    # The Powell polish must validate the SELECTED incumbent only when it converges
+    # back AT the incumbent's objective level. Here the winning (lowest-objective)
+    # Nelder-Mead start reports success=False, and Powell "succeeds" but at a STRICTLY
+    # WORSE objective (it ended elsewhere). Powell's success says nothing about the
+    # selected incumbent, so the fit must stay non-converged and fail closed -- a flag
+    # of "converged" here would silently admit an under-optimized V into the placebo
+    # ranking and produce wrong permutation inference.
+    import importlib
+
+    from scipy.optimize import OptimizeResult
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    calls = {"n": 0}
+
+    def fake_minimize(fun, x0, **kwargs):
+        calls["n"] += 1
+        x0 = np.asarray(x0, dtype=float)
+        if kwargs.get("method") == "Nelder-Mead":
+            # Single start: lowest objective wins but reports success=False.
+            return OptimizeResult(x=x0, fun=1.0, success=False)
+        # Powell polish: SUCCEEDS, but at a strictly worse objective than the incumbent.
+        return OptimizeResult(x=x0, fun=5.0, success=True)
+
+    monkeypatch.setattr(sc, "minimize", fake_minimize)
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df, "y", "treated", "unit", "year", seed=0, n_starts=1, inner_min_decrease=1e-3
+        )
+    # Powell's success at a worse point must NOT flip the selected incumbent to converged.
+    assert res._fit_converged is False
+    with pytest.warns(UserWarning, match="did not converge at fit time"):
+        res.in_space_placebo()
+    assert np.isnan(res.placebo_p_value)
+
+
+def test_to_dict_includes_placebo_scalars():
+    res = _fit_for_placebo(n_donors=4)
+    d = res.to_dict()
+    for key in ("placebo_p_value", "rmspe_ratio", "n_placebos", "n_failed"):
+        assert key in d
+    # Before the placebo run: rmspe_ratio is finite (fit-time), placebo_p_value NaN.
+    assert np.isfinite(d["rmspe_ratio"]) and np.isnan(d["placebo_p_value"])
+    assert d["n_placebos"] == 0 and d["n_failed"] == 0
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_space_placebo()
+    d2 = res.to_dict()
+    assert np.isfinite(d2["placebo_p_value"]) and d2["n_placebos"] == 4
+
+
+def test_summary_distinguishes_infeasible_placebo_from_not_run():
+    # summary() must tell "placebo never run" apart from "placebo run but produced
+    # no valid reference set" (J<2 here -> placebo_p_value NaN but it WAS attempted).
+    df, _, _ = _make_panel(n_donors=1)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(df, "y", "treated", "unit", "year", seed=0, **_FAST)
+        before = res.summary()
+        res.in_space_placebo()  # infeasible: single donor -> no placebo distribution
+        after = res.summary()
+    assert "Run in_space_placebo()" in before  # never run
+    assert np.isnan(res.placebo_p_value) and res._placebo_df is not None  # attempted
+    assert "attempted but" in after and "no valid reference set" in after
+    assert "Run in_space_placebo()" not in after  # not mislabeled as "not run"
+
+
+def test_rmspe_ratio_is_root_scale():
+    # The reported statistic is the ROOT-scale ratio RMSPE_post/RMSPE_pre =
+    # sqrt(MSPE_post/MSPE_pre), NOT the MSPE ratio. Hand-worked: pre-MSPE = 4,
+    # post-MSPE = 9 -> sqrt(9/4) = 1.5 (the MSPE ratio would be 9/4 = 2.25).
+    from diff_diff.synthetic_control import _rmspe_ratio
+
+    pre = np.array([2.0, 2.0])  # MSPE = 4
+    post = np.array([3.0, 3.0])  # MSPE = 9
+    assert _rmspe_ratio(pre, post, scale=10.0) == pytest.approx(1.5)
+    # Zero post-effect -> ratio 0; perfect pre-fit -> finite (floored), not inf.
+    assert _rmspe_ratio(pre, np.zeros(2), scale=10.0) == pytest.approx(0.0)
+    assert np.isfinite(_rmspe_ratio(np.zeros(2), post, scale=10.0))
diff --git a/tests/test_practitioner.py b/tests/test_practitioner.py
index 99ccec5d..4153d840 100644
--- a/tests/test_practitioner.py
+++ b/tests/test_practitioner.py
@@ -20,6 +20,7 @@
 from diff_diff.results import DiDResults, SyntheticDiDResults
 from diff_diff.stacked_did_results import StackedDiDResults
 from diff_diff.sun_abraham import SunAbrahamResults
+from diff_diff.synthetic_control_results import SyntheticControlResults
 from diff_diff.triple_diff import TripleDifferenceResults
 from diff_diff.trop_results import TROPResults
 from diff_diff.two_stage_results import TwoStageDiDResults
@@ -114,6 +115,15 @@ def mock_trop_results():
     return r
 
 
+@pytest.fixture
+def mock_scm_results():
+    r = SyntheticControlResults.__new__(SyntheticControlResults)
+    r.att = 1.2
+    r.se = np.nan
+    r.placebo_p_value = np.nan
+    return r
+
+
 @pytest.fixture
 def mock_efficient_results():
     r = EfficientDiDResults.__new__(EfficientDiDResults)
@@ -336,6 +346,21 @@ def test_trop_results(self, mock_trop_results):
         assert "control_group" not in all_code
         assert "anticipation" not in all_code
 
+    def test_synthetic_control_results(self, mock_scm_results):
+        output = practitioner_next_steps(mock_scm_results, verbose=False)
+        assert output["estimator"] == "SyntheticControl"
+        assert len(output["next_steps"]) > 0
+        # The in-space placebo step must surface (it is SCM's significance test)
+        # and must not be auto-suppressed as the completed estimation step.
+        all_code = " ".join(s.get("code", "") for s in output["next_steps"])
+        all_labels = " ".join(s.get("label", "") for s in output["next_steps"]).lower()
+        assert "in_space_placebo" in all_code
+        assert "placebo" in all_labels
+        # SCM is not a staggered DiD: no control-group / anticipation knobs.
+        handler_steps = [s for s in output["next_steps"] if s["baker_step"] > 2]
+        handler_code = " ".join(s.get("code", "") for s in handler_steps)
+        assert "control_group" not in handler_code and "anticipation" not in handler_code
+
     def test_efficient_results(self, mock_efficient_results):
         output = practitioner_next_steps(mock_efficient_results, verbose=False)
         assert len(output["next_steps"]) > 0

From 1d188638a204737796ab1d21a378da393d3841c9 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 31 May 2026 15:26:39 -0400
Subject: [PATCH 2/3] =?UTF-8?q?synthetic=5Fcontrol:=20address=20CI=20codex?=
 =?UTF-8?q?=20P2s=20=E2=80=94=20precise=20placebo-infeasibility=20reason?=
 =?UTF-8?q?=20+=20n=5Fstarts=20validation?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CI review (PR #511) flagged two P2s:

1. summary() misdiagnosed the infeasible-placebo reason. The treated-fit
   fail-closed branch and the J<2 branch both leave n_placebos=0, n_failed=0,
   so reconstructing the cause from counts narrated a non-converged treated fit
   as "too few donors or all donor refits failed" (false for that path). Add an
   explicit `_placebo_status` recorded by in_space_placebo()
   ({treated_fit_nonconverged, too_few_donors, all_placebos_failed, ran}) and
   render the specific reason in both summary() and DiagnosticReport._scm_native
   instead of inferring it from counts. The status is a small string, so it
   survives pickling alongside _placebo_df / the scalar fields.

2. in_space_placebo(n_starts=...) skipped the positive-integer validation the
   estimator constructor enforces — int(0)/int(-1)/int(2.5) silently coerced
   into a degenerate or invalid permutation procedure. Mirror the constructor's
   check and fail fast with a ValueError.

Also relabels the lingering "One-sided rank" code comment to "upper-tail rank
on the unsigned RMSPE ratio" (matching the REGISTRY/CHANGELOG wording fix).

New regressions: summary() names the treated-fit-failure reason (not the
donor-side reason) under identical n_placebos/n_failed counts; invalid n_starts
overrides raise and leave placebo state untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 diff_diff/diagnostic_report.py              | 30 +++++++++--
 diff_diff/synthetic_control_results.py      | 57 +++++++++++++++++----
 tests/test_methodology_synthetic_control.py | 53 ++++++++++++++++++-
 3 files changed, 125 insertions(+), 15 deletions(-)

diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
index 9173f3c6..f474e91b 100644
--- a/diff_diff/diagnostic_report.py
+++ b/diff_diff/diagnostic_report.py
@@ -2283,17 +2283,39 @@ def _scm_native(self, r: Any) -> Dict[str, Any]:
                 "n_placebos": _to_python_scalar(n_placebos),
                 "n_failed": _to_python_scalar(getattr(r, "n_failed", None)),
             }
-            # Distinguish a valid run from an attempted-but-infeasible one (J<2,
-            # treated-fit non-convergence, or zero successful donor refits) so BR/DR
+            # Distinguish a valid run from an attempted-but-infeasible one so BR/DR
             # consumers see an explicit status/reason rather than a bare NaN p-value.
+            # The SPECIFIC cause comes from the results' recorded ``_placebo_status``
+            # (n_placebos/n_failed alone cannot tell a non-converged treated fit
+            # apart from too-few-donors).
             if n_placebos > 0 and np.isfinite(placebo_p):
                 block["status"] = "ran"
             else:
+                placebo_status = getattr(r, "_placebo_status", None)
+                _reasons = {
+                    "treated_fit_nonconverged": (
+                        "in_space_placebo() was run but the treated unit's own SCM "
+                        "fit did not converge at fit time, so its RMSPE ratio is not "
+                        "a valid optimum to rank against placebos; placebo_p_value "
+                        "is NaN."
+                    ),
+                    "too_few_donors": (
+                        "in_space_placebo() was run but fewer than 2 donors are "
+                        "available (each placebo is fit against the other donors); "
+                        "placebo_p_value is NaN."
+                    ),
+                    "all_placebos_failed": (
+                        "in_space_placebo() was run but every donor refit failed to "
+                        "converge, so no placebo entered the reference set; "
+                        "placebo_p_value is NaN."
+                    ),
+                }
                 block["status"] = "infeasible"
-                block["reason"] = (
+                block["reason"] = _reasons.get(
+                    placebo_status,
                     "in_space_placebo() was run but produced no valid reference set "
                     "(fewer than 2 donors, a non-converged treated fit, or all donor "
-                    "refits failed); placebo_p_value is NaN."
+                    "refits failed); placebo_p_value is NaN.",
                 )
             out["in_space_placebo"] = block
         else:
diff --git a/diff_diff/synthetic_control_results.py b/diff_diff/synthetic_control_results.py
index dff829f0..8e412057 100644
--- a/diff_diff/synthetic_control_results.py
+++ b/diff_diff/synthetic_control_results.py
@@ -186,6 +186,14 @@ def __post_init__(self) -> None:
         # in_space_placebo() fails closed when this is False: a truncated treated
         # fit makes the ranked statistic (rmspe_ratio) not a valid SCM optimum.
         self._fit_converged: bool = True
+        # Explicit reason an in-space placebo run was infeasible/absent, set by
+        # in_space_placebo(). summary() / _scm_native render THIS instead of
+        # reconstructing the cause from counts — n_placebos/n_failed alone cannot
+        # tell a non-converged treated fit ("treated_fit_nonconverged", n_failed=0)
+        # apart from too few donors ("too_few_donors", also n_failed=0). Values:
+        # None (not run), "ran", "treated_fit_nonconverged", "too_few_donors",
+        # "all_placebos_failed". A small string, so it survives pickling.
+        self._placebo_status: Optional[str] = None
 
     def __getstate__(self) -> Dict[str, Any]:
         """Exclude panel-derived internal state from pickling.
@@ -322,16 +330,35 @@ def summary(self, alpha: Optional[float] = None) -> str:
                 ]
             )
         elif placebo_attempted:
-            lines.extend(
-                [
+            # Render the SPECIFIC reason recorded by in_space_placebo(); the count
+            # fields (n_placebos=0, n_failed=0) cannot tell a non-converged treated
+            # fit apart from too-few-donors, so do not reconstruct it from counts.
+            status = getattr(self, "_placebo_status", None)
+            if status == "treated_fit_nonconverged":
+                reason = [
+                    "In-space placebo was skipped: the treated unit's own SCM fit "
+                    "did not converge at fit time (inner Frank-Wolfe weight solve",
+                    "and/or outer V search), so its RMSPE ratio is not a valid "
+                    "optimum to rank against placebos. placebo_p_value is undefined",
+                    "— re-fit with a larger inner_max_iter / looser "
+                    "inner_min_decrease and/or a larger optimizer_options['maxiter']",
+                    "/ more n_starts.",
+                ]
+            elif status == "too_few_donors":
+                reason = [
+                    "In-space placebo inference requires at least 2 donors (each "
+                    "placebo is fit against the other donors); too few were",
+                    "available. placebo_p_value is undefined. Inspect " "get_placebo_df().",
+                ]
+            else:  # "all_placebos_failed" (or a legacy unpickle without the status)
+                reason = [
                     "In-space placebo permutation inference was attempted but "
                     "produced no valid reference set",
                     f"(0 placebos entered the rank; {self.n_failed} failed to "
-                    "converge). placebo_p_value is undefined — too few donors or "
-                    "all donor refits failed. Inspect get_placebo_df().",
-                    "=" * 75,
+                    "converge). placebo_p_value is undefined — all donor refits",
+                    "failed. Inspect get_placebo_df().",
                 ]
-            )
+            lines.extend([*reason, "=" * 75])
         else:
             lines.extend(
                 [
@@ -550,7 +577,15 @@ def in_space_placebo(
         snap = self._fit_snapshot
         donors = list(snap.donor_ids)
         n_donors = len(donors)
-        n_starts_eff = snap.n_starts if n_starts is None else int(n_starts)
+        if n_starts is None:
+            n_starts_eff = snap.n_starts
+        else:
+            # Mirror the estimator constructor's validation (synthetic_control.py)
+            # so a bad override fails fast instead of silently coercing (e.g. via
+            # int(0)/int(-1)) into a degenerate or invalid permutation procedure.
+            if not isinstance(n_starts, (int, np.integer)) or n_starts < 1:
+                raise ValueError(f"n_starts override must be a positive integer, got {n_starts!r}")
+            n_starts_eff = int(n_starts)
 
         treated_pre = _mspe(self.gap_path, snap.pre_periods)
         treated_post = _mspe(self.gap_path, snap.post_periods)
@@ -588,6 +623,7 @@ def in_space_placebo(
             self.n_placebos = 0
             self.n_failed = 0
             self._placebo_gaps = {}
+            self._placebo_status = "treated_fit_nonconverged"
             self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
             return self._placebo_df.copy()
 
@@ -603,6 +639,7 @@ def in_space_placebo(
             self.n_placebos = 0
             self.n_failed = 0
             self._placebo_gaps = {}
+            self._placebo_status = "too_few_donors"
             self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
             return self._placebo_df.copy()
 
@@ -668,8 +705,9 @@ def in_space_placebo(
             )
             p_value = np.nan
         else:
-            # One-sided rank, treated unit included as the "+1". Ties counted via
-            # ``>=`` so the p-value is conservative.
+            # Upper-tail rank on the (unsigned) RMSPE ratio, treated unit included
+            # as the "+1". Ties counted via ``>=`` so the p-value is conservative.
+            # (The ratio squares the gaps -> direction-agnostic, NOT a signed test.)
             rank = 1 + sum(1 for r in ranked_ratios if r >= treated_ratio)
             p_value = rank / (n_placebos + 1)
 
@@ -686,5 +724,6 @@ def in_space_placebo(
         self.n_placebos = int(n_placebos)
         self.n_failed = int(n_failed)
         self._placebo_gaps = placebo_gaps
+        self._placebo_status = "ran" if n_placebos > 0 else "all_placebos_failed"
         self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
         return self._placebo_df.copy()
diff --git a/tests/test_methodology_synthetic_control.py b/tests/test_methodology_synthetic_control.py
index 074ad896..ae563d01 100644
--- a/tests/test_methodology_synthetic_control.py
+++ b/tests/test_methodology_synthetic_control.py
@@ -1254,7 +1254,8 @@ def test_to_dict_includes_placebo_scalars():
 
 def test_summary_distinguishes_infeasible_placebo_from_not_run():
     # summary() must tell "placebo never run" apart from "placebo run but produced
-    # no valid reference set" (J<2 here -> placebo_p_value NaN but it WAS attempted).
+    # no valid reference set" (J<2 here -> placebo_p_value NaN but it WAS attempted),
+    # and name the SPECIFIC infeasibility reason (too few donors).
     df, _, _ = _make_panel(n_donors=1)
     with warnings.catch_warnings():
         warnings.simplefilter("ignore")
@@ -1264,10 +1265,58 @@ def test_summary_distinguishes_infeasible_placebo_from_not_run():
         after = res.summary()
     assert "Run in_space_placebo()" in before  # never run
     assert np.isnan(res.placebo_p_value) and res._placebo_df is not None  # attempted
-    assert "attempted but" in after and "no valid reference set" in after
+    assert res._placebo_status == "too_few_donors"
+    assert "requires at least 2 donors" in after  # specific reason, not "not run"
     assert "Run in_space_placebo()" not in after  # not mislabeled as "not run"
 
 
+def test_summary_treated_fit_failure_names_specific_reason():
+    # When the treated unit's OWN fit fails to converge, in_space_placebo() fails
+    # closed with n_placebos=0, n_failed=0 -- the SAME counts as the J<2 case. The
+    # CI codex P2: summary() must not reconstruct the reason from those counts and
+    # narrate "too few donors or all donor refits failed" (false here); it must name
+    # the treated-fit non-convergence recorded in _placebo_status.
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            seed=0,
+            n_starts=1,
+            optimizer_options={"maxiter": 1},  # outer V cannot converge -> fail closed
+            inner_min_decrease=1e-3,
+        )
+        assert res._fit_converged is False
+        with pytest.warns(UserWarning, match="did not converge at fit time"):
+            res.in_space_placebo()
+        after = res.summary()
+    assert res._placebo_status == "treated_fit_nonconverged"
+    assert res.n_placebos == 0 and res.n_failed == 0  # same counts as J<2
+    assert "treated unit's own SCM fit" in after and "did not converge" in after
+    # Must NOT misdiagnose as the donor-side reason.
+    assert "too few" not in after.lower()
+    assert "all donor refits" not in after.lower()
+
+
+def test_in_space_placebo_rejects_invalid_n_starts():
+    # CI codex P2: the n_starts override must fail fast on non-positive / non-integer
+    # values (mirroring the estimator constructor) rather than silently coercing via
+    # int(...) into a degenerate one-start (or invalid) permutation procedure.
+    res = _fit_for_placebo(n_donors=4)
+    for bad in (0, -1, -5):
+        with pytest.raises(ValueError, match="n_starts override must be a positive integer"):
+            res.in_space_placebo(n_starts=bad)
+    for bad in (2.5, "3"):
+        with pytest.raises(ValueError, match="n_starts override must be a positive integer"):
+            res.in_space_placebo(n_starts=bad)  # type: ignore[arg-type]
+    # The placebo state must be untouched by a rejected override.
+    assert res._placebo_status is None and res._placebo_df is None
+
+
 def test_rmspe_ratio_is_root_scale():
     # The reported statistic is the ROOT-scale ratio RMSPE_post/RMSPE_pre =
     # sqrt(MSPE_post/MSPE_pre), NOT the MSPE ratio. Hand-worked: pre-MSPE = 4,

From 44ea0b409ca9bd96b33e503edfc5bdb7a39b1e50 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 31 May 2026 15:45:16 -0400
Subject: [PATCH 3/3] test(diagnostic_report): scope scm_fit warning
 suppression to avoid global filter mutation

CI review (PR #511) P3: the new scm_fit fixture called
warnings.filterwarnings("ignore"), which permanently mutates the process-wide
warning filters and can silently weaken warning-regression coverage in later
tests in the same worker. Wrap only the synthetic_control(...) call in
warnings.catch_warnings() + simplefilter("ignore") so the suppression is scoped
to the fixture's solver call. (Pre-existing sibling fixtures in this file use
the same legacy global-filter pattern but are out of scope for this PR.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 tests/test_diagnostic_report.py | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
index af1ef4c3..66e594a5 100644
--- a/tests/test_diagnostic_report.py
+++ b/tests/test_diagnostic_report.py
@@ -158,21 +158,24 @@ def _scm_panel(n_donors=4, T=8, T0=6, effect=3.0, seed=0):
 
 @pytest.fixture  # function-scoped: some tests run in_space_placebo(), which mutates the result
 def scm_fit():
-    warnings.filterwarnings("ignore")
     df = _scm_panel(n_donors=4)
     # Cheap optimizer settings keep the pure-Python fixture fast; the report path,
-    # not solver accuracy, is what these tests exercise.
-    res = synthetic_control(
-        df,
-        "y",
-        "treated",
-        "unit",
-        "year",
-        seed=0,
-        n_starts=1,
-        optimizer_options={"maxiter": 50},
-        inner_min_decrease=1e-3,
-    )
+    # not solver accuracy, is what these tests exercise. Scope the warning
+    # suppression to this call via catch_warnings() so it does not mutate the
+    # global filter state for later tests in the same worker.
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            seed=0,
+            n_starts=1,
+            optimizer_options={"maxiter": 50},
+            inner_min_decrease=1e-3,
+        )
     return res, df