Skip to content

Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core#501

Merged
igerber merged 14 commits into
mainfrom
feature/synthetic-control
May 31, 2026
Merged

Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core#501
igerber merged 14 commits into
mainfrom
feature/synthetic-control

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 30, 2026

Summary

  • Add SyntheticControl — the classic Synthetic Control Method of Abadie, Diamond & Hainmueller (2010) (originating in Abadie & Gardeazabal 2003) — as a standalone estimator (diff_diff/synthetic_control.py) with a separate SyntheticControlResults container (diff_diff/synthetic_control_results.py) and a synthetic_control() convenience function. Donor (unit) weights only — no time weights or ridge — distinct from SyntheticDiD.
  • Inner simplex-constrained weighted-LS solve W*(V) reuses utils._sc_weight_fw (folding V^½ into the predictor matrix, intercept=False, zeta=0, noise-scaled min_decrease). The diagonal predictor-importance matrix V is selected data-driven (v_method="nested": softmax-on-simplex multistart Nelder-Mead + derivative-free Powell polish, minimizing pre-period outcome MSPE) or user-supplied (v_method="custom"). Per-row standardization (SD over donors+treated, ddof=1) matches the R Synth::synth source.
  • Reports the gap path α̂_1t = Y_1t − Σ_j w_j Y_jt, att (mean post-period gap), pre_rmspe, donor weights, v_weights, and a predictor-balance table. No analytical standard errorse/t_stat/p_value/conf_int are NaN (in-space placebo permutation inference is a planned follow-up; _placebo_gaps/_rmspe_ratio/_fit_snapshot are reserved on the results object).
  • Fail-closed validation gates: predictor-period leakage, absorbing post-period suffix + no-anticipation/uninterrupted-exposure cross-check against the treatment column, post-period canonicalization, donor-pool filtering, empty/duplicate/reordered predictor windows, non-finite predictor/outcome data, NaN/non-binary treatment, duplicate predictor labels, inner-solve non-convergence, outer-V non-convergence, order-independent gap path, custom_v cross-field rules, and degenerate single-donor / single-pre-period handling.
  • Docs: docs/methodology/REGISTRY.md §SyntheticControl (with **Deviation from R:** / **Note:** labels), docs/api/synthetic_control.rst + autosummary stubs + index toctree, diff_diff/guides/llms.txt (estimator count 17→18) + llms-full.txt, README.md catalog row, docs/doc-deps.yaml, CHANGELOG.md.

Methodology references (required if estimator / math changes)

  • Method name(s): Classic Synthetic Control Method — donor-weight counterfactual with a nested/custom diagonal predictor-importance matrix V.
  • Paper / source link(s): Abadie, Diamond & Hainmueller (2010), JASA 105(490):493–505, https://doi.org/10.1198/jasa.2009.ap08746; Abadie & Gardeazabal (2003). Reviews on file under docs/methodology/papers/ (merged in synthetic control: PR-A paper reviews (ADH 2010/2015, Abadie 2021 JEL, CWZ 2021) #497). Inner solve / standardization / solution.v scaled-space behavior verified against the R Synth::synth source.
  • Intentional deviations from the source (all labeled in REGISTRY §SyntheticControl): standardization divisor pinned from the Synth source (ADH 2010 defers it to AG 2003 App. B); outer objective minimizes pre-period MSPE over all pre periods, not R's time.optimize.ssr window (so the nested V differs by an efficiency-only choice → Tier-2 parity is a band); softmax-on-simplex V parametrization; predictor/outcome aggregation fails closed on non-finite cells whereas R uses na.rm=TRUE (deliberate — na-dropping silently aggregates incomparable period subsets); aggregation restricted to linear combinations {mean, sum} (no median); standardize="none" option; 1×SD poor-fit threshold (defensive, matches SyntheticDiD). No analytical SE (NaN inference); placebo inference deferred.

Validation

  • Tests added/updated: tests/test_methodology_synthetic_control.py — 43 tests covering the 10 validation gates, custom_v cross-field rules, J==1 / T0==0 / T0==1 degeneracies, the NaN-inference contract, get_paramsset_params round-trip, the estimators re-export surface, and two-tier R-Synth parity.
  • R parity: reproduces the published Basque result (Cataluña 0.851 + Madrid 0.149, loss.v 0.0089). Tier-1 feeds R's solution.v via custom_v → donor weights match to atol=1e-3 (observed ~2e-5), deterministic/optimizer-independent; Tier-2 (@pytest.mark.slow) checks the nested fit in a band. Golden fixtures live in tests/data/ (so Tier-1 runs in isolated-install CI without R), generated by benchmarks/R/generate_synth_basque_golden.R; "Synth" added to benchmarks/R/requirements.R.
  • Backtest / simulation / notebook evidence: N/A (tutorial deferred to a later PR).
  • Deferred (tracked in TODO.md): wiring SyntheticControlResults into the practitioner / DiagnosticReport / BusinessReport routing → PR-2, where it pairs with the placebo-inference layer those reports would surface.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

igerber and others added 7 commits May 30, 2026 16:48
Add the classic Synthetic Control Method of Abadie, Diamond & Hainmueller
(2010) as a standalone `SyntheticControl` estimator with a separate
`SyntheticControlResults` container and a `synthetic_control()` convenience
function, distinct from `SyntheticDiD` (donor weights only; no time weights or
ridge).

- Inner simplex solve W*(V) reuses utils._sc_weight_fw (V^½ folded into the
  predictor matrix, intercept=False, zeta=0, noise-scaled min_decrease).
- Diagonal predictor-importance V selected data-driven (nested: softmax-on-
  simplex multistart Nelder-Mead + Powell polish, pre-period outcome MSPE) or
  user-supplied (custom_v). Per-row standardization (SD over donors+treated,
  ddof=1) matches the R Synth source; solution.v lives in the scaled space.
- Reports gap path, att (mean post gap), pre_rmspe, donor_weights, v_weights,
  predictor_balance. No analytical SE — inference fields are NaN (placebo
  permutation inference reserved for a follow-up; _placebo_gaps/_rmspe_ratio/
  _fit_snapshot reserved on the results object).
- 10 validation gates: predictor-period leakage, absorbing post-suffix +
  no-anticipation cross-check, post canonicalization, donor filtering, empty
  windows, poor-fit warning, duplicate labels, inner non-convergence warning,
  order-independent gap path, standardize="none" deviation; plus fail-closed
  custom_v cross-field rules and J==1 / T0 degenerate handling.

R-Synth parity (tests/test_methodology_synthetic_control.py; goldens generated
by benchmarks/R/generate_synth_basque_golden.R into tests/data/): two-tier on
the Basque study — Tier-1 feeds R's solution.v via custom_v and reproduces the
published donor weights (region 10 0.851 + region 14 0.149) to atol=1e-3
deterministically; Tier-2 (@slow) checks the nested fit in a tolerance band
(the nested V differs because the outer objective uses all pre periods, not
R's time.optimize.ssr window).

Docs: REGISTRY §SyntheticControl (deviation/Note labels), docs/api/
synthetic_control.rst + autosummary + index toctree, llms.txt (count 17→18) +
llms-full.txt sections, README catalog row, doc-deps.yaml, CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…tes (2×P1)

- Validate the treatment indicator on the FULL input BEFORE
  _resolve_treated_and_donors, so a non-{0,1} code fails closed instead of
  silently dropping a unit from both the treated and donor sets (which would
  quietly change the donor pool / weights / ATT). (codex R1 P1)
- _resolve_periods now asserts D==1 in every post period (uninterrupted
  exposure) in addition to D==0 in every pre period (no anticipation), on both
  the inferred and explicit branches — an explicit suffix over a 0,0,1,0 path
  no longer averages a treated period with an untreated one. (codex R1 P1)
- Add regressions: test_non_binary_treatment_rejected,
  test_untreated_period_in_post_rejected. (codex R1 P2)

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…abel/mspe_v fixes

- Fail closed on non-finite (NaN/inf) values in the data that actually enters the
  SCM matching problem: the treated/donor outcome panel (gap path / ATT / nested
  objective) and every selected predictor block (predictors over predictor_window,
  special_predictors, pre_period_outcomes). Previously NaN propagated silently to
  finite-looking uniform weights (e.g. a docs-style predictors=['school.high'] over
  the default all-pre window on the Basque fixture's late-start covariates). (codex R2 P1)
- special_predictors label now uses the full ordered period list, so distinct sets
  sharing endpoints+length (e.g. [2000,2002,2004] vs [2000,2003,2004]) no longer
  collide in the duplicate guard / v_weights key. (codex R2 P2)
- mspe_v (the OUTER-objective value) is now None on the custom and single-donor
  paths, matching its documented "nested-only" contract. (codex R2 P2)
- Docs examples (api rst + llms-full) set predictor_window explicitly to steer away
  from the all-pre default on late-start covariates. (codex R2 P2)
- Regressions: test_non_finite_predictor_rejected, test_non_finite_outcome_rejected,
  test_distinct_special_period_sets_not_duplicate.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…s re-export, doc wording

- Reject missing (NaN) values in treatment/unit/time up front in fit(), before
  donor classification: a partially-missing treatment history would otherwise be
  silently classified as never-treated by groupby(...).max() (NaN dropped),
  changing the donor pool / weights / ATT with no warning. (codex R3 P1)
- Re-export SyntheticControl from diff_diff.estimators (mirrors the SyntheticDiD /
  TwoWayFixedEffects backward-compat re-export contract + __all__). (codex R3 P1)
- Reword the short README / llms.txt catalog descriptions: "no inference in this
  release (inference fields are NaN — permutation/placebo planned)" instead of
  "permutation-only inference", matching the api/REGISTRY docs. (codex R3 P3)
- Regressions: test_missing_treatment_value_rejected, test_estimators_module_reexport.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
… on outer-V non-convergence

- Document the predictor/outcome missing-data contract as an explicit
  **Deviation from R:** in REGISTRY: aggregation fails closed on any non-finite
  cell, whereas R Synth::dataprep uses na.rm=TRUE. The fail-closed choice is
  deliberate (na-dropping silently aggregates different period subsets across
  units → incomparable predictors). Parity wording tightened to "row ORDER
  matches dataprep" across REGISTRY / api rst / _validate_predictors docstring. (codex R4 P1)
- _outer_solve_V now tracks OptimizeResult.success across the multistart
  Nelder-Mead runs and the Powell polish, and emits a UserWarning when neither
  converged (previously only the inner Frank-Wolfe solve surfaced non-convergence). (codex R4 P2)
- Regressions: test_all_na_predictor_window_rejected (documents partial==full
  fail-closed under the no-na.rm contract), test_outer_v_nonconvergence_warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…onicalize predictor periods

- Restrict predictor/special aggregation ops to {mean, sum} — both LINEAR
  combinations per ADH 2010 §2.3 (Ȳ = Σ k_s Y_is). median (non-linear) removed
  from _OP_FUNCS, the _PredictorSpec comment, the fit docstring, and llms-full;
  no deviation note needed since mean/sum are the documented linear forms. (codex R5 P1)
- Canonicalize predictor_window / special_predictors / pre_period_outcomes period
  lists to unique + calendar-sorted (_canon) before labeling, mirroring the
  post_periods treatment: reordered [c,b,a]==[a,b,c] (so the duplicate-label guard
  catches them) and a repeated period no longer re-weights a "mean". (codex R5 P2)
- Regressions: test_median_op_rejected, test_reordered_special_periods_are_duplicates,
  test_duplicate_predictor_window_periods_deduped.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…ing-stack wiring (TODO)

- Clarify the mspe_v docstring: populated only when the nested outer search runs
  (None on both the custom and degenerate single-donor paths). (codex R6 P3)
- Defer the reporting-stack wiring P2 to PR-2 via a TODO.md row: wiring
  SyntheticControlResults into practitioner / DiagnosticReport / BusinessReport
  routing (so SCM results don't get generic PT/HonestDiD guidance) pairs naturally
  with the PR-2 placebo-inference layer those reports would surface. (codex R6 P2)

Codex R6 verdict: ✅ no P0/P1.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

Overall Assessment
⚠️ Needs changes — one unmitigated P1 methodology/documentation mismatch remains in SyntheticControl.

Executive Summary

  • Affected method: SyntheticControl (classic SCM from Abadie, Diamond & Hainmueller 2010; AG 2003 / Synth numerics for the nested V path).
  • The core SCM math, donor-pool logic, absorbing-treatment checks, fail-closed missing-data handling, and NaN inference contract are consistent with the new registry entry.
  • P1: the implementation only supports mean/sum/identity predictor aggregations, but the registry/source still describe general K_m linear combinations; because that restriction is not labeled in REGISTRY.md, it is an undocumented methodology deviation.
  • The missing practitioner / diagnostic / business-report routing is properly tracked in TODO.md, so it is informational only.
  • I could not execute the new tests in this environment because pytest, numpy, and pandas are unavailable.

Methodology

Code Quality

  • No findings. Impact: none. Concrete fix: none.

Performance

  • No findings. Impact: none. Concrete fix: none.

Maintainability

  • No findings beyond the tracked routing TODO below. Impact: none. Concrete fix: none.

Tech Debt

  • P3 The PR correctly tracks the missing SCM routing in practitioner/reporting surfaces in TODO.md:L85, so I would not block on it. Impact: SyntheticControlResults passed through Practitioner / DiagnosticReport / BusinessReport will still receive generic parallel-trends / HonestDiD messaging. Concrete fix: none for this PR; the TODO is the right place to add SCM-native diagnostics alongside placebo inference.

Security

  • No findings. Impact: none. Concrete fix: none.

Documentation/Tests

  • No additional PR-specific findings beyond the methodology doc mismatch above. Impact: none. Concrete fix: none.
  • Verification note: I could not run tests/test_methodology_synthetic_control.py here because the review environment does not have pytest, numpy, or pandas.

Path to Approval

  1. Add a **Note:** or **Deviation from R:** in docs/methodology/REGISTRY.md explicitly stating that PR-1 supports only equal-weight mean/sum/identity predictor constructions, not arbitrary weighted K_m combinations, and mirror that restriction in docs/api/synthetic_control.rst. If full ADH surface is intended instead, implement explicit weighted K_m support and cover it with a methodology test.

…tor restriction (P1)

Add a **Note:** in REGISTRY §SyntheticControl (mirrored in docs/api/synthetic_control.rst)
that predictor rows support only EQUAL-WEIGHT linear combinations — mean (k_s=1/T0), sum
(k_s=1), and per-period outcome lags (identity) — and NOT ADH (2010) §2.3's general
arbitrary-weight form Ȳ = Σ_s k_s Y_is (nor non-linear ops like median). The supported
set still spans standard Synth::dataprep predictors.op + special.predictors usage;
arbitrary-weight K_m is a deferred extension. Documents the restriction introduced when
median was dropped, so it is no longer an undocumented methodology deviation.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 536aeb5a2a7163efad1993250fc5e7d906b2f22e


Overall Assessment

✅ Looks good — the previous P1 methodology documentation gap is resolved, and I did not find a new unmitigated P0/P1 issue in the changed SCM code. One P2 edge-case warning gap remains, plus a tracked P3 tech-debt item.

Executive Summary

  • Affected method: SyntheticControl (classic SCM; ADH 2010, with AG 2003 / Synth-pinned numerics).
  • Re-review outcome: the prior P1 is resolved. The equal-weight predictor restriction is now explicitly documented in docs/methodology/REGISTRY.md:L1978-L1985 and docs/api/synthetic_control.rst:L107-L119.
  • Core SCM behavior is aligned with the registry on donor-pool construction, absorbing/no-anticipation validation, nested/custom V, and the NaN inference contract.
  • P2: the documented poor-fit warning can be skipped when the treated unit’s pre-period outcome path is flat (SD=0) even if pre-period RMSPE is nonzero.
  • The missing practitioner/report routing is properly tracked in TODO.md:L85, so it is informational only.
  • Verification note: I could not run tests/test_methodology_synthetic_control.py here because pytest, numpy, pandas, and scipy are unavailable.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • P3 The missing SCM-native routing in practitioner/report surfaces is explicitly tracked in TODO.md:L85. Impact: those tools still fall back to generic parallel-trends/HonestDiD guidance for SyntheticControlResults. Concrete fix: none for this PR; the TODO is sufficient.

Security

  • No findings.

Documentation/Tests

  • No additional PR-specific findings.
  • Verification note: I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest, numpy, pandas, and scipy are not installed.

…at pre-path (P2)

The poor-fit warning was gated by `pre_sd > 0`, so a FLAT treated pre-period path
(SD == 0) with nonzero pre-RMSPE never warned even though the synthetic clearly fails
to reproduce a constant series. Change the gate to the literal REGISTRY contract
(warn when pre_rmspe > pre_sd), including the SD == 0 case, with a scale-aware absolute
floor (1e-8 * max(|Z1|, 1)) so a near-perfect flat fit (RMSPE ~ roundoff) does not
spuriously warn. REGISTRY poor-fit Note updated to document the flat-path behavior
(slightly broader than SyntheticDiD's SD>0-gated form). Regression:
test_poor_fit_warning_flat_treated_pre_path.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 95cd4f9efa00cdee700b92bb65738eacc7ca717a


Overall Assessment

✅ Looks good — the affected method is SyntheticControl (classic SCM per ADH 2010 / AG 2003 / Synth-pinned numerics), and I did not find a new unmitigated P0/P1 issue in the changed SCM paths. The prior methodology documentation gap is resolved. One P2 performance issue remains, plus a tracked P3 tech-debt item.

Executive Summary

  • Affected method: SyntheticControl in diff_diff/synthetic_control.py, with SyntheticControlResults in diff_diff/synthetic_control_results.py.
  • Re-review outcome: the prior P1 methodology documentation gap is resolved. The equal-weight K_m restriction and the flat-pre-path poor-fit warning behavior are now documented consistently with the implementation in docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, diff_diff/synthetic_control.py:L43-L47, and diff_diff/synthetic_control.py:L427-L434.
  • Core SCM contracts match the registry and local ADH 2010 review: single treated unit + never-treated donors, absorbing/no-anticipation checks, nested/custom V, and NaN analytical inference via safe_inference() in diff_diff/synthetic_control.py:L452-L453.
  • P2: _v_starts() still computes expensive heuristic starts even when n_starts=1, so the single-start path pays O(k) extra inner solves before optimization in diff_diff/synthetic_control.py:L914-L980.
  • The missing practitioner / DiagnosticReport / BusinessReport routing is properly tracked in TODO.md:L85, so it is informational only.
  • Verification note: I could not run tests/test_methodology_synthetic_control.py here because pytest, numpy, pandas, and scipy are unavailable in this environment.

Methodology

Affected method: SyntheticControl (classic SCM).

  • P3 Informational — No unmitigated methodology defect found in the changed estimator. The previous re-review issue is resolved: the implementation’s equal-weight-only predictor construction and flat-pre-path poor-fit warning now match the documented REGISTRY/API deviations in diff_diff/synthetic_control.py:L43-L47, diff_diff/synthetic_control.py:L427-L434, docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, and the regression coverage in tests/test_methodology_synthetic_control.py:L419-L435. Impact: none. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • P2 _v_starts() eagerly computes the inverse-variance and univariate-fit heuristic candidates before truncating to n_starts, so n_starts=1 still incurs the univariate loop’s k inner Frank-Wolfe solves, and small n_starts values can still do unused candidate work in diff_diff/synthetic_control.py:L914-L980. Impact: the multistart knob does not actually bound startup cost on low-start runs, which can make high-k SCM fits materially slower than callers expect. Concrete fix: short-circuit _v_starts() when n_starts <= 1, and more generally stop generating heuristic/random candidates once len(candidates) == n_starts.

Maintainability

  • No findings.

Tech Debt

  • P3 The SCM-native practitioner/report routing gap is explicitly tracked in TODO.md:L85. Impact: routing SyntheticControlResults through those surfaces still yields generic parallel-trends/HonestDiD guidance rather than SCM-specific diagnostics. Concrete fix: none for this PR; the TODO tracking is sufficient.

Security

  • No findings.

Documentation/Tests

  • No PR-specific findings.
  • Verification note: static review only; I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest, numpy, pandas, and scipy are not installed.

… (P2)

_v_starts() eagerly computed the inverse-variance and univariate-fit heuristic
candidates (the latter = O(k) inner Frank-Wolfe solves) before truncating to
n_starts, so n_starts=1 still paid the univariate loop. Generate candidates lazily
and stop once `target = max(n_starts, 1)` are collected: n_starts=1 now returns the
uniform start without the univariate loop. Candidate ORDER is unchanged, so any given
n_starts yields the same set as before (default n_starts=4 is identical — Basque Tier-2
parity preserved) — only unused work is skipped. Regression: test_n_starts_one_runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 0adb638c013425a092f51f1fefef076f34d4422d


Overall Assessment

✅ Looks good — the affected method is SyntheticControl (classic SCM per ADH 2010 / AG 2003 / Synth-pinned numerics), and I did not find a new unmitigated P0/P1 issue in the changed SCM paths. The prior methodology-documentation concern remains resolved. One non-blocking P2 performance issue remains, plus one P3 tech-debt item that is already tracked in TODO.md.

Executive Summary

  • Affected method: SyntheticControl in diff_diff/synthetic_control.py with SyntheticControlResults in diff_diff/synthetic_control_results.py.
  • Re-review outcome: the previously raised methodology-doc alignment issue stays resolved. The equal-weight K_m restriction and the flat-pre-path poor-fit warning are now documented consistently with the implementation in docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, diff_diff/synthetic_control.py:L43-L47, and diff_diff/synthetic_control.py:L427-L434.
  • Core SCM contracts match the registry and local paper review: single treated unit / never-treated donors, absorbing contiguous-suffix post window with no-anticipation cross-checks, nested/custom diagonal V, and NaN analytical inference via safe_inference() in diff_diff/synthetic_control.py:L452-L453 and diff_diff/utils.py:L177-L211.
  • The previous _v_starts() eager-work issue is fixed: candidate generation is now lazy, so n_starts=1 no longer pays for the univariate-start loop (diff_diff/synthetic_control.py:L947-L989; regression coverage at tests/test_methodology_synthetic_control.py:L317-L323).
  • P2: the “inverse-variance” heuristic start is built from already-standardized predictors, so in the normal nondegenerate case it collapses to the same seed as the uniform start (diff_diff/synthetic_control.py:L383-L418, diff_diff/synthetic_control.py:L852-L877, diff_diff/synthetic_control.py:L956-L964).
  • The practitioner / DiagnosticReport / BusinessReport routing gap is explicitly tracked in TODO.md:L85, so it is informational only.

Methodology

  • P3 Informational — No unmitigated methodology defect found in the changed estimator. Impact: none. Concrete fix: none. The implementation and docs now agree on the documented deviations around equal-weight predictor aggregation and the broader poor-fit warning contract in docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, diff_diff/synthetic_control.py:L43-L47, and diff_diff/synthetic_control.py:L427-L434.

Code Quality

  • No findings.

Performance

  • P2 [Newly identified] _v_starts() computes its “inverse-variance” heuristic from the already row-standardized predictors, so for any nonzero-variance predictor row the variance is 1 by construction and the resulting theta is the same zero-vector as the uniform start. Impact: n_starts=2 effectively runs the same Nelder-Mead seed twice, and larger n_starts values get less search diversity than the parameter name implies, while still paying for the extra optimizer call. Concrete fix: build that heuristic from the unstandardized X1/X0 instead of X1s/X0s, or deduplicate candidate theta vectors before the multistart loop. Relevant code: diff_diff/synthetic_control.py:L383-L418, diff_diff/synthetic_control.py:L852-L877, diff_diff/synthetic_control.py:L956-L964.

Maintainability

  • No findings.

Tech Debt

  • P3 Informational — SyntheticControlResults is not yet wired into practitioner/report routing, but that limitation is explicitly tracked in TODO.md:L85. Impact: SCM outputs routed through those surfaces still get generic parallel-trends / HonestDiD guidance rather than SCM-native diagnostics. Concrete fix: none for this PR; the TODO tracking is sufficient.

Security

  • No findings.

Documentation/Tests

  • No PR-specific findings.
  • Static review only: I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest is unavailable. The added coverage is otherwise strong on paper: validation gates, NaN inference, degenerate paths, and Basque R-parity are exercised in tests/test_methodology_synthetic_control.py:L94-L244, tests/test_methodology_synthetic_control.py:L273-L323, tests/test_methodology_synthetic_control.py:L411-L444, and tests/test_methodology_synthetic_control.py:L610-L721.

…start + distinct seeds (P2)

The inverse-variance heuristic start was computed from the already-standardized
predictors, where every row variance is ~1, so its theta collapsed to the same
zero-vector as the uniform start — running the same Nelder-Mead seed twice and giving
less search diversity than n_starts implies. Now:
- compute the inverse-variance start from the UNSTANDARDIZED X1/X0 (threaded through
  _outer_solve_V → _v_starts), so it is a genuinely different seed on real-scale data;
- de-duplicate candidates (_add_unique, atol=1e-9) so the multistart never runs the
  same seed twice — a collapsed heuristic slot is refilled by a fresh Dirichlet draw.
Default n_starts=4 still yields 4 distinct seeds; Basque Tier-2 parity preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b98d282481db94dc0be656d538e645d30862b49


Overall Assessment

⚠️ Needs changes — the affected method is SyntheticControl, and the ADH 2010 / Registry contracts are mostly implemented correctly, but the nested V path still suppresses inner Frank-Wolfe non-convergence during candidate evaluation and can therefore choose V from truncated W*(V) solves with no warning.

Executive Summary

  • Affected method: SyntheticControl in diff_diff/synthetic_control.py, with SyntheticControlResults in diff_diff/synthetic_control_results.py.
  • The core SCM contracts match the Methodology Registry: one treated unit, never-treated donors, absorbing/no-anticipation post window checks, predictor construction as covariates plus linear pre-period outcome combinations, and no analytical SE with NaN inference via safe_inference() (docs/methodology/REGISTRY.md:L1964-L1985, diff_diff/synthetic_control.py:L319-L455, diff_diff/utils.py:L177-L211).
  • P1 [Newly identified]: the nested V search ignores the inner-solver convergence flag on intermediate candidate evaluations, so the outer objective can silently rank V values using non-optimal donor weights (diff_diff/synthetic_control.py:L882-L913, diff_diff/synthetic_control.py:L977-L1077, diff_diff/synthetic_control.py:L446-L452).
  • The previously noted _v_starts() duplicate-seed performance issue appears fixed: the inverse-variance heuristic now uses raw-scale predictors and candidate generation is lazy (diff_diff/synthetic_control.py:L960-L976).
  • The practitioner / DiagnosticReport / BusinessReport routing gap is explicitly tracked in TODO.md:L85, so it is informational only.

Methodology

  • P1 [Newly identified] Impact: ADH 2010 defines the nested objective in terms of W*(V), but _outer_solve_V() and _v_starts() ignore converged from _inner_solve_W() on trial V values; only the final re-solve is surfaced to the user. If some inner solves hit inner_max_iter, the search can silently select a suboptimal V/ATT even when the final W*(V*) happens to converge. Concrete fix: propagate an inner-nonconvergence tracker through _v_starts() and _outer_solve_V(), and do not treat non-converged objective evaluations as ordinary valid search points. At minimum emit an aggregated warning; preferably return np.inf/retry so V selection never compares truncated inner solves. Mirror the aggregated-warning pattern already used when suppressed FW non-convergence is summarized in diff_diff/synthetic_did.py:L1692-L1704. Relevant code: diff_diff/synthetic_control.py:L882-L913, diff_diff/synthetic_control.py:L979-L988, diff_diff/synthetic_control.py:L1024-L1027, diff_diff/synthetic_control.py:L1048-L1077, diff_diff/synthetic_control.py:L446-L452.
  • P3 Informational Impact: the documented deviations around equal-weight predictor aggregation, fail-closed non-finite handling, all-pre-period outer MSPE, and standardize="none" are all explicitly recorded in the Registry/API docs, so I am not treating them as defects. Concrete fix: none. References: docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119.

Code Quality

  • No findings.

Performance

  • No findings. The prior multistart duplicate-seed inefficiency looks resolved in diff_diff/synthetic_control.py:L960-L976.

Maintainability

  • No findings.

Tech Debt

  • P3 Informational Impact: SyntheticControlResults is not yet wired into practitioner/report routing, so those surfaces still give generic parallel-trends / HonestDiD guidance for SCM outputs. This is already tracked and therefore non-blocking. Concrete fix: none for this PR. Reference: TODO.md:L85.

Security

  • No findings.

Documentation/Tests

  • No additional PR-specific findings.
  • Static review only: I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest, numpy, pandas, and scipy are unavailable.

Path to Approval

  1. In diff_diff/synthetic_control.py, track inner Frank-Wolfe non-convergence across _v_starts() and _outer_solve_V(), and surface it before accepting a nested fit. Do not let the outer optimizer compare silently truncated W*(V) evaluations.
  2. Add a regression test that forces non-convergence on intermediate inner solves during nested V search and asserts the estimator warns or rejects the fit instead of silently returning a nested solution.

…e during V search (P1)

During the nested V search, _inner_solve_W's convergence flag was discarded on every
intermediate evaluation (univariate starts + objective calls), so the outer optimizer
could silently rank truncated W*(V) solves if inner solves hit inner_max_iter; only the
final re-solve was surfaced. Now _v_starts returns its inner-solve counts and
_outer_solve_V tallies intermediate non-convergence across the univariate starts AND
every objective evaluation, emitting one aggregated UserWarning when the rate exceeds 5%
(mirrors the synthetic_did.py bootstrap-FW aggregation). Healthy fits (converging inner
solves) stay silent — Basque Tier-2 unaffected. Regression:
test_inner_v_search_nonconvergence_warning.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 7c361ee6ccba2367831b01e2252cc1e06b01ec72


Overall Assessment

⚠️ Needs changes — re-review of SyntheticControl. The estimator mostly matches ADH 2010 and the Methodology Registry, but the prior nested-V non-convergence blocker is only partially fixed: low-rate inner Frank-Wolfe failures are still allowed to influence V selection as if they were valid objective evaluations.

Executive Summary

  • Affected method: SyntheticControl in diff_diff/synthetic_control.py, with SyntheticControlResults in diff_diff/synthetic_control_results.py.
  • Core SCM alignment looks good: one treated unit, never-treated donors, absorbing/no-anticipation checks, simplex donor weights, predictor construction from covariates plus pre-period outcome combinations, and NaN inference via safe_inference() match the Registry and ADH 2010 review (docs/methodology/REGISTRY.md:L1964-L1976, docs/methodology/papers/abadie-diamond-hainmueller-2010-review.md:L80-L95, diff_diff/synthetic_control.py:L319-L455).
  • P1 [Previous finding partially addressed, still open]: _v_starts() and _outer_solve_V() still score _inner_solve_W() outputs even when conv=False; the new warning only fires above a 5% failure rate, but a single bad evaluation can still change an argmin in a nested optimization (diff_diff/synthetic_control.py:L981-L997, diff_diff/synthetic_control.py:L1038-L1106).
  • The documented deviations from R / ADH are properly labeled and non-blocking: all-pre MSPE window, softmax V, fail-closed missing-data aggregation, equal-weight-only predictor ops, and standardize="none" (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L102-L119).
  • The practitioner / DiagnosticReport / BusinessReport routing gap is tracked in TODO.md:L85, so it is informational only.
  • Static review only: I could not run tests/test_methodology_synthetic_control.py because pytest is not installed, and runtime deps like numpy are unavailable in this environment.

Methodology

  • Severity: P1 [Previous finding partially addressed, still open]. Impact: ADH 2010 defines nested SCM as choosing V* by minimizing pre-period MSPE of the true inner solution W*(V) (docs/methodology/papers/abadie-diamond-hainmueller-2010-review.md:L80-L89, docs/methodology/REGISTRY.md:L1972-L1973). In this PR, _v_starts() still builds heuristic starts from w_i even when conv_i=False, and _outer_solve_V() still returns ordinary objective values when conv=False; warning only occurs if failures exceed 5% (diff_diff/synthetic_control.py:L981-L997, diff_diff/synthetic_control.py:L1038-L1106). That narrows the prior problem, but does not remove it: in an argmin search, unlike a bootstrap summary, a single truncated evaluation can change the selected V, donor weights, and ATT with no warning. Concrete fix: treat any conv=False inner solve during nested search as unusable for V ranking by returning np.inf, retrying with a larger budget, or otherwise discarding that candidate, and warn on any non-zero discarded count.
  • Severity: P3 Informational. Impact: the PR’s deviations around all-pre-period MSPE, softmax parameterization of V, fail-closed missing-data handling, equal-weight-only predictor operators, and standardize="none" are explicitly documented in the Registry/API docs, so they are not defects. Concrete fix: none (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L102-L119).

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 Informational. Impact: SyntheticControlResults is not yet routed through practitioner/report surfaces, but that limitation is explicitly tracked and therefore non-blocking. Concrete fix: none for this PR (TODO.md:L85).

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the added non-convergence regression only exercises the high-failure-rate warning branch by forcing inner_max_iter=1; it would not catch the still-open low-rate failure case from the prior review (tests/test_methodology_synthetic_control.py:L320-L322). Concrete fix: add a regression that forces only one or a small minority of nested-search inner solves to fail and asserts the estimator warns or excludes those candidates.
  • Static review only: I could not execute the tests in this environment.

Path to Approval

  1. In diff_diff/synthetic_control.py, stop treating any conv=False result from _inner_solve_W() as a valid nested-V objective point. Discard, retry, or penalize those candidates instead of letting them compete with converged solves.
  2. Change the warning logic from a bootstrap-style >5% threshold to an “any discarded candidate” signal for nested V selection.
  3. Add a regression test covering the low-rate case: exactly one or a small minority of nested-search inner solves fail, and the estimator must not silently accept those evaluations as ordinary objective values.

…es from V argmin (P1)

Strengthen the prior fix: a non-converged inner Frank-Wolfe solve during the nested V
search is now EXCLUDED from V ranking (not merely warned above a 5% rate) — in an argmin
search even one truncated W*(V) could win and silently flip the selected V. The objective
returns a large FINITE penalty (10×(max single-donor vertex MSPE)+1, which dominates any
feasible objective value since the objective is convex in w → max at a simplex vertex) so
that candidate can never be chosen; a non-converged univariate-start solve gets inf MSPE
(→ zero heuristic weight). A finite penalty is used instead of np.inf because inf floods
scipy's Nelder-Mead/Powell simplex arithmetic with RuntimeWarnings (and makes it churn).
Warn on ANY non-zero non-convergence count (no rate threshold). Healthy fits unaffected
(Basque Tier-2 unchanged; warning count back to baseline).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 08957d636a32c404b8b459c48ae695a27a440fd8


Overall Assessment

✅ Looks good — re-review of SyntheticControl. The prior P1 about nested-V ranking on non-converged inner solves appears resolved; I found one remaining non-blocking test gap.

Executive Summary

  • Affected method: SyntheticControl / SyntheticControlResults, implementing classic SCM from Abadie-Diamond-Hainmueller (2010) and Abadie-Gardeazabal (2003).
  • Previous blocker resolved: non-converged inner solves no longer compete in nested V selection; _v_starts() drops them from heuristic seeding and _outer_solve_V() penalizes them out of the argmin while warning on any occurrence (diff_diff/synthetic_control.py:L981-L1001, diff_diff/synthetic_control.py:L1037-L1127).
  • Core methodology matches the Registry: one treated unit, never-treated donors, absorbing/no-anticipation checks, simplex donor weights, pre-period outcome MSPE for nested V, and no analytical SE (docs/methodology/REGISTRY.md:L1964-L1976, diff_diff/synthetic_control.py:L319-L455).
  • The documented deviations from R are properly labeled and therefore non-blocking (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119).
  • Non-blocking P2: the new nested-nonconvergence regression still only exercises the blanket-failure warning path, not the specific low-rate exclusion behavior from the prior review (tests/test_methodology_synthetic_control.py:L317-L322).
  • The practitioner/report-routing gap is explicitly tracked in TODO.md, so it is informational only (TODO.md:L85).

Methodology

No unmitigated findings. The implementation now adheres to the Registry contracts for treatment structure, donor composition, weighting, nested V selection, and NaN-only inference (diff_diff/synthetic_control.py:L319-L455, diff_diff/synthetic_control.py:L981-L1127). The remaining departures from R are explicitly documented under **Note:** / **Deviation from R:**, so they are P3 informational rather than defects (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119).

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3. Impact: SyntheticControlResults is still not routed through practitioner / DiagnosticReport / BusinessReport, but that limitation is explicitly tracked and therefore non-blocking for this PR (TODO.md:L85). Concrete fix: none in PR-1; handle in the planned PR-2 routing/placebo-inference follow-up.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: tests/test_methodology_synthetic_control.py:L317-L322 only forces inner_max_iter=1 and asserts the aggregated warning. It does not directly lock in the prior-review fix that a single low-rate conv=False inner solve must be excluded from nested-V ranking, not merely warned about. A future regression toward thresholded warning/exclusion could pass this test. Concrete fix: add a deterministic low-rate regression, ideally by monkeypatching _inner_solve_W() so exactly one chosen nested evaluation returns conv=False, then assert both the any-occurrence warning and that the selected V/weights reflect candidate exclusion.
  • Static review only: I could not run the suite in this environment because required runtime dependencies are missing (numpy import fails before package initialization).

…sion regression (P2)

Add test_single_inner_nonconvergence_excluded_from_v_ranking: monkeypatch
_inner_solve_W so exactly ONE objective evaluation (the uniform-start eval) reports
conv=False, then assert (a) the any-occurrence "during nested V selection" warning fires
and (b) the selected V is a genuine small-MSPE fit (res.mspe_v < 1.0, not the large
penalty) — i.e. the truncated candidate was EXCLUDED from the argmin, not merely warned.
Complements the blanket-failure (inner_max_iter=1) test. Test-only change.

(Reaches the module via importlib since the diff_diff.synthetic_control attribute is the
convenience function, which shadows the submodule — same pattern as diff_diff.trop.)

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: fca42fe746d4bd73d8244a8ae34583baa0b64925


Overall Assessment

✅ Looks good — re-review of SyntheticControl. I found no unmitigated P0/P1 issues.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3
  • Impact: SyntheticControlResults is not yet wired into practitioner / DiagnosticReport / BusinessReport, but that limitation is explicitly tracked and therefore mitigated for this PR.
  • Concrete fix: None in PR-1; handle in the planned PR-2 follow-up noted in TODO.md:L85-L85.

Security

  • No findings.

Documentation/Tests

  • No findings on the changed docs/tests. The prior re-review concern about low-rate inner-solve exclusion is now directly covered by tests/test_methodology_synthetic_control.py:L325-L357.
  • Static-review note: I could not execute the test file in this environment because numpy is not installed and pytest is unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 30, 2026
@igerber igerber merged commit 4bf60fa into main May 31, 2026
33 of 34 checks passed
@igerber igerber deleted the feature/synthetic-control branch May 31, 2026 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant