Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core by igerber · Pull Request #501 · igerber/diff-diff

igerber · 2026-05-30T21:54:48Z

Summary

Add SyntheticControl — the classic Synthetic Control Method of Abadie, Diamond & Hainmueller (2010) (originating in Abadie & Gardeazabal 2003) — as a standalone estimator (diff_diff/synthetic_control.py) with a separate SyntheticControlResults container (diff_diff/synthetic_control_results.py) and a synthetic_control() convenience function. Donor (unit) weights only — no time weights or ridge — distinct from SyntheticDiD.
Inner simplex-constrained weighted-LS solve W*(V) reuses utils._sc_weight_fw (folding V^½ into the predictor matrix, intercept=False, zeta=0, noise-scaled min_decrease). The diagonal predictor-importance matrix V is selected data-driven (v_method="nested": softmax-on-simplex multistart Nelder-Mead + derivative-free Powell polish, minimizing pre-period outcome MSPE) or user-supplied (v_method="custom"). Per-row standardization (SD over donors+treated, ddof=1) matches the R Synth::synth source.
Reports the gap path α̂_1t = Y_1t − Σ_j w_j Y_jt, att (mean post-period gap), pre_rmspe, donor weights, v_weights, and a predictor-balance table. No analytical standard error — se/t_stat/p_value/conf_int are NaN (in-space placebo permutation inference is a planned follow-up; _placebo_gaps/_rmspe_ratio/_fit_snapshot are reserved on the results object).
Fail-closed validation gates: predictor-period leakage, absorbing post-period suffix + no-anticipation/uninterrupted-exposure cross-check against the treatment column, post-period canonicalization, donor-pool filtering, empty/duplicate/reordered predictor windows, non-finite predictor/outcome data, NaN/non-binary treatment, duplicate predictor labels, inner-solve non-convergence, outer-V non-convergence, order-independent gap path, custom_v cross-field rules, and degenerate single-donor / single-pre-period handling.
Docs: docs/methodology/REGISTRY.md §SyntheticControl (with **Deviation from R:** / **Note:** labels), docs/api/synthetic_control.rst + autosummary stubs + index toctree, diff_diff/guides/llms.txt (estimator count 17→18) + llms-full.txt, README.md catalog row, docs/doc-deps.yaml, CHANGELOG.md.

Methodology references (required if estimator / math changes)

Method name(s): Classic Synthetic Control Method — donor-weight counterfactual with a nested/custom diagonal predictor-importance matrix V.
Paper / source link(s): Abadie, Diamond & Hainmueller (2010), JASA 105(490):493–505, https://doi.org/10.1198/jasa.2009.ap08746; Abadie & Gardeazabal (2003). Reviews on file under docs/methodology/papers/ (merged in synthetic control: PR-A paper reviews (ADH 2010/2015, Abadie 2021 JEL, CWZ 2021) #497). Inner solve / standardization / solution.v scaled-space behavior verified against the R Synth::synth source.
Intentional deviations from the source (all labeled in REGISTRY §SyntheticControl): standardization divisor pinned from the Synth source (ADH 2010 defers it to AG 2003 App. B); outer objective minimizes pre-period MSPE over all pre periods, not R's time.optimize.ssr window (so the nested V differs by an efficiency-only choice → Tier-2 parity is a band); softmax-on-simplex V parametrization; predictor/outcome aggregation fails closed on non-finite cells whereas R uses na.rm=TRUE (deliberate — na-dropping silently aggregates incomparable period subsets); aggregation restricted to linear combinations {mean, sum} (no median); standardize="none" option; 1×SD poor-fit threshold (defensive, matches SyntheticDiD). No analytical SE (NaN inference); placebo inference deferred.

Validation

Tests added/updated: tests/test_methodology_synthetic_control.py — 43 tests covering the 10 validation gates, custom_v cross-field rules, J==1 / T0==0 / T0==1 degeneracies, the NaN-inference contract, get_params↔set_params round-trip, the estimators re-export surface, and two-tier R-Synth parity.
R parity: reproduces the published Basque result (Cataluña 0.851 + Madrid 0.149, loss.v 0.0089). Tier-1 feeds R's solution.v via custom_v → donor weights match to atol=1e-3 (observed ~2e-5), deterministic/optimizer-independent; Tier-2 (@pytest.mark.slow) checks the nested fit in a band. Golden fixtures live in tests/data/ (so Tier-1 runs in isolated-install CI without R), generated by benchmarks/R/generate_synth_basque_golden.R; "Synth" added to benchmarks/R/requirements.R.
Backtest / simulation / notebook evidence: N/A (tutorial deferred to a later PR).
Deferred (tracked in TODO.md): wiring SyntheticControlResults into the practitioner / DiagnosticReport / BusinessReport routing → PR-2, where it pairs with the placebo-inference layer those reports would surface.

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Add the classic Synthetic Control Method of Abadie, Diamond & Hainmueller (2010) as a standalone `SyntheticControl` estimator with a separate `SyntheticControlResults` container and a `synthetic_control()` convenience function, distinct from `SyntheticDiD` (donor weights only; no time weights or ridge). - Inner simplex solve W*(V) reuses utils._sc_weight_fw (V^½ folded into the predictor matrix, intercept=False, zeta=0, noise-scaled min_decrease). - Diagonal predictor-importance V selected data-driven (nested: softmax-on- simplex multistart Nelder-Mead + Powell polish, pre-period outcome MSPE) or user-supplied (custom_v). Per-row standardization (SD over donors+treated, ddof=1) matches the R Synth source; solution.v lives in the scaled space. - Reports gap path, att (mean post gap), pre_rmspe, donor_weights, v_weights, predictor_balance. No analytical SE — inference fields are NaN (placebo permutation inference reserved for a follow-up; _placebo_gaps/_rmspe_ratio/ _fit_snapshot reserved on the results object). - 10 validation gates: predictor-period leakage, absorbing post-suffix + no-anticipation cross-check, post canonicalization, donor filtering, empty windows, poor-fit warning, duplicate labels, inner non-convergence warning, order-independent gap path, standardize="none" deviation; plus fail-closed custom_v cross-field rules and J==1 / T0 degenerate handling. R-Synth parity (tests/test_methodology_synthetic_control.py; goldens generated by benchmarks/R/generate_synth_basque_golden.R into tests/data/): two-tier on the Basque study — Tier-1 feeds R's solution.v via custom_v and reproduces the published donor weights (region 10 0.851 + region 14 0.149) to atol=1e-3 deterministically; Tier-2 (@slow) checks the nested fit in a tolerance band (the nested V differs because the outer objective uses all pre periods, not R's time.optimize.ssr window). Docs: REGISTRY §SyntheticControl (deviation/Note labels), docs/api/ synthetic_control.rst + autosummary + index toctree, llms.txt (count 17→18) + llms-full.txt sections, README catalog row, doc-deps.yaml, CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

…tes (2×P1) - Validate the treatment indicator on the FULL input BEFORE _resolve_treated_and_donors, so a non-{0,1} code fails closed instead of silently dropping a unit from both the treated and donor sets (which would quietly change the donor pool / weights / ATT). (codex R1 P1) - _resolve_periods now asserts D==1 in every post period (uninterrupted exposure) in addition to D==0 in every pre period (no anticipation), on both the inferred and explicit branches — an explicit suffix over a 0,0,1,0 path no longer averages a treated period with an untreated one. (codex R1 P1) - Add regressions: test_non_binary_treatment_rejected, test_untreated_period_in_post_rejected. (codex R1 P2) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

…abel/mspe_v fixes - Fail closed on non-finite (NaN/inf) values in the data that actually enters the SCM matching problem: the treated/donor outcome panel (gap path / ATT / nested objective) and every selected predictor block (predictors over predictor_window, special_predictors, pre_period_outcomes). Previously NaN propagated silently to finite-looking uniform weights (e.g. a docs-style predictors=['school.high'] over the default all-pre window on the Basque fixture's late-start covariates). (codex R2 P1) - special_predictors label now uses the full ordered period list, so distinct sets sharing endpoints+length (e.g. [2000,2002,2004] vs [2000,2003,2004]) no longer collide in the duplicate guard / v_weights key. (codex R2 P2) - mspe_v (the OUTER-objective value) is now None on the custom and single-donor paths, matching its documented "nested-only" contract. (codex R2 P2) - Docs examples (api rst + llms-full) set predictor_window explicitly to steer away from the all-pre default on late-start covariates. (codex R2 P2) - Regressions: test_non_finite_predictor_rejected, test_non_finite_outcome_rejected, test_distinct_special_period_sets_not_duplicate. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

…s re-export, doc wording - Reject missing (NaN) values in treatment/unit/time up front in fit(), before donor classification: a partially-missing treatment history would otherwise be silently classified as never-treated by groupby(...).max() (NaN dropped), changing the donor pool / weights / ATT with no warning. (codex R3 P1) - Re-export SyntheticControl from diff_diff.estimators (mirrors the SyntheticDiD / TwoWayFixedEffects backward-compat re-export contract + __all__). (codex R3 P1) - Reword the short README / llms.txt catalog descriptions: "no inference in this release (inference fields are NaN — permutation/placebo planned)" instead of "permutation-only inference", matching the api/REGISTRY docs. (codex R3 P3) - Regressions: test_missing_treatment_value_rejected, test_estimators_module_reexport. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

… on outer-V non-convergence - Document the predictor/outcome missing-data contract as an explicit **Deviation from R:** in REGISTRY: aggregation fails closed on any non-finite cell, whereas R Synth::dataprep uses na.rm=TRUE. The fail-closed choice is deliberate (na-dropping silently aggregates different period subsets across units → incomparable predictors). Parity wording tightened to "row ORDER matches dataprep" across REGISTRY / api rst / _validate_predictors docstring. (codex R4 P1) - _outer_solve_V now tracks OptimizeResult.success across the multistart Nelder-Mead runs and the Powell polish, and emits a UserWarning when neither converged (previously only the inner Frank-Wolfe solve surfaced non-convergence). (codex R4 P2) - Regressions: test_all_na_predictor_window_rejected (documents partial==full fail-closed under the no-na.rm contract), test_outer_v_nonconvergence_warning. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

…onicalize predictor periods - Restrict predictor/special aggregation ops to {mean, sum} — both LINEAR combinations per ADH 2010 §2.3 (Ȳ = Σ k_s Y_is). median (non-linear) removed from _OP_FUNCS, the _PredictorSpec comment, the fit docstring, and llms-full; no deviation note needed since mean/sum are the documented linear forms. (codex R5 P1) - Canonicalize predictor_window / special_predictors / pre_period_outcomes period lists to unique + calendar-sorted (_canon) before labeling, mirroring the post_periods treatment: reordered [c,b,a]==[a,b,c] (so the duplicate-label guard catches them) and a repeated period no longer re-weights a "mean". (codex R5 P2) - Regressions: test_median_op_rejected, test_reordered_special_periods_are_duplicates, test_duplicate_predictor_window_periods_deduped. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

…ing-stack wiring (TODO) - Clarify the mspe_v docstring: populated only when the nested outer search runs (None on both the custom and degenerate single-donor paths). (codex R6 P3) - Defer the reporting-stack wiring P2 to PR-2 via a TODO.md row: wiring SyntheticControlResults into practitioner / DiagnosticReport / BusinessReport routing (so SCM results don't get generic PT/HonestDiD guidance) pairs naturally with the PR-2 placebo-inference layer those reports would surface. (codex R6 P2) Codex R6 verdict: ✅ no P0/P1. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:01:10Z

Overall Assessment
⚠️ Needs changes — one unmitigated P1 methodology/documentation mismatch remains in SyntheticControl.

Executive Summary

Affected method: SyntheticControl (classic SCM from Abadie, Diamond & Hainmueller 2010; AG 2003 / Synth numerics for the nested V path).
The core SCM math, donor-pool logic, absorbing-treatment checks, fail-closed missing-data handling, and NaN inference contract are consistent with the new registry entry.
P1: the implementation only supports mean/sum/identity predictor aggregations, but the registry/source still describe general K_m linear combinations; because that restriction is not labeled in REGISTRY.md, it is an undocumented methodology deviation.
The missing practitioner / diagnostic / business-report routing is properly tracked in TODO.md, so it is informational only.
I could not execute the new tests in this environment because pytest, numpy, and pandas are unavailable.

Methodology

P1 The implementation narrows ADH’s predictor-construction surface to equal-weight mean/sum/identity operators, while the methodology docs still describe the source method’s general Ȳ_i^{K_m} = Σ_s k_s Y_is linear-combination setup. The restriction is explicit in code but not called out as a **Note:** / **Deviation from R:** in the registry or API docs: diff_diff/synthetic_control.py:L43-L47, diff_diff/synthetic_control.py:L262-L264, diff_diff/synthetic_control.py:L762-L785, docs/methodology/REGISTRY.md:L1971-L1990, docs/methodology/papers/abadie-diamond-hainmueller-2010-review.md:L75-L79, docs/api/synthetic_control.rst:L94-L100. Impact: users cannot reproduce valid SCM specifications that require non-uniform K_m weights, while the methodology surface currently implies broader ADH / R-Synth support. Concrete fix: either document this restriction in docs/methodology/REGISTRY.md with a **Note:** or **Deviation from R:** label and mirror it in docs/api/synthetic_control.rst, or extend the API to accept explicit weighted K_m constructions and add a regression/parity test for a non-uniform case.

Code Quality

No findings. Impact: none. Concrete fix: none.

Performance

No findings. Impact: none. Concrete fix: none.

Maintainability

No findings beyond the tracked routing TODO below. Impact: none. Concrete fix: none.

Tech Debt

P3 The PR correctly tracks the missing SCM routing in practitioner/reporting surfaces in TODO.md:L85, so I would not block on it. Impact: SyntheticControlResults passed through Practitioner / DiagnosticReport / BusinessReport will still receive generic parallel-trends / HonestDiD messaging. Concrete fix: none for this PR; the TODO is the right place to add SCM-native diagnostics alongside placebo inference.

Security

No findings. Impact: none. Concrete fix: none.

Documentation/Tests

No additional PR-specific findings beyond the methodology doc mismatch above. Impact: none. Concrete fix: none.
Verification note: I could not run tests/test_methodology_synthetic_control.py here because the review environment does not have pytest, numpy, or pandas.

Path to Approval

Add a **Note:** or **Deviation from R:** in docs/methodology/REGISTRY.md explicitly stating that PR-1 supports only equal-weight mean/sum/identity predictor constructions, not arbitrary weighted K_m combinations, and mirror that restriction in docs/api/synthetic_control.rst. If full ADH surface is intended instead, implement explicit weighted K_m support and cover it with a methodology test.

…tor restriction (P1) Add a **Note:** in REGISTRY §SyntheticControl (mirrored in docs/api/synthetic_control.rst) that predictor rows support only EQUAL-WEIGHT linear combinations — mean (k_s=1/T0), sum (k_s=1), and per-period outcome lags (identity) — and NOT ADH (2010) §2.3's general arbitrary-weight form Ȳ = Σ_s k_s Y_is (nor non-linear ops like median). The supported set still spans standard Synth::dataprep predictors.op + special.predictors usage; arbitrary-weight K_m is a deferred extension. Documents the restriction introduced when median was dropped, so it is no longer an undocumented methodology deviation. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:08:07Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 536aeb5a2a7163efad1993250fc5e7d906b2f22e

Overall Assessment

✅ Looks good — the previous P1 methodology documentation gap is resolved, and I did not find a new unmitigated P0/P1 issue in the changed SCM code. One P2 edge-case warning gap remains, plus a tracked P3 tech-debt item.

Executive Summary

Affected method: SyntheticControl (classic SCM; ADH 2010, with AG 2003 / Synth-pinned numerics).
Re-review outcome: the prior P1 is resolved. The equal-weight predictor restriction is now explicitly documented in docs/methodology/REGISTRY.md:L1978-L1985 and docs/api/synthetic_control.rst:L107-L119.
Core SCM behavior is aligned with the registry on donor-pool construction, absorbing/no-anticipation validation, nested/custom V, and the NaN inference contract.
P2: the documented poor-fit warning can be skipped when the treated unit’s pre-period outcome path is flat (SD=0) even if pre-period RMSPE is nonzero.
The missing practitioner/report routing is properly tracked in TODO.md:L85, so it is informational only.
Verification note: I could not run tests/test_methodology_synthetic_control.py here because pytest, numpy, pandas, and scipy are unavailable.

Methodology

P2 The poor-fit diagnostic in diff_diff/synthetic_control.py:L427-L437 does not fire when the treated unit’s pre-treatment path has zero sample SD, because it is gated by pre_sd > 0. That misses cases where the registry’s documented warning condition still holds conceptually: “warn when pre-period RMSPE exceeds the SD of the treated unit’s pre-period outcomes” in docs/methodology/REGISTRY.md:L1964-L1968. Impact: an obviously poor SCM pre-fit on a flat treated pre-path can pass without the library’s intended fit warning. Concrete fix: warn whenever pre_rmspe > pre_sd, including the pre_sd == 0 case, and add a regression test for a constant treated pre-period series with nonzero pre-gap; the current warning test only covers a non-flat path in tests/test_methodology_synthetic_control.py:L402-L416.
P3 [Resolved previous finding] The equal-weight K_m restriction is now documented consistently between implementation and docs: diff_diff/synthetic_control.py:L43-L47, docs/methodology/REGISTRY.md:L1978-L1985, and docs/api/synthetic_control.rst:L107-L119. Impact: none. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 The missing SCM-native routing in practitioner/report surfaces is explicitly tracked in TODO.md:L85. Impact: those tools still fall back to generic parallel-trends/HonestDiD guidance for SyntheticControlResults. Concrete fix: none for this PR; the TODO is sufficient.

Security

No findings.

Documentation/Tests

No additional PR-specific findings.
Verification note: I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest, numpy, pandas, and scipy are not installed.

…at pre-path (P2) The poor-fit warning was gated by `pre_sd > 0`, so a FLAT treated pre-period path (SD == 0) with nonzero pre-RMSPE never warned even though the synthetic clearly fails to reproduce a constant series. Change the gate to the literal REGISTRY contract (warn when pre_rmspe > pre_sd), including the SD == 0 case, with a scale-aware absolute floor (1e-8 * max(|Z1|, 1)) so a near-perfect flat fit (RMSPE ~ roundoff) does not spuriously warn. REGISTRY poor-fit Note updated to document the flat-path behavior (slightly broader than SyntheticDiD's SD>0-gated form). Regression: test_poor_fit_warning_flat_treated_pre_path. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:15:53Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 95cd4f9efa00cdee700b92bb65738eacc7ca717a

Overall Assessment

✅ Looks good — the affected method is SyntheticControl (classic SCM per ADH 2010 / AG 2003 / Synth-pinned numerics), and I did not find a new unmitigated P0/P1 issue in the changed SCM paths. The prior methodology documentation gap is resolved. One P2 performance issue remains, plus a tracked P3 tech-debt item.

Executive Summary

Affected method: SyntheticControl in diff_diff/synthetic_control.py, with SyntheticControlResults in diff_diff/synthetic_control_results.py.
Re-review outcome: the prior P1 methodology documentation gap is resolved. The equal-weight K_m restriction and the flat-pre-path poor-fit warning behavior are now documented consistently with the implementation in docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, diff_diff/synthetic_control.py:L43-L47, and diff_diff/synthetic_control.py:L427-L434.
Core SCM contracts match the registry and local ADH 2010 review: single treated unit + never-treated donors, absorbing/no-anticipation checks, nested/custom V, and NaN analytical inference via safe_inference() in diff_diff/synthetic_control.py:L452-L453.
P2: _v_starts() still computes expensive heuristic starts even when n_starts=1, so the single-start path pays O(k) extra inner solves before optimization in diff_diff/synthetic_control.py:L914-L980.
The missing practitioner / DiagnosticReport / BusinessReport routing is properly tracked in TODO.md:L85, so it is informational only.
Verification note: I could not run tests/test_methodology_synthetic_control.py here because pytest, numpy, pandas, and scipy are unavailable in this environment.

Methodology

Affected method: SyntheticControl (classic SCM).

P3 Informational — No unmitigated methodology defect found in the changed estimator. The previous re-review issue is resolved: the implementation’s equal-weight-only predictor construction and flat-pre-path poor-fit warning now match the documented REGISTRY/API deviations in diff_diff/synthetic_control.py:L43-L47, diff_diff/synthetic_control.py:L427-L434, docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, and the regression coverage in tests/test_methodology_synthetic_control.py:L419-L435. Impact: none. Concrete fix: none.

Code Quality

No findings.

Performance

P2 _v_starts() eagerly computes the inverse-variance and univariate-fit heuristic candidates before truncating to n_starts, so n_starts=1 still incurs the univariate loop’s k inner Frank-Wolfe solves, and small n_starts values can still do unused candidate work in diff_diff/synthetic_control.py:L914-L980. Impact: the multistart knob does not actually bound startup cost on low-start runs, which can make high-k SCM fits materially slower than callers expect. Concrete fix: short-circuit _v_starts() when n_starts <= 1, and more generally stop generating heuristic/random candidates once len(candidates) == n_starts.

Maintainability

No findings.

Tech Debt

P3 The SCM-native practitioner/report routing gap is explicitly tracked in TODO.md:L85. Impact: routing SyntheticControlResults through those surfaces still yields generic parallel-trends/HonestDiD guidance rather than SCM-specific diagnostics. Concrete fix: none for this PR; the TODO tracking is sufficient.

Security

No findings.

Documentation/Tests

No PR-specific findings.
Verification note: static review only; I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest, numpy, pandas, and scipy are not installed.

… (P2) _v_starts() eagerly computed the inverse-variance and univariate-fit heuristic candidates (the latter = O(k) inner Frank-Wolfe solves) before truncating to n_starts, so n_starts=1 still paid the univariate loop. Generate candidates lazily and stop once `target = max(n_starts, 1)` are collected: n_starts=1 now returns the uniform start without the univariate loop. Candidate ORDER is unchanged, so any given n_starts yields the same set as before (default n_starts=4 is identical — Basque Tier-2 parity preserved) — only unused work is skipped. Regression: test_n_starts_one_runs. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:22:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0adb638c013425a092f51f1fefef076f34d4422d

Overall Assessment

✅ Looks good — the affected method is SyntheticControl (classic SCM per ADH 2010 / AG 2003 / Synth-pinned numerics), and I did not find a new unmitigated P0/P1 issue in the changed SCM paths. The prior methodology-documentation concern remains resolved. One non-blocking P2 performance issue remains, plus one P3 tech-debt item that is already tracked in TODO.md.

Executive Summary

Affected method: SyntheticControl in diff_diff/synthetic_control.py with SyntheticControlResults in diff_diff/synthetic_control_results.py.
Re-review outcome: the previously raised methodology-doc alignment issue stays resolved. The equal-weight K_m restriction and the flat-pre-path poor-fit warning are now documented consistently with the implementation in docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, diff_diff/synthetic_control.py:L43-L47, and diff_diff/synthetic_control.py:L427-L434.
Core SCM contracts match the registry and local paper review: single treated unit / never-treated donors, absorbing contiguous-suffix post window with no-anticipation cross-checks, nested/custom diagonal V, and NaN analytical inference via safe_inference() in diff_diff/synthetic_control.py:L452-L453 and diff_diff/utils.py:L177-L211.
The previous _v_starts() eager-work issue is fixed: candidate generation is now lazy, so n_starts=1 no longer pays for the univariate-start loop (diff_diff/synthetic_control.py:L947-L989; regression coverage at tests/test_methodology_synthetic_control.py:L317-L323).
P2: the “inverse-variance” heuristic start is built from already-standardized predictors, so in the normal nondegenerate case it collapses to the same seed as the uniform start (diff_diff/synthetic_control.py:L383-L418, diff_diff/synthetic_control.py:L852-L877, diff_diff/synthetic_control.py:L956-L964).
The practitioner / DiagnosticReport / BusinessReport routing gap is explicitly tracked in TODO.md:L85, so it is informational only.

Methodology

P3 Informational — No unmitigated methodology defect found in the changed estimator. Impact: none. Concrete fix: none. The implementation and docs now agree on the documented deviations around equal-weight predictor aggregation and the broader poor-fit warning contract in docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119, diff_diff/synthetic_control.py:L43-L47, and diff_diff/synthetic_control.py:L427-L434.

Code Quality

No findings.

Performance

P2 [Newly identified] _v_starts() computes its “inverse-variance” heuristic from the already row-standardized predictors, so for any nonzero-variance predictor row the variance is 1 by construction and the resulting theta is the same zero-vector as the uniform start. Impact: n_starts=2 effectively runs the same Nelder-Mead seed twice, and larger n_starts values get less search diversity than the parameter name implies, while still paying for the extra optimizer call. Concrete fix: build that heuristic from the unstandardized X1/X0 instead of X1s/X0s, or deduplicate candidate theta vectors before the multistart loop. Relevant code: diff_diff/synthetic_control.py:L383-L418, diff_diff/synthetic_control.py:L852-L877, diff_diff/synthetic_control.py:L956-L964.

Maintainability

No findings.

Tech Debt

P3 Informational — SyntheticControlResults is not yet wired into practitioner/report routing, but that limitation is explicitly tracked in TODO.md:L85. Impact: SCM outputs routed through those surfaces still get generic parallel-trends / HonestDiD guidance rather than SCM-native diagnostics. Concrete fix: none for this PR; the TODO tracking is sufficient.

Security

No findings.

Documentation/Tests

No PR-specific findings.
Static review only: I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest is unavailable. The added coverage is otherwise strong on paper: validation gates, NaN inference, degenerate paths, and Basque R-parity are exercised in tests/test_methodology_synthetic_control.py:L94-L244, tests/test_methodology_synthetic_control.py:L273-L323, tests/test_methodology_synthetic_control.py:L411-L444, and tests/test_methodology_synthetic_control.py:L610-L721.

…start + distinct seeds (P2) The inverse-variance heuristic start was computed from the already-standardized predictors, where every row variance is ~1, so its theta collapsed to the same zero-vector as the uniform start — running the same Nelder-Mead seed twice and giving less search diversity than n_starts implies. Now: - compute the inverse-variance start from the UNSTANDARDIZED X1/X0 (threaded through _outer_solve_V → _v_starts), so it is a genuinely different seed on real-scale data; - de-duplicate candidates (_add_unique, atol=1e-9) so the multistart never runs the same seed twice — a collapsed heuristic slot is refilled by a fresh Dirichlet draw. Default n_starts=4 still yields 4 distinct seeds; Basque Tier-2 parity preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:32:27Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b98d282481db94dc0be656d538e645d30862b49

Overall Assessment

⚠️ Needs changes — the affected method is SyntheticControl, and the ADH 2010 / Registry contracts are mostly implemented correctly, but the nested V path still suppresses inner Frank-Wolfe non-convergence during candidate evaluation and can therefore choose V from truncated W*(V) solves with no warning.

Executive Summary

Affected method: SyntheticControl in diff_diff/synthetic_control.py, with SyntheticControlResults in diff_diff/synthetic_control_results.py.
The core SCM contracts match the Methodology Registry: one treated unit, never-treated donors, absorbing/no-anticipation post window checks, predictor construction as covariates plus linear pre-period outcome combinations, and no analytical SE with NaN inference via safe_inference() (docs/methodology/REGISTRY.md:L1964-L1985, diff_diff/synthetic_control.py:L319-L455, diff_diff/utils.py:L177-L211).
P1 [Newly identified]: the nested V search ignores the inner-solver convergence flag on intermediate candidate evaluations, so the outer objective can silently rank V values using non-optimal donor weights (diff_diff/synthetic_control.py:L882-L913, diff_diff/synthetic_control.py:L977-L1077, diff_diff/synthetic_control.py:L446-L452).
The previously noted _v_starts() duplicate-seed performance issue appears fixed: the inverse-variance heuristic now uses raw-scale predictors and candidate generation is lazy (diff_diff/synthetic_control.py:L960-L976).
The practitioner / DiagnosticReport / BusinessReport routing gap is explicitly tracked in TODO.md:L85, so it is informational only.

Methodology

P1 [Newly identified] Impact: ADH 2010 defines the nested objective in terms of W*(V), but _outer_solve_V() and _v_starts() ignore converged from _inner_solve_W() on trial V values; only the final re-solve is surfaced to the user. If some inner solves hit inner_max_iter, the search can silently select a suboptimal V/ATT even when the final W*(V*) happens to converge. Concrete fix: propagate an inner-nonconvergence tracker through _v_starts() and _outer_solve_V(), and do not treat non-converged objective evaluations as ordinary valid search points. At minimum emit an aggregated warning; preferably return np.inf/retry so V selection never compares truncated inner solves. Mirror the aggregated-warning pattern already used when suppressed FW non-convergence is summarized in diff_diff/synthetic_did.py:L1692-L1704. Relevant code: diff_diff/synthetic_control.py:L882-L913, diff_diff/synthetic_control.py:L979-L988, diff_diff/synthetic_control.py:L1024-L1027, diff_diff/synthetic_control.py:L1048-L1077, diff_diff/synthetic_control.py:L446-L452.
P3 Informational Impact: the documented deviations around equal-weight predictor aggregation, fail-closed non-finite handling, all-pre-period outer MSPE, and standardize="none" are all explicitly recorded in the Registry/API docs, so I am not treating them as defects. Concrete fix: none. References: docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119.

Code Quality

No findings.

Performance

No findings. The prior multistart duplicate-seed inefficiency looks resolved in diff_diff/synthetic_control.py:L960-L976.

Maintainability

No findings.

Tech Debt

P3 Informational Impact: SyntheticControlResults is not yet wired into practitioner/report routing, so those surfaces still give generic parallel-trends / HonestDiD guidance for SCM outputs. This is already tracked and therefore non-blocking. Concrete fix: none for this PR. Reference: TODO.md:L85.

Security

No findings.

Documentation/Tests

No additional PR-specific findings.
Static review only: I could not execute tests/test_methodology_synthetic_control.py in this environment because pytest, numpy, pandas, and scipy are unavailable.

Path to Approval

In diff_diff/synthetic_control.py, track inner Frank-Wolfe non-convergence across _v_starts() and _outer_solve_V(), and surface it before accepting a nested fit. Do not let the outer optimizer compare silently truncated W*(V) evaluations.
Add a regression test that forces non-convergence on intermediate inner solves during nested V search and asserts the estimator warns or rejects the fit instead of silently returning a nested solution.

…e during V search (P1) During the nested V search, _inner_solve_W's convergence flag was discarded on every intermediate evaluation (univariate starts + objective calls), so the outer optimizer could silently rank truncated W*(V) solves if inner solves hit inner_max_iter; only the final re-solve was surfaced. Now _v_starts returns its inner-solve counts and _outer_solve_V tallies intermediate non-convergence across the univariate starts AND every objective evaluation, emitting one aggregated UserWarning when the rate exceeds 5% (mirrors the synthetic_did.py bootstrap-FW aggregation). Healthy fits (converging inner solves) stay silent — Basque Tier-2 unaffected. Regression: test_inner_v_search_nonconvergence_warning. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:40:47Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7c361ee6ccba2367831b01e2252cc1e06b01ec72

Overall Assessment

⚠️ Needs changes — re-review of SyntheticControl. The estimator mostly matches ADH 2010 and the Methodology Registry, but the prior nested-V non-convergence blocker is only partially fixed: low-rate inner Frank-Wolfe failures are still allowed to influence V selection as if they were valid objective evaluations.

Executive Summary

Affected method: SyntheticControl in diff_diff/synthetic_control.py, with SyntheticControlResults in diff_diff/synthetic_control_results.py.
Core SCM alignment looks good: one treated unit, never-treated donors, absorbing/no-anticipation checks, simplex donor weights, predictor construction from covariates plus pre-period outcome combinations, and NaN inference via safe_inference() match the Registry and ADH 2010 review (docs/methodology/REGISTRY.md:L1964-L1976, docs/methodology/papers/abadie-diamond-hainmueller-2010-review.md:L80-L95, diff_diff/synthetic_control.py:L319-L455).
P1 [Previous finding partially addressed, still open]: _v_starts() and _outer_solve_V() still score _inner_solve_W() outputs even when conv=False; the new warning only fires above a 5% failure rate, but a single bad evaluation can still change an argmin in a nested optimization (diff_diff/synthetic_control.py:L981-L997, diff_diff/synthetic_control.py:L1038-L1106).
The documented deviations from R / ADH are properly labeled and non-blocking: all-pre MSPE window, softmax V, fail-closed missing-data aggregation, equal-weight-only predictor ops, and standardize="none" (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L102-L119).
The practitioner / DiagnosticReport / BusinessReport routing gap is tracked in TODO.md:L85, so it is informational only.
Static review only: I could not run tests/test_methodology_synthetic_control.py because pytest is not installed, and runtime deps like numpy are unavailable in this environment.

Methodology

Severity: P1 [Previous finding partially addressed, still open]. Impact: ADH 2010 defines nested SCM as choosing V* by minimizing pre-period MSPE of the true inner solution W*(V) (docs/methodology/papers/abadie-diamond-hainmueller-2010-review.md:L80-L89, docs/methodology/REGISTRY.md:L1972-L1973). In this PR, _v_starts() still builds heuristic starts from w_i even when conv_i=False, and _outer_solve_V() still returns ordinary objective values when conv=False; warning only occurs if failures exceed 5% (diff_diff/synthetic_control.py:L981-L997, diff_diff/synthetic_control.py:L1038-L1106). That narrows the prior problem, but does not remove it: in an argmin search, unlike a bootstrap summary, a single truncated evaluation can change the selected V, donor weights, and ATT with no warning. Concrete fix: treat any conv=False inner solve during nested search as unusable for V ranking by returning np.inf, retrying with a larger budget, or otherwise discarding that candidate, and warn on any non-zero discarded count.
Severity: P3 Informational. Impact: the PR’s deviations around all-pre-period MSPE, softmax parameterization of V, fail-closed missing-data handling, equal-weight-only predictor operators, and standardize="none" are explicitly documented in the Registry/API docs, so they are not defects. Concrete fix: none (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L102-L119).

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 Informational. Impact: SyntheticControlResults is not yet routed through practitioner/report surfaces, but that limitation is explicitly tracked and therefore non-blocking. Concrete fix: none for this PR (TODO.md:L85).

Security

No findings.

Documentation/Tests

Severity: P2. Impact: the added non-convergence regression only exercises the high-failure-rate warning branch by forcing inner_max_iter=1; it would not catch the still-open low-rate failure case from the prior review (tests/test_methodology_synthetic_control.py:L320-L322). Concrete fix: add a regression that forces only one or a small minority of nested-search inner solves to fail and asserts the estimator warns or excludes those candidates.
Static review only: I could not execute the tests in this environment.

Path to Approval

In diff_diff/synthetic_control.py, stop treating any conv=False result from _inner_solve_W() as a valid nested-V objective point. Discard, retry, or penalize those candidates instead of letting them compete with converged solves.
Change the warning logic from a bootstrap-style >5% threshold to an “any discarded candidate” signal for nested V selection.
Add a regression test covering the low-rate case: exactly one or a small minority of nested-search inner solves fail, and the estimator must not silently accept those evaluations as ordinary objective values.

…es from V argmin (P1) Strengthen the prior fix: a non-converged inner Frank-Wolfe solve during the nested V search is now EXCLUDED from V ranking (not merely warned above a 5% rate) — in an argmin search even one truncated W*(V) could win and silently flip the selected V. The objective returns a large FINITE penalty (10×(max single-donor vertex MSPE)+1, which dominates any feasible objective value since the objective is convex in w → max at a simplex vertex) so that candidate can never be chosen; a non-converged univariate-start solve gets inf MSPE (→ zero heuristic weight). A finite penalty is used instead of np.inf because inf floods scipy's Nelder-Mead/Powell simplex arithmetic with RuntimeWarnings (and makes it churn). Warn on ANY non-zero non-convergence count (no rate threshold). Healthy fits unaffected (Basque Tier-2 unchanged; warning count back to baseline). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T22:51:06Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 08957d636a32c404b8b459c48ae695a27a440fd8

Overall Assessment

✅ Looks good — re-review of SyntheticControl. The prior P1 about nested-V ranking on non-converged inner solves appears resolved; I found one remaining non-blocking test gap.

Executive Summary

Affected method: SyntheticControl / SyntheticControlResults, implementing classic SCM from Abadie-Diamond-Hainmueller (2010) and Abadie-Gardeazabal (2003).
Previous blocker resolved: non-converged inner solves no longer compete in nested V selection; _v_starts() drops them from heuristic seeding and _outer_solve_V() penalizes them out of the argmin while warning on any occurrence (diff_diff/synthetic_control.py:L981-L1001, diff_diff/synthetic_control.py:L1037-L1127).
Core methodology matches the Registry: one treated unit, never-treated donors, absorbing/no-anticipation checks, simplex donor weights, pre-period outcome MSPE for nested V, and no analytical SE (docs/methodology/REGISTRY.md:L1964-L1976, diff_diff/synthetic_control.py:L319-L455).
The documented deviations from R are properly labeled and therefore non-blocking (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119).
Non-blocking P2: the new nested-nonconvergence regression still only exercises the blanket-failure warning path, not the specific low-rate exclusion behavior from the prior review (tests/test_methodology_synthetic_control.py:L317-L322).
The practitioner/report-routing gap is explicitly tracked in TODO.md, so it is informational only (TODO.md:L85).

Methodology

No unmitigated findings. The implementation now adheres to the Registry contracts for treatment structure, donor composition, weighting, nested V selection, and NaN-only inference (diff_diff/synthetic_control.py:L319-L455, diff_diff/synthetic_control.py:L981-L1127). The remaining departures from R are explicitly documented under **Note:** / **Deviation from R:**, so they are P3 informational rather than defects (docs/methodology/REGISTRY.md:L1978-L1985, docs/api/synthetic_control.rst:L107-L119).

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: SyntheticControlResults is still not routed through practitioner / DiagnosticReport / BusinessReport, but that limitation is explicitly tracked and therefore non-blocking for this PR (TODO.md:L85). Concrete fix: none in PR-1; handle in the planned PR-2 routing/placebo-inference follow-up.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: tests/test_methodology_synthetic_control.py:L317-L322 only forces inner_max_iter=1 and asserts the aggregated warning. It does not directly lock in the prior-review fix that a single low-rate conv=False inner solve must be excluded from nested-V ranking, not merely warned about. A future regression toward thresholded warning/exclusion could pass this test. Concrete fix: add a deterministic low-rate regression, ideally by monkeypatching _inner_solve_W() so exactly one chosen nested evaluation returns conv=False, then assert both the any-occurrence warning and that the selected V/weights reflect candidate exclusion.
Static review only: I could not run the suite in this environment because required runtime dependencies are missing (numpy import fails before package initialization).

…sion regression (P2) Add test_single_inner_nonconvergence_excluded_from_v_ranking: monkeypatch _inner_solve_W so exactly ONE objective evaluation (the uniform-start eval) reports conv=False, then assert (a) the any-occurrence "during nested V selection" warning fires and (b) the selected V is a genuine small-MSPE fit (res.mspe_v < 1.0, not the large penalty) — i.e. the truncated candidate was EXCLUDED from the argmin, not merely warned. Complements the blanket-failure (inner_max_iter=1) test. Test-only change. (Reaches the module via importlib since the diff_diff.synthetic_control attribute is the convenience function, which shadows the submodule — same pattern as diff_diff.trop.) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

github-actions · 2026-05-30T23:01:38Z

🔁 AI review rerun (requested by @igerber)

Head SHA: fca42fe746d4bd73d8244a8ae34583baa0b64925

Overall Assessment

✅ Looks good — re-review of SyntheticControl. I found no unmitigated P0/P1 issues.

Executive Summary

Affected method: SyntheticControl / SyntheticControlResults, implementing classic SCM from Abadie-Diamond-Hainmueller (2010) and Abadie-Gardeazabal (2003).
The prior nested-V blocker appears resolved: non-converged inner solves are penalized out of V ranking and surfaced with an aggregate warning in diff_diff/synthetic_control.py:L1037-L1127.
The previous test gap is now closed by a targeted regression that forces exactly one failed inner solve and asserts it cannot win the argmin in tests/test_methodology_synthetic_control.py:L325-L357.
Core methodology matches the Registry contract: one treated unit, never-treated donor pool, simplex donor weights, nested/custom diagonal V, and NaN-only inference in docs/methodology/REGISTRY.md:L1964-L1976 and diff_diff/synthetic_control.py:L319-L455.
Remaining departures from R are explicitly documented under Note / Deviation from R, so they are informational rather than defects in docs/methodology/REGISTRY.md:L1978-L1985.
The practitioner/report-routing gap is explicitly tracked in TODO.md, so it is mitigated and non-blocking for this PR in TODO.md:L85-L85.

Methodology

No findings. The implementation adheres to the Registry on identification, weighting, nested V selection, and NaN-only inference, and the prior non-converged-inner-solve ranking issue is fixed in code plus regression coverage: diff_diff/synthetic_control.py:L319-L455, diff_diff/synthetic_control.py:L981-L1127, tests/test_methodology_synthetic_control.py:L317-L357.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: SyntheticControlResults is not yet wired into practitioner / DiagnosticReport / BusinessReport, but that limitation is explicitly tracked and therefore mitigated for this PR.
Concrete fix: None in PR-1; handle in the planned PR-2 follow-up noted in TODO.md:L85-L85.

Security

No findings.

Documentation/Tests

No findings on the changed docs/tests. The prior re-review concern about low-rate inner-solve exclusion is now directly covered by tests/test_methodology_synthetic_control.py:L325-L357.
Static-review note: I could not execute the test file in this environment because numpy is not installed and pytest is unavailable.

igerber and others added 7 commits May 30, 2026 16:48

igerber added the ready-for-ci Triggers CI test workflows label May 30, 2026

igerber merged commit 4bf60fa into main May 31, 2026
33 of 34 checks passed

igerber deleted the feature/synthetic-control branch May 31, 2026 09:43

This was referenced May 31, 2026

Right-size pure-Python CI test runtime (SyntheticControl regression + SDID safe trims) #505

Merged

SyntheticControl: in-space placebo permutation inference + reporting-stack integration #511

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core#501

Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core#501
igerber merged 14 commits into
mainfrom
feature/synthetic-control

igerber commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 30, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

github-actions Bot commented May 30, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant