Add SyntheticControl estimator (classic SCM, Abadie-Diamond-Hainmueller 2010) — PR-1 core#501
Add the classic Synthetic Control Method of Abadie, Diamond & Hainmueller (2010) as a standalone `SyntheticControl` estimator with a separate `SyntheticControlResults` container and a `synthetic_control()` convenience function, distinct from `SyntheticDiD` (donor weights only; no time weights or ridge).
- Inner simplex solve W*(V) reuses utils._sc_weight_fw (V^½ folded into the predictor matrix, intercept=False, zeta=0, noise-scaled min_decrease).
- Diagonal predictor-importance V selected data-driven (nested: softmax-on-simplex multistart Nelder-Mead + Powell polish, pre-period outcome MSPE) or user-supplied (custom_v). Per-row standardization (SD over donors+treated, ddof=1) matches the R Synth source; solution.v lives in the scaled space.
- Reports gap path, att (mean post gap), pre_rmspe, donor_weights, v_weights, predictor_balance. No analytical SE: inference fields are NaN (placebo permutation inference reserved for a follow-up; _placebo_gaps/_rmspe_ratio/_fit_snapshot reserved on the results object).
- 10 validation gates: predictor-period leakage, absorbing post-suffix + no-anticipation cross-check, post canonicalization, donor filtering, empty windows, poor-fit warning, duplicate labels, inner non-convergence warning, order-independent gap path, standardize="none" deviation; plus fail-closed custom_v cross-field rules and J==1 / T0 degenerate handling.
R-Synth parity (tests/test_methodology_synthetic_control.py; goldens generated by benchmarks/R/generate_synth_basque_golden.R into tests/data/): two-tier on the Basque study. Tier-1 feeds R's solution.v via custom_v and reproduces the published donor weights (region 10 0.851 + region 14 0.149) to atol=1e-3 deterministically; Tier-2 (@slow) checks the nested fit in a tolerance band (the nested V differs because the outer objective uses all pre periods, not R's time.optimize.ssr window).
Docs: REGISTRY §SyntheticControl (deviation/Note labels), docs/api/synthetic_control.rst + autosummary + index toctree, llms.txt (count 17→18) + llms-full.txt sections, README catalog row, doc-deps.yaml, CHANGELOG.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
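The reported quantities in the commit message above (gap path, `att` as the mean post-period gap, `pre_rmspe`) can be illustrated with a minimal sketch. This is not the estimator's code: `scm_summary` and its argument names are hypothetical, the donor weights are taken as given, and the pre-period is assumed to be the first `n_pre` entries of each series.

```python
import math

def scm_summary(y_treated, y_donors, weights, n_pre):
    """Gap path, ATT, and pre-period RMSPE for fixed donor weights.

    y_treated: list of T outcomes for the treated unit
    y_donors:  list of J lists (one per donor), each of length T
    weights:   J non-negative donor weights summing to 1
    n_pre:     number of pre-treatment periods (assumed to come first)
    """
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("donor weights must lie on the simplex")
    n_periods = len(y_treated)
    # Synthetic outcome: weighted average of the donor outcomes in each period.
    synthetic = [sum(w * d[t] for w, d in zip(weights, y_donors)) for t in range(n_periods)]
    # Gap path: treated minus synthetic, period by period.
    gap = [y_treated[t] - synthetic[t] for t in range(n_periods)]
    pre, post = gap[:n_pre], gap[n_pre:]
    att = sum(post) / len(post)                               # mean post-period gap
    pre_rmspe = math.sqrt(sum(g * g for g in pre) / len(pre))
    return gap, att, pre_rmspe

# Toy data: two donors with equal weight reproduce the treated pre-period exactly.
gap, att, pre_rmspe = scm_summary(
    y_treated=[1.0, 2.0, 3.0, 6.0, 7.0],
    y_donors=[[0.0, 2.0, 2.0, 4.0, 4.0], [2.0, 2.0, 4.0, 4.0, 6.0]],
    weights=[0.5, 0.5],
    n_pre=3,
)
```

On this toy panel the pre-period gaps are zero and both post-period gaps are 2, so `att` is 2 and `pre_rmspe` is 0.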
…tes (2×P1)
- Validate the treatment indicator on the FULL input BEFORE
_resolve_treated_and_donors, so a non-{0,1} code fails closed instead of
silently dropping a unit from both the treated and donor sets (which would
quietly change the donor pool / weights / ATT). (codex R1 P1)
- _resolve_periods now asserts D==1 in every post period (uninterrupted
exposure) in addition to D==0 in every pre period (no anticipation), on both
the inferred and explicit branches — an explicit suffix over a 0,0,1,0 path
no longer averages a treated period with an untreated one. (codex R1 P1)
- Add regressions: test_non_binary_treatment_rejected,
test_untreated_period_in_post_rejected. (codex R1 P2)
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
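The two checks described in this commit (binary treatment codes, then D==0 in every pre period and D==1 in every post period) can be sketched as below. `check_treatment_path` is a hypothetical helper that works on the treated unit's indicator path in calendar order; it is not the signature of `_resolve_periods`.

```python
def check_treatment_path(d_path, n_pre):
    """Fail closed on a malformed treated-unit indicator path.

    d_path: treatment indicator per period, in calendar order
    n_pre:  number of leading periods claimed to be pre-treatment
    """
    # Non-{0,1} codes (including None and NaN) are rejected up front.
    for d in d_path:
        if d not in (0, 1):
            raise ValueError(f"treatment indicator must be 0 or 1, got {d!r}")
    pre, post = d_path[:n_pre], d_path[n_pre:]
    if not pre or not post:
        raise ValueError("need at least one pre and one post period")
    if any(d != 0 for d in pre):
        raise ValueError("treated unit is exposed in a pre period (anticipation)")
    if any(d != 1 for d in post):
        raise ValueError("treated unit is unexposed in a post period (interrupted exposure)")

check_treatment_path([0, 0, 1, 1], n_pre=2)  # passes silently
```

With this shape, an explicit post suffix over a `0, 0, 1, 0` path raises instead of averaging a treated period with an untreated one.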
…abel/mspe_v fixes
- Fail closed on non-finite (NaN/inf) values in the data that actually enters the SCM matching problem: the treated/donor outcome panel (gap path / ATT / nested objective) and every selected predictor block (predictors over predictor_window, special_predictors, pre_period_outcomes). Previously NaN propagated silently to finite-looking uniform weights (e.g. a docs-style predictors=['school.high'] over the default all-pre window on the Basque fixture's late-start covariates). (codex R2 P1)
- special_predictors label now uses the full ordered period list, so distinct sets sharing endpoints+length (e.g. [2000,2002,2004] vs [2000,2003,2004]) no longer collide in the duplicate guard / v_weights key. (codex R2 P2)
- mspe_v (the OUTER-objective value) is now None on the custom and single-donor paths, matching its documented "nested-only" contract. (codex R2 P2)
- Docs examples (api rst + llms-full) set predictor_window explicitly to steer away from the all-pre default on late-start covariates. (codex R2 P2)
- Regressions: test_non_finite_predictor_rejected, test_non_finite_outcome_rejected, test_distinct_special_period_sets_not_duplicate.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…s re-export, doc wording
- Reject missing (NaN) values in treatment/unit/time up front in fit(), before donor classification: a partially-missing treatment history would otherwise be silently classified as never-treated by groupby(...).max() (NaN dropped), changing the donor pool / weights / ATT with no warning. (codex R3 P1)
- Re-export SyntheticControl from diff_diff.estimators (mirrors the SyntheticDiD / TwoWayFixedEffects backward-compat re-export contract + __all__). (codex R3 P1)
- Reword the short README / llms.txt catalog descriptions: "no inference in this release (inference fields are NaN — permutation/placebo planned)" instead of "permutation-only inference", matching the api/REGISTRY docs. (codex R3 P3)
- Regressions: test_missing_treatment_value_rejected, test_estimators_module_reexport.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
… on outer-V non-convergence
- Document the predictor/outcome missing-data contract as an explicit **Deviation from R:** in REGISTRY: aggregation fails closed on any non-finite cell, whereas R Synth::dataprep uses na.rm=TRUE. The fail-closed choice is deliberate (na-dropping silently aggregates different period subsets across units → incomparable predictors). Parity wording tightened to "row ORDER matches dataprep" across REGISTRY / api rst / _validate_predictors docstring. (codex R4 P1)
- _outer_solve_V now tracks OptimizeResult.success across the multistart Nelder-Mead runs and the Powell polish, and emits a UserWarning when neither converged (previously only the inner Frank-Wolfe solve surfaced non-convergence). (codex R4 P2)
- Regressions: test_all_na_predictor_window_rejected (documents partial==full fail-closed under the no-na.rm contract), test_outer_v_nonconvergence_warning.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
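The reason given for failing closed (na-dropping aggregates different period subsets across units) can be seen in a small sketch. Both helpers are hypothetical: `mean_na_rm` imitates an `na.rm=TRUE` style mean, and `mean_fail_closed` imitates the fail-closed contract described above; neither is the package's aggregation code.

```python
import math

def mean_na_rm(values):
    """Mean over the finite cells only (imitates an na.rm=TRUE aggregation)."""
    kept = [v for v in values if math.isfinite(v)]
    return sum(kept) / len(kept)

def mean_fail_closed(values):
    """Mean that refuses to aggregate if any cell is NaN or inf."""
    if any(not math.isfinite(v) for v in values):
        raise ValueError("non-finite value in predictor window")
    return sum(values) / len(values)

nan = float("nan")
unit_a = [1.0, 2.0, 3.0]   # observed in all three periods
unit_b = [nan, nan, 9.0]   # covariate starts late: only the last period is observed

# na-dropping silently compares a three-period mean with a one-period value.
a_dropped = mean_na_rm(unit_a)
b_dropped = mean_na_rm(unit_b)
```

Under the na-dropping mean, `unit_b` contributes its single observed period while `unit_a` contributes an average of three, so the two predictor values are not comparable; `mean_fail_closed(unit_b)` raises instead.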
…onicalize predictor periods
- Restrict predictor/special aggregation ops to {mean, sum} — both LINEAR
combinations per ADH 2010 §2.3 (Ȳ = Σ k_s Y_is). median (non-linear) removed
from _OP_FUNCS, the _PredictorSpec comment, the fit docstring, and llms-full;
no deviation note needed since mean/sum are the documented linear forms. (codex R5 P1)
- Canonicalize predictor_window / special_predictors / pre_period_outcomes period
lists to unique + calendar-sorted (_canon) before labeling, mirroring the
post_periods treatment: reordered [c,b,a]==[a,b,c] (so the duplicate-label guard
catches them) and a repeated period no longer re-weights a "mean". (codex R5 P2)
- Regressions: test_median_op_rejected, test_reordered_special_periods_are_duplicates,
test_duplicate_predictor_window_periods_deduped.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
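A minimal sketch of the canonicalization step follows. `canon_periods` and `special_label` are hypothetical stand-ins for `_canon` and the labeling logic, and the label format is invented for illustration; only the behavior (unique, calendar-sorted periods, and labels built from the full period list) follows the commit messages.

```python
def canon_periods(periods):
    """Unique, calendar-sorted period list."""
    return sorted(set(periods))

def special_label(var, periods, op="mean"):
    """Label built from the FULL canonical period list (format is illustrative)."""
    canon = canon_periods(periods)
    return f"special.{var}.{op}." + "_".join(str(p) for p in canon)

def window_mean(values_by_period, periods):
    """Equal-weight mean over the canonical periods."""
    canon = canon_periods(periods)
    return sum(values_by_period[p] for p in canon) / len(canon)

values = {2000: 1.0, 2001: 4.0}
# A repeated period would re-weight a naive mean: (1 + 1 + 4) / 3 = 2.0.
naive = sum(values[p] for p in [2000, 2000, 2001]) / 3
# After canonicalization each period counts once: (1 + 4) / 2 = 2.5.
canonical = window_mean(values, [2000, 2000, 2001])
```

Reordered lists map to the same label, so a duplicate-label guard can catch them, while sets that share only endpoints and length stay distinct.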
…ing-stack wiring (TODO)
- Clarify the mspe_v docstring: populated only when the nested outer search runs (None on both the custom and degenerate single-donor paths). (codex R6 P3)
- Defer the reporting-stack wiring P2 to PR-2 via a TODO.md row: wiring SyntheticControlResults into practitioner / DiagnosticReport / BusinessReport routing (so SCM results don't get generic PT/HonestDiD guidance) pairs naturally with the PR-2 placebo-inference layer those reports would surface. (codex R6 P2)
Codex R6 verdict: ✅ no P0/P1.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…tor restriction (P1)
Add a **Note:** in REGISTRY §SyntheticControl (mirrored in docs/api/synthetic_control.rst) that predictor rows support only EQUAL-WEIGHT linear combinations (mean with k_s=1/T0, sum with k_s=1, and per-period outcome lags as identity) and NOT ADH (2010) §2.3's general arbitrary-weight form Ȳ = Σ_s k_s Y_is (nor non-linear ops like median). The supported set still spans standard Synth::dataprep predictors.op + special.predictors usage; arbitrary-weight K_m is a deferred extension. Documents the restriction introduced when median was dropped, so it is no longer an undocumented methodology deviation.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good. The previous P1 methodology documentation gap is resolved, and I did not find a new unmitigated P0/P1 issue in the changed SCM code. One P2 edge-case warning gap remains, plus a tracked P3 tech-debt item.
…at pre-path (P2)
The poor-fit warning was gated by `pre_sd > 0`, so a FLAT treated pre-period path (SD == 0) with nonzero pre-RMSPE never warned even though the synthetic clearly fails to reproduce a constant series. Change the gate to the literal REGISTRY contract (warn when pre_rmspe > pre_sd), including the SD == 0 case, with a scale-aware absolute floor (1e-8 * max(|Z1|, 1)) so a near-perfect flat fit (RMSPE ~ roundoff) does not spuriously warn. REGISTRY poor-fit Note updated to document the flat-path behavior (slightly broader than SyntheticDiD's SD>0-gated form).
Regression: test_poor_fit_warning_flat_treated_pre_path.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
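The revised gate can be sketched as follows. Several details are assumptions made for illustration, not taken from the PR: the SD uses ddof=1, `|Z1|` is read as the largest absolute treated pre-period outcome, and the floor is combined with the SD through `max`. `poor_fit` is a hypothetical name.

```python
import math
import statistics

def poor_fit(treated_pre, synthetic_pre):
    """True when the pre-period fit should trigger a poor-fit warning."""
    n = len(treated_pre)
    pre_rmspe = math.sqrt(
        sum((z - s) ** 2 for z, s in zip(treated_pre, synthetic_pre)) / n
    )
    # Assumption: sample SD (ddof=1); a single pre period has SD 0.
    pre_sd = statistics.stdev(treated_pre) if n > 1 else 0.0
    # Scale-aware absolute floor so roundoff-level RMSPE on a flat path stays silent.
    floor = 1e-8 * max(max(abs(z) for z in treated_pre), 1.0)
    return pre_rmspe > max(pre_sd, floor)

flat_bad = poor_fit([5.0, 5.0, 5.0], [4.0, 4.0, 4.0])            # flat path, RMSPE 1
flat_ok = poor_fit([5.0, 5.0, 5.0], [5.0 + 1e-12] * 3)           # flat path, roundoff-level gap
varied_ok = poor_fit([1.0, 2.0, 3.0], [1.1, 2.0, 2.9])           # RMSPE well under the SD of 1
```

The first case is the one the old `pre_sd > 0` gate missed: the SD is zero but the synthetic is off by a full unit, so it now warns, while the other two stay silent.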
🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good; the affected method is …
… (P2)
_v_starts() eagerly computed the inverse-variance and univariate-fit heuristic candidates (the latter = O(k) inner Frank-Wolfe solves) before truncating to n_starts, so n_starts=1 still paid the univariate loop. Generate candidates lazily and stop once `target = max(n_starts, 1)` are collected: n_starts=1 now returns the uniform start without the univariate loop. Candidate ORDER is unchanged, so any given n_starts yields the same set as before (default n_starts=4 is identical, so Basque Tier-2 parity is preserved); only unused work is skipped.
Regression: test_n_starts_one_runs.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
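The lazy-generation idea can be sketched with a Python generator: candidates are yielded in a fixed order, and an expensive candidate is only computed if the caller asks for it. The function names and the single stand-in heuristic are hypothetical; the real `_v_starts` has more candidates.

```python
from itertools import islice

def v_start_candidates(k, expensive_heuristic):
    """Yield start vectors in a fixed order; later ones are computed on demand."""
    yield [0.0] * k                # uniform start: cheap, always first
    yield expensive_heuristic(k)   # only evaluated if a second start is requested

def take_starts(k, n_starts, expensive_heuristic):
    target = max(n_starts, 1)      # always return at least one start
    return list(islice(v_start_candidates(k, expensive_heuristic), target))

calls = []

def univariate_stub(k):
    """Stand-in for the O(k) inner-solve loop; records that it ran."""
    calls.append(k)
    return [1.0] * k

one_start = take_starts(3, 1, univariate_stub)
calls_after_one = list(calls)      # still empty: the heuristic was never invoked
two_starts = take_starts(3, 2, univariate_stub)
```

Because `islice` stops pulling from the generator once `target` items are collected, the order of candidates is unchanged and only the unused work is skipped.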
🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good; the affected method is …
…start + distinct seeds (P2)
The inverse-variance heuristic start was computed from the already-standardized predictors, where every row variance is ~1, so its theta collapsed to the same zero-vector as the uniform start, running the same Nelder-Mead seed twice and giving less search diversity than n_starts implies. Now:
- compute the inverse-variance start from the UNSTANDARDIZED X1/X0 (threaded through _outer_solve_V → _v_starts), so it is a genuinely different seed on real-scale data;
- de-duplicate candidates (_add_unique, atol=1e-9) so the multistart never runs the same seed twice; a collapsed heuristic slot is refilled by a fresh Dirichlet draw.
Default n_starts=4 still yields 4 distinct seeds; Basque Tier-2 parity preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
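Why the standardized inverse-variance start collapsed onto the uniform start can be shown with a short sketch. It assumes the softmax parametrization maps theta to V as `softmax(theta)` and that the heuristic theta is the log of the inverse row variance; both are illustrative assumptions, and `add_unique` is a hypothetical stand-in for `_add_unique`.

```python
import math

def softmax(theta):
    """Map an unconstrained vector onto the simplex."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def inv_var_theta(variances):
    """Assumed heuristic: theta_i = log(1 / var_i)."""
    return [-math.log(v) for v in variances]

def add_unique(starts, candidate, atol=1e-9):
    """Append candidate unless it matches an existing start within atol."""
    for s in starts:
        if all(abs(a - b) <= atol for a, b in zip(s, candidate)):
            return False
    starts.append(list(candidate))
    return True

starts = [[0.0, 0.0, 0.0]]                      # uniform start: softmax(0) is uniform
# Standardized rows all have variance 1, so the heuristic theta is the zero vector again.
added_standardized = add_unique(starts, inv_var_theta([1.0, 1.0, 1.0]))
# Raw-scale variances give a genuinely different seed.
added_raw = add_unique(starts, inv_var_theta([4.0, 0.25, 1.0]))
uniform_v = softmax(starts[0])
```

The duplicate is rejected and the raw-scale candidate is kept, which matches the described behavior of refilling a collapsed slot rather than running the same seed twice.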
…e during V search (P1)
During the nested V search, _inner_solve_W's convergence flag was discarded on every intermediate evaluation (univariate starts + objective calls), so the outer optimizer could silently rank truncated W*(V) solves if inner solves hit inner_max_iter; only the final re-solve was surfaced. Now _v_starts returns its inner-solve counts and _outer_solve_V tallies intermediate non-convergence across the univariate starts AND every objective evaluation, emitting one aggregated UserWarning when the rate exceeds 5% (mirrors the synthetic_did.py bootstrap-FW aggregation). Healthy fits (converging inner solves) stay silent, so Basque Tier-2 is unaffected.
Regression: test_inner_v_search_nonconvergence_warning.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…es from V argmin (P1)
Strengthen the prior fix: a non-converged inner Frank-Wolfe solve during the nested V search is now EXCLUDED from V ranking (not merely warned above a 5% rate), because in an argmin search even one truncated W*(V) could win and silently flip the selected V. The objective returns a large FINITE penalty (10×(max single-donor vertex MSPE)+1, which dominates any feasible objective value since the objective is convex in w → max at a simplex vertex) so that candidate can never be chosen; a non-converged univariate-start solve gets inf MSPE (→ zero heuristic weight). A finite penalty is used instead of np.inf because inf floods scipy's Nelder-Mead/Powell simplex arithmetic with RuntimeWarnings (and makes it churn). Warn on ANY non-zero non-convergence count (no rate threshold). Healthy fits unaffected (Basque Tier-2 unchanged; warning count back to baseline).
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
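The exclusion rule can be sketched as a small wrapper around the objective value. `penalized_objective` and the toy numbers are hypothetical; the penalty formula (10 times the largest single-donor vertex MSPE, plus 1) is the one stated in the commit message.

```python
import math

def penalized_objective(mspe, converged, vertex_mspes):
    """Return the MSPE for a converged inner solve, else a large finite penalty.

    vertex_mspes: pre-period MSPE of each single-donor synthetic (simplex vertices).
    Since the MSPE is convex in w, its maximum over the simplex is at a vertex,
    so the penalty exceeds every feasible objective value.
    """
    penalty = 10.0 * max(vertex_mspes) + 1.0
    return mspe if converged else penalty

vertex_mspes = [1.5, 2.0, 0.9]
# (mspe, converged) per candidate V; the second is a truncated solve that looks best.
candidates = [(0.20, True), (0.05, False), (0.30, True)]
scores = [penalized_objective(m, ok, vertex_mspes) for m, ok in candidates]
best = min(range(len(scores)), key=scores.__getitem__)
n_nonconverged = sum(1 for _, ok in candidates if not ok)
```

Without the penalty the truncated candidate (raw MSPE 0.05) would win the argmin; with it, the score stays finite, so the optimizer's arithmetic is unaffected, yet that candidate can never be selected, and any non-zero `n_nonconverged` would trigger the warning.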
🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good; re-review of …
Methodology: No unmitigated findings. The implementation now adheres to the Registry contracts for treatment structure, donor composition, weighting, nested …
Code Quality: No findings.
Performance: No findings.
Maintainability: No findings.
Security: No findings.
…sion regression (P2)
Add test_single_inner_nonconvergence_excluded_from_v_ranking: monkeypatch _inner_solve_W so exactly ONE objective evaluation (the uniform-start eval) reports conv=False, then assert (a) the any-occurrence "during nested V selection" warning fires and (b) the selected V is a genuine small-MSPE fit (res.mspe_v < 1.0, not the large penalty), i.e. the truncated candidate was EXCLUDED from the argmin, not merely warned. Complements the blanket-failure (inner_max_iter=1) test. Test-only change. (Reaches the module via importlib since the diff_diff.synthetic_control attribute is the convenience function, which shadows the submodule; same pattern as diff_diff.trop.)
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good; re-review of …
Summary
- Add `SyntheticControl`, the classic Synthetic Control Method of Abadie, Diamond & Hainmueller (2010) (originating in Abadie & Gardeazabal 2003), as a standalone estimator (`diff_diff/synthetic_control.py`) with a separate `SyntheticControlResults` container (`diff_diff/synthetic_control_results.py`) and a `synthetic_control()` convenience function. Donor (unit) weights only (no time weights or ridge), distinct from `SyntheticDiD`.
- Inner simplex solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`, noise-scaled `min_decrease`). The diagonal predictor-importance matrix `V` is selected data-driven (`v_method="nested"`: softmax-on-simplex multistart Nelder-Mead + derivative-free Powell polish, minimizing pre-period outcome MSPE) or user-supplied (`v_method="custom"`). Per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source.
- Reports the gap path `α̂_1t = Y_1t − Σ_j w_j Y_jt`, `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. No analytical standard error: `se`/`t_stat`/`p_value`/`conf_int` are NaN (in-space placebo permutation inference is a planned follow-up; `_placebo_gaps`/`_rmspe_ratio`/`_fit_snapshot` are reserved on the results object).
- 10 validation gates, fail-closed `custom_v` cross-field rules, and degenerate single-donor / single-pre-period handling.
- Docs: `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:**` / `**Note:**` labels), `docs/api/synthetic_control.rst` + autosummary stubs + index toctree, `diff_diff/guides/llms.txt` (estimator count 17→18) + `llms-full.txt`, `README.md` catalog row, `docs/doc-deps.yaml`, `CHANGELOG.md`.

Methodology references (required if estimator / math changes)
- … `V`.
- … `docs/methodology/papers/` (merged in "synthetic control: PR-A paper reviews (ADH 2010/2015, Abadie 2021 JEL, CWZ 2021)" #497). Inner solve / standardization / `solution.v` scaled-space behavior verified against the R `Synth::synth` source.
- … `Synth` source (ADH 2010 defers it to AG 2003 App. B); outer objective minimizes pre-period MSPE over all pre periods, not R's `time.optimize.ssr` window (so the nested `V` differs by an efficiency-only choice → Tier-2 parity is a band); softmax-on-simplex `V` parametrization; predictor/outcome aggregation fails closed on non-finite cells whereas R uses `na.rm=TRUE` (deliberate: na-dropping silently aggregates incomparable period subsets); aggregation restricted to linear combinations `{mean, sum}` (no `median`); `standardize="none"` option; 1×SD poor-fit threshold (defensive, matches SyntheticDiD). No analytical SE (NaN inference); placebo inference deferred.

Validation
- `tests/test_methodology_synthetic_control.py`: 43 tests covering the 10 validation gates, `custom_v` cross-field rules, J==1 / T0==0 / T0==1 degeneracies, the NaN-inference contract, `get_params` ↔ `set_params` round-trip, the `estimators` re-export surface, and two-tier R-`Synth` parity.
- … `loss.v` 0.0089). Tier-1 feeds R's `solution.v` via `custom_v` → donor weights match to `atol=1e-3` (observed ~2e-5), deterministic/optimizer-independent; Tier-2 (`@pytest.mark.slow`) checks the nested fit in a band. Golden fixtures live in `tests/data/` (so Tier-1 runs in isolated-install CI without R), generated by `benchmarks/R/generate_synth_basque_golden.R`; `"Synth"` added to `benchmarks/R/requirements.R`.
- … `TODO.md`): wiring `SyntheticControlResults` into the practitioner / DiagnosticReport / BusinessReport routing → PR-2, where it pairs with the placebo-inference layer those reports would surface.

Security / privacy
🤖 Generated with Claude Code