dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level) by igerber · Pull Request #412 · igerber/diff-diff

igerber · 2026-05-10T16:45:02Z

Summary

Lift gate at chaisemartin_dhaultfoeuille.py:1230-1234 so per-path event-study disaggregation composes with heterogeneity="<col>".
New _compute_path_heterogeneity_test helper + path_groups: Optional[Set[int]] filter on _compute_heterogeneity_test. Per-path beta / SE / inference computed on the path-restricted switcher subsample; cohort dummies absorb baseline by construction (no R-divergence warning needed).
Result schema: new path_heterogeneity_effects: Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]] field; to_dataframe(level="by_path") extended with always-present het_* columns (NaN for placebo rows / when not requested).
R parity: introduces the FIRST predict_het parity baseline in the repo. Two new R generator scenarios (multi_path_reversible_predict_het global anchor + multi_path_reversible_by_path_predict_het per-path) using dont_drop_larger_lower=TRUE to match Python's drop_larger_lower=False requirement under reversal paths. Per-path beta / SE match R within rtol=1e-6.
Inference refresh hooks into the final R2 P1b block (per feedback_late_if_site_inference_refresh.md) so all per-path entries see the final df_survey after replicate-weight n_valid appends.
Multiplier bootstrap (n_bootstrap > 0) under by_path + heterogeneity + survey_design inherits the existing per-path multiplier-bootstrap-survey gate from PR Compose by_path / paths_of_interest with survey_design (Wave 4 #10) #408.

Methodology references (required if estimator / math changes)

Method name(s): de Chaisemartin & D'Haultfœuille reversible-treatment heterogeneity test (Web Appendix Section 1.5, Lemma 7) under per-path event-study disaggregation
Paper / source link(s): de Chaisemartin & D'Haultfœuille (2024) "Difference-in-Differences Estimators of Intertemporal Treatment Effects" — Web Appendix Section 1.5 (heterogeneity test); R did_multiplegt_dyn(..., by_path, predict_het) per-by_level dispatch (R/R/did_multiplegt_dyn.R:226-257)
Any intentional deviations from the source (and why): None on the per-path methodology — Python mirrors R's per-path predict_het exactly. The inherited cross-path cohort-sharing SE deviation from R for the OVERALL path_effects SE is unchanged and does not apply here (heterogeneity uses standard WLS coefficient IF, not the cohort-recentered IF allocator). R's predict_het dont_drop_larger_lower=TRUE flag is matched in fixture scenarios so reversal paths preserve cohort variation under heterogeneity testing.

Validation

Tests added/updated:
- tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity — 12 tests across gate dispatch (5), behavior (4 incl. single-path telescope at atol=rtol=1e-14, zero-signal anti-regression, multi-baseline UserWarning anti-regression), DataFrame integration (1), edge cases (2)
- tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneity — global anchor (FIRST predict_het parity baseline)
- tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathHeterogeneity — per-path
- benchmarks/R/generate_dcdh_dynr_test_values.R — 2 new scenarios (Add comprehensive test coverage for utils module #20 + Update roadmap with current implementation limitations #21), 2 new extraction helpers (extract_dcdh_predict_het, extract_dcdh_by_path_predict_het)
- benchmarks/data/dcdh_dynr_golden_values.json regenerated (21 scenarios, 619 KB)
Backtest / simulation / notebook evidence (if applicable): N/A (additive estimator extension; full R parity in fixture suite)

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-10T16:49:13Z

PR Review Report

Overall Assessment

⚠️ Needs changes: one unmitigated P1 finding from the required sibling-surface mirror audit.

Executive Summary

Affected method: ChaisemartinDHaultfoeuille dCDH reversible-treatment heterogeneity test, Web Appendix Section 1.5 / Lemma 7, under by_path and paths_of_interest.
Methodology registry now documents the per-path heterogeneity contract and the R-parity scope; I did not find an undocumented math or SE deviation in the estimator path.
Inference uses safe_inference() on the new path; pattern-wide grep found no new inline t_stat = effect / se anti-pattern.
P1: the new path_heterogeneity_effects surface is wired into the result dataclass and to_dataframe(level="by_path"), but not into summary(), unlike sibling per-path surfaces and global heterogeneity.
Could not run tests: pytest is not installed in this environment.

Methodology

Finding M-1: P3 Informational — documented partial/deviation surface, no action required

Severity: P3
Impact: The global heterogeneity implementation remains a documented partial predict_het port: post-treatment regressions only, no placebo regressions or joint null test. This is explicitly documented in docs/methodology/REGISTRY.md:L637, so it is not a defect under the review rules. The new per-path extension is also documented in the by-path note, including path-restricted switchers, unchanged variance machinery, survey composition, final df_survey refresh, and result schema at docs/methodology/REGISTRY.md:L643.
Concrete fix: None required.

Code Quality

Finding CQ-1: P1 — `summary()` does not render the new per-path heterogeneity surface

Severity: P1
Impact: The PR adds results.path_heterogeneity_effects and exposes it in to_dataframe(level="by_path"), but the sibling report surface is missing. Global heterogeneity has a dedicated summary renderer at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1253-L1283, and existing per-path sibling surfaces render placebos, cumulated effects, and sup-t bands inside _render_path_effects_section at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1308-L1431. The new heterogeneity field is only consumed by the DataFrame path at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1823-L1906. This violates the required sibling-surface mirror audit (schema ↔ renderer, summary ↔ full) and makes summary() silently omit a computed public result.
Concrete fix: Extend _render_path_effects_section() to read self.path_heterogeneity_effects.get(path, {}) and render a per-path heterogeneity sub-block with beta, se, t_stat, and p_value, mirroring _render_heterogeneity_section(). Add a regression test asserting res.summary() contains the per-path heterogeneity block and path/horizon rows when by_path + heterogeneity is fit.

Performance

No P0/P1/P2 findings.

Maintainability

Finding MA-1: P1 — same as CQ-1

Severity: P1
Impact: The result schema and renderer are now out of sync for one public surface, increasing the chance future users or downstream tests treat summary() as complete when it is not.
Concrete fix: Same as CQ-1.

Tech Debt

No mitigating TODO entry found under TODO.md “Tech Debt from Code Reviews” for the missing summary surface. The P1 remains unmitigated.

Security

No findings.

Documentation/Tests

Finding DT-1: P2 — summary coverage missing for the new public surface

Severity: P2
Impact: The new tests cover gate dispatch, behavior, DataFrame integration, and R parity, but do not cover summary() for path_heterogeneity_effects. Existing sibling features have summary tests, so this gap allowed CQ-1 through.
Concrete fix: Add a TestByPathHeterogeneity::test_per_path_heterogeneity_renders_in_summary test.

Verification note: I could not run the targeted tests because pytest is not installed (/bin/bash: pytest: command not found). I did run static greps for the inference anti-pattern and sibling surfaces.

Path to Approval

Add summary rendering for path_heterogeneity_effects inside _render_path_effects_section().
Add a regression test that fits by_path + heterogeneity and asserts summary() includes the per-path heterogeneity rows.
Run the new test plus the existing TestByPathHeterogeneity and dCDH parity heterogeneity tests in an environment with pytest installed.

Extends `_render_path_effects_section` in chaisemartin_dhaultfoeuille_results.py to render a per-path "Heterogeneity Test (Section 1.5, partial)" sub-block when `path_heterogeneity_effects` is populated for the path. Mirrors the global `_render_heterogeneity_section` block scoped to each per-path section, alongside the existing per-path placebo / cumulated / sup-t blocks. Adds anti-regression test `TestByPathHeterogeneity::test_per_path_heterogeneity_renders_in_summary` that asserts both the section header and at least one rendered beta value appear in `summary()` output when `by_path + heterogeneity` is fit. Locks the schema↔renderer mirror so future per-path surfaces don't silently omit from `summary()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-10T16:53:35Z

/ai-review

github-actions · 2026-05-10T16:58:32Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7f7e3d5bec9fb80aadcb80f66f0fb4a408514077

Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings in this re-review.

Executive Summary

Prior P1 is resolved: summary() now renders per-path heterogeneity rows at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1421-L1443, with a regression test at tests/test_chaisemartin_dhaultfoeuille.py:L10331-L10370.
Affected method: dCDH reversible-treatment heterogeneity test, Web Appendix Section 1.5 / Lemma 7, under by_path / paths_of_interest.
Methodology is documented in docs/methodology/REGISTRY.md:L637 and docs/methodology/REGISTRY.md:L643; I found no undocumented estimator, weighting, or SE deviation.
New inference paths use safe_inference() and preserve NaN-consistent inference fields.
Completed sibling-surface, pattern-wide grep, and by_path ↔ paths_of_interest symmetry audits. No workflow paths: changes were in scope.
I could not run tests locally because pytest is unavailable in this environment.

Methodology

M-1: P3 Informational — documented partial R `predict_het` surface

Severity: P3
Impact: The global heterogeneity implementation remains a documented partial port: post-treatment regressions only, without R’s placebo regressions or joint null test. This is explicitly documented as a deviation in docs/methodology/REGISTRY.md:L637, so it is not a defect under the review rules. The new per-path extension is also documented, including path-restricted switchers, variance machinery, survey composition, df refresh, and result schema at docs/methodology/REGISTRY.md:L643.
Concrete fix: None required.

No P0/P1 methodology findings. The new per-path helper restricts eligibility through path_groups inside _compute_heterogeneity_test at diff_diff/chaisemartin_dhaultfoeuille.py:L5025-L5027, and dispatches per selected path at diff_diff/chaisemartin_dhaultfoeuille.py:L6459-L6535.

Code Quality

No P0/P1/P2 findings.

The prior missing renderer is fixed in the sibling report surface at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1421-L1443. Pattern-wide grep found no new inline t_stat = effect / se inference anti-pattern in the changed dCDH path; the new and refreshed fields route through safe_inference() at diff_diff/chaisemartin_dhaultfoeuille.py:L5106-L5223 and diff_diff/chaisemartin_dhaultfoeuille.py:L4058-L4070.

Performance

No findings.

Maintainability

No P0/P1/P2 findings.

Sibling surfaces are now aligned: result schema at diff_diff/chaisemartin_dhaultfoeuille_results.py:L597-L607, summary rendering at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1421-L1443, and to_dataframe(level="by_path") het_* columns at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1781-L1932.

Tech Debt

No untracked blocking tech debt found for this PR. Existing TODO.md review debt entries appear unrelated to the new per-path heterogeneity surface.

Security

No findings. A lightweight secret-pattern scan over the changed text did not find candidate secrets.

Documentation/Tests

DT-1: P3 Informational — local test execution unavailable

Severity: P3
Impact: I could not execute the targeted tests because this environment lacks pytest (/usr/bin/python: No module named pytest). Static review shows coverage was added for gates, behavior, DataFrame output, summary rendering, and R parity at tests/test_chaisemartin_dhaultfoeuille.py:L9977-L10370 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L1324-L1472.
Concrete fix: Run the targeted dCDH heterogeneity and parity tests in CI or a full dev environment with test dependencies installed.

igerber · 2026-05-10T18:38:08Z

/ai-review

github-actions · 2026-05-10T18:41:20Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6c8a68ca40f85d2b2cbc6155e5ad0c9e5149de0f

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in this re-review.

Executive Summary

Prior P1 is resolved: summary() now renders per-path heterogeneity rows at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1421-L1443, with coverage at tests/test_chaisemartin_dhaultfoeuille.py:L10534-L10572.
Affected method: dCDH reversible-treatment heterogeneity test, Web Appendix Section 1.5 / Lemma 7, now under by_path and paths_of_interest.
Methodology is documented in docs/methodology/REGISTRY.md:L637 and docs/methodology/REGISTRY.md:L643; I found no undocumented estimator, weighting, or SE deviation.
New inference paths use safe_inference() and refresh replicate-weight df for per-path heterogeneity at diff_diff/chaisemartin_dhaultfoeuille.py:L4058-L4070.
Completed sibling-surface, pattern-wide grep, by_path ↔ paths_of_interest symmetry, and workflow-scope audits. No GH Actions workflow paths: changes are in scope.
I could not run tests locally because this environment lacks numpy, pandas, and pytest.

Methodology

M-1: P3 Informational — documented partial R `predict_het` surface

Severity: P3
Impact: The global heterogeneity implementation remains a documented partial port: post-treatment regressions only, without R’s placebo regressions or joint null test. This is explicitly documented in docs/methodology/REGISTRY.md:L637, so it is not a defect. The new per-path extension is documented at docs/methodology/REGISTRY.md:L643.
Concrete fix: None required.

No P0/P1 methodology findings. The path restriction is implemented through path_groups in _compute_heterogeneity_test() at diff_diff/chaisemartin_dhaultfoeuille.py:L5025-L5027 and dispatched by _compute_path_heterogeneity_test() at diff_diff/chaisemartin_dhaultfoeuille.py:L6459-L6535.

Code Quality

No P0/P1/P2 findings.

The new result schema is wired into the result dataclass at diff_diff/chaisemartin_dhaultfoeuille_results.py:L597-L607, summary rendering at L1421-L1443, and to_dataframe(level="by_path") at L1819-L1937.

Performance

No findings.

Maintainability

No P0/P1/P2 findings.

Sibling surfaces are aligned: estimator output, summary rendering, DataFrame columns, docs, registry, and parity tests all reference the same {path: {l: ...}} path_heterogeneity_effects contract.

Tech Debt

No untracked blocking tech debt found for this PR. Existing TODO/registry limitations are unrelated or explicitly documented.

Security

No findings. A lightweight secret-pattern scan over the changed files found no candidate secrets.

Documentation/Tests

DT-1: P3 Informational — local test execution unavailable

Severity: P3
Impact: I could not execute the targeted tests because this environment lacks required test dependencies. Static review shows coverage for gates, behavior, summary, DataFrame output, survey composition, and R parity.
Concrete fix: Run the targeted dCDH heterogeneity and parity tests in CI or a full dev environment.

DT-2: P3 Informational — multi-baseline no-warning test is not actually multi-baseline

Severity: P3
Impact: tests/test_chaisemartin_dhaultfoeuille.py:L10276-L10302 claims to cover the multi-baseline no-warning anti-regression, but _by_path_het_data() uses paths that all start from baseline 0. This does not block approval because the behavior is documented and the methodology path is not silently wrong.
Concrete fix: Add a small fixture with switchers on at least two baseline values and assert no multi-baseline warning under by_path + heterogeneity.

Lifts the gate at chaisemartin_dhaultfoeuille.py:1230-1234 so per-path event-study disaggregation composes with heterogeneity="<col>" (Web Appendix Section 1.5, Lemma 7), mirroring R did_multiplegt_dyn(..., by_path, predict_het) per-by_level dispatch. Per-path heterogeneity is computed by re-running the Lemma 7 regression on each path-restricted switcher subsample. New `path_groups` (Optional[Set[int]]) parameter on _compute_heterogeneity_test restricts eligibility to switchers ON path p; the variance machinery (standard WLS vcov for non-survey, cell-period IF allocator for Binder TSL, group-level allocator for Rao-Wu replicate) is unchanged from the global heterogeneity path. Cohort dummies absorb baseline by construction, so multi-baseline switcher panels do not produce R-divergence (no parallel UserWarning like controls / trends_linear). Surfaces on results.path_heterogeneity_effects keyed {path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}} and on to_dataframe(level="by_path") via new always-present het_* columns, populated for positive horizons and NaN otherwise (mirrors cband_* / cumulated_* convention). Per-(path, horizon) inference is refreshed in the final R2 P1b block so all surfaces use the same df_survey after replicate-weight n_valid appends. R parity: introduces the FIRST predict_het R-parity baseline in the repo. Two new scenarios (multi_path_reversible_predict_het global anchor + multi_path_reversible_by_path_predict_het per-path) use dont_drop_larger_lower=TRUE to match drop_larger_lower=False and provide cohort variation under reversal paths. Per-path beta and SE match R within rtol=1e-6. Multiplier bootstrap (n_bootstrap > 0) under by_path + heterogeneity + survey_design inherits the existing per-path multiplier-bootstrap gate from PR #408. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends `_render_path_effects_section` in chaisemartin_dhaultfoeuille_results.py to render a per-path "Heterogeneity Test (Section 1.5, partial)" sub-block when `path_heterogeneity_effects` is populated for the path. Mirrors the global `_render_heterogeneity_section` block scoped to each per-path section, alongside the existing per-path placebo / cumulated / sup-t blocks. Adds anti-regression test `TestByPathHeterogeneity::test_per_path_heterogeneity_renders_in_summary` that asserts both the section header and at least one rendered beta value appear in `summary()` output when `by_path + heterogeneity` is fit. Locks the schema↔renderer mirror so future per-path surfaces don't silently omit from `summary()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds 3 slow regression tests for `by_path + heterogeneity + survey_design` composition that the original PR claimed via REGISTRY/CHANGELOG but lacked automated coverage: 1. `test_per_path_heterogeneity_under_survey_finite` — analytical Binder TSL SE finite per (path, l) under PSU=group survey design. 2. `test_per_path_heterogeneity_replicate_weights_propagates_n_valid` — BRR replicate-weight fit drives `_replicate_n_valid_list` through per-(path, l) heterogeneity calls and final `survey_metadata.df_survey == n_replicates - 1`. 3. `test_survey_design_plus_n_bootstrap_with_heterogeneity_still_raises` — confirms heterogeneity composition does not accidentally re-route around the existing per-path multiplier-bootstrap-survey gate. Also extends `to_dataframe(level="by_path")` docstring at `chaisemartin_dhaultfoeuille_results.py:1527` to list the new `het_*` columns alongside `cband_*` and `cumulated_*`, mirroring the always-present NaN-when-None convention. The empty-DataFrame schema already includes them; this is a docstring-only sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-11T00:56:04Z

/ai-review

github-actions · 2026-05-11T00:57:35Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a57f5834601f5b31e7561a7db1e45094acd3323b

Overall Assessment

⚠️ Needs changes — one unmitigated P2 claim-vs-test mismatch remains.

Executive Summary

Affected method: ChaisemartinDHaultfoeuille dCDH heterogeneity test (Web Appendix Section 1.5 / Lemma 7), newly composed with by_path / paths_of_interest.
Methodology implementation is broadly consistent with the visible REGISTRY contract: per-path heterogeneity filters switchers by selected path and reuses the existing Lemma 7 regression / inference path.
Prior summary-rendering concern appears resolved: summary() now renders per-path heterogeneity rows.
Public surfaces are mostly wired: result dataclass field, summary(), and to_dataframe(level="by_path") all expose path_heterogeneity_effects / het_*.
Remaining blocker is a test-claim mismatch: the PR/CHANGELOG/REGISTRY claim multi-baseline per-path heterogeneity behavior is regression-tested, but the visible “multi-baseline” test fixture uses only baseline-0 switcher paths.

Methodology

M-1 — P3 Informational: documented partial heterogeneity implementation

Severity: P3
Impact: The dCDH heterogeneity test remains a partial implementation relative to R / the paper: post-treatment regressions only, no placebo regressions or joint null test. This is explicitly documented in the Methodology Registry under the dCDH heterogeneity notes, so it is not a defect.
Concrete fix: None required.

M-2 — No unmitigated methodology mismatch found in implementation

Severity: N/A
Impact: The new implementation path matches the visible REGISTRY claim for per-path heterogeneity: _compute_path_heterogeneity_test() enumerates selected paths and calls _compute_heterogeneity_test(..., path_groups=...), while _compute_heterogeneity_test() restricts eligible switchers via if path_groups is not None and g not in path_groups: continue (diff_diff/chaisemartin_dhaultfoeuille.py:_compute_heterogeneity_test, _compute_path_heterogeneity_test). This is consistent with the documented “path-restricted switcher subsample” contract.
Concrete fix: None.

Code Quality

No P0/P1/P2 code-quality findings in the visible changed source.

The new field is threaded through:

Fit result construction: diff_diff/chaisemartin_dhaultfoeuille.py near path_heterogeneity_effects=...
Result dataclass: diff_diff/chaisemartin_dhaultfoeuille_results.py near path_heterogeneity_effects
Summary rendering: ChaisemartinDHaultfoeuilleResults._render_path_effects_section
DataFrame rendering: ChaisemartinDHaultfoeuilleResults.to_dataframe(level="by_path")

Performance

No findings.

The path heterogeneity helper re-enumerates treatment paths after path_effects already did so. This is documented in-code and is likely acceptable given the feature scope; no performance blocker.

Maintainability

No unmitigated maintainability findings.

The schema shape {path: {l: {...}}} is documented in source docstrings and handled separately from path_effects[path]["horizons"], which avoids overloading the existing path-effect schema.

Tech Debt

No untracked blocking tech debt found in the visible context.

Security

No findings in the visible diff. I did not run a secret scanner or shell grep; this is based only on the loaded prompt content.

Documentation/Tests

DT-1 — P2: claimed multi-baseline behavior is not actually tested

Severity: P2
Impact: The PR/CHANGELOG/REGISTRY explicitly claim that per-path heterogeneity on multi-baseline switcher panels does not produce the R-divergence warning used for controls / trends_linear, because cohort dummies absorb baseline. The visible test named test_per_path_heterogeneity_no_multi_baseline_warning does not actually construct a multi-baseline switcher panel: _by_path_het_data() defines paths [(0, 1, 1, 1), (0, 1, 0, 0), (0, 1, 1, 0)], all starting from baseline 0 (tests/test_chaisemartin_dhaultfoeuille.py:_by_path_het_data). As a result, the shipped test does not cover the explicitly claimed multi-baseline behavior.

This is a claim-vs-test mismatch under the single-pass audit rules. No TODO mitigation is visible in the prompt, and shipped-behavior test gaps are P2.

Concrete fix: Add or correct a test with at least two switcher baseline values, e.g.:

Include baseline-0 joiner paths such as (0, 1, 1, 1).
Include baseline-1 leaver paths such as (1, 0, 0, 0) or another valid reversible path beginning with 1.
Fit with ChaisemartinDHaultfoeuille(drop_larger_lower=False, by_path=2) or paths_of_interest=[(0,1,1,1), (1,0,0,0)], L_max=3, heterogeneity="het_x".
Assert:
1. res.path_heterogeneity_effects contains both selected paths or the expected observed subset.
2. No warning matching the controls/trends-linear multi-baseline divergence text is emitted.
3. At least one per-path heterogeneity entry is finite where the design has sufficient observations.

Path to Approval

Add a real multi-baseline per-path heterogeneity test as described in DT-1, using switcher paths with different first elements / baseline treatment values.
Ensure the new test filters out unrelated expected warnings, but fails on any warning containing the multi-baseline R-divergence language used by by_path + controls / by_path + trends_linear.
Re-run the targeted dCDH heterogeneity tests and parity tests in CI.

The prior `test_per_path_heterogeneity_no_multi_baseline_warning` used `_by_path_het_data` whose paths all start with `D=0`, so it never exercised the multi-baseline switcher panel regime claimed in REGISTRY ("cohort dummies absorb baseline by construction"). The new CI reviewer flagged this as a P2 claim-vs-test mismatch. Replaces the test with a TRUE multi-baseline DGP via new `_multi_baseline_het_data` static fixture: 60 joiners (D_{g,1}=0, path (0,1,1,1)) + 60 leavers (D_{g,1}=1, path (1,0,0,0)), F_g in {3,4,5} for both baselines. Selected via `paths_of_interest=[(0,1,1,1), (1,0,0,0)]`. Test now asserts: 1. Fixture sanity: switcher baselines actually span {0, 1}. 2. Both selected paths populate `path_heterogeneity_effects`. 3. Each path has ≥1 finite (path, l) entry — confirms the regression is non-degenerate under multi-baseline switchers. 4. No multi-baseline UserWarning emitted (cohort dummies absorb). 5. No verbatim `by_path / paths_of_interest + controls/trends_linear` divergence warning emitted (strict text-fragment check). Empirically verified: this fixture produces finite beta/SE on every in-window horizon for both paths and emits zero baseline warnings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-11T21:23:13Z

/ai-review

github-actions · 2026-05-11T21:24:19Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cdbc6bb85f445d8b115de093d6a36bff1f5bb183

Overall Assessment

✅ Looks good — no unmitigated P0/P1/P2 findings in the visible context.

Executive Summary

Affected method: ChaisemartinDHaultfoeuille dCDH heterogeneity test (Web Appendix Section 1.5 / Lemma 7), newly composed with by_path / paths_of_interest.
The prior P2 test-gap finding appears resolved: the new test_per_path_heterogeneity_no_multi_baseline_warning fixture now uses both baseline-0 joiners and baseline-1 leavers.
Implementation matches the visible Methodology Registry contract: per-path heterogeneity reuses the existing Lemma 7 regression with a path-restricted switcher filter, and public result/rendering surfaces are wired.
Claim-vs-shipped surfaces are covered in the visible diff: implementation, public docstrings, summary(), to_dataframe(level="by_path"), R parity tests, survey/replicate tests, and no-warning multi-baseline test are all present.
No security issues or accidental secrets are visible in the loaded diff. I did not run shell scans or inspect files outside the prompt.

Methodology

M-1 — P3 Informational: documented partial heterogeneity implementation

Severity: P3
Impact: The heterogeneity test remains a partial implementation relative to the full R/paper surface: post-treatment regressions are implemented, while placebo regressions and a joint null test are not. This is explicitly documented in the Methodology Registry under dCDH heterogeneity testing, so it is not a defect.
Concrete fix: None required.

M-2 — Previous P2 resolved: real multi-baseline behavior is now tested

Severity: Resolved / N/A
Impact: The previous review flagged that the “multi-baseline” no-warning test only used baseline-0 switcher paths. The updated test now constructs a true multi-baseline panel: joiners with baseline 0 and path (0, 1, 1, 1), plus leavers with baseline 1 and path (1, 0, 0, 0) (tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity._multi_baseline_het_data, test_per_path_heterogeneity_no_multi_baseline_warning). It also asserts both paths surface finite heterogeneity entries and no multi-baseline warning is emitted.
Concrete fix: None required.

M-3 — Methodology implementation matches visible Registry contract

Severity: N/A
Impact: The Registry’s new per-path heterogeneity contract says the path filter restricts eligible switchers inside the Lemma 7 regression while leaving the variance machinery unchanged. The implementation does this by adding path_groups to _compute_heterogeneity_test() and skipping groups not in that set (diff_diff/chaisemartin_dhaultfoeuille.py::_compute_heterogeneity_test), with _compute_path_heterogeneity_test() enumerating paths and calling the same heterogeneity helper per selected path. This is consistent with the documented “path-restricted switcher subsample” semantics.
Concrete fix: None required.

Code Quality

No P0/P1/P2 code-quality findings in the visible changed source.

Notable positive checks:

The old NotImplementedError gate for by_path / paths_of_interest + heterogeneity was removed while retaining gates for design2 and honest_did (diff_diff/chaisemartin_dhaultfoeuille.py:fit).
The new path_groups filter is localized and does not duplicate the Lemma 7 regression logic.
The final replicate-weight df refresh includes the new {path: {l: ...}} schema separately, avoiding accidental misuse of the existing path-effect schema refresher.

Performance

No blocking performance findings.

P3 Informational: path enumeration is repeated for heterogeneity

Severity: P3
Impact: _compute_path_heterogeneity_test() re-enumerates treatment paths even though _compute_path_effects() already did so earlier in fit(). The helper documents this and suppresses duplicate warnings. Given the feature scope and path enumeration cost relative to estimation/bootstrap work, this is acceptable.
Concrete fix: Optional future refactor: pass the already-computed selected paths and masks into _compute_path_heterogeneity_test() to avoid repeated enumeration.

Maintainability

No unmitigated maintainability findings.

The new result schema is documented consistently:

ChaisemartinDHaultfoeuilleResults.path_heterogeneity_effects dataclass field and class docstring (diff_diff/chaisemartin_dhaultfoeuille_results.py).
Public estimator docstrings for by_path, paths_of_interest, and heterogeneity (diff_diff/chaisemartin_dhaultfoeuille.py).
Registry and API docs updated (docs/methodology/REGISTRY.md, docs/api/chaisemartin_dhaultfoeuille.rst).
Rendering surfaces added in summary() and to_dataframe(level="by_path").

Tech Debt

No untracked blocking tech debt found in the visible context.

P3 Informational: existing documented heterogeneity limitations remain

Severity: P3
Impact: The dCDH heterogeneity feature is still explicitly documented as partial: no placebo heterogeneity and no joint null test. This limitation is already captured in the Registry with a Note/Deviation-style explanation and is not introduced as an undocumented silent behavior here.
Concrete fix: None required for this PR.

Security

No findings in the visible diff.

I did not run a secret scanner or shell grep; this assessment is limited to the prompt-provided diff and full changed source files.

Documentation/Tests

No unmitigated documentation/test findings.

Claim-vs-shipped audit based on visible context:

Implementation: Present via _compute_path_heterogeneity_test() and path_groups filtering in _compute_heterogeneity_test().
Tests: Present for gate removal, mutex preservation, finite signal, single-path telescope, zero-signal anti-regression, true multi-baseline no-warning behavior, survey TSL, replicate-weight df propagation, survey+bootstrap gate, DataFrame integration, summary rendering, unobserved path omission, and R parity (tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity, tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneity, ::TestDCDHDynRParityByPathHeterogeneity).
Public docstrings: Updated in estimator and result dataclass docstrings.
Rendering surfaces: summary() now renders per-path heterogeneity rows; to_dataframe(level="by_path") includes always-present het_* columns.
Cross-doc consistency: CHANGELOG, Registry, API docs, guide, source docstrings, implementation, and visible tests are aligned for the new per-path heterogeneity behavior.

igerber · 2026-05-12T23:55:19Z

/ai-review

github-actions · 2026-05-13T00:05:13Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cdbc6bb85f445d8b115de093d6a36bff1f5bb183

Overall Assessment

⚠️ Needs changes — highest unmitigated finding is P1 (the new Registry note overstates per-path predict_het parity with R), with additional P2 claim-vs-test gaps.

Executive Summary

Affected method: dCDH heterogeneity testing (Web Appendix Section 1.5 / Lemma 7) newly composed with by_path / paths_of_interest.
The estimator plumbing itself is mostly complete: the old fit-time gate is gone, path_heterogeneity_effects is wired into results, and both summary() and to_dataframe(level="by_path") were extended.
The feature-specific Methodology Registry note currently overclaims R parity on the placebo side.
The new replicate-weight test does not actually prove the final-df_survey refresh path added for path_heterogeneity_effects.
The public SE-parity claim (rtol=1e-6) is stronger than what the new parity tests enforce (SE_RTOL = 1e-5).
Survey coverage is only exercised through by_path, not the reciprocal paths_of_interest selector that the docs also advertise.

Methodology

M1 (P1) docs/methodology/REGISTRY.md (new “Per-path heterogeneity testing” paragraph) says placebo rows stay NaN because “R does not ship per-path predict_het on placebos either,” while the implementation hard-codes het_* placebo rows to NaN in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1885-L1896. The visible DIDmultiplegtDYN docs say predict_het also shows placebo regression tables, and the by_path dispatcher forwards both placebo and predict_het into each per-path did_multiplegt_main call. Impact: the Methodology Registry currently presents this as parity-preserving when the visible upstream contract suggests it is still a Python-side limitation/deviation. Concrete fix: either implement per-path placebo heterogeneity / joint-null output, or rewrite the Registry / changelog / API text to call this an explicit deviation from R rather than parity. (rdrr.io)
M2 (P3) diff_diff/chaisemartin_dhaultfoeuille.py:L950-L957 still documents heterogeneity as a partial implementation (post-treatment regressions only, no placebo regressions or joint null test). Impact: this limitation itself is already documented and is not a blocker. Concrete fix: none, beyond keeping the new per-path docs aligned with that existing caveat.

Code Quality

No P0/P1/P2 findings in the changed Python implementation.

Performance

PF1 (P3) _compute_path_heterogeneity_test() re-enumerates treatment paths even though fit() already did that work for path_effects (diff_diff/chaisemartin_dhaultfoeuille.py:L6454-L6500). Impact: extra work and duplicate warning-suppression state, but not correctness-critical. Concrete fix: thread the selected path list / masks into the helper instead of recomputing them.

Maintainability

No P0/P1/P2 findings beyond the documentation/test issues below.

Tech Debt

No separate P0/P1/P2 tech-debt finding beyond the untracked test gaps in this PR.

Security

No security or secret-exposure issue is visible in the provided diff.

Documentation/Tests

T1 (P2) diff_diff/chaisemartin_dhaultfoeuille.py:L4047-L4065 adds a dedicated final-df_survey refresh loop for path_heterogeneity_effects, but tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity.test_per_path_heterogeneity_replicate_weights_propagates_n_valid only checks survey_metadata.df_survey and finite se. Impact: this test would still pass if the new heterogeneity branch failed to append n_valid or left stale t_stat / p_value / conf_int, because earlier survey code paths already populate the shared accumulator. Concrete fix: assert that at least one (path, horizon) entry matches safe_inference(beta, se, df=res.survey_metadata.df_survey) after fit, ideally in a fixture where the final df differs from the compute-time snapshot.
T2 (P2) CHANGELOG.md:9 says per-path SE matches R within rtol=1e-6, but the new parity classes tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneity and ::TestDCDHDynRParityByPathHeterogeneity both set SE_RTOL = 1e-5. Impact: a 1e-6-to-1e-5 SE drift would still ship despite the public claim. Concrete fix: either tighten the tests to 1e-6, or relax the published claim to the tested tolerance.
T3 (P2) The code and docs claim both by_path and paths_of_interest compose with heterogeneity under survey_design (diff_diff/chaisemartin_dhaultfoeuille.py:L3872-L3895, CHANGELOG.md:9), but all new survey-specific tests in TestByPathHeterogeneity use by_path=...; none use paths_of_interest=[...]. Impact: the reciprocal selector surface is advertised but not regression-tested under analytical Binder TSL, replicate weights, or the survey+bootstrap gate. Concrete fix: add at least one analytical survey test and one replicate/gate test using paths_of_interest.

Path to Approval

Correct the per-path heterogeneity Registry/changelog wording around R placebo parity, or implement the missing per-path placebo/joint-test surface.
Add a replicate-weight regression that proves path_heterogeneity_effects inference fields are recomputed from the final survey_metadata.df_survey.
Align the stated SE-parity tolerance and the tests.
Add paths_of_interest + heterogeneity + survey_design regression coverage.

Address residual P1+P2s from re-audit of PR #412

The CHANGELOG paths_of_interest entry referenced `chaisemartin_dhaultfoeuille.py:1118` for the by_path precondition gate, but later edits (lifting the survey_design mutex in #408 and composing in heterogeneity in #412) have drifted the actual gate location to ~L1216. Drop the numeric anchor entirely and describe the gate textually so future edits do not silently invalidate the line reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The restored CI reviewer surfaced findings the degraded reviewer missed across all 5 prior rounds on PR igerber#412: P1 (REGISTRY + code comment): the claim that "R does not ship per-path predict_het on placebos either, so parity is preserved by deferral" contradicts what R's `did_multiplegt_dyn(..., by_path, predict_het)` dispatcher actually does - it forwards `predict_het` into each per-path `did_multiplegt_main` call along with `placebo`, so R may emit per-path placebo heterogeneity rows we do not yet mirror. Rewrite both surfaces (chaisemartin_dhaultfoeuille_results.py code comment and REGISTRY.md DataFrame-integration paragraph) as an explicit Python- side deferral rather than a verified R-parity. Add a TODO row to track validating R's actual placebo predict_het output and either implementing parity or documenting the deviation explicitly. P2 (REGISTRY rtol claim): the per-path heterogeneity R-parity paragraph claimed "rtol ~1e-6 on point estimates AND SE", but the parity tests use BETA_RTOL=1e-6 and SE_RTOL=1e-5 (one decade looser on SE). Split the claim into the two separate tolerances and note the WLS-denominator/cohort-recentering numerical drift that motivates the looser SE bound. P2 (replicate-weight df_survey refresh): the existing test only checked finite SE; it would have passed if the new dedicated heterogeneity refresh loop failed to recompute t_stat / p_value / conf_int at the final df_survey. Strengthen the test to call `safe_inference(beta, se, df=df_survey)` on the first finite entry and assert the stored inference fields match - this anti-regression covers the dedicated post-call refresh added for path_heterogeneity_ effects. P2 (paths_of_interest survey gap): the documented composability of `paths_of_interest + heterogeneity + survey_design` was not regression- locked - all existing survey-specific tests used `by_path=k`. Add test_paths_of_interest_heterogeneity_survey_design_analytical (verify analytical Binder TSL fits, selector ordering preserved, finite SE per populated (path, l)) and test_paths_of_interest_heterogeneity_ survey_n_bootstrap_gate (verify the multiplier-bootstrap gate applies under paths_of_interest too). No estimator behavior, weighting, variance/SE, identification, or default statistical surface changed in source - documentation accuracy plus expanded regression coverage only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restored CI reviewer caught one actionable docs-drift item: stale mentions of "plug-in-only SE" and a `survey_design` mutex that igerber#408 itself lifted. 1. CHANGELOG paths_of_interest entry listed `survey_design` in the by_path precondition gate alongside `heterogeneity` / `design2` / `honest_did`. Both `survey_design` (lifted by igerber#408) and `heterogeneity` (composed in by igerber#412) have been removed from that gate. Update the parenthetical to reflect the shipped gate and note the lifted mutexes explicitly to avoid misleading readers about current behavior. 2. `_compute_path_effects` docstring described SE only as "plug-in via _plugin_se(...)". Add a dedicated SE-formation section listing the non-survey plug-in path and the new survey path (path-restricted per-period IF routed through `_survey_se_from_group_if`, replicate-weight df propagation via `replicate_n_valid_list`, and the shared post-call `_refresh_path_inference` reconciling stored inference fields against the final `df_survey`). 3. `_compute_path_placebos` docstring mirrored the same plug-in-only description. Add a parallel SE-formation section pointing back at `_compute_path_effects` for the shared survey contract. No runtime behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address #412 holistic re-audit residuals (R2)

igerber added ready-for-ci Triggers CI test workflows and removed ready-for-ci Triggers CI test workflows labels May 10, 2026

igerber mentioned this pull request May 10, 2026

Replace CI Codex agent with single-shot Responses API + tighten verdict bar #415

Merged

igerber and others added 3 commits May 10, 2026 20:55

igerber force-pushed the dcdh-by-path-heterogeneity branch from 6c8a68c to a57f583 Compare May 11, 2026 00:55

igerber added the ready-for-ci Triggers CI test workflows label May 11, 2026

igerber merged commit 2f77f55 into main May 11, 2026
28 of 29 checks passed

igerber deleted the dcdh-by-path-heterogeneity branch May 11, 2026 23:32

igerber mentioned this pull request May 13, 2026

Address residual P1+P2s from re-audit of PR #412 #422

Merged

2 tasks

igerber added a commit that referenced this pull request May 13, 2026

Merge pull request #422 from igerber/fix-audit-412

9a2435a

Address residual P1+P2s from re-audit of PR #412

igerber mentioned this pull request May 13, 2026

Address residual P3 docs drift from re-audit of PR #408 #424

Merged

1 task

igerber mentioned this pull request May 14, 2026

Address #412 holistic re-audit residuals (R2) #430

Merged

1 task

igerber added a commit that referenced this pull request May 14, 2026

Merge pull request #430 from igerber/fix-audit-412-r2

93ff96d

Address #412 holistic re-audit residuals (R2)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level)#412

dCDH by_path Wave 5 #11: + heterogeneity (predict_het per-by_level)#412
igerber merged 4 commits into
mainfrom
dcdh-by-path-heterogeneity

igerber commented May 10, 2026

Uh oh!

github-actions Bot commented May 10, 2026

Uh oh!

igerber commented May 10, 2026

Uh oh!

github-actions Bot commented May 10, 2026

Uh oh!

igerber commented May 10, 2026

Uh oh!

github-actions Bot commented May 10, 2026

Uh oh!

igerber commented May 11, 2026

Uh oh!

github-actions Bot commented May 11, 2026

Uh oh!

igerber commented May 11, 2026

Uh oh!

github-actions Bot commented May 11, 2026

Uh oh!

Uh oh!

igerber commented May 12, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 10, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 10, 2026

PR Review Report

Overall Assessment

Executive Summary

Methodology

Finding M-1: P3 Informational — documented partial/deviation surface, no action required

Code Quality

Finding CQ-1: P1 — summary() does not render the new per-path heterogeneity surface

Performance

Maintainability

Finding MA-1: P1 — same as CQ-1

Tech Debt

Security

Documentation/Tests

Finding DT-1: P2 — summary coverage missing for the new public surface

Path to Approval

Uh oh!

igerber commented May 10, 2026

Uh oh!

github-actions Bot commented May 10, 2026

Overall Assessment

Executive Summary

Methodology

M-1: P3 Informational — documented partial R predict_het surface

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

DT-1: P3 Informational — local test execution unavailable

Uh oh!

igerber commented May 10, 2026

Uh oh!

github-actions Bot commented May 10, 2026

Overall Assessment

Executive Summary

Methodology

M-1: P3 Informational — documented partial R predict_het surface

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

DT-1: P3 Informational — local test execution unavailable

DT-2: P3 Informational — multi-baseline no-warning test is not actually multi-baseline

Uh oh!

igerber commented May 11, 2026

Uh oh!

github-actions Bot commented May 11, 2026

Overall Assessment

Executive Summary

Methodology

M-1 — P3 Informational: documented partial heterogeneity implementation

M-2 — No unmitigated methodology mismatch found in implementation

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

DT-1 — P2: claimed multi-baseline behavior is not actually tested

Path to Approval

Uh oh!

igerber commented May 11, 2026

Uh oh!

github-actions Bot commented May 11, 2026

Overall Assessment

Executive Summary

Methodology

M-1 — P3 Informational: documented partial heterogeneity implementation

M-2 — Previous P2 resolved: real multi-baseline behavior is now tested

Finding CQ-1: P1 — `summary()` does not render the new per-path heterogeneity surface

M-1: P3 Informational — documented partial R `predict_het` surface

M-1: P3 Informational — documented partial R `predict_het` surface