Skip to content

Phase 2: Conley space-time HAC + panel-estimator wire-up (MPD/TWFE)#426

Merged
igerber merged 7 commits into
mainfrom
spillover-conley-phase2
May 14, 2026
Merged

Phase 2: Conley space-time HAC + panel-estimator wire-up (MPD/TWFE)#426
igerber merged 7 commits into
mainfrom
spillover-conley-phase2

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 14, 2026

Summary

  • Phase 2 of the spillover-conley initiative. Adds the block-decomposed panel Conley sandwich (matches R conleyreg with lag_cutoff > 0) and lifts the Phase 1 fit-time rejection on MultiPeriodDiD and TwoWayFixedEffects.
  • DifferenceInDifferences(vcov_type="conley") continues to raise with a redirect to MPD/TWFE because DiD.fit() has no unit column declaration.
  • Three new R-conleyreg parity fixtures (panel_haversine_lag1, panel_haversine_lag2, panel_lat_lon_realistic_lag1) at lag ∈ {1, 2, 1}; parity verified at ~5.7e-16 max abs diff.
  • Includes the Codex local-review fixes: MPD + survey_design front-door reject (was a silent-bypass P0 — return_vcov=False skipped the conley + weights validator and compute_survey_vcov overwrote vcov), _validate_conley_kwargs rejects NaN / pd.NA unit IDs (was silently dropping rows from the per-unit serial HAC), MPD conley_coords None-guard (was raw TypeError), TWFE docstring updated.

Methodology references

  • Method name(s): Conley (1999) spatial HAC, panel block-decomposed form (matches R conleyreg lag_cutoff > 0)
  • Paper / source link(s): Conley, T. G. (1999). J. Econometrics 92(1), 1-45. Düsterhöft, C. (2021). conleyreg, CRAN v0.1.9 — https://github.com/cdueben/conleyreg (src/XeeXhC.cpp, src/time_dist.cpp). Newey & West (1987) for the Bartlett temporal weights.
  • Any intentional deviations from the source (and why):
    • Not a Driscoll-Kraay product kernel. Empirical spike against conleyreg::time_dist.cpp showed the temporal contribution is an additive within-unit Bartlett sandwich (lag=0 excluded), summed onto the within-period spatial sandwich. The original plan assumed a multiplicative K_space · K_time kernel; that was empirically refuted at ~1e-3 vs ~1e-14 for the block-decomposed form. Documented in REGISTRY.
    • Temporal kernel hardcoded Bartlett. conleyreg's kernel argument controls ONLY the spatial taper; the temporal kernel is unconditionally (1 - |lag|/(L+1)). diff-diff matches this asymmetry exactly for R parity. Documented as Note (deviation from R-symmetric API) in REGISTRY.

Validation

  • Tests added/updated: tests/test_conley_vcov.py (+756 lines, 4 new test classes / 23 new tests covering panel-helper math, validator co-requirements, NaN-unit / pd.NA rejection, MPD + survey reject, MPD missing-coords reject, TWFE block-decomposed integration, FWL composability, TestConleyParitySpacetime against the 3 new R fixtures).
  • Backtest / simulation / notebook evidence (if applicable): R conleyreg (CRAN v0.1.9) parity on 6 fixtures (3 cross-sectional + 3 panel) at atol=1e-6; observed max abs diff ~5.7e-16. Regenerable via Rscript benchmarks/R/generate_conley_golden.R.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

igerber and others added 2 commits May 13, 2026 19:40
Implements the spillover-conley initiative Phase 2: block-decomposed
panel Conley sandwich matching R conleyreg with lag_cutoff > 0.

The sandwich is XeeX_total = XeeX_spatial + XeeX_serial, where the
spatial component sums within each period only and the serial component
sums within each unit with Bartlett-style weights (1 - |lag|/(L+1)) for
lag in {1..L}. Same-time pairs are excluded from the serial component
to avoid double-counting the diagonal already in the spatial component.
The temporal kernel is hardcoded Bartlett regardless of conley_kernel,
matching conleyreg::time_dist.cpp. Spike against R conleyreg at
n=15 across lag_cutoff in {0, 1, 2} matched to ~1.8e-14.

API:
- compute_robust_vcov, solve_ols, LinearRegression gain three optional
  kwargs: conley_time, conley_unit, conley_lag_cutoff (three-way
  co-required at the validator).
- MultiPeriodDiD and TwoWayFixedEffects gain conley_lag_cutoff;
  conley_time / conley_unit auto-derived from data[time] / data[unit]
  at fit-time.
- DifferenceInDifferences(vcov_type='conley') continues to raise with
  a redirect to MPD/TWFE (DiD.fit() has no unit column declaration).
- SyntheticDiD rejection contract extended to conley_lag_cutoff.
- TWFE auto-cluster on the Conley path is silently dropped; explicit
  cluster= raises (combined kernel deferred). inference='wild_bootstrap'
  + conley raises (incompatible inference modes).

R parity:
- 3 new panel fixtures (panel_haversine_lag1, panel_haversine_lag2,
  panel_lat_lon_realistic_lag1) generated by benchmarks/R/
  generate_conley_golden.R; observed max abs diff ~5.7e-16.
- TestConleyParitySpacetime + TestConleyPanelHelper added.
- Pre-existing TestConleyEstimatorIntegration / TestConleyTWFE /
  TestConleyEstimatorValidation panel-rejection tests flipped to
  behavioral asserts where the rejection was lifted.

Doc surfaces:
- REGISTRY ConleySpatialHAC extended with the block-decomposed math,
  panel-API restrictions table, and a Note (deviation from R-symmetric
  API) for the hardcoded Bartlett temporal kernel.
- CHANGELOG Unreleased entry, llms.txt + llms-full.txt with the panel
  examples, README catalog line, paper-review updates.
- TODO.md: panel wire-up row removed (shipped); cluster combo +
  sparse k-d-tree rows reframed as follow-up.

Deferred (tracked in TODO.md as follow-up spillover-conley rows):
- Conley + cluster_ids combined kernel
- Sparse k-d-tree fast path for n > 20_000
- DifferenceInDifferences vcov_type='conley'
- Conley + weights / survey_design (Phase 5)
- SyntheticDiD vcov_type='conley'

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uard

- Conley + survey_design front-door reject on MultiPeriodDiD.fit() so
  the user gets NotImplementedError instead of silent BRR/TSL SEs. The
  previous bypass: MPD passed return_vcov=not _use_survey_vcov to
  solve_ols, so the conley+weights guard inside _compute_robust_vcov_numpy
  never fired, and compute_survey_vcov silently overwrote the vcov.
- solve_ols also gains a top-level Conley-only validator that catches
  bypass cases for direct compute_robust_vcov callers. Scoped to Conley
  only — hc2_bm + replicate-weight silently routing to BRR is a
  long-standing intentional contract preserved by other tests.
- _validate_conley_kwargs now rejects NaN / pd.NA in conley_unit. The
  prior code accepted them and np.unique + boolean mask silently dropped
  those rows from the per-unit serial HAC sum.
- MultiPeriodDiD.fit() adds an explicit conley_coords / conley_cutoff_km
  None-guard so missing kwargs raise ValueError instead of a raw
  TypeError on `self.conley_coords[0]`.
- TwoWayFixedEffects class docstring updated from Phase 1
  "Conley rejected" / "Phase 2 will add Driscoll-Kraay product kernel"
  to the shipped block-decomposed contract.
- Regression tests in tests/test_conley_vcov.py:
  * TestConleyValidatorHelpers: NaN-float and pd.NA-object unit cases
  * TestConleyEstimatorIntegration: MPD + survey_design (pweight TSL and
    stratified PSU) raises; MPD missing conley_coords raises ValueError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall assessment

⚠️ Needs changes

Executive summary

  • One unmitigated P1: the new panel Conley path computes temporal lag from raw time labels, so conley_lag_cutoff depends on encoding rather than panel-period order.
  • The additive block-decomposed form and hardcoded Bartlett temporal kernel are documented in docs/methodology/REGISTRY.md; I am not treating those as defects.
  • The previous silent-bypass class of issues around Conley + survey/weights/cluster appears to be front-doored correctly in the changed estimator and solve_ols paths.
  • Minor follow-up: panel result summaries still assume Conley is unreachable and will omit the variance-family label.
  • I could not run the test suite in this sandbox because pytest is not installed, so this is a static review.

Methodology

  • Severity P1. Impact: MultiPeriodDiD and TwoWayFixedEffects pass raw time column values directly into the new panel Conley kernel (diff_diff/estimators.py:L1543-L1554, diff_diff/twfe.py:L364-L372), and _compute_conley_vcov then uses abs(t_i - t_j) as the temporal lag (diff_diff/conley.py:L343-L366) without validating or normalizing the time scale (diff_diff/conley.py:L227-L255). But the registry/docs describe conley_lag_cutoff as a lag in panel periods (docs/methodology/REGISTRY.md:L2940-L2945, diff_diff/guides/llms-full.txt:L1900-L1904). That contract only holds when time is already a dense, unit-spaced period index. Common encodings like 202012/202101, datetimes/Period, or the shipped TWFE example time="post", conley_lag_cutoff=2 (diff_diff/guides/llms-full.txt:L1927-L1933) violate that assumption, so valid serial pairs can be misweighted, dropped, or rejected based purely on label scale. Concrete fix: normalize ordered time levels to dense consecutive codes before building lag_mat, or reject non-dense/non-numeric encodings with a clear front-door ValueError and document that requirement on all panel Conley surfaces.
  • Severity P3. Impact: the additive XeeX_spatial + XeeX_serial form and hardcoded Bartlett temporal kernel are documented deviations in docs/methodology/REGISTRY.md:L2968-L2987; no action required.

Code Quality

  • No additional findings beyond the time-index contract issue above.

Performance

  • No findings. The dense O(n^2) path is explicitly documented and tracked in TODO.md, so I am not treating it as a blocker.

Maintainability

  • Severity P3. Impact: panel result summaries still assume Conley is unreachable on DiD/TWFE/MPD and therefore omit the variance-family label for newly supported Conley fits (diff_diff/results.py:L56-L80). Concrete fix: add a conley branch to _format_vcov_label, update the stale comments, and add summary coverage.

Tech Debt

  • No new untracked tech debt findings. The remaining Conley follow-ups are already tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • No additional blocking findings beyond the methodology item above. The new parity/helper coverage only exercises sequential integer time labels (tests/test_conley_vcov.py:L1617-L1641 and nearby helper fixtures), so the time-encoding assumption is currently untested. Concrete fix: add regression cases for non-unit-spaced labels and binary-post TWFE Conley usage.

Path to Approval

  1. Add a single panel-time normalizer/validator for Conley. Either map ordered time levels to dense 0..T-1 codes before lag_mat = |t_i - t_j|, or reject any panel Conley request where the time input is not already a dense period index.
  2. Add regression tests for non-unit-spaced time labels (202012 -> 202101, datetime64/Period, and TWFE with binary post plus conley_lag_cutoff>1) and assert either correct dense-lag handling or a clean ValueError.
  3. Fix the user-facing docs/example so the HAC period index is unambiguous, especially the current TwoWayFixedEffects(... time="post", conley_lag_cutoff=2) example.

P1 (Methodology): `_compute_conley_vcov` previously used raw `time` values
in `abs(t_i - t_j)`, which meant `conley_lag_cutoff` semantics depended on
label encoding rather than panel-period order. On non-dense encodings
(YYYYMM like 202012/202101, datetime64, or binary `post`-style 0/1 panels)
the raw difference does not equal the count of panel periods, so valid
serial pairs could be silently dropped or misweighted.

The fix normalizes `time` to dense panel-period codes `0..T-1` via
`np.unique(return_inverse=True)` before the lag computation, so
`conley_lag_cutoff` always counts panel periods regardless of how `time`
is encoded (int year, YYYYMM, datetime64, pd.Period, strings). The
spatial within-period loop also uses the same dense codes for consistency.
On dense integer labels (the parity-test convention) this is a no-op and
R conleyreg parity holds at 1e-14; on non-dense encodings diff-diff is
the more robust default vs R's literal label-difference convention.

P3 (Maintainability): `_format_vcov_label` in results.py gains a "conley"
branch and surfaces "Conley spatial HAC (1999)" in DiD/MPD/TWFE result
summaries (was previously omitting the label because Conley used to be
unreachable on the panel surface).

Doc surfaces:
- REGISTRY § ConleySpatialHAC: new "Note (deviation from R conleyreg
  literal: time-label normalization)" documenting the dense-code
  convention and the R-divergence on non-dense encodings.
- llms-full.txt: fixed the misleading `time="post", conley_lag_cutoff=2`
  example (now uses `time="period"`), added a "Note on
  `conley_lag_cutoff` semantics" paragraph.

Regression tests:
- `TestConleyPanelHelper::test_time_label_normalization_non_unit_spaced_int`:
  year-like (2020, 2021, 2022) and YYYYMM (202011, 202012, 202101) labels
  produce the same vcov as dense codes (1, 2, 3).
- `TestConleyPanelHelper::test_time_label_normalization_datetime64`:
  irregularly-spaced datetime64 labels normalize correctly.
- `TestConleyTWFE::test_twfe_conley_binary_post_label_normalization`:
  TWFE with binary `post` (the exact example the codex reviewer flagged)
  produces finite SE.
- `TestConleyTWFE::test_twfe_conley_summary_emits_conley_label`:
  summary contains "Conley spatial HAC" for panel Conley fits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: fa820a8b357741003b20ea7029326d5c97293d27


Overall assessment

⚠️ Needs changes

Executive summary

  • The prior time-label P1 is only partially resolved: _compute_conley_vcov() now normalizes panel time labels to dense period codes, but the public MultiPeriodDiD / TwoWayFixedEffects wire-up still coerces time to float64, so the newly documented datetime64 / pd.Period / string contract is not actually supported end-to-end.
  • The additive block-decomposed panel sandwich, hardcoded Bartlett temporal kernel, and the deviation from literal R raw-time lag computation are all documented in docs/methodology/REGISTRY.md:L2968-L3002; I am not treating those as defects.
  • The earlier Conley + survey/weights silent-bypass class appears front-doored correctly on the changed paths in diff_diff/linalg.py:L663-L675, diff_diff/estimators.py:L1402-L1410.
  • MultiPeriodDiD’s public API docstring is stale and still describes pre-Phase-2 behavior.
  • I could not run pytest in this sandbox because pytest is not installed.

Methodology

  • Severity P1. Impact: MultiPeriodDiD.fit() and TwoWayFixedEffects.fit() still build conley_time via values.astype(np.float64) in diff_diff/estimators.py:L1547-L1554 and diff_diff/twfe.py:L364-L372. But the registry and guide now promise that panel Conley accepts arbitrary ordered labels and normalizes them internally, including datetime64, pd.Period, and strings, in docs/methodology/REGISTRY.md:L2991-L3002 and diff_diff/guides/llms-full.txt:L1927-L1937. That documented contract is therefore unreachable on the public estimator surfaces: numeric-like labels work, but non-float-castable labels fail before the helper’s dense-code normalization runs. Concrete fix: pass the raw ordered time array through to Conley (to_numpy() / values without float coercion), then add estimator-level regressions for non-dense, non-numeric labels; if TWFE is intentionally narrower, reject unsupported label types explicitly and narrow the docs/example to that actual contract.
  • Severity P3. Impact: the additive XeeX_spatial + XeeX_serial form, hardcoded Bartlett temporal kernel, and dense-code normalization deviation from literal R behavior are documented in docs/methodology/REGISTRY.md:L2968-L3002. Concrete fix: none.

Code Quality

  • No new findings.

Performance

  • No new findings. The dense O(n^2) follow-up remains tracked in TODO.md:L119-L120.

Maintainability

  • No new findings beyond the documentation drift noted below.

Tech Debt

  • No new untracked tech debt findings. The remaining Conley cluster/sparse/survey follow-ups are already tracked in TODO.md:L118-L123.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the public MultiPeriodDiD docstring still says Conley is rejected at fit time and still says unit does not affect SE computation in diff_diff/estimators.py:L1036-L1050 and diff_diff/estimators.py:L1147-L1152, which now contradicts the shipped Phase 2 behavior. The new normalization coverage is also helper-level only in tests/test_conley_vcov.py:L1903-L1935, so it would not catch the wrapper-level coercion bug above. Concrete fix: update the API docstring to the Phase 2 contract and add estimator-level normalization regressions for MultiPeriodDiD / TwoWayFixedEffects.
  • I could not run the test suite because pytest is not installed in this sandbox.

Path to Approval

  1. Remove the astype(np.float64) coercion when MultiPeriodDiD and TwoWayFixedEffects construct conley_time, so the helper receives the raw ordered labels it was updated to normalize.
  2. Add regression tests that hit the public estimator surfaces, not just _compute_conley_vcov(), with non-dense label encodings. At minimum: MultiPeriodDiD with datetime64; if the documented contract remains broader, also pd.Period and string labels. For TWFE, either support that contract end-to-end or reject it explicitly.
  3. Update the stale MultiPeriodDiD API docstring and any guide/example text so the documented Conley time-label contract matches the shipped estimator behavior.

P1 (Methodology): the prior fix normalized `time` inside
`_compute_conley_vcov` but `MultiPeriodDiD.fit()` and
`TwoWayFixedEffects.fit()` still coerced `data[time].values.astype(np.float64)`
before passing to the helper. datetime64 / pd.Period / string time labels
fail before the helper's normalization runs, so the documented
"normalizes to dense panel-period codes" contract was unreachable on the
public estimator surfaces.

Fix: replace `.astype(np.float64)` with `np.asarray(...)` so the original
ordered labels reach the helper, which then normalizes via
`np.unique(return_inverse=True)`.

P3 (Documentation): updated the `MultiPeriodDiD` class docstring's
`vcov_type="conley"` bullet to describe the Phase 2 block-decomposed
contract (was still saying "rejected at fit-time" / "Phase 2 will add
the space-time product kernel"). Also updated the `unit` fit-arg
docstring to note it is REQUIRED when `vcov_type="conley"` rather than
"does NOT affect SE computation".

Regression: `test_multi_period_did_conley_with_datetime64_time` fits MPD
with `time_dt` (pd.to_datetime) and `time_int` (0,1,2) on the same panel
and asserts the diagonal SEs match at atol=1e-10. Verifies the end-to-end
estimator surface, not just the helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 14, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3755a44ab18364ba0e831939bce33a0073fe8405


Overall assessment

⚠️ Needs changes

Executive summary

  • The prior public-surface time-label P1 is only partially resolved: MultiPeriodDiD now passes raw time labels through to Conley, but TwoWayFixedEffects still does not actually support the newly documented “arbitrary ordered labels / panel-period index” contract.
  • The additive block-decomposed sandwich, hardcoded Bartlett temporal kernel, and dense-code normalization are all documented in docs/methodology/REGISTRY.md; I am not treating those as methodology defects.
  • The earlier Conley + survey silent-bypass class appears fixed on the changed solve_ols/MultiPeriodDiD paths.
  • conley_lag_cutoff is a new public parameter that materially changes the vcov semantics, but it is not propagated into the panel result objects.
  • I could not run the test suite in this sandbox; importing the package failed because core Python deps are missing (numpy is not installed here).

Methodology

  • Severity P1. Impact: the TWFE half of the documented Phase 2 contract is still not true end-to-end. TwoWayFixedEffects.fit() explicitly treats time as a binary/numeric post indicator and builds _treatment_post = treated * time from the raw column, so nonnumeric labels and the new “panel-period index” guide example are not actually supported on this estimator. The code still warns that multi-period time produces treated * period_number, not the standard DiD ATT. See diff_diff/twfe.py:L206-L229 and diff_diff/twfe.py:L275-L279. That conflicts with the new public contract in diff_diff/guides/llms-full.txt:L1927-L1937 and the registry’s generic “datetime64 / pd.Period / strings” normalization note in docs/methodology/REGISTRY.md:L2991-L3002. Concrete fix: either explicitly narrow TWFE to binary numeric post indicators and reject unsupported label types before the Conley path, or implement a TWFE-specific recoding layer that makes the documented two-level nonnumeric labels actually work.
  • Severity P3. Impact: the load-bearing methodology deviations in this PR are documented, so they are not defects: additive XeeX_spatial + XeeX_serial, Bartlett-only temporal kernel, and dense-code normalization are all covered in docs/methodology/REGISTRY.md:L2968-L3002. Concrete fix: none.

Code Quality

  • No new findings.

Performance

  • No new findings. The dense O(n^2) path and sparse fast-path follow-up are already tracked in TODO.md:L119-L123.

Maintainability

Tech Debt

  • No new untracked tech-debt findings. Combined cluster kernels, survey-weighted Conley, callable-metric validation, and the sparse path are already tracked in TODO.md:L119-L123.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the updated MultiPeriodDiD docstring says Conley + inference="wild_bootstrap" raises, but the implementation still warns and falls back to analytical inference. See diff_diff/estimators.py:L1036-L1049 versus diff_diff/estimators.py:L1176-L1185. Concrete fix: either add the explicit raise or rewrite the docstring and add a regression test for the intended behavior.
  • The new TWFE coverage does not exercise the documented nonnumeric time-label contract; the only new normalization test is still a 0/1 rename in tests/test_conley_vcov.py:L1384-L1402.
  • I could not run pytest here because the sandbox is missing Python dependencies.

Path to Approval

  1. Resolve the TWFE contract mismatch: either reject non-binary/non-numeric time inputs on the Conley TWFE path, or implement the recoding needed to make the documented two-level nonnumeric labels work. In either case, remove or replace the current multi-period TwoWayFixedEffects(... time="period" ...) Conley example.
  2. Add an end-to-end TWFE regression that covers the intended contract: either a passing datetime/Period/string case if support is intended, or an explicit reject test if TWFE remains binary-only.
  3. Thread conley_lag_cutoff into DiDResults / MultiPeriodDiDResults metadata and serialization so downstream consumers can tell which Conley variant produced the reported SEs.

…tion

P1 (Methodology): the prior commit's "arbitrary ordered labels" contract
was unreachable on TwoWayFixedEffects because TWFE's design step builds
`_treatment_post = treated * time` from raw column values, which fails
on datetime64 / pd.Period / string labels. Narrowing the docs to make
explicit that the non-numeric-label contract is MultiPeriodDiD-only (MPD
builds period dummies, not a `treated * time` product). TWFE inherits
its pre-existing numeric-time constraint.

- llms-full.txt: split the panel example into a TWFE block (binary post
  indicator only, `conley_lag_cutoff=1`) and an MPD block (multi-period
  time, arbitrary orderable encoding). New caveat paragraph spells out
  the TWFE numeric-time constraint.
- test_twfe_conley_non_numeric_time_fails: TWFE + string-encoded time
  raises a clean error (string * int multiplication fails inside the
  estimator) — regression for the narrowed contract.

P1 (Maintainability): `conley_lag_cutoff` is a new public parameter that
materially changes vcov semantics (0 = spatial only, >0 = adds within-
unit Bartlett serial HAC), but the result objects didn't expose it.

- DiDResults + MultiPeriodDiDResults: new `conley_lag_cutoff: Optional[int]`
  field threaded through the estimator-side construction.
- `_format_vcov_label` now includes `lag_cutoff=<int>` in the Conley
  label so summary() readers can tell which Conley variant produced
  the reported SEs (e.g. "Conley spatial HAC (1999), lag_cutoff=1").
- TestConleyTWFE summary test asserts both the label format and the
  programmatic `res.conley_lag_cutoff` accessor.

P3 (Doc/test): MPD docstring claimed `inference="wild_bootstrap"` + conley
raises but only TWFE had that guard. Added the explicit raise in MPD.fit()
to match the documented contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: bcada5c28667261ef1bd1ef389192f059add4341


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior TWFE public-contract P1 is resolved: the docs now narrow TWFE Conley to numeric/binary post indicators, while MultiPeriodDiD keeps the generic time-label normalization contract (diff_diff/guides/llms-full.txt:L1927-L1966).
  • The load-bearing methodology deviations in this PR are documented in the Methodology Registry, so I am not treating them as defects.
  • The earlier Conley+survey silent-bypass class appears fixed: solve_ols() now front-doors the Conley+weights/cluster rejection, and MPD/TWFE add estimator-level Conley guards (diff_diff/linalg.py:L663-L675, diff_diff/estimators.py:L1391-L1427, diff_diff/twfe.py:L171-L201).
  • The previous conley_lag_cutoff propagation P1 is only partially resolved: result objects and summary() now carry it, but to_dict() still drops it, so serialized outputs remain ambiguous.
  • I could not run the test suite here because this sandbox does not have numpy or pytest installed.

Methodology

  • Severity P3. Impact: No unmitigated methodology defect found in the changed estimator math. The panel Conley implementation in diff_diff/conley.py:L291-L384 matches the documented registry contract rather than raw Conley (1999): additive XeeX_spatial + XeeX_serial, Bartlett-only temporal kernel, and dense-code time normalization are all explicitly documented in docs/methodology/REGISTRY.md:L2968-L3002. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • Severity P3. Impact: The dense O(n^2) Conley path remains, but it is already warned in diff_diff/conley.py:L257-L265 and tracked as deferred work in TODO.md:L119-L120, so this is not a blocker for this PR. Concrete fix: none.

Maintainability

  • Severity P1. Impact: conley_lag_cutoff still does not survive result serialization. The field is present on the result dataclasses and used in summary() (diff_diff/results.py:L144-L146, diff_diff/results.py:L223-L231, diff_diff/results.py:L465-L467, diff_diff/results.py:L565-L573), but both serializers omit it (diff_diff/results.py:L274-L300, diff_diff/results.py:L705-L738). Downstream to_dict() / to_dataframe() consumers still cannot tell whether the reported SEs came from within-period-only Conley (lag_cutoff=0) or the panel space-time extension (lag_cutoff>0). Concrete fix: add at least vcov_type, cluster_name, and conley_lag_cutoff to both to_dict() methods and cover that in regression tests.

Tech Debt

  • Severity P3. Impact: Remaining Conley follow-ups such as DiD support, combined spatial+cluster kernels, sparse fast path, and survey-weighted Conley are explicitly tracked in TODO.md:L118-L123; per the review policy these are not blockers. Concrete fix: none.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: The new Conley tests verify summary/field propagation (tests/test_conley_vcov.py:L1404-L1422) but do not cover the serialized result surfaces where the remaining metadata gap lives. Concrete fix: once the serializer gap is fixed, add assertions on serialized TWFE and MPD Conley results.

Path to Approval

  1. Add vcov_type, cluster_name, and conley_lag_cutoff to DiDResults.to_dict() and MultiPeriodDiDResults.to_dict().
  2. Add targeted regressions in tests/test_conley_vcov.py that serialize one TWFE Conley fit and one MPD Conley fit and assert the exported dict contains vcov_type == "conley" and the expected conley_lag_cutoff.

…o_dict

P1 (Maintainability): the prior commit added `conley_lag_cutoff` to the
result dataclasses and summary(), but `DiDResults.to_dict()` and
`MultiPeriodDiDResults.to_dict()` still omitted it. Downstream programmatic
consumers (notebooks, adapters, pipelines) that serialize results to dicts
couldn't tell which Conley variant produced the SEs.

Fix: both `to_dict()` methods now include `vcov_type`, `cluster_name`, and
`conley_lag_cutoff` when set. Conditional emission preserves the existing
behavior for non-conley / non-cluster fits (no new keys appear in the
serialized dict for unrelated estimators).

Regression: `test_twfe_conley_to_dict_carries_lag_cutoff` and
`test_multi_period_did_conley_to_dict_carries_lag_cutoff` fit a TWFE +
MPD Conley panel and assert `to_dict()` exposes the expected fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2cf5ebed3c94d0283944e91472090ce4597d75d1


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior serializer finding is resolved: both result serializers now carry vcov_type and conley_lag_cutoff when present (diff_diff/results.py:L265-L294, diff_diff/results.py:L704-L735).
  • The affected method is ConleySpatialHAC as wired into MultiPeriodDiD and TwoWayFixedEffects. The additive panel block decomposition, Bartlett-only temporal kernel, and dense time normalization are documented in the Methodology Registry, so I am not treating those as defects (docs/methodology/REGISTRY.md:L2968-L3002, diff_diff/conley.py:L291-L384).
  • The earlier MPD Conley+survey silent-bypass class appears fixed by the front-door Conley validation in solve_ols() plus the estimator-level MPD survey guard (diff_diff/linalg.py:L663-L675, diff_diff/estimators.py:L1411-L1418).
  • Remaining P1: TWFE Conley fits still record cluster_name="unit" in result metadata / serialized outputs even though the Conley path explicitly drops TWFE’s auto-cluster before fitting (diff_diff/twfe.py:L364-L381, diff_diff/twfe.py:L560-L597, diff_diff/results.py:L289-L294).
  • The new tests cover Conley summary labeling and conley_lag_cutoff serialization, but they do not assert that TWFE clears cluster_name on the Conley path, so this provenance bug currently escapes (tests/test_conley_vcov.py:L1449-L1485).
  • Static review only: I could not run the suite in this sandbox because pytest and numpy are unavailable.

Methodology

  • Severity P3. Impact: No unmitigated methodology defect found in the changed estimator math. This PR affects the ConleySpatialHAC registry entry on MultiPeriodDiD / TwoWayFixedEffects; the code matches the documented conleyreg-style block-decomposed panel contract rather than the symmetric space-time kernel discussed in the acreg/Colella review, and that choice is explicitly documented as a registry note/deviation (docs/methodology/papers/conley-1999-review.md:L56-L60, docs/methodology/papers/colella-et-al-2019-review.md:L82-L95, docs/methodology/REGISTRY.md:L2968-L3002, diff_diff/conley.py:L291-L384). Concrete fix: none.

Code Quality

  • No findings.

Performance

  • Severity P3. Impact: The dense O(n^2) Conley distance-matrix path remains, but it is already warned in the validator and tracked in TODO.md, so it is not a blocker for this PR (diff_diff/conley.py:L257-L265, TODO.md:L120-L123). Concrete fix: none.

Maintainability

  • Severity P1. Impact: TwoWayFixedEffects(vcov_type="conley") violates the variance-provenance contract by composing cluster_name from cluster_var instead of from the actual cluster input used in estimation. TWFE defaults cluster_var to unit (diff_diff/twfe.py:L247-L273), then explicitly suppresses clustering on the Conley path via _conley_cluster_override = None (diff_diff/twfe.py:L364-L381), but still stores _twfe_cluster_label = unit in the result (diff_diff/twfe.py:L560-L597). Because DiDResults.to_dict() serializes any non-None cluster_name (diff_diff/results.py:L136-L145, diff_diff/results.py:L289-L294), downstream consumers are told these SEs were clustered when the documented Conley contract says TWFE auto-clustering is dropped (docs/methodology/REGISTRY.md:L2954-L2956). This is a P1 semantic-contract violation in composed metadata. Concrete fix: set TWFE result cluster_name from the actual cluster IDs passed to LinearRegression; on the Conley path that should resolve to None.

Tech Debt

  • Severity P3. Impact: The remaining spillover-Conley follow-ups (DiD support, combined spatial+cluster kernel, survey-weighted Conley, callable-metric validation) are already tracked in TODO.md, so they are not blockers under the review policy (TODO.md:L118-L123). Concrete fix: none.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: The added TWFE Conley tests assert summary labeling and conley_lag_cutoff serialization, but they never check that cluster_name is cleared when auto-cluster is dropped (tests/test_conley_vcov.py:L1449-L1485). That is a gap relative to the existing TWFE variance-metadata tests, which already lock cluster_name is None when a TWFE path intentionally disables auto-clustering (tests/test_estimators_vcov_type.py:L603-L639). Concrete fix: add a TWFE Conley regression that asserts res.cluster_name is None and that res.to_dict() does not advertise cluster_name.
  • Static review only: test execution was not possible in this sandbox because pytest and numpy are unavailable.

Path to Approval

  1. In TwoWayFixedEffects.fit, derive cluster_name from the cluster IDs actually passed into LinearRegression instead of from the pre-Conley cluster_var default; on the Conley path this should be None.
  2. Add a regression test for TwoWayFixedEffects(vcov_type="conley", conley_lag_cutoff=...) asserting both res.cluster_name is None and that the serialized result omits cluster_name (or leaves it None).

P1 (Maintainability): TWFE drops its auto-unit-cluster on the Conley path
(`_conley_cluster_override = None` in the LinearRegression call) but still
recorded `_twfe_cluster_label = unit` in the result metadata. Downstream
consumers reading `res.cluster_name` or `res.to_dict()["cluster_name"]`
were told the SEs were CR1-clustered when they were actually Conley
spatial HAC with no clustering.

Fix: when `_fit_vcov_type == "conley"`, set `_twfe_cluster_label = None`
so result-level provenance mirrors the actual cluster IDs passed to
LinearRegression.

Regression: `test_twfe_conley_cluster_name_is_none` asserts both
`res.cluster_name is None` and that `to_dict()` doesn't advertise the
`cluster_name` key on a TWFE Conley fit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: a0ff9bee8d523cc62239ec91f19e37a1f845bc2d


Overall Assessment

✅ Looks good

Executive Summary

  • The affected methodology is ConleySpatialHAC as wired into MultiPeriodDiD and TwoWayFixedEffects; DifferenceInDifferences(vcov_type="conley") remains intentionally rejected because DiD.fit() still has no unit= declaration (diff_diff/conley.py:L281-L384, diff_diff/estimators.py:L361-L377, diff_diff/estimators.py:L1386-L1427, diff_diff/twfe.py:L165-L201).
  • The panel estimator math matches the documented Methodology Registry contract: additive block decomposition, Bartlett-only temporal kernel, and dense time-label normalization are all explicitly documented deviations, so they are not defects under the review policy (docs/methodology/REGISTRY.md:L2968-L3002, docs/methodology/papers/conley-1999-review.md:L280-L282).
  • The prior re-review P1 is resolved: TWFE now clears cluster_name on the Conley path, and the regression test locks that metadata behavior (diff_diff/twfe.py:L359-L381, diff_diff/twfe.py:L560-L603, tests/test_conley_vcov.py:L1487-L1505).
  • The earlier survey-bypass class is also closed by front-door guards before survey vcov can overwrite a Conley request (diff_diff/linalg.py:L663-L675, diff_diff/estimators.py:L1411-L1418, diff_diff/linalg.py:L2720-L2737, tests/test_conley_vcov.py:L1102-L1179).
  • The new tests cover validator edge cases, MPD/TWFE integration, panel parity against R conleyreg, and direct helper math; I did not find a new unmitigated P0/P1 in the changed code (tests/test_conley_vcov.py:L345-L470, tests/test_conley_vcov.py:L984-L1592, tests/test_conley_vcov.py:L1805-L2175).
  • Static review only: I could not execute the suite in this sandbox because pytest is unavailable.

Methodology

  • Severity P3 (informational). Impact: No unmitigated methodology defect found in the changed estimator math. The PR’s panel Conley implementation matches the Registry’s documented conleyreg-style block-decomposed contract rather than a multiplicative product kernel, and the documented Bartlett-only temporal kernel / dense time normalization are valid documented deviations, not review defects. Concrete fix: none (diff_diff/conley.py:L291-L384, docs/methodology/REGISTRY.md:L2968-L3002, docs/methodology/papers/conley-1999-review.md:L280-L282).

Code Quality

  • No findings. The changed estimator paths do not introduce the inline inference anti-pattern or partial-NaN inference bug called out in the review rules: MultiPeriodDiD uses safe_inference() for period and average effects, and TWFE continues to use the shared LinearRegression inference path (diff_diff/estimators.py:L1793-L1852, diff_diff/twfe.py:L537-L544).

Performance

  • Severity P3 (informational). Impact: The Conley implementation still builds dense distance matrices, so large-n memory remains the main performance limit, but the warning is explicit and the sparse fast path is already tracked in TODO.md, making this non-blocking for the PR. Concrete fix: none; tracked follow-up (diff_diff/conley.py:L257-L265, TODO.md:L120-L123).

Maintainability

  • No findings. The previous TWFE variance-provenance issue is fixed: the Conley path drops auto-clustering in both the estimator wiring and the serialized results surface, so metadata now reflects the actual variance path (diff_diff/twfe.py:L359-L381, diff_diff/twfe.py:L560-L603, diff_diff/results.py:L287-L294, tests/test_conley_vcov.py:L1487-L1505).

Tech Debt

  • Severity P3 (informational). Impact: The remaining Conley follow-ups are properly tracked and therefore non-blocking under the stated review policy: DiD support, combined spatial+cluster kernel, sparse k-d-tree path, survey-weighted Conley, and callable-metric validation. Concrete fix: none; tracked in TODO.md (TODO.md:L118-L123).

Security

  • No findings.

Documentation/Tests

  • Severity P3 (informational). Impact: The Registry documents time-label normalization for integer, datetime64, pd.Period, and string encodings, but the new regression tests only pin integer and datetime64 cases. That leaves part of the advertised normalization surface unpinned by tests. Concrete fix: add helper-level regressions for pd.Period and string labels with intended chronological ordering (docs/methodology/REGISTRY.md:L2991-L3002, tests/test_conley_vcov.py:L2046-L2114).
  • Static review only: I could not execute tests/test_conley_vcov.py in this sandbox because pytest is unavailable (/bin/bash: pytest: command not found).

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 14, 2026
@igerber igerber merged commit b56e232 into main May 14, 2026
31 of 32 checks passed
@igerber igerber deleted the spillover-conley-phase2 branch May 14, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant