Phase 2: Conley space-time HAC + panel-estimator wire-up (MPD/TWFE) #426
Implements the spillover-conley initiative Phase 2: block-decomposed
panel Conley sandwich matching R conleyreg with lag_cutoff > 0.
The sandwich is XeeX_total = XeeX_spatial + XeeX_serial, where the
spatial component sums within each period only and the serial component
sums within each unit with Bartlett-style weights (1 - |lag|/(L+1)) for
lag in {1..L}. Same-time pairs are excluded from the serial component
to avoid double-counting the diagonal already in the spatial component.
The temporal kernel is hardcoded Bartlett regardless of conley_kernel,
matching conleyreg::time_dist.cpp. Spike against R conleyreg at
n=15 across lag_cutoff in {0, 1, 2} matched to ~1.8e-14.
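A minimal NumPy sketch of the block decomposition described above, for illustration only. It is not the library's `_compute_conley_vcov`: the function name `conley_panel_meat` and the precomputed `spatial_weights` matrix are assumptions, and `period` is assumed to hold dense integer period codes.

```python
import numpy as np

def conley_panel_meat(X, resid, unit, period, spatial_weights, lag_cutoff):
    """Sketch of XeeX_total = XeeX_spatial + XeeX_serial.

    period          : dense integer period codes (0..T-1), one per row
    spatial_weights : hypothetical precomputed n x n spatial kernel weights
                      (1 on the diagonal, 0 beyond the distance cutoff)
    """
    scores = X * resid[:, None]  # row i holds x_i * e_i
    same_period = period[:, None] == period[None, :]
    same_unit = unit[:, None] == unit[None, :]
    lag = np.abs(period[:, None] - period[None, :])

    # Spatial block: only pairs observed in the same period.
    w_spatial = np.where(same_period, spatial_weights, 0.0)

    # Serial block: same unit, lag in {1..L}, Bartlett weight 1 - lag/(L+1).
    # lag == 0 is excluded because the diagonal already sits in the spatial block.
    in_window = same_unit & (lag >= 1) & (lag <= lag_cutoff)
    w_serial = np.where(in_window, 1.0 - lag / (lag_cutoff + 1.0), 0.0)

    return scores.T @ (w_spatial + w_serial) @ scores

# Toy panel: 2 units x 3 periods, intercept-only design, no spatial neighbors.
X = np.ones((6, 1))
resid = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
unit = np.array([0, 0, 0, 1, 1, 1])
period = np.array([0, 1, 2, 0, 1, 2])
no_neighbors = np.eye(6)

meat_lag0 = conley_panel_meat(X, resid, unit, period, no_neighbors, lag_cutoff=0)
meat_lag1 = conley_panel_meat(X, resid, unit, period, no_neighbors, lag_cutoff=1)
meat_lag2 = conley_panel_meat(X, resid, unit, period, no_neighbors, lag_cutoff=2)
```

With `lag_cutoff=0` the serial block is empty, so the toy result reduces to the sum of squared residuals; raising the cutoff only adds within-unit cross-period terms.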
API:
- compute_robust_vcov, solve_ols, LinearRegression gain three optional
kwargs: conley_time, conley_unit, conley_lag_cutoff (three-way
co-required at the validator).
- MultiPeriodDiD and TwoWayFixedEffects gain conley_lag_cutoff;
conley_time / conley_unit auto-derived from data[time] / data[unit]
at fit-time.
- DifferenceInDifferences(vcov_type='conley') continues to raise with
a redirect to MPD/TWFE (DiD.fit() has no unit column declaration).
- SyntheticDiD rejection contract extended to conley_lag_cutoff.
- TWFE auto-cluster on the Conley path is silently dropped; explicit
cluster= raises (combined kernel deferred). inference='wild_bootstrap'
+ conley raises (incompatible inference modes).
R parity:
- 3 new panel fixtures (panel_haversine_lag1, panel_haversine_lag2,
panel_lat_lon_realistic_lag1) generated by benchmarks/R/
generate_conley_golden.R; observed max abs diff ~5.7e-16.
- TestConleyParitySpacetime + TestConleyPanelHelper added.
- Pre-existing TestConleyEstimatorIntegration / TestConleyTWFE /
TestConleyEstimatorValidation panel-rejection tests flipped to
behavioral asserts where the rejection was lifted.
Doc surfaces:
- REGISTRY ConleySpatialHAC extended with the block-decomposed math,
panel-API restrictions table, and a Note (deviation from R-symmetric
API) for the hardcoded Bartlett temporal kernel.
- CHANGELOG Unreleased entry, llms.txt + llms-full.txt with the panel
examples, README catalog line, paper-review updates.
- TODO.md: panel wire-up row removed (shipped); cluster combo +
sparse k-d-tree rows reframed as follow-up.
Deferred (tracked in TODO.md as follow-up spillover-conley rows):
- Conley + cluster_ids combined kernel
- Sparse k-d-tree fast path for n > 20_000
- DifferenceInDifferences vcov_type='conley'
- Conley + weights / survey_design (Phase 5)
- SyntheticDiD vcov_type='conley'
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uard
- Conley + survey_design front-door reject on MultiPeriodDiD.fit() so
the user gets NotImplementedError instead of silent BRR/TSL SEs. The
previous bypass: MPD passed return_vcov=not _use_survey_vcov to
solve_ols, so the conley+weights guard inside _compute_robust_vcov_numpy
never fired, and compute_survey_vcov silently overwrote the vcov.
- solve_ols also gains a top-level Conley-only validator that catches
bypass cases for direct compute_robust_vcov callers. Scoped to Conley
only — hc2_bm + replicate-weight silently routing to BRR is a
long-standing intentional contract preserved by other tests.
- _validate_conley_kwargs now rejects NaN / pd.NA in conley_unit. The
prior code accepted them and np.unique + boolean mask silently dropped
those rows from the per-unit serial HAC sum.
- MultiPeriodDiD.fit() adds an explicit conley_coords / conley_cutoff_km
None-guard so missing kwargs raise ValueError instead of a raw
TypeError on `self.conley_coords[0]`.
- TwoWayFixedEffects class docstring updated from Phase 1
"Conley rejected" / "Phase 2 will add Driscoll-Kraay product kernel"
to the shipped block-decomposed contract.
- Regression tests in tests/test_conley_vcov.py:
* TestConleyValidatorHelpers: NaN-float and pd.NA-object unit cases
* TestConleyEstimatorIntegration: MPD + survey_design (pweight TSL and
stratified PSU) raises; MPD missing conley_coords raises ValueError.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
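To illustrate the NaN-unit failure mode described in this commit, here is a small sketch of how a per-unit loop built on `np.unique` plus a boolean mask can skip rows. The helper names `rows_covered_by_unit_loop` and `validate_unit_ids` are hypothetical and only approximate what `_validate_conley_kwargs` is described as doing.

```python
import numpy as np
import pandas as pd

def rows_covered_by_unit_loop(unit):
    """Mark which rows a `for u in np.unique(unit)` loop would ever visit."""
    covered = np.zeros(len(unit), dtype=bool)
    for u in np.unique(unit):
        # NaN == NaN is False, so rows with a NaN unit never match any mask.
        covered |= (unit == u)
    return covered

def validate_unit_ids(unit):
    """Hypothetical guard: reject missing unit IDs up front."""
    if pd.isna(np.asarray(unit, dtype=object)).any():
        raise ValueError("conley_unit contains missing values (NaN / pd.NA)")

unit_float = np.array([1.0, 1.0, np.nan, 2.0, 2.0])
covered = rows_covered_by_unit_loop(unit_float)  # row 2 is never visited

unit_object = np.array(["a", "a", pd.NA, "b"], dtype=object)
rejected = []
for candidate in (unit_float, unit_object):
    try:
        validate_unit_ids(candidate)
    except ValueError:
        rejected.append(True)
```

Rejecting at validation time turns a silent change in the serial HAC sum into an explicit error.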
|
Overall assessment Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1 (Methodology): `_compute_conley_vcov` previously used raw `time` values in `abs(t_i - t_j)`, which meant `conley_lag_cutoff` semantics depended on label encoding rather than panel-period order. On non-dense encodings (YYYYMM like 202012/202101, datetime64, or binary `post`-style 0/1 panels) the raw difference does not equal the count of panel periods, so valid serial pairs could be silently dropped or misweighted.
The fix normalizes `time` to dense panel-period codes `0..T-1` via `np.unique(return_inverse=True)` before the lag computation, so `conley_lag_cutoff` always counts panel periods regardless of how `time` is encoded (int year, YYYYMM, datetime64, pd.Period, strings). The spatial within-period loop also uses the same dense codes for consistency. On dense integer labels (the parity-test convention) this is a no-op and R conleyreg parity holds at 1e-14; on non-dense encodings diff-diff is the more robust default vs R's literal label-difference convention.
P3 (Maintainability): `_format_vcov_label` in results.py gains a "conley" branch and surfaces "Conley spatial HAC (1999)" in DiD/MPD/TWFE result summaries (was previously omitting the label because Conley used to be unreachable on the panel surface).
Doc surfaces:
- REGISTRY § ConleySpatialHAC: new "Note (deviation from R conleyreg literal: time-label normalization)" documenting the dense-code convention and the R-divergence on non-dense encodings.
- llms-full.txt: fixed the misleading `time="post", conley_lag_cutoff=2` example (now uses `time="period"`), added a "Note on `conley_lag_cutoff` semantics" paragraph.
Regression tests:
- `TestConleyPanelHelper::test_time_label_normalization_non_unit_spaced_int`: year-like (2020, 2021, 2022) and YYYYMM (202011, 202012, 202101) labels produce the same vcov as dense codes (1, 2, 3).
- `TestConleyPanelHelper::test_time_label_normalization_datetime64`: irregularly-spaced datetime64 labels normalize correctly.
- `TestConleyTWFE::test_twfe_conley_binary_post_label_normalization`: TWFE with binary `post` (the exact example the codex reviewer flagged) produces finite SE.
- `TestConleyTWFE::test_twfe_conley_summary_emits_conley_label`: summary contains "Conley spatial HAC" for panel Conley fits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
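The normalization step can be sketched as follows. The helper name `dense_period_codes` is an assumption for illustration; only the `np.unique(return_inverse=True)` call is taken from the commit message.

```python
import numpy as np

def dense_period_codes(time):
    """Map arbitrary orderable time labels to dense codes 0..T-1."""
    _, codes = np.unique(np.asarray(time), return_inverse=True)
    return codes.ravel()

# YYYYMM labels: adjacent panel periods, but the raw label gap is not 1.
yyyymm = np.array([202011, 202012, 202101, 202011, 202012, 202101])
raw_gap = abs(int(yyyymm[2]) - int(yyyymm[1]))  # December -> January
codes = dense_period_codes(yyyymm)
code_gap = abs(int(codes[2]) - int(codes[1]))

# Irregularly spaced datetime64 labels normalize the same way.
dates = np.array(["2020-01-15", "2020-03-02", "2020-11-30"], dtype="datetime64[D]")
date_codes = dense_period_codes(dates)
```

With `conley_lag_cutoff=1`, a raw-label comparison would treat the December/January pair as 89 periods apart and drop it, whereas the dense codes keep it at lag 1.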
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment Executive summary
|
P1 (Methodology): the prior fix normalized `time` inside `_compute_conley_vcov` but `MultiPeriodDiD.fit()` and `TwoWayFixedEffects.fit()` still coerced `data[time].values.astype(np.float64)` before passing to the helper. datetime64 / pd.Period / string time labels fail before the helper's normalization runs, so the documented "normalizes to dense panel-period codes" contract was unreachable on the public estimator surfaces.
Fix: replace `.astype(np.float64)` with `np.asarray(...)` so the original ordered labels reach the helper, which then normalizes via `np.unique(return_inverse=True)`.
P3 (Documentation): updated the `MultiPeriodDiD` class docstring's `vcov_type="conley"` bullet to describe the Phase 2 block-decomposed contract (was still saying "rejected at fit-time" / "Phase 2 will add the space-time product kernel"). Also updated the `unit` fit-arg docstring to note it is REQUIRED when `vcov_type="conley"` rather than "does NOT affect SE computation".
Regression: `test_multi_period_did_conley_with_datetime64_time` fits MPD with `time_dt` (pd.to_datetime) and `time_int` (0,1,2) on the same panel and asserts the diagonal SEs match at atol=1e-10. Verifies the end-to-end estimator surface, not just the helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
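A small sketch of why the early float coercion made the contract unreachable, shown for string-encoded period labels only. The two helper names are hypothetical, and lexicographic sort order standing in for period order is an assumption that happens to hold for these quarter labels.

```python
import numpy as np

labels = np.array(["2020Q3", "2020Q4", "2021Q1", "2020Q3", "2020Q4", "2021Q1"])

def codes_via_float(time_labels):
    # Old estimator-side path: coerce to float before the helper sees the labels.
    as_float = time_labels.astype(np.float64)
    return np.unique(as_float, return_inverse=True)[1].ravel()

def codes_via_asarray(time_labels):
    # New path: pass the labels through untouched and let np.unique order them.
    return np.unique(np.asarray(time_labels), return_inverse=True)[1].ravel()

try:
    codes_via_float(labels)
    float_path_ok = True
except ValueError:
    # String labels cannot be converted to float, so normalization never runs.
    float_path_ok = False

asarray_codes = codes_via_asarray(labels)
```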
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment Executive summary
|
…tion
P1 (Methodology): the prior commit's "arbitrary ordered labels" contract was unreachable on TwoWayFixedEffects because TWFE's design step builds `_treatment_post = treated * time` from raw column values, which fails on datetime64 / pd.Period / string labels. Narrowing the docs to make explicit that the non-numeric-label contract is MultiPeriodDiD-only (MPD builds period dummies, not a `treated * time` product). TWFE inherits its pre-existing numeric-time constraint.
- llms-full.txt: split the panel example into a TWFE block (binary post indicator only, `conley_lag_cutoff=1`) and an MPD block (multi-period time, arbitrary orderable encoding). New caveat paragraph spells out the TWFE numeric-time constraint.
- test_twfe_conley_non_numeric_time_fails: TWFE + string-encoded time raises a clean error (string * int multiplication fails inside the estimator); regression for the narrowed contract.
P1 (Maintainability): `conley_lag_cutoff` is a new public parameter that materially changes vcov semantics (0 = spatial only, >0 = adds within-unit Bartlett serial HAC), but the result objects didn't expose it.
- DiDResults + MultiPeriodDiDResults: new `conley_lag_cutoff: Optional[int]` field threaded through the estimator-side construction.
- `_format_vcov_label` now includes `lag_cutoff=<int>` in the Conley label so summary() readers can tell which Conley variant produced the reported SEs (e.g. "Conley spatial HAC (1999), lag_cutoff=1").
- TestConleyTWFE summary test asserts both the label format and the programmatic `res.conley_lag_cutoff` accessor.
P3 (Doc/test): MPD docstring claimed `inference="wild_bootstrap"` + conley raises but only TWFE had that guard. Added the explicit raise in MPD.fit() to match the documented contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
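The label change can be pictured with a sketch like the one below. The function `format_vcov_label` is a hypothetical stand-in for `_format_vcov_label`; in particular, how the real helper treats a missing `conley_lag_cutoff` or non-Conley types is an assumption here.

```python
from typing import Optional

def format_vcov_label(vcov_type: str, conley_lag_cutoff: Optional[int] = None) -> str:
    """Hypothetical label builder mirroring the format quoted in the commit."""
    if vcov_type == "conley":
        label = "Conley spatial HAC (1999)"
        if conley_lag_cutoff is not None:
            # Surface the variant so readers can tell spatial-only (0)
            # from spatial + within-unit serial HAC (> 0).
            label += f", lag_cutoff={conley_lag_cutoff}"
        return label
    # Assumption: other vcov types fall back to their raw name in this sketch.
    return vcov_type

spatial_only = format_vcov_label("conley", 0)
with_serial = format_vcov_label("conley", 1)
```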
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
|
…o_dict
P1 (Maintainability): the prior commit added `conley_lag_cutoff` to the result dataclasses and summary(), but `DiDResults.to_dict()` and `MultiPeriodDiDResults.to_dict()` still omitted it. Downstream programmatic consumers (notebooks, adapters, pipelines) that serialize results to dicts couldn't tell which Conley variant produced the SEs.
Fix: both `to_dict()` methods now include `vcov_type`, `cluster_name`, and `conley_lag_cutoff` when set. Conditional emission preserves the existing behavior for non-conley / non-cluster fits (no new keys appear in the serialized dict for unrelated estimators).
Regression: `test_twfe_conley_to_dict_carries_lag_cutoff` and `test_multi_period_did_conley_to_dict_carries_lag_cutoff` fit a TWFE + MPD Conley panel and assert `to_dict()` exposes the expected fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
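Conditional emission in `to_dict()` might look roughly like this. `ResultSketch` is a hypothetical stand-in, not `DiDResults`; the `att` / `se` fields and the "skip when `None`" rule are assumptions about what "when set" means.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class ResultSketch:
    att: float
    se: float
    vcov_type: Optional[str] = None
    cluster_name: Optional[str] = None
    conley_lag_cutoff: Optional[int] = None

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {"att": self.att, "se": self.se}
        # Emit provenance keys only when set, so unrelated fits serialize as before.
        for key in ("vcov_type", "cluster_name", "conley_lag_cutoff"):
            value = getattr(self, key)
            if value is not None:
                out[key] = value
        return out

conley_fit = ResultSketch(att=0.4, se=0.1, vcov_type="conley", conley_lag_cutoff=1)
plain_fit = ResultSketch(att=0.4, se=0.1)
conley_dict = conley_fit.to_dict()
plain_dict = plain_fit.to_dict()
```

Under this rule a Conley fit with `cluster_name=None` also omits the `cluster_name` key, so the serialized dict does not advertise clustering that was not applied.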
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
|
P1 (Maintainability): TWFE drops its auto-unit-cluster on the Conley path (`_conley_cluster_override = None` in the LinearRegression call) but still recorded `_twfe_cluster_label = unit` in the result metadata. Downstream consumers reading `res.cluster_name` or `res.to_dict()["cluster_name"]` were told the SEs were CR1-clustered when they were actually Conley spatial HAC with no clustering.
Fix: when `_fit_vcov_type == "conley"`, set `_twfe_cluster_label = None` so result-level provenance mirrors the actual cluster IDs passed to LinearRegression.
Regression: `test_twfe_conley_cluster_name_is_none` asserts both `res.cluster_name is None` and that `to_dict()` doesn't advertise the `cluster_name` key on a TWFE Conley fit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
|
Summary
- …`conleyreg` with `lag_cutoff > 0`) and lifts the Phase 1 fit-time rejection on `MultiPeriodDiD` and `TwoWayFixedEffects`. `DifferenceInDifferences(vcov_type="conley")` continues to raise with a redirect to MPD/TWFE because `DiD.fit()` has no `unit` column declaration.
- …`conleyreg` parity fixtures (`panel_haversine_lag1`, `panel_haversine_lag2`, `panel_lat_lon_realistic_lag1`) at `lag ∈ {1, 2, 1}`; parity verified at ~5.7e-16 max abs diff.
- …`survey_design` front-door reject (was a silent-bypass P0: `return_vcov=False` skipped the `conley + weights` validator and `compute_survey_vcov` overwrote vcov), `_validate_conley_kwargs` rejects NaN / `pd.NA` unit IDs (was silently dropping rows from the per-unit serial HAC), MPD `conley_coords` None-guard (was raw `TypeError`), TWFE docstring updated.
Methodology references
- …`conleyreg` `lag_cutoff > 0`)
- …`conleyreg`, CRAN v0.1.9, https://github.com/cdueben/conleyreg (`src/XeeXhC.cpp`, `src/time_dist.cpp`). Newey & West (1987) for the Bartlett temporal weights.
- …`conleyreg::time_dist.cpp` showed the temporal contribution is an additive within-unit Bartlett sandwich (lag=0 excluded), summed onto the within-period spatial sandwich. The original plan assumed a multiplicative `K_space · K_time` kernel; that was empirically refuted at ~1e-3 vs ~1e-14 for the block-decomposed form. Documented in REGISTRY.
- …`conleyreg`'s `kernel` argument controls ONLY the spatial taper; the temporal kernel is unconditionally `(1 - |lag|/(L+1))`. diff-diff matches this asymmetry exactly for R parity. Documented as `Note (deviation from R-symmetric API)` in REGISTRY.
Validation
- …`tests/test_conley_vcov.py` (+756 lines, 4 new test classes / 23 new tests covering panel-helper math, validator co-requirements, NaN-unit / `pd.NA` rejection, MPD + survey reject, MPD missing-coords reject, TWFE block-decomposed integration, FWL composability, `TestConleyParitySpacetime` against the 3 new R fixtures).
- …`conleyreg` (CRAN v0.1.9) parity on 6 fixtures (3 cross-sectional + 3 panel) at `atol=1e-6`; observed max abs diff ~5.7e-16. Regenerable via `Rscript benchmarks/R/generate_conley_golden.R`.
Security / privacy