Stage 5: Imputations
Most ROIDs have no raw or transformed rate. Imputations estimate missing values using a hierarchical fallback chain.
Why Imputations Are Needed
Payer MRFs and hospital MRFs don't cover everything. Where raw data reveals the shape of a contract, we infer its provisions (MS-DRG base rates, OP surgical groupers, and OP percentage base rates) and apply them to fill missing ROIDs.
Long Rates — The Raw Material
Before any imputation runs, available raw and transformed rates are pivoted into "long" format: one row per rate value. This enables aggregations across similar ROIDs, which is the foundation of every imputation tier.
1
Long rates — raw columns
Iterates over raw dollar/percentage rate columns. For each non-NULL value, emits one row with: roid, rate, accuracy_score, colname, rate_type, provider_type, payer_id. Excludes J/Q drug codes.
→ tmp_int_imputations_long_rates_raw_columns
2
Long rates — transformed columns
Same structure as raw columns but sourced from transformation columns. Each non-NULL transformed rate becomes one row.
→ tmp_int_imputations_long_rates_transformed_columns
3
Long rates — combined
UNION of the raw and transformed long rates. Only non-outlier rates (accuracy_score > min_score) are included, which keeps bad rates from contaminating imputed values.
→ tmp_int_imputations_long_rates
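The three long-rate steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual SQL: the wide-row layout, the column names in `RATE_COLUMNS`, the `<colname>_score` convention, and the value of `MIN_SCORE` are all assumptions made for the example. For brevity the sketch applies the J/Q exclusion and the outlier filter in a single pass, whereas the pipeline does them in separate steps.

```python
# Hypothetical rate columns mapped to their rate_type. The real column
# names are not shown in this document.
RATE_COLUMNS = {"neg_dollar": "dollar", "neg_pct": "percentage"}
MIN_SCORE = 1  # assumed value of min_score


def is_drug_code(code):
    """Assumed rule: HCPCS codes starting with J or Q are drug codes."""
    return code[:1].upper() in {"J", "Q"}


def to_long_rates(wide_rows, rate_columns=RATE_COLUMNS, min_score=MIN_SCORE):
    """Emit one row per non-NULL, non-outlier rate value."""
    long_rows = []
    for row in wide_rows:
        if is_drug_code(row["code"]):
            continue  # J/Q drug codes are excluded
        for colname, rate_type in rate_columns.items():
            rate = row.get(colname)
            score = row.get(colname + "_score")  # hypothetical naming
            if rate is None or score is None:
                continue  # NULL values emit nothing
            if score <= min_score:
                continue  # keep only accuracy_score > min_score
            long_rows.append({
                "roid": row["roid"],
                "rate": rate,
                "accuracy_score": score,
                "colname": colname,
                "rate_type": rate_type,
                "provider_type": row["provider_type"],
                "payer_id": row["payer_id"],
            })
    return long_rows
```

Because every output row has the same shape regardless of which column it came from, later tiers can group and aggregate rates without knowing the original wide schema.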
Full Imputation Chain
1
RC Global + RC HCPCS + RC Carveouts
Use revenue code rates from hospital MRFs to impute HCPCS codes. RC Global applies a family-level rate to all HCPCS in that family. RC HCPCS uses a validated crosswalk table. RC Carveouts handle specific codes with overriding prices.
→ tmp_int_imputations_rc_global, tmp_int_imputations_rc_hcpcs
2
MS-DRG Base Rate Detection
Detect base rate structure: if observed rates closely follow rate ≈ base × CMS DRG weight, impute missing DRGs using the detected base rate. Requires at least 5 observed DRGs with a coefficient of variation (CV) below 0.15.
→ tmp_int_msdrg_base_rates
3
Main Imputations (chunked by payer)
Core imputation. For each ROID with no raw rate, tries 5 aggregation tiers from most specific (provider + payer + network + code) to least specific (code nationally). Stops at the first tier that meets the minimum sample size (N).
→ tmp_int_imputations
4
Derived Imputations (chunked by payer)
Secondary gap-filling for Hospital and ASC ROIDs using rate_object_space, gross charges, and long rates. Handles percentage-of-charges structures that could not be resolved in the main pass.
→ tmp_int_imputations_derived
5
CSTM Imputations
Custom surgical grouper-based imputations for specific payers (UHC, Aetna, Cigna, and others). Groups HCPCS into procedure tiers and imputes missing codes using tier averages.
→ tmp_int_imputations_cstm
6
APR-DRG Imputations
Final fallback for APR-DRG ROIDs. Aggregates derived and CSTM imputations through the APR-DRG → MS-DRG crosswalk and joins gross charges for the APR-DRG ROID.
→ tmp_int_imputations_aprdrg
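The tiered fallback used by the main imputations can be sketched as follows. This is an illustration under stated assumptions: only the first tier (provider + payer + network + code) and the last tier (code nationally) come from the description above. The three middle tiers, the field names, the `MIN_N` value, and the choice of the median as the aggregate are hypothetical.

```python
from statistics import median

# Tiers ordered from most to least specific. Only the first and last
# are named in the text; the middle three are hypothetical placeholders.
TIERS = [
    ("provider_id", "payer_id", "network", "code"),
    ("provider_id", "payer_id", "code"),  # hypothetical
    ("payer_id", "state", "code"),        # hypothetical
    ("state", "code"),                    # hypothetical
    ("code",),                            # code nationally
]
MIN_N = 3  # assumed minimum number of supporting rates


def impute(target, long_rates, tiers=TIERS, min_n=MIN_N):
    """Return (rate, tier_index) from the first tier with enough rates.

    Returns None when no tier reaches min_n supporting rates.
    """
    for tier_index, keys in enumerate(tiers):
        matches = [
            r["rate"] for r in long_rates
            if all(r[k] == target[k] for k in keys)
        ]
        if len(matches) >= min_n:
            # Median is an assumed aggregate; the text does not name one.
            return median(matches), tier_index
    return None
```

Returning the tier index alongside the rate lets downstream steps record how specific the supporting evidence was, which is useful when assigning a quality score to the imputed value.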
Walk-Through: A ROID with No Raw Rate
UHC PPO, Hospital X, MS-DRG 871 (Septicemia), Inpatient
Starting point: This ROID has no raw rate — UHC did not publish a negotiated rate for MS-DRG 871 at Hospital X in its MRF.
Observed Inpatient rates at Hospital X + UHC PPO: MS-DRG 470 reports \$20,400 (CMS weight 2.04 → implied base \$10,000); MS-DRG 392 reports \$11,200 (weight 1.12 → \$10,000); MS-DRG 291 reports \$8,900 (weight 0.89 → \$10,000).
Base rate detection: 45 of 50 observed DRGs converge on implied base = \$10,000. 45 > 10 and 45/50 = 90% → MS-DRG base rate activated.
Impute MS-DRG 871: CMS weight 3.50. Imputed rate = \$10,000 × 3.50 = \$35,000 — matching the contract's actual relative weight structure rather than a generic cross-ROID average.
imputed_rate = $35,000 | source = tmp_int_msdrg_base_rates | inferred-provision imputation outranks generic tier fallbacks
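The arithmetic in this walk-through can be reproduced with a short Python sketch. It assumes the CV is computed over the implied base rates (rate divided by CMS weight) and that the detected base is their median. Neither detail is confirmed by the text, and the function names are hypothetical. The default thresholds follow the detection rule in the chain description (at least 5 observed DRGs, CV below 0.15) rather than the convergence count quoted in this walk-through.

```python
from statistics import mean, median, pstdev


def detect_base_rate(observed, weights, min_observed=5, max_cv=0.15):
    """Return a detected base rate, or None if rates do not track weights.

    observed: DRG -> observed rate; weights: DRG -> CMS relative weight.
    """
    implied = [rate / weights[drg]
               for drg, rate in observed.items() if drg in weights]
    if len(implied) < min_observed:
        return None
    cv = pstdev(implied) / mean(implied)  # spread of the implied bases
    if cv >= max_cv:
        return None
    return median(implied)  # assumed choice of central value


def impute_drg(drg, base_rate, weights):
    """Imputed rate = base rate x CMS weight for the missing DRG."""
    return base_rate * weights[drg]


# Weights and rates are the illustrative values from the walk-through.
weights = {"470": 2.04, "392": 1.12, "291": 0.89, "871": 3.50}
observed = {"470": 20400.0, "392": 11200.0, "291": 8900.0}

# Only three DRGs are listed here, so the minimum is lowered for the demo.
base = detect_base_rate(observed, weights, min_observed=3)
if base is not None:
    print(round(impute_drg("871", base, weights), 2))
```

With the walk-through's numbers, every implied base lands at about 10,000, so the CV is effectively zero and the imputed rate for MS-DRG 871 comes out at about 35,000.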
Imputation Scores
| Score | Meaning |
|---|---|
| 6 | Raw rate available, not an outlier — highest quality (passthrough) |
| 3 | Hospital MRF gross charge with percentage-of-charge, or state-level Medicare benchmark |
| 2 | Not validated, not an outlier — estimated from similar ROIDs |
| 1 | Outlier — rate falls outside expected bounds |
| 0 | Default / no imputation found at any tier |