Version: 3.0

Stage 5: Imputations

Most ROIDs have no raw or transformed rate. Imputations estimate missing values using a hierarchical fallback chain.

Why Imputations Are Needed

Payer MRFs and hospital MRFs do not cover every ROID. Where raw data reveals the shape of a contract, the pipeline infers its provisions (MS-DRG base rates, OP surgical groupers, and OP percentage base rates) and applies them to fill missing ROIDs.

Long Rates — The Raw Material

Before any imputation runs, available raw and transformed rates are pivoted into "long" format: one row per rate value. This enables aggregations across similar ROIDs, which is the foundation of every imputation tier.

Long rates — raw columns

Iterates over raw dollar/percentage rate columns. For each non-NULL value, emits one row with: roid, rate, accuracy_score, colname, rate_type, provider_type, payer_id. Excludes J/Q drug codes.

→ tmp_int_imputations_long_rates_raw_columns
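The wide-to-long pivot described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the column names in RAW_RATE_COLUMNS, the `code` and `accuracy_scores` fields on the input row, and the rule that a J/Q drug code is any code starting with J or Q are all assumptions made for the example.

```python
from typing import Iterator

# Hypothetical column names; the section does not list the real rate columns.
RAW_RATE_COLUMNS = ["payer_negotiated_dollar", "hospital_negotiated_percentage"]


def is_drug_code(code: str) -> bool:
    # Assumption: "J/Q drug codes" means codes whose first letter is J or Q.
    return code[:1].upper() in ("J", "Q")


def to_long_rows(wide_row: dict) -> Iterator[dict]:
    """Emit one long-format row per non-NULL rate column of a wide ROID row."""
    if is_drug_code(wide_row["code"]):
        return  # drug codes are excluded entirely
    for colname in RAW_RATE_COLUMNS:
        rate = wide_row.get(colname)
        if rate is None:  # NULL rates produce no row
            continue
        yield {
            "roid": wide_row["roid"],
            "rate": rate,
            "accuracy_score": wide_row["accuracy_scores"].get(colname),
            "colname": colname,
            # Assumed convention: rate type is encoded in the column suffix.
            "rate_type": "percentage" if colname.endswith("_percentage") else "dollar",
            "provider_type": wide_row["provider_type"],
            "payer_id": wide_row["payer_id"],
        }
```

A row with one populated rate column yields exactly one long row; a row with every rate column NULL yields none.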

Long rates — transformed columns

Same structure as raw columns but sourced from transformation columns. Each non-NULL transformed rate becomes one row.

→ tmp_int_imputations_long_rates_transformed_columns

Long rates — combined

UNION of raw and transformed long rates. Only non-outlier rates (accuracy_score > min_score) feed in, which keeps bad rates from contaminating imputed values.

→ tmp_int_imputations_long_rates
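A rough sketch of the combine-and-filter step, assuming each long row is a dict with an `accuracy_score` key. The function name and the `min_score` argument are illustrative, and the sketch simply concatenates the two inputs without de-duplicating, which may differ from the UNION semantics of the real query.

```python
def combine_long_rates(raw_rows, transformed_rows, min_score):
    """Concatenate raw and transformed long rows, keeping only non-outliers."""
    combined = list(raw_rows) + list(transformed_rows)
    # Strict inequality mirrors the documented accuracy_score > min_score rule.
    return [row for row in combined if row["accuracy_score"] > min_score]
```

Because the comparison is strict, a row whose score equals `min_score` is dropped.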

Full Imputation Chain

RC Global + RC HCPCS + RC Carveouts

Use revenue code rates from hospital MRFs to impute HCPCS codes. RC Global applies a family-level rate to all HCPCS in that family. RC HCPCS uses a validated crosswalk table. RC Carveouts handle specific codes with overriding prices.

→ tmp_int_imputations_rc_global, tmp_int_imputations_rc_hcpcs

MS-DRG Base Rate Detection

Detect base rate structure: if observed rates closely follow rate ≈ base × CMS DRG weight, impute missing DRGs using the detected base rate. Requires at least 5 observed DRGs with a coefficient of variation (CV) below 0.15.

→ tmp_int_msdrg_base_rates
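One way to picture the detection rule is the sketch below. It assumes the CV is computed over the implied base rates (rate ÷ weight) using the population standard deviation, and that the detected base is their mean; the section does not state either choice, so treat both as assumptions. Function names are hypothetical.

```python
from statistics import mean, pstdev

MIN_OBSERVED_DRGS = 5  # minimum observed DRGs, per the rule above
MAX_CV = 0.15          # CV threshold, per the rule above


def detect_base_rate(observed, weights):
    """Return the detected base rate, or None if no base-rate structure is found.

    observed: dict of DRG code -> observed rate
    weights:  dict of DRG code -> CMS DRG weight
    """
    implied = [rate / weights[drg] for drg, rate in observed.items() if weights.get(drg)]
    if len(implied) < MIN_OBSERVED_DRGS:
        return None
    base = mean(implied)  # assumption: the base is the mean implied base
    if base <= 0:
        return None
    cv = pstdev(implied) / base  # assumption: CV of the implied bases
    return base if cv < MAX_CV else None


def impute_missing_drgs(base, observed, weights):
    """Apply rate = base x weight to every DRG that has no observed rate."""
    return {drg: base * w for drg, w in weights.items() if drg not in observed}
```

When every observed DRG implies the same base, the CV is effectively zero and the base is accepted; widely scattered implied bases push the CV past 0.15 and the function returns None, leaving the ROID to later stages.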

Main Imputations (chunked by payer)

Core imputation. For each ROID with no raw rate, tries 5 aggregation tiers from most specific (provider + payer + network + code) to least specific (code nationally). Stops at the first tier meeting minimum N.

→ tmp_int_imputations
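The fallback logic can be sketched as a loop over tiers that stops at the first one with enough observations. Only the first and last tiers below come from the description above; the three intermediate tiers, the field names, the minimum N of 3, and the use of the median as the aggregate are placeholders chosen for illustration.

```python
from statistics import median

# Tier 1 and tier 5 follow the text; tiers 2-4 are hypothetical placeholders.
TIERS = [
    ("provider_id", "payer_id", "network_id", "code"),
    ("payer_id", "network_id", "code"),
    ("payer_id", "code"),
    ("provider_type", "code"),
    ("code",),
]
MIN_N = 3  # assumed minimum sample size; the real threshold is not stated


def impute_from_tiers(target, long_rates, min_n=MIN_N):
    """Return the aggregate from the most specific tier with at least min_n rates."""
    for tier_number, keys in enumerate(TIERS, start=1):
        matches = [
            row["rate"]
            for row in long_rates
            if all(row.get(k) == target.get(k) for k in keys)
        ]
        if len(matches) >= min_n:
            # Assumption: the median is the aggregate; the real choice may differ.
            return {"rate": median(matches), "tier": tier_number, "n": len(matches)}
    return None  # no tier met the minimum N
```

Returning the tier number alongside the rate makes it possible to see how specific the evidence behind each imputed value was.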

Derived Imputations (chunked by payer)

Secondary gap-filling for Hospital and ASC ROIDs using rate_object_space, gross charges, and long rates. Handles percentage-of-charges structures that could not be resolved in the main pass.

→ tmp_int_imputations_derived

CSTM Imputations

Custom surgical grouper-based imputations for specific payers (UHC, Aetna, Cigna, and others). Groups HCPCS into procedure tiers and imputes missing codes using tier averages.

→ tmp_int_imputations_cstm
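As a rough illustration of tier-average imputation, the sketch below groups observed rates by grouper tier and assigns each missing code its tier's mean. The code-to-tier mapping, the function name, and the use of a plain arithmetic mean are assumptions; payer-specific grouper definitions are not given in this section.

```python
from collections import defaultdict
from statistics import mean


def impute_by_grouper(observed_rates, code_to_tier):
    """Impute missing codes with the average observed rate of their grouper tier.

    observed_rates: dict of HCPCS code -> observed rate
    code_to_tier:   dict of HCPCS code -> grouper tier (hypothetical mapping)
    """
    rates_by_tier = defaultdict(list)
    for code, rate in observed_rates.items():
        tier = code_to_tier.get(code)
        if tier is not None:
            rates_by_tier[tier].append(rate)

    tier_average = {tier: mean(rates) for tier, rates in rates_by_tier.items()}

    # Only codes without an observed rate, and whose tier has data, are imputed.
    return {
        code: tier_average[tier]
        for code, tier in code_to_tier.items()
        if code not in observed_rates and tier in tier_average
    }
```

A code whose tier has no observed rates at all is left unimputed rather than borrowing from a neighboring tier.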

APR-DRG Imputations

Final fallback for APR-DRG ROIDs. Aggregates derived and CSTM imputations through the APR-DRG → MS-DRG crosswalk and joins gross charges for the APR-DRG ROID.

→ tmp_int_imputations_aprdrg

Walk-Through: A ROID with No Raw Rate

UHC PPO, Hospital X, MS-DRG 871 (Septicemia), Inpatient

Starting point: This ROID has no raw rate — UHC did not publish a negotiated rate for MS-DRG 871 at Hospital X in its MRF.

Observed Inpatient rates at Hospital X + UHC PPO: MS-DRG 470 reports $20,400 (CMS weight 2.04 → implied base $10,000); MS-DRG 392 reports $11,200 (weight 1.12 → $10,000); MS-DRG 291 reports $8,900 (weight 0.89 → $10,000).

Base rate detection: 45 of 50 observed DRGs converge on implied base = $10,000. 45 > 10 and 45/50 = 90% → MS-DRG base rate activated.

Impute MS-DRG 871: CMS weight 3.50. Imputed rate = $10,000 × 3.50 = $35,000, which matches the contract's actual relative weight structure rather than a generic cross-ROID average.

imputed_rate = $35,000 | source = tmp_int_msdrg_base_rates | inferred-provision imputation outranks generic tier fallbacks
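The arithmetic in this walk-through can be re-derived in a few lines. The weights and rates below are the illustrative figures quoted above, not values looked up from a CMS table, and the variable names are made up for the example.

```python
# Figures copied from the walk-through; weights are the example's, not a CMS lookup.
WEIGHTS = {"470": 2.04, "392": 1.12, "291": 0.89, "871": 3.50}
OBSERVED = {"470": 20_400, "392": 11_200, "291": 8_900}

# Implied base = observed rate / DRG weight, rounded to cents.
implied_bases = {
    drg: round(rate / WEIGHTS[drg], 2) for drg, rate in OBSERVED.items()
}

# Share of observed DRGs that agree on the base, as stated in the example.
agreement_share = 45 / 50

# Imputed rate for the missing DRG = detected base x its weight.
base_rate = 10_000
imputed_871 = base_rate * WEIGHTS["871"]
```

Each of the three observed DRGs implies the same base of 10,000, which is why the imputed rate for MS-DRG 871 comes out to 35,000.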

Imputation Scores

Score	Meaning
6	Raw rate available, not an outlier — highest quality (passthrough)
3	Hospital MRF gross charge with percentage-of-charge, or state-level Medicare benchmark
2	Not validated, not an outlier — estimated from similar ROIDs
1	Outlier — rate falls outside expected bounds
0	Default / no imputation found at any tier
