Version: 3.0

Imputations

Most ROIDs have no raw or transformed rate. Imputations estimate missing values using a hierarchical fallback chain — from the most specific available data to the least specific.

Overview

Why Imputations Are Needed

The Rate Object Space defines every (payer × provider × code × network) combination CLD will try to price. But payer MRFs and hospital MRFs don't cover everything — a payer might list rates for a hospital but omit certain CPT codes, or cover a network broadly without per-provider detail. Rather than leaving those ROIDs with canonical_rate = NULL, imputations estimate what the rate probably is, based on similar payers, providers, and codes.

Long Rates — the raw material

Before any imputation can be computed, all available raw and transformed rates are pivoted into a "long" format — one row per rate value rather than one row per ROID. This enables aggregations across similar ROIDs (e.g., "what do all payers pay this provider for code 99213?").

Long rates — raw columns — Iterates over all raw dollar/percentage rate columns in combined_raw. For each non-NULL value, emits one long row with: roid, rate, accuracy_score, colname, rate_type, provider_type, payer_id. Excludes J/Q drug codes (priced separately). Joins to tmp_int_accuracy_raw to get the accuracy score for each value. → tmp_int_imputations_long_rates_raw_columns
Long rates — transformed columns — Same structure but pulling from transformation columns (pct-to-dollar, drug dosage, anesthesia, etc.). → tmp_int_imputations_long_rates_transformed_columns
Long rates — combined — UNION of raw + transformed long tables. Only non-outlier rates (accuracy_score > min_score) feed into imputation aggregations — this prevents bad rates from contaminating imputed values. → tmp_int_imputations_long_rates

The Full Imputation Chain

RC Global + RC HCPCS + RC Carveouts — Use revenue code (RC) rates from hospital MRFs to impute HCPCS codes. RC Global: one rate applies to all HCPCS in a revenue code family. RC HCPCS: crosswalk resolves specific HCPCS codes. RC Carveouts: payer-specific overrides. → tmp_int_imputations_rc_global, tmp_int_imputations_rc_hcpcs
MS-DRG Base Rate Detection — Detect whether a provider + payer has a consistent base rate structure (rate ≈ base × CMS DRG weight). If so, impute missing DRGs by multiplying the detected base rate by the CMS weight for that DRG. → tmp_int_msdrg_base_rates
Main Imputations (chunked by payer) — Core imputation task. For each ROID with no raw rate, tries 5 aggregation tiers from most specific to least specific. Returns the first tier with enough observations. → tmp_int_imputations
Derived Imputations (chunked by payer) — Secondary gap-filling for Hospital and ASC ROIDs using rate_object_space + gross charges + long rates. Builds on the main imputations output. → tmp_int_imputations_derived
CSTM Imputations — Custom surgical grouper-based imputations for specific payers (UHC, Aetna, Cigna, etc.). Groups codes into procedure tiers; imputes using tier averages from payer contract grouper tables. → tmp_int_imputations_cstm
APR-DRG Imputations — Final fallback for APR-DRG coded ROIDs. Uses derived and CSTM outputs joined via the APR-DRG crosswalk (MS-DRG → APR-DRG). → tmp_int_imputations_aprdrg

Walk-Through: A ROID with No Raw Rate

ROID: Cigna PPO, Community Hospital, CPT 99213, Professional — no raw rate in payer MRF or hospital MRF
Long rates check: Cigna PPO has rates for 99213 at other hospitals. These rates land in the long rates table and feed into tier aggregations.
Tier 1 (Provider + Payer + Network + Code): Community Hospital + Cigna PPO + 99213 → only 2 observed rates (n < 5) → skip
Tier 2 (Provider + Code, all payers): Community Hospital + 99213 → 8 observed rates across payers → avg = $155 → use this

Result

imputed_rate = $155 (tier 2) | imputation_score = 2 | always lower priority than validated raw rates in rate selection

Imputation Scores

Score	Meaning
6	Raw rate available, not an outlier — highest quality (passthrough, not actually imputed)
3	Hospital MRF gross charge with proof-of-concept, or state-level Medicare benchmark
2	Not validated, not an outlier — estimated from similar ROIDs
1	Outlier — rate falls outside expected bounds
0	Default / no imputation found at any tier

Imputation scores feed into the final canonical_rate_score (1–5 scale) after rate selection. Imputed rates always score below validated raw rates, so raw data wins whenever available.

Imputation Tiers

The main imputation task tries 5 aggregation tiers in order, from most specific to least specific. The first tier with enough observations wins.

Tier	Partition key	Min N	Example
1	Provider + Payer + Network + Code	5	Hospital X, UHC PPO, code 99213 — rates at this exact combination
2	Provider + Code (all payers)	5	Hospital X, code 99213 — across all payers at this provider
3	Provider Type + Code	3	All Hospitals, code 99213 — provider-type-level benchmark
4	State + Code	—	California, code 99213 — state-wide benchmark
5	Code (national)	—	Code 99213 nationally — last resort

Aggregations within each tier are computed by the aggregator() macro in imputations.sql, which computes AVG, COUNT, and STDDEV over the long rates for that partition. The macro uses accuracy_score > min_score to exclude outlier rates from the aggregate.

Walk-through: imputation for missing rate

ROID: Hospital X, UHC PPO, code 99213 — no raw or transformed rate exists
Tier 1: Hospital X + UHC + PPO + 99213 → only 2 observed rates (n < 5) → skip
Tier 2: Hospital X + 99213 (all payers) → 8 observed rates, avg = $185 → use this

Result

imputed_rate = $185 (tier 2: provider + code across payers) | imputation_score = 2

MS-DRG Base Rate Detection

Detect if a provider + payer has a consistent base rate structure (rate ≈ base_rate × CMS DRG weight). If so, impute missing DRGs using the detected base rate.

Why Base Rates?

Many hospital contracts are structured as a single base rate that is multiplied by the CMS relative weight for each DRG. If CLD observes this pattern across several DRGs for a given provider + payer, it can detect the implied base rate and use it to fill in DRGs where no rate was reported.

Walk-through: base rate detection

Observed rates at Hospital X, UHC PPO:

MS-DRG 470 (CMS weight 2.04): rate = $20,400 → implied base = $10,000
MS-DRG 392 (CMS weight 1.12): rate = $11,200 → implied base = $10,000
MS-DRG 291 (CMS weight 0.89): rate = $8,900  → implied base = $10,000
MS-DRG 194 (CMS weight 1.50): rate = $15,100 → implied base ≈ $10,067

CV check: coefficient of variation across implied bases < 0.15 → consistent base rate pattern confirmed
Detected base rate: $10,000 (average of implied bases)
Impute missing DRG 871 (weight 3.50): $10,000 × 3.50 = **$ 35,000**

The detection uses the long rates table filtered to billing_code_type = 'MS-DRG' and bill_type = 'Inpatient'. Allowed amount columns are excluded. Minimum 5 observed DRG rates are required; CV must be < 0.15.

Revenue Code (RC) Imputations

Use revenue code rates from hospital MRFs to impute HCPCS codes. RC = Revenue Code — the 4-digit codes hospitals use to report services at the departmental level (e.g., 0120 = Medical/Surgical Supplies).

Why Revenue Codes?

Hospital MRFs sometimes report rates at the revenue code family level rather than at the individual HCPCS level. For example, a hospital might report a single rate for "Operating Room" (RC 036x) without breaking it down by individual procedure codes. The RC imputations reverse-engineer HCPCS rates from these family-level prices using crosswalk tables.

Three RC Imputation Types

RC Global: A payer posts a single rate for a revenue code family in core_rates (billing_code_type = 'RC'). CLD applies this rate to all HCPCS codes in that family. Scoped to inpatient RCs (codes starting with 00, 01, 020, 021). → tmp_int_imputations_rc_global
RC HCPCS: More granular crosswalk from revenue codes to specific HCPCS codes using rc_to_hcpcs_cross_validated_75. When a payer reports an RC rate, this table maps it to the specific HCPCS codes that are typically billed under that revenue code. → tmp_int_imputations_rc_hcpcs
RC Carveout: Some HCPCS codes have specific pricing that overrides the family rate — they are "carved out" of the RC imputation and treated separately.

info

RC imputations are preprocessing inputs, not the final imputation output. They feed into the main imputations.sql chain alongside long rates, providing additional rate observations for the tier aggregations.

CSTM & APR-DRG Imputations

Late-stage fallback imputations after the main chain. CSTM uses surgical procedure groupers; APR-DRG uses the crosswalk from MS-DRG outputs.

CSTM (Custom Surgical Grouper) Imputations

Some payers structure their contracts around surgical procedure tiers rather than individual HCPCS codes. CSTM imputations encode these payer-specific grouper relationships:

Reference tables: tq_intermediate.cld_utils.contract_surg_grouper_{payer}_v0 — one per payer (UHC, Aetna, Cigna, etc.)
Groups HCPCS codes into procedure tiers; imputes missing codes using tier averages from the payer's observed rates
Source: pulls from core_rates joined to network/provider spines — scoped to Inpatient and Outpatient bill_types
Output: tmp_int_imputations_cstm

APR-DRG Imputations

APR-DRG ROIDs (used by some hospitals instead of MS-DRG) can be imputed by leveraging the derived and CSTM imputation outputs for related MS-DRGs, then crosswalking back via the APR-DRG crosswalk table:

Filter rate_object_space to APR-DRG billing codes
Aggregate derived imputations and CSTM imputations using the APR-DRG → MS-DRG crosswalk
Join back with gross charges for the APR-DRG ROID
Output: tmp_int_imputations_aprdrg

info

Imputation order matters: Main imputations → MS-DRG base rates → Derived imputations → CSTM → APR-DRG. Each stage can use outputs from previous stages. CSTM and APR-DRG are last resorts — they only activate when earlier stages found nothing.

Overview​

Why Imputations Are Needed​

Long Rates — the raw material​

The Full Imputation Chain​

Walk-Through: A ROID with No Raw Rate​

Imputation Scores​

Imputation Tiers​

MS-DRG Base Rate Detection​

Why Base Rates?​

Revenue Code (RC) Imputations​

Why Revenue Codes?​

Three RC Imputation Types​

CSTM & APR-DRG Imputations​

CSTM (Custom Surgical Grouper) Imputations​

APR-DRG Imputations​

On this page: