Skip to main content
Version: 3.0

Rate Selection

By the time rate selection runs, each ROID has dozens of candidate rates across raw, transformed, and imputed columns — all scored 0–7. Rate selection picks the one winner per ROID and writes it to canonical_rate.

Overview

How All Sources Come Together

Rate selection reads from tmp_int_accuracy_brit — the final wide table produced by BRIT accuracy. Every rate column and its _validation_score counterpart are present on a single row per ROID. Rate selection does not join anything new; it just scans what's already there.

Selection sequence:

  1. BRIT Accuracy Table — One row per ROID. Every rate column (raw, transformed, imputed) with its _validation_score. Input to rate selection. → tmp_int_accuracy_brit
  2. Build Rate Arrays — Collect all non-NULL rate columns into parallel arrays: rate values, scores, sources, types, methodologies. Array position encodes priority for tiebreaking. → (CTE inside combined_main.sql)
  3. Select Canonical Ratearray_max(rate_score_array) finds the highest score. ARRAY_POSITION extracts the index. All canonical_* fields are populated from that index. → tmp_int_combined_no_whisp
  4. Add Whispers — Optional enrichment layer. Summary statistics at payer/network/provider/code aggregation levels are joined in. → tmp_int_combined

Canonical Rate Score (1–5)

After selection, the internal 0–7 accuracy score is translated into a user-facing 1–5 scale:

Canonical ScoreInternal ScoreMeaning
57.xPayer + hospital MRF independently agreed within ±20%
46.xSingle MRF source, within outlier bounds (not cross-validated)
34.x or 5.xWithin bounds, no counterparty validation
22.x or 3.xImputed estimate (enhanced or benchmark-backed)
11.xOutlier — outside all acceptable bounds, last resort
00No rate — ROID has no usable data from any source
info

QA tip: When checking CLD output, always filter on canonical_rate_score > 1 to exclude outliers, or canonical_rate_score > 0 to exclude empty ROIDs. A NULL canonical_rate means no data was found at any tier — the ROID still exists in the output, just with nulls.

Output Columns

ColumnMeaningExample
canonical_rateThe selected dollar amount18500.00
canonical_rate_score1–5 confidence (user-facing)5
canonical_rate_sourceWhich source won ("payer", "hospital", "imputation", "payer_hospital")"payer_hospital"
canonical_rate_typeExact rate column that won, prefixed by category"raw: payer_negotiated_rate"
canonical_contract_methodologyContract methodology of the winning rate"Fee Schedule"
canonical_rate_class"Raw", "Transform", or "Impute""Raw"
canonical_gross_charge_typeGross charge source used (for enhanced rates)"mrf_gross_charge_cbsa_median"

Rate Type Categories

Every rate in the CLD output belongs to one of four categories. These categories reflect how defensible the rate is: where the data came from and how much estimation was involved.

CategorySourceDescriptionTypical canonical_rate_type prefix
Posted RatesPayer MRF, Hospital MRFFixed dollar amounts from MRF files. No estimation involved. The most defensible category.raw: payer_*, raw: hospital_*_dollar, impute: msdrg_mrf_base_rate_*
Real-World RatesPayer MRF, Hospital MRF, Komodo claimsVariable-rate provisions where all components come from reported data (e.g. 90% of charges × hospital's own reported charge). Rate may fluctuate claim-to-claim but is fully grounded in source data.raw: hospital_*_allowed_amount, transform: hosp_per_diem_mult_glos, transform: *_gc_hosp_perc_to_dol
Enhanced RatesMRF provision + estimated benchmark componentMRF reports one component (e.g. 90% of charges) but not the charge itself. Turquoise fills in the missing component from Komodo or geographic averages.transform: *_gc_komodo_perc_to_dol, transform: *_gc_hosp_cbsa_perc_to_dol, most impute:*
Benchmark RatesMedicare reference dataEntirely derived from reference or benchmark data — no reliable information from MRFs was available. Last resort.benchmark_* columns

Tie-Breaking Priority

When two rates have the same accuracy score, the category hierarchy determines the winner:

PriorityCategoryRationale
1Posted RatesAll components from reported data, no estimation
2Real-World RatesAll components from reported data, but rate varies by claim
3Enhanced RatesProvision reported, but one component estimated from benchmarks
4Benchmark RatesEntirely benchmark-derived — most speculative

This tie-breaking matters primarily during orchestrator merging, when multiple sub-version runs are combined. Within a single sub-version run, the CDF-based decimal tiebreaker typically produces unique scores — ties are rare.

canonical_rate_type records the exact rate column that won, with a category prefix: raw: payer_negotiated_rate, transform: hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol, impute: msdrg_base_rate_mult_cms_weight. Users can filter or pivot by rate type prefix to understand what fraction of their coverage comes from each tier of data quality.

info

Sub-version merging: The orchestrator runs N sub-DAGs (one per month) then merges them. When merging, a validated rate (score 7) from an older sub-version beats an unvalidated rate (score 6) from a newer one. Same-score ties across sub-versions use the rate type hierarchy above.


Rate Arrays

Parallel arrays per ROID, with rate values, accuracy scores, and metadata in corresponding positions. Array order defines priority for same-score tiebreaking.

Array Structure

Each ROID gets parallel arrays of the same length. Position [i] in each array corresponds to the same rate column:

ArrayContent at [i]Example
rate_arrayDollar value125.00
rate_score_arrayAccuracy score (0–7.x)7.00000125
source_arrayData source label"payer"
rate_type_arrayColumn name with prefix"raw: payer_negotiated_rate"
methodology_arrayContract methodology"Fee Schedule"
rate_class_array"Raw", "Transform", or "Impute""Raw"

Priority Ordering (Inpatient vs Non-Inpatient)

Array position matters when multiple rates have identical scores. The ordering differs by bill type:

Inpatient

  1. Hospital raw rates
  2. Hospital untransformed
  3. Payer raw rates
  4. Payer untransformed
  5. Hospital transforms (provider GC)
  6. Payer transforms (provider GC)
  7. Hospital transforms (CBSA GC)
  8. Payer transforms (CBSA GC)
  9. State-level transforms
  10. Imputations

Non-Inpatient

Similar but:

  • Untransformed rates get slight priority
  • Provider-level GC before CBSA
  • CBSA before state
  • Hospital source before payer source within each tier

The arrays are also built without imputations (*_no_impute variants) for intermediate tracking — e.g., best_idx_no_impute records what the canonical rate would be if no imputations existed.


Selection Algorithm

The core algorithm: find max score, use its array index to select rate and all metadata.

Special Rules

  • Score = 7 → "payer_hospital": When the best score is 7.x (counterparty validated), canonical_rate_source is always set to "payer_hospital" regardless of which column won — because both sources agreed and the label should reflect that
  • Multiple best indices: best_payer_idx, best_hospital_idx, and best_idx_no_impute are tracked separately alongside best_idx for downstream analysis and QA
  • Gross charge type derivation: canonical_gross_charge_type is inferred from the rate_type name — e.g., a column ending in gc_hosp_cbsa_perc_to_dol → "mrf_gross_charge_cbsa_median"
  • NULL rates: ROIDs where every column scores 0 get canonical_rate = NULL and canonical_rate_score = 0. The ROID row is preserved — a NULL canonical rate means a genuine coverage gap, not a missing row
info

Determinism: The algorithm is fully deterministic. Given the same input arrays, it always produces the same canonical rate. The CDF and rate/1e8 tiebreakers ensure no two rates have exactly the same score in practice.


Walk-Through Example

Step-by-step canonical rate selection for two contrasting ROIDs.

Example A — Cross-validated: Code 99213, Provider X, Payer Y

Available rates:

  • Payer negotiated rate: $125
  • Hospital case rate: $120
  • Payer pct-to-dol transform: 130(81130 (81% × 160 gross charge)
  • Imputed rate: $115

Accuracy scoring:

  • Payer 125vshospital[125 vs hospital [120]: |125125-120|=5205 ≤ 20%×125=$25 → MATCH → score = 7 + 125/1e8 = 7.00000125
  • Hospital 120vspayer[120 vs payer [125]: |120120-125|=5205 ≤ 20%×120=$24 → MATCH → score = 7.00000120
  • Transform $130: no counterparty match, within bounds, CDF=0.54 → score = 6.54
  • Imputed $115: within bounds, CDF=0.23 → score = 4.23

Rate arrays (in order):

rate_array:       [$125,        $120,         $130,        $115       ]
rate_score_array: [7.00000125, 7.00000120, 6.54, 4.23 ]
source_array: ["payer", "hospital", "payer", "imputation"]
rate_type_array: ["raw: payer_negotiated_rate", "raw: hospital_case_rate_dollar", "transform: ...", "impute: ..."]

Selection:

best_score = array_max([7.00000125, 7.00000120, 6.54, 4.23]) = 7.00000125
best_idx = ARRAY_POSITION(rate_score_array, 7.00000125) = 1

canonical_rate = $125.00
canonical_rate_score = 5 (7.x → confidence 5)
canonical_rate_source = "payer_hospital" (score=7 → both sources agreed)
canonical_rate_type = "raw: payer_negotiated_rate"
canonical_rate_class = "Raw"
info

The payer rate (125)winsoverthehospitalrate(125) wins over the hospital rate (120) because 7.00000125 > 7.00000120. Both are counterparty-validated — the tiebreaker (rate/1e8) slightly prefers the higher dollar amount. Source is labeled "payer_hospital" because score=7 means both agreed.


Example B — Outlier payer rate: Code 99213, Urgent Care, BCBS PPO

Available rates:

  • Payer MRF rate: 4,200MedicareMPFSfor99213is4,200 — Medicare MPFS for 99213 is 92. This is 45× Medicare.
  • Imputed rate (RC HCPCS): $155 — derived from RC family average across similar codes

Accuracy scoring:

  • Payer $4,200: 45× Medicare, upper bound = 30× → OUTLIER → score = 1.xx
  • Imputed $155: within bounds (1.7× Medicare), CDF=0.61 → score = 2.61

Selection: score 2.61 > score 1.xx → imputed rate wins

canonical_rate        = $155.00
canonical_rate_score = 2 (imputed, within bounds)
canonical_rate_source = "imputation"
canonical_rate_type = "impute: rc_family_gc_hosp_perc_to_dol"
info

The outlier payer rate is preserved in the wide table (its column is not deleted) but loses to the imputation because score 1 < score 2. A user filtering on canonical_rate_score > 1 would see the imputed 155andnotthe155 and not the 4,200 outlier.