Pipeline Architecture
Clear Rates builds a wide table, one row per ROID, by progressively adding columns across phases. Rate selection then scans those columns — guided by accuracy scores — to pick a single canonical rate per ROID.
The Core Idea: Rows from ROS, Columns from Everything Else
The Rate Object Space (ROS) defines the row set — every valid (payer × network × provider × code) combination Clear Rates will try to price. Every downstream phase joins against the ROS by ROID and contributes new columns. No phase changes the row count.
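The "rows from ROS, columns from everything else" rule can be sketched as a left join keyed on ROID. This is a minimal illustration in plain Python, not the pipeline's actual code: the `add_columns` helper, the `roid_*` identifiers, and the sample values are all hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def add_columns(wide: Dict[str, Row], phase_output: Dict[str, Row], columns: List[str]) -> None:
    """Left-join one phase's output onto the wide table by ROID, in place.

    Hypothetical helper: ROIDs missing from the phase output get None (NULL),
    and ROIDs that are not in the ROS are ignored, so the row count never changes.
    """
    for roid, row in wide.items():
        source = phase_output.get(roid, {})
        for col in columns:
            row[col] = source.get(col)


# The ROS fixes the row set (sample ROIDs are made up).
ros = ["roid_1", "roid_2", "roid_3"]
wide: Dict[str, Row] = {roid: {} for roid in ros}

# Each phase contributes columns only.
payer_phase = {"roid_1": {"payer_negotiated_rate": 18000.0}}
hospital_phase = {
    "roid_1": {"hospital_pct_of_total_billed_charges_pct": 130.0},
    "roid_9": {"hospital_pct_of_total_billed_charges_pct": 95.0},  # not in the ROS
}

add_columns(wide, payer_phase, ["payer_negotiated_rate"])
add_columns(wide, hospital_phase, ["hospital_pct_of_total_billed_charges_pct"])
```

After both calls, `wide` still has exactly three rows. `roid_2` and `roid_3` carry NULL (`None`) in both columns, and `roid_9` never enters the table.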
payer_negotiated_rate = 18000, hospital_pct_of_total_billed_charges_pct = 130. Komodo has no data, so those columns stay NULL.
hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol = 0.01 × 130 × 28000 = $36,400. The negotiated $18,000 is already in dollars.
payer_negotiated_rate ($18,000, score 7) becomes canonical_rate = 18000, canonical_rate_score = 5.
Pipeline Tables
Each phase reads from and writes to a set of named tables. Sub-version tables are prefixed with tmp_ and include a date suffix (e.g. _2026_02). Final production tables have no suffix.
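The naming convention can be illustrated with a small helper. This is only a sketch: the function `table_name`, the base name `rates`, and the exact way the prefix and suffix are combined are assumptions made for illustration, not the pipeline's real identifiers.

```python
from typing import Optional


def table_name(base: str, sub_version: Optional[str] = None) -> str:
    """Build a table name from a base name and an optional sub-version.

    Assumes sub_version is formatted like "2026_02". With a sub-version the
    name gets the tmp_ prefix and the date suffix; without one it is treated
    as a final production table and returned unchanged.
    """
    if sub_version is None:
        return base
    return f"tmp_{base}_{sub_version}"


sub_version_table = table_name("rates", "2026_02")  # hypothetical base name
production_table = table_name("rates")
```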
The Wide Table Structure
By the time rate selection runs, each ROID row has dozens of populated or NULL columns — one per source, method, and gross charge variant.
| Phase | Column group | Example columns |
|---|---|---|
| Raw — Payer MRF | One per negotiated_type | payer_negotiated_rate, payer_fee_schedule_rate, payer_percentage_rate |
| Raw — Hospital MRF | One per contract_methodology × amount type | hospital_fee_schedule_dollar, hospital_pct_of_total_billed_charges_pct, hospital_per_diem_rate |
| Transformations — Pct-to-Dollar | 6 rate types × 6 gross charge sources | payer_gc_hosp_perc_to_dol, hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol |
| Transformations — Drug / Anesthesia | Drug dosage methods; anesthesia per negotiated_type | drug_dosage_std_dollar, payer_negotiated_rate_anesthesia_cf |
| Imputations | One per imputation tier | imputed_rate, imputed_rate_rc, imputed_rate_cstm |
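As an illustration of the Pct-to-Dollar row, the sketch below converts a percentage-of-billed-charges rate into dollars using the arithmetic from the worked example (0.01 × 130 × 28000). The function name `pct_to_dollar` is hypothetical, and treating 28000 as the gross charge is an assumption based on that formula.

```python
from typing import Optional


def pct_to_dollar(pct: Optional[float], gross_charge: Optional[float]) -> Optional[float]:
    """Convert a percentage rate into dollars against a gross charge.

    Returns None (NULL) when either input is missing, so the output column
    stays empty instead of holding a misleading zero.
    """
    if pct is None or gross_charge is None:
        return None
    return pct * gross_charge / 100.0


# Worked example values: 130 percent of an assumed 28000 gross charge.
converted = pct_to_dollar(130.0, 28000.0)
missing = pct_to_dollar(130.0, None)
```

With 6 rate types and 6 gross charge sources, the same conversion would run once per pairing, each writing its own output column.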
Rate Selection: One Winner per ROID
Rate selection scans all scored columns for a ROID and picks the one with the highest accuracy score. The winner is written to canonical_rate, and the selection is recorded in canonical_rate_source and canonical_rate_subversion for full traceability.
ROIDs where every column scored 0 (no data at all) get canonical_rate = NULL. They still appear in the output — the ROS row is preserved — with NULL rate columns indicating a genuine coverage gap.
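A minimal sketch of the selection rule for a single ROID follows. It makes several assumptions: the function `select_canonical` is hypothetical, ties go to the first column encountered, and the score of 4 on the converted hospital column is made up. It does not model canonical_rate_subversion or how a raw accuracy score maps to canonical_rate_score.

```python
from typing import Dict, Optional


def select_canonical(rates: Dict[str, Optional[float]], scores: Dict[str, int]) -> Dict[str, object]:
    """Pick the highest-scoring populated column for one ROID.

    Columns with a NULL rate or a score of 0 never win. If nothing qualifies,
    the canonical rate is None, but a result is still returned so the ROS row
    is preserved in the output.
    """
    best_col: Optional[str] = None
    best_score = 0
    for col, score in scores.items():
        if rates.get(col) is None or score <= best_score:
            continue
        best_col, best_score = col, score
    if best_col is None:
        return {"canonical_rate": None, "canonical_rate_source": None}
    return {"canonical_rate": rates[best_col], "canonical_rate_source": best_col}


rates = {
    "payer_negotiated_rate": 18000.0,
    "hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol": 36400.0,
    "imputed_rate": None,
}
scores = {
    "payer_negotiated_rate": 7,
    "hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol": 4,  # assumed score
    "imputed_rate": 0,
}

winner = select_canonical(rates, scores)
gap = select_canonical({"imputed_rate": None}, {"imputed_rate": 0})
```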
Lookback Runs
The orchestrator processes one sub-version (month of data) per sub-DAG run. When it processes historical months — any month that is not the most recent in the run list — that run is flagged as a lookback run.
is_max_sub_version = sub_version == max_sub_version
lookback_run = not is_max_sub_version
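Applied to a whole run list, the two expressions above could look like the sketch below. The helper `lookback_flags` is hypothetical, and it assumes sub-versions are zero-padded `YYYY_MM` strings, so that the lexicographic maximum is also the most recent month.

```python
from typing import Dict, List


def lookback_flags(run_list: List[str]) -> Dict[str, bool]:
    """Map each sub-version in the run list to its lookback_run flag."""
    max_sub_version = max(run_list)  # most recent month, given YYYY_MM strings
    flags: Dict[str, bool] = {}
    for sub_version in run_list:
        is_max_sub_version = sub_version == max_sub_version
        flags[sub_version] = not is_max_sub_version
    return flags


flags = lookback_flags(["2025_12", "2026_01", "2026_02"])
```

Here only `2026_02` gets `False` and runs the full pipeline. The two earlier months are flagged as lookback runs.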
Lookback runs execute a lighter version of the pipeline to reduce cost and runtime:
| Stage | Normal run | Lookback run |
|---|---|---|
| Provider spine | All configured provider types | Excludes ASC, Physician Group, Dialysis, DME, Urgent Care |
| Network spine | All network types, including Narrow and Exchange | Skips Narrow and Exchange network mappings |
| Everything else | Full pipeline | Identical |
The rationale: historical months are generally stable — their raw data doesn't change, and the network/provider coverage for those months is already captured. Lookback runs refresh the pricing calculations without rebuilding the full scope.
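The scope differences in the table above can be sketched as two filters driven by the lookback flag. The constant and function names are hypothetical. The excluded provider and network types come from the table, while "Hospital" and "Broad" are made-up examples of types that stay in scope.

```python
from typing import List

# Taken from the lookback table.
LOOKBACK_EXCLUDED_PROVIDER_TYPES = {"ASC", "Physician Group", "Dialysis", "DME", "Urgent Care"}
LOOKBACK_SKIPPED_NETWORK_TYPES = {"Narrow", "Exchange"}


def provider_types_for_run(configured: List[str], lookback_run: bool) -> List[str]:
    """Return the provider types to include in the provider spine."""
    if not lookback_run:
        return list(configured)
    return [p for p in configured if p not in LOOKBACK_EXCLUDED_PROVIDER_TYPES]


def network_types_for_run(configured: List[str], lookback_run: bool) -> List[str]:
    """Return the network types to include in the network spine."""
    if not lookback_run:
        return list(configured)
    return [n for n in configured if n not in LOOKBACK_SKIPPED_NETWORK_TYPES]


providers = ["Hospital", "ASC", "Dialysis"]  # "Hospital" is an assumed type
networks = ["Broad", "Narrow", "Exchange"]   # "Broad" is an assumed type

normal_providers = provider_types_for_run(providers, lookback_run=False)
lookback_providers = provider_types_for_run(providers, lookback_run=True)
lookback_networks = network_types_for_run(networks, lookback_run=True)
```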