Version: 3.0

Pipeline Architecture

Clear Rates builds a wide table, one row per ROID, by progressively adding columns across phases. Rate selection then scans those columns — guided by accuracy scores — to pick a single canonical rate per ROID.

The Core Idea: Rows from ROS, Columns from Everything Else

The Rate Object Space (ROS) defines the row set — every valid (payer × network × provider × code) combination Clear Rates will try to price. Every downstream phase joins against the ROS by ROID and contributes new columns. No phase changes the row count.
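The "rows from ROS, columns from everything else" rule can be sketched as a left join keyed on ROID. This is a minimal illustration, not the pipeline's actual code: the ROID strings, the `add_phase` helper, and the imputed dollar values are all made up for the example. Only the column names come from this page.

```python
# Hypothetical stand-in for the ROS: the row set is fixed once ROIDs are minted.
ros = ["roid_1", "roid_2", "roid_3"]

# Hypothetical phase outputs, keyed by ROID. A phase may cover only some ROIDs.
raw_phase = {"roid_1": {"payer_negotiated_rate": 18000}}
imputation_phase = {
    "roid_1": {"imputed_rate": 17500},  # illustrative value only
    "roid_2": {"imputed_rate": 9200},   # illustrative value only
}


def add_phase(wide, phase_output, columns):
    """Left-join one phase onto the wide table by ROID.

    Every ROS row receives every new column; ROIDs the phase did not
    cover get None (NULL). Rows are never added or dropped.
    """
    for roid, row in wide.items():
        contributed = phase_output.get(roid, {})
        for col in columns:
            row[col] = contributed.get(col)
    return wide


wide = {roid: {} for roid in ros}
add_phase(wide, raw_phase, ["payer_negotiated_rate"])
add_phase(wide, imputation_phase, ["imputed_rate"])
```

After both phases, `wide` still has exactly three rows. `roid_3` carries NULL in every column, which is how a coverage gap shows up in the wide table.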

How a single ROID accumulates data across phases

Stage 1 — Rate Object Space: A ROID is minted for (UHC, Choice Plus PPO, Mass General, MS-DRG 470, Inpatient). This row now exists in the pipeline. All downstream columns start as NULL.

Stage 2 — Raw Data: UHC's payer MRF reports a negotiated rate of $18,000 for this combination. The hospital MRF reports 130% of billed charges. Both land as separate columns: payer_negotiated_rate = 18000, hospital_pct_of_total_billed_charges_pct = 130. Komodo has no data — those columns stay NULL.

Stage 3 — Transformations: The 130% figure is resolved against Mass General's gross charge for MS-DRG 470 ($28,000): hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol = 0.01 × 130 × 28000 = $36,400. The negotiated $18,000 is already in dollars.

Stage 4 — Imputations: An imputed rate is computed for this ROID regardless — imputations always run. It will only become the canonical rate if no raw or transformed rate with a sufficient accuracy score exists.

Stage 5 — Accuracy: Each non-NULL rate column gets a score. The payer negotiated rate ($18,000) is validated against the hospital MRF — accuracy score = 7. The pct-to-dollar transform ($36,400) is not outlier-validated — accuracy score = 4.

Stage 6 — Rate Selection: The highest-scored non-NULL column wins: payer_negotiated_rate ($18,000, score 7) becomes canonical_rate = 18000, canonical_rate_score = 7.
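The Stage 3 arithmetic from the walkthrough above can be reproduced in a few lines. The `pct_to_dollar` helper is a hypothetical name, and its NULL handling is an assumption; the column names and dollar figures are the ones used in the example.

```python
def pct_to_dollar(pct, gross_charge):
    """Convert a percent-of-billed-charges rate to dollars.

    Equivalent to 0.01 * pct * gross_charge. Returns None (NULL) when
    either input is missing, so the transformed column stays NULL too
    (an assumption of this sketch).
    """
    if pct is None or gross_charge is None:
        return None
    return pct * gross_charge / 100


# The example ROID after Stage 2 (raw data).
row = {
    "payer_negotiated_rate": 18000,
    "hospital_pct_of_total_billed_charges_pct": 130,
}
gross_charge = 28000  # Mass General's gross charge for MS-DRG 470

# Stage 3 adds a new column; the raw columns are left untouched.
row["hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol"] = pct_to_dollar(
    row["hospital_pct_of_total_billed_charges_pct"], gross_charge
)
```

The new column holds 36400.0, matching the $36,400 in Stage 3, while `payer_negotiated_rate` is already in dollars and needs no transform.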

Pipeline Tables

Each phase reads from and writes to a set of named tables. Sub-version tables include a date suffix (e.g. _2026_02) and are prefixed tmp_. Final production tables have no suffix.
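As a rough illustration of the naming convention, the helper below builds a table name from a base name and an optional sub-version. The function name, the `example_table` base name, and the exact placement of the prefix and suffix are assumptions inferred from the `_2026_02` example, not a documented API.

```python
def table_name(base, sub_version=None):
    """Build a pipeline table name.

    sub_version is a (year, month) tuple for sub-version tables, or
    None for the final production table, which has no suffix.
    """
    if sub_version is None:
        return base
    year, month = sub_version
    # Sub-version tables: tmp_ prefix plus a _YYYY_MM date suffix.
    return f"tmp_{base}_{year:04d}_{month:02d}"


sub_version_table = table_name("example_table", (2026, 2))
production_table = table_name("example_table")
```

Here `sub_version_table` is `tmp_example_table_2026_02` and `production_table` is `example_table`.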

The Wide Table Structure

By the time rate selection runs, each ROID row has dozens of populated or NULL columns — one per source, method, and gross charge variant.

Phase	Column group	Example columns
Raw — Payer MRF	One per negotiated_type	`payer_negotiated_rate`, `payer_fee_schedule_rate`, `payer_percentage_rate`
Raw — Hospital MRF	One per contract_methodology × amount type	`hospital_fee_schedule_dollar`, `hospital_pct_of_total_billed_charges_pct`, `hospital_per_diem_rate`
Transformations — Pct-to-Dollar	6 rate types × 6 gross charge sources	`payer_gc_hosp_perc_to_dol`, `hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol`
Transformations — Drug / Anesthesia	Drug dosage methods; anesthesia per negotiated_type	`drug_dosage_std_dollar`, `payer_negotiated_rate_anesthesia_cf`
Imputations	One per imputation tier	`imputed_rate`, `imputed_rate_rc`, `imputed_rate_cstm`

Rate Selection: One Winner per ROID

Rate selection scans all scored columns for a ROID and picks the one with the highest accuracy score. The winner is written to canonical_rate, and the selection is recorded in canonical_rate_source and canonical_rate_subversion for full traceability.

ROIDs where every column scored 0 (no data at all) get canonical_rate = NULL. They still appear in the output — the ROS row is preserved — with NULL rate columns indicating a genuine coverage gap.
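A minimal sketch of the selection rule, assuming rates and scores arrive as per-column dictionaries for a single ROID. The `select_canonical` function is hypothetical. Tie-breaking is not specified on this page, so this sketch keeps the first column encountered; `canonical_rate_subversion` is omitted because its derivation is not described here.

```python
def select_canonical(rates, scores):
    """Pick the non-NULL rate column with the highest accuracy score.

    rates:  column name -> rate, or None for NULL
    scores: column name -> accuracy score (missing or 0 means no usable data)
    """
    best_col, best_score = None, 0
    for col, rate in rates.items():
        score = scores.get(col, 0)
        # Strict '>' keeps the earlier column on ties (an assumption).
        if rate is not None and score > best_score:
            best_col, best_score = col, score

    if best_col is None:
        # Coverage gap: the row is kept, but the canonical rate is NULL.
        return {"canonical_rate": None, "canonical_rate_source": None,
                "canonical_rate_score": 0}
    return {"canonical_rate": rates[best_col],
            "canonical_rate_source": best_col,
            "canonical_rate_score": best_score}


# Columns and scores from the worked example earlier on this page.
example = select_canonical(
    rates={
        "payer_negotiated_rate": 18000,
        "hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol": 36400,
    },
    scores={
        "payer_negotiated_rate": 7,
        "hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol": 4,
    },
)
gap = select_canonical(rates={"payer_negotiated_rate": None}, scores={})
```

`example` selects `payer_negotiated_rate` at 18000 with score 7, and `gap` returns a NULL canonical rate while still producing a result for the row.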

Lookback Runs

The orchestrator processes one sub-version (month of data) per sub-DAG run. When it processes historical months — any month that is not the most recent in the run list — that run is flagged as a lookback run.

is_max_sub_version = sub_version == max_sub_version
lookback_run = not is_max_sub_version

Lookback runs execute a lighter version of the pipeline to reduce cost and runtime:

Stage	Normal run	Lookback run
Provider spine	All configured provider types	Excludes ASC, Physician Group, Dialysis, DME, Urgent Care
Network spine	All network types, including Narrow and Exchange	Skips Narrow and Exchange network mappings
Everything else	Full pipeline	Identical

The rationale: historical months are generally stable — their raw data doesn't change, and the network/provider coverage for those months is already captured. Lookback runs refresh the pricing calculations without rebuilding the full scope.
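The lookback flag and the scope reductions in the table above could be combined along these lines. This is a sketch under assumptions: sub-versions are zero-padded `YYYY_MM` strings (so string comparison orders them chronologically), `plan_run` is a hypothetical helper, and the "Hospital" and "PPO" entries are illustrative values rather than a configured list.

```python
# Scope trimmed on lookback runs, taken from the table above.
LOOKBACK_EXCLUDED_PROVIDER_TYPES = {
    "ASC", "Physician Group", "Dialysis", "DME", "Urgent Care",
}
LOOKBACK_SKIPPED_NETWORK_TYPES = {"Narrow", "Exchange"}


def plan_run(sub_version, sub_versions, provider_types, network_types):
    """Decide whether a sub-version is a lookback run and trim its scope."""
    max_sub_version = max(sub_versions)
    is_max_sub_version = sub_version == max_sub_version
    lookback_run = not is_max_sub_version

    if lookback_run:
        provider_types = [p for p in provider_types
                          if p not in LOOKBACK_EXCLUDED_PROVIDER_TYPES]
        network_types = [n for n in network_types
                         if n not in LOOKBACK_SKIPPED_NETWORK_TYPES]
    return {
        "lookback_run": lookback_run,
        "provider_types": list(provider_types),
        "network_types": list(network_types),
    }


run_list = ["2025_12", "2026_01", "2026_02"]
providers = ["Hospital", "ASC", "Dialysis"]  # illustrative values
networks = ["PPO", "Narrow", "Exchange"]     # illustrative values

historical = plan_run("2026_01", run_list, providers, networks)
latest = plan_run("2026_02", run_list, providers, networks)
```

With this run list, `historical` is a lookback run limited to `Hospital` providers and `PPO` networks, while `latest` keeps the full configured scope.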
