Version: 3.0

Pipeline Architecture

Clear Rates builds a wide table, one row per ROID, by progressively adding columns across phases. Rate selection then scans those columns — guided by accuracy scores — to pick a single canonical rate per ROID.

The Core Idea: Rows from ROS, Columns from Everything Else

The Rate Object Space (ROS) defines the row set — every valid (payer × network × provider × code) combination Clear Rates will try to price. Every downstream phase joins against the ROS by ROID and contributes new columns. No phase changes the row count.
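The "rows from ROS, columns from everything else" rule can be sketched as a left join keyed on ROID. This is a minimal illustration, not the pipeline's implementation: the in-memory dictionaries, the integer ROIDs, and the add_columns helper are all hypothetical stand-ins for the real tables and joins.

```python
# Hypothetical in-memory stand-ins for pipeline tables.
ros = [{"roid": 1}, {"roid": 2}]  # the row set, fixed by the ROS
payer_mrf = {1: {"payer_negotiated_rate": 18000}}
hospital_mrf = {1: {"hospital_pct_of_total_billed_charges_pct": 130}}


def add_columns(rows, source, columns):
    """Left-join `source` onto `rows` by ROID.

    Every input row is kept; ROIDs with no match get None (NULL)
    in the new columns, so the row count never changes.
    """
    out = []
    for row in rows:
        match = source.get(row["roid"], {})
        new_row = dict(row)
        for col in columns:
            new_row[col] = match.get(col)
        out.append(new_row)
    return out


wide = add_columns(ros, payer_mrf, ["payer_negotiated_rate"])
wide = add_columns(wide, hospital_mrf, ["hospital_pct_of_total_billed_charges_pct"])
```

After both joins, wide still has exactly two rows. ROID 2 carries None in both new columns, which is how a source with no data shows up in the wide table.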

How a single ROID accumulates data across phases
Stage 1 — Rate Object Space: A ROID is minted for (UHC, Choice Plus PPO, Mass General, MS-DRG 470, Inpatient). This row now exists in the pipeline. All downstream columns start as NULL.
Stage 2 — Raw Data: UHC's payer MRF reports a negotiated rate of $18,000 for this combination. The hospital MRF reports 130% of billed charges. Both land as separate columns: payer_negotiated_rate = 18000, hospital_pct_of_total_billed_charges_pct = 130. Komodo has no data — those columns stay NULL.
Stage 3 — Transformations: The 130% figure is resolved against Mass General's gross charge for MS-DRG 470 ($28,000): hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol = 0.01 × 130 × 28000 = $36,400. The negotiated $18,000 is already in dollars.
Stage 4 — Imputations: An imputed rate is computed for this ROID regardless — imputations always run. It will only become the canonical rate if no raw or transformed rate with a sufficient accuracy score exists.
Stage 5 — Accuracy: Each non-NULL rate column gets a score. The payer negotiated rate ($18,000) is validated against the hospital MRF — accuracy score = 7. The pct-to-dollar transform ($36,400) is not outlier-validated — accuracy score = 4.
Stage 6 — Rate Selection: The highest-scored non-NULL column wins: payer_negotiated_rate ($18,000, score 7) becomes canonical_rate = 18000 with canonical_rate_score = 7.
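The Stage 3 arithmetic can be written as a small function. This is a sketch under assumptions: the name pct_to_dollar and the NULL handling (returning None when either input is missing) are illustrative and not taken from the pipeline code.

```python
def pct_to_dollar(pct, gross_charge):
    """Resolve a percent-of-billed-charges rate against a gross charge.

    Equivalent to 0.01 * pct * gross_charge. Returns None when either
    input is missing (an assumption of this sketch).
    """
    if pct is None or gross_charge is None:
        return None
    return pct * gross_charge / 100


# Stage 3 example: 130% of a $28,000 gross charge.
dollars = pct_to_dollar(130, 28000)
```

With the Stage 3 inputs, dollars evaluates to 36400.0, matching the $36,400 figure above.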

Pipeline Tables

Each phase reads from and writes to a set of named tables. Sub-version tables include a date suffix (e.g. _2026_02) and are prefixed tmp_. Final production tables have no suffix.
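The naming convention can be illustrated with a small helper. The function name, the base table name used in the example, and the (year, month) tuple format are hypothetical; only the tmp_ prefix and the _YYYY_MM suffix pattern come from the description above.

```python
def table_name(base, sub_version=None):
    """Build a table name following the convention described above.

    `sub_version` is a (year, month) tuple for sub-version tables,
    or None for the final production table.
    """
    if sub_version is None:
        return base  # production tables carry no prefix or suffix
    year, month = sub_version
    return f"tmp_{base}_{year:04d}_{month:02d}"


sub_version_table = table_name("example_table", (2026, 2))
production_table = table_name("example_table")
```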

The Wide Table Structure

By the time rate selection runs, each ROID row has dozens of populated or NULL columns — one per source, method, and gross charge variant.

| Phase | Column group | Example columns |
| --- | --- | --- |
| Raw — Payer MRF | One per negotiated_type | payer_negotiated_rate, payer_fee_schedule_rate, payer_percentage_rate |
| Raw — Hospital MRF | One per contract_methodology × amount type | hospital_fee_schedule_dollar, hospital_pct_of_total_billed_charges_pct, hospital_per_diem_rate |
| Transformations — Pct-to-Dollar | 6 rate types × 6 gross charge sources | payer_gc_hosp_perc_to_dol, hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol |
| Transformations — Drug / Anesthesia | Drug dosage methods; anesthesia per negotiated_type | drug_dosage_std_dollar, payer_negotiated_rate_anesthesia_cf |
| Imputations | One per imputation tier | imputed_rate, imputed_rate_rc, imputed_rate_cstm |

Rate Selection: One Winner per ROID

Rate selection scans all scored columns for a ROID and picks the one with the highest accuracy score. The winner is written to canonical_rate, and the selection is recorded in canonical_rate_source and canonical_rate_subversion for full traceability.

ROIDs where every column scored 0 (no data at all) get canonical_rate = NULL. They still appear in the output — the ROS row is preserved — with NULL rate columns indicating a genuine coverage gap.
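Rate selection for a single ROID can be sketched as a maximum over scored columns. This is an illustration only: the select_canonical function and the (rate, score) input shape are hypothetical, the tie-breaking rule is an assumption (the text above does not specify one), and canonical_rate_subversion is omitted for brevity.

```python
def select_canonical(scored_columns):
    """Pick the highest-scored non-NULL rate for one ROID.

    `scored_columns` maps column name -> (rate, accuracy_score).
    """
    candidates = [
        (name, rate, score)
        for name, (rate, score) in scored_columns.items()
        if rate is not None and score > 0
    ]
    if not candidates:
        # Coverage gap: the row is kept but no canonical rate is set.
        # Leaving score and source as None is an assumption of this sketch.
        return {
            "canonical_rate": None,
            "canonical_rate_score": None,
            "canonical_rate_source": None,
        }
    # Assumption: ties go to the first column encountered.
    name, rate, score = max(candidates, key=lambda c: c[2])
    return {
        "canonical_rate": rate,
        "canonical_rate_score": score,
        "canonical_rate_source": name,
    }


example = select_canonical({
    "payer_negotiated_rate": (18000, 7),
    "hospital_perc_of_total_billed_charges_gc_hosp_perc_to_dol": (36400, 4),
})
gap = select_canonical({"payer_negotiated_rate": (None, 0)})
```

For the worked example, the payer negotiated rate wins with a score of 7. The gap case returns a row with canonical_rate set to None instead of dropping the ROID.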

Lookback Runs

The orchestrator processes one sub-version (month of data) per sub-DAG run. When it processes historical months — any month that is not the most recent in the run list — that run is flagged as a lookback run.

is_max_sub_version = sub_version == max_sub_version
lookback_run = not is_max_sub_version

Lookback runs execute a lighter version of the pipeline to reduce cost and runtime:

| Stage | Normal run | Lookback run |
| --- | --- | --- |
| Provider spine | All configured provider types | Excludes ASC, Physician Group, Dialysis, DME, Urgent Care |
| Network spine | All network types, including Narrow and Exchange | Skips Narrow and Exchange network mappings |
| Everything else | Full pipeline | Identical |

The rationale: historical months are generally stable — their raw data doesn't change, and the network/provider coverage for those months is already captured. Lookback runs refresh the pricing calculations without rebuilding the full scope.
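The lookback flag and its effect on the provider spine can be sketched as follows. The function names and the "Hospital" provider type label are hypothetical, and the sketch assumes sub-versions are zero-padded YYYY_MM strings so that string comparison orders them chronologically.

```python
# Provider types dropped on lookback runs, per the table above.
LOOKBACK_EXCLUDED_PROVIDER_TYPES = {
    "ASC", "Physician Group", "Dialysis", "DME", "Urgent Care",
}


def plan_runs(sub_versions):
    """Map each sub-version to its lookback_run flag.

    Assumes zero-padded "YYYY_MM" strings, so max() picks the latest month.
    """
    max_sub_version = max(sub_versions)
    return {sv: sv != max_sub_version for sv in sub_versions}


def provider_types_for_run(configured, lookback_run):
    """Return the provider types to include in the provider spine."""
    if not lookback_run:
        return list(configured)
    return [t for t in configured if t not in LOOKBACK_EXCLUDED_PROVIDER_TYPES]


plan = plan_runs(["2025_12", "2026_01", "2026_02"])
lookback_types = provider_types_for_run(["Hospital", "ASC", "DME"], lookback_run=True)
normal_types = provider_types_for_run(["Hospital", "ASC", "DME"], lookback_run=False)
```

Only the most recent month ("2026_02" here) runs the full scope; the two earlier months are flagged as lookback runs and keep only the provider types outside the excluded set. The network spine filter for Narrow and Exchange would follow the same pattern.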