Version: 3.0

Stage 3: Raw Data Ingestion

Four data sources are ingested, mapped onto the Rate Object Space, and merged into a single combined raw table.

Four Data Sources

Source	What it contains	Coverage	Key challenge
Payer MRF	Negotiated rates filed by insurers (Transparency in Coverage), keyed by provider NPI and payer_data_network	All payer-network-provider combinations the payer reports	PG rates use group NPI or EIN → provider_id mapping; code crosswalks needed
Hospital MRF	Rates published by hospitals (Hospital Price Transparency), by plan name. Fee schedule, case rate, % billed, per diem	Hospital-specific rates	Plan names must be mapped to Clear Rates network_ids via Plan Bridge
Komodo allowables	Real-world allowed amounts from claims data at NPI × payer, ZIP3, state, national levels	Where Clear Rates has licensed Komodo data	Claims-based (what was actually paid), not the contracted rate
Gross charges	Hospital's undiscounted list price from chargemasters. MRF-reported and Komodo-derived, 6 fallbacks	All hospitals with public chargemasters	Used for pct-to-dollar transformations; Infinity values must be filtered before percentile calculations

Pipeline Flow

Payer MRF

core_rates → NPI→provider_id → bill type → code crosswalks (APR-DRG, APC, EAPG) → tmp_raw_mrf_payer_rates. 6 payer chunks in parallel. PG rates take a separate 3-step path via build_pg_rates_prebuild → build_payer_pg_mrf → apply_custom_pg_logic.

→ tmp_raw_mrf_payer_rates

Hospital MRF

Plan Bridge first (keyword matching, confidence scoring) → hospital_rates joined → network_id resolved → tmp_raw_mrf_hospital_rates. NDC drug codes crosswalked before hospital build. 3 provider chunks in parallel.

→ tmp_raw_mrf_hospital_rates

Supporting Sources

Gross charges (7 state chunks) + Komodo allowables + device gross charges run in parallel.

→ tmp_raw_gross_charges / tmp_raw_komodo

NDC Preprocessing

Runs once before the hospital pipeline. Normalizes NDC formats, maps to HCPCS via Medispan.

→ tmp_ref_ndc_crosswalk

Combined Raw

All four sources LEFT JOINed onto the Rate Object Space. Most ROIDs have NULL rates — they are gap candidates for imputation.

→ tmp_int_combined_raw

Validations + QA

Row count checks, uniqueness tests, non-blocking QA trigger.

→ —

What "Raw" Means

A rate in tmp_int_combined_raw has been:

Mapped to a ROID (payer + network + provider + code combination)
Placed in a source-specific column (e.g. payer_negotiated_rate, hospital_fee_schedule_dollar)

It has not been transformed, benchmarked, imputed, or scored. Most ROIDs have NULL values for most columns — that is expected.

The Wide Raw Table

tmp_int_combined_raw is partitioned by payer_id, network_type, and bill_type. It carries separate columns for each source and methodology:

Column group	Examples
Payer MRF	`payer_negotiated_rate`, `payer_fee_schedule_rate`, `payer_percentage_rate`
Hospital MRF	`hospital_fee_schedule_dollar`, `hospital_pct_of_total_billed_charges_pct`, `hospital_per_diem_rate`, `hospital_case_rate_dollar`
Komodo	`komodo_allowed_npi`, `komodo_allowed_zip3`, `komodo_allowed_state`, `komodo_allowed_national`
Gross charges	`gc_hosp`, `gc_hosp_cbsa`, `gc_hosp_state`, `gc_komodo`, `gc_komodo_cbsa`, `gc_komodo_state` (6 sources)

For any given ROID, most of these columns are NULL.

Chunking Pattern

Large source tables are split across parallel Airflow tasks to avoid single-task timeouts.

Compute chunks — get_payer_chunks() or get_ros_payer_chunks() groups payers or providers into N buckets
Parallel execution — each chunk runs as a separate Airflow task via .expand() (dynamic task mapping)
Union — a consolidation task merges all chunk outputs into the final table

Key gotcha: hospital_rates.payer_id is BIGINT

hospital_rates.payer_id is BIGINT while the rest of the pipeline uses VARCHAR. Always CAST(hr.payer_id AS VARCHAR) when joining hospital data to any Clear Rates table.

Four Data Sources​

Pipeline Flow​

What "Raw" Means​

The Wide Raw Table​

Chunking Pattern​

On this page:

Four Data Sources

Pipeline Flow

What "Raw" Means

The Wide Raw Table

Chunking Pattern