Version: 3.0

Stage 2: Rate Object Space (ROS)

The ROS is the complete set of valid (payer × network × provider × code) combinations Clear Rates will try to price. Every downstream phase joins against it.

Pipeline Flow

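Step 1: Geo-join payers to providers
INNER JOIN the network spine to the provider spine on geography. A network matches a provider when any one of these holds: the network is NATIONAL, the network's states overlap the provider's, the network's CBSAs overlap the provider's, or national_payer_coverage=True. Exchange networks are joined to Hospital providers only.
Output: tmp_ref_payer_provider
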
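The matching rule in Step 1 can be sketched as a predicate applied to every (network, provider) pair. This is a minimal illustration, not the production join: the Network and Provider classes, their field names (geography, states, cbsas, is_exchange) and the placement of national_payer_coverage on the network row are hypothetical stand-ins for the real spine columns.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Network:
    # Hypothetical network-spine row; column names are assumptions.
    payer_id: str
    network_id: str
    geography: str                      # "NATIONAL" or a regional label
    states: frozenset = frozenset()
    cbsas: frozenset = frozenset()
    national_payer_coverage: bool = False
    is_exchange: bool = False


@dataclass(frozen=True)
class Provider:
    # Hypothetical provider-spine row.
    provider_id: str
    provider_type: str                  # e.g. "Hospital"
    states: frozenset = frozenset()
    cbsas: frozenset = frozenset()


def geo_match(network: Network, provider: Provider) -> bool:
    # Exchange networks may only pair with Hospital providers.
    if network.is_exchange and provider.provider_type != "Hospital":
        return False
    # Any one of the four geographic conditions is enough.
    return (
        network.geography == "NATIONAL"
        or bool(network.states & provider.states)
        or bool(network.cbsas & provider.cbsas)
        or network.national_payer_coverage
    )


def join_payer_provider(
    networks: Iterable[Network], providers: Iterable[Provider]
) -> List[Tuple[str, str, str]]:
    providers = list(providers)
    # INNER JOIN semantics: only matching (payer, network, provider) rows are emitted.
    return [
        (n.payer_id, n.network_id, p.provider_id)
        for n in networks
        for p in providers
        if geo_match(n, p)
    ]
```

Because this is an inner join, a provider that no network matches simply never appears in tmp_ref_payer_provider, and therefore never appears in the ROS.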
Step 2: Join codes via plausibility
INNER JOIN code_plausibility_flattened on provider_type. PG and DME providers are subject to additional evidence filters from core_rates, which further reduce the scope.
Output: tmp_ref_code_plausibility_{sub_version}

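A minimal sketch of Step 2, assuming code_plausibility_flattened can be viewed as a mapping from provider_type to plausible codes and that the core_rates evidence filter reduces to a set of (provider_id, code) pairs. Both shapes, and the function name join_codes, are hypothetical; the real evidence filters may use different keys or conditions.

```python
from typing import Dict, List, Set, Tuple

# Provider types that need supporting evidence in core_rates (per the text above).
EVIDENCE_REQUIRED = {"PG", "DME"}


def join_codes(
    payer_provider: List[Tuple[str, str, str, str]],  # (payer_id, network_id, provider_id, provider_type)
    plausible_codes: Dict[str, List[str]],            # provider_type -> codes (stand-in for code_plausibility_flattened)
    evidence: Set[Tuple[str, str]],                   # (provider_id, code) seen in core_rates (assumed granularity)
) -> List[Tuple[str, str, str, str]]:
    rows = []
    for payer_id, network_id, provider_id, provider_type in payer_provider:
        # INNER JOIN on provider_type: a type with no plausible codes yields no rows.
        for code in plausible_codes.get(provider_type, []):
            # PG and DME rows are kept only when core_rates shows supporting evidence.
            if provider_type in EVIDENCE_REQUIRED and (provider_id, code) not in evidence:
                continue
            rows.append((payer_id, network_id, provider_id, code))
    return rows
```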
Step 3: Hash each row → ROID
SHA256 over a JSON array of 9 fields; the first 12 bytes of the digest become a 24-character hex ROID. The ROID is stable across re-runs as long as the 9 fields are unchanged.
Output: tmp_rate_object_space

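The ROID derivation in Step 3 can be sketched with the standard library. The 9 hashed fields are not listed on this page, so the values in any call are placeholders, and the JSON serialization settings (compact separators, default key handling) are an assumption: a different serialization yields different bytes and therefore different ROIDs, so this sketch will not reproduce production values.

```python
import hashlib
import json
from typing import Any, Sequence


def make_roid(fields: Sequence[Any]) -> str:
    if len(fields) != 9:
        raise ValueError("expected exactly 9 fields")
    # Serialize the fields as a JSON array (separator choice is an assumption).
    payload = json.dumps(list(fields), separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    # Keep the first 12 bytes: 12 bytes * 2 hex chars per byte = 24 characters.
    return digest[:12].hex()
```

Truncating to 12 bytes (96 bits) keeps the ID short while leaving collisions very unlikely at ROS scale, and because the hash depends only on the field values, re-running the pipeline on unchanged inputs reproduces the same ROIDs.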
Step 4: CAR-T + code filter
MS-DRG 018 is restricted to designated CAR-T centers. An optional filter_codes parameter further restricts the ROS for dev/test runs.
Output: tmp_rate_object_space (filtered)

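Step 4 amounts to two row filters. The sketch below assumes each ROS row carries provider_id, code_type and code keys, that MS-DRG 018 is identified by the pair ("MS-DRG", "018"), and that filter_codes is a plain collection of code strings; all of these representations are hypothetical.

```python
from typing import Collection, Dict, List, Optional

# Hypothetical representation of MS-DRG 018 as (code_type, code).
CAR_T_DRG = ("MS-DRG", "018")


def apply_cart_and_code_filter(
    rows: List[Dict[str, str]],
    car_t_centers: Collection[str],
    filter_codes: Optional[Collection[str]] = None,
) -> List[Dict[str, str]]:
    kept = []
    for row in rows:
        # MS-DRG 018 survives only at designated CAR-T centers.
        if (row["code_type"], row["code"]) == CAR_T_DRG and row["provider_id"] not in car_t_centers:
            continue
        # Optional dev/test restriction to an explicit list of codes.
        if filter_codes is not None and row["code"] not in filter_codes:
            continue
        kept.append(row)
    return kept
```

Leaving filter_codes as None applies only the CAR-T rule, which is the behavior expected for a full run.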
Step 5: ROS Validations (blocking)
Count checks per dimension (payer, network, provider, code), plus a check on bill_type values. Any failure halts the pipeline before raw data is ingested.
Output: none (validation assertions only)

Key Concepts

The ROS as the Pipeline Skeleton

The ROS defines the "question set" for the entire Clear Rates run. Every ROID represents one pricing question: what is the negotiated rate for this payer × network × provider × code combination?

  • Whether a specific (payer, provider, code) combination is in scope can be answered by querying tmp_rate_object_space
  • ROIDs with missing rates are gap candidates for imputation — they are not dropped from the output
  • Final output row count is driven entirely by the ROS, not by how many raw rates were collected

Every downstream phase (raw data, transformations, imputations, accuracy) LEFT JOINs against the ROS by ROID. No phase changes the row count. Phases only add columns.
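The row-count guarantee follows from the join type. In this minimal sketch, each phase is modeled as a mapping from ROID to a single value; that one-value-per-ROID shape and the function name left_join_phase are assumptions, not the pipeline's actual interface. A LEFT JOIN keeps every ROS row, fills missing values with None (the gap candidates for imputation) and ignores values whose ROID is not in the ROS.

```python
from typing import Any, Dict, List, Mapping


def left_join_phase(
    ros_rows: List[Dict[str, Any]],
    phase_values: Mapping[str, Any],
    column: str,
) -> List[Dict[str, Any]]:
    # Every ROS row is kept, in order; a ROID with no value gets None (a gap candidate).
    # Keys in phase_values that are not ROIDs in the ROS are silently ignored.
    return [{**row, column: phase_values.get(row["roid"])} for row in ros_rows]
```

Chaining phases, for example raw rates first and imputed rates second, only widens the rows: the output length always equals the number of ROIDs, no matter how many raw rates were collected.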

QA Validations (Blocking)

Five checks run after the ROS is built: four count checks and one bill_type check. Any failure halts the DAG:

  1. Provider count within X% of prior run
  2. Payer count within X% of prior run
  3. Network count within X% of prior run
  4. Code count within X% of prior run
  5. Only expected bill_type values present (Inpatient / Outpatient / Professional)
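The five blocking checks can be sketched as follows. The threshold X is not stated on this page, so it is left as a required max_pct_change argument rather than filled in. Treating "within X%" as a symmetric bound on the relative change (a jump fails as well as a drop), handling a prior count of zero, and the names ros_failures and validate_ros are all assumptions of this sketch.

```python
from typing import Dict, Iterable, List

EXPECTED_BILL_TYPES = {"Inpatient", "Outpatient", "Professional"}
DIMENSIONS = ("provider", "payer", "network", "code")


def ros_failures(
    current: Dict[str, int],
    prior: Dict[str, int],
    bill_types: Iterable[str],
    max_pct_change: float,
) -> List[str]:
    failures: List[str] = []
    for dim in DIMENSIONS:
        cur, prev = current[dim], prior[dim]
        # Symmetric check: a drop or a jump beyond the threshold both fail.
        if prev == 0:
            ok = cur == 0
        else:
            ok = abs(cur - prev) / prev * 100 <= max_pct_change
        if not ok:
            failures.append(f"{dim}: {cur} vs prior {prev}")
    # Any bill_type outside the expected three is a failure.
    unexpected = sorted(set(bill_types) - EXPECTED_BILL_TYPES)
    if unexpected:
        failures.append(f"unexpected bill_type values: {unexpected}")
    return failures


def validate_ros(
    current: Dict[str, int],
    prior: Dict[str, int],
    bill_types: Iterable[str],
    max_pct_change: float,
) -> None:
    failures = ros_failures(current, prior, bill_types, max_pct_change)
    if failures:
        # Blocking: raising here stops the run before raw data is ingested.
        raise RuntimeError("ROS validation failed: " + "; ".join(failures))
```

Collecting every failure before raising reports all dimensions that drifted in a single run, which makes it easier to tell a broad spine issue from a single-dimension change.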

A non-blocking QA trigger also fires to the core_licensable_data_qa DAG for downstream quality reporting.

Note

The ROS validation thresholds are intentionally conservative: a 5% drop in provider count can indicate a spine data issue that would silently reduce output coverage. Always investigate ROS validation failures before bypassing them.