Stage 2: Rate Object Space (ROS)
The ROS is the complete set of valid (payer × network × provider × code) combinations Clear Rates will try to price. Every downstream phase joins against it.
Pipeline Flow
Key Concepts
The ROS as the Pipeline Skeleton
The ROS defines the "question set" for the entire Clear Rates run. Every ROID represents one pricing question: what is the negotiated rate for this payer × network × provider × code combination?
- Coverage for a specific (payer, provider, code) combination can be answered by querying tmp_rate_object_space
- ROIDs with missing rates are gap candidates for imputation; they are not dropped from the output
- Final output row count is driven entirely by the ROS, not by how many raw rates were collected
Every downstream phase (raw data, transformations, imputations, accuracy) LEFT JOINs against the ROS by ROID. No phase changes the row count. Phases only add columns.
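The row-count guarantee described above can be illustrated with a minimal sketch using an in-memory SQLite database. Only the table name tmp_rate_object_space comes from this page; the column names, the raw_rates table, and the sample rows are hypothetical. The sketch also assumes each phase table holds at most one row per ROID, which is the condition under which a LEFT JOIN cannot change the row count.

```python
import sqlite3

# Hypothetical schema and data: only the table name tmp_rate_object_space
# is taken from the documentation; everything else is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp_rate_object_space (
    roid INTEGER PRIMARY KEY,
    payer TEXT, network TEXT, provider TEXT, code TEXT
);
-- A stand-in for one downstream phase; one row per ROID at most.
CREATE TABLE raw_rates (roid INTEGER PRIMARY KEY, negotiated_rate REAL);
INSERT INTO tmp_rate_object_space VALUES
    (1, 'PayerA', 'PPO', 'Hosp1', '99213'),
    (2, 'PayerA', 'PPO', 'Hosp1', '99214'),
    (3, 'PayerB', 'HMO', 'Hosp2', '99213');
INSERT INTO raw_rates VALUES (1, 85.0), (3, 92.5);
""")

ros_count = conn.execute(
    "SELECT COUNT(*) FROM tmp_rate_object_space"
).fetchone()[0]

# The phase LEFT JOINs against the ROS by ROID: it adds a column,
# and every ROS row survives even when no rate was collected.
joined = conn.execute("""
    SELECT ros.roid, ros.payer, ros.provider, ros.code, r.negotiated_rate
    FROM tmp_rate_object_space AS ros
    LEFT JOIN raw_rates AS r ON r.roid = ros.roid
    ORDER BY ros.roid
""").fetchall()

# ROIDs whose rate is NULL are the gap candidates for imputation.
gap_roids = [row[0] for row in joined if row[4] is None]
```

Here `len(joined)` equals `ros_count`, and `gap_roids` contains the one ROID with no collected rate. If a phase table could hold several rows per ROID, the join would fan out, so a uniqueness check on ROID before joining is a reasonable safeguard.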
QA Validations (Blocking)
Five checks (four count comparisons and one value check) run after the ROS is built. Any failure halts the DAG:
- Provider count within X% of prior run
- Payer count within X% of prior run
- Network count within X% of prior run
- Code count within X% of prior run
- Only expected bill_type values present (Inpatient / Outpatient / Professional)
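The checks listed above could be sketched roughly as follows. This is not the pipeline's actual implementation: the function names, the dictionary layout, and the symmetric interpretation of "within X%" are assumptions, and the threshold is left as a parameter because this page does not state the value of X.

```python
# Hypothetical sketch of the five blocking checks; names are illustrative.
ALLOWED_BILL_TYPES = {"Inpatient", "Outpatient", "Professional"}
COUNT_DIMENSIONS = ("provider", "payer", "network", "code")


def within_pct(current: int, prior: int, pct: float) -> bool:
    """Return True if current differs from prior by at most pct percent.

    Assumes the tolerance is symmetric (growth and shrinkage both count).
    """
    if prior == 0:
        return current == 0
    return abs(current - prior) / prior * 100 <= pct


def run_ros_checks(current: dict, prior: dict, bill_types: set, pct: float) -> list:
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []
    for dim in COUNT_DIMENSIONS:
        if not within_pct(current[dim], prior[dim], pct):
            failures.append(
                f"{dim} count {current[dim]} vs prior {prior[dim]} exceeds {pct}%"
            )
    unexpected = set(bill_types) - ALLOWED_BILL_TYPES
    if unexpected:
        failures.append(f"unexpected bill_type values: {sorted(unexpected)}")
    return failures


# Illustrative numbers only; pct=5.0 is an example value, not the real X.
prior = {"provider": 1000, "payer": 50, "network": 120, "code": 8000}
current = {"provider": 940, "payer": 50, "network": 121, "code": 8000}
failures = run_ros_checks(current, prior, {"Inpatient", "Outpatient"}, pct=5.0)
```

In this example only the provider check fails (a 6% drop against a 5% tolerance). In a blocking setup, a non-empty `failures` list would be turned into a raised exception so the DAG halts at this task.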
A non-blocking QA trigger also fires to the core_licensable_data_qa DAG for downstream quality reporting.
The ROS validation thresholds are intentionally conservative. A 5% drop in provider count can indicate a spine data issue that will silently reduce output coverage. Always investigate ROS validation failures before bypassing.