Skip to main content
Version: 3.0

Spines

Spines are Stage 1 of the Clear Rates pipeline. They build canonical reference tables for payers, networks, providers, and billing codes — the four dimensions that define the Rate Object Space. Every downstream stage joins against spine tables to resolve identifiers and scope what gets priced.

A payer does not exist in Clear Rates until it appears in the payer spine. A provider type is not priced until its NPIs are enrolled in the provider spine. The spines collectively define the boundaries of Clear Rates's universe.

Pipeline Flow

1
Build Payer + Code + Network Spines (parallel)
Three independent spines built in parallel. Payer spine reads a hardcoded Jinja2 list (~90 payers). Code spine unions 11 billing code sources. Network spine builds from network_mappings.py via 7 SQL files.
spines_payer / spines_code / spines_network
2
Plausibility Tables
asc_plausibility and dme_plausibility cross-reference ASC and DME code sets against provider types. code_plausibility joins all three spines to produce valid (code, provider_type) combinations for the ROS.
asc_plausibility / dme_plausibility / code_plausibility
3
Build Provider Spine
Reads per-type source tables from tq_production.spines. Aggregates NPIs into arrays, assigns internal provider_id. Conditionally includes provider types based on lookback_run param.
spines_provider
4
Provider Spine Manual Additions
Appends manually curated provider rows — edge cases not captured by automated source tables (e.g., recently opened facilities, corrections).
spines_provider (updated)
5
Spine Validations (blocking)
Runs a suite of data quality checks across all spine tables. Any failure blocks the DAG — the Rate Object Space cannot be built on bad spine data.
(validation assertions — no output table)

Dependency Graph

Spine task graph
Parallel first wave: payer_spine, code_spine, and network_spine run simultaneously — they have no mutual dependencies.
Plausibility depends on all three: asc_plausibility and code_plausibility wait for all three spines before running, even though technically only code_spine is needed. The actual Airflow edge is broader than the comment in the SQL suggests.
Provider spine is last: provider_spine waits for code_plausibility and dme_plausibility before running, then manual_additions appends to it.
Validations gate the ROS: spine_validations must pass before rate_object_space is built.
payer_spine ─┐ code_spine ─┤─→ asc_plausibility ─→ code_plausibility ─┐ network_spine─┘ │ └──────────────────────→ dme_plausibility ──┤ ↓ provider_spine ↓ manual_additions ↓ spine_validations (blocking) ↓ rate_object_space

Key Concepts

Spines gate scope — adding a payer to payer.sql makes its rates flow through the full pipeline. Removing a payer removes all its ROIDs from the output. The spine files are the primary way Clear Rates's scope is managed.

Provider ID vs NPIprovider_id is Clear Rates's internal identifier assigned in the provider spine. Each provider_id corresponds to one or more NPIs stored as an ARRAY. Core rates joins against this array, not a scalar NPI.

bypass_validations — a DAG param that skips all validation task groups. Defaults to False. Never bypass in production without explicit approval — failing validations indicate real data problems that will propagate downstream.

lookback_run — when True, the provider spine excludes ASC, PG, Dialysis, DME, and Urgent Care provider types. Only Hospital providers are included. Used for historical backfill runs where non-hospital coverage is not needed.

Check the actual task graph, not the comments

The sub-DAG code has a comment "asc_plausibility only needs code_spines" — but the actual Airflow edge waits for all 3 spines. Always check the actual task graph in __init__.py over comments in individual SQL files.