Version: 3.0

Spines

Build canonical reference tables defining which payers, networks, providers, and codes are in scope. All downstream phases join against these tables to place rates on the Rate Object Space.

Overview

Build sequence:

Build Payer + Code + Network spines (parallel) — Payer spine from hardcoded list; code spine from 11-source UNION ALL; network spine from network_mappings.py (7 steps) → tmp_ref_payer_spine, tmp_ref_code_spine, tmp_ref_network_spine
Build Provider Spine — Waits for payer + code spines. Pulls from tq_production.spines.spines_provider_{type} tables. Filtered by provider_types param. → tmp_ref_provider_spine
Plausibility Tables — ASC, Code (PG), and DME plausibility built from core_rates evidence. No validations — silently feed the Rate Object Space. → tmp_ref_asc_plausibility, tmp_ref_code_plausibility, tmp_ref_dme_plausibility
Provider Spine Manual Additions — Add DPR infusion centers, spine-matched PGs, and synthetic PGs from pg_identifiers.sql → tmp_ref_provider_spine (updated)
Spine Validations (blocking) — Uniqueness and completeness tests. bypass_validations=True skips this. Failures block the run.

Airflow dependency graph:

payer_spine  ─┐
code_spine   ─┤─→ asc_plausibility ─→ code_plausibility ─┐
network_spine─┘                                            │
              └──────────────────────→ dme_plausibility ──┤
                                                           ↓
                                                    provider_spine
                                                           ↓
                                                  manual_additions
                                                           ↓
                                                  spine_validations (blocking)
                                                           ↓
                                                   rate_object_space
                                                           ↓
                                                    ros_validations (blocking)
                                                           ↓
                                                 trigger_qa (non-blocking)

warning

The sub-DAG code has a comment "asc_plausibility only needs code_spines" — but the actual Airflow edge waits for all 3 spines (payer + code + network). Always check the actual task graph in __init__.py over comments.

Key concepts:

Spines gate scope: Adding a payer to payer.sql makes its rates flow through the full pipeline. Removing a provider type from the provider_types param excludes that type from the entire run. Spines are where scope is defined.
Provider ID vs NPI: A provider_id is CLD's internal identifier. A single provider can have multiple NPIs. The spine aggregates NPIs into an array: npi = ['1234567890', '0987654321']. Payer MRF data (keyed by NPI) is joined via WHERE cr.provider_npi = ANY(ps.npi).
bypass_validations: When True, all validation task groups are skipped — useful during development or when validations are known to fail for expected reasons. Never bypass validations in production runs without approval.
lookback_run: When lookback_run=True, the provider spine excludes ASC, PG, Dialysis, DME, and Urgent Care. Lookback runs re-process historical months with updated payer MRF data only.

Payer Spine

Canonical list of payers CLD processes in this version. A hardcoded Jinja2 list defines the universe — add a payer ID here and its rates flow through the entire pipeline. (~90 payers)

How It Works

payer.sql builds the spine in 4 steps, all within a single SQL file:

Jinja2 set (pre-SQL): The {% set payer_ids = [...] %} block is a Jinja2 template directive — it runs before the SQL is sent to Trino. This expands to ~90 UNION ALL rows of SELECT id AS payer_payer_id.
payer_ids CTE: Materializes the list as SQL rows.
payers CTE: Maps each payer_payer_id to definitive_payer_id (Definitive Healthcare entity IDs) and blue_payer (BCBS flag). Both are hardcoded CASE/IN expressions — not derived from any lookup table.
Final SELECT: Joins to tq_production.spines.spines_payer to resolve payer_name. If a payer ID isn't in spines_payer, it drops silently.

Worked Example: Aetna (payer_id = '42')

Input: '42' appears in the Jinja2 payer_ids list
definitive_payer_id: ARRAY[575957, 585334, 575927, 575914, 579862, 575931, 553230, 575890, 575915, 579863, 552470] — 11 Definitive entities because Aetna operates under multiple legal entities
blue_payer: True — Aetna is in the BCBS IN list
payer_name: 'Aetna' — resolved from tq_production.spines.spines_payer via JOIN

Result

Final row: payer_id='42', payer_name='Aetna', definitive_payer_id=[575957,...], blue_payer=True

Worked Example: Multiplan (payer_id = '101')

definitive_payer_id: NULL — Multiplan is a network rental company, not tracked in Definitive Healthcare
blue_payer: False

Result

Final row: payer_id='101', payer_name='Multiplan', definitive_payer_id=NULL, blue_payer=False

Key Columns

Column	Type	Description
`payer_id`	VARCHAR	CLD's stable payer identifier. Cast from integer in the source. E.g., '42'=Aetna, '76'=Cigna, '643'=UHC, '169'=Humana.
`payer_name`	STRING	Human-readable name resolved from `tq_production.spines.spines_payer` via JOIN. Not stored in `payer.sql` itself.
`definitive_payer_id`	ARRAY[BIGINT]	Definitive Healthcare entity IDs for enrichment joins. NULL for ~10 payers. ARRAY because some payers (Aetna: 11, BCBS: 3, Kaiser: 2) span multiple entities.
`blue_payer`	BOOLEAN	True if this is a Blue Cross Blue Shield plan. Hardcoded IN list of ~45 payer IDs.

info

definitive_payer_id is an ARRAY because some payers operate under multiple Definitive Healthcare legal entities. When joining downstream (e.g., in whispers), use CROSS JOIN UNNEST(definitive_payer_id) AS t(def_id) to expand.

warning

payer_id is VARCHAR, not INTEGER. The source column is cast with CAST(payer_payer_id AS VARCHAR). When joining to other tables that store payer_id as bigint (e.g., hospital_rates.payer_id), cast explicitly: CAST(hospital_rates.payer_id AS VARCHAR) = payer_spine.payer_id.

Provider Spine

Enumerate all providers by type, with NPI arrays, geographic attributes, and taxonomy groupings. Built from the Spines DAG's per-type source tables (tq_production.spines.spines_provider_{type}).

How It Works

provider.sql has two layers of CTEs per provider type:

Array NPI CTE (e.g., hospital_array_npi): unions the primary NPI table with the additional NPIs table, then aggregates all NPIs into a sorted array with ARRAY_SORT(ARRAY_AGG(DISTINCT CAST(npi AS VARCHAR))).
Final provider CTE (e.g., hospitals): joins the spine table to the NPI array, adds geo columns (state, cbsa, zip, carrier_locality), taxonomy columns, and a hardcoded provider_type string.

The CTEs are assembled at the bottom with a Jinja2-conditional UNION ALL — only types in the provider_types parameter are included:

{% if 'Hospital' in provider_types %}        SELECT ... FROM hospitals
{% if 'ASC' in provider_types %}             UNION ALL SELECT ... FROM ascs
{% if 'Physician Group' in provider_types %}  UNION ALL SELECT ... FROM physician_groups
{% if 'Laboratory' in provider_types %}       UNION ALL SELECT ... FROM labs
{% if 'Imaging Center' in provider_types %}   UNION ALL SELECT ... FROM imaging_centers
{% if 'Dialysis Center' in provider_types %}  UNION ALL SELECT ... FROM dialysis_centers
{% if 'DME' in provider_types %}              UNION ALL SELECT ... FROM dme
{% if 'Urgent Care' in provider_types %}      UNION ALL SELECT ... FROM urgent_care

Worked Example: Lab Filtering

Stage 1 — core_rates filter: labs_in_core_rates joins labs to core_rates via NPI. Only labs that have actually appeared in payer MRF data pass through.
Stage 2 — national_payer_coverage override: Labs with national_payer_coverage = TRUE are included regardless — Quest/LabCorp have national contracts that may not show up in every payer's MRF.
Quest/LabCorp dedup: These chains have many locations sharing the same provider_id with different states. The spine disambiguates by appending state: provider_name = 'Quest Diagnostics CA', 'Quest Diagnostics TX', etc.

Result

Quest Diagnostics CA (payer_id=..., state='CA'), Quest Diagnostics TX (..., state='TX') — separate ROIDs per state.

carrier_locality column: Joined from REF_CMS_ZIP_CARRIER_LOCALITY by zip code. Used by the Benchmarks phase for Medicare Physician Fee Schedule (MPFS) pricing. MPFS rates vary by carrier locality.

Provider Types

Type	Source Table	Special Handling	Scale
Hospital	`spines_provider_hospitals` + `_additional_npis`	Subtypes: Critical Access, Children's, Rehab, etc. CCN (`medicare_provider_id`). Health system hierarchy.	~6,000
ASC	`spines_provider_ascs` + `_additional_npis`	AmSurg specialty matching. Used for ASC code plausibility.	~5,500
Physician Group	`spines_provider_physician_groups` + TINs	Associated TINs array. Health system membership. Code plausibility required.	~35,000
Laboratory	`spines_provider_labs` + `_additional_npis`	Filtered to `core_rates` presence OR `national_payer_coverage=True`. Quest/LabCorp deduplicated by state.	~500
Imaging Center	`spines_provider_imaging_centers` (no additional NPIs)	Can share codes with PG (infusion center overlap).	~3,000
Dialysis Center	`spines_provider_dialysis_centers` + `_additional_npis`	Added in v2.4.3. Excluded when `lookback_run=True`.	~7,000
DME	`spines_provider_dme` + `_additional_npis`	DME plausibility required from `core_rates` evidence. Excluded when `lookback_run=True`.	~2,000
Urgent Care	`spines_provider_urgent_care` + `_additional_npis`	Added in v2.4.3. Excluded when `lookback_run=True`.	~10,000

info

Default provider_types param: ['Hospital', 'ASC', 'Physician Group', 'Laboratory', 'Imaging Center', 'Dialysis Center', 'DME', 'Urgent Care']. When lookback_run=True, only Hospital is included.

warning

Key gotcha: The npi column is an ARRAY[VARCHAR], not a scalar. When joining from core_rates (which has a single NPI per row as a bigint), use WHERE cr.provider_npi = ANY(CAST(ps.npi AS ARRAY(BIGINT))) — both the ARRAY element type and the type cast matter.

Network Spine

All payer-network combinations: PPO, HMO, Exchange, Narrow, Medicare Advantage. Built in 7 sequential SQL steps from manually curated network mappings.

How It Works

All 7 SQL files share the same {% macro network_row(...) %} Jinja2 macro. Each call computes:

network_id: from_big_endian_64(xxhash64(CAST(payer_id || network_name AS varbinary))) — a deterministic 64-bit integer
network_state: An ARRAY of state strings, or a single-element ARRAY (many networks are national)
payer_data_network: An ARRAY of exact MRF plan name strings
network_class / network_type: Hardcoded per file (e.g., 'Commercial' / 'PPO')

The mappings come from network_mappings.py in the DAG config, passed as Airflow variables. The SQL files themselves contain no raw data — they're pure templates.

Worked Example: UHC Choice Plus PPO

network_mappings.py entry: payer_id='643', network_name='UHC Choice Plus PPO', payer_data_network=['CHOICE PLUS', 'CHOICE PLUS NATIONAL'], states=[NULL] (national)
network_id computed: from_big_endian_64(xxhash64(CAST('643UHC Choice Plus PPO' AS varbinary))) → same integer every run
payer_data_network: ARRAY['CHOICE PLUS', 'CHOICE PLUS NATIONAL']

Result

Spine row: network_id=X, network_name='UHC Choice Plus PPO', payer_id='643', payer_data_network=['CHOICE PLUS', 'CHOICE PLUS NATIONAL'], network_class='Commercial', network_type='PPO'

Where to make changes: When adding or renaming a network, update two places:

network_mappings.py — the single source of truth for network definitions (payer_id, network_name, payer_data_network strings). The network spine SQL is generated directly from this file.
variables_ppo.sql — ppo_payer_keywords — keyword strings used to match hospital MRF plan_name. These are matched against a normalized plan_name: REPLACE(REPLACE(UPPER(plan_name),'_',' '),' ',''). Write keywords ALLCAPS with NO SPACES (e.g., 'CHOICEPLUS', not 'Choice Plus').

warning

Silent failure mode: If the payer_data_network string in network_mappings.py doesn't match the MRF's plan_name (for payer MRF) or keywords don't match the normalized plan_name (for hospital MRF), no error is raised — rates simply aren't assigned to that network. Always verify with a research query on hospital_rates.plan_name after adding a network.

Network Types

SQL File	Network Type	`network_class`	Build Step	Notes
`network_base.sql`	PPO	Commercial	CREATE OR REPLACE	Creates the table; all other steps INSERT into it
`network_hmo.sql`	HMO	Commercial	INSERT INTO	Regional HMO plans (e.g., Kaiser CA)
`network_narrow.sql`	Narrow	Commercial	INSERT INTO	Narrow/tiered networks (smaller provider panels)
`network_exchange.sql`	Exchange	Exchange	INSERT INTO	ACA Marketplace plans — Hospital provider type only in ROS
`network_custom.sql`	Custom PPO + HMO	Commercial	INSERT INTO	Custom commercial networks not in standard PPO/HMO lists
`network_medicare_advantage.sql`	Medicare Advantage	Medicare Advantage	INSERT INTO	`network_class` AND `network_type` both = 'Medicare Advantage'
`network_dedup.sql`	—	—	CREATE OR REPLACE	`SELECT DISTINCT *` removes any duplicate rows from prior INSERTs

payer_data_network — the MRF join key:

The payer_data_network column is an ARRAY of exact plan name strings as they appear in payer MRF files. This is how the Raw Data phase assigns network_id to payer MRF rates.

Network spine row: network_name="UHC Choice Plus PPO", payer_data_network=['CHOICE PLUS', 'CHOICE PLUS NATIONAL']
Payer MRF rate: plan_name="CHOICE PLUS" for provider X, code 99213, rate $180
Join condition: hospital_rates.plan_name = ANY(network_spine.payer_data_network)

Result

Rate is assigned: (provider X, payer 643, network=UHC Choice Plus PPO, code 99213, bill_type=Professional) → ROID → $180

warning

payer_data_network is for payer MRF (core_rates) only — exact match. The join is hospital_rates.plan_name = ANY(payer_data_network) with no normalization. 'CHOICE PLUS' and 'Choice Plus' are different strings. Hospital MRF (hospital_rates) uses a separate keyword matching path with normalized plan_name (ALLCAPS, no spaces) — see ppo_payer_keywords in variables_ppo.sql.

Deterministic network_id:

Because the hash is deterministic, the same (payer_id, network_name) pair always produces the same network_id — even across different CLD versions, different months, or re-runs. This makes network_id a stable foreign key you can compare across versions.

warning

Stability cuts both ways. If you rename a network (e.g., "Aetna Choice PPO" → "Aetna Choice Plus"), the network_id changes. All historical rates keyed by the old network_id become orphaned. Rename networks only intentionally, and be aware of cross-version comparison implications.

info

You can compute network_id offline. SELECT from_big_endian_64(xxhash64(CAST('643' || 'UHC Choice Plus PPO' AS varbinary))) and use that value directly in your WHERE clause.

Code Spine

All billing codes in scope with type classification, bill type assignment, and code flags. Built from 11 source-specific codesets via a UNION ALL in code.sql.

How It Works

code.sql is structured in 3 layers:

Prep CTEs: Compute revenue, filter drug codes, build lab revenue ranking, pull SSP line codes.
all_codes CTE: The 11-source UNION ALL — assembles (bill_type, billing_code_type, billing_code, facility) from every source.
Final SELECT: Deduplicates all_codes, then LEFT JOINs in metadata: service_description, service_line, therapeutic_area, drug_name_array, is_surg_code, is_drug_code, is_lab_code, is_device_dependent.

facility flag — PG codes are duplicated:

Physician Group codes appear twice in the spine — once with facility=True and once with facility=False. This reflects the Medicare two-rate system:

facility=True: The "facility rate" — lower, because the facility covers overhead. Used when a PG physician sees a patient in a hospital.
facility=False: The "non-facility rate" — higher, covering physician's overhead. Used for office visits.

Result

Same HCPCS code 99214 appears twice: (Professional, HCPCS, 99214, facility=True) and (Professional, HCPCS, 99214, facility=False) — each gets its own ROID.

is_surg_code — 3 sources of truth:

A code is flagged surgical if it appears in ANY of:

OPG (Outpatient Procedure Grouper) schedules from UHC, Aetna, Cigna, or BCBS CA
HCPCS codes whose first digit is 1–6 (CMS numeric prefix convention for surgical procedures)
MS-DRGs marked as type = 'SURG' in CMS MS-DRG weights table

APR-DRG — the SOI suffix pattern:

Source: APR-DRG base codes (e.g., '302') from SPINES_SERVICES where service_type='APR-DRG'
CROSS JOIN: Each code is crossed with ['-1', '-2', '-3', '-4'] to produce 4 rows
Result: '302' → '302-1' (minor), '302-2' (moderate), '302-3' (major), '302-4' (extreme)

Result

~600 APR-DRG base codes × 4 SOI levels = ~2,400 APR-DRG rows in the code spine.

11 Code Sources

#	Source	Bill Type	Provider Types	Managed By
1	All MS-DRG codes (excluding transplant + CAR-T/ECMO)	Inpatient	Hospitals	sub-DAG (SPINES_SERVICES)
2	APR-DRG × SOI suffix (-1 to -4)	Inpatient	Hospitals (NY/NJ markets)	sub-DAG (SPINES_SERVICES)
3	OUTPATIENT_CODESET (top ~5K HCPCS by Medicare revenue)	Outpatient	Hospitals	cld-utils DAG (date-stamped table)
4	ASC_CODESET (ASC-payable procedure codes)	Outpatient	ASCs	cld-utils DAG (date-stamped table)
5	SSP line codes (Shoppable Service Packages)	Outpatient + Professional	Hospitals	sub-DAG (SSP_SUB_PACKAGES)
6	J1 HCPCS (OPPS status indicator J1)	Outpatient	Hospitals	sub-DAG (TMP_OPPS_MEDICARE_REFERENCE_TABLE)
7	Life Sciences drug codes (ASP + OPPS G/K)	Outpatient + Professional	All	sub-DAG (ASP_PRICING_TABLE, TMP_OPPS)
8a	PG codeset — `facility=True`	Professional	Physician Groups	cld-utils DAG (date-stamped table)
8b	PG codeset — `facility=False`	Professional	Physician Groups	cld-utils DAG (date-stamped table)
9	Lab codeset (Labs + PGs)	Professional	Labs + Hospitals + PGs	cld-utils DAG (LAB_HOSPITAL_AND_PGS_CODESET)
10	Top 750 lab codes by Komodo revenue	Professional	Labs	sub-DAG (CLAIMS_MEDICAL_SERVICE_LINES)
11	DME codeset	Professional	DME Suppliers	cld-utils DAG (date-stamped table)
—	Dialysis + Urgent Care codesets	Professional	Dialysis, Urgent Care	cld-utils DAG (date-stamped tables)

info

cld-utils vs sub-DAG: Sources 3, 4, 8, 9, 11 are date-stamped tables (e.g., outpatient_codeset_2026_03_24) built by the cld-utils DAG. After a cld-utils rebuild, update the pointer in params.py. Sources 1, 2, 5, 6, 7, 10 are computed fresh each run from live reference tables.

Code Flags

Flag	How it's set	Pipeline impact
`is_drug_code`	`billing_code` appears in `dpr_codes` (ASP pricing table UNION OPPS G/K status)	Drug dosage standardization in Transformations; ASP-based outlier bounds (0.8×–10× ASP); separate BRIT accuracy scoring path
`is_lab_code`	`billing_code` in `LAB_HOSPITAL_AND_PGS_CODESET` OR `CLINICAL_LAB_REFERENCE_PRICING`	Lab outlier bounds (0.2×–4.5× Medicare fee schedule); lab-specific BRIT accuracy
`is_surg_code`	In OPG schedules OR HCPCS prefix 1-6 OR MS-DRG `type='SURG'`	OPG-based imputations in Imputations phase
`is_device_dependent`	`billing_code` in `DEVICE_DEPENDENT_CODES` table	Signals that rate may include implantable device cost

warning

MS-DRG exclusions: Transplant codes (001–017, 019) and CAR-T/ECMO codes (650–652) are excluded — too specialized, too few covered lives. MS-DRG 018 (CAR-T) IS included but restricted to designated CAR-T treatment centers via plausibility filter.

info

Drug codes are inserted twice: Life Sciences codes appear as both Outpatient and Professional bill_type. PG drug codes (in dpr_codes) are EXCLUDED from source 8 (PG codeset) to avoid double-counting — they're covered by source 7.

Overview​

Payer Spine​

How It Works​

Key Columns​

Provider Spine​

How It Works​

Provider Types​

Network Spine​

How It Works​

Network Types​

Code Spine​

How It Works​

11 Code Sources​

Code Flags​

On this page:

Overview

Payer Spine

How It Works

Key Columns

Provider Spine

How It Works

Provider Types

Network Spine

How It Works

Network Types

Code Spine

How It Works

11 Code Sources

Code Flags