Skip to main content
Version: 2.2

4. Canonicalization

Canonicalization is the final step of the CLD sub-DAG pipeline where all processed rate data is combined into a unified table and the "best" rate is selected for each rate object. This process creates the authoritative version of each rate that will be used in the final production tables.

Combined Main Table Creation​

The canonicalization process operates through the combined_main.sql pipeline which brings together all data processing components:

Data Integration​

The combined table integrates multiple data streams:

-- Example structure from combined_main.sql
WITH combined_data AS (
SELECT
r.roid,
r.payer_id,
r.network_id,
r.provider_id,
r.billing_code,
r.billing_code_type,
-- Raw rates from MRF data
r.payer_negotiated_rate,
r.hospital_case_rate_dollar,
-- Transformed rates (percentages to dollars)
t.payer_gc_hosp_perc_to_dol,
t.hospital_per_diem_mult_alos,
-- Imputed rates
i.msdrg_base_rate,
i.opg_base_rate,
-- Benchmark rates
b.medicare_rate,
b.state_avg_medicare_rate,
-- Accuracy scores
a.validation_score_array
FROM rate_objects r
LEFT JOIN transformations t USING (roid)
LEFT JOIN imputations i USING (roid)
LEFT JOIN benchmarks b USING (roid)
LEFT JOIN accuracy a USING (roid)
)

Rate Array Construction​

All available rates for each rate object are organized into parallel arrays to enable systematic comparison:

Core Rate Arrays​

-- Construct arrays of all available rates
rate_array = ARRAY[
NULL, -- Position 0 reserved
payer_negotiated_rate,
payer_fee_schedule_rate,
hospital_case_rate_dollar,
hospital_percent_of_total_billed_charges_dollar,
-- ... all raw rates
payer_gc_hosp_perc_to_dol,
hospital_per_diem_mult_alos,
-- ... all transformed rates
msdrg_base_rate,
opg_base_rate,
-- ... all imputed rates
]

rate_score_array = ARRAY[
NULL,
validation_score_1,
validation_score_2,
validation_score_3,
-- ... corresponding accuracy scores
]

Metadata Arrays​

-- Supporting metadata arrays
source_array = ARRAY[
NULL,
'payer',
'payer',
'hospital',
'hospital',
-- ... source classifications
]

rate_type_array = ARRAY[
NULL,
'raw: payer_negotiated_rate',
'raw: payer_fee_schedule_rate',
'transform: payer_gc_hosp_perc_to_dol',
'impute: msdrg_base_rate',
-- ... rate type classifications
]

methodology_array = ARRAY[
NULL,
'Fee Schedule',
'Case Rate',
'Per Diem',
-- ... contract methodologies
]

Canonical Rate Selection Algorithm​

The canonical rate selection uses array operations to systematically choose the best rate:

Primary Selection Logic​

-- Find the highest accuracy score
max_score = array_max(rate_score_array)

-- Get the position of the highest scoring rate
best_position = ARRAY_POSITION(rate_score_array, max_score)

-- Select the canonical rate and all its metadata
canonical_rate = rate_array[best_position]
canonical_rate_score = rate_score_array[best_position]
canonical_rate_source = source_array[best_position]
canonical_rate_type = rate_type_array[best_position]
canonical_contract_methodology = methodology_array[best_position]

Score Mapping and Validation​

-- Map validation scores to confidence scores (1-5 scale)
canonical_rate_score = CASE
WHEN FLOOR(max_score) = 7 THEN 5 -- Validated
WHEN FLOOR(max_score) = 6 THEN 4 -- Raw Posted Rate
WHEN FLOOR(max_score) = 5 THEN 3 -- Validated Transform/Impute
WHEN FLOOR(max_score) = 4 THEN 2 -- Unvalidated Transform/Impute
WHEN FLOOR(max_score) = 0 THEN 1 -- Outlier
ELSE 0 -- No Rate
END

-- Extract validation method from score mapping
canonical_rate_validation_method = CASE
WHEN FLOOR(max_score) = 7 THEN 'Benchmark Validated'
WHEN FLOOR(max_score) = 6 THEN 'Raw MRF Posted'
WHEN FLOOR(max_score) = 5 THEN 'Transform Validated'
WHEN FLOOR(max_score) = 4 THEN 'Transform Unvalidated'
WHEN FLOOR(max_score) = 0 THEN 'Outlier Flagged'
ELSE 'No Rate Available'
END

Derived Canonical Metrics​

Beyond the primary rate selection, the canonicalization process calculates important derived metrics:

Benchmark Comparisons​

-- Calculate rate as percentage of Medicare benchmarks
canonical_rate_percent_of_medicare =
CASE
WHEN exclude_medicare_calc = false AND medicare_rate > 0
THEN canonical_rate / medicare_rate
ELSE NULL
END

canonical_rate_percent_of_state_avg_medicare =
CASE
WHEN exclude_medicare_calc = false AND state_avg_medicare_rate > 0
THEN canonical_rate / state_avg_medicare_rate
ELSE NULL
END

Traceability Information​

-- Track original data sources for transformed/imputed rates
canonical_rate_original_billing_codes =
original_billing_codes_array[
ARRAY_POSITION(rate_score_array_no_impute, array_max(rate_score_array_no_impute))
]

canonical_crosswalk_method =
CASE
WHEN original_billing_code_type_array[best_position] IS NOT NULL
THEN 'Crosswalked From ' || original_billing_code_type || ' to ' || billing_code_type
END

-- Track gross charge sources for percentage-based rates
canonical_gross_charge_type =
CASE
WHEN rate_type LIKE '%gc_hosp_cbsa%' THEN 'cbsa_market_average'
WHEN rate_type LIKE '%gc_hosp%' THEN 'provider_specific'
WHEN rate_type LIKE '%gc_komodo%' THEN 'claims_based'
ELSE NULL
END

Output Structure​

See components/views/abridged.md for the final output structure of the canonicalized rate table.