Selection Algorithm
Find the maximum score, then use its array index to select the rate and all of its metadata.
Core Algorithm
1. Find best score
array_max(rate_score_array) returns the highest accuracy score across all candidate rates for this ROID.
2. Find best index
ARRAY_POSITION(rate_score_array, best_score) returns the 1-based position of the winning rate in the arrays.
3. Extract canonical fields
All canonical_* output fields are populated by indexing into the parallel arrays at best_idx: rate_array[best_idx], source_array[best_idx], rate_type_array[best_idx], etc.
→ tmp_int_combined_no_whisp
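The three steps can be sketched in Python as a rough mirror of the SQL logic. This is an illustration under stated assumptions, not the pipeline's actual code: the helper names array_position and select_canonical, the output key canonical_rate_type, and all sample values are hypothetical.

```python
from typing import Sequence


def array_position(values: Sequence[float], target: float) -> int:
    """Mimic SQL ARRAY_POSITION: 1-based index of the first match, 0 if absent."""
    for i, value in enumerate(values, start=1):
        if value == target:
            return i
    return 0


def select_canonical(rate_score_array, rate_array, source_array, rate_type_array):
    # Step 1: highest score across all candidate rates.
    best_score = max(rate_score_array)
    # Step 2: 1-based position of the winning rate.
    best_idx = array_position(rate_score_array, best_score)
    # Step 3: index the parallel arrays (Python lists are 0-based, so subtract 1).
    i = best_idx - 1
    return {
        "canonical_rate": rate_array[i],
        "canonical_rate_score": best_score,
        "canonical_rate_source": source_array[i],
        "canonical_rate_type": rate_type_array[i],  # assumed field name
        "best_idx": best_idx,
    }


# Hypothetical candidate rates for a single ROID.
row = select_canonical(
    rate_score_array=[3.2, 5.7, 4.1],
    rate_array=[100.0, 250.0, 180.0],
    source_array=["payer", "hospital", "payer"],
    rate_type_array=["negotiated_dollar", "negotiated_dollar", "gross_charge"],
)
```

Because every array is indexed with the same best_idx, the rate and its metadata always come from the same candidate.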
Special Rules
- Score = 7 → "payer_hospital": When the best score is 7.x, canonical_rate_source is always set to "payer_hospital" regardless of which specific column won. Score 7 means both sources independently agreed, so the source label reflects that bilateral agreement, not a single winner.
- Multiple best indices: best_payer_idx, best_hospital_idx, and best_idx_no_impute are tracked separately alongside best_idx. These support downstream analysis of what each source individually would have chosen.
- Gross charge type derivation: canonical_gross_charge_type is inferred from the rate_type column name; no separate lookup is needed.
- NULL rates: ROIDs where every column scores 0 get canonical_rate = NULL and canonical_rate_score = 0. The ROID row is preserved in the output: a NULL canonical rate represents a genuine coverage gap, not a processing error.
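Two of these rules, the score-7 source override and the NULL-rate case, can be sketched as a post-processing step in Python. This is a hedged illustration only: the function name apply_special_rules is hypothetical, the 7.x test is assumed to mean 7 <= score < 8, and leaving the source empty for NULL rates is an assumption the rules above do not specify.

```python
def apply_special_rules(best_score, rate, source):
    # NULL rates: every column scored 0, so keep the row but emit no rate.
    if best_score == 0:
        return {
            "canonical_rate": None,
            "canonical_rate_score": 0,
            "canonical_rate_source": None,  # assumption: not specified in the rules
        }
    # Score 7.x: both sources agreed, so the label reflects the agreement.
    if 7 <= best_score < 8:
        source = "payer_hospital"
    return {
        "canonical_rate": rate,
        "canonical_rate_score": best_score,
        "canonical_rate_source": source,
    }


# Hypothetical inputs: one bilateral-agreement winner, one coverage gap.
agreed = apply_special_rules(7.3, 250.0, "hospital")
gap = apply_special_rules(0, 0.0, "payer")
```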
Determinism
The algorithm is fully deterministic: given the same input arrays, it always produces the same canonical rate. The CDF and rate/1e8 tiebreakers add a decimal component to each score, so in practice no two candidate rates for a ROID end up with exactly the same score. If two scores were ever exactly equal, ARRAY_POSITION would return the first matching position, so the selection would still be reproducible.
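As a rough illustration of how a decimal component separates candidates in the same score tier, the Python sketch below assumes the score has the form tier + rate/1e8. That form is an assumption made for this example: the actual formula, including the CDF term, is not given here, and score_with_tiebreak and the sample rates are hypothetical.

```python
def score_with_tiebreak(tier: int, rate: float) -> float:
    # Assumed form: integer tier plus rate/1e8 as the decimal component.
    # The CDF tiebreaker is omitted because its formula is not given here.
    return tier + rate / 1e8


# Two hypothetical rates that land in the same tier.
scores = [score_with_tiebreak(7, 1250.0), score_with_tiebreak(7, 1300.0)]

# list.index returns the first match, like ARRAY_POSITION; add 1 for 1-based.
best_idx = scores.index(max(scores)) + 1

# Re-running the selection on the same inputs gives the same index every time.
repeat_idx = {scores.index(max(scores)) + 1 for _ in range(5)}
```

Under this assumed form, two rates in the same tier tie only if the rates themselves are equal, in which case either candidate yields the same canonical rate.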