Version: 3.0

Selection Algorithm

Find the maximum score, then use its array index to select the rate and all of its metadata.

Core Algorithm

  1. Find best score. array_max(rate_score_array) returns the highest accuracy score (best_score) across all candidate rates for this ROID.
  2. Find best index. ARRAY_POSITION(rate_score_array, best_score) returns the 1-based position of the first element equal to best_score. Because the arrays are parallel, this position (best_idx) identifies the winning rate in every array.
  3. Extract canonical fields. All canonical_* output fields are populated by indexing into the parallel arrays at best_idx: rate_array[best_idx], source_array[best_idx], rate_type_array[best_idx], and so on.
tmp_int_combined_no_whisp
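A minimal Python sketch of the three steps, assuming each ROID's candidates arrive as equal-length Python lists. The helper array_position is a hypothetical stand-in that mimics SQL ARRAY_POSITION (1-based, first match, 0 when absent). The output key canonical_rate_type and all sample values are illustrative, not taken from the actual pipeline, and the special rules are not applied here.

```python
from typing import Any, Dict, List, Sequence


def array_position(arr: Sequence[float], value: float) -> int:
    """Mimic SQL ARRAY_POSITION: 1-based index of the first match, 0 if absent."""
    for position, item in enumerate(arr, start=1):
        if item == value:
            return position
    return 0


def select_canonical(
    rate_score_array: List[float],
    rate_array: List[float],
    source_array: List[str],
    rate_type_array: List[str],
) -> Dict[str, Any]:
    # Step 1: highest accuracy score across all candidate rates for this ROID.
    best_score = max(rate_score_array)
    # Step 2: 1-based position of the first element equal to best_score.
    best_idx = array_position(rate_score_array, best_score)
    # Step 3: read every parallel array at the same position.
    # Python lists are 0-based, so shift the SQL-style index down by one.
    i = best_idx - 1
    return {
        "best_idx": best_idx,
        "canonical_rate_score": best_score,
        "canonical_rate": rate_array[i],
        "canonical_rate_source": source_array[i],
        "canonical_rate_type": rate_type_array[i],  # hypothetical field name
    }


result = select_canonical(
    rate_score_array=[3.41, 6.12, 5.9],
    rate_array=[1200.0, 1350.0, 1100.0],
    source_array=["payer", "hospital", "payer"],  # placeholder labels
    rate_type_array=["type_a", "type_b", "type_c"],  # placeholder values
)
print(result["best_idx"], result["canonical_rate"])  # 2 1350.0
```

With scores [3.41, 6.12, 5.9] the winner sits at position 2, so every canonical_* field is read from the second element of its array.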

Special Rules

  • Score 7 → "payer_hospital": When the best score is 7.x, canonical_rate_source is always set to "payer_hospital", regardless of which specific column won. A score of 7 means the payer and hospital sources agreed independently, so the source label records that bilateral agreement rather than a single winner.
  • Multiple best indices: best_payer_idx, best_hospital_idx, and best_idx_no_impute are tracked separately alongside best_idx. These support downstream analysis of what each source individually would have chosen.
  • Gross charge type derivation: canonical_gross_charge_type is inferred from the rate_type column name, so no separate lookup is needed.
  • NULL rates: ROIDs where every column scores 0 get canonical_rate = NULL and canonical_rate_score = 0. The ROID row is still preserved in the output; a NULL canonical rate represents a genuine coverage gap, not a processing error.
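The score-7 and NULL-rate rules can be layered onto the same selection. This sketch again assumes Python lists; apply_special_rules is a hypothetical helper, "payer" and "hospital" are placeholder source labels, and leaving canonical_rate_source as None for all-zero ROIDs is an assumption the rules above do not specify. The separately tracked best_payer_idx, best_hospital_idx, and best_idx_no_impute values and the gross charge type derivation are omitted.

```python
import math
from typing import Any, Dict, List, Optional


def apply_special_rules(
    rate_score_array: List[float],
    rate_array: List[Optional[float]],
    source_array: List[str],
) -> Dict[str, Any]:
    # NULL rates: every column scored 0, so there is no usable rate. The ROID
    # row is still returned, with a NULL (None) rate and a score of 0.
    if all(score == 0 for score in rate_score_array):
        # Assumption: the source label is also left NULL in this case.
        return {"canonical_rate": None, "canonical_rate_score": 0,
                "canonical_rate_source": None}

    best_score = max(rate_score_array)
    best_idx = rate_score_array.index(best_score)  # 0-based, first match
    source = source_array[best_idx]

    # Score 7.x: payer and hospital agreed independently, so the label records
    # that agreement instead of whichever column happened to win.
    if math.floor(best_score) == 7:
        source = "payer_hospital"

    return {"canonical_rate": rate_array[best_idx],
            "canonical_rate_score": best_score,
            "canonical_rate_source": source}


row = apply_special_rules([3.2, 7.45, 6.1], [100.0, 200.0, 300.0],
                          ["payer", "hospital", "payer"])
print(row["canonical_rate"], row["canonical_rate_source"])  # 200.0 payer_hospital
```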
Determinism

The algorithm is fully deterministic: given the same input arrays in the same order, it always produces the same canonical rate. The CDF and rate/1e8 tiebreakers add a decimal component to each score, which makes exact ties between different rates vanishingly rare in practice. If a tie does occur, ARRAY_POSITION returns the first matching position, so the choice is still deterministic.
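To illustrate how an additive rate/1e8 term separates otherwise equal scores, and why a remaining tie still resolves the same way every time, this sketch assumes, hypothetically, that rate / 1e8 is simply added to a score that already contains the integer tier and CDF component; the real score composition may differ. pick_index reproduces ARRAY_POSITION(arr, array_max(arr)) using Python's list.index, which also returns the first match.

```python
from typing import List


def pick_index(scores: List[float]) -> int:
    """1-based position of the first maximum, like ARRAY_POSITION(arr, array_max(arr))."""
    return scores.index(max(scores)) + 1


def with_rate_tiebreak(base_score: float, rate: float) -> float:
    # Hypothetical composition: base_score already holds the integer tier and
    # CDF component; rate / 1e8 adds a tiny term that separates equal scores.
    return base_score + rate / 1e8


rates = [1200.0, 1350.0]
print(pick_index([5.3, 5.3]))  # 1: exact tie, first position wins
print(pick_index([with_rate_tiebreak(5.3, r) for r in rates]))  # 2
```

Two candidates with identical base scores of 5.3 fall back to position 1; once their rates are folded in, the 1350 candidate wins at position 2. Identical rates would still tie, and the first position is chosen every time.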