Accuracy Scoring
Every rate column on every ROID row receives a numeric score reflecting how trustworthy it is. Rate selection in Stage 7 simply picks the highest-scored non-NULL value per ROID.
Overview
Why Accuracy Scores Exist
- Scenario A (validated): Payer MRF reports $150. Hospital MRF reports $145. Agreement within ~3%. → Both rates get score 7. Rate selection picks the payer rate because it is the higher of the two validated rates.
- Scenario B (unvalidated): Payer MRF reports $150. No hospital MRF record for this ROID. Single-source rate within outlier bounds. → Score 4. Usable, but will lose to any validated rate.
- Scenario C (outlier): Payer MRF reports $4,800. Medicare rate is $120. This is a 40× multiple, almost certainly a data error. → Score 1. Rate selection deprioritizes this rate unless it's the only option.
The score system lets rate selection automatically prefer the most trustworthy value — no manual curation required.
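The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's Stage 7 SQL; the function name `select_rate` and the column names in the example are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def select_rate(
    scored_rates: dict[str, tuple[Optional[float], float]],
) -> Optional[tuple[str, float, float]]:
    """Pick the highest-scored non-NULL rate for a single ROID.

    scored_rates maps a rate column name to (rate, validation_score).
    Returns (column_name, rate, score), or None if every rate is NULL.
    """
    candidates = [
        (name, rate, score)
        for name, (rate, score) in scored_rates.items()
        if rate is not None  # NULL rates never compete
    ]
    if not candidates:
        return None
    # Highest score wins; the decimal part of the score breaks ties within a tier.
    return max(candidates, key=lambda c: c[2])


# Hypothetical ROID with one validated, one unvalidated, and one missing rate.
example = {
    "payer_negotiated": (150.0, 7.0000015),
    "hospital_gross": (152.0, 4.0),
    "hospital_cash": (None, 0.0),
}
print(select_rate(example))
```

Keeping the score numeric (rather than a label) is what lets selection reduce to a single `max` per ROID.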
Two Rounds of Scoring
- Raw Accuracy (chunked by payer): Runs immediately after combined_raw is built. Scores all raw and transformed rate columns. Computes payer_rates_array and hospital_rates_array for counterparty checking. Output is also used to filter long rates for imputation (only non-outlier rates feed in). → tmp_int_accuracy_raw
- BRIT Combination: Benchmarks + Raw + Imputations + Transformations are merged into one wide table per ROID. → tmp_int_combined_brit
- BRIT Accuracy (by rate/provider type): Second scoring pass after BRIT is built. Separate SQL and criteria for Medical, Drugs, Labs, PG, DME, Urgent Care; each type has different outlier bounds and validation rules. Runs in parallel. → tmp_int_accuracy_brit_medical, _drugs, _labs, _physician_groups, _dme, _urgent_care
- BRIT Accuracy Union: All type-specific BRIT accuracy tables merged into the final accuracy table. → tmp_int_accuracy_brit
Score Scale (0–7) vs Canonical Rate Score (1–5)
These are two different scales — a common point of confusion:
| Scale | Range | Where used | Purpose |
|---|---|---|---|
| Accuracy score | 0–7 (with decimal tiebreaker) | Internal pipeline tables (_validation_score columns) | Determines which rate wins in rate selection |
| Canonical rate score | 1–5 | Output tables (prod_combined_abridged, prod_combined_all) | User-facing confidence indicator for the selected rate |
The canonical rate score is a simplified translation of the winning rate's accuracy score — published so downstream users can filter by confidence without needing to understand the internal 0–7 system.
Outlier Bounds
Rates must fall within Medicare-anchored ranges. Different code types have different acceptable multiplier ranges.
| Rate Type | Lower Bound | Upper Bound | Notes |
|---|---|---|---|
| Medical (Non-IP) | 0.5× Medicare | 30× Medicare | State percentile fallbacks when Medicare unavailable |
| Medical (Inpatient) | 0.9× Medicare | 10× Medicare | Stricter — IP rates cluster tighter around Medicare |
| Medical (HMO/Exchange IP) | 0.5× Medicare | 10× Medicare | Looser lower bound for managed care contracts |
| Drugs (Payer source) | 0.8× Medicare | 10× Medicare | DPR exception: up to 1000% allowed for payer rates |
| Drugs (Hospital source) | 0.8× Medicare | 4× Medicare | Hospital drug rates more constrained |
| Labs | 0.2× Medicare | 4.5× Medicare | Lab rates often very close to Medicare |
| DME | 0.5× Medicare | 5.5× Medicare | |
| Physician Group | 0.5× Medicare | 5.5× Medicare | Anesthesia codes: 200 hard cap |
Why Medicare? Medicare rates (IPPS, OPPS, MPFS, ASP) are publicly available, actuarially grounded, and exist for nearly every billing code. They provide the anchor for "is this negotiated rate plausible?"
Fallback bounds: When Medicare rate is unavailable for a code, state-level percentiles (30th and 70th) from the pre-computed outlier bounds table are used instead. These are computed from the distribution of all observed rates for that code.
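A bounds check along these lines might look like the sketch below. The multipliers are copied from the table above; everything else is an assumption made for illustration: the dictionary keys, the function name `is_outlier`, inclusive bounds, and using the state 30th/70th percentiles directly as the lower and upper limits. The DPR exception and the anesthesia cap are not modeled.

```python
from typing import Optional

# (lower, upper) multipliers of the Medicare rate, taken from the table above.
# The key names are hypothetical labels, not pipeline identifiers.
MEDICARE_BOUNDS = {
    "medical_non_ip": (0.5, 30.0),
    "medical_ip": (0.9, 10.0),
    "medical_hmo_exchange_ip": (0.5, 10.0),
    "drugs_payer": (0.8, 10.0),
    "drugs_hospital": (0.8, 4.0),
    "labs": (0.2, 4.5),
    "dme": (0.5, 5.5),
    "physician_group": (0.5, 5.5),
}


def is_outlier(
    rate: float,
    rate_type: str,
    medicare_rate: Optional[float] = None,
    state_p30: Optional[float] = None,
    state_p70: Optional[float] = None,
) -> bool:
    """Return True when the rate falls outside the acceptable range."""
    if medicare_rate is not None and medicare_rate > 0:
        low_mult, high_mult = MEDICARE_BOUNDS[rate_type]
        lower, upper = low_mult * medicare_rate, high_mult * medicare_rate
    elif state_p30 is not None and state_p70 is not None:
        # Assumption: the fallback percentiles act directly as the bounds.
        lower, upper = state_p30, state_p70
    else:
        raise ValueError("no Medicare rate and no state percentile fallback")
    return not (lower <= rate <= upper)


# A rate at 40x Medicare is flagged for non-inpatient medical codes.
print(is_outlier(4800.0, "medical_non_ip", medicare_rate=120.0))
```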
Normal CDF Scoring
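Rates are assumed to follow a log-normal distribution, so each rate is log-transformed and scored against the pre-computed log-space median and standard deviation for its code. The result quantifies how "common" the rate is relative to the observed distribution.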
Walk-through: CDF computation
- Setup: Code 99213, pre-computed stats: median=4.8 (log-space), stddev=0.6
- Rate A: $150 → ln(150)=5.01 → epsilon=0.05×4.8=0.24
- CDF window: normal_cdf(4.8, 0.6, 5.01+0.24) - normal_cdf(4.8, 0.6, 5.01-0.24)
- Result: CDF ≈ 0.29 (about 29% of observed rates fall in a similar range → common)
- Rate B: $500 → ln(500)=6.21 → CDF ≈ 0.02 (only about 2% of rates fall in a window this high → unusual)
Rate A gets score 6.29, Rate B gets score 6.02. Both are "raw, not outlier" (integer tier 6), but A wins the tiebreak.
Drug exception: Drug codes (is_drug_code=true) get CDF=0, not distribution-based scoring. Drug rates are evaluated purely against ASP bounds because drug pricing distributions are highly bimodal (brand vs generic).
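The window computation from the walk-through can be reproduced with the standard library. This is a sketch under stated assumptions: `normal_cdf` is implemented here with `math.erf` and takes its arguments in the (mean, stddev, x) order used in the walk-through, the 0.05 factor for epsilon comes from the walk-through, and the function names are illustrative. The pipeline's own implementation may differ in details.

```python
import math


def normal_cdf(mean: float, stddev: float, x: float) -> float:
    """Cumulative distribution function of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stddev * math.sqrt(2.0))))


def cdf_window(
    rate: float,
    log_median: float,
    log_stddev: float,
    is_drug_code: bool = False,
    epsilon_frac: float = 0.05,
) -> float:
    """Probability mass within +/- epsilon of ln(rate) in log space.

    Drug codes skip distribution-based scoring and always return 0.
    The rate must be positive, since it is log-transformed.
    """
    if is_drug_code:
        return 0.0
    log_rate = math.log(rate)
    epsilon = epsilon_frac * log_median
    upper = normal_cdf(log_median, log_stddev, log_rate + epsilon)
    lower = normal_cdf(log_median, log_stddev, log_rate - epsilon)
    return upper - lower  # upper edge minus lower edge, so never negative


def tiered_score(tier: int, cdf: float) -> float:
    """Integer tier plus the CDF value as the decimal tiebreaker."""
    return tier + cdf


for rate in (150.0, 500.0):
    window = cdf_window(rate, log_median=4.8, log_stddev=0.6)
    print(rate, round(tiered_score(6, window), 2))
```

Subtracting the lower edge from the upper edge keeps the window non-negative, and a rate near the median always receives a larger window value than one far out in the tail.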
Counterparty Validation (Score = 7)
The highest confidence signal: payer and hospital independently report similar rates for the same ROID.
The Logic
For each payer rate column, check if it matches any hospital rate within ±20% (or ±10% for rates above $15,000). And vice versa for hospital rates.
Walk-through: counterparty validation
- Setup: Code 99213, Provider X, Payer Y
- Payer reports: $1,000 (negotiated rate)
- Hospital reports: [$1,050, $2,000] (array from different methodology columns)
- Check payer $1,000 vs hospital array:
  - vs $1,050: |$1,000 − $1,050| = $50 ≤ 20% × $1,000 ($200) → MATCH
- Also check: Must pass outlier bounds (0.5-30× Medicare for non-IP) → $1,000 is within bounds
Score = 7 + ($1,000 / 1e8) = 7.00001 (validated). The /1e8 tiebreaker means: among validated rates, higher dollar rates are slightly preferred.
Why ±20%? Payer and hospital MRF files are independently produced. Differences arise from rounding, per-day vs per-case structures, and reporting methodology variations. The 20% tolerance (10% above $15K) accounts for these structural differences while requiring fundamental agreement.
Symmetry: Both directions are checked independently. A payer rate gets score 7 if it matches a hospital rate, AND a hospital rate gets score 7 if it matches a payer rate. Both validated rates compete in the final selection.
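A minimal sketch of the matching rule follows. It assumes the tolerance is measured relative to the rate being scored, that the $15,000 threshold applies to that same rate, and that the outlier check has already been done elsewhere and is passed in as a flag. The function names are hypothetical.

```python
from typing import Iterable, Optional


def tolerance(rate: float) -> float:
    """±20% by default, tightened to ±10% for rates above $15,000."""
    return 0.10 if rate > 15_000 else 0.20


def matches(rate: float, counterparty_rates: Iterable[Optional[float]]) -> bool:
    """True if any non-NULL counterparty rate lies within the tolerance."""
    allowed = tolerance(rate) * rate
    return any(
        abs(rate - other) <= allowed
        for other in counterparty_rates
        if other is not None
    )


def validated_score(
    rate: float,
    counterparty_rates: Iterable[Optional[float]],
    within_bounds: bool,
) -> Optional[float]:
    """Return 7 + rate/1e8 when validated, otherwise None (fall to a lower tier)."""
    if within_bounds and matches(rate, counterparty_rates):
        return 7 + rate / 1e8
    return None


# Payer $1,000 checked against a hospital array.
print(validated_score(1000.0, [1050.0, 2000.0], within_bounds=True))
```

Under this relative-tolerance assumption the two directions can disagree near the boundary: $1,000 does not match $1,240 (the gap exceeds $200), while $1,240 does match $1,000 (the gap is under $248). That is one reason to check each direction independently rather than infer one result from the other.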
BRIT Accuracy
A second scoring pass runs after all rate sources — Benchmarks, Raw, Imputations, and Transformations — are merged into one wide table per ROID. Each provider/rate type gets its own accuracy SQL with type-specific outlier bounds and scoring logic.
Why a Second Pass?
Raw accuracy scores all payer and hospital rate columns immediately after combined_raw is built — before imputations or benchmarks exist. BRIT accuracy re-scores everything after combined_brit merges all four layers, so imputation columns get scored for the first time, and previously scored raw columns can be re-evaluated with more context.
BRIT Accuracy Tasks
Six type-specific tasks run in parallel, each reading from tmp_int_combined_brit. Medical is split into three sequential sub-steps:
- Drugs: Drug-specific outlier bounds (0.8–10× Medicare for payer; 0.8–4× for hospital). DPR exception: payer drug rates up to 1000% of Medicare allowed. CDF scoring disabled (is_drug_code=true → CDF=0). → tmp_int_accuracy_brit_drugs
- Labs: Tight bounds: 0.2–4.5× Medicare. Lab rates cluster near Medicare, so the narrow window is appropriate. → tmp_int_accuracy_brit_labs
- Medical (Rates): Scores all payer and hospital raw+transformation columns for non-IP (0.5–30×) and IP (0.9–10×) medical codes. Builds payer_rates_array and hospital_rates_array for counterparty checking. → tmp_int_accuracy_brit_medical_rates
- Medical (Imputations): Scores imputation columns for medical ROIDs separately. Imputed rates use a different score ceiling (max score 4, not 7), so they can never beat a validated raw rate. → tmp_int_accuracy_brit_medical_imputations
- Medical (Join): Combines medical rates + medical imputations into the final medical accuracy table. → tmp_int_accuracy_brit_medical
- Physician Groups: PG-specific: anesthesia codes get a hard dollar cap (200) in addition to the 0.5–5.5× Medicare multiplier bound. → tmp_int_accuracy_brit_physician_groups
- DME: DME codes: 0.5–5.5× Medicare bounds. → tmp_int_accuracy_brit_dme
- Urgent Care: Urgent care ROIDs scored separately. Applies standard medical bounds within the urgent care provider type. → tmp_int_accuracy_brit_urgent_care
- BRIT Union: All 6 type-specific tables unioned into the final BRIT accuracy table. → tmp_int_accuracy_brit
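The ordering constraints in this task list can be expressed as a small dependency graph. The sketch below uses Python's `graphlib` purely to illustrate which tasks can run side by side; the task names are shorthand for the tables listed above, and the exact dependency edges (in particular, imputations running after rates) are an assumption drawn from the "three sequential sub-steps" wording, not from the orchestration code.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it waits on.
DEPENDENCIES = {
    "drugs": {"combined_brit"},
    "labs": {"combined_brit"},
    "physician_groups": {"combined_brit"},
    "dme": {"combined_brit"},
    "urgent_care": {"combined_brit"},
    "medical_rates": {"combined_brit"},
    "medical_imputations": {"medical_rates"},
    "medical": {"medical_rates", "medical_imputations"},  # the join step
    "brit_union": {
        "drugs", "labs", "medical", "physician_groups", "dme", "urgent_care",
    },
}


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks within one wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
    print(number, wave)
```

Read this way, the medical chain is the longest path through the graph, so it likely determines how soon the union step can start.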
What Changes Between Raw and BRIT Accuracy?
| Dimension | Raw Accuracy | BRIT Accuracy |
|---|---|---|
| Input table | tmp_int_combined_raw | tmp_int_combined_brit |
| Columns scored | Payer + hospital raw columns only | Raw + transformed + imputed + benchmark columns |
| Chunking | By payer | By rate/provider type (parallel) |
| Counterparty arrays | Built here — used for raw score 7 | Rebuilt with full data — final validated scores |
| Type-specific logic | None — one SQL for all types | Separate SQL per type (drugs, labs, medical, PG, DME, UC) |
Medical 3-step design: Medical rates and medical imputations are scored separately before joining. This lets the imputation scoring use different score ceilings (imputed rates cannot reach score 7) without complicating the main rates SQL.
Output of BRIT accuracy = input to rate selection. tmp_int_accuracy_brit is the final wide table with every rate column and its _validation_score for every ROID. Rate selection reads this table and picks the column with the highest score.
Score Hierarchy
Complete mapping from internal validation scores to user-facing confidence scores.
| Internal Score | Label | Confidence (0–5) | Meaning | Decimal Component |
|---|---|---|---|---|
| 7.x | Validated | 5 | Payer + hospital agree within ±20% | rate / 1e8 (prefer higher rates) |
| 6.x | Raw, not outlier | 4 | Single source, within bounds | CDF value (prefer common rates) |
| 5.x | Benchmark-validated | 3 | Transformed rate within 95-1000% of Medicare | CDF value |
| 4.x | Not outlier | 2 | Within bounds, no validation | CDF value |
| 3.x | Imputation benchmark | 3 | Imputation, benchmark-based | CDF value |
| 2.x | Imputation not outlier | 2 | Imputation within bounds | CDF value |
| 1.x | Outlier | 1 | Outside all acceptable bounds | CDF value |
| 0 | No rate | 0 | NULL / no rate available | — |
The decimal component breaks every tie within a tier. Among 100 payer rates that all score 6.x (raw, not outlier), the one with the highest CDF (the most common rate value) wins. Among validated rates (7.x), the one with the highest dollar amount wins. This makes selection deterministic without relying on arbitrary row ordering.
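The translation from internal score to user-facing confidence can be sketched as a lookup on the integer tier. The mapping values are copied from the table above; the function name and the use of `math.floor` to extract the tier are assumptions for illustration, and the sketch presumes the decimal component stays strictly below 1.

```python
import math

# Integer tier of the internal score -> user-facing confidence, per the table above.
CONFIDENCE_BY_TIER = {7: 5, 6: 4, 5: 3, 4: 2, 3: 3, 2: 2, 1: 1, 0: 0}


def confidence(validation_score: float) -> int:
    """Translate an internal validation score into the published confidence."""
    tier = math.floor(validation_score)
    return CONFIDENCE_BY_TIER[tier]


# Hypothetical scores for three rate columns on one ROID.
scores = {"payer_a": 6.40, "payer_b": 6.10, "hospital_a": 4.75}
winner = max(scores, key=scores.get)
print(winner, confidence(scores[winner]))
```

Because the tier lives in the integer part and the tiebreaker in the fractional part, a single numeric comparison orders rates both across tiers and within a tier.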