Outlier Bounds
Overview
Outlier bounds are statistical thresholds used to identify and filter pricing anomalies in healthcare rate data. We compute separate outlier bounds for three types of pricing data:
- Negotiated Rates
- List Prices: Gross charges published by hospitals
- Cash Prices: Cash rates offered to uninsured patients, published by hospitals
These bounds serve to exclude rates that are statistically implausible or represent data entry errors.
Statistical Methodology
Log-Normal Distribution Assumption
Healthcare pricing data typically follows a log-normal distribution, where the natural logarithm of prices is normally distributed. This makes log-scale transformations appropriate for statistical analysis.
All three pricing types use this fundamental approach:
- Apply natural logarithm transformation to rates
- Calculate quartiles (Q1, Q3) on the log scale
- Compute Interquartile Range (IQR) = Q3 - Q1
- Set bounds using IQR-based rules
- Transform bounds back to original scale using exponential function
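The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the production query: the function name `log_iqr_bounds`, the multiplier parameter `k`, and the choice of the "inclusive" quartile method are assumptions made for the example.

```python
import math
import statistics

def log_iqr_bounds(rates, k):
    """Return (lower, upper) outlier bounds on the original price scale."""
    # Step 1: natural-log transform (only defined for positive rates).
    logs = [math.log(r) for r in rates if r > 0]
    # Step 2: quartiles on the log scale (quartile method is an assumption).
    q1, _median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    # Step 3: interquartile range.
    iqr = q3 - q1
    # Steps 4 and 5: IQR-based bounds, mapped back with exp().
    lower = math.exp(q1 - k * iqr)
    upper = math.exp(q3 + k * iqr)
    return lower, upper

# Six plausible rates plus one suspicious $5,000 entry.
rates = [100.0, 120.0, 150.0, 180.0, 200.0, 250.0, 5000.0]
lower, upper = log_iqr_bounds(rates, k=2.0)
kept = [r for r in rates if lower <= r <= upper]  # drops only the 5000.0 entry
```

Because the bounds are computed on the log scale, they are multiplicative on the price scale: the upper bound is the Q3 price times a factor, and the lower bound is the Q1 price divided by the same factor.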
IQR Truncation
To prevent overly permissive bounds in highly variable distributions, the log-scale IQR is capped at 1.0:
CASE
    WHEN iqr > 1 THEN 1
    ELSE iqr
END AS iqr_truncated
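To see what the cap does in practice, the sketch below mirrors the CASE expression in Python and applies it to an upper bound. The helper names and the sample values (a Q3 price of $1,000, a log-scale IQR of 2.5, a multiplier of 2) are hypothetical.

```python
import math

IQR_CAP = 1.0  # same cap as the CASE expression above

def truncate_iqr(iqr, cap=IQR_CAP):
    """Python equivalent of: CASE WHEN iqr > 1 THEN 1 ELSE iqr END."""
    return min(iqr, cap)

def upper_bound(q3_log, iqr, k):
    """Upper bound on the price scale, using the truncated IQR."""
    return math.exp(q3_log + k * truncate_iqr(iqr))

q3_log = math.log(1000.0)                  # Q3 price of $1,000
untruncated = math.exp(q3_log + 2 * 2.5)   # 1000 * e^5, roughly $148,000
truncated = upper_bound(q3_log, 2.5, k=2)  # 1000 * e^2, roughly $7,400
```

With the cap in place, the upper bound can never exceed the Q3 price times e^k, regardless of how dispersed the underlying rates are.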
Negotiated Rate Outlier Bounds
Negotiated rate bounds use the most sophisticated methodology, incorporating multiple reference points and fallback strategies.
Data Requirements
- Minimum observations: 40 validated rates per provider-code combination
- Quality filter: Only rates with canonical_rate_score = 5 (highest confidence)
- Initial threshold: Rates must be ≤ $100M to exclude obvious data errors
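As an illustration of how these three requirements combine, the sketch below filters rows and keeps only provider-code groups that reach the minimum count. The row keys `provider_id`, `code`, and `rate` are hypothetical; only `canonical_rate_score` appears in this document.

```python
from collections import defaultdict

MIN_OBSERVATIONS = 40
MAX_RATE = 100_000_000  # $100M initial threshold
TOP_SCORE = 5           # highest-confidence canonical_rate_score

def eligible_groups(rows):
    """Group validated rates by (provider, code) and keep groups with >= 40 rates.

    Each row is a dict with the hypothetical keys provider_id, code, rate,
    plus canonical_rate_score.
    """
    groups = defaultdict(list)
    for row in rows:
        # Quality filter and initial threshold are applied before counting.
        if row["canonical_rate_score"] == TOP_SCORE and row["rate"] <= MAX_RATE:
            groups[(row["provider_id"], row["code"])].append(row["rate"])
    return {key: rates for key, rates in groups.items()
            if len(rates) >= MIN_OBSERVATIONS}
```

Applying the filters before counting is an assumption of this sketch; it means a group with 40 raw rows but only 39 validated ones would not qualify.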
Hierarchical Boundary Strategy
The algorithm applies different boundary calculation methods based on data availability:
1. ASP-Based Bounds (Drug Codes)
For outpatient HCPCS drug codes with ASP (Average Sales Price) reference data:
- Lower bound: ASP × 0.8 (80% of ASP)
- Upper bound: ASP × 4.0 (400% of ASP)
This reflects CMS reimbursement patterns where Medicare pays ASP + 6%, while commercial rates vary more widely.
(For payer MRF data, the upper bound is capped at 1000% of ASP.)
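A small sketch of the ASP rule follows. It assumes the parenthetical above means that payer MRF data uses 10× ASP as its upper bound in place of 4×; the function name and the `is_payer_mrf` flag are hypothetical.

```python
def asp_bounds(asp, is_payer_mrf=False):
    """Return (lower, upper) bounds for a drug code from its ASP.

    Assumption: payer MRF data uses a 10x (1000%) upper multiplier,
    all other data uses 4x (400%).
    """
    upper_multiplier = 10.0 if is_payer_mrf else 4.0
    return asp * 0.8, asp * upper_multiplier

asp = 50.0
lower, upper = asp_bounds(asp)     # about (40.0, 200.0)
medicare_payment = asp * 1.06      # ASP + 6%, the Medicare reference point
inside = lower <= medicare_payment <= upper  # Medicare's payment sits inside the band
```

The band is asymmetric around ASP on purpose: it leaves little room below the Medicare payment level and much more room above it.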
2. Medicare-Based Bounds (Inpatient)
For inpatient services with Medicare reference rates:
- Lower bound: Medicare Rate × 0.9 (90% of Medicare)
- Upper bound: Standard log-IQR method
Uses state-level Medicare rates when available, falling back to national averages.
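The state-then-national fallback can be sketched as a two-step lookup. The dictionary shapes (`state_rates` keyed by `(state, code)`, `national_rates` keyed by `code`), the function names, and the sample values are assumptions for illustration only.

```python
def medicare_reference(code, state, state_rates, national_rates):
    """Prefer the state-level Medicare rate; fall back to the national average."""
    rate = state_rates.get((state, code))
    if rate is None:
        rate = national_rates.get(code)  # may still be None if no reference exists
    return rate

def inpatient_lower_bound(medicare_rate):
    """Lower bound for inpatient services: 90% of the Medicare reference rate."""
    return medicare_rate * 0.9

# Hypothetical reference data for one inpatient code.
state_rates = {("TX", "470"): 12000.0}
national_rates = {"470": 13000.0}
tx_rate = medicare_reference("470", "TX", state_rates, national_rates)  # state rate
ny_rate = medicare_reference("470", "NY", state_rates, national_rates)  # national fallback
```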
3. Log-IQR Bounds (Standard Case)
For codes with sufficient validated rate observations (n ≥ 40):
- Lower bound: exp(Q1 - 2 × IQR_truncated)
- Upper bound: exp(Q3 + 2 × IQR_truncated)
The 2× multiplier produces wider bounds than the traditional 1.5× outlier rule, so fewer rates are flagged; this accounts for healthcare price volatility.
4. Medicare Percentage Bounds (Sparse Data)
For codes with insufficient validated rates (n < 40):
- Lower bound: Medicare Rate × 0.1 (10% of Medicare)
- Upper bound: Medicare Rate × 10 (1000% of Medicare)
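The four strategies above can be read as a cascade. The sketch below is one possible reading, assuming the numbered order is also the order of precedence; the function names, the argument names, and the handling of inpatient codes with fewer than 40 rates (they fall through to the sparse-data rule) are assumptions, not confirmed behavior.

```python
import math

MIN_OBSERVATIONS = 40

def log_iqr(q1_log, q3_log, k=2.0):
    """Log-IQR bounds with the IQR capped at 1.0."""
    iqr = min(q3_log - q1_log, 1.0)
    return math.exp(q1_log - k * iqr), math.exp(q3_log + k * iqr)

def negotiated_bounds(n, q1_log=None, q3_log=None, asp=None,
                      medicare=None, inpatient=False):
    """Return (strategy_name, (lower, upper)) for one provider-code group."""
    if asp is not None:                                   # 1. drug code with ASP
        return "asp", (asp * 0.8, asp * 4.0)
    enough = n >= MIN_OBSERVATIONS
    if inpatient and medicare is not None and enough:     # 2. inpatient
        _, upper = log_iqr(q1_log, q3_log)
        return "medicare_inpatient", (medicare * 0.9, upper)
    if enough:                                            # 3. standard case
        return "log_iqr", log_iqr(q1_log, q3_log)
    if medicare is not None:                              # 4. sparse data
        return "medicare_pct", (medicare * 0.1, medicare * 10.0)
    return "none", None                                   # no reference available
```

For example, a group with 50 validated rates, a Q1 price of $100, and a Q3 price of $200 lands in the standard case with bounds of about $25 and $800, since the log-scale IQR is ln 2 and the 2× multiplier scales each quartile by a factor of 4.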
Upper Bound Ceiling
A maximum upper bound prevents extreme outliers:
WHEN exp(q3 + 2 * iqr_truncated) > COALESCE(avg_medicare_rate, 0) * 100
    THEN COALESCE(avg_medicare_rate, 0) * 100
No negotiated rate can exceed 100× the average Medicare rate for that code.
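The ceiling amounts to taking the smaller of the log-IQR upper bound and 100× the average Medicare rate. The sketch below mirrors the fragment, including its COALESCE to 0; the function name is hypothetical.

```python
CEILING_MULTIPLE = 100  # 100x the average Medicare rate

def apply_ceiling(upper, avg_medicare_rate):
    """Cap a log-IQR upper bound at 100x the average Medicare rate.

    Mirrors COALESCE(avg_medicare_rate, 0): a missing rate is treated as 0.
    """
    medicare = 0.0 if avg_medicare_rate is None else avg_medicare_rate
    return min(upper, medicare * CEILING_MULTIPLE)

unchanged = apply_ceiling(5_000.0, 100.0)   # below the 10,000 ceiling, kept as is
capped = apply_ceiling(50_000.0, 100.0)     # reduced to the 10,000 ceiling
no_reference = apply_ceiling(50_000.0, None)  # ceiling collapses to 0
```

Note the last case: read literally, the fragment sets the ceiling to 0 when no Medicare rate exists. The surrounding query is not shown here, so whether an earlier branch handles codes without a Medicare rate is unknown.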
Exceptions
- Validated Rates
  - If drug code, must be within outlier bounds even if validated
  - If inpatient, must be between 0.9× and 100× Medicare even if validated
  - If outpatient, must be less than 100× Medicare even if validated
- Percent-of-Charge Methodology
  - If rate is derived from percent-of-charge with hospital-reported gross charge, allow rates up to 100× Medicare
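These exceptions can be expressed as simple predicates. The sketch below is an interpretation: the category labels, the function names, and the choice of inclusive versus strict comparisons (strict only where the text says "less than") are assumptions.

```python
def validated_rate_passes(rate, category, medicare=None, bounds=None):
    """Checks that still apply to a rate even after it has been validated."""
    if category == "drug":
        lower, upper = bounds            # the code's regular outlier bounds
        return lower <= rate <= upper
    if category == "inpatient":
        return 0.9 * medicare <= rate <= 100 * medicare
    if category == "outpatient":
        return rate < 100 * medicare     # "less than 100x Medicare"
    raise ValueError(f"unknown category: {category}")

def percent_of_charge_passes(rate, medicare):
    """Rates derived from percent-of-charge are allowed up to 100x Medicare."""
    return rate <= 100 * medicare
```

For instance, with a Medicare rate of $1,000, a validated inpatient rate of $850 fails the 0.9× floor, while a validated outpatient rate of $850 passes because outpatient rates have no floor in this list.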
List Price Outlier Bounds
List price bounds use a simpler methodology focused on gross charge patterns.
Data Requirements
- Minimum observations: 40 gross charges per provider-code combination
- Rate range: Between $0 and $100M (excludes zero charges and obvious errors)
Boundary Calculation
Uses log-IQR methodology with expanded multipliers:
- Lower bound: exp(Q1 - 2.5 × IQR_truncated)
- Upper bound: exp(Q3 + 2.5 × IQR_truncated)
The 2.5× multiplier (vs 2× for negotiated rates) reflects greater variability in published charges across providers.
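A worked comparison shows how much the larger multiplier widens the band. The quartile prices below ($1,000 and $2,000) are made up for illustration, and the names `MULTIPLIERS` and `bounds_for` are hypothetical.

```python
import math

# Multipliers stated in this document for each price type.
MULTIPLIERS = {"negotiated": 2.0, "list": 2.5}

def bounds_for(price_type, q1_price, q3_price):
    """Log-IQR bounds for a price type, with the log-scale IQR capped at 1.0."""
    k = MULTIPLIERS[price_type]
    q1_log, q3_log = math.log(q1_price), math.log(q3_price)
    iqr = min(q3_log - q1_log, 1.0)
    return math.exp(q1_log - k * iqr), math.exp(q3_log + k * iqr)

# Same quartiles, different multipliers (log-scale IQR = ln 2, below the cap).
negotiated = bounds_for("negotiated", 1000.0, 2000.0)  # about (250, 8000)
list_price = bounds_for("list", 1000.0, 2000.0)        # about (177, 11314)
```

With identical quartiles, moving from 2× to 2.5× scales each bound by a further factor of 2^0.5, so the list-price band is about 41% wider on each side.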
Cash Price Outlier Bounds
Cash price bounds follow the same structure as list prices but target self-pay rates.
Data Requirements
- Minimum observations: 40 cash prices per provider-code combination
- Rate threshold: ≤ $100M (excludes obvious data errors)
Boundary Calculation
Identical to list prices:
- Lower bound: exp(Q1 - 2.5 × IQR_truncated)
- Upper bound: exp(Q3 + 2.5 × IQR_truncated)