Version: Canary - 2.3 🚧

Outlier Bounds


Overview​

Outlier bounds are statistical thresholds used to identify and filter pricing anomalies in healthcare rate data. We compute separate outlier bounds for three types of pricing data:

  • Negotiated Rates
  • List Prices: Gross charges published by hospitals
  • Cash Prices: Cash rates offered to uninsured patients, published by hospitals

The bounds exclude rates that are statistically implausible or that likely stem from data entry errors.

Statistical Methodology​

Log-Normal Distribution Assumption​

Healthcare pricing data typically follows a log-normal distribution, where the natural logarithm of prices is normally distributed. This makes log-scale transformations appropriate for statistical analysis.

All three pricing types use this fundamental approach:

  1. Apply natural logarithm transformation to rates
  2. Calculate quartiles (Q1, Q3) on the log scale
  3. Compute Interquartile Range (IQR) = Q3 - Q1
  4. Set bounds using IQR-based rules
  5. Transform bounds back to original scale using exponential function
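The five steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the production query: the function name `log_iqr_bounds`, the `multiplier` and `iqr_cap` parameters, and the choice of the "inclusive" quantile method are assumptions made for the example, and the quantile definition used by the real pipeline may differ.

```python
import math
import statistics


def log_iqr_bounds(rates, multiplier=2.0, iqr_cap=1.0):
    """Return (lower, upper) outlier bounds on the original price scale."""
    # Step 1: natural log of each rate (non-positive rates cannot be logged).
    logs = [math.log(r) for r in rates if r > 0]
    # Step 2: quartiles on the log scale (quantile method is an assumption).
    q1, _median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    # Step 3: interquartile range, capped as described under "IQR Truncation".
    iqr_truncated = min(q3 - q1, iqr_cap)
    # Step 4: IQR-based fences on the log scale.
    log_lower = q1 - multiplier * iqr_truncated
    log_upper = q3 + multiplier * iqr_truncated
    # Step 5: transform back to the original scale.
    return math.exp(log_lower), math.exp(log_upper)


lower, upper = log_iqr_bounds([100, 120, 150, 200, 400])
```

With these five toy rates the log-scale quartiles are ln 120 and ln 200, so the bounds come out at roughly 43.2 and 555.6.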

IQR Truncation​

To prevent overly permissive bounds in highly variable distributions, the log-scale IQR is capped at 1.0:

CASE
WHEN iqr > 1 THEN 1
ELSE iqr
END as iqr_truncated
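To see what the cap means on the price scale, let k denote the IQR multiplier used in the bound formulas (2 for negotiated rates, 2.5 for list and cash prices); the symbol k is notation introduced here for illustration. Because the truncated IQR never exceeds 1, the back-transformed bounds can sit at most a fixed factor away from the quartile prices:

```latex
\begin{aligned}
\text{upper} &= e^{Q_3 + k\,\mathrm{IQR}_{\text{truncated}}} \le e^{k}\, e^{Q_3}, &
\text{lower} &= e^{Q_1 - k\,\mathrm{IQR}_{\text{truncated}}} \ge e^{-k}\, e^{Q_1}, \\
e^{2} &\approx 7.39, & e^{2.5} &\approx 12.18.
\end{aligned}
```

Here Q1 and Q3 are the log-scale quartiles, so e^{Q1} and e^{Q3} are the quartile prices. With k = 2, for example, the upper bound is never more than about 7.39 times the third-quartile price, however dispersed the underlying rates are.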

Negotiated Rate Outlier Bounds​

Negotiated rate bounds use the most sophisticated methodology, incorporating multiple reference points and fallback strategies.

Data Requirements​

  • Minimum observations: 40 validated rates per provider-code combination
  • Quality filter: Only rates with canonical_rate_score = 5 (highest confidence)
  • Initial threshold: Rates must be ≤ $100M to exclude obvious data errors

Hierarchical Boundary Strategy​

The algorithm applies different boundary calculation methods based on data availability:

1. ASP-Based Bounds (Drug Codes)​

For outpatient HCPCS drug codes with ASP (Average Sales Price) reference data:

  • Lower bound: ASP × 0.8 (80% of ASP)
  • Upper bound: ASP × 4.0 (400% of ASP)

This reflects CMS reimbursement patterns where Medicare pays ASP + 6%, while commercial rates vary more widely.

(For payer MRF data, the upper bound is capped at 1000% of ASP.)
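As a minimal sketch of the ASP rule, the function below returns the two bounds for a given ASP. The name `asp_bounds` and the `payer_mrf` flag are hypothetical, and the code assumes that "capped at 1000% of ASP" means the 4.0 multiplier is replaced by 10.0 for payer MRF data; the source does not spell out that detail.

```python
def asp_bounds(asp, payer_mrf=False):
    """Return (lower, upper) bounds for a drug code priced at `asp`."""
    lower = 0.8 * asp  # 80% of ASP
    # Assumption: payer MRF data uses a 1000% upper multiplier instead of 400%.
    upper_multiplier = 10.0 if payer_mrf else 4.0
    upper = upper_multiplier * asp
    return lower, upper


hospital_bounds = asp_bounds(50.0)
payer_bounds = asp_bounds(50.0, payer_mrf=True)
```

For an ASP of $50, this gives bounds of about $40 to $200, or $40 to $500 under the payer MRF assumption.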

2. Medicare-Based Bounds (Inpatient)​

For inpatient services with Medicare reference rates:

  • Lower bound: Medicare Rate × 0.9 (90% of Medicare)
  • Upper bound: Standard log-IQR method

State-level Medicare rates are used when available, with national averages as the fallback.
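The state-then-national lookup might look like the sketch below. The function name and the two dictionaries (`state_rates` keyed by state and code, `national_rates` keyed by code) are hypothetical stand-ins for whatever reference tables the pipeline actually joins against.

```python
def inpatient_lower_bound(code, state, state_rates, national_rates):
    """Return 90% of the Medicare reference rate for an inpatient code."""
    # Prefer the state-level Medicare rate when one exists.
    rate = state_rates.get((state, code))
    if rate is None:
        # Fall back to the national average for the code.
        rate = national_rates[code]
    return 0.9 * rate


state_rates = {("TX", "470"): 12000.0}
national_rates = {"470": 10000.0}
tx_lower = inpatient_lower_bound("470", "TX", state_rates, national_rates)
ny_lower = inpatient_lower_bound("470", "NY", state_rates, national_rates)
```

In this toy data the Texas lookup uses the state rate, while the New York lookup falls back to the national figure. The dollar amounts are invented for the example.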

3. Log-IQR Bounds (Standard Case)​

For codes with sufficient validated rate observations (n ≥ 40):

  • Lower bound: exp(Q1 - 2 × IQR_truncated)
  • Upper bound: exp(Q3 + 2 × IQR_truncated)

The 2× multiplier produces wider fences than the traditional 1.5× outlier rule, so fewer rates are flagged; this accounts for healthcare price volatility.

4. Medicare Percentage Bounds (Sparse Data)​

For codes with insufficient validated rates (n < 40):

  • Lower bound: Medicare Rate × 0.1 (10% of Medicare)
  • Upper bound: Medicare Rate × 10 (1000% of Medicare)
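One possible reading of the hierarchy is sketched below. It assumes the four methods are tried in their numbered order, which the text implies but does not state outright; `choose_strategy`, `sparse_bounds`, and their parameters are hypothetical names introduced for illustration.

```python
MIN_OBSERVATIONS = 40  # minimum validated rates for the log-IQR method


def choose_strategy(n_validated, asp=None, is_inpatient=False, medicare_rate=None):
    """Pick a bound method label; precedence order is an assumption."""
    if asp is not None:
        return "asp"  # drug code with ASP reference data
    if is_inpatient and medicare_rate is not None:
        return "medicare_inpatient"  # 0.9 x Medicare lower bound
    if n_validated >= MIN_OBSERVATIONS:
        return "log_iqr"  # standard case
    if medicare_rate is not None:
        return "medicare_percentage"  # sparse data
    return None  # no reference point available


def sparse_bounds(medicare_rate):
    """Return (10% of Medicare, 1000% of Medicare) for sparse codes."""
    return 0.1 * medicare_rate, 10 * medicare_rate


strategy = choose_strategy(n_validated=12, medicare_rate=80.0)
bounds = sparse_bounds(80.0)
```

A code with 12 validated rates and an $80 Medicare rate lands in the sparse-data branch, giving bounds of roughly $8 and $800.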

Upper Bound Ceiling​

A maximum upper bound prevents extreme outliers:

WHEN exp(q3 + 2 * iqr_truncated) > COALESCE(avg_medicare_rate, 0) * 100
THEN COALESCE(avg_medicare_rate, 0) * 100

No negotiated rate can exceed 100× the average Medicare rate for that code.
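The same ceiling logic, translated literally into Python as a sketch. The function name `apply_ceiling` is hypothetical; `None` stands in for a SQL NULL, mirroring the `COALESCE(avg_medicare_rate, 0)` in the fragment above.

```python
def apply_ceiling(upper_bound, avg_medicare_rate):
    """Cap an upper bound at 100x the average Medicare rate."""
    # Mirror COALESCE(avg_medicare_rate, 0): a missing rate is treated as 0.
    medicare = 0.0 if avg_medicare_rate is None else avg_medicare_rate
    ceiling = medicare * 100
    return min(upper_bound, ceiling)


capped = apply_ceiling(50000.0, 120.0)     # ceiling of 12,000 applies
unchanged = apply_ceiling(5000.0, 120.0)   # already under the ceiling
```

Note that in this literal translation a missing Medicare rate yields a ceiling of 0. The full query may handle that case in a branch not shown in the fragment.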

Exceptions​

  1. Validated Rates
    • Drug codes: the rate must fall within the outlier bounds even if validated.
    • Inpatient: the rate must be between 0.9× and 100× Medicare even if validated.
    • Outpatient: the rate must be less than 100× Medicare even if validated.
  2. Percent-of-Charge Methodology
    • If a rate is derived from percent-of-charge using a hospital-reported gross charge, rates up to 100× Medicare are allowed.
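The validated-rate checks could be expressed as a single predicate, as in the sketch below. The function name, the `kind` labels, and the inclusive treatment of the inpatient range are assumptions made for the example; the text does not say whether the endpoints are inclusive.

```python
def validated_rate_ok(rate, kind, medicare_rate=None, bounds=None):
    """Check a validated rate against the exception rules.

    kind is one of "drug", "inpatient", "outpatient" (hypothetical labels).
    bounds is a (lower, upper) tuple, required for drug codes.
    """
    if kind == "drug":
        lower, upper = bounds
        return lower <= rate <= upper
    if kind == "inpatient":
        # Assumption: both endpoints are inclusive.
        return 0.9 * medicare_rate <= rate <= 100 * medicare_rate
    if kind == "outpatient":
        return rate < 100 * medicare_rate
    raise ValueError(f"unknown kind: {kind}")


ok = validated_rate_ok(950.0, "inpatient", medicare_rate=1000.0)
too_low = validated_rate_ok(850.0, "inpatient", medicare_rate=1000.0)
```

Here a $950 inpatient rate passes against a $1,000 Medicare rate, while $850 falls below the 0.9× floor and fails.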

List Price Outlier Bounds​

List price bounds use a simpler methodology focused on gross charge patterns.

Data Requirements​

  • Minimum observations: 40 gross charges per provider-code combination
  • Rate range: Between $0.01 and $100M (excludes zero charges and obvious errors)

Boundary Calculation​

Uses log-IQR methodology with expanded multipliers:

  • Lower bound: exp(Q1 - 2.5 × IQR_truncated)
  • Upper bound: exp(Q3 + 2.5 × IQR_truncated)

The 2.5× multiplier (vs 2× for negotiated rates) reflects greater variability in published charges across providers.
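Putting the list price requirements and the 2.5× rule together gives something like the sketch below. The function name, the constants, and the "inclusive" quantile method are assumptions for illustration, and returning `None` when fewer than 40 charges remain is one plausible way to signal that no bounds are computed.

```python
import math
import statistics

MIN_CHARGES = 40
MIN_CHARGE, MAX_CHARGE = 0.01, 100_000_000  # $0.01 to $100M


def list_price_bounds(charges):
    """Return (lower, upper) bounds, or None if data is insufficient."""
    # Keep only charges inside the documented rate range.
    valid = [c for c in charges if MIN_CHARGE <= c <= MAX_CHARGE]
    if len(valid) < MIN_CHARGES:
        return None
    logs = [math.log(c) for c in valid]
    q1, _median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    iqr_truncated = min(q3 - q1, 1.0)  # IQR capped at 1.0
    return (
        math.exp(q1 - 2.5 * iqr_truncated),
        math.exp(q3 + 2.5 * iqr_truncated),
    )


bounds = list_price_bounds([100.0 + i for i in range(40)])
too_few = list_price_bounds([100.0] * 39 + [0.0])  # zero charge is dropped
```

The second call returns `None` because the zero charge is filtered out, leaving only 39 usable observations.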


Cash Price Outlier Bounds​

Cash price bounds follow the same structure as list prices but target self-pay rates.

Data Requirements​

  • Minimum observations: 40 cash prices per provider-code combination
  • Rate threshold: ≤ $100M (excludes obvious data errors)

Boundary Calculation​

Identical to list prices:

  • Lower bound: exp(Q1 - 2.5 × IQR_truncated)
  • Upper bound: exp(Q3 + 2.5 × IQR_truncated)