Version: 2.2

Lab Accuracy Scores

We would like to create a lab-specific accuracy scoring methodology. There are three important considerations. First, lab rates are frequently expected to be below Medicare rates; rates as low as 40% of Medicare would not be uncommon. Second, since labs do not have hospital-posted rates, their rates cannot be validated against posted rates. Lastly, we would generally expect a lab's rates to be a fixed percentage of Medicare and not to vary much within a provider-network pair.

The following scoring hierarchy is proposed for labs:

First, we define a lab/network combination as having a "consistent percentage of Medicare rate" if it has at least 30 rates AND the difference between the 5th and 95th percentiles of its rates expressed as a percentage of Medicare (pct_of_medicare) is less than 0.25.
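The consistency definition can be sketched in pandas as below. This is a minimal illustration, not the production logic: `flag_consistent_pairs` is a hypothetical helper, it assumes a long DataFrame with `provider_id`, `network_id`, and `pct_of_medicare` columns, and it uses pandas' interpolated `quantile` rather than Trino's `APPROX_PERCENTILE`, so pairs very close to the 0.25 cutoff may be classified differently. It also counts only non-null `pct_of_medicare` values, whereas `COUNT(*)` in SQL counts every row.

```python
import pandas as pd


def flag_consistent_pairs(
    rates: pd.DataFrame, min_rates: int = 30, max_spread: float = 0.25
) -> pd.DataFrame:
    """Hypothetical helper: flag lab/network pairs with a consistent % of Medicare."""
    grouped = rates.groupby(["provider_id", "network_id"])["pct_of_medicare"]
    # Per-pair count plus 5th and 95th percentiles (linear interpolation).
    stats = grouped.agg(
        n_rates="count",
        p05=lambda s: s.quantile(0.05),
        p95=lambda s: s.quantile(0.95),
    )
    stats["has_consistent_pct_of_medicare_rate"] = (
        stats["n_rates"] >= min_rates
    ) & ((stats["p95"] - stats["p05"]) < max_spread)
    return stats.reset_index()


if __name__ == "__main__":
    demo = pd.DataFrame({
        "provider_id": [1] * 40 + [2] * 40,
        "network_id": ["A"] * 40 + ["B"] * 40,
        # Pair 1 is flat at 50% of Medicare; pair 2 spreads from 30% to ~147%.
        "pct_of_medicare": [0.5] * 40 + [0.3 + 0.03 * i for i in range(40)],
    })
    print(flag_consistent_pairs(demo))
```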

```sql
------------------------------------
-- LABS HIERARCHY:
-- SCORE = 5: has_consistent_pct_of_medicare_rate AND pct_of_medicare between 0.4 and 1.3 (10th and 90th percentiles of consistent rates)
-- SCORE = 4: has_consistent_pct_of_medicare_rate AND pct_of_medicare between 0.3 and 3 (1st and 99th percentiles of consistent rates)
-- SCORE = 3: pct_of_medicare between 0.4 and 1.3
-- SCORE = 2: pct_of_medicare is not an outlier, i.e. between 0.3 and 4.5 (captures 99% of posted lab rates)
------------------------------------
```
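One way to read the hierarchy is as an ordered series of checks where the first matching rule wins. The sketch below is an illustration under stated assumptions: `lab_accuracy_score` is a hypothetical name, "between" is treated as inclusive on both ends, and the function returns `None` when no rule applies, since the hierarchy above does not define anything below score 2.

```python
from typing import Optional


def lab_accuracy_score(
    pct_of_medicare: float, has_consistent_pct_of_medicare_rate: bool
) -> Optional[int]:
    """Hypothetical sketch of the LABS HIERARCHY (inclusive bounds assumed)."""

    def between(lo: float, hi: float) -> bool:
        return lo <= pct_of_medicare <= hi

    # Rules are checked from the highest score down; the first match wins.
    if has_consistent_pct_of_medicare_rate and between(0.4, 1.3):
        return 5
    if has_consistent_pct_of_medicare_rate and between(0.3, 3.0):
        return 4
    if between(0.4, 1.3):
        return 3
    if between(0.3, 4.5):
        return 2
    # Outside every band: not covered by the hierarchy as written.
    return None


if __name__ == "__main__":
    for pct, consistent in [(0.8, True), (2.0, True), (0.8, False), (4.0, False), (6.0, False)]:
        print(pct, consistent, lab_accuracy_score(pct, consistent))
```

Because a consistent pair that falls outside 0.3 to 3 is not excluded, it can still receive score 2 if it lies within the wider 0.3 to 4.5 band.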

Analysis

The tmp_int_imputations_long_rates_YYYY_MM table is a preprocessed table used in the imputations logic. It simply reshapes the raw rates data into a long format, with a single column containing rates (instead of spreading the different contract methodologies and negotiated types across separate wide columns).
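As a rough illustration of this wide-to-long reshaping, the sketch below uses pandas `melt`. The column names (`rate_fee_schedule`, `rate_percent_of_billed`, `rate_type`) are hypothetical stand-ins and do not reflect the actual schema of the rates table; the real preprocessing happens upstream in SQL.

```python
import pandas as pd

# Hypothetical wide layout: one rate column per contract methodology / negotiated type.
wide = pd.DataFrame({
    "roid": [101, 102],
    "payer_id": [7, 7],
    "rate_fee_schedule": [12.5, 30.0],
    "rate_percent_of_billed": [None, 45.0],
})

# Long layout: one row per (roid, payer_id, rate_type), with a single `rate` column.
long_rates = (
    wide.melt(id_vars=["roid", "payer_id"], var_name="rate_type", value_name="rate")
    .dropna(subset=["rate"])  # drop combinations with no rate
    .reset_index(drop=True)
)
print(long_rates)
```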

Using this table, we filter to labs and, for each lab/network combination, compute percentiles of its lab code rates expressed as a percentage of Medicare.

Then, let's look at the distribution of the p50 (median) across lab/network combinations.

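|       | p50         |
|:------|------------:|
| count | 3472        |
| mean  | 1.0825      |
| std   | 0.971769    |
| min   | 0.000392471 |
| 1%    | 0.333998    |
| 5%    | 0.42        |
| 10%   | 0.488372    |
| 25%   | 0.568122    |
| 50%   | 0.767877    |
| 75%   | 1.07431     |
| 90%   | 2.14569     |
| 95%   | 3.49657     |
| 99%   | 4.58689     |
| max   | 12.1191     |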
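```python
import pandas as pd

# `trino_conn` is an existing connection to Trino.
# One row per lab/network pair, with percentiles of its rates as a % of Medicare.
df = pd.read_sql("""
SELECT
    provider_id,
    network_id,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.05) AS p05,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.25) AS p25,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.5) AS p50,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.75) AS p75,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.95) AS p95,
    COUNT(*) AS n_rates
FROM tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_imputations_long_rates_2025_08 i
LEFT JOIN tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_benchmarks_2025_08 b
    ON i.roid = b.roid
    AND i.payer_id = b.payer_id
WHERE
    i.provider_type = 'Laboratory'
GROUP BY 1, 2
""", con=trino_conn)

# %%
# Distribution of the per-pair median (p50) across lab/network pairs.
print(
    df['p50']
    .describe(percentiles=[0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
    .to_markdown()
)
```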

We can also filter to lab/network combinations that have at least 30 rates AND a difference of less than 0.25 between the 5th and 95th percentiles of pct_of_medicare.

We consider these to be "valid" (consistent) lab/network combinations and evaluate the distribution of their rates to derive tighter ranges.

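|       | pct_of_medicare |
|:------|----------------:|
| count | 49960           |
| mean  | 0.826693        |
| std   | 0.582168        |
| min   | 1.9876e-06      |
| 1%    | 0.25            |
| 5%    | 0.399993        |
| 10%   | 0.41999         |
| 25%   | 0.494755        |
| 50%   | 0.670164        |
| 75%   | 1               |
| 90%   | 1.29941         |
| 95%   | 1.95369         |
| 99%   | 2.89798         |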
```python
import pandas as pd

# `trino_conn` is an existing connection to Trino.
df = pd.read_sql("""
WITH
df AS (
    -- Per lab/network pair: percentiles of rate as a % of Medicare, plus rate count.
    SELECT
        provider_id,
        network_id,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.05) AS p05,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.25) AS p25,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.5) AS p50,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.75) AS p75,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.95) AS p95,
        COUNT(*) AS n_rates
    FROM tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_imputations_long_rates_2025_08 i
    LEFT JOIN tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_benchmarks_2025_08 b
        ON i.roid = b.roid
        AND i.payer_id = b.payer_id
    WHERE
        i.provider_type = 'Laboratory'
    GROUP BY 1, 2
),
consistent AS (
    -- Keep only pairs with a consistent % of Medicare.
    SELECT
        *
    FROM df
    WHERE
        p95 - p05 < 0.25
        AND n_rates >= 30
)
-- Random sample of individual rates from consistent pairs.
SELECT
    i.rate / b.medicare_rate AS pct_of_medicare
FROM tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_imputations_long_rates_2025_08 i
LEFT JOIN tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_benchmarks_2025_08 b
    ON i.roid = b.roid
    AND i.payer_id = b.payer_id
JOIN consistent USING (provider_id, network_id)
ORDER BY RANDOM()
LIMIT 50000
""", con=trino_conn)

print(
    df['pct_of_medicare']
    .describe(percentiles=[0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
    .to_markdown()
)
```
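The score-5 and score-4 bands in the hierarchy are described as the 10th/90th and 1st/99th percentiles of consistent rates. A minimal sketch of computing such bands from a pooled `pct_of_medicare` sample is shown below; `percentile_band` is a hypothetical helper, it assumes NumPy, and it drops NaNs first (for example, rates with no matching Medicare benchmark after the `LEFT JOIN`, which is why the describe count above is slightly below 50000).

```python
import numpy as np


def percentile_band(pct_of_medicare, lower_q: float, upper_q: float) -> tuple:
    """Hypothetical helper: (lower, upper) quantiles of pooled % of Medicare values."""
    values = np.asarray(pct_of_medicare, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing benchmark matches
    lo, hi = np.quantile(values, [lower_q, upper_q])
    return float(lo), float(hi)


if __name__ == "__main__":
    sample = [0.42, 0.5, 0.67, 1.0, 1.3, float("nan")]
    print(percentile_band(sample, 0.10, 0.90))  # score-5 style band
    print(percentile_band(sample, 0.01, 0.99))  # score-4 style band
```

Applied to the sampled `df['pct_of_medicare']`, this would reproduce the 10% / 90% and 1% / 99% rows of the table above, up to sampling noise.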