Version: 2.2

Lab Accuracy Scores

We would like to create a lab-specific accuracy scoring methodology. There are three important considerations. First, lab rates are frequently expected to fall below Medicare rates; rates as low as 40% of Medicare would not be uncommon. Second, labs do not have hospital-posted rates, so lab rates cannot be validated against them. Lastly, we would generally expect lab rates to be a fixed percentage of Medicare, without much variation within a provider-network pair.

The following scoring hierarchy is proposed for labs:

First, we define a "consistent percentage of Medicare rate" as having at least 30 rates for a given lab/network combination AND having a difference of less than 0.25 between the 5th and 95th percentiles of its % of Medicare rates.
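
The definition above can be expressed as a small pandas check. This is a minimal sketch, assuming a DataFrame with `provider_id`, `network_id`, and `pct_of_medicare` columns; the function name `flag_consistent_pairs` and the constant names are hypothetical, and pandas' exact quantiles stand in for the approximate percentiles a warehouse query would return.

```python
import pandas as pd

MIN_RATES = 30             # minimum number of rates per lab/network combination
MAX_P05_P95_SPREAD = 0.25  # p95 - p05 of pct_of_medicare must be below this


def flag_consistent_pairs(rates: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (provider_id, network_id) with a consistency flag.

    `rates` is assumed to hold one row per rate with the columns
    provider_id, network_id, and pct_of_medicare.
    """
    grouped = rates.groupby(["provider_id", "network_id"])["pct_of_medicare"]
    summary = pd.DataFrame(
        {
            "n_rates": grouped.count(),
            "p05": grouped.quantile(0.05),
            "p95": grouped.quantile(0.95),
        }
    ).reset_index()
    # Both conditions from the definition must hold.
    summary["has_consistent_pct_of_medicare_rate"] = (
        summary["n_rates"] >= MIN_RATES
    ) & (summary["p95"] - summary["p05"] < MAX_P05_P95_SPREAD)
    return summary
```

A combination with many rates clustered around a single percentage is flagged as consistent; one with too few rates, or with a wide spread, is not.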

------------------------------------
-- LABS HIERARCHY: 
-- SCORE = 5: has_consistent_pct_of_medicare_rate AND pct_of_medicare between 0.4 and 1.3 (10th and 90th percentiles of consistent rates)
-- SCORE = 4: has_consistent_pct_of_medicare_rate AND pct_of_medicare between 0.3 and 3 (1st and 99th percentiles of consistent rates)
-- SCORE = 3: pct_of_medicare between 0.4 and 1.3
-- SCORE = 2: not an outlier: pct_of_medicare between 0.3 and 4.5 (captures 99% of posted lab rates)
------------------------------------
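
Read top to bottom, the hierarchy is a first-match rule. A minimal sketch follows, assuming inclusive bounds and a fallback score of 1 for rates outside every band; neither is specified in the hierarchy, and the function name `lab_accuracy_score` is hypothetical.

```python
FALLBACK_SCORE = 1  # assumption: the hierarchy does not define a score below 2


def lab_accuracy_score(pct_of_medicare: float, has_consistent: bool) -> int:
    """Assign a lab accuracy score using the first matching rule.

    `pct_of_medicare` is the rate divided by the Medicare rate, and
    `has_consistent` is the has_consistent_pct_of_medicare_rate flag
    for the rate's lab/network combination.
    """
    if has_consistent and 0.4 <= pct_of_medicare <= 1.3:
        return 5
    if has_consistent and 0.3 <= pct_of_medicare <= 3.0:
        return 4
    if 0.4 <= pct_of_medicare <= 1.3:
        return 3
    if 0.3 <= pct_of_medicare <= 4.5:
        return 2
    return FALLBACK_SCORE
```

Note that a rate from a consistent combination can still land at 2 when it sits between 3 and 4.5, because the score 4 band is narrower than the outlier band.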

Analysis

The tmp_int_imputations_long_rates_YYYY_MM table is a preprocessed table used in the imputations logic. It re-orients the raw rates data into a long format, with a single column containing rates (instead of casting the different contract methodologies and negotiated types wide).
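
To illustrate the wide-to-long re-orientation, here is a minimal pandas sketch. The wide column names (`fee_schedule_negotiated`, `case_rate_negotiated`) and the `rate_source` label are hypothetical and do not reflect the actual table schema; only the shape of the transformation is the point.

```python
import pandas as pd

# Hypothetical wide layout: one column per contract methodology / negotiated type.
wide = pd.DataFrame(
    {
        "roid": [1, 2],
        "payer_id": [10, 10],
        "fee_schedule_negotiated": [12.5, None],
        "case_rate_negotiated": [None, 40.0],
    }
)

# Long layout: a single `rate` column, with the originating column kept as a label.
long = (
    wide.melt(
        id_vars=["roid", "payer_id"],
        var_name="rate_source",
        value_name="rate",
    )
    .dropna(subset=["rate"])  # drop methodology/type slots that have no rate
    .reset_index(drop=True)
)
```

With every rate in one column, percentile computations can run over `rate` directly instead of across several wide columns.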

Using this table, we filter to labs and compute percentiles of lab code rates, expressed as a percentage of Medicare, for each lab/network combination.

Then, let's look at the distribution of the p50 (median) across those combinations.

	p50
count	3472
mean	1.0825
std	0.971769
min	0.000392471
1%	0.333998
5%	0.42
10%	0.488372
25%	0.568122
50%	0.767877
75%	1.07431
90%	2.14569
95%	3.49657
99%	4.58689
max	12.1191

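import pandas as pd

# `trino_conn` is assumed to be an existing Trino connection.
# Compute percentiles of % of Medicare for each lab/network combination,
# so that the p50 column summarized below exists in the result.
df = pd.read_sql("""
SELECT 
    provider_id,
    network_id,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.05) as p05,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.25) as p25,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.5) as p50,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.75) as p75,
    APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.95) as p95,
    COUNT(*) as n_rates
FROM tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_imputations_long_rates_2025_08 i
LEFT JOIN tq_dev.internal_dev_csong_cld_v2_2_2.tmp_int_benchmarks_2025_08 b
    ON i.roid = b.roid
    AND i.payer_id = b.payer_id
WHERE 
    i.provider_type = 'Laboratory'
GROUP BY 1,2
""", con=trino_conn)

# %%
# Distribution of the per-combination median (p50).
print(
    df['p50']
    .describe(percentiles=[0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
    .to_markdown()
)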

We can also filter to lab/network combinations that have at least 30 rates AND a difference of less than 0.25 between the 5th and 95th percentiles.

We can consider these to be "valid" lab/network combinations and evaluate the distribution of their rates, which should fall in a tighter range.

	pct_of_medicare
count	49960
mean	0.826693
std	0.582168
min	1.9876e-06
1%	0.25
5%	0.399993
10%	0.41999
25%	0.494755
50%	0.670164
75%	1
90%	1.29941
95%	1.95369
99%	2.89798

df = pd.read_sql("""
WITH 
-- Percentiles of % of Medicare for each lab/network combination
df AS (
    SELECT 
        provider_id,
        network_id,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.05) as p05,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.25) as p25,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.5) as p50,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.75) as p75,
        APPROX_PERCENTILE(i.rate / b.medicare_rate, 0.95) as p95,
        COUNT(*) as n_rates
    FROM tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_imputations_long_rates_2025_08 i
    LEFT JOIN tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_benchmarks_2025_08 b
        ON i.roid = b.roid
        AND i.payer_id = b.payer_id
    WHERE 
        i.provider_type = 'Laboratory'
    GROUP BY 1,2
),
-- Keep only combinations that meet the consistency definition
consistent AS (
    SELECT 
        *
    FROM df
    WHERE 
        p95 - p05 < 0.25
        AND n_rates >= 30
)
-- Random sample of individual rates from the consistent combinations
SELECT 
    i.rate / b.medicare_rate as pct_of_medicare
FROM tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_imputations_long_rates_2025_08 i
LEFT JOIN tq_dev.internal_dev_csong_cld_v2_2_1.tmp_int_benchmarks_2025_08 b
    ON i.roid = b.roid
    AND i.payer_id = b.payer_id
JOIN consistent USING (provider_id, network_id)
ORDER BY RANDOM()
LIMIT 50000
""", con=trino_conn)

print(
    df['pct_of_medicare']
    .describe(percentiles=[0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99])
    .to_markdown()
)
