Skip to main content
Version: 2.2

Laboratories in v1.2

We're adding labs next because:

  1. Based on a sample of employer claims, they're the highest cost provider types after Hospitals and Physician Groups
  2. They are already in Core Rates
  3. Useful for Hospital Customers

1. Identifying Labs​

As of 07/15/2025, labs are not defined in provider spines.

To identify Labs, we do the following:

a. Filter tq_production.reference_legacy.provider_demographics based on provider_type, taxonomy_code, and npi_type.
b. Filter to top 1000 NPIs with the most encounters based on Komodo medical_headers.
c. Clean up names for Quest and LabCorp.

See list of labs at components/spines/providers by filtering Provider_type to "Laboratory".

Lab CTEs in CLD Provider Spine
-- LABORATORIES
labs_filtered AS (
SELECT
hco_1_npi,
count(distinct encounter_key) as n
FROM tq_intermediate.external_komodo.medical_headers
WHERE hco_1_npi IN (
SELECT npi
FROM tq_production.reference_legacy.provider_demographics
WHERE taxonomy_code IN ('291U00000X', '293D00000X')
)
GROUP BY 1
HAVING count(distinct encounter_key) > 1000
ORDER BY n DESC
LIMIT 1000
),
pd_lab_clean AS (
SELECT
npi,
ein,
CASE
WHEN clean_org_name LIKE 'Quest Diag%' THEN 'Quest Diagnostics'
WHEN
clean_org_name LIKE 'Laboratory Corp%'
OR clean_org_name LIKE 'Labcorp%'
THEN 'LabCorp'
ELSE clean_org_name
END as provider_name,
city,
state,
address,
latitude,
longitude,
cbsa,
cbsa_description as cbsa_name,
county,
taxonomy_code,
taxonomy_grouping
FROM {{ cld_params.Tables.PROVIDER_DEMOGRAPHICS_TABLE.value }}
WHERE taxonomy_grouping = 'Laboratories'
AND taxonomy_code IN (
'291U00000X', -- Clinical Medical Laboratory
'293D00000X' -- Physiological Laboratory
)
AND provider_type IN (
'Clinical Medical Laboratory',
'Physician Group Affiliated Entity',
'Physiological Laboratory',
'Hospital Affiliated Entity',
'Single/Multi-Specialty Physician Group'
)
AND npi_type = '2'
AND (
npi IN (
SELECT hco_1_npi
FROM labs_filtered
)
OR
clean_org_name LIKE 'Quest Diag%'
OR
clean_org_name LIKE 'Laboratory Corp%'
OR
clean_org_name LIKE 'Labcorp%'
)
),
pd_lab_agg AS (
SELECT
CAST(from_big_endian_64(xxhash64(CAST(
provider_name ||
state
AS varbinary
))) AS VARCHAR) as provider_id,
CASE
WHEN provider_name = 'Quest Diagnostics' THEN provider_name || ' ' || state
WHEN provider_name = 'LabCorp' THEN provider_name || ' ' || state
ELSE provider_name
END as provider_name,
ARRAY_DISTINCT(ARRAY_AGG(npi)) as npi,
ARRAY_DISTINCT(ARRAY_AGG(ein)) as ein,
ANY_VALUE(state) as state,
ANY_VALUE(taxonomy_code) as taxonomy_code,
ANY_VALUE(taxonomy_grouping) as taxonomy_grouping,
ANY_VALUE(city) as city,
ANY_VALUE(cbsa) as cbsa,
ANY_VALUE(cbsa_name) as cbsa_name,
ANY_VALUE(county) as county,
ANY_VALUE(address) as address,
ANY_VALUE(latitude) as latitude,
ANY_VALUE(longitude) as longitude
FROM pd_lab_clean
GROUP BY provider_name, state
),

2. Plausibility and Benchmarks​

We consider codes as plausible (and include in Clear Rates) if the code appears in tq_production.reference_internal.clinical_laboratory_reference_pricing.

This table is also used for Medicare benchmarking.

Starting v1_2_6, codeset is restricted to top 750 codes by estimated revenue. We estimate revenue using Komodo encounters and Cliniccal Lab Reference Pricing. See more details below at c. Reducing Code Set.


3. Next Steps​

a. Are 1000 Labs Enough?​

Komodo has good lab coverage. This query returns 7671 distinct NPIs.

SELECT 
hco_1_npi,
count(distinct encounter_key) as n
FROM tq_intermediate.external_komodo.medical_headers
WHERE hco_1_npi IN (
SELECT npi
FROM tq_production.reference_legacy.provider_demographics
WHERE taxonomy_code IN ('291U00000X', '293D00000X')
)
GROUP BY 1
ORDER BY n DESC

The top 1000 labs account for 96.4% of total encounters.

df.head(1000)['n'].sum() / df['n'].sum()

b. Adding More Labs​

Komodo has ~7000 lab NPIs. Provider Demographics has ~16,000. We don't want to add 16,000 labs:

  • many are not valuable to add
  • that would add 3+ billion ROIDs
  • how can we identify labs currently missing that should be added?

Based on the current aproach, we know lab names that come up often, e.g. "Quest" and "Lab Corp". These two in particular are associated with 100s of NPIs. We represent these labs in Clear Rates at the state-level.

We can add manual text searches for these names, so they are included, even if their NPI is not in Komodo, e.g. (added in v1_2_6):

AND (
npi IN (
SELECT hco_1_npi
FROM labs_filtered
)
OR
clean_org_name LIKE 'Quest Diag%'
OR
clean_org_name LIKE 'Laboratory Corp%'
OR
clean_org_name LIKE 'Labcorp%'
)

Are there other important labs (high volume) that are not in Komodo and not Quest/LabCorp?

c. Reducing Code Set​

WITH
lab_util AS (
SELECT encounter_key
FROM tq_intermediate.external_komodo.medical_headers
WHERE hco_1_npi IN (
SELECT npi
FROM tq_production.reference_legacy.provider_demographics
WHERE taxonomy_code IN ('291U00000X', '293D00000X')
AND provider_type IN (
'Clinical Medical Laboratory',
'Physician Group Affiliated Entity',
'Physiological Laboratory',
'Hospital Affiliated Entity',
'Single/Multi-Specialty Physician Group'
)
AND npi_type = '2'
)
)
SELECT
'Professional' AS bill_type,
'HCPCS' AS billing_code_type,
procedure_code as billing_code,
count(*) AS utilization,
clr.rate * count(distinct encounter_key) AS revenue,
True as facility
FROM tq_intermediate.external_komodo.medical_service_lines AS msl
JOIN (
SELECT billing_code, rate
FROM tq_production.reference_internal.clinical_laboratory_reference_pricing
WHERE end_effective_date = (SELECT MAX(end_effective_date) FROM tq_production.reference_internal.clinical_laboratory_reference_pricing)
) AS clr ON msl.procedure_code = clr.billing_code
WHERE EXISTS (
SELECT 1
FROM lab_util AS e
WHERE msl.encounter_key = e.encounter_key
)
AND procedure_code IN (
SELECT billing_code
FROM tq_production.reference_internal.clinical_laboratory_reference_pricing
)
GROUP BY procedure_code, clr.rate
ORDER BY revenue DESC
LIMIT 750

The top 500 codes account for 99.0% of revenue. The top 750 codes account for 99.8% of revenue.

df['percent'] = df['n'] / df['n'].sum()
df.head(500)['percent'].sum()

d. Is "Name" an appropriate way to labs with common ownership?​

For example, we search for strings like "Quest Diag", "Laboratory Corp", "LabCorp" and clean them up to have a standard name. Then we group NPIs by this cleaned name and state. Each distinct name + state gets a unique provider_id.

  • Quest Diagnostics
  • LabCorp
  • Ameripath

e. Is State-Level Representation OK?​

Is State too large of a geography to consolidate the larger labs? It's confusing to list each NPI individually.

As an example, these NPIs: 1003174772, 1538144910, 1902809940, 1104829159, 1215914288, 1265417653, 1750368700 all represent LabCorp NPIs in North Carolina.

Should these be split up?