Skip to main content
Version: Canary - 2.3 🚧

Providers

SchemaMethodology

Data

Data Location in Trino

Final table: redshift.reference.unified_provider_table
Intermediate table prior to duplicate removal: tq_dev.spines.spines_provider_intermediate_upt_initial

Refresh Cadence

Weekly on Sundays

Description

A provider spine assigns standardized provider IDs to the myriad ways providers can be represented (e.g. different data sources (TQ, Komodo, CMS), different provider ID methods (proprietary ID, Type 1 NPI, Type 2 NPI, EIN, Medicare ID)) and captures their associations (horizontal and vertical).

Since this version only includes hopsitals, we can leverage the existing unified_provider_table (UPT) as a "provider spine". This table maps primary and ancillary hospital NPIs to a provider ID.

Work completed for v0.1 CLD includes removal of edge cases causing duplicate roll-up values in UPT. Ancillary hospital NPIs are pulled from the provider_demographics table and are collected in the payer_provider_npi field of the UPT. The provider_id and provider_npi field are from the hospital data. There are some edge cases (~250) where the same payer_provider_npi value rolls up to two different provider_id values in the UPT. While this is a minority of cases, they were removed to ensure no double-counting in downstream logic. Note: there are a small number of "legitimate duplicates" that are maintained in the data and may require an upstream fix, see deuplication logic case type 1 for more.

Further refinement of hospital entities will be captured in the full "provider spine" overhaul, and we will also attempt to take into account whether the "roll ups" make sense in the rates data - i.e. do all associated NPIs tend to have the same rate, indicating that they all contract together?

UPT Deduplication Logic

The duplicates fall into one of three buckets:

  1. Two different provider_id values have the same provider_npi value → this means that the associated NPI in hospital data is duplicated across provider_idvalues, leading to legitimate cases of the samepayer_provider_npirolling up to multipleprovider_id` values

Examples:

  • provider_id values 3423 and 8070 both have provider_npi 1013969179
  • provider_id values 6845 and 6847 both have provider_npi 1144266024

Resolution: None → this originates in hospital data and the dual-rollups are legitimate because the same NPI is being used to represent two different provider_id values. Requires an upstream fix or if two hospitals legitimately share the same NPI, then downstream logic should handle how to pick one over the other or use both. As of the initial deduplication, this type of duplicate accounts for 14 of the NPIs that roll up to 2 different provider_id values

  1. An issue with the provider_id level provider because one of the hospitals is closed or the hospital was duplicated

Examples:

  • One hospital closed: payer_provider_npi 1003908443 rolls up to provider_id 2096 (npi 1336458009 for Sister Emmanuel Hospital) and provider_id 73 (npi 1306825351 for for Mercy Miami Hospital) Sister Emmanuel Hospital closed in 2018 and now HCA Mercy Miami is the active hospital Sister Emmanuel Hospital closing | HCA Florida Mercy Hospital
  • Hospital is duplicated: payer_provider_npi 1255654430 rolls up to provider_id 117 and provider_id 2822 which both have npi 1871919381 for OSF Saint Luke Medical Center and OSF Saint Luke Medical Center - Kewanee

Resolution: Inner join to the glue.hospital_rates.hospital_provider table to only retain provider_id values that are active in the Hospital Data As of the initial deduplication, this accounts for 125 of the NPIs that roll up to 2 different provider_id values

  1. The two different provider_id hospitals are related or share the same location, causing multiple matches with a single payer_provider_npi value

Examples: This typically happens with a General Acute Care Hospital + Long Term or Childrens or Rehabilitation Hospital combination where the hospitals are co-located but are considered different entities.

Resolution: Use the following logic steps

  • If match_reason = direct_npi_name then prioritize this match over other match reasons
  • Pull in hospital_type from glue.hospital_data.hospital_provider
    • If the hospital_type = 'Childrens Hospital' and the payer_provider_npi value has taxonomy_grouping = ‘Hospitals’ and taxonomy_specialization = ‘Children’ then that hospital gets the match
    • If the payer_provider_npi value has taxonomy_grouping = ‘Hospitals’ and taxonomy_specialization is NOT ‘Children’ AND the regex '%children%' has a match in the provider_name value, then the other provider_id gets the match (Example provider_id values 6295 + 6357 and payer_provider_npi 1023243631 for Loma Linda)
  • Manual revisions (these were not turned into logic b/c it is not worth the effort in an interim solution)
    • If the payer_provider_npi value is in '1073815676' or '1316289416', these NPIs should be manually excluded because they do not accurately represent hospitals
    • If the tie is between provider_id values 830 and 914, the provider_id 914 should “win” the match (Manhattan P.C.)
    • If the tie is between provider_id values 1003 and 1103, then provider_id 1003 (Trinitas Regional Medical Center) should win the match
    • If the payer_provider_npi = 1922097369, the "winning" provider_id is 4742 (St. Francis Medical Center)
    • If the payer_provider_npi = 1598996480, the "winning" clean_org_name is "Corewell Health Helen Devos Childrens Hospital" (This is actually a duplicate due to provider_demographics so choosing a "winning" row of data but not a "winning" provider_id)
    • If the payer_provider_npi = 1174679625, the "winning" provider_id is 4648 (Cornerstone Hospital Of Bossier City)
    • If the payer_provider_npi = 1679686455, the "winning" provider_id is 1893 (Roosevelt Warm Springs)
    • If the payer_provider_npi = 1750384368, the "winning" provider_id is 2016 (Roosevelt Warm Springs Rehab Hospital)
    • If the payer_provider_npi = 1982090486, the "winning" provider_id is 2016 (Roosevelt Warm Springs Rehab Hospital)
  • For all other cases, if the provider_type = ‘Short Term Acute Care Hospital' that hospital “wins” the match. This also covers the following sub-cases, which are listed for documentation purposes but don’t need to be accounted for with separate logic:
    • If only one hospital_type is populated, the provider_id with the populated value “wins” the duplicate - the payer_provider_npi match is removed from the other provider_id value
    • If the taxonomy_grouping value = 'Hospital Units' match the payer_provider_npi only to the "Short Term Acute Care Hospital" hospital_type

The DAG updates include a QA check to ensure duplicates have been appropriately removed - if this does not occur or a net-new duplicate arises that is not addressed by the logic above, the code will raise an exception so the DAG will fail, and we can further investigate.