Providers
Data
Data Location in Trino
Final table: redshift.reference.unified_provider_table
Intermediate table prior to duplicate removal: tq_dev.spines.spines_provider_intermediate_upt_initial
Refresh Cadence
Weekly on Sundays
Description
A provider spine assigns standardized provider IDs to the myriad ways providers can be represented (e.g. different data sources (TQ, Komodo, CMS), different provider ID methods (proprietary ID, Type 1 NPI, Type 2 NPI, EIN, Medicare ID)) and captures their associations (horizontal and vertical).
Since this version only includes hopsitals, we can leverage the existing
unified_provider_table (UPT) as a "provider spine". This table maps primary and
ancillary hospital NPIs to a provider ID.
Work completed for v0.1 CLD includes removal of edge cases causing duplicate roll-up values in UPT.
Ancillary hospital NPIs are pulled from the provider_demographics table and are collected in the payer_provider_npi field of the UPT.
The provider_id and provider_npi field are from the hospital data.
There are some edge cases (~250) where the same payer_provider_npi value rolls up to two different provider_id values in the UPT.
While this is a minority of cases, they were removed to ensure no double-counting in downstream logic.
Note: there are a small number of "legitimate duplicates" that are maintained in the data and may require an upstream fix, see deuplication logic case type 1 for more.
Further refinement of hospital entities will be captured in the full "provider spine" overhaul, and we will also attempt to take into account whether the "roll ups" make sense in the rates data - i.e. do all associated NPIs tend to have the same rate, indicating that they all contract together?
UPT Deduplication Logic
The duplicates fall into one of three buckets:
- Two different
provider_idvalues have the sameprovider_npi value → this means that the associated NPI in hospital data is duplicated acrossprovider_idvalues, leading to legitimate cases of the samepayer_provider_npirolling up to multipleprovider_id` values
Examples:
provider_idvalues 3423 and 8070 both haveprovider_npi1013969179provider_idvalues 6845 and 6847 both haveprovider_npi1144266024
Resolution:
None → this originates in hospital data and the dual-rollups are legitimate because the same NPI is being used to represent two different provider_id values.
Requires an upstream fix or if two hospitals legitimately share the same NPI, then downstream logic should handle how to pick one over the other or use both.
As of the initial deduplication, this type of duplicate accounts for 14 of the NPIs that roll up to 2 different provider_id values
- An issue with the
provider_idlevel provider because one of the hospitals is closed or the hospital was duplicated
Examples:
- One hospital closed:
payer_provider_npi1003908443 rolls up toprovider_id2096 (npi1336458009 for Sister Emmanuel Hospital) andprovider_id73 (npi1306825351 for for Mercy Miami Hospital) Sister Emmanuel Hospital closed in 2018 and now HCA Mercy Miami is the active hospital Sister Emmanuel Hospital closing | HCA Florida Mercy Hospital - Hospital is duplicated:
payer_provider_npi1255654430 rolls up toprovider_id117 andprovider_id2822 which both havenpi1871919381 for OSF Saint Luke Medical Center and OSF Saint Luke Medical Center - Kewanee
Resolution:
Inner join to the glue.hospital_rates.hospital_provider table to only retain provider_id values that are active in the Hospital Data
As of the initial deduplication, this accounts for 125 of the NPIs that roll up to 2 different provider_id values
- The two different
provider_idhospitals are related or share the same location, causing multiple matches with a singlepayer_provider_npivalue
Examples: This typically happens with a General Acute Care Hospital + Long Term or Childrens or Rehabilitation Hospital combination where the hospitals are co-located but are considered different entities.
Resolution: Use the following logic steps
- If
match_reason=direct_npi_namethen prioritize this match over other match reasons - Pull in
hospital_typefromglue.hospital_data.hospital_provider- If the
hospital_type= 'Childrens Hospital' and thepayer_provider_npivalue hastaxonomy_grouping= ‘Hospitals’ andtaxonomy_specialization= ‘Children’ then that hospital gets the match - If the
payer_provider_npivalue hastaxonomy_grouping= ‘Hospitals’ andtaxonomy_specializationis NOT ‘Children’ AND the regex '%children%' has a match in theprovider_namevalue, then the otherprovider_idgets the match (Exampleprovider_idvalues 6295 + 6357 andpayer_provider_npi1023243631 for Loma Linda)
- If the
- Manual revisions (these were not turned into logic b/c it is not worth the effort in an interim solution)
- If the
payer_provider_npivalue is in '1073815676' or '1316289416', these NPIs should be manually excluded because they do not accurately represent hospitals - If the tie is between
provider_idvalues 830 and 914, theprovider_id914 should “win” the match (Manhattan P.C.) - If the tie is between
provider_idvalues 1003 and 1103, thenprovider_id1003 (Trinitas Regional Medical Center) should win the match - If the
payer_provider_npi= 1922097369, the "winning"provider_idis 4742 (St. Francis Medical Center) - If the
payer_provider_npi= 1598996480, the "winning"clean_org_nameis "Corewell Health Helen Devos Childrens Hospital" (This is actually a duplicate due toprovider_demographicsso choosing a "winning" row of data but not a "winning"provider_id) - If the
payer_provider_npi= 1174679625, the "winning"provider_idis 4648 (Cornerstone Hospital Of Bossier City) - If the
payer_provider_npi= 1679686455, the "winning"provider_idis 1893 (Roosevelt Warm Springs) - If the
payer_provider_npi= 1750384368, the "winning"provider_idis 2016 (Roosevelt Warm Springs Rehab Hospital) - If the
payer_provider_npi= 1982090486, the "winning"provider_idis 2016 (Roosevelt Warm Springs Rehab Hospital)
- If the
- For all other cases, if the provider_type = ‘Short Term Acute Care Hospital' that hospital “wins” the match. This also covers the following sub-cases, which are listed for documentation purposes but don’t need to be accounted for with separate logic:
- If only one hospital_type is populated, the provider_id with the populated value “wins” the duplicate - the payer_provider_npi match is removed from the other provider_id value
- If the taxonomy_grouping value = 'Hospital Units' match the payer_provider_npi only to the "Short Term Acute Care Hospital" hospital_type
The DAG updates include a QA check to ensure duplicates have been appropriately removed - if this does not occur or a net-new duplicate arises that is not addressed by the logic above, the code will raise an exception so the DAG will fail, and we can further investigate.