Skip to main content
Version: 2.1

PG Graph

Quickstart​

  • Explore the graph database here:
  • Output Tables are here:
    • tq_dev.internal_dev_csong_sandbox.prod_pac_connections
      • each row is a PG, columns contain TINs, HCOs, and HCPs
    • tq_dev.internal_dev_csong_sandbox.prod_hs_pac_links
      • each row is a HS; columns contain PGs and intermediary HCOs/TINs
    • tq_dev.internal_dev_csong_sandbox.prod_hcp_best_links
      • each row is a Physician, columns contain primary HCO and primary PG

current outputs use thresholds:

thresholds = {
'jaccard': 0.3,
'min_containment': 0.5,
'intersection_count': 50,
}

Goals:​

Primary goals are:

  1. add more physician groups to Clear Rates
  2. set up scalable methodology to add hospital unaffiliated physician groups
  3. support Network Check, which frequently needs to "reprice" individual physician claims

To do this, we need:

  1. Create a table of Physician Groups with:
    • a name
    • a list of TINs
    • a list of NPIs
  2. Create a table of Health System + State pairs with:
    • a list of affiliated physician groups
  3. Create a table of individual physicians with:
    • primary HCO
    • primary physician group

#1 and #2 are required for Clear Rates. #3 provides a bridge that Network Check can use (i.e. given individual physician, which group's rates should it search?)

Background: What is PECOS?​

When an individual physician joins a physician group, they submit a form to CMS letting them know which organization to submit payment. So instead of sending payment to the individual, the payment is "reassigned" to the physician group.

PECOS maintains a public reassignments database, which we can use to identify individual physician and physician group relationships.

A PAC ID represents a physician group.

Inclusion Criteria: What Data are we Using?​

  1. PECOS
  2. Payer Data: NPI-TIN pairings
    • Only store if NPI appears in PECOS data as an individual physician
    • At least 4 payers have posted the TIN-NPI pair
  3. Komodo Data: HCO-HCP pairings
    • Only store if HCP appears in PECOS data as an individual physician
    • All claims after 2023-01-01
    • Store # of encounters for each pair
  4. Hospital Data: Health System - HCO pairings

Data Loading Queries:

Why a Graph Database?​

Below is a simplified example of the complicated data relationships we are working with.

  • The top two physicians in the black box:
    • are affiliated hospital 1 and hospital 2, based on claims
    • are affiliated with TIN 1 based on MRF data
    • are affiliated with PAC 1 AND PAC 2 based on PECOS data
  • The bottom two physicians in the red box:
    • are affiliated with hospital 2 and hospital 3, based on claims
    • are affiliated with TIN 2 based on MRF data
    • are affiliated with PAC 2 based on PECOS

alt text

Then, we can construct edges that compare organizations' similarities based on physician affiliations:

alt text

How would we use this to accomplish our goals?

  • For each Health System, we can look for Physician Groups that are N degrees away, e.g.:
    • [Health System] -> [Hospital] -> Physician Group
    • [Health System] -> [Hospital] -> [TIN] -> Physician Group
  • For each physician, we can look for the primary HCO (e.g. most encounters) then the physician group that is most closely associated with that HCO

Methodology​

Load Graph Database​

Nodes​

  • PAC (Physician Groups) (pac)
  • Individual physicians (hcp)
  • Healthcare organizations (hco)
  • Tax IDs (tin)
  • Health systems (hs)

Raw Data Edges​

  • PAC ↔ HCP
  • HCO ↔ HCP
    • number of encounters
  • TIN ↔ HCP
  • Health system ↔ HCO

Inferred Edges​

  • PAC ↔ HCO
  • TIN ↔ HCO
  • PAC ↔ TIN

We infer these organization-similarity edges by comparing their arrays of affiliated NPIS.

For each pair of organizations (PAC, HCO, TIN), we compute similarity measures over their sets of affiliated NPIs. Specifically:

  • Jaccard Similarity: Measures the proportion of shared NPIs relative to the union of their NPIs.

Jaccard(A,B)=∣A∩B∣∣A∪B∣\text{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}

Example:
If PAC A has NPIs {1, 2, 3, 4} and HCO B has {3, 4, 5, 6}, then:

∣A∩B∣=2,∣A∪B∣=6|A \cap B| = 2, \quad |A \cup B| = 6

Jaccard(A,B)=26=0.3‾\text{Jaccard}(A, B) = \frac{2}{6} = 0.\overline{3}


  • Containment Measures: Fraction of shared NPIs relative to each entity's total NPIs.

ContainmentA=∣A∩B∣∣A∣\text{Containment}_{A} = \frac{|A \cap B|}{|A|}

Example:

∣A∣=5,∣B∣=4,∣A∩B∣=2|A| = 5, \quad |B| = 4, \quad |A \cap B| = 2

ContainmentA=25=0.4\text{Containment}_{A} = \frac{2}{5} = 0.4

ContainmentB=24=0.5\text{Containment}_{B} = \frac{2}{4} = 0.5

  • Maximum Containment: The maximum of the two containment scores.
  • Intersection Counts: The raw count of shared NPIs.

We apply thresholds on these measures (e.g., minimum Jaccard or minimum containment) to retain only meaningful similarity edges.