2. Data Collection
1. Payer MRF
CLD pulls Payer MRF data from Core Rates and Core Rates Physician Groups.
a. Filter and Map Core Rates to Rate Object Space
We join Core Rates to the Rate Object Space. In order to do this, we need to create a few columns:
network_idis converted to at least oneproduct_network_label, (the field used to identify networks in payer MRF Data)provider_idis obtained using the CLD provider spine, which maps provider_id to NPIs; and Core Rates are filtered using these NPIS- for hospitals, the provider_id in CLD ROS is consistent with the provider_id in Core Rates, since they both use TQ's unified_provider_table to map hospital NPIs to provider_id
bill_typeconsists of ip, op, and pro. This field does not exist in Core Rates, but can be derived using just billing_class and billing_code_typeip: billing_class is institutional and billing_code_type is one of MS-DRG or APR-DRGop: billing_class is institutional and billing_code_type is one of HCPCS or APCpro: billing_class is professional and billing_code_type is one of HCPCS or APC
billing_code_type: a couple billing code transformations occur when we pull Core Rates- APR-DRG -> MS-DRG: see components/transformations/drg_crosswalk.md
- MS-DRG -> APR-DRG: we convert in both directions
- APC -> HCPCS: each APC code is exploded to all the possible HCPCS for that APC code
b. Handle Multiple Rates
If there are multiple rates for a given payer, network, provider, code,
bill_type and it is due to different negotiated_type, we keep all the rates.
They will share the same rate object ID and each of these rates will be stored
in its own column, to be evaluated separately for accuracy.
If there are multiple rates due to modifiers, we try to identify the GLOBAL rate. Typically, the global rate will be stored with a NULL array or as the longest array.
The example below illustrates the hierarchy used to select a single rate. Here,
the "NULL" modifier would be preferred over the array containing TC, 26, 52, 53
| billing_code_modifier | hierarchy |
|---|---|
| NULL | 1 |
| TC, 26, 52, 53 | 2 |
| TC | 3 |
| 26 | 4 |
If there are NO global modifier rates identifiable, the max(rate) is selected. We believe the max would be most appropriate to identify the broad-network rate.
When querying from core_rates_physician_groups, we select the rate with the
highest provider_npi_count prior to selecting the max() rate.
2. Hospital MRF
a. Filter and Map Hospital Rates to the Rate Object Space (ROS)
We join Core Rates to the Rate Object Space. In order to do this, we need to create a few columns:
network_idis mapped to plan representations in the Hospital MRF data using the "plan bridge" (see in section below)provider_idis consistent between CLD and Hospital Ratesbill_typeconsists of ip, op, and pro. This field does not exist in Hospital Rates, but can be derived using just billing_class and billing_code_typeip: billing_class is institutional and billing_code_type is one of MS-DRG or APR-DRGop: billing_class is institutional and billing_code_type is one of HCPCS or APCpro: billing_class is professional and billing_code_type is one of HCPCS or APC
billing_code_type: a couple billing code transformations occur when we pull Hospital Rates- APR-DRG -> MS-DRG: see components/transformations/drg_crosswalk.md
- MS-DRG -> APR-DRG: we convert in both directions
- APC -> HCPCS: each APC code is exploded to all the possible HCPCS for that APC code
b. Plan Bridge
The plan_name field in hospital data is not standardized and cannot be
reliably used to identify the rate's network.
To map hospital rates to networks, we build a "plan bridge":
- Scope:
- The objective of the plan bridge is to identify the PPO or HMO network rates. We don't need to be able to distinguish various product names.
- Data Cleaning:
- We filter out "plan_name" values that indicate networks that are clearly not PPO or HMO (such as "indemnity", "international", or government plans like Medicare/Medicaid)
- The list of exclusions is available on this page. Search for "exclusions" and "payer_exclusions".
- Assumptions:
- If there is exactly one plan_name (e.g. "CIGNA"), we assume that this is a "catch-all" or "global" rate, shared across all of the products between the payer and provider
- If there is exactly one distinct rate, we assume that this is a "catch-all" or "global" rate, shared across all of the products between the payer and provider
c. Handle Multiple Rates
If there are multiple rates for a given payer, network, provider, code,
bill_type and it is due to different contract_methodology, we keep all the
rates. They will share the same rate object ID and each of these rates will be
stored in its own column, to be evaluated separately for accuracy.
If there are multiple rates due to modifiers or revenue codes, we select the rate with a NULL modifier and/or NULL revenue code. This exploratory analysis shows the prevelance of these NULL rates. If there are NO rates with NULL modifiers or revenue codes, the max(rate) is selected. We believe the max would be most appropriate to identify the broad-network rate.
Modifiers:
We try to identify the GLOBAL rate. Typically, the global rate will be stored with a NULL array or as the longest array.
The example below illustrates the hierarchy used to select a single rate. Here,
the "NULL" modifier would be preferred over the array containing TC, 26, 52, 53
| billing_code_modifier | hierarchy |
|---|---|
| NULL | 1 |
| TC, 26, 52, 53 | 2 |
| TC | 3 |
| 26 | 4 |
3. Gross Charges
Hospital MRF Gross Chages
The source table is hospital_rates, filtered to commercial payer class
Preprocessing
- Codes are normalized (e.g., DRG codes are zero-padded, APR-DRG split-parts are standardized).
- Bill type (Inpatient, Outpatient, Professional) is derived from billing code type and class.
Crosswalk for DRG Codes
- A crosswalk table maps MS-DRG to APR-DRG and vice versa.
- If a crosswalked code is not already present in the hospital rates, a synthetic row is added with the average charge for the original code.
Median Aggregations
- Hospital gross charges are reduced to provider-level rows using the highest posted value.
- Medians are calculated at:
- CBSA level
- State level
CCR Adjustments
When computing CBSA/state-level benchmarks, cost-report-based charges-relative-to-CBSA ratios can adjust gross charges:
- CBSA-level:
cbsa_ip_ratio_relativity - State-level:
state_ip_ratio_relativity
You can find these values at tq_intermediate.cld_utils.cost_report_charge_relativities_2025_04_03.
On the MRF (we'll never know for sure, but)
- For OP Surgery it gets tricky because of soft codes. I'd say typically OP surgery wouldn't have a charge rate, and if it does, it's probably based on some sort of claims estimate (which makes me think claim level). That could be a totally wrong assumption, though. There are probably ways to test this. If you took a MRF gross charge & compared it to the market benchmarks charges on a claim level vs line level, what does it come closer to?
- For OP non surgery i'd guess it's line level if it's on the MRF. Labs, imaging, etc. I doubt they are reporting claim level cases since these are generally hard coded.
- For IP MS-DRGs, it's claim level. In fact, hospitals don't typically have set charges for inpatient cases, so they likely won't have rates. If they do, it's almost certainly a claim based average.
On CBSA/State averages:
If we're using CBSA averages, we may want to think about incorporating medicare cost report factors. If the market is on average at $1000 for a charge, but the hospital is generally 20% above the market, we shouldn't just use $1000. I put this together last week (attached) that shows where hospitals are relative to their CBSA / State on charges.
Komodo Claims-Based Gross Charges
- Provider-level:
tq_intermediate.claims_benchmarks.claims_benchmarks_gross_charges_npi - CBSA-level:
tq_intermediate.claims_benchmarks.claims_benchmarks_gross_charges_cbsa - State-level:
tq_intermediate.claims_benchmarks.claims_benchmarks_gross_charges_state
Each source provides:
- Median billed units
- Median gross charges at both line-item and claim level
- Count of encounters
Aggregated Columns
For each level (provider, CBSA, state), the pipeline outputs:
- Komodo gross charge (final selected value based on standardization/adjustment logic)
- Count of encounters
CCR Adjustments
Dosage Standardization for Drugs (Codes with ASP Payment Limits)
Dosage standardization ensures comparability of posted or claims-derived charges for drugs with varying package sizes or doses.
ASP Payment Limits
- ASP payment limit: A benchmark unit price from CMS.
- ASP lower/upper bounds: Defined as payment_limit × config multipliers (e.g., 200%–2200%).
Standardization Formula
For hospital-posted MRF rates:
For Komodo claims-based rates:
Acceptance Criteria
- The standardized value must fall between ASP lower and upper bounds.
- If outside these bounds:
- Hospital rates are set to NULL.
- Komodo line-item rates are set to NULL.
- For claims with claim-level charges, these are used only for inpatient or surgical codes.
Output Metadata
- JSON objects record:
- Raw gross charge
- Parsed or median billed quantity
- ASP quantity and unit label