Technical One-Pager
This section defines the core concepts precisely so that data scientists work from consistent logic and definitions.
1. Defining the Rate Object Space
Let $\mathcal{R}$ be the set of all Rate Objects. Each Rate Object $r \in \mathcal{R}$ is defined as:
$$r = (i, n, p, c, t, V)$$
where:
- $i$: A unique Rate Object ID.
- $n$: Network ID (unique payer-network identifier).
- $p$: Provider Group ID (unique set of providers sharing identical rates).
- $c$: Code Object ID (e.g., HCPCS, MS-DRG, code-version, place of service, plus any relevant modifiers).
- $t$: Time (month).
- $V$: A set of rate values, each indexed by a methodology key $k$.
Defining the Methodology Key
A methodology key $k$ (sometimes called a source for simplicity) represents a combination of:
- Data source (e.g., "Hospital MRF", "Payer MRF", "Komodo")
- Contract methodology (e.g., "case rate", "per diem")
- Rate type (e.g., "dollar", "percent").
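As a minimal sketch, the Rate Object and its methodology key could be represented with plain Python data classes. The class and field names below (`MethodologyKey`, `RateObject`, `rate_object_id`, and so on) are illustrative assumptions rather than a prescribed schema, and the sample value is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple


class MethodologyKey(NamedTuple):
    """Hypothetical key: data source + contract methodology + rate type."""
    data_source: str           # e.g. "Hospital MRF", "Payer MRF", "Komodo"
    contract_methodology: str  # e.g. "case rate", "per diem"
    rate_type: str             # e.g. "dollar", "percent"


@dataclass
class RateObject:
    """Hypothetical container mirroring the fields listed above."""
    rate_object_id: str
    network_id: str
    provider_group_id: str
    code_object_id: str
    month: str  # time, stored here as "YYYY-MM" (an assumption)
    # One rate value per methodology key.
    rate_values: Dict[MethodologyKey, float] = field(default_factory=dict)


key = MethodologyKey("Hospital MRF", "case rate", "percent")
ro = RateObject("ro-1", "net-1", "pg-1", "code-1", "2024-12")
ro.rate_values[key] = 0.45  # made-up value, for illustration only
```

Using a tuple-like key keeps the three components hashable, so they can index the rate values directly.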
2. Rate Values
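Inside each Rate Object $r$, the rate values are:
$$V = \{ v_k : k \in K \}$$
where $K$ is the set of methodology keys available for $r$.
For example:
- $v_k$ with $k = (\text{Hospital MRF}, \text{case rate}, \text{percent})$ is the hospital’s percentage case rate for a given procedure.
- $v_k$ with Komodo as the data source in $k$ is Komodo’s allowed amount for a given procedure or code.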
Multiple Source Entries
In the raw source data, there can be multiple entries $x_{k,1}, \dots, x_{k,m}$ for a specific methodology key $k$. To produce a single rate value $v_k$, define an aggregation function
$$A : \{ x_{k,1}, \dots, x_{k,m} \} \mapsto v_k$$
This ensures that although data may come from multiple records, it is ultimately consolidated into a single numerical rate value $v_k$ associated with each methodology key $k$.
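The consolidation step can be sketched as a group-by over methodology keys followed by a reducer. The source does not say which aggregation function is used, so the median default here is an assumption, as are the function name `aggregate` and the sample values.

```python
from collections import defaultdict
from statistics import median
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]  # (data source, contract methodology, rate type)


def aggregate(
    entries: Iterable[Tuple[Key, float]],
    agg: Callable[[List[float]], float] = median,  # assumed choice of aggregator
) -> Dict[Key, float]:
    """Collapse multiple raw entries per methodology key into one value."""
    grouped: Dict[Key, List[float]] = defaultdict(list)
    for key, value in entries:
        grouped[key].append(value)
    return {key: agg(values) for key, values in grouped.items()}


# Made-up raw entries: three records for one key, one record for another.
raw = [
    (("Payer MRF", "case rate", "dollar"), 1000.0),
    (("Payer MRF", "case rate", "dollar"), 1200.0),
    (("Payer MRF", "case rate", "dollar"), 1100.0),
    (("Komodo", "case rate", "dollar"), 950.0),
]
rates = aggregate(raw)
```

Passing a different reducer, for example `agg=max`, changes the consolidation rule without touching the grouping logic.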
3a. Transformations
Define transformations as functions acting on individual rate values:
$$T : v_k \mapsto v_{k'}$$
where $k'$ is the methodology key of the transformed value,
e.g. APR-DRG crosswalk, percent -> dollar, per diem * expected days, ...
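Two of the transformations named above can be sketched as small pure functions. The function names, the convention that a percent is stored as a fraction, and the use of a reference billed charge are all assumptions made for illustration; the inputs are made-up numbers.

```python
def percent_to_dollar(percent_rate: float, billed_charge: float) -> float:
    """Convert a percent-of-charges rate into a dollar rate.

    Assumes the percent is expressed as a fraction (0.5 means 50%) and that
    a reference billed charge is available; both are illustrative assumptions.
    """
    return percent_rate * billed_charge


def per_diem_to_case_rate(per_diem: float, expected_days: float) -> float:
    """Convert a per diem rate into a per-case dollar rate."""
    return per_diem * expected_days


dollar_rate = percent_to_dollar(0.5, 20000.0)   # made-up inputs
case_rate = per_diem_to_case_rate(1500.0, 4.0)  # made-up inputs
```

Each function maps one rate value to another, so transformations can be chained when a value needs more than one conversion.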
3b. Imputations
An imputation function $I$ is used to generate alternate sources for rate values:
$$I(V, \theta) = \hat{v}_{k'}$$
where:
- $V$ may be missing or partially observed.
- $\theta$ represents covariates/parameters (e.g., provisions, payer-specific trends, geographic trends, etc.).
For instance, consider a provider-payer-network with inpatient (IP) rates for 25 DRG codes. Suppose 18 of these codes have rates $v_d$ that, when normalized by their CMS weights $w_d$, yield the same base value $b$:
$$\frac{v_d}{w_d} = b$$
This suggests a base rate structure: the observed IP rates are proportionally scaled by CMS DRG weights.
Thus, for the missing DRG codes, we can impute their rate values using:
$$\hat{v}_d = b \cdot w_d$$
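The base rate example can be sketched in two steps: find the most common rate-to-weight ratio, then multiply it by the weight of each code that lacks a rate. The function names, the `min_share` cutoff, the rounding precision, and all codes, weights, and rates below are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def infer_base_rate(
    rates: Dict[str, float],
    weights: Dict[str, float],
    min_share: float = 0.5,  # assumed cutoff for "a dominant base rate exists"
    ndigits: int = 2,        # assumed rounding before comparing ratios
) -> Optional[float]:
    """Return the most common rate/weight ratio if enough codes share it."""
    ratios = [round(rates[c] / weights[c], ndigits) for c in rates if c in weights]
    if not ratios:
        return None
    base, count = Counter(ratios).most_common(1)[0]
    return base if count / len(ratios) >= min_share else None


def impute_missing(
    rates: Dict[str, float], weights: Dict[str, float], base: float
) -> Dict[str, float]:
    """Impute rate = base * weight for codes that have a weight but no rate."""
    return {c: base * w for c, w in weights.items() if c not in rates}


# Made-up weights and rates for four hypothetical DRG codes.
weights = {"drg_a": 2.0, "drg_b": 1.5, "drg_c": 1.25, "drg_d": 1.0}
observed = {"drg_a": 20000.0, "drg_b": 15000.0, "drg_c": 13000.0}
base = infer_base_rate(observed, weights)
imputed = impute_missing(observed, weights, base) if base is not None else {}
```

Here two of the three observed codes share a ratio of 10000, so that value is taken as the base rate and applied to the code without an observed rate.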
4. Accuracy Metrics
An accuracy score $s_k$ assesses the reliability and validity of a rate value $v_k$, taking into account factors such as data lineage, plausibility, benchmarks, and alignment with trends.
Compute $s_k$ for each $v_k$ in $V$.
5. Canonical Rate Selection within Sub-Version
Within a sub-version, the canonical rate is the rate value with the highest accuracy score. Ties are broken with the following hierarchy.
6. Canonical Rates Across Sub-Versions (Recent Historic with Threshold)
Let $H$ be the array of recent sub-versions (e.g. ["2024_11", "2024_10", ...], where the current sub-version is "2024_12").
Select the current or historic rate with the highest score.
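A minimal sketch of this selection follows, under two loudly stated assumptions that the text above does not confirm: the threshold is treated as a minimum score a rate needs to be eligible, and ties go to the more recent sub-version. The function name `select_canonical` and all values and scores are made up.

```python
from typing import Dict, List, Optional, Tuple

# Each candidate is (rate value, score).
Candidate = Tuple[float, float]


def select_canonical(
    candidates: Dict[str, Candidate],
    current: str,
    recent: List[str],
    threshold: float = 0.0,  # assumed meaning: minimum score to be eligible
) -> Optional[Tuple[str, float, float]]:
    """Pick the highest-scoring rate among current and recent sub-versions.

    Ties go to the more recent sub-version (an assumption), because the
    search order is current first, then `recent` from newest to oldest.
    """
    best: Optional[Tuple[str, float, float]] = None
    for sub_version in [current, *recent]:
        if sub_version not in candidates:
            continue
        value, score = candidates[sub_version]
        if score < threshold:
            continue
        # Strict comparison keeps the earlier (more recent) entry on ties.
        if best is None or score > best[2]:
            best = (sub_version, value, score)
    return best


candidates = {
    "2024_12": (1050.0, 0.6),
    "2024_11": (1000.0, 0.9),
    "2024_10": (990.0, 0.9),
}
chosen = select_canonical(candidates, "2024_12", ["2024_11", "2024_10"], threshold=0.5)
```

In this made-up case the historic "2024_11" rate is selected, because it outscores the current sub-version and ties with "2024_10" resolve toward the newer one.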
Output:
Then, the CLD is the rate object space enhanced with analysis columns. Each Rate Object in the CLD is defined as: