Version: 2.1

Technical One-Pager

This section defines the core concepts precisely so that data scientists work from consistent logic and definitions.

1. Defining the Rate Object Space

Let $\Omega$ be the set of all Rate Objects. Each Rate Object $O \in \Omega$ is defined as:

O = \bigl(o,\, n,\, g,\, c,\, \tau,\, \{\,v_s\}\bigr),

where:

$o$ : A unique Rate Object ID.
$n \in N$ : Network ID (unique payer-network identifier).
$g \in G$ : Provider Group ID (unique set of providers sharing identical rates).
$c \in C$ : Code Object ID (e.g., HCPCS, MS-DRG, code-version, place of service, plus any relevant modifiers).
$\tau \in T$ : Time (month).
$\{v_s\}$ : A set of rate values, each indexed by a methodology key $s$ .

Note: Defining the Methodology Key $s$

A methodology key $s$ (sometimes called a source for simplicity) represents a combination of:

Data source (e.g., "Hospital MRF", "Payer MRF", "Komodo")
Contract methodology (e.g., "case rate", "per diem")
Rate type (e.g., "dollar", "percent").
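
The definitions above can be sketched as a pair of data classes. This is a minimal illustration, not a prescribed schema: the class and field names (`MethodologyKey`, `RateObject`, `rate_values`, and so on) and the example IDs are hypothetical, chosen only to mirror the symbols $o, n, g, c, \tau, \{v_s\}$ and the three components of $s$.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class MethodologyKey:
    """Methodology key s: data source + contract methodology + rate type."""
    data_source: str   # e.g. "Hospital MRF", "Payer MRF", "Komodo"
    methodology: str   # e.g. "case rate", "per diem"
    rate_type: str     # e.g. "dollar", "percent"


@dataclass
class RateObject:
    """Rate Object O = (o, n, g, c, tau, {v_s})."""
    rate_object_id: str     # o
    network_id: str         # n
    provider_group_id: str  # g
    code_object_id: str     # c
    month: str              # tau, e.g. "2024_12"
    # {v_s}: one rate value per methodology key s
    rate_values: Dict[MethodologyKey, float] = field(default_factory=dict)


# Hypothetical example: a hospital-MRF percentage case rate of 45%.
s = MethodologyKey("Hospital MRF", "case rate", "percent")
obj = RateObject("o-1", "n-1", "g-1", "c-1", "2024_12", {s: 0.45})
```

Making the key frozen lets it serve as a dictionary key, so each Rate Object holds at most one value per methodology key.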

2. Rate Values

Inside each Rate Object $O$ , the rate values are:

v_s \in \mathbb{R} \quad \text{for each methodology key } s \in S.

For example:

$v_{\text{hospital-mrf-case-rate-percent}}$ is the hospital’s percentage case rate for a given procedure.
$v_{\text{komodo-allowed-amount}}$ is Komodo’s allowed amount for a given procedure or code.

Note: Multiple Source Entries

In the raw source data, a single methodology key $s$ can have multiple entries $\{r_1, r_2, \ldots, r_k\}$ . To produce a single rate value $v_s$ , define an aggregation function

v_s = \text{Agg}(r_1,\, r_2,\, \ldots,\, r_k).

This ensures that although data may come from multiple records, it is ultimately consolidated into a single numerical rate value $v_s$ associated with each methodology key $s$ .
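
As a sketch of the aggregation step, the function below collapses the raw entries for one methodology key into a single value. The choice of the median is an assumption made for illustration (it is robust to a single outlying record); the source does not specify which function $\text{Agg}$ is, and the function name `aggregate` is hypothetical.

```python
from statistics import median
from typing import Sequence


def aggregate(raw_rates: Sequence[float]) -> float:
    """Agg(r_1, ..., r_k): collapse raw entries into one rate value v_s.

    Uses the median as an assumed aggregation rule; swap in another
    rule (mode, mean, most recent record) if that fits the data better.
    """
    if not raw_rates:
        raise ValueError("need at least one raw entry to aggregate")
    return median(raw_rates)


# Three raw records for the same methodology key, one of them an outlier.
v_s = aggregate([1200.0, 1250.0, 9800.0])
```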

3a. Transformations

Define transformations as functions acting on individual rate values:

t: v_s \;\to\; v_s'

Examples include an APR-DRG crosswalk, percent-to-dollar conversion, and per diem multiplied by expected days.
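
Two of these transformations can be sketched as plain functions. The function names and parameters are hypothetical. In particular, the percent-to-dollar conversion needs some dollar base to apply the percentage to, and `base_amount` is an assumed input (for example a charge amount), since the source does not say which base is used.

```python
def percent_to_dollar(percent_rate: float, base_amount: float) -> float:
    """Transformation t: percent rate -> dollar rate.

    percent_rate is a fraction (0.5 means 50%); base_amount is an
    assumed dollar base that the percentage applies to.
    """
    return percent_rate * base_amount


def per_diem_to_case(per_diem: float, expected_days: float) -> float:
    """Transformation t: per diem rate -> per-case rate via expected days."""
    return per_diem * expected_days


dollar_rate = percent_to_dollar(0.5, 10_000.0)  # 50% of a 10,000 base
case_rate = per_diem_to_case(1_500.0, 4)        # 1,500 per day over 4 days
```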

3b. Imputations

An imputation function is used to generate alternate sources for rate values:

v_{i} = \text{Impute}\bigl(\{v_s\};\,\theta\bigr),

where:

$\{v_s\}$ may be missing or partially observed.
$\theta$ represents covariates/parameters (e.g., provisions, payer-specific trends, geographic trends).

For instance, consider a provider-payer-network with inpatient (IP) rates for 25 DRG codes. Suppose 18 of these codes have rates that, when normalized by CMS weights, yield approximately the same base value $X$ :

\frac{v_s^{(1)}}{w_1} \approx \frac{v_s^{(2)}}{w_2} \approx \cdots \approx \frac{v_s^{(18)}}{w_{18}} \approx X

This suggests a base rate structure: the observed IP rates are proportionally scaled by CMS DRG weights.

Thus, for the missing DRG codes, we can impute their rate values using:

v_i = X \cdot w_i
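
The base-rate imputation above can be sketched as follows. This is an illustration under stated assumptions: the base value $X$ is estimated as the median of the observed ratios $v/w$ (the source only says the ratios are approximately equal), the function name `impute_drg_rates` is hypothetical, and the DRG labels and weights in the example are made up rather than real CMS weights.

```python
from statistics import median
from typing import Dict


def impute_drg_rates(
    observed: Dict[str, float],  # DRG code -> observed rate v_s
    weights: Dict[str, float],   # DRG code -> DRG weight w
) -> Dict[str, float]:
    """Impute v_i = X * w_i for DRG codes that have a weight but no rate."""
    # Normalize each observed rate by its weight to recover the base value.
    ratios = [observed[code] / weights[code] for code in observed]
    base = median(ratios)  # assumed estimator of X
    return {
        code: base * w
        for code, w in weights.items()
        if code not in observed
    }


# Made-up weights and rates: both observed codes imply a base of 10,000.
weights = {"DRG-A": 2.0, "DRG-B": 1.5, "DRG-C": 0.5}
observed = {"DRG-A": 20_000.0, "DRG-B": 15_000.0}
imputed = impute_drg_rates(observed, weights)
```

In practice it would be worth checking that the ratios really do cluster around one value before imputing, for example by comparing their spread against a tolerance.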

4. Accuracy Metrics

A(v) = f\bigl(\text{data quality},\, \text{benchmarks},\, \text{historical fit},\, \ldots\bigr).

This score assesses the reliability and validity of a rate value based on factors such as data lineage, plausibility, benchmarks, and alignment with trends.

Compute $A(v)$ for each $v \in \{v_s,\, v_s',\, v_i\}$ , for all $s$ and $i$ .
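
The source leaves the form of $f$ open. One possible shape, shown purely as an assumption, is a weighted average of component scores that each lie in $[0, 1]$. The component names, the weights, and the function name `accuracy_score` are all hypothetical.

```python
from typing import Dict


def accuracy_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """A(v) as an assumed weighted average of component scores in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * components[name] for name in weights) / total_weight


# Hypothetical component scores for one candidate rate value.
score = accuracy_score(
    components={"data_quality": 1.0, "benchmark_fit": 0.5},
    weights={"data_quality": 1.0, "benchmark_fit": 1.0},
)
```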

5. Canonical Rate Selection within Sub-Version

v_{\text{best}} = \arg\max_{v \,\in\, \{v_s,\, v_s',\, v_i\}} \, A(v).
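
A minimal sketch of this selection, assuming each candidate (raw, transformed, or imputed) has already been scored. The function name `select_best` and the candidate labels are hypothetical. Tie-breaking is not modeled here: on equal scores, Python's `max` simply returns the first candidate encountered.

```python
from typing import Dict, Tuple


def select_best(candidates: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return (label, rate value) of the candidate with the highest A(v).

    candidates maps a label to (rate value, accuracy score).
    """
    best_label = max(candidates, key=lambda label: candidates[label][1])
    return best_label, candidates[best_label][0]


# Hypothetical candidates for one Rate Object: (rate value, accuracy score).
candidates = {
    "v_s": (12_000.0, 0.62),        # raw source value
    "v_s_prime": (11_800.0, 0.81),  # transformed value
    "v_i": (11_500.0, 0.70),        # imputed value
}
best_label, v_best = select_best(candidates)
```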

Ties are broken with the following hierarchy.

6. Canonical Rates Across Sub-Versions (Recent Historic with Threshold)

Let $T$ be the array of recent sub-versions (e.g., ["2024_11", "2024_10", ...] when the current sub-version is "2024_12").

Select the current or historic rate with the highest score.

v_{\text{best}}' = \arg\max_{v \,\in\, \{v_{\text{best},t} \,:\, t \in T\}} \, A(v).
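
The cross-sub-version step applies the same argmax to the per-sub-version winners. The sketch below assumes the canonical rate and its score are already known for each sub-version under consideration. The function name `select_across_subversions` and the example values are hypothetical, and the threshold mentioned in the heading is not modeled because its definition is not given here.

```python
from typing import Dict, Tuple


def select_across_subversions(
    best_by_subversion: Dict[str, Tuple[float, float]],
) -> Tuple[str, float]:
    """Return (sub-version, rate value) with the highest accuracy score.

    best_by_subversion maps a sub-version t to (v_best_t, A(v_best_t)).
    """
    best_t = max(best_by_subversion, key=lambda t: best_by_subversion[t][1])
    return best_t, best_by_subversion[best_t][0]


# Hypothetical per-sub-version winners: (rate value, accuracy score).
best_by_subversion = {
    "2024_12": (11_800.0, 0.58),
    "2024_11": (11_750.0, 0.83),
    "2024_10": (11_700.0, 0.79),
}
chosen_t, v_best_prime = select_across_subversions(best_by_subversion)
```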

Output:

The CLD is $\Omega'$ , the rate object space enhanced with analysis columns. Each Rate Object $O' \in \Omega'$ is defined as:

O' = \bigl(o, n, g, c, \tau, v_{\text{best}}'\bigr)
