Version: 2.1

Technical One-Pager

This section defines core concepts precisely so that data scientists work from consistent logic and definitions.


1. Defining the Rate Object Space

Let $\Omega$ be the set of all Rate Objects. Each Rate Object $O \in \Omega$ is defined as:

$$O = \bigl(o,\, n,\, g,\, c,\, \tau,\, \{\,v_s\}\bigr),$$

where:

  • $o$: A unique Rate Object ID.
  • $n \in N$: Network ID (unique payer-network identifier).
  • $g \in G$: Provider Group ID (unique set of providers sharing identical rates).
  • $c \in C$: Code Object ID (e.g., HCPCS, MS-DRG, code-version, place of service, plus any relevant modifiers).
  • $\tau \in T$: Time (month).
  • $\{v_s\}$: A set of rate values, each indexed by a methodology key $s$.
Note: Defining the Methodology Key $s$

A methodology key $s$ (sometimes called a source for simplicity) represents a combination of:

  1. Data source (e.g., "Hospital MRF", "Payer MRF", "Komodo")
  2. Contract methodology (e.g., "case rate", "per diem")
  3. Rate type (e.g., "dollar", "percent")
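
As a rough illustration of the definitions above, a Rate Object and its methodology key could be modeled with two small data classes. This is only a sketch: the class names, field names, and sample values are hypothetical and are not taken from any existing schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class MethodologyKey:
    """Methodology key s: data source + contract methodology + rate type."""
    data_source: str           # e.g. "Hospital MRF"
    contract_methodology: str  # e.g. "case rate"
    rate_type: str             # e.g. "percent"


@dataclass
class RateObject:
    """Rate Object O = (o, n, g, c, tau, {v_s}). Field names are illustrative."""
    rate_object_id: str     # o
    network_id: str         # n
    provider_group_id: str  # g
    code_object_id: str     # c
    month: str              # tau, written here as "YYYY_MM"
    # {v_s}: one rate value per methodology key
    rate_values: Dict[MethodologyKey, float] = field(default_factory=dict)


# Hypothetical sample data, for illustration only.
key = MethodologyKey("Hospital MRF", "case rate", "percent")
obj = RateObject("o-1", "n-1", "g-1", "c-1", "2024_12", {key: 150.0})
```

Making the key frozen keeps it hashable, so it can index the dictionary of rate values directly.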

2. Rate Values

Inside each Rate Object $O$, there is one rate value per methodology key:

$$v_s \quad \text{for each methodology key } s \in S.$$

For example:

  • $v_{\text{hospital-mrf-case-rate-percent}}$ is the hospital’s percentage case rate for a given procedure.
  • $v_{\text{komodo-allowed-amount}}$ is Komodo’s allowed amount for a given procedure or code.
Note: Multiple Source Entries

In the raw source data, there can be multiple entries $\{r_1, r_2, \ldots, r_k\}$ for a specific methodology key. To produce a single rate value $v_s$, define an aggregation function

$$v_s = \text{Agg}(r_1,\, r_2,\, \ldots,\, r_k).$$

This ensures that although data may come from multiple records, it is ultimately consolidated into a single numerical rate value $v_s$ associated with each methodology key $s$.
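
The aggregation step can be sketched as below. The choice of the median is an assumption made purely for illustration, since the aggregation function is not specified here; the function name and the sample entries are likewise hypothetical.

```python
from statistics import median
from typing import Sequence


def aggregate(entries: Sequence[float]) -> float:
    """Collapse raw entries r_1..r_k for one methodology key into a single v_s.

    The median is an assumed choice of Agg, used only as an example.
    """
    if not entries:
        raise ValueError("need at least one raw entry")
    return median(entries)


# Hypothetical raw entries for one methodology key.
raw_entries = [1200.0, 1250.0, 1180.0]
v_s = aggregate(raw_entries)
```

A median is less sensitive to a single outlying record than a mean, which is one reason it might be preferred, but any function that returns a single number fits the definition.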


3a. Transformations

Define transformations as functions acting on individual rate values:

$$t: v_s \;\to\; v_s'$$

e.g., APR-DRG crosswalk, percent → dollar, per diem × expected days, ...
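
Two of the transformations named above can be sketched as plain functions. The function names, the choice of base amount for the percent conversion, and the sample numbers are assumptions for illustration only.

```python
def percent_to_dollar(percent_rate: float, base_amount: float) -> float:
    """Convert a percent rate into dollars.

    base_amount is a hypothetical dollar basis (for example, a charge amount);
    which basis applies depends on the contract and is not defined here.
    """
    return percent_rate / 100.0 * base_amount


def per_diem_to_case(per_diem: float, expected_days: float) -> float:
    """Convert a per diem rate into a per-case amount: per diem * expected days."""
    return per_diem * expected_days


# Hypothetical inputs.
dollar_rate = percent_to_dollar(50.0, 2000.0)  # 50% of a 2000.0 basis
case_rate = per_diem_to_case(1500.0, 3.0)      # 1500.0 per day over 3 days
```

Each function takes one rate value $v_s$ plus the extra inputs it needs and returns the transformed value $v_s'$.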

3b. Imputations

An imputation function is used to generate alternate sources for rate values:

$$v_{i} = \text{Impute}\bigl(\{v_s\};\,\theta\bigr),$$

where:

  • $\{v_s\}$ may be missing or partially observed.
  • $\theta$ represents covariates/parameters (e.g., provisions, payer-specific trends, geographic trends, etc.).

For instance, consider a provider-payer-network with inpatient (IP) rates for 25 DRG codes. Suppose 18 of these codes have rates that, when normalized by CMS weights, yield the same base value $X$:

$$\frac{v_s^{(1)}}{w_1} \approx \frac{v_s^{(2)}}{w_2} \approx \cdots \approx \frac{v_s^{(18)}}{w_{18}} \approx X$$

This suggests a base rate structure: the observed IP rates are proportionally scaled by CMS DRG weights.

Thus, for the missing DRG codes, we can impute their rate values using:

$$v_i = X \cdot w_i$$

where $w_i$ is the CMS weight of the missing DRG code.
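
The base-rate example can be sketched as follows. The base value $X$ is estimated here as the median of the observed rate-to-weight ratios, which is an assumption; the function name, code labels, and weights are hypothetical and are not real CMS values.

```python
from statistics import median
from typing import Dict


def impute_from_base_rate(
    observed: Dict[str, float], weights: Dict[str, float]
) -> Dict[str, float]:
    """Impute rates for codes that have a weight but no observed rate.

    observed: code -> observed rate v_s
    weights:  code -> weight w (must cover every observed code)
    """
    # Normalize each observed rate by its weight; the ratios should cluster near X.
    ratios = [observed[code] / weights[code] for code in observed]
    base = median(ratios)  # assumed estimator for X
    # v_i = X * w_i for every code without an observed rate.
    return {code: base * w for code, w in weights.items() if code not in observed}


# Hypothetical codes, rates, and weights.
observed = {"DRG-A": 10000.0, "DRG-B": 20000.0}
weights = {"DRG-A": 1.0, "DRG-B": 2.0, "DRG-C": 1.5}
imputed = impute_from_base_rate(observed, weights)
```

In practice one would likely first check that the ratios really do agree closely before trusting the imputed values; that check is omitted here for brevity.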

4. Accuracy Metrics

$$A(v) = f\bigl(\text{data quality},\, \text{benchmarks},\, \text{historical fit},\, \ldots\bigr).$$

This score assesses the reliability and validity of the rate value, taking into account factors such as data lineage, plausibility, benchmarks, or alignment with trends.

Compute $A(v)$ for each $v \in \{v_s,\, v_s',\, v_i\}$, $\forall \left( s, i \right)$.


5. Canonical Rate Selection within Sub-Version

$$v_{\text{best}} = \arg\max_{v \,\in\, \{v_s,\, v_s',\, v_i\}} \, A(v).$$

Ties are broken with the following hierarchy.
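
The argmax selection can be sketched as below, assuming accuracy scores have already been computed for every candidate. The function name, labels, and numbers are hypothetical, and the tie handling shown (the first candidate in insertion order wins) is only a placeholder assumption, not the hierarchy used in practice.

```python
from typing import Dict, Tuple


def select_best(candidates: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Pick the candidate with the highest accuracy score.

    candidates maps a label to (rate value, accuracy score A(v)).
    max() keeps the first maximal key, so ties fall back to insertion order
    (a placeholder rule, assumed for this sketch).
    """
    label = max(candidates, key=lambda k: candidates[k][1])
    return label, candidates[label][0]


# Hypothetical candidates: raw, transformed, and imputed values with scores.
candidates = {
    "v_s": (1200.0, 0.6),
    "v_s_prime": (1150.0, 0.8),
    "v_i": (1100.0, 0.7),
}
best_label, v_best = select_best(candidates)
```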


6. Canonical Rates Across Sub-Versions (Recent Historic with Threshold)

Let $T$ be the array of recent sub-versions (e.g., ["2024_11", "2024_10", ...] when the current sub-version is "2024_12").

Select the current or historic rate with the highest score.

$$v_{\text{best}}' = \arg\max_{v \,\in\, \{v_{\text{best},t}\},\forall t \in T} \, A(v).$$
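
A minimal sketch of the cross-sub-version selection, assuming each sub-version's canonical rate and score are already known. The function name and sample data are hypothetical. Two further assumptions are made: ties go to the most recent sub-version, and no threshold is modeled because none is defined here.

```python
from typing import Dict, Tuple


def select_across_sub_versions(
    best_by_sub_version: Dict[str, Tuple[float, float]]
) -> Tuple[str, float]:
    """Pick the sub-version whose canonical rate has the highest score.

    best_by_sub_version maps "YYYY_MM" -> (v_best for that sub-version, A(v)).
    Keys are visited newest first, so on equal scores the most recent
    sub-version wins (an assumed tie rule).
    """
    newest_first = sorted(best_by_sub_version, reverse=True)
    chosen = max(newest_first, key=lambda t: best_by_sub_version[t][1])
    return chosen, best_by_sub_version[chosen][0]


# Hypothetical canonical rates and scores for current and recent sub-versions.
best_by_sub_version = {
    "2024_12": (1000.0, 0.7),
    "2024_11": (980.0, 0.9),
    "2024_10": (970.0, 0.9),
}
chosen_sub_version, v_best_prime = select_across_sub_versions(best_by_sub_version)
```

Sorting works on the raw strings because zero-padded "YYYY_MM" labels order lexicographically the same way they order chronologically.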

Output:

Then, the CLD is $\Omega'$: the rate object space, enhanced with analysis columns. So each Rate Object $O' \in \Omega'$ is defined as:

$$O' = \bigl(o, n, g, c, \tau, v_{\text{best}}'\bigr)$$