Multiple Rates: Hospital Data
Background:
When mapping raw data to "rate objects", there may be multiple raw rates for the particular payer-provider-code. This is often due to different modifiers, contract methodologies or revenue codes.
This analysis explores:
- frequencies of multiple modifiers, contract methodologies, or revenue codes
- how often modifiers, contract methodologies, and revenue codes are NULL
Findings:
- A single payer-provider-code is reported with different contract methodologies > 16% of the time.
- For MS-DRG payer-provider-codes, there exists at least one rate with a NULL revenue code 97.6% of the time and at least one rate with a NULL modifier 99.9% of the time (as expected).
- For HCPCS payer-provider-codes, there exists at least one rate with a NULL revenue code 61.3% of the time and at least one rate with a NULL modifier 98.5% of the time
In general, the NULL values likely represent the "base" case and are desirable, although there may be exceptions.
CLD Implications:
These results help to inform how the rate object should be defined and how much effort we want to put into carefully mapping raw rate values to rate objects. As examples:
- Contract Methodology: Unless we have strong provisions coverage, we should not try to determine the "correct" contract methodology, since multiple rates are so common in this field. Further, there are few distinct values of contract methodology (case rate, percentage of total, fee schedule, other, per diem, NULL). So casting these values wide in the CLD (a column for each contract methodology) is feasible.
- For MS-DRG, we can filter to rates where modifier and revenue code are NULL. We can investigate why 2.5% of payer-provider-codes don't have a NULL revenue code -- sometimes it's because they are "nursery" codes and sometimes the provider simply reports the same rate copied 20+ times for each revenue code from 100-159.
- For HCPCS:
- modifiers: We can filter to rates where modifier is NULL in the general case, and filter to specific modifiers as needed.
- revenue codes: A payer-provider-code has 0 revenue codes 56% of the time and at most one revenue code 90% of the time. Further investigation would be helpful here -- perhaps we can prioritize surgery codes with 0360, 0361, 0369, 0481, 0490, 0499, 0750 and 0790.
Methodology​
Randomly sample 100k rates from hospital data. Then group by payer, provider, code and compute:
- The distinct counts of 'contract_methodology', 'billing_code_modifiers', 'revenue_code'
- The count of payer-provider-code where there is a NULL value for 'contract_methodology', 'billing_code_modifiers', 'revenue_code'
Click to see code
# %%
from IPython import get_ipython
if get_ipython() is not None:
get_ipython().run_line_magic("load_ext", "autoreload")
get_ipython().run_line_magic("autoreload", "2")
import pandas as pd
from jinja2 import Template
from tqds.sql import db, sql
pd.options.display.max_columns = None
pd.options.display.max_rows = 500
trino_conn = db.TrinoClient().conn
redshift_conn = db.RedshiftClient().conn
query = sql.Query(TEMPLATE_DIR="../sql")
# %%
def get_sample(_type='MS-DRG'):
if _type == "MS-DRG":
_filter = "WHERE billing_code_type = 'MS-DRG'"
elif _type == "HCPCS":
_filter = """
WHERE billing_code_type = 'HCPCS'
"""
q = """
WITH
df AS (
SELECT
provider_id
, payer_id
, billing_code_type
, billing_code
{% for column in ['billing_class', 'setting', 'contract_methodology', 'billing_code_modifiers', 'revenue_code'] %}
, COUNT(DISTINCT NULLIF({{ column }}, '')) AS n_{{ column }}
{% endfor %}
{% for column in ['contract_methodology', 'billing_code_modifiers', 'revenue_code'] %}
, SUM(CASE WHEN NULLIF({{ column }}, '') IS NULL THEN 1 ELSE 0 END) AS n_null_{{ column }}
{% endfor %}
FROM glue.hospital_data.hospital_rates
{{_filter}}
AND payer_class_name = 'Commercial'
AND (
negotiated_dollar > 0
OR negotiated_percentage > 0
)
GROUP BY 1,2,3,4
)
SELECT *
FROM df
ORDER BY random()
LIMIT 100000
"""
query = Template(q).render(_filter=_filter)
df = pd.read_sql(query, con=trino_conn)
return df
# %%
distinct_cols = [
"n_contract_methodology",
"n_billing_code_modifiers",
"n_revenue_code",
]
null_cols = [
"n_null_contract_methodology",
"n_null_billing_code_modifiers",
"n_null_revenue_code",
]
# %%
df = get_sample(_type="MS-DRG")
for col in distinct_cols:
tmp = df[col].astype(str).copy()
tmp[tmp.astype(int) > 3] = "4+"
table = tmp.value_counts() / len(tmp)
print(table.to_markdown())
for col in null_cols:
tmp = df[col].astype(str).copy()
tmp[tmp.astype(int) > 0] = "1+"
table = tmp.value_counts() / len(tmp)
print(table.to_markdown())
# %%
df = get_sample(_type="HCPCS")
for col in distinct_cols:
tmp = df[col].astype(str).copy()
tmp[tmp.astype(int) > 3] = "4+"
table = tmp.value_counts() / len(tmp)
print(table.to_markdown())
for col in null_cols:
tmp = df[col].astype(str).copy()
tmp[tmp.astype(int) > 0] = "1+"
table = tmp.value_counts() / len(tmp)
print(table.to_markdown())
MS-DRG​
Distinct Counts​
16% of the time, there are multiple contract methodologies
| n_contract_methodology | count |
|---|---|
| 1 | 0.78608 |
| 2 | 0.12123 |
| 0 | 0.0529 |
| 3 | 0.03009 |
| 4+ | 0.0097 |
we'd expect n_billing_code_modifiers=0 to be 100%, so the non-zeros are likely
errors in hospital data
| n_billing_code_modifiers | count |
|---|---|
| 0 | 0.99857 |
| 1 | 0.00141 |
| 2 | 1e-05 |
| 3 | 1e-05 |
| n_revenue_code | count |
|---|---|
| 0 | 0.97635 |
| 4+ | 0.01987 |
| 1 | 0.00359 |
| 3 | 0.00016 |
| 2 | 3e-05 |
Count of NULLs​
0 indicates the count of codes where there are NO rates with a NULL value in
that column
(e.g. 94.5% of payer-provider-codes have NO rates with a NULL contract_methodology)
| n_null_contract_methodology | count |
|---|---|
| 0 | 0.94537 |
| 1+ | 0.05463 |
| n_null_billing_code_modifiers | count |
|---|---|
| 1+ | 0.99869 |
| 0 | 0.00131 |
| n_null_revenue_code | count |
|---|---|
| 1+ | 0.98859 |
| 0 | 0.01141 |
HCPCS​
Distinct Counts​
21% of the time, there are multiple contract methodologies
| n_contract_methodology | count |
|---|---|
| 1 | 0.79152 |
| 2 | 0.1229 |
| 0 | 0.072 |
| 3 | 0.01255 |
| 4+ | 0.00103 |
| n_billing_code_modifiers | count |
|---|---|
| 0 | 0.95787 |
| 1 | 0.02807 |
| 2 | 0.00918 |
| 3 | 0.00258 |
| 4+ | 0.0023 |
| n_revenue_code | count |
|---|---|
| 0 | 0.56202 |
| 1 | 0.34308 |
| 4+ | 0.05288 |
| 2 | 0.03422 |
| 3 | 0.0078 |
Count of NULLs​
| n_null_contract_methodology | count |
|---|---|
| 0 | 0.92447 |
| 1+ | 0.07553 |
| n_null_billing_code_modifiers | count |
|---|---|
| 1+ | 0.98466 |
| 0 | 0.01534 |
NULLs are not as frequent with HCPCS revenue codes. There are ~39% of provider-payer-codes with NO rates that have a NULL revenue code.
| n_null_revenue_code | count |
|---|---|
| 1+ | 0.61308 |
| 0 | 0.38692 |