Version: 2.1

Code Crosswalks

Introduction

This document outlines our approach for constructing a description-based crosswalk between MS-DRGs and APR-DRGs.

Stats / Coverage

  • Mapped MS-DRGs: 633 (83% of total MS-DRGs)
  • Mapped base APR-DRGs (pre-SOI): 258 (82% of total base APR-DRGs)
  • Mapped APR-DRGs (post-SOI): 1024 (77% of total post-SOI APR-DRGs)
  • Count of MS-DRGs that map to 1, 2, 3, 4, etc. APR-DRGs (pre-SOI)
    • e.g. in the table below, 413 MS-DRGs map to 1 pre-SOI APR-DRG
    • e.g. 154 MS-DRGs map to 2 pre-SOI APR-DRGs
    | Num. APR-DRGs mapped to | Total MS-DRGs |
    |---|---|
    | 1 | 413 |
    | 2 | 154 |
    | 3 | 45 |
    | 4 | 14 |
    | 5 | 6 |
    | 6 | 1 |
  • Count of APR-DRGs (pre-SOI) that map to 1, 2, 3, 4, etc. MS-DRGs
    • e.g. in the table below, 30 pre-SOI APR-DRGs map to 1 MS-DRG
    | Num. MS-DRGs mapped to | Total APR-DRGs |
    |---|---|
    | 1 | 30 |
    | 2 | 51 |
    | 3 | 104 |
    | 4 | 21 |
    | 5 | 22 |
    | 6 | 19 |
    | 8 | 6 |
    | 9 | 1 |
    | 11 | 3 |
    | 12 | 1 |

Known Gaps / Areas for Improvement

  1. Newborn Services -
    1. APR-DRGs classify neonates based on age (under 29 days), regardless of principal diagnosis. In contrast, MS-DRGs rely on the principal diagnosis to place patients in neonatal categories. Because some “neonatal” diagnoses can also apply to older patients, it’s difficult to match APR-DRGs (age-driven) to MS-DRGs (diagnosis-driven). We’ve tagged these services as ones to revisit and re-assign manually
  2. Orthopedic Services -
    1. Given the different structures APR-DRGs and MS-DRGs use for MSK services (body parts, complexity, elective vs. non-elective), we’ve bookmarked these services as next up for a more in-depth manual review and assignment of crosswalk items
  3. Psych Services -
    1. We’ve noticed that our methodology below falls short for psych services whose descriptions can have similar meanings but use different language to describe them.

Crosswalk Logic

  1. Gather Source Data Files

    • Utilize the most recent official descriptors for both MS-DRGs and APR-DRGs
    • Ensure each file passes data integrity checks (e.g., consistent row counts, valid code formats, matching disclaimers to official CMS/3M lists).
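For illustration, a minimal sketch of these integrity checks is shown below. It assumes the descriptor files are CSVs with `code` and `description` columns; the file paths, column names, and any expected row counts are placeholders rather than our actual configuration.

```python
# Sketch: basic integrity checks on the source descriptor files.
# Paths, column names, and expected row counts are placeholders.
import pandas as pd

def load_and_check(path: str, expected_rows: int | None = None) -> pd.DataFrame:
    df = pd.read_csv(path, dtype=str)
    if expected_rows is not None:
        assert len(df) == expected_rows, f"{path}: unexpected row count {len(df)}"
    assert df["code"].notna().all() and df["description"].notna().all()
    # MS-DRG and base APR-DRG codes are both three-digit strings.
    assert df["code"].str.fullmatch(r"\d{3}").all(), f"{path}: invalid code format"
    return df

ms_drgs = load_and_check("ms_drg_descriptors.csv")
apr_drgs = load_and_check("apr_drg_descriptors.csv")
```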
  2. Create an embedding-based crosswalk

    • Generate Embeddings for all MS-DRGs and APR-DRGs
      • We use an AI model (OpenAI’s “text-embedding-ada-002”) that reads each DRG description and turns it into a vector—a long list of numbers that captures the essential meaning of the text.
        • For example, if two descriptions are very similar in meaning (e.g., “Craniotomy with major comorbidities” vs. “Craniotomy procedure with significant secondary conditions”), their embeddings will look similar.
      • To mitigate purely “generic text” interpretations, we supplement embedding generation with domain-specific references (e.g., synonyms, common abbreviations, and expansions) to help the model more accurately capture nuanced clinical language.
    • Correlate DRG Embeddings with Cosine Similarity
      • Cosine similarity measures how similar two embeddings are. A higher similarity score means that two embeddings (and, in effect, two DRG descriptions) are close in meaning; a lower score means they are farther apart.
      • Using a cosine similarity function, we add up to 10 MS-DRG <> APR-DRG relationships per MS-DRG to the embedding-based crosswalk, keeping only pairs above our minimum similarity threshold (a code sketch of this step follows below).
      • In practice, almost all MS-DRGs correspond to fewer than 10 APR-DRGs; however, we allow up to 10 in the raw output to capture edge cases or borderline mappings that might require manual or secondary validation. Subsequent steps (described below) trim this list significantly.
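To make the mechanics concrete, here is a minimal sketch of this step. It assumes `ms_drgs` and `apr_drgs` are lists of (code, description) pairs, uses the current OpenAI Python SDK (which may differ from our exact tooling), and treats `MIN_SIMILARITY` as a placeholder; the cap of 10 candidates per MS-DRG comes from the step above.

```python
# Sketch: embed each DRG description and keep up to 10 candidate APR-DRG
# matches per MS-DRG. `ms_drgs` / `apr_drgs` are assumed to be lists of
# (code, description) tuples; MIN_SIMILARITY is a placeholder threshold.
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
MIN_SIMILARITY = 0.85  # placeholder, not our actual cutoff
TOP_K = 10

def embed(texts):
    """Return one embedding vector per description."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

def build_embedding_crosswalk(ms_drgs, apr_drgs):
    ms_vecs = embed([desc for _, desc in ms_drgs])
    apr_vecs = embed([desc for _, desc in apr_drgs])

    # Normalize rows so a dot product equals cosine similarity.
    ms_vecs /= np.linalg.norm(ms_vecs, axis=1, keepdims=True)
    apr_vecs /= np.linalg.norm(apr_vecs, axis=1, keepdims=True)
    sims = ms_vecs @ apr_vecs.T  # rows: MS-DRGs, columns: APR-DRGs

    crosswalk = []
    for i, (ms_code, _) in enumerate(ms_drgs):
        ranked = np.argsort(sims[i])[::-1][:TOP_K]  # best-first, at most 10
        crosswalk += [(ms_code, apr_drgs[j][0], float(sims[i, j]))
                      for j in ranked if sims[i, j] >= MIN_SIMILARITY]
    return crosswalk
```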
  3. Create Validation Thresholds in Crosswalk Base

    The initial embedding-based crosswalk is intentionally broad. This step implements two additional filters that refine it into a “final crosswalk.”

    1. Overlapping Words
      1. We compare the number of matching or closely related words in each MS-DRG and APR-DRG description. This helps identify pairs where the language aligns closely beyond the purely numeric similarity score (and helps us eliminate many records where the numeric cosine similarity gives a false positive).
      2. Synonyms, Abbreviations and Noise: To ensure fair comparisons, we maintain a controlled vocabulary that handles common clinical abbreviations and synonymous terms (e.g., “severe” vs. “major”). We also maintain a list of ‘noisy’ words that have no contextual meaning and shouldn’t count as matches (e.g., “of”, “and”).
      3. We keep the record(s) with the most overlapping words. If there are none, we keep the record(s) with the highest cosine similarity score in the embedding-based crosswalk (see the sketch following this step).
    2. MGB-Specific Validation & Correlation
      1. Utilizing FY22 MGB claims, we create an MGB reference crosswalk to evaluate the accuracy of our embedding-based crosswalk. We create this reference crosswalk by:
        1. Summing the total number of claims dual-coded to each MS-DRG and APR-DRG combo (utilizing APR-DRG v40)
        2. Limiting the dataset to ≥ 20 claims per MS-DRG: each MS-DRG must be dual-coded with an APR-DRG at least 20 times in order to be eligible
        3. Limiting the dataset to ≥ 15% per MS-DRG <> APR-DRG combo: if an MS-DRG has at least 20 claims, we only include APR-DRG mappings that account for 15% or more of total claims billed for that MS-DRG
      2. Records in the embedding crosswalk that are validated by this reference crosswalk are flagged. Embedding crosswalk mappings that are not validated by the reference crosswalk are manually reviewed for accuracy (a sketch of the reference-crosswalk thresholds also follows this step).
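Here is a minimal sketch of the overlapping-words filter. The synonym map and noise-word list are tiny stand-ins for our controlled vocabulary, and the candidate tuples are assumed to carry the overlap count and cosine similarity computed earlier.

```python
# Sketch: count overlapping words between two DRG descriptions after
# normalizing synonyms/abbreviations and dropping noise words.
# SYNONYMS and NOISE_WORDS are small stand-ins for our controlled vocabulary.
import re

SYNONYMS = {"severe": "major", "w": "with", "w/o": "without", "proc": "procedure"}
NOISE_WORDS = {"of", "and", "or", "the", "a", "an"}

def normalize(description: str) -> set[str]:
    tokens = re.findall(r"[a-z0-9/]+", description.lower())
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return {t for t in tokens if t not in NOISE_WORDS}

def overlap_count(ms_description: str, apr_description: str) -> int:
    return len(normalize(ms_description) & normalize(apr_description))

def best_candidates(candidates):
    """candidates: list of (apr_code, overlap, similarity) for one MS-DRG.
    Keep the record(s) with the most overlapping words; if none overlap,
    fall back to the highest cosine similarity score."""
    max_overlap = max(c[1] for c in candidates)
    if max_overlap > 0:
        return [c for c in candidates if c[1] == max_overlap]
    max_sim = max(c[2] for c in candidates)
    return [c for c in candidates if c[2] == max_sim]
```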
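The reference-crosswalk thresholds can be sketched with pandas as follows, assuming a `claims` DataFrame with one row per dual-coded FY22 claim and illustrative column names `ms_drg` and `apr_drg`.

```python
# Sketch: build the MGB reference crosswalk from dual-coded claims.
# `claims` has one row per claim with columns `ms_drg` and `apr_drg`
# (APR-DRG v40); the column names are illustrative.
import pandas as pd

def build_reference_crosswalk(claims: pd.DataFrame) -> pd.DataFrame:
    # 1. Total claims per MS-DRG <> APR-DRG combination.
    combos = (claims.groupby(["ms_drg", "apr_drg"])
                    .size()
                    .reset_index(name="claim_count"))
    combos["ms_drg_total"] = combos.groupby("ms_drg")["claim_count"].transform("sum")

    # 2. The MS-DRG must be dual-coded at least 20 times in total.
    combos = combos[combos["ms_drg_total"] >= 20].copy()

    # 3. Keep combos billed on at least 15% of that MS-DRG's claims.
    combos["share"] = combos["claim_count"] / combos["ms_drg_total"]
    return combos[combos["share"] >= 0.15].reset_index(drop=True)
```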
  4. DRG Family Alignment

    1. An MS-DRG family refers to DRGs sharing the same core clinical intent but varying by severity (e.g., “with CC,” “with MCC,” or “no CC/MCC”).
    2. All MS-DRGs that belong to the same family must be mapped to the same set of APR-DRGs
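A minimal sketch of the family-alignment rule is below. The `family_of` helper and its severity-suffix pattern are illustrative, not our production parsing logic.

```python
# Sketch: force every MS-DRG in a family to share the same APR-DRG set.
# `family_of` derives a family key by stripping the severity suffix from the
# MS-DRG description; the regex is illustrative only.
import re
from collections import defaultdict

def family_of(ms_drg_description: str) -> str:
    return re.sub(r"\s+(with (cc|mcc)|without cc/mcc|w/o cc/mcc)$",
                  "", ms_drg_description.lower()).strip()

def align_families(crosswalk, descriptions):
    """crosswalk: {ms_drg_code: set of apr_drg_codes};
    descriptions: {ms_drg_code: description}. Returns a crosswalk where every
    member of a family maps to the union of that family's APR-DRGs."""
    family_targets = defaultdict(set)
    for ms_code, apr_codes in crosswalk.items():
        family_targets[family_of(descriptions[ms_code])] |= apr_codes
    return {ms_code: family_targets[family_of(descriptions[ms_code])]
            for ms_code in crosswalk}
```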
  5. Manual Review

    1. Crosswalk entries passing all automated checks undergo manual review by internal team members (coders, billers and medical auditors) who validate that both DRG terms are applicable to the same set of procedures
  6. Map Severity of Illnesses

    1. Typically, an MS-DRG with “no CC/MCC” is closer to a lower Severity of Illness (SOI=1), while an MS-DRG with “MCC” is closer to a higher SOI (up to SOI=4)
    2. Our initial mapping objective maps the following MS-DRG types to the following APR-DRG SOIs:
      1. MS-DRG “No CC/MCC” → APR-DRG with SOI = 1
      2. MS-DRG “with CC” → APR-DRG with SOI = 2 or 3 (depending on clinical alignment)
      3. MS-DRG “with MCC” → APR-DRG with SOI = 4
    3. Note - not all MS-DRG families are created equal; they can contain different combinations of MDC assignments. Our SOI assignment generally follows the logic above, but our current methodology prioritizes a full range of crosswalk items, meaning each pre-MDC DRG family should map to all SOIs of an APR-DRG.
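A minimal sketch of this initial severity mapping is below. The string matching on descriptions is illustrative, and the choice between SOI 2 and 3 for “with CC” DRGs is left to the clinical-alignment review described above.

```python
# Sketch: initial MS-DRG severity split -> target APR-DRG SOI levels.
# The substring checks are illustrative; the SOI 2 vs. 3 decision for
# "with CC" is resolved later during clinical review, so both are returned.
def target_sois(ms_drg_description: str) -> list[int]:
    desc = ms_drg_description.lower()
    if "without cc/mcc" in desc or "w/o cc/mcc" in desc:
        return [1]                  # no CC/MCC -> SOI 1
    if "with mcc" in desc:
        return [4]                  # MCC -> SOI 4
    if "with cc" in desc:
        return [2, 3]               # CC -> SOI 2 or 3, pending clinical alignment
    return [1, 2, 3, 4]             # unsplit family -> full SOI range
```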