Version: 3.0

Stage 8: Production Output

The sub-DAG produces one combined table per month. The orchestrator stitches multiple months, applies cross-month selection, and publishes versioned production tables.

Sub-DAG vs Orchestrator

Sub-DAG (per month)

Runs the full pipeline for a single month of MRF data:

  1. Produces tmp_int_combined_no_whisp via rate selection
  2. Runs 8 whisper computation tasks in parallel
  3. Joins all whisper tables to produce tmp_int_combined

Orchestrator (cross-month)

Coordinates all sub-DAGs and produces the final versioned output:

  1. Triggers N sub-DAGs (one per month), all in parallel
  2. Waits for all to complete, then merges all tmp_int_combined tables
  3. Applies cross-month canonical selection
  4. Publishes prod_combined_abridged, prod_combined_all, rollup views, traceability, and external API table
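
The trigger-and-wait pattern in steps 1 and 2 can be sketched in plain Python. This is only an illustration of the control flow, not the orchestrator's actual code: `run_sub_dag` is a hypothetical stand-in for a per-month sub-DAG run, and both the month strings and the use of a month as the table suffix are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_dag(month: str) -> str:
    """Hypothetical stand-in for one per-month sub-DAG run.

    Returns the name of the combined table that run would produce.
    """
    return f"tmp_int_combined_{month}"


def run_orchestrator(months: list[str]) -> list[str]:
    """Trigger one sub-DAG per month in parallel and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max(1, len(months))) as pool:
        # pool.map keeps the input order and blocks until every run finishes.
        # An exception raised by any run is re-raised here, so the merge step
        # never starts on a partial set of tables.
        return list(pool.map(run_sub_dag, months))


# Illustrative month labels only.
tables = run_orchestrator(["2024_01", "2024_02", "2024_03"])
```

Collecting all table names before returning mirrors the requirement that the merge waits for every sub-DAG to complete.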

Sub-DAG Output Flow

  1. Canonical Rate Selection (chunked by payer)
     Rate arrays → array_max → canonical_rate. The winning rate and all of its metadata are written to the output.
     Output: tmp_int_combined_no_whisp_{sub_version}
  2. Whisper Computation (8 parallel tasks)
     Dimensional aggregations and reference data joins are computed in parallel across the provider, payer, network, code, and combination dimensions.
     Output: tmp_whisper_provider, tmp_whisper_payer, tmp_whisper_code, …
  3. Add Whispers to Main
     All 8 whisper tables are joined onto the rate selection output to produce the final combined table for this sub-version.
     Output: tmp_int_combined_{sub_version}
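
The final step, joining whisper tables onto the rate selection output, can be illustrated as a left join per dimension. The sketch below assumes each whisper table is keyed by a single dimension column. The column names (`provider_id`, `payer_id`, `provider_median_rate`, `payer_median_rate`) and the sample values are hypothetical, and the real join keys may differ.

```python
def add_whispers(main_rows, whisper_tables):
    """Left-join each whisper table onto the main rows.

    whisper_tables maps a key column name to {key_value: {column: value}}.
    """
    out = []
    for row in main_rows:
        enriched = dict(row)  # copy so the input rows are not mutated
        for key_col, lookup in whisper_tables.items():
            # Left join: a row with no match keeps only its original columns.
            enriched.update(lookup.get(row.get(key_col), {}))
        out.append(enriched)
    return out


# Hypothetical rows and whisper tables, for illustration only.
main = [{"roid": 1, "provider_id": "p1", "payer_id": "a", "canonical_rate": 100.0}]
whispers = {
    "provider_id": {"p1": {"provider_median_rate": 95.0}},
    "payer_id": {"b": {"payer_median_rate": 80.0}},
}
combined = add_whispers(main, whispers)
```

Because every join is a left join, the row count of the combined output equals that of the rate selection output.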

Orchestrator Merge Flow

  1. Trigger Sub-DAGs
     One sub-DAG per month, all triggered in parallel. Each produces its own tmp_int_combined.
     Output: tmp_int_combined × N months
  2. Merge Combined Chunks
     Each chunk holds approximately 35M rows. Cross-month selection: the highest score wins across all sub-versions. Same-score ties are resolved by the rate type hierarchy: Posted > Real-World > Enhanced > Benchmark.
     Output: merged_combined_chunks
  3. prod_combined_abridged
     Lean, API-ready subset of columns. Primary consumer-facing table.
     Output: prod_combined_abridged
  4. prod_combined_all
     Full column set, including every rate column, score column, and internal metadata.
     Output: prod_combined_all
  5. Rollup Views + Traceability
     Pre-aggregated views built on prod_combined_abridged for fast analytics. Traceability tables link each final canonical rate back to its raw MRF source record.
     Output: prod_rollup_*, prod_traceability_*
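
The cross-month selection rule in the merge step (highest score wins, ties broken by the rate type hierarchy) can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: it assumes the score being compared is canonical_rate_score, that each row carries a `rate_type` column, and that one row is kept per ROID. The `rate_type` and `sub_version` column names and the sample rows are hypothetical.

```python
# Lower number = higher priority, following Posted > Real-World > Enhanced > Benchmark.
RATE_TYPE_PRIORITY = {"Posted": 0, "Real-World": 1, "Enhanced": 2, "Benchmark": 3}


def select_cross_month(rows):
    """Keep one row per ROID: highest score, then best rate type on ties."""
    best = {}
    for row in rows:
        # Smaller rank wins: negate the score so a higher score sorts first,
        # then fall back to the rate type's position in the hierarchy.
        rank = (-row["canonical_rate_score"], RATE_TYPE_PRIORITY[row["rate_type"]])
        current = best.get(row["roid"])
        if current is None or rank < current[0]:
            best[row["roid"]] = (rank, row)
    return [row for _, row in best.values()]


# Hypothetical rows from two sub-versions, for illustration only.
rows = [
    {"roid": 1, "sub_version": "m1", "canonical_rate_score": 3, "rate_type": "Benchmark"},
    {"roid": 1, "sub_version": "m2", "canonical_rate_score": 3, "rate_type": "Posted"},
    {"roid": 2, "sub_version": "m1", "canonical_rate_score": 4, "rate_type": "Benchmark"},
    {"roid": 2, "sub_version": "m2", "canonical_rate_score": 2, "rate_type": "Posted"},
]
selected = select_cross_month(rows)
```

For ROID 1 the scores tie, so the Posted row from `m2` is kept. For ROID 2 the higher score from `m1` wins even though its rate type ranks lower.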

prod_combined_abridged vs prod_combined_all

Feature           | abridged                                                        | all
Row count         | Same: one row per ROID across all merged sub-versions           | Same
Columns           | Subset: canonical fields + whisper enrichments + key metadata   | Complete: every rate column, score column, metadata
Raw rate arrays   | Not included                                                    | Included
Primary consumers | External APIs, customer-facing products                         | Internal QA, analytics, debugging

NULL handling

When canonical_rate IS NULL, the metadata fields are also NULL and canonical_rate_score = 0. Filter on canonical_rate_score > 1 to keep only non-outlier rates, or on canonical_rate_score > 0 to include outlier-flagged rates while still excluding empty ROIDs.
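
The two thresholds can be expressed as simple row filters. The sketch below works on in-memory dictionaries rather than the production tables. The sample rows are made up, and the score of 1 for the outlier-flagged row is an assumption that follows from the two thresholds, not a documented value.

```python
def non_outlier_rates(rows):
    """Rows with a usable, non-outlier canonical rate (score > 1)."""
    return [r for r in rows if r["canonical_rate_score"] > 1]


def non_empty_rates(rows):
    """Rows with any canonical rate, outlier-flagged or not (score > 0)."""
    return [r for r in rows if r["canonical_rate_score"] > 0]


# Hypothetical rows, for illustration only.
rows = [
    {"roid": 1, "canonical_rate": None, "canonical_rate_score": 0},   # empty ROID
    {"roid": 2, "canonical_rate": 250.0, "canonical_rate_score": 1},  # assumed outlier-flagged
    {"roid": 3, "canonical_rate": 120.0, "canonical_rate_score": 3},  # non-outlier
]
strict = non_outlier_rates(rows)
lenient = non_empty_rates(rows)
```

Filtering on the score rather than on canonical_rate IS NOT NULL gives a single column that covers both the empty-ROID case and the outlier case.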