Stage 8: Production Output
Each sub-DAG run produces one combined table for its month. The orchestrator stitches the monthly tables together, applies cross-month selection, and publishes versioned production tables.
Sub-DAG vs Orchestrator
Sub-DAG (per month)
Runs the full pipeline for a single month of MRF data:
- Produces tmp_int_combined_no_whisp via rate selection
- Runs 8 whisper computation tasks in parallel
- Joins all whisper tables to produce tmp_int_combined
Orchestrator (cross-month)
Coordinates all sub-DAGs and produces the final versioned output:
- Triggers N sub-DAGs (one per month), all in parallel
- Waits for all to complete, then merges all tmp_int_combined tables
- Applies cross-month canonical selection
- Publishes prod_combined_abridged, prod_combined_all, rollup views, traceability, and the external API table
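The fan-out and wait pattern described in this list can be sketched in a few lines. This is only an illustration, not the orchestrator's actual code: run_sub_dag is a hypothetical stand-in for a full sub-DAG run, the month labels are invented, and using the month as the table suffix is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_dag(month: str) -> str:
    """Hypothetical stand-in for one sub-DAG run; returns its output table name."""
    # A real run would perform rate selection, whisper computation, and the join.
    # Assumption: the month doubles as the sub-version suffix.
    return f"tmp_int_combined_{month}"


def run_orchestrator(months: list[str]) -> list[str]:
    """Trigger one sub-DAG per month in parallel and wait for all of them."""
    if not months:
        return []
    with ThreadPoolExecutor(max_workers=len(months)) as pool:
        # Consuming map() blocks until every sub-DAG has finished;
        # results come back in the same order as the input months.
        return list(pool.map(run_sub_dag, months))


tables = run_orchestrator(["2024_01", "2024_02", "2024_03"])
print(tables)
```

The merge step would only start once run_orchestrator returns, which mirrors the "waits for all to complete" requirement.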
Sub-DAG Output Flow
1
Canonical Rate Selection (chunked by payer)
Rate arrays → array_max → canonical_rate. The winning rate and all of its metadata are written to the output table.
→ tmp_int_combined_no_whisp_{sub_version}
2
Whisper Computation (8 parallel tasks)
Dimensional aggregations and reference data joins computed in parallel across provider, payer, network, code, and combination dimensions.
→ tmp_whisper_provider, tmp_whisper_payer, tmp_whisper_code, …
3
Add Whispers to Main
All 8 whisper tables joined onto the rate selection output to produce the final combined table for this sub-version.
→ tmp_int_combined_{sub_version}
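The three steps above can be pictured with a small in-memory sketch. It assumes that array_max simply takes the largest element of the rate array and that each whisper table is keyed by a single dimension column; the column names (rates, provider_id, payer_id, provider_median, payer_median) are hypothetical, and only two of the eight whisper tables are shown.

```python
def select_canonical(row: dict) -> dict:
    """Step 1: reduce the rate array to one canonical rate (assumed: its maximum)."""
    out = dict(row)
    rates = row["rates"]
    # An empty array yields no canonical rate.
    out["canonical_rate"] = max(rates) if rates else None
    return out


def add_whispers(rows: list[dict], whispers: dict[str, dict]) -> list[dict]:
    """Step 3: left-join every whisper table onto the rate selection output."""
    joined = []
    for row in rows:
        merged = dict(row)
        for key_col, table in whispers.items():
            # Rows without a match in a whisper table keep their original columns.
            merged.update(table.get(row[key_col], {}))
        joined.append(merged)
    return joined


rows = [
    {"roid": 1, "provider_id": "p1", "payer_id": "a", "rates": [100.0, 120.0]},
    {"roid": 2, "provider_id": "p2", "payer_id": "a", "rates": []},
]
# Step 2 stand-in: precomputed whisper tables, keyed by their dimension column.
whispers = {
    "provider_id": {"p1": {"provider_median": 110.0}},
    "payer_id": {"a": {"payer_median": 95.0}},
}

no_whisp = [select_canonical(r) for r in rows]   # tmp_int_combined_no_whisp
combined = add_whispers(no_whisp, whispers)      # tmp_int_combined
```

Because the whisper tables depend only on their own dimension, they can be computed independently of one another, which is what makes the parallel step possible.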
Orchestrator Merge Flow
1
Trigger Sub-DAGs
One sub-DAG per month, all triggered in parallel. Each produces its own tmp_int_combined.
→ tmp_int_combined × N months
2
Merge Combined Chunks
Each chunk holds approximately 35M rows. Cross-month selection: the highest score wins across all sub-versions. Same-score ties are resolved by the rate type hierarchy: Posted > Real-World > Enhanced > Benchmark.
→ merged_combined_chunks
3
prod_combined_abridged
Lean, API-ready subset of columns. Primary consumer-facing table.
→ prod_combined_abridged
4
prod_combined_all
Full column set including every rate column, score column, and internal metadata.
→ prod_combined_all
5
Rollup Views + Traceability
Pre-aggregated views built on prod_combined_abridged for fast analytics. Traceability tables link each final canonical rate back to its raw MRF source record.
→ prod_rollup_*, prod_traceability_*
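The cross-month selection rule from step 2 (highest score wins, ties broken by rate type hierarchy) can be sketched as follows. The roid and canonical_rate_score names follow this page; the rate_type and sub_version columns and the sample rows are assumptions made for illustration.

```python
# Lower number = higher priority, following Posted > Real-World > Enhanced > Benchmark.
RATE_TYPE_PRIORITY = {"Posted": 0, "Real-World": 1, "Enhanced": 2, "Benchmark": 3}


def rank(row: dict) -> tuple:
    """Sort key: score first, then rate type priority (negated so larger is better)."""
    return (row["canonical_rate_score"], -RATE_TYPE_PRIORITY[row["rate_type"]])


def merge_cross_month(rows: list[dict]) -> dict:
    """Keep one winning row per ROID across all sub-versions."""
    best: dict = {}
    for row in rows:
        current = best.get(row["roid"])
        if current is None or rank(row) > rank(current):
            best[row["roid"]] = row
    return best


rows = [
    {"roid": "A", "sub_version": "m1", "canonical_rate_score": 3, "rate_type": "Posted"},
    {"roid": "A", "sub_version": "m2", "canonical_rate_score": 5, "rate_type": "Benchmark"},
    {"roid": "B", "sub_version": "m1", "canonical_rate_score": 4, "rate_type": "Benchmark"},
    {"roid": "B", "sub_version": "m2", "canonical_rate_score": 4, "rate_type": "Posted"},
]
winners = merge_cross_month(rows)
```

For ROID A the higher score decides the winner regardless of rate type; for ROID B the scores tie, so the Posted row is kept.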
prod_combined_abridged vs prod_combined_all
| Feature | abridged | all |
|---|---|---|
| Row count | Same — one row per ROID across all merged sub-versions | Same |
| Columns | Subset: canonical fields + whisper enrichments + key metadata | Complete: every rate column, score column, metadata |
| Raw rate arrays | Not included | Included |
| Primary consumers | External APIs, customer-facing products | Internal QA, analytics, debugging |
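One way to picture the relationship in the table is as a column projection: same rows, fewer columns. The sketch below is illustrative only; the column list is hypothetical, and the page does not say that the abridged table is literally derived from prod_combined_all.

```python
# Hypothetical column subset; the real abridged schema is not listed on this page.
ABRIDGED_COLUMNS = ("roid", "canonical_rate", "canonical_rate_score")


def to_abridged(all_rows: list[dict]) -> list[dict]:
    """Project each full row onto the abridged column subset."""
    return [{col: row[col] for col in ABRIDGED_COLUMNS} for row in all_rows]


all_rows = [
    {"roid": "A", "canonical_rate": 120.0, "canonical_rate_score": 5, "rates": [100.0, 120.0]},
    {"roid": "B", "canonical_rate": None, "canonical_rate_score": 0, "rates": []},
]
abridged_rows = to_abridged(all_rows)
```

The row count is unchanged, and the raw rate array is dropped from every row.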
NULL handling
When canonical_rate IS NULL, its metadata fields are also NULL and canonical_rate_score is 0. Filter on canonical_rate_score > 1 to keep only non-outlier rates, or on canonical_rate_score > 0 to include outlier-flagged rates while still excluding empty ROIDs.
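The two filters can be compared on a small sample. This sketch assumes, for illustration, that an outlier-flagged rate carries a score of exactly 1; the page only implies that such scores are above 0 and not above 1. The sample rows are invented.

```python
def non_outlier(rows: list[dict]) -> list[dict]:
    """Keep only rows with canonical_rate_score > 1."""
    return [r for r in rows if r["canonical_rate_score"] > 1]


def including_outliers(rows: list[dict]) -> list[dict]:
    """Keep rows with canonical_rate_score > 0, dropping only empty ROIDs."""
    return [r for r in rows if r["canonical_rate_score"] > 0]


rows = [
    {"roid": "A", "canonical_rate": None, "canonical_rate_score": 0},   # empty ROID
    {"roid": "B", "canonical_rate": 9000.0, "canonical_rate_score": 1}, # assumed outlier score
    {"roid": "C", "canonical_rate": 120.0, "canonical_rate_score": 3},
]

strict = [r["roid"] for r in non_outlier(rows)]
loose = [r["roid"] for r in including_outliers(rows)]
```

Filtering on the score rather than on canonical_rate being NULL handles both cases with a single column.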