Version: 3.0

Stage 8: Production Output

The sub-DAG produces one combined table per month. The orchestrator stitches multiple months, applies cross-month selection, and publishes versioned production tables.

Sub-DAG vs Orchestrator

Sub-DAG (per month)

Runs the full pipeline for a single month of MRF data:

Produces tmp_int_combined_no_whisp via rate selection
Runs 8 whisper computation tasks in parallel
Joins all whisper tables to produce tmp_int_combined

Orchestrator (cross-month)

Coordinates all sub-DAGs and produces the final versioned output:

Triggers N sub-DAGs (one per month), all in parallel
Waits for all to complete, then merges all tmp_int_combined tables
Applies cross-month canonical selection
Publishes prod_combined_abridged, prod_combined_all, rollup views, traceability, and external API table

Sub-DAG Output Flow

Canonical Rate Selection (chunked by payer)

Rate arrays are reduced with array_max to produce canonical_rate. The winning rate and all of its metadata are written to the output.

→ tmp_int_combined_no_whisp_{sub_version}
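The selection step can be sketched per payer chunk as below. This is a minimal illustration, not the pipeline's code: the candidate shape `(score, rate, source)` and the function names are hypothetical. Putting the score first lets `max()` pick the highest-scoring candidate, roughly the way Spark SQL's `array_max` orders structs field by field. An empty candidate list yields a NULL rate with score 0, matching the NULL-handling rule for this table.

```python
# Hypothetical candidate shape: (score, rate, source). Putting the score
# first is an assumption; Python compares tuples field by field, so max()
# picks the highest score (ties fall through to rate, then source).
Candidate = tuple[int, float, str]


def select_canonical(candidates: list[Candidate]) -> dict:
    """Pick the winning candidate for one ROID and keep its metadata."""
    if not candidates:
        # Empty ROID: rate and metadata stay NULL (None), score is 0.
        return {"canonical_rate": None, "canonical_rate_score": 0, "source": None}
    score, rate, source = max(candidates)
    return {"canonical_rate": rate, "canonical_rate_score": score, "source": source}


def select_by_payer_chunk(chunks: dict[str, dict[str, list[Candidate]]]) -> dict[str, dict]:
    """Process one payer chunk at a time; chunks maps payer -> ROID -> candidates."""
    output: dict[str, dict] = {}
    for roids in chunks.values():  # each payer's ROIDs form one chunk
        for roid, candidates in roids.items():
            output[roid] = select_canonical(candidates)
    return output
```

Chunking by payer keeps each unit of work bounded; in a distributed engine each chunk would presumably be a separate query rather than a loop iteration.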

Whisper Computation (8 parallel tasks)

Dimensional aggregations and reference data joins computed in parallel across provider, payer, network, code, and combination dimensions.

→ tmp_whisper_provider, tmp_whisper_payer, tmp_whisper_code, …
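The fan-out shape of the whisper step can be sketched as follows. This is an assumption-laden illustration: only four of the eight tasks are shown, the dimension keys (`provider_id`, `payer_id`, `billing_code`) and the `tmp_whisper_payer_code` combination table are hypothetical, and count plus mean rate stands in for whatever aggregations the real tasks compute. Threads only illustrate the parallelism; in the pipeline these are separate DAG tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dimension -> key columns; the real 8 tasks are not listed here.
DIMENSIONS = {
    "provider": ("provider_id",),
    "payer": ("payer_id",),
    "code": ("billing_code",),
    "payer_code": ("payer_id", "billing_code"),  # a combination dimension
}


def whisper(rows, keys):
    """Aggregate canonical_rate per dimension key (count and mean, assumed)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["canonical_rate"] is None:
            continue  # empty ROIDs contribute nothing
        k = tuple(row[c] for c in keys)
        sums[k] += row["canonical_rate"]
        counts[k] += 1
    return {k: {"n": counts[k], "mean_rate": sums[k] / counts[k]} for k in counts}


def compute_whispers(rows):
    """Run every dimensional aggregation concurrently, one task per table."""
    with ThreadPoolExecutor(max_workers=len(DIMENSIONS)) as pool:
        futures = {name: pool.submit(whisper, rows, keys) for name, keys in DIMENSIONS.items()}
        return {f"tmp_whisper_{name}": fut.result() for name, fut in futures.items()}
```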

Add Whispers to Main

All 8 whisper tables are joined onto the rate selection output to produce the final combined table for this sub-version.

→ tmp_int_combined_{sub_version}
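A minimal sketch of the join, assuming each whisper table is a lookup from its dimension key to a dict of columns. The join keys and the column-prefixing scheme are hypothetical. A left join is used so that every selected rate survives even when a whisper table has no matching key; its whisper columns are then NULL (`None`).

```python
# Hypothetical join keys per whisper table; the real keys are not documented here.
JOIN_KEYS = {
    "tmp_whisper_provider": ("provider_id",),
    "tmp_whisper_payer": ("payer_id",),
}


def add_whispers(main_rows, whispers, join_keys=JOIN_KEYS):
    """Left-join each whisper table onto the main rows; row count is preserved."""
    combined = []
    for row in main_rows:
        out = dict(row)
        for table, keys in join_keys.items():
            lookup = whispers.get(table, {})
            # Every entry in a whisper table is assumed to share one column set.
            columns = next(iter(lookup.values()), {}).keys()
            match = lookup.get(tuple(row[c] for c in keys), {})
            prefix = table.removeprefix("tmp_whisper_")
            for col in columns:
                out[f"{prefix}_{col}"] = match.get(col)  # None when unmatched
        combined.append(out)
    return combined
```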

Orchestrator Merge Flow

Trigger Sub-DAGs

One sub-DAG per month, all triggered in parallel. Each produces its own tmp_int_combined.

→ tmp_int_combined × N months
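The trigger-then-wait pattern can be sketched with a thread pool. `run_sub_dag` is a hypothetical stand-in for whatever call actually starts a sub-DAG and blocks until it finishes, and the `YYYYMM` sub-version naming is assumed for illustration. If the orchestrator is Apache Airflow (not stated here), `TriggerDagRunOperator` with `wait_for_completion=True` plays the same role.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_dag(month: str) -> str:
    """Hypothetical stand-in: trigger one sub-DAG and block until it finishes.

    Returns the name of the table that sub-DAG produced.
    """
    sub_version = month.replace("-", "")  # assumed naming, e.g. 2024-01 -> 202401
    return f"tmp_int_combined_{sub_version}"


def trigger_all(months: list[str]) -> list[str]:
    """Fan out one sub-DAG per month, then wait for all before merging."""
    if not months:
        return []
    with ThreadPoolExecutor(max_workers=len(months)) as pool:
        # map() yields results in input order and re-raises any sub-DAG
        # failure, so the merge never starts on a partial set of months.
        return list(pool.map(run_sub_dag, months))
```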

Merge Combined Chunks

Approximately 35M rows per chunk. Cross-month selection: for each ROID, the row with the highest score across all sub-versions wins. Score ties are resolved by the rate type hierarchy: Posted > Real-World > Enhanced > Benchmark.

→ merged_combined_chunks
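The cross-month rule can be written as a single sort key: score first, then position in the rate type hierarchy. The sketch below assumes each row carries hypothetical `roid`, `canonical_rate_score`, and `rate_type` fields; a missing or unknown rate type ranks below Benchmark. Rows that tie on both score and rate type keep whichever was seen first, which is an assumption the documented rule does not cover.

```python
# Rate type hierarchy from the merge rule; a lower index wins score ties.
RATE_TYPE_RANK = {t: i for i, t in enumerate(["Posted", "Real-World", "Enhanced", "Benchmark"])}


def merge_sub_versions(tables):
    """Keep one row per ROID: highest score, then best rate type."""
    best = {}
    for table in tables:
        for row in table:
            type_rank = RATE_TYPE_RANK.get(row["rate_type"], len(RATE_TYPE_RANK))
            # Higher score is better; negate the hierarchy index so that
            # Posted (index 0) compares greater than Benchmark (index 3).
            key = (row["canonical_rate_score"], -type_rank)
            current = best.get(row["roid"])
            if current is None or key > current[0]:
                best[row["roid"]] = (key, row)
    return [row for _, row in best.values()]
```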

prod_combined_abridged

Lean, API-ready subset of columns. Primary consumer-facing table.

→ prod_combined_abridged

prod_combined_all

Full column set including every rate column, score column, and internal metadata.

→ prod_combined_all

Rollup Views + Traceability

Pre-aggregated views built on prod_combined_abridged for fast analytics. Traceability tables link each final canonical rate back to its raw MRF source record.

→ prod_rollup_*, prod_traceability_*
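A minimal sketch of both outputs, assuming hypothetical column names (`payer_id`, `billing_code`, `source_file`, `source_record`) and count plus median as the rollup statistics. The rollup skips empty ROIDs using the `canonical_rate_score > 0` filter; whether the real views also exclude outliers is not stated here.

```python
from collections import defaultdict
from statistics import median


def build_rollup(abridged_rows, keys=("payer_id", "billing_code")):
    """Pre-aggregate canonical rates by a dimension key (count and median)."""
    groups = defaultdict(list)
    for row in abridged_rows:
        if row["canonical_rate_score"] > 0:  # skip empty ROIDs
            groups[tuple(row[k] for k in keys)].append(row["canonical_rate"])
    return {k: {"n": len(v), "median_rate": median(v)} for k, v in groups.items()}


def build_traceability(rows):
    """Map each ROID's canonical rate back to its (hypothetical) raw MRF source fields."""
    return {
        row["roid"]: {
            "canonical_rate": row["canonical_rate"],
            "source_file": row["source_file"],
            "source_record": row["source_record"],
        }
        for row in rows
        if row["canonical_rate"] is not None
    }
```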

prod_combined_abridged vs prod_combined_all

Feature	prod_combined_abridged	prod_combined_all
Row count	One row per ROID across all merged sub-versions	Same as abridged
Columns	Subset: canonical fields + whisper enrichments + key metadata	Complete: every rate column, score column, and metadata
Raw rate arrays	Not included	Included
Primary consumers	External APIs, customer-facing products	Internal QA, analytics, debugging
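Because both tables hold one row per ROID, the abridged table behaves like a column projection of the full one. The sketch below illustrates that invariant only; the column list is hypothetical, and whether the pipeline actually derives the abridged table from the full one is not stated here.

```python
# Hypothetical abridged column list; the real schema is not listed here.
ABRIDGED_COLUMNS = ("roid", "canonical_rate", "canonical_rate_score", "rate_type", "provider_mean_rate")


def to_abridged(all_rows):
    """Project full rows down to the abridged subset; row count is unchanged."""
    return [{c: row.get(c) for c in ABRIDGED_COLUMNS} for row in all_rows]
```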

NULL handling

When canonical_rate IS NULL, its metadata fields are also NULL and canonical_rate_score is 0. Filter on canonical_rate_score > 1 to keep only non-outlier rates, or on canonical_rate_score > 0 to also include outlier-flagged rates; both filters exclude empty ROIDs.
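The two filters differ only in their threshold, which the sketch below makes explicit. The function name and row shape are hypothetical; in SQL the same choice is `WHERE canonical_rate_score > 1` versus `WHERE canonical_rate_score > 0`.

```python
def usable_rates(rows, include_outliers=False):
    """Apply the NULL-handling filter to rows keyed by canonical_rate_score.

    score > 1 keeps non-outlier rates; score > 0 also keeps outlier-flagged
    rates (score 1). Either threshold drops empty ROIDs (score 0, NULL rate).
    """
    threshold = 0 if include_outliers else 1
    return [r for r in rows if r["canonical_rate_score"] > threshold]
```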
