Data Flow Pipeline
This page traces data from each external source system through the Bronze, Gold, Semantic, and Reports layers. Understanding this flow is essential for debugging data quality issues, adding new sources, and reasoning about data freshness.
End-to-End Data Flow
graph TD
subgraph Sources["Source Systems"]
AX["Dynamics AX<br/><small>ERP: sales, inventory,<br/>logistics, finance</small>"]
DV["Dataverse<br/><small>CRM: contacts, accounts,<br/>system users</small>"]
VS["Vesper API<br/><small>Cheese futures, spot prices,<br/>exchange rates</small>"]
SP["SharePoint<br/><small>List data</small>"]
DC["Datacollect<br/><small>Internal data collection<br/>forms</small>"]
BD["Broker Systems<br/><small>Marex, StoneX<br/>futures positions</small>"]
end
subgraph Ingestion["Ingestion Layer"]
SH["Lakehouse Shortcuts<br/><small>Zero-ETL, Microsoft-managed<br/>sync to OneLake</small>"]
AF["Azure Functions<br/><small>Python, scheduled +<br/>on-demand ingestion</small>"]
end
subgraph Bronze["Bronze Lakehouse<br/><small>Raw data, no transforms</small>"]
BT["Bronze Tables<br/><small>~100 source tables<br/>exposed via SQL endpoint</small>"]
end
subgraph Gold["Gold Warehouse<br/><small>dbt transforms, schema: dbo</small>"]
STG["Staging (78 models)<br/><small>1:1 with sources<br/>Light rename + cast</small>"]
INT["Intermediate (14 models)<br/><small>Joins, business logic<br/>Cross-source enrichment</small>"]
MRT["Marts (84 models)<br/><small>Dims + Facts<br/>Business-ready tables</small>"]
VW["Consumer Views<br/><small>Tagged: consumer_view</small>"]
MON["Monitoring Models<br/><small>ETL health metrics</small>"]
end
subgraph Semantic["Semantic Layer"]
DL["5 DirectLake Models<br/><small>Auto-refresh from Gold<br/>LH_Gold_Full, Datacollect,<br/>3 market data models</small>"]
IM["2 Import/Mixed Models<br/><small>Manual credential refresh<br/>Report Owners + 1 other</small>"]
end
subgraph Output["Output Layer"]
RPT["22 Power BI Reports"]
EXP["Excel Exports<br/><small>SQL -> Excel -> Email<br/>Config-driven, scheduled</small>"]
end
AX -->|shortcuts| SH
DV -->|shortcuts| SH
SP -->|shortcuts| SH
VS -->|scheduled| AF
DC -->|on-demand| AF
BD -->|scheduled| AF
SH --> BT
AF --> BT
BT --> STG
STG --> INT
INT --> MRT
MRT --> VW
MRT --> MON
MRT -->|DirectLake| DL
MRT --> IM
DL --> RPT
IM --> RPT
MRT --> EXP
Source Systems
Six source systems feed into Bronze via shortcuts (zero-copy references) or Azure Functions (scheduled/on-demand ingestion). For the full source inventory with ingestion methods, schema organization, and Bronze architecture, see Data Sources.
dbt Transformation Layers
Data flows through three dbt layers: Staging (1:1 views on source tables), Intermediate (cross-source joins and business logic), and Marts (materialized dimensions and facts consumed by semantic models). See dbt Pipeline Overview for the full layer architecture, source groups, macros, and critical business logic.
Export System
Alongside reports, the platform provides config-driven exports: SQL query → Excel generation → email delivery. Exports can be scheduled or triggered on-demand via a web UI. See Export System for configuration, authorization, and the full architecture.
Data Freshness
| Layer | Freshness | Trigger |
|---|---|---|
| Bronze (shortcuts) | Near real-time | Microsoft-managed sync |
| Bronze (API ingestion) | Daily or on-demand | Azure Functions timer/HTTP triggers |
| Gold Warehouse | Nightly (full build) + PR builds | dbt-dev-build.yml pipeline |
| Semantic (DirectLake) | Immediate after Gold build | No separate refresh needed |
| Semantic (Import) | Manual refresh | Requires credential owner to trigger |
| Reports | Real-time from Semantic | No caching layer |
The practical implication: for most reports, data is at most one day old (reflecting the nightly dbt build). Dataverse/AX source data in Bronze is near real-time, but Gold transformation is batch.