Development Documentation
View as:

Data Flow Pipeline

This page traces data from each external source system through the Bronze, Gold, Semantic, and Reports layers. Understanding this flow is essential for debugging data quality issues, adding new sources, and reasoning about data freshness.

End-to-End Data Flow

graph TD
    subgraph Sources["Source Systems"]
        AX["Dynamics AX<br/><small>ERP: sales, inventory,<br/>logistics, finance</small>"]
        DV["Dataverse<br/><small>CRM: contacts, accounts,<br/>system users</small>"]
        VS["Vesper API<br/><small>Cheese futures, spot prices,<br/>exchange rates</small>"]
        SP["SharePoint<br/><small>List data</small>"]
        DC["Datacollect<br/><small>Internal data collection<br/>forms</small>"]
        BD["Broker Systems<br/><small>Marex, StoneX<br/>futures positions</small>"]
    end

    subgraph Ingestion["Ingestion Layer"]
        SH["Lakehouse Shortcuts<br/><small>Zero-ETL, Microsoft-managed<br/>sync to OneLake</small>"]
        AF["Azure Functions<br/><small>Python, scheduled +<br/>on-demand ingestion</small>"]
    end

    subgraph Bronze["Bronze Lakehouse<br/><small>Raw data, no transforms</small>"]
        BT["Bronze Tables<br/><small>~100 source tables<br/>exposed via SQL endpoint</small>"]
    end

    subgraph Gold["Gold Warehouse<br/><small>dbt transforms, schema: dbo</small>"]
        STG["Staging (78 models)<br/><small>1:1 with sources<br/>Light rename + cast</small>"]
        INT["Intermediate (14 models)<br/><small>Joins, business logic<br/>Cross-source enrichment</small>"]
        MRT["Marts (84 models)<br/><small>Dims + Facts<br/>Business-ready tables</small>"]
        VW["Consumer Views<br/><small>Tagged: consumer_view</small>"]
        MON["Monitoring Models<br/><small>ETL health metrics</small>"]
    end

    subgraph Semantic["Semantic Layer"]
        DL["5 DirectLake Models<br/><small>Auto-refresh from Gold<br/>LH_Gold_Full, Datacollect,<br/>3 market data models</small>"]
        IM["2 Import/Mixed Models<br/><small>Manual credential refresh<br/>Report Owners + 1 other</small>"]
    end

    subgraph Output["Output Layer"]
        RPT["22 Power BI Reports"]
        EXP["Excel Exports<br/><small>SQL -> Excel -> Email<br/>Config-driven, scheduled</small>"]
    end

    AX -->|shortcuts| SH
    DV -->|shortcuts| SH
    SP -->|shortcuts| SH
    VS -->|scheduled| AF
    DC -->|on-demand| AF
    BD -->|scheduled| AF

    SH --> BT
    AF --> BT

    BT --> STG
    STG --> INT
    INT --> MRT
    MRT --> VW
    MRT --> MON

    MRT -->|DirectLake| DL
    MRT --> IM
    DL --> RPT
    IM --> RPT
    MRT --> EXP

Source Systems

Six source systems feed into Bronze via shortcuts (zero-copy references) or Azure Functions (scheduled/on-demand ingestion). For the full source inventory with ingestion methods, schema organization, and Bronze architecture, see Data Sources.

dbt Transformation Layers

Data flows through three dbt layers: Staging (1:1 views on source tables), Intermediate (cross-source joins and business logic), and Marts (materialized dimensions and facts consumed by semantic models). See dbt Pipeline Overview for the full layer architecture, source groups, macros, and critical business logic.

Export System

Alongside reports, the platform provides config-driven exports: SQL query → Excel generation → email delivery. Exports can be scheduled or triggered on-demand via a web UI. See Export System for configuration, authorization, and the full architecture.

Data Freshness

LayerFreshnessTrigger
Bronze (shortcuts)Near real-timeMicrosoft-managed sync
Bronze (API ingestion)Daily or on-demandAzure Functions timer/HTTP triggers
Gold WarehouseNightly (full build) + PR buildsdbt-dev-build.yml pipeline
Semantic (DirectLake)Immediate after Gold buildNo separate refresh needed
Semantic (Import)Manual refreshRequires credential owner to trigger
ReportsReal-time from SemanticNo caching layer

The practical implication: for most reports, data is at most one day old (reflecting the nightly dbt build). Dataverse/AX source data in Bronze is near real-time, but Gold transformation is batch.