Data Sources
The Smart Data Platform ingests data from multiple source systems into a Bronze Lakehouse, where it serves as the immutable foundation for all downstream transformations. This page covers each source system, its ingestion method, and the Bronze layer's architecture.
Ingestion Architecture
graph LR
    DV[Dataverse D365] -->|Shortcut| BRZ[Bronze Lakehouse]
    AX[Dynamics AX 2012] -->|Cross-workspace Shortcut| BRZ
    SP[SharePoint Lists] -->|Shortcut| BRZ
    VS[Vesper API] -->|Azure Function| BRZ
    DC[Datacollect Excel] -->|Azure Function| BRZ
    BD[Broker Data] -->|Shortcut| BRZ
    MK[Market Data APIs] -->|Azure Function| BRZ
    MON[dbt Artifacts] -->|Post-build Script| BRZ
Data arrives in Bronze through two primary mechanisms: shortcuts (zero-copy references that incur no ingestion cost) and Azure Functions (scheduled or event-driven code that writes Parquet files to OneLake).
Source Inventory
| Source | Owner | Refresh | Ingestion Method | Schema Prefix | Tables |
|---|---|---|---|---|---|
| Dynamics AX 2012 | IT (legacy) | Static archive | Cross-workspace shortcut to Bronze_AX | ax | 221 tables |
| Dataverse (D365) | Business Apps | Real-time sync | Dataverse shortcut | dataverse_d365 | CRM entities |
| SharePoint | Various departments | Near real-time | SharePoint shortcut | sharepoint_ad | Application data lists |
| Broker Data (BD) | Trading desk | Daily | Shortcut | bd | Marex, Stonex futures |
| Vesper API | Data Engineering | Scheduled (daily) | Azure Function | vesper | Futures, spot prices |
| Datacollect | Data Engineering | On-demand | Azure Function | datacollect | Vendor data, market forms |
| Market Data | Data Engineering | Scheduled (daily) | Azure Function | market_data | Exchange rates, futures |
| Monitoring | dbt Pipeline | Post-build | Python script | monitoring | Run results, test results |
Why Shortcuts Over ETL?
The platform favors OneLake shortcuts over traditional ETL pipelines for most sources. The reasoning:
- Zero maintenance -- Shortcuts are declarative references, not code. There are no transformation jobs to monitor, no retry logic to maintain, no scheduling to configure.
- Real-time sync -- Dataverse shortcuts reflect changes within seconds. Traditional ETL would introduce batch latency.
- No data duplication -- Shortcuts point to the source data in place, so Bronze holds no second copy and incurs no additional storage cost.
- Automatic schema evolution -- When source tables add columns, shortcuts pick them up automatically. ETL pipelines would require schema change detection and handling.
Azure Functions are used only for sources that require active ingestion: external APIs (Vesper, market data) and file-based uploads (Datacollect Excel files).
Shortcut Lifecycle
Shortcuts follow a Git-managed lifecycle from creation to production deployment:
graph TD
    CREATE[Create shortcut in DEV Fabric UI] --> COMMIT[Commit via Source Control panel]
    COMMIT --> METADATA[shortcuts.metadata.json updated on main branch]
    METADATA --> PR_UAT[PR: main to release/uat]
    PR_UAT --> UAT_SYNC[UAT workspace auto-syncs from Git]
    UAT_SYNC --> PR_PROD[PR: release/uat to release/prod]
    PR_PROD --> PROD_SYNC[PROD workspace auto-syncs from Git]
Key details:
- workspaces/bronze/Lakehouse_Bronze.Lakehouse/shortcuts.metadata.json is the IaC source of truth for all shortcuts deployed by scripts/deploy_shortcuts.py. The pipeline runs this script on every infra-deploy.
- For shortcuts whose target item ID differs per environment (e.g. the 7 OneLake shortcuts pointing to the local Application_Data_Lists MirroredDatabase), the manifest uses "oneLakeLookupName": "Application_Data_Lists" instead of a hardcoded item ID. deploy_shortcuts.py resolves the ID at deploy time by listing workspace items, so no per-environment config is needed.
- The pipeline also calls scripts/deploy_dataverse_shortcuts.py separately for the 96 PROD Dataverse shortcuts managed via deployment/bronze-dataverse-shortcuts.json.
- The Git-based promotion flow (PR + merge) moves shortcuts across environments.
- Always use "Update" (pull) before "Commit" (push) in the Fabric UI to avoid merge conflicts.
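
The lookup-by-name step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of scripts/deploy_shortcuts.py: the function name resolve_item_id, the flat shape of the target entry, and the item records (dicts with id, displayName, and type, standing in for a workspace item listing) are all hypothetical.

```python
from __future__ import annotations


def resolve_item_id(target: dict, workspace_items: list[dict]) -> str:
    """Return the item ID for one shortcut target (hypothetical manifest shape)."""
    # A hardcoded ID, when present, is used as-is.
    if "itemId" in target:
        return target["itemId"]

    # Otherwise match the lookup name against the items of the current workspace.
    lookup = target["oneLakeLookupName"]
    matches = [item for item in workspace_items if item["displayName"] == lookup]
    if len(matches) != 1:
        # Zero or several matches would make the deployment ambiguous, so stop here.
        raise LookupError(
            f"expected exactly one item named {lookup!r}, found {len(matches)}"
        )
    return matches[0]["id"]


# Illustrative item listing for one environment; the IDs are placeholders.
dev_items = [
    {"id": "dev-item-001", "displayName": "Application_Data_Lists", "type": "MirroredDatabase"},
    {"id": "dev-item-002", "displayName": "Lakehouse_Bronze", "type": "Lakehouse"},
]

resolved = resolve_item_id({"oneLakeLookupName": "Application_Data_Lists"}, dev_items)
print(resolved)
```

Because the name is resolved against whichever workspace is being deployed, the same manifest entry can yield a different ID in DEV, UAT, and PROD without any per-environment file.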
Schema Organization
Bronze Lakehouse uses schema-enabled mode. Shortcuts are organized into schemas that map to source systems:
| Schema | Source | Contents |
|---|---|---|
| dataverse_d365 | Dataverse, PROD only (operations-geris-prod.crm4.dynamics.com) | All D365 F&O business tables used by marts. 96 shortcuts. Canonical source for every downstream dbt model. |
| dataverse_d365_uat | Dataverse, UAT (operations-geris-uat.crm4.dynamics.com) | Parallel copies of 11 core business tables (salestable, salesline, purchtable, purchline, lgslogisticfiletable, lgslogisticfileline, inventtrans, inventdim, inventbatch, inventtransorigin, prodtable) for UAT-data inspection. Never joined by production dbt models. |
| sharepoint | SharePoint + local MirroredDatabase | 5 OneDriveSharePoint shortcuts (Stonex/Marex broker data) and 7 OneLake shortcuts pointing to the local Application_Data_Lists MirroredDatabase (budget, KPI, report-owner reference data). |
| datacollect | Datacollect | Market data collection forms |
| sharepoint_ad | SharePoint | Legacy application data lists (pre-2026 path) |
| ax | Dynamics AX 2012 | Cross-workspace shortcut to shared Bronze_AX lakehouse |
| dbo | Various | Legacy and miscellaneous tables |
Environment split: before 2026-04 the 11 UAT-sourced tables sat under dataverse_d365 alongside the PROD shortcuts, silently winning the create-race and making DEV Bronze serve UAT data for those tables. The split into two schemas makes the source environment explicit and is enforced by scripts/deploy_dataverse_shortcuts.py (duplicate (path, name) keys now fail fast). See the Bronze shortcut env-split runbook for the one-time reconciliation procedure.
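
A fail-fast check on duplicate (path, name) keys could look roughly like the sketch below. It only illustrates the idea; the function names, the entry shape, and the example paths are assumptions and are not taken from scripts/deploy_dataverse_shortcuts.py.

```python
from collections import Counter


def find_duplicate_keys(shortcuts):
    """Return every (path, name) pair that occurs more than once, sorted."""
    counts = Counter((s["path"], s["name"]) for s in shortcuts)
    return sorted(key for key, n in counts.items() if n > 1)


def assert_unique_keys(shortcuts):
    """Raise before anything is deployed if two entries would claim the same slot."""
    duplicates = find_duplicate_keys(shortcuts)
    if duplicates:
        raise ValueError(f"duplicate (path, name) keys: {duplicates}")


# Before the split: a PROD and a UAT entry compete for the same schema and name.
mixed = [
    {"path": "Tables/dataverse_d365", "name": "salestable", "env": "prod"},
    {"path": "Tables/dataverse_d365", "name": "salestable", "env": "uat"},
]

# After the split: the UAT entry lives in its own schema, so the keys are distinct.
split = [
    {"path": "Tables/dataverse_d365", "name": "salestable", "env": "prod"},
    {"path": "Tables/dataverse_d365_uat", "name": "salestable", "env": "uat"},
]

print(find_duplicate_keys(mixed))
assert_unique_keys(split)  # passes silently
```

Rejecting the manifest outright removes the dependence on creation order that previously let the UAT shortcut win.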
Bronze_AX: Shared Lakehouse
Bronze_AX is a special case. It is a shared lakehouse containing the AX 2012 archive (221 tables) that all environments reference via cross-workspace shortcuts. Unlike other shortcuts, AX shortcuts:
- Are identical across DEV, UAT, and PROD (same workspaceId + itemId)
- Should NOT be included in environment promotion PRs
- Point to the same physical data regardless of environment
If Bronze_AX needs to move or be recreated, all three environments must be updated simultaneously using the migration script, not through the standard Git promotion flow.
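
Since AX shortcuts are expected to carry the same workspaceId and itemId in every environment, a small drift check can confirm that nothing has diverged. The sketch below assumes each environment's AX shortcuts are available as a list of dicts with name, workspaceId, and itemId; that shape, the function name, and the placeholder IDs are hypothetical.

```python
def find_ax_drift(manifests):
    """Map shortcut name -> per-environment targets, for names that disagree.

    A name is reported when its (workspaceId, itemId) differs between
    environments or when it is missing from at least one environment.
    """
    seen = {}
    for env, shortcuts in manifests.items():
        for s in shortcuts:
            seen.setdefault(s["name"], {})[env] = (s["workspaceId"], s["itemId"])

    return {
        name: targets
        for name, targets in seen.items()
        if len(set(targets.values())) > 1 or len(targets) != len(manifests)
    }


shared = {"workspaceId": "ws-shared", "itemId": "bronze-ax"}  # placeholder IDs

aligned = {
    "dev": [{"name": "custtable", **shared}],
    "uat": [{"name": "custtable", **shared}],
    "prod": [{"name": "custtable", **shared}],
}

drifted = {
    "dev": [{"name": "custtable", "workspaceId": "ws-other", "itemId": "bronze-ax"}],
    "uat": [{"name": "custtable", **shared}],
    "prod": [{"name": "custtable", **shared}],
}

print(find_ax_drift(aligned))
print(sorted(find_ax_drift(drifted)))
```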
Bronze Immutability
Bronze data is treated as immutable -- the raw record of what was received from source systems. This is enforced through multiple layers:
SQL Write Protection
The security pipeline (security-deploy.yml) deploys DENY grants that block INSERT, UPDATE, and DELETE operations on all Bronze schemas for non-admin roles. This is automated and deployed on every push to main, release/uat, or release/prod.
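
As an illustration of what schema-level write protection can look like, the sketch below generates one T-SQL DENY statement per schema and role. The statements that security-deploy.yml actually emits are not shown on this page; the role name bronze_reader, the function name, and the choice of schema-scoped statements are assumptions. The schema names come from the Schema Organization table.

```python
BRONZE_SCHEMAS = [
    "dataverse_d365",
    "dataverse_d365_uat",
    "sharepoint",
    "datacollect",
    "sharepoint_ad",
    "ax",
    "dbo",
]


def build_deny_statements(schemas, roles):
    """Build one schema-scoped DENY statement per (schema, role) pair."""
    return [
        f"DENY INSERT, UPDATE, DELETE ON SCHEMA::[{schema}] TO [{role}];"
        for schema in schemas
        for role in roles
    ]


# "bronze_reader" is a hypothetical non-admin role used only for this example.
statements = build_deny_statements(BRONZE_SCHEMAS, ["bronze_reader"])
for statement in statements:
    print(statement)
```

Generating the statements from a schema list keeps the protection in step with the schema inventory: adding a schema to the list is enough to cover it on the next deployment.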
OneLake Soft-Delete
Soft-delete is enabled on all three Bronze lakehouses with a 30-day retention period. Accidentally deleted files are retained and can be recovered via the OneLake File Explorer or REST API. This is a manual Fabric portal configuration (not automatable via Terraform).
| Lakehouse | Purpose | Soft-Delete |
|---|---|---|
| Lakehouse_Bronze | D365, logistics, BD tables | 30-day retention |
| Lakehouse_Bronze_AX | AX 2012 historical data (221 tables) | 30-day retention |
| Lakehouse_Datacollect | Excel files, market data APIs | 30-day retention |
AX Archive Immutability
The AX 2012 archive in Azure Blob Storage (gerisdbtartifacts/ax-archive) has a 7-year (2555-day) time-based retention policy. Once locked, blobs cannot be deleted until the retention period expires. This satisfies financial data retention requirements.
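
The 2555-day figure is 7 × 365, which ignores leap days, so the window is slightly shorter than seven calendar years. The sketch below works through the arithmetic, assuming the retention interval is counted from each blob's creation date; the creation date used is purely illustrative.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7 * 365  # 2555 days; leap days are not counted


def earliest_delete_date(created: date) -> date:
    """First date on which a blob created on `created` leaves retention."""
    return created + timedelta(days=RETENTION_DAYS)


created = date(2024, 1, 1)  # hypothetical blob creation date
expiry = earliest_delete_date(created)
print(RETENTION_DAYS)  # 2555
print(expiry)          # 2030-12-30, two days short of 2031-01-01
```

The two-day gap comes from the leap years 2024 and 2028 inside the window. If a requirement is expressed in calendar years, that gap is worth checking against the policy.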
Validation
Run the Bronze immutability validation script to verify all protections are in place:
python scripts/validate_bronze_immutability.py
This checks: OneLake soft-delete on all Bronze lakehouses, SQL DENY grants deployed, and ax-archive container immutability policy configured.
Related Pages
- dbt Pipeline Overview -- How Bronze data flows through staging to marts
- Model Inventory -- All staging models that read from Bronze
- Developer Workflow -- Adding new shortcuts as part of feature development
- Architecture -- System-level view of all platform components