Data Sources

The Smart Data Platform ingests data from multiple source systems into a Bronze Lakehouse, where it serves as the immutable foundation for all downstream transformations. This page covers each source system, its ingestion method, and the Bronze layer's architecture.

Ingestion Architecture

graph LR
  DV[Dataverse D365] -->|Shortcut| BRZ[Bronze Lakehouse]
  AX[Dynamics AX 2012] -->|Cross-workspace Shortcut| BRZ
  SP[SharePoint Lists] -->|Shortcut| BRZ
  VS[Vesper API] -->|Azure Function| BRZ
  DC[Datacollect Excel] -->|Azure Function| BRZ
  BD[Broker Data] -->|Shortcut| BRZ
  MK[Market Data APIs] -->|Azure Function| BRZ
  MON[dbt Artifacts] -->|Post-build Script| BRZ

Data arrives in Bronze through two primary mechanisms: shortcuts (zero-copy references that incur no ingestion cost) and Azure Functions (scheduled or event-driven code that writes Parquet files to OneLake).
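
As a rough illustration of the Azure Function path, the sketch below builds the OneLake URI that such a function might write a Parquet file to. The `abfss://` form follows the standard OneLake addressing scheme. The workspace name, the `Files/<source>/<yyyy>/<mm>/<dd>` layout, and the function name `bronze_landing_path` are assumptions made for this example, not the platform's actual convention.

```python
from datetime import date

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def bronze_landing_path(workspace: str, lakehouse: str, source: str,
                        run_date: date, file_name: str) -> str:
    """Build an abfss:// URI under the lakehouse Files area for one ingested file."""
    # Hypothetical date-partitioned layout; the real layout is not documented here.
    partition = f"{run_date:%Y/%m/%d}"
    return (
        f"abfss://{workspace}@{ONELAKE_HOST}/"
        f"{lakehouse}.Lakehouse/Files/{source}/{partition}/{file_name}"
    )


if __name__ == "__main__":
    # "ws-bronze-dev" is a placeholder workspace name.
    print(bronze_landing_path("ws-bronze-dev", "Lakehouse_Datacollect",
                              "market_data", date(2026, 1, 15), "fx.parquet"))
```

A date-partitioned layout keeps each run's files separate, which fits the append-only treatment of Bronze data, but any stable layout would serve.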

Source Inventory

Source	Owner	Refresh	Ingestion Method	Schema Prefix	Tables
Dynamics AX 2012	IT (legacy)	Static archive	Cross-workspace shortcut to Bronze_AX	`ax`	221 tables
Dataverse (D365)	Business Apps	Real-time sync	Dataverse shortcut	`dataverse_d365`	CRM entities
SharePoint	Various departments	Near real-time	SharePoint shortcut	`sharepoint_ad`	Application data lists
Broker Data (BD)	Trading desk	Daily	Shortcut	`bd`	Marex, Stonex futures
Vesper API	Data Engineering	Scheduled (daily)	Azure Function	`vesper`	Futures, spot prices
Datacollect	Data Engineering	On-demand	Azure Function	`datacollect`	Vendor data, market forms
Market Data	Data Engineering	Scheduled (daily)	Azure Function	`market_data`	Exchange rates, futures
Monitoring	dbt Pipeline	Post-build	Python script	`monitoring`	Run results, test results

Why Shortcuts Over ETL?

The platform favors OneLake shortcuts over traditional ETL pipelines for most sources. The reasoning:

Zero maintenance -- Shortcuts are declarative references, not code. There are no transformation jobs to monitor, no retry logic to maintain, no scheduling to configure.
Real-time sync -- Dataverse shortcuts reflect changes within seconds. Traditional ETL would introduce batch latency.
No data duplication -- Shortcuts point to the source data in place, so Bronze holds no second copy and incurs no extra storage cost.
Automatic schema evolution -- When source tables add columns, shortcuts pick them up automatically. ETL pipelines would require schema change detection and handling.

Azure Functions are used only for sources that require active ingestion: external APIs (Vesper, market data) and file-based uploads (Datacollect Excel files).

Shortcut Lifecycle

Shortcuts follow a Git-managed lifecycle from creation to production deployment:

graph TD
  CREATE[Create shortcut in DEV Fabric UI] --> COMMIT[Commit via Source Control panel]
  COMMIT --> METADATA[shortcuts.metadata.json updated on main branch]
  METADATA --> PR_UAT[PR: main to release/uat]
  PR_UAT --> UAT_SYNC[UAT workspace auto-syncs from Git]
  UAT_SYNC --> PR_PROD[PR: release/uat to release/prod]
  PR_PROD --> PROD_SYNC[PROD workspace auto-syncs from Git]

Key details:

workspaces/bronze/Lakehouse_Bronze.Lakehouse/shortcuts.metadata.json is the IaC source of truth for all shortcuts deployed by scripts/deploy_shortcuts.py. The pipeline runs this script on every infra-deploy.
For shortcuts whose target item ID differs per environment (e.g. the 7 OneLake shortcuts pointing to the local Application_Data_Lists MirroredDatabase), the manifest uses "oneLakeLookupName": "Application_Data_Lists" instead of a hardcoded item ID. deploy_shortcuts.py resolves the ID at deploy time by listing workspace items — no per-environment config needed.
The pipeline also calls scripts/deploy_dataverse_shortcuts.py separately for the 96 PROD Dataverse shortcuts managed via deployment/bronze-dataverse-shortcuts.json.
The Git-based promotion flow (PR + merge) moves shortcuts across environments.
Always use "Update" (pull) before "Commit" (push) in the Fabric UI to avoid merge conflicts.
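
The lookup-name resolution described above could work roughly as follows. This is a minimal sketch, not the code in scripts/deploy_shortcuts.py: it assumes the manifest follows the usual shortcuts.metadata.json shape (`name`, `path`, `target.oneLake`), that `oneLakeLookupName` sits inside `target.oneLake`, and that the workspace items arrive as dicts with `id` and `displayName` fields, as the Fabric list-items endpoint returns them. The function name `resolve_lookup_names` is hypothetical.

```python
import copy


def resolve_lookup_names(manifest: list[dict], workspace_items: list[dict]) -> list[dict]:
    """Return a copy of the manifest with each oneLakeLookupName replaced by an itemId."""
    # Index workspace items by display name; keep all IDs to detect ambiguity.
    ids_by_name: dict[str, list[str]] = {}
    for item in workspace_items:
        ids_by_name.setdefault(item["displayName"], []).append(item["id"])

    resolved = copy.deepcopy(manifest)  # never mutate the caller's manifest
    for shortcut in resolved:
        one_lake = shortcut.get("target", {}).get("oneLake", {})
        lookup = one_lake.pop("oneLakeLookupName", None)
        if lookup is None:
            continue  # shortcut carries a hardcoded itemId; leave it untouched
        matches = ids_by_name.get(lookup, [])
        if len(matches) != 1:
            raise LookupError(
                f"{shortcut['name']}: expected exactly one item named "
                f"{lookup!r}, found {len(matches)}"
            )
        one_lake["itemId"] = matches[0]
    return resolved
```

Failing when the name matches zero or several items keeps a deploy from silently binding a shortcut to the wrong item.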

Schema Organization

Bronze Lakehouse uses schema-enabled mode. Shortcuts are organized into schemas that map to source systems:

Schema	Source	Contents
`dataverse_d365`	Dataverse — PROD only (`operations-geris-prod.crm4.dynamics.com`)	All D365 F&O business tables used by marts. 96 shortcuts. Canonical source for every downstream dbt model.
`dataverse_d365_uat`	Dataverse — UAT (`operations-geris-uat.crm4.dynamics.com`)	Parallel copies of 11 core business tables (`salestable`, `salesline`, `purchtable`, `purchline`, `lgslogisticfiletable`, `lgslogisticfileline`, `inventtrans`, `inventdim`, `inventbatch`, `inventtransorigin`, `prodtable`) for UAT-data inspection. Never joined by production dbt models.
`sharepoint`	SharePoint + local MirroredDatabase	5 OneDriveSharePoint shortcuts (Stonex/Marex broker data) and 7 OneLake shortcuts pointing to the local `Application_Data_Lists` MirroredDatabase (budget, KPI, report-owner reference data).
`datacollect`	Datacollect	Market data collection forms
`sharepoint_ad`	SharePoint	Legacy application data lists (pre-2026 path)
`ax`	Dynamics AX 2012	Cross-workspace shortcut to shared Bronze_AX lakehouse
`dbo`	Various	Legacy and miscellaneous tables

Environment split: before 2026-04 the 11 UAT-sourced tables sat under dataverse_d365 alongside the PROD shortcuts. There the UAT shortcuts silently won the create race against their PROD counterparts, so DEV Bronze served UAT data for those tables. The split into two schemas makes the source environment explicit, and scripts/deploy_dataverse_shortcuts.py enforces it by failing fast on duplicate (path, name) keys. See the Bronze shortcut env-split runbook for the one-time reconciliation procedure.
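
A fail-fast check on duplicate (path, name) keys can be as small as the sketch below. It illustrates the idea only; the actual check in scripts/deploy_dataverse_shortcuts.py may differ, and the function name `assert_unique_shortcut_keys` and the `/Tables/<schema>` path form are assumptions.

```python
from collections import Counter


def assert_unique_shortcut_keys(shortcuts: list[dict]) -> None:
    """Raise ValueError if two shortcuts share the same (path, name) key."""
    counts = Counter((s["path"], s["name"]) for s in shortcuts)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        # Abort before any shortcut is created, so no create race can occur.
        raise ValueError(f"duplicate shortcut keys: {duplicates}")


if __name__ == "__main__":
    shortcuts = [
        {"path": "/Tables/dataverse_d365", "name": "salestable"},      # PROD
        {"path": "/Tables/dataverse_d365_uat", "name": "salestable"},  # UAT
    ]
    assert_unique_shortcut_keys(shortcuts)  # passes: the schemas differ
```

With the two schemas, a PROD and a UAT shortcut of the same name get different keys; putting both under one schema would make the check raise.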

Bronze_AX: Shared Lakehouse

Bronze_AX is a special case. It is a shared lakehouse containing the AX 2012 archive (221 tables) that all environments reference via cross-workspace shortcuts. Unlike other shortcuts, AX shortcuts:

Are identical across DEV, UAT, and PROD (same workspaceId + itemId)
Should NOT be included in environment promotion PRs
Point to the same physical data regardless of environment

If Bronze_AX needs to move or be recreated, all three environments must be updated simultaneously using the migration script, not through the standard Git promotion flow.
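
Since AX shortcuts should carry the same workspaceId and itemId in every environment, a drift check along these lines could catch a partial migration. It is a sketch under assumptions: the manifests use the `target.oneLake` shape with `workspaceId` and `itemId`, AX shortcuts live under a path ending in `/ax`, and the function name `find_ax_drift` is hypothetical. It does not detect a shortcut that is missing from one environment altogether.

```python
def find_ax_drift(manifests: dict[str, list[dict]], schema: str = "ax") -> dict[str, set]:
    """Map shortcut name -> distinct (workspaceId, itemId) targets, for names that differ across environments."""
    targets: dict[str, set] = {}
    for shortcuts in manifests.values():
        for shortcut in shortcuts:
            # Only consider shortcuts placed under the AX schema folder.
            if not shortcut["path"].rstrip("/").endswith(f"/{schema}"):
                continue
            one_lake = shortcut["target"]["oneLake"]
            targets.setdefault(shortcut["name"], set()).add(
                (one_lake["workspaceId"], one_lake["itemId"])
            )
    # A healthy shortcut has exactly one distinct target across all environments.
    return {name: seen for name, seen in targets.items() if len(seen) > 1}
```

Calling it with the DEV, UAT, and PROD manifests keyed by environment name returns an empty dict when all three agree.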

Bronze Immutability

Bronze data is treated as immutable -- the raw record of what was received from source systems. This is enforced through multiple layers:

SQL Write Protection

The security pipeline (security-deploy.yml) deploys DENY statements that block INSERT, UPDATE, and DELETE operations on all Bronze schemas for non-admin roles. The pipeline runs automatically on every push to main, release/uat, or release/prod.
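
To make the write protection concrete, the sketch below generates schema-level T-SQL DENY statements of the kind such a pipeline could deploy. The schema list is taken from the Schema Organization table; the role name `bronze_reader` and the helper names are placeholders, and the statements that security-deploy.yml actually runs are not shown in this page.

```python
BRONZE_SCHEMAS = [
    "dataverse_d365", "dataverse_d365_uat", "sharepoint",
    "datacollect", "sharepoint_ad", "ax", "dbo",
]
WRITE_PERMISSIONS = ("INSERT", "UPDATE", "DELETE")


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def deny_statements(schemas: list[str], roles: list[str]) -> list[str]:
    """Build one schema-level DENY statement per (schema, role) pair."""
    perms = ", ".join(WRITE_PERMISSIONS)
    return [
        f"DENY {perms} ON SCHEMA::{quote_ident(schema)} TO {quote_ident(role)};"
        for schema in schemas
        for role in roles
    ]


if __name__ == "__main__":
    # "bronze_reader" is a hypothetical non-admin role.
    for statement in deny_statements(BRONZE_SCHEMAS, ["bronze_reader"]):
        print(statement)
```

Denying at schema level rather than per table means tables added to a schema later are covered without a redeploy.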

OneLake Soft-Delete

Soft-delete is enabled on all three Bronze lakehouses with a 30-day retention period. Accidentally deleted files are retained and can be recovered via the OneLake File Explorer or REST API. Enabling soft-delete is a manual step in the Fabric portal; it cannot be automated via Terraform.

Lakehouse	Purpose	Soft-Delete
`Lakehouse_Bronze`	D365, logistics, BD tables	30-day retention
`Lakehouse_Bronze_AX`	AX 2012 historical data (221 tables)	30-day retention
`Lakehouse_Datacollect`	Excel files, market data APIs	30-day retention

AX Archive Immutability

The AX 2012 archive in Azure Blob Storage (gerisdbtartifacts/ax-archive) has a 7-year (2555-day) time-based retention policy. Once locked, blobs cannot be deleted until the retention period expires. This satisfies financial data retention requirements.
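
The retention arithmetic can be checked with a few lines. This sketch assumes the retention interval is counted from the blob's creation time; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 7 * 365  # 2555 days, matching the policy described above


def earliest_delete_time(created_utc: datetime) -> datetime:
    """Return the first moment at which a blob may be deleted."""
    return created_utc + timedelta(days=RETENTION_DAYS)


def is_deletable(created_utc: datetime, now_utc: datetime) -> bool:
    """True once the retention period has fully elapsed."""
    return now_utc >= earliest_delete_time(created_utc)


if __name__ == "__main__":
    created = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print(earliest_delete_time(created).date())  # 2026-12-30
```

Because 2555 days ignores leap days, the window can end a day or two before the seventh calendar anniversary, as in the example above. Whether that matters depends on how the retention requirement is worded.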

Validation

Run the Bronze immutability validation script to verify all protections are in place:

python scripts/validate_bronze_immutability.py

The script checks that OneLake soft-delete is enabled on all Bronze lakehouses, that the SQL DENY statements are deployed, and that the ax-archive container has its immutability policy configured.

Related Pages

dbt Pipeline Overview -- How Bronze data flows through staging to marts
Model Inventory -- All staging models that read from Bronze
Developer Workflow -- Adding new shortcuts as part of feature development
Architecture -- System-level view of all platform components