Fabric Datalake Platform

Operations & Developer Guide — How Everything Works Together
dbt + Fabric Warehouse + DuckDB + Azure DevOps
Code-first · Dual-target · Two-layer CI
Monorepo Structure — What Lives Where
datalake/: Git-synced → Fabric workspace · Notebooks · Lakehouse config
semantic/: Git-synced → Fabric workspace · 6 semantic models (TMDL)
reports/: Git-synced → Fabric workspace · 20+ Power BI reports
dbt/: NOT git-synced · External project · CI deploys
functions/: Azure Functions (Python) · Config-driven ingestion
terraform/: IaC modules (templated) · Not yet implemented
pipelines/: CI/CD YAML
scripts/: Dev tooling
security/: RBAC grants
docs/: Runbook
Git sync: datalake/, semantic/ and reports/ are synced directly to Fabric workspaces through Fabric's Azure DevOps Git integration. dbt/ is not synced; it runs externally from CI pipelines and targets Gold_Warehouse through its SQL endpoint. All code lives in one repo, and each folder maps to a distinct deployment target.
Developer Workflow — How You Build
1. Local Setup (One-Time)
scripts/setup-local.sh
Checks Python 3.11–3.12 (rejects 3.13+) · creates .venv/ · installs dbt-core + dbt-duckdb + dbt-fabric · runs dbt deps
scripts/export_dev_data.py
Exports ~70 Bronze tables from Fabric SQL endpoints as Parquet into dev-data/ (~266 MB) · SPN or az login auth
2. Daily Development Loop (Zero CU)
Edit SQL models
dbt/models/staging|intermediate|marts
dbt build --target local --profiles-dir .
DuckDB + Parquet snapshots · < 5 min
--export-parquet
Optional: export marts for PBI Desktop preview
git push
Triggers Layer 1 CI (DuckDB smoke)
scripts/dev-build.sh # full local build
scripts/dev-build.sh --select dim_item+ # selective build
scripts/dev-build.sh --export-parquet # build + PBI export
How Local DuckDB Works
dev-data/*.parquet
~70 Bronze table snapshots exported from Fabric
on-run-start: load_parquet_sources()
Creates DuckDB tables from each Parquet file · idempotent
fabric_datalake.duckdb
Persisted file · staging=views, marts=tables · full dbt tests
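The on-run-start step can be pictured as one idempotent statement per Parquet snapshot. Below is a minimal Python sketch, assuming each dev-data/<name>.parquet becomes a table named <name> in DuckDB's default main schema (consistent with the 'main' schema the source YAML uses for DuckDB targets). build_load_statements is a hypothetical helper, not the load_parquet_sources() macro itself; it only generates SQL. CREATE OR REPLACE TABLE is what makes reruns idempotent, and read_parquet is DuckDB's built-in Parquet reader.

```python
from pathlib import Path


def quote_ident(name: str) -> str:
    """Double-quote a DuckDB identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a SQL string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_load_statements(parquet_files: list[str]) -> list[str]:
    """Return one CREATE OR REPLACE TABLE statement per Parquet file.

    Hypothetical naming rule: the table name is the file stem, and every
    table lands in DuckDB's default 'main' schema.
    """
    statements = []
    for file_path in sorted(parquet_files):  # sorted for a stable load order
        table = Path(file_path).stem  # dev-data/foo.parquet -> foo
        statements.append(
            f"CREATE OR REPLACE TABLE main.{quote_ident(table)} AS "
            f"SELECT * FROM read_parquet({quote_literal(file_path)})"
        )
    return statements
```

To try it outside dbt, the file list could come from [str(p) for p in Path("dev-data").glob("*.parquet")], and each statement could be passed to con.execute() on a connection from duckdb.connect("fabric_datalake.duckdb").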
Data Flow — Source to Consumer
Sources
D365 / Dataverse
49 tables · Synapse Link → Bronze
AX 2012
2 tables · Copy Job → Bronze_AX
Business Data (BD)
3 tables · product config, allocations
Vendor Files & APIs
13 tables · Vesper, Marex, StoneX, DevOps
Bronze (Immutable)
Lakehouse_Bronze
Dataverse + BD + datacollect
Lakehouse_Bronze_AX
AX 2012 historical tables
Lakehouse_Datacollect
Vendor files · market data
Delta / Parquet · Open Format
dbt Transformation
STG
Staging (views, zero CU)
1:1 source · rename, cast · cross-db macros (safe_cast, string_agg, etc.)
INT
Intermediate (views)
Business logic · AX+D365 merge (union_ax_d365 macro) · domain organized
MRT
Marts (materialized tables)
Gold output · dim_* + fact_* · schema contracts · dbt tests
Gold → Consumers
Gold_Warehouse
Fabric Warehouse (T-SQL) · SQL endpoint for all consumers
6 Semantic Models (DirectLake)
LH_Gold_Full · CheeseFutures · HistoricalCurves · SpreadAnalysis · Datacollect · ReportOwners
20+ Power BI Reports
Traders · Supply Chain · Finance · Logistics
Power Apps + SQL Endpoint
Schema contracts · direct SQL queries
CI/CD Pipeline — From Push to Production
git push
any feature/* branch
Layer 1: DuckDB Smoke
every push · ~30s · 0 CU · dbt build + docs generate + validate_descriptions
Open PR to main
triggers Layer 2
Layer 2: Fabric Slim CI
PRs only · state:modified+1 · per-PR schema (ci_pr_<id>) · ~5–10 min
Merge to main
both CI layers must pass
DEV
auto-sync on merge
UAT
promote-to-uat.yml
PROD
approval gate · serialized
Layer 1 — DuckDB Smoke
dbt build --target duckdb --profiles-dir .
dbt docs generate --profiles-dir .
python validate_descriptions.py
python validate_schema_dependencies.py
Pipeline: pipelines/dbt-ci-smoke.yml
Catches: SQL syntax, broken refs, missing descriptions, DAG issues
Expected baseline: WARN=10 (Bronze data quality)
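The description and WARN checks can both be expressed against dbt's own artifacts. A minimal sketch, assuming the checks read target/manifest.json (model and column descriptions) and target/run_results.json (result statuses). missing_descriptions, check_warn_baseline and run_checks are hypothetical names and do not reproduce the repo's validate_descriptions.py.

```python
import json
from pathlib import Path

WARN_BASELINE = 10  # expected warnings from pre-existing Bronze data quality


def missing_descriptions(manifest: dict) -> list[str]:
    """Return unique_ids of models, and model.column pairs, with blank descriptions."""
    problems = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # this sketch only checks models
        if not (node.get("description") or "").strip():
            problems.append(unique_id)
        for column_name, column in node.get("columns", {}).items():
            if not (column.get("description") or "").strip():
                problems.append(f"{unique_id}.{column_name}")
    return problems


def check_warn_baseline(run_results: dict, baseline: int = WARN_BASELINE) -> int:
    """Count results with status 'warn'; raise if the count exceeds the baseline."""
    warn_count = sum(
        1 for result in run_results.get("results", []) if result.get("status") == "warn"
    )
    if warn_count > baseline:
        raise RuntimeError(
            f"WARN={warn_count} exceeds the baseline of {baseline}; investigate before merging"
        )
    return warn_count


def run_checks(target_dir: Path = Path("target")) -> int:
    """Hypothetical entry point: load both artifacts and apply both checks."""
    manifest = json.loads((target_dir / "manifest.json").read_text())
    run_results = json.loads((target_dir / "run_results.json").read_text())
    missing = missing_descriptions(manifest)
    if missing:
        raise SystemExit("Missing descriptions:\n" + "\n".join(missing))
    return check_warn_baseline(run_results)
```

Failing on "greater than" rather than "not equal" mirrors the rule that warnings should not increase without investigation, while a drop below 10 passes silently.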
Layer 2 — Fabric Slim CI
dbt build --select state:modified+1 \
  --target ci --defer --state target-state --profiles-dir .
# per-PR schema: ci_pr_<PR_ID>
# cleanup step drops the per-PR schema; runs with condition: always()
Pipeline: pipelines/dbt-ci-slim.yml
Manifest baseline: downloaded from nightly build (Azure Blob)
Builds only modified models plus their first-degree (direct) children
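A simplified Python sketch of what state:modified+1 selects, assuming two parsed manifest.json dicts: the current compile and the nightly baseline. A node counts as modified here when it is new or its file checksum changed; first-degree children come from the manifest's child_map. dbt's real state:modified also compares configs, macros and other properties, and --defer separately resolves unselected refs to baseline relations, so this is an approximation. Both function names are hypothetical.

```python
def modified_nodes(current: dict, baseline: dict) -> set[str]:
    """Nodes that are new, or whose file checksum differs from the baseline manifest."""
    baseline_nodes = baseline.get("nodes", {})
    changed = set()
    for unique_id, node in current.get("nodes", {}).items():
        old = baseline_nodes.get(unique_id)
        if old is None or old["checksum"]["checksum"] != node["checksum"]["checksum"]:
            changed.add(unique_id)
    return changed


def select_modified_plus_one(current: dict, baseline: dict) -> set[str]:
    """Approximate state:modified+1: modified nodes plus their direct children only."""
    modified = modified_nodes(current, baseline)
    child_map = current.get("child_map", {})
    selected = set(modified)
    for unique_id in modified:
        # +1 means one level downstream; grandchildren are not added
        selected.update(child_map.get(unique_id, []))
    return selected
```

Stopping at one level keeps PR builds in the ~5–10 minute range; deeper breakage is left for the nightly full build to catch.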
Nightly Full Build
dbt build --target prod --profiles-dir .
# uploads manifest.json to Azure Blob
# baseline for next day's slim CI
Pipeline: pipelines/dbt-nightly-full.yml
Trigger: 02:00 UTC daily on main
Full model set validation + manifest for slim CI
Environments — Where Code Runs
Local
Target: local
DuckDB + Parquet snapshots
Persisted file: fabric_datalake.duckdb
80% of dev work · 0 CU
CI
Target: duckdb + ci
In-memory DuckDB (Layer 1)
Fabric Warehouse (Layer 2)
Per-PR schema isolation
DEV
Target: dev
Own Warehouse + Lakehouse
Shortcuts to prod Bronze (RO)
Auto-pause 30 min idle
UAT
Promotion: pipeline
fabric-cicd parametrized deploy
Daily refresh schedule
Stakeholder validation
PROD
Approval gate required
Max 1 concurrent build
Nightly full build 02:00 UTC
1-hour rollback via parallel run
dbt Targets (profiles.yml)
Target | Adapter | Auth | When | Key flag / note
local | DuckDB (file) | None | Dev machine | --profiles-dir . always required
duckdb | DuckDB (memory) | None | CI Layer 1 | In-memory, no state persisted
ci | Fabric Warehouse | SPN (data-worker) | CI Layer 2 (PRs) | --defer --state + per-PR schema
dev | Fabric Warehouse | SPN (data-worker) | DEV workspace | Per-developer warehouse
prod | Fabric Warehouse | SPN (prod) | PROD nightly | Separate SPN credentials
Fabric Workspaces — Production Layout
WS: Datalake
Lakehouse_Bronze (Dataverse + BD)
Lakehouse_Bronze_AX (AX 2012)
Lakehouse_Datacollect (vendor files)
Gold_Warehouse (dbt target)
Legacy notebooks (parallel run)
HIDDEN from business users
WS: Semantic
LH_Gold_Full (primary)
CheeseFuturePrices
Datacollect (stub)
Report_Owners
SPN auth to Datalake WS
WS: Reports
20+ Power BI reports
Traders · Supply Chain
Finance · Logistics
ONLY workspace visible to users
Interaction flow: Datalake WS (Bronze lakehouses + Gold_Warehouse) → Semantic WS reads from Gold_Warehouse via SPN-authenticated DirectLake connection → Reports WS binds to Semantic models → Users only see Reports WS
Parallel Run Validation (Pre-Cutover)
Old System
Legacy notebooks → Lakehouse_Gold
Runs in parallel during transition
New System
dbt build → Gold_Warehouse
Must match before cutover
compare_parallel_runs.py
Modes: exact / tolerance / row_count
Config: docs/parallel_run_config.yml
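The three modes can be illustrated with a small comparison function. This is a minimal sketch, assuming rows arrive as lists of dicts and that the modes mean: row_count compares counts only, exact compares rows as an order-insensitive multiset, and tolerance aligns rows on a key column and lets numbers differ by an absolute tolerance. compare_tables is hypothetical; the real script reads its per-table rules from docs/parallel_run_config.yml, which this sketch replaces with plain arguments.

```python
import math
from collections import Counter
from typing import Any, Optional

Row = dict[str, Any]


def _values_match(old: Any, new: Any, tolerance: float) -> bool:
    """Numbers must agree within an absolute tolerance; other values must be equal."""
    if isinstance(old, (int, float)) and isinstance(new, (int, float)):
        return math.isclose(old, new, rel_tol=0.0, abs_tol=tolerance)
    return old == new


def _as_bag(rows: list[Row]) -> Counter:
    """Order-insensitive multiset of rows (values must be hashable)."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def compare_tables(legacy: list[Row], new: list[Row], mode: str,
                   key: Optional[str] = None, tolerance: float = 1e-6) -> list[str]:
    """Return mismatch descriptions between legacy and new output; [] means a match."""
    if mode == "row_count":
        return [] if len(legacy) == len(new) else [f"row count {len(legacy)} != {len(new)}"]
    if mode == "exact":
        return [] if _as_bag(legacy) == _as_bag(new) else ["row contents differ"]
    if mode == "tolerance":
        if key is None:
            raise ValueError("tolerance mode needs a key column to align rows")
        legacy_by_key = {row[key]: row for row in legacy}
        new_by_key = {row[key]: row for row in new}
        problems = [
            f"key {k!r} only in one system"
            for k in sorted(legacy_by_key.keys() ^ new_by_key.keys(), key=repr)
        ]
        for k in sorted(legacy_by_key.keys() & new_by_key.keys(), key=repr):
            for column, old_value in legacy_by_key[k].items():
                new_value = new_by_key[k].get(column)
                if not _values_match(old_value, new_value, tolerance):
                    problems.append(f"key {k!r}, column {column!r}: {old_value!r} != {new_value!r}")
        return problems
    raise ValueError(f"unknown mode: {mode!r}")
```

Tolerance mode exists because floating-point aggregates often differ in the last digits between engines even when the logic is identical; exact mode would flag those as failures.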
Cross-Database Compatibility — DuckDB ↔ T-SQL
Dual-target design: All dbt models must compile and run on both DuckDB (local/CI) and Fabric Warehouse (T-SQL). Source of truth: docs/dialect-compatibility-log.md
Target Detection in Source YAML
database: "{{ 'Lakehouse_Bronze' if target.name not in ('local', 'duckdb') else '' }}"
schema: "{{ 'dataverse_d365' if target.name not in ('local', 'duckdb') else 'main' }}"
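Always use not in ('local', 'duckdb'), never != 'local' alone. Both DuckDB targets (local file and in-memory CI) must be excluded; otherwise the duckdb CI target resolves to the Fabric database name and fails.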
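The Jinja condition can be mirrored in Python to show why the != 'local' shortcut breaks Layer 1 CI. This is purely illustrative: both functions are hypothetical, and the returned values are the database and schema strings from the source YAML above.

```python
DUCKDB_TARGETS = ("local", "duckdb")  # both DuckDB targets defined in profiles.yml


def resolve_dataverse_source(target_name: str) -> tuple[str, str]:
    """Return (database, schema) the way the source YAML's Jinja resolves them."""
    if target_name not in DUCKDB_TARGETS:
        return "Lakehouse_Bronze", "dataverse_d365"
    return "", "main"  # DuckDB: tables live in the default 'main' schema


def resolve_local_only(target_name: str) -> tuple[str, str]:
    """Anti-pattern: only 'local' is treated as DuckDB, so 'duckdb' points at Fabric."""
    if target_name != "local":
        return "Lakehouse_Bronze", "dataverse_d365"
    return "", "main"
```

With the anti-pattern, resolve_local_only("duckdb") returns the Fabric location, which an in-memory DuckDB run cannot reach.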
Custom Cross-DB Macros (macros/cross_db/)
safe_cast() try_cast() string_agg() concat_ws() iif_case() extract_date_from_filename() digits_only() union_ax_d365()
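Pattern: DuckDB · T-SQL (Fabric) · Solution
Recursive CTE: DuckDB WITH RECURSIVE (required) · T-SQL WITH (recursion implicit) · Solution: Jinja if/else on target.name
Timestamp: DuckDB NOW() · T-SQL GETDATE() · Solution: {{ dbt.current_timestamp() }}
Left-pad: DuckDB lpad(s, n, '0') · T-SQL no native LPAD · Solution: right('00' || cast(s as varchar), n), where the literal holds n − 1 zeros ('00' covers n = 3)
Identifiers: DuckDB no brackets · T-SQL [col name] · Solution: always use "col name" double quotes
Conditional: DuckDB no IIF · T-SQL IIF(cond, t, f) · Solution: CASE WHEN or iif_case()
Null coalesce: DuckDB COALESCE · T-SQL ISNULL or COALESCE · Solution: always use COALESCE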
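The rules above lend themselves to a cheap pre-commit lint that flags single-dialect constructs before a build runs. A minimal sketch, assuming a regex scan per line is acceptable: it does not understand SQL comments or string literals, and RULES and lint_sql are hypothetical, not part of the repo. Running the models on both targets remains the real check.

```python
import re

# Hypothetical rule set distilled from the DuckDB / T-SQL pattern table.
RULES = [
    (re.compile(r"\bGETDATE\s*\(", re.IGNORECASE), "use {{ dbt.current_timestamp() }}"),
    (re.compile(r"\bNOW\s*\(", re.IGNORECASE), "use {{ dbt.current_timestamp() }}"),
    (re.compile(r"\bIIF\s*\(", re.IGNORECASE), "use CASE WHEN or iif_case()"),
    (re.compile(r"\bISNULL\s*\(", re.IGNORECASE), "use COALESCE"),
    (re.compile(r"\bLPAD\s*\(", re.IGNORECASE), "use right('00' || cast(s as varchar), n)"),
    # Bracketed identifier such as [col name]; DuckDB would read brackets as a list
    (re.compile(r"\[[A-Za-z_][\w ]*\]"), 'quote identifiers with "double quotes"'),
]


def lint_sql(sql: str) -> list[tuple[int, str]]:
    """Return (line number, advice) for every line matching a single-dialect pattern."""
    findings = []
    for line_no, line in enumerate(sql.splitlines(), start=1):
        for pattern, advice in RULES:
            if pattern.search(line):
                findings.append((line_no, advice))
    return findings
```

The \b and \s*\( anchors keep portable macro calls such as iif_case(...) from matching the IIF rule.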
Infrastructure & Credentials
spn-fabric-platform-admin
Role: Workspace Admin
Used by: Terraform, fabric-cicd, Git APIs
Scope: Workspace creation · item deployment
spn-fabric-data-worker
Role: Workspace Contributor
Used by: dbt CI/CD builds, Azure Functions
Scope: Data read/write · model execution
Azure DevOps Variable Group: fabric-ci-vars
FABRIC_SERVER: DEV SQL endpoint
FABRIC_CI_SERVER: CI SQL endpoint
FABRIC_DATABASE: DEV database
FABRIC_CI_DATABASE: CI database
FABRIC_TENANT_ID: Azure AD tenant
FABRIC_PROD_SERVER: PROD endpoint
FABRIC_DATA_WORKER_CLIENT_ID: SPN ID
FABRIC_DATA_WORKER_CLIENT_SECRET: SPN secret
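A fail-fast check at the start of a CI job can turn a confusing adapter login error into a clear "variable not set" message. A minimal sketch, assuming the job sees the variable group as environment variables. In Azure Pipelines, secret variables such as FABRIC_DATA_WORKER_CLIENT_SECRET are not exposed to scripts automatically and must be mapped explicitly with env: on the step. REQUIRED_VARS and missing_variables are hypothetical; prod is omitted because its separate SPN variables are not listed in this group.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical mapping: dbt target -> variables from fabric-ci-vars it needs.
REQUIRED_VARS = {
    "ci": ["FABRIC_CI_SERVER", "FABRIC_CI_DATABASE", "FABRIC_TENANT_ID",
           "FABRIC_DATA_WORKER_CLIENT_ID", "FABRIC_DATA_WORKER_CLIENT_SECRET"],
    "dev": ["FABRIC_SERVER", "FABRIC_DATABASE", "FABRIC_TENANT_ID",
            "FABRIC_DATA_WORKER_CLIENT_ID", "FABRIC_DATA_WORKER_CLIENT_SECRET"],
}


def missing_variables(target: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List required variables that are unset or empty; DuckDB targets need none."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS.get(target, []) if not source.get(name)]
```

An empty value is treated as missing on purpose: an unmapped secret typically shows up as an unset or empty variable rather than an error.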
Key Operational Files
File | Purpose
dbt/profiles.yml | All target definitions (local, duckdb, ci, dev, prod)
dbt/dbt_project.yml | Project config, materialization rules, on-run-start hook
dbt/macros/load_parquet_sources.sql | On-run-start: creates DuckDB tables from Parquet
docs/dialect-compatibility-log.md | SQL pattern reference (DuckDB ↔ T-SQL)
docs/runbook.md | Operational procedures & setup checklists
docs/parallel_run_config.yml | Per-table comparison rules for parallel run
docs/semantic_model_dependencies.yml | Gold columns referenced by semantic models
scripts/setup-local.sh | Bootstrap local dev environment
scripts/dev-build.sh | Local build + selective/Parquet export
scripts/export_dev_data.py | Export Bronze tables from Fabric as Parquet
Operational Rules & Guardrails
Python Version
Must be 3.11–3.12. Python 3.13 breaks dbt-fabric silently. setup-local.sh rejects 3.13+ automatically.
--profiles-dir .
Required on EVERY dbt command, because profiles.yml lives in the dbt/ project directory rather than ~/.dbt/. Omitting it fails with "profile not found."
packages.yml
Only for dbt packages such as dbt_utils from dbt Hub. Never add pip packages here: dbt-fabric and dbt-duckdb belong in requirements.txt (pip install).
PR Pipeline Trigger
PR-only pipelines MUST declare trigger: none at the top of the YAML. Without it, the default CI trigger runs the pipeline on every push to every branch.
WARN=10 Baseline
10 test warnings are expected (pre-existing Bronze data quality). Non-blocking. Should not increase without investigation.
Story Completion Gate
A story is NOT done until dbt build --target local --profiles-dir . passes all tests. "Compiles" ≠ "tests pass."
Parallel Run = Safety Net
Old and new systems run side-by-side for 2 weeks post-cutover, with a 1-hour rollback to legacy if issues are found.
Side-by-Side Only
Never modify the existing production system. All new work goes into new workspaces. Cutover = switching users to the new workspace.
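Three of these guardrails are mechanical enough to check in code. A minimal sketch with hypothetical helper names: the Python range mirrors what setup-local.sh enforces, the --profiles-dir check inspects a dbt argument list, and the trigger check looks for an unindented trigger: none line in a pipeline YAML file via a regex, not a full YAML parser.

```python
import re
import sys
from typing import Optional, Sequence


def python_version_ok(version: Optional[tuple] = None) -> bool:
    """True for Python 3.11 or 3.12; 3.13+ is rejected because it breaks dbt-fabric."""
    major_minor = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return (3, 11) <= major_minor < (3, 13)


def has_profiles_dir(command: Sequence[str]) -> bool:
    """True if a dbt argument list passes '--profiles-dir .' or '--profiles-dir=.'."""
    args = list(command)
    if "--profiles-dir=." in args:
        return True
    return any(arg == "--profiles-dir" and nxt == "." for arg, nxt in zip(args, args[1:]))


# Top-level (unindented) 'trigger: none', optionally followed by a YAML comment.
_TRIGGER_NONE = re.compile(r"^trigger:[ \t]*none[ \t]*(#.*)?$", re.MULTILINE)


def pr_pipeline_disables_ci_trigger(pipeline_yaml: str) -> bool:
    """True if a PR-only pipeline YAML turns off the default CI trigger."""
    return _TRIGGER_NONE.search(pipeline_yaml) is not None
```

Requiring the line to be unindented matters: a nested trigger: none under another key does not disable the pipeline-level CI trigger.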
dbt Model Structure — What Gets Built
Model Layers
staging/ (views)
1:1 with Bronze tables
Rename, cast, type normalize
Zero CU · cross-db macros
stg_dataverse__* stg_ax__* stg_bd__*
intermediate/ (views)
Business logic · aggregations
AX 2012 + D365 merge unions
Domain folders: trading/ finance/ logistics/
int_allocations_* int_exchange_rates_* int_trade_*
marts/ (materialized tables)
Gold output · schema contracts
dim_* dimensions + fact_* facts
Business descriptions · dbt tests
dim_batch dim_logistics fact_trade fact_inventory_snapshot fact_sca_allocations +60 more
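The naming patterns above (stg_<source>__<table>, int_*, dim_*/fact_*) can be enforced per layer folder. A minimal sketch, assuming model files sit under folders literally named staging/, intermediate/ and marts/ and that names are lowercase snake_case; LAYER_RULES and naming_violations are hypothetical, not an existing CI script.

```python
import re
from pathlib import PurePosixPath

# Hypothetical per-layer rules derived from the documented naming patterns.
LAYER_RULES = {
    "staging": re.compile(r"^stg_[a-z0-9_]+__[a-z0-9_]+$"),  # stg_<source>__<table>
    "intermediate": re.compile(r"^int_[a-z0-9_]+$"),         # int_<topic>
    "marts": re.compile(r"^(dim|fact)_[a-z0-9_]+$"),         # dim_* / fact_*
}


def naming_violations(model_paths: list[str]) -> list[str]:
    """Return model paths whose file name breaks the rule for their layer folder."""
    violations = []
    for path in model_paths:
        pure = PurePosixPath(path)
        # The first path segment that names a layer decides which rule applies,
        # so domain subfolders such as intermediate/trading/ are still covered.
        layer = next((part for part in pure.parts if part in LAYER_RULES), None)
        if layer is not None and not LAYER_RULES[layer].match(pure.stem):
            violations.append(path)
    return violations
```

Paths outside the three layer folders (macros, seeds, tests) are ignored rather than flagged.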
Bronze Sources (~70 tables)
Source | Database | Tables
dataverse | Lakehouse_Bronze | 49
datacollect | Lakehouse_Bronze | 13
bd | Lakehouse_Bronze | 3
ax | Lakehouse_Bronze_AX | 2
ax_bronze | Lakehouse_Bronze | 2
Seeds (Reference Data)
param_direction_split.csv param_inventory_type.csv param_value_unit.csv
CI Validation Scripts
validate_descriptions.py validate_schema_dependencies.py compare_parallel_runs.py