Fabric Datalake Platform
Operations & Developer Guide — How Everything Works Together
Monorepo Structure — What Lives Where
datalake/
Git-synced → Fabric workspace
Notebooks · Lakehouse config
semantic/
Git-synced → Fabric workspace
6 semantic models (TMDL)
reports/
Git-synced → Fabric workspace
20+ Power BI reports
dbt/
NOT git-synced
External project · CI deploys
functions/
Azure Functions (Python)
Config-driven ingestion
terraform/
IaC modules (templated)
Not yet implemented
pipelines/ scripts/ security/ docs/
CI/CD YAML · Dev tooling
RBAC grants · Runbook
Git sync: datalake/, semantic/, and reports/ are synced directly to Fabric workspaces via Azure DevOps git integration.
dbt/ is not synced; it runs externally via CI pipelines and targets Gold_Warehouse through its SQL endpoint.
All code is in one repo; each folder maps to a distinct deployment target.
Developer Workflow — How You Build
1. Local Setup (One-Time)
scripts/setup-local.sh
Checks Python 3.11–3.12 (rejects 3.13+) · creates .venv/ · installs dbt-core + dbt-duckdb + dbt-fabric · runs dbt deps
scripts/export_dev_data.py
Exports ~70 Bronze tables from Fabric SQL endpoints as Parquet into dev-data/ (~266 MB) · SPN or az login auth
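The version gate described for setup-local.sh can be illustrated with a short Python sketch. The real check lives in a shell script whose contents are not shown here, so the function name `is_supported_python` and the constants are assumptions made for illustration only.

```python
import sys

# Supported range taken from the guide: Python 3.11 and 3.12 only.
MIN_VERSION = (3, 11)
MAX_VERSION = (3, 12)


def is_supported_python(version: tuple[int, int]) -> bool:
    """Return True when (major, minor) is inside the supported range."""
    return MIN_VERSION <= version <= MAX_VERSION


# Report on the interpreter running this sketch (hypothetical usage).
current = (sys.version_info.major, sys.version_info.minor)
status = "supported" if is_supported_python(current) else "unsupported"
print(f"Python {current[0]}.{current[1]} is {status}")
```

Comparing `(major, minor)` tuples keeps the upper bound explicit, so a future 3.13+ interpreter is rejected rather than silently accepted.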
2. Daily Development Loop (Zero CU)

Edit SQL models
dbt/models/staging|intermediate|marts
→

dbt build --target local
DuckDB + Parquet snapshots · < 5 min
→

--export-parquet
Optional: export marts for PBI Desktop preview
→

git push
Triggers Layer 1 CI (DuckDB smoke)
scripts/dev-build.sh # full local build
scripts/dev-build.sh --select dim_item+ # selective build
scripts/dev-build.sh --export-parquet # build + PBI export
How Local DuckDB Works
dev-data/*.parquet
~70 Bronze table snapshots exported from Fabric
→
on-run-start: load_parquet_sources()
Creates DuckDB tables from each Parquet file · idempotent
→
fabric_datalake.duckdb
Persisted file · staging=views, marts=tables · full dbt tests
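As a rough sketch of what an idempotent Parquet load could look like, the Python below generates one DuckDB statement per snapshot file. The actual `load_parquet_sources()` macro is written in Jinja/SQL and is not reproduced here; the helper name, the rule that the table name equals the file stem, and the two example file names are all assumptions.

```python
from pathlib import PurePosixPath


def parquet_load_statements(parquet_paths: list[str]) -> list[str]:
    """Build one re-runnable DuckDB statement per Parquet file.

    Assumption: the table name is the file name without its extension.
    CREATE OR REPLACE makes repeated runs safe (idempotent).
    """
    statements = []
    for path in sorted(parquet_paths):
        table = PurePosixPath(path).stem
        statements.append(
            f'CREATE OR REPLACE TABLE "{table}" AS '
            f"SELECT * FROM read_parquet('{path}')"
        )
    return statements


# Hypothetical snapshot files; the real dev-data/ folder holds ~70 of them.
for sql in parquet_load_statements(
    ["dev-data/salesorder.parquet", "dev-data/account.parquet"]
):
    print(sql)
```

Because each statement replaces the table rather than appending to it, rebuilding against refreshed snapshots does not duplicate rows.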
Data Flow — Source to Consumer
Sources
D365 / Dataverse
49 tables · Synapse Link → Bronze
AX 2012
2 tables · Copy Job → Bronze_AX
Business Data (BD)
3 tables · product config, allocations
Vendor Files & APIs
13 tables · Vesper, Marex, StoneX, DevOps
Bronze (Immutable)
Lakehouse_Bronze
Dataverse + BD + datacollect
Lakehouse_Bronze_AX
AX 2012 historical tables
Lakehouse_Datacollect
Vendor files · market data
Delta / Parquet · Open Format
dbt Transformation
STG
Staging (views, zero CU)
1:1 source · rename, cast · cross-db macros (safe_cast, string_agg, etc.)
INT
Intermediate (views)
Business logic · AX+D365 merge (union_ax_d365 macro) · domain organized
MRT
Marts (materialized tables)
Gold output · dim_* + fact_* · schema contracts · dbt tests
Gold → Consumers
Gold_Warehouse
Fabric Warehouse (T-SQL) · SQL endpoint for all consumers
6 Semantic Models (DirectLake)
LH_Gold_Full · CheeseFutures · HistoricalCurves · SpreadAnalysis · Datacollect · ReportOwners
20+ Power BI Reports
Traders · Supply Chain · Finance · Logistics
Power Apps + SQL Endpoint
Schema contracts · direct SQL queries
CI/CD Pipeline — From Push to Production

git push
any feature/* branch
→

Layer 1: DuckDB Smoke
every push · ~30s · 0 CU · dbt build + docs generate + validate_descriptions
→

Open PR to main
triggers Layer 2
→

Layer 2: Fabric Slim CI
PRs only · state:modified+1 · per-PR schema (ci_pr_<id>) · ~5–10 min
→

Merge to main
both CI layers must pass
→

DEV
auto-sync on merge
→

UAT
promote-to-uat.yml
→
PROD
approval gate · serialized
Layer 1 — DuckDB Smoke
dbt build --target duckdb --profiles-dir .
dbt docs generate --profiles-dir .
python validate_descriptions.py
python validate_schema_dependencies.py
Pipeline: pipelines/dbt-ci-smoke.yml
Catches: SQL syntax, broken refs, missing descriptions, DAG issues
Expected baseline: WARN=10 (Bronze data quality)
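One way to guard the WARN=10 baseline is to read the warning counter from dbt's final summary line and compare it to the expected value. This is a minimal sketch, not part of the pipeline described above: the function names are hypothetical and the counts in the sample line are invented for illustration.

```python
import re

WARN_BASELINE = 10  # expected Bronze data-quality warnings, per this guide


def warn_count(summary_line: str) -> int:
    """Extract the WARN counter from a dbt run summary line."""
    match = re.search(r"\bWARN=(\d+)", summary_line)
    if match is None:
        raise ValueError("no WARN= counter found in summary line")
    return int(match.group(1))


def exceeds_baseline(summary_line: str, baseline: int = WARN_BASELINE) -> bool:
    """Return True when the run produced more warnings than the baseline."""
    return warn_count(summary_line) > baseline


# Illustrative summary line; the PASS/TOTAL numbers are made up.
sample = "Done. PASS=412 WARN=10 ERROR=0 SKIP=0 TOTAL=422"
print(warn_count(sample), exceeds_baseline(sample))
```

A check like this would flag only increases, which matches the rule that the existing warnings are non-blocking.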
Layer 2 — Fabric Slim CI
dbt build --select state:modified+1 \
  --target ci --defer --state target-state --profiles-dir .
# per-PR schema: ci_pr_<PR_ID>
# schema cleanup step runs with condition: always()
Pipeline: pipelines/dbt-ci-slim.yml
Manifest baseline: downloaded from nightly build (Azure Blob)
Only builds changed models + 1 downstream
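The per-PR isolation can be sketched as two small helpers: one that derives the schema name from the pull request ID and one that assembles the slim CI command. Both helper names are hypothetical, and how the schema name actually reaches dbt (for example through an environment variable read by profiles.yml) is not specified in this guide.

```python
def ci_schema_name(pr_id: int) -> str:
    """Return the isolated schema name for a pull request, e.g. ci_pr_123."""
    if pr_id <= 0:
        raise ValueError("pr_id must be a positive integer")
    return f"ci_pr_{pr_id}"


def slim_ci_command(state_dir: str = "target-state") -> list[str]:
    """Assemble the slim CI invocation as an argument list.

    state:modified+1 selects changed models plus one level of children;
    --defer resolves unbuilt parents from the manifest in state_dir.
    """
    return [
        "dbt", "build",
        "--select", "state:modified+1",
        "--target", "ci",
        "--defer", "--state", state_dir,
        "--profiles-dir", ".",
    ]


print(ci_schema_name(123))
print(" ".join(slim_ci_command()))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.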
Nightly Full Build
dbt build --target prod --profiles-dir .
# uploads manifest.json to Azure Blob
# baseline for next day's slim CI
Pipeline: pipelines/dbt-nightly-full.yml
Trigger: 02:00 UTC daily on main
Full model set validation + manifest for slim CI
Environments — Where Code Runs

Local
Target: local
DuckDB + Parquet snapshots
Persisted file: fabric_datalake.duckdb
80% of dev work · 0 CU

CI
Target: duckdb + ci
In-memory DuckDB (Layer 1)
Fabric Warehouse (Layer 2)
Per-PR schema isolation

DEV
Target: dev
Own Warehouse + Lakehouse
Shortcuts to prod Bronze (RO)
Auto-pause 30 min idle

UAT
Promotion: pipeline
fabric-cicd parametrized deploy
Daily refresh schedule
Stakeholder validation
PROD
Approval gate required
Max 1 concurrent build
Nightly full build 02:00 UTC
1-hour rollback via parallel run
Fabric Workspaces — Production Layout

WS: Datalake
Lakehouse_Bronze (Dataverse + BD)
Lakehouse_Bronze_AX (AX 2012)
Lakehouse_Datacollect (vendor files)
Gold_Warehouse (dbt target)
Legacy notebooks (parallel run)
HIDDEN from business users

WS: Semantic
LH_Gold_Full (primary)
CheeseFuturePrices
Datacollect (stub)
Report_Owners
SPN auth to Datalake WS

WS: Reports
20+ Power BI reports
Traders · Supply Chain
Finance · Logistics
ONLY workspace visible to users
Interaction flow:
Datalake WS (Bronze lakehouses + Gold_Warehouse)
→ Semantic WS reads from Gold_Warehouse via SPN-authenticated DirectLake connection
→ Reports WS binds to Semantic models
→ Users only see Reports WS
Parallel Run Validation (Pre-Cutover)
Old System
Legacy notebooks → Lakehouse_Gold
Runs in parallel during transition
↔
New System
dbt build → Gold_Warehouse
Must match before cutover
→
compare_parallel_runs.py
Modes: exact / tolerance / row_count
Config: docs/parallel_run_config.yml
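The three comparison modes could behave roughly as follows. This is a sketch under stated assumptions, not the logic of compare_parallel_runs.py: it assumes both result sets are lists of tuples already sorted by the same key, and the function name and the `rel_tol` default are invented for illustration.

```python
from math import isclose


def compare_tables(old_rows, new_rows, mode="exact", rel_tol=1e-6) -> bool:
    """Compare legacy and dbt outputs under one of three modes."""
    if len(old_rows) != len(new_rows):
        return False  # every mode requires matching row counts
    if mode == "row_count":
        return True
    if mode == "exact":
        return old_rows == new_rows
    if mode == "tolerance":
        for old, new in zip(old_rows, new_rows):
            if len(old) != len(new):
                return False
            for a, b in zip(old, new):
                numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
                if numeric:
                    if not isclose(a, b, rel_tol=rel_tol):
                        return False
                elif a != b:
                    return False
        return True
    raise ValueError(f"unknown mode: {mode}")


legacy = [("B001", 100.0)]
rebuilt = [("B001", 100.00001)]
print(compare_tables(legacy, rebuilt, "exact"))      # False
print(compare_tables(legacy, rebuilt, "tolerance"))  # True
```

Tolerance mode is the natural fit for aggregated floating-point measures, where two engines can differ in the last digits without a real discrepancy.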
Cross-Database Compatibility: DuckDB ↔ T-SQL
Dual-target design: All dbt models must compile and run on both DuckDB (local/CI) and Fabric Warehouse (T-SQL).
Source of truth: docs/dialect-compatibility-log.md
Target Detection in Source YAML
database: "{{ 'Lakehouse_Bronze' if target.name not in ('local', 'duckdb') else '' }}"
schema: "{{ 'dataverse_d365' if target.name not in ('local', 'duckdb') else 'main' }}"
Always use not in ('local', 'duckdb') — never != 'local' alone. Both DuckDB targets must be excluded.
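The reason for the rule is easier to see when the Jinja expression is mirrored in plain Python. The function names below are illustrative; the target names and the returned database and schema values come from the YAML above.

```python
DUCKDB_TARGETS = ("local", "duckdb")


def resolve_source_location(target_name: str) -> tuple[str, str]:
    """Mirror the source YAML: return (database, schema) for a target."""
    if target_name not in DUCKDB_TARGETS:
        return ("Lakehouse_Bronze", "dataverse_d365")
    return ("", "main")


def buggy_is_fabric(target_name: str) -> bool:
    """The discouraged check: it forgets the CI 'duckdb' target."""
    return target_name != "local"


for target in ("local", "duckdb", "ci", "dev", "prod"):
    print(target, resolve_source_location(target), buggy_is_fabric(target))
```

With the `!= 'local'` check, the `duckdb` target used by Layer 1 CI would be treated as Fabric and pointed at `Lakehouse_Bronze`, which does not exist in DuckDB.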
Custom Cross-DB Macros (macros/cross_db/)
safe_cast()
try_cast()
string_agg()
concat_ws()
iif_case()
extract_date_from_filename()
digits_only()
union_ax_d365()
| Pattern | DuckDB | T-SQL (Fabric) | Solution |
| Recursive CTE | WITH RECURSIVE (required) | WITH (implicit) | Jinja if/else on target.name |
| Timestamp | NOW() | GETDATE() | {{ dbt.current_timestamp() }} |
| Left-pad | lpad(s, n, '0') | No native LPAD | right('00' || cast(s as varchar), n) |
| Identifiers | No brackets | [col name] | Always use "col name" double quotes |
| Conditional | No IIF | IIF(cond, t, f) | CASE WHEN or iif_case() |
| Null coalesce | COALESCE | ISNULL or COALESCE | Always use COALESCE |
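The left-pad workaround in the table prepends a fixed `'00'` prefix, which is sufficient only when a value is short by at most two characters. The Python sketch below emulates SQL `RIGHT` to show that limit and a width-sized prefix that removes it; the function names are hypothetical and this is not the project's macro.

```python
def right(s: str, n: int) -> str:
    """Emulate SQL RIGHT(s, n): the last n characters of s."""
    return s[-n:] if n > 0 else ""


def left_pad_fixed_prefix(value: str, width: int) -> str:
    """The table's pattern: right('00' + value, width)."""
    return right("00" + value, width)


def left_pad_general(value: str, width: int) -> str:
    """Same idea with a prefix of `width` zeros, so any width works."""
    return right("0" * width + value, width)


print(left_pad_fixed_prefix("7", 3))  # 007
print(left_pad_fixed_prefix("7", 5))  # 007 (too short: only two zeros available)
print(left_pad_general("7", 5))       # 00007
```

When the target width can exceed the value length by more than two, the padding literal needs to be at least as long as the target width.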
Infrastructure & Credentials

spn-fabric-platform-admin
Role: Workspace Admin
Used by: Terraform, fabric-cicd, Git APIs
Scope: Workspace creation · item deployment

spn-fabric-data-worker
Role: Workspace Contributor
Used by: dbt CI/CD builds, Azure Functions
Scope: Data read/write · model execution

Azure DevOps Variable Group: fabric-ci-vars
FABRIC_SERVER — DEV SQL endpoint
FABRIC_CI_SERVER — CI SQL endpoint
FABRIC_DATABASE — DEV database
FABRIC_CI_DATABASE — CI database
FABRIC_TENANT_ID — Azure AD tenant
FABRIC_PROD_SERVER — PROD endpoint
FABRIC_DATA_WORKER_CLIENT_ID — SPN ID
FABRIC_DATA_WORKER_CLIENT_SECRET — SPN secret
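A pipeline step or local script could verify that these variables are present before invoking dbt. The following is a minimal sketch; the variable names are taken from the list above, while the helper name `missing_variables` is an assumption.

```python
import os

# Variable names from the fabric-ci-vars group listed above.
REQUIRED_VARS = (
    "FABRIC_SERVER",
    "FABRIC_CI_SERVER",
    "FABRIC_DATABASE",
    "FABRIC_CI_DATABASE",
    "FABRIC_TENANT_ID",
    "FABRIC_PROD_SERVER",
    "FABRIC_DATA_WORKER_CLIENT_ID",
    "FABRIC_DATA_WORKER_CLIENT_SECRET",
)


def missing_variables(env) -> list[str]:
    """Return the required names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_variables(os.environ)
if missing:
    print("Missing variables: " + ", ".join(missing))
else:
    print("All required variables are set")
```

The check reports names only and never prints values, so the SPN secret does not end up in build logs.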
Key Operational Files
| File | Purpose |
| dbt/profiles.yml | All target definitions (local, duckdb, ci, dev, prod) |
| dbt/dbt_project.yml | Project config, materialization rules, on-run-start hook |
| dbt/macros/load_parquet_sources.sql | On-run-start: creates DuckDB tables from Parquet |
| docs/dialect-compatibility-log.md | SQL pattern reference (DuckDB ↔ T-SQL) |
| docs/runbook.md | Operational procedures & setup checklists |
| docs/parallel_run_config.yml | Per-table comparison rules for parallel run |
| docs/semantic_model_dependencies.yml | Gold columns referenced by semantic models |
| scripts/setup-local.sh | Bootstrap local dev environment |
| scripts/dev-build.sh | Local build + selective/Parquet export |
| scripts/export_dev_data.py | Export Bronze tables from Fabric as Parquet |
Operational Rules & Guardrails
Python Version
Must be 3.11–3.12. Python 3.13 breaks dbt-fabric silently. setup-local.sh rejects 3.13+ automatically.
--profiles-dir .
Required on EVERY dbt command. profiles.yml lives in the dbt/ project directory, not in ~/.dbt/. Omitting the flag causes a "profile not found" error.
packages.yml
Only for dbt Hub packages (dbt_utils). Never add pip packages here. dbt-fabric and dbt-duckdb go in requirements.txt / pip install.
PR Pipeline Trigger
PR-only pipelines MUST declare trigger: none at the top of the YAML file. Without it, the default CI trigger runs the pipeline on every push to every branch.
WARN=10 Baseline
Ten test warnings are expected (pre-existing Bronze data quality issues). They are non-blocking, but any increase must be investigated.
Story Completion Gate
A story is NOT done until dbt build --target local passes all tests. "Compiles" ≠ "tests pass."
Parallel Run = Safety Net
Old and new systems run side-by-side for 2 weeks post-cutover. A 1-hour rollback to the legacy system is available if issues are found.
Side-by-Side Only
Never modify the existing production system. All new work happens in new workspaces. Cutover = switch users to the new workspace.
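Some of these guardrails lend themselves to automated checks. As one example, the PR pipeline trigger rule could be linted with a plain text scan; this is a sketch that reads "at top" as "the first non-blank, non-comment line", which is an assumption, and it deliberately avoids a YAML parser to stay dependency-free.

```python
def has_trigger_none(yaml_text: str) -> bool:
    """Return True when the first meaningful line is `trigger: none`."""
    for raw_line in yaml_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Drop any trailing comment, then normalise whitespace.
        statement = line.split("#", 1)[0].strip().replace(" ", "")
        return statement == "trigger:none"
    return False


# Hypothetical pipeline snippets for illustration.
good = "# PR-only pipeline\ntrigger: none\n\npool:\n  vmImage: ubuntu-latest\n"
bad = "pool:\n  vmImage: ubuntu-latest\ntrigger: none\n"
print(has_trigger_none(good), has_trigger_none(bad))
```

A check like this could run in Layer 1 CI against the PR-only files under pipelines/, so a missing trigger line is caught before it causes unwanted runs.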

dbt Model Structure — What Gets Built
Model Layers
staging/ (views)
1:1 with Bronze tables
Rename, cast, type normalize
Zero CU · cross-db macros
stg_dataverse__*
stg_ax__*
stg_bd__*
intermediate/ (views)
Business logic · aggregations
AX 2012 + D365 merge unions
Domain folders: trading/ finance/ logistics/
int_allocations_*
int_exchange_rates_*
int_trade_*
marts/ (materialized tables)
Gold output · schema contracts
dim_* dimensions + fact_* facts
Business descriptions · dbt tests
dim_batch
dim_logistics
fact_trade
fact_inventory_snapshot
fact_sca_allocations
+60 more
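The naming convention above maps each model prefix to a layer and a materialization. A small classifier makes the mapping explicit; the prefixes and materializations come from this section, while the function name and the `int_trade_summary` example are hypothetical.

```python
# (prefix, layer, materialization) as described in the model layers above.
LAYER_RULES = (
    ("stg_", "staging", "view"),
    ("int_", "intermediate", "view"),
    ("dim_", "marts", "table"),
    ("fact_", "marts", "table"),
)


def classify_model(name: str) -> tuple[str, str]:
    """Return (layer, materialization) for a dbt model name."""
    for prefix, layer, materialization in LAYER_RULES:
        if name.startswith(prefix):
            return (layer, materialization)
    raise ValueError(f"model name does not follow the convention: {name}")


for model in ("dim_batch", "fact_trade", "int_trade_summary"):
    print(model, classify_model(model))
```

Raising on unknown prefixes means a misnamed model is surfaced immediately rather than being assigned to a default layer.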
Bronze Sources (~70 tables)
| Source | Database | Tables |
| dataverse | Lakehouse_Bronze | 49 |
| datacollect | Lakehouse_Bronze | 13 |
| bd | Lakehouse_Bronze | 3 |
| ax | Lakehouse_Bronze_AX | 2 |
| ax_bronze | Lakehouse_Bronze | 2 |
Seeds (Reference Data)
param_direction_split.csv
param_inventory_type.csv
param_value_unit.csv
CI Validation Scripts