Development Documentation

Technology Stack

This page documents every major technology in the platform, why it was chosen, and the constraints you need to know when working with it.

Core Technologies

| Technology | Version / Constraint | Purpose | Auth Method |
| --- | --- | --- | --- |
| dbt Core | Latest (pip) | Data transformation framework. Dual-target: DuckDB local, Fabric Warehouse remote. | CLI (az login) |
| Python | 3.10 -- 3.12 only | All scripts, Azure Functions, dbt adapters. 3.13 is incompatible with dbt-fabric. | -- |
| Terraform | >= 1.8 (with Fabric provider) | Infrastructure provisioning: workspaces, warehouses, lakehouses, shortcuts, role assignments. | CLI (FABRIC_USE_CLI=true) |
| Microsoft Fabric | Cloud service | Lakehouse (Bronze), Warehouse (Gold), Semantic Models (DirectLake), Power BI Reports, OneLake. | SPN + CLI |
| Azure DevOps | Cloud service | Git hosting, CI/CD pipelines, service connections. Org: geris-devops, project: insights-requests. | Service connection |
| Azure Functions | Python runtime (Linux) | Data ingestion (Datacollect, broker tables), observability (pipeline metrics, CU monitoring), exports. | Managed Identity |
| Azure Key Vault | kv-fabric-dbt-keys | Single vault for all environment secrets. SPN credentials and connection strings. | SPN Get + List |
| SWA Managed Functions (Node 20 + TypeScript) | v4 programming model (@azure/functions) | Cloud-side portal API: reads feature envs, triggers actions. Authenticates to Fabric via platform-admin SPN read from Key Vault via system-assigned MI. | System-assigned MI → Key Vault → MSAL |
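
As a minimal sketch of how a local script might read one secret from the single vault listed above: the vault URL follows the standard `https://<name>.vault.azure.net` pattern, the use of AzureCliCredential assumes an active az login session, and any secret name passed in is the caller's own (none are documented on this page).

```python
def vault_url(vault_name: str) -> str:
    """Build the public-cloud Key Vault URL for a vault name."""
    return f"https://{vault_name}.vault.azure.net"


def read_secret(vault_name: str, secret_name: str) -> str:
    """Fetch one secret value using the caller's az login session."""
    # Imported lazily so vault_url() works without the Azure SDK installed.
    from azure.identity import AzureCliCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url=vault_url(vault_name),
        credential=AzureCliCredential(),
    )
    return client.get_secret(secret_name).value


# Usage (the secret name is a placeholder, not a real entry in the vault):
#   value = read_secret("kv-fabric-dbt-keys", "example-secret-name")
```

Inside Azure Functions the same call would more likely use DefaultAzureCredential, which picks up the managed identity.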

Key Libraries

| Library | Install Method | Purpose |
| --- | --- | --- |
| dbt-fabric | pip | Fabric Warehouse adapter for dbt. Uses CLI auth (not SPN -- ODBC timeouts in CI). |
| dbt-duckdb | pip | Local development adapter. Fast iteration without Fabric connection. |
| dbt_utils | packages.yml (dbt Hub) | Utility macros. The ONLY dbt Hub package -- do not add pip packages here. |
| fabric-cicd | pip | Deploys semantic models and reports from git to Fabric with parameter substitution. |
| pyodbc | pip | Direct Fabric Warehouse connections. Uses access token auth (attrs_before={1256: token_struct}). |
| azure-identity | pip | DefaultAzureCredential for Azure Functions, AzureCliCredential for local scripts. |
| azure-keyvault-secrets | pip | Key Vault secret retrieval in deployment scripts. |
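
The `attrs_before={1256: token_struct}` entry for pyodbc is terse, so here is a sketch of what it usually expands to. Attribute 1256 is the ODBC driver's SQL_COPT_SS_ACCESS_TOKEN, and the driver expects the token as a 4-byte little-endian length followed by the UTF-16-LE bytes. The `connect` helper, its parameters, and the `https://database.windows.net/.default` scope are assumptions for illustration, not the project's actual code.

```python
import struct

# Connection attribute defined by the Microsoft ODBC driver for access tokens.
SQL_COPT_SS_ACCESS_TOKEN = 1256


def build_token_struct(access_token: str) -> bytes:
    """Pack a token as <4-byte little-endian length><UTF-16-LE token bytes>."""
    token_bytes = access_token.encode("utf-16-le")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)


def connect(server: str, database: str):
    """Open a pyodbc connection using a token from the az login session.

    Hypothetical helper: the server and database values come from the caller.
    """
    # Imported lazily so build_token_struct() is usable without these packages.
    import pyodbc
    from azure.identity import AzureCliCredential

    # Assumed scope for SQL endpoints; verify against your environment.
    token = AzureCliCredential().get_token(
        "https://database.windows.net/.default"
    ).token
    conn_str = (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};Database={database};Encrypt=yes;"
    )
    return pyodbc.connect(
        conn_str,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: build_token_struct(token)},
    )
```

Keeping the packing step in its own function makes it testable without a database connection.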

Key Architectural Decisions

Why dbt (not Azure Data Factory)?

dbt provides version-controlled SQL transformations with built-in testing, documentation, and lineage tracking. ADF would require managing JSON pipeline definitions with limited testability and no local development story. With dbt, developers iterate on model logic in seconds against DuckDB, run the full test suite locally, and only deploy to Fabric when ready.

Why DuckDB for Local Development (not Docker Fabric)?

There is no Docker image for Microsoft Fabric Warehouse. The alternative -- always developing online against the DEV Fabric Warehouse -- was rejected because it is slow (minutes per build vs seconds), blocks on network availability, and consumes shared Fabric CU capacity.

DuckDB provides a fast, zero-dependency local environment. The cost is maintaining dual-dialect SQL:

| Feature | DuckDB | Fabric Warehouse (T-SQL) |
| --- | --- | --- |
| Case sensitivity | Case-insensitive | Case-sensitive for quoted identifiers |
| datetime2 | Use {{ cast_timestamp() }} macro | Requires explicit datetime2(6) |
| bit (boolean) | Use {{ cast_boolean() }} macro | Resolves to bit |
| lpad() | Supported | Not available -- use right('00' \|\| cast(...), n) |
| Recursive CTEs | WITH RECURSIVE required | WITH (implicit recursion) |
| Bracket identifiers | Not supported | [column] works |
| varchar default | Unlimited | varchar(30) -- silently truncates! Always specify length. |
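
The macros in the table are Jinja, and their bodies are not shown on this page. The Python sketch below only illustrates the dispatch idea behind them: pick a SQL fragment based on the dbt target type. It assumes the adapters report `fabric` and `duckdb` as their target types and that the DuckDB side maps to `timestamp` and `boolean`; the function names mirror the macros but the implementations are hypothetical.

```python
def cast_timestamp(column: str, target_type: str) -> str:
    """Return a timestamp cast suited to the given dbt target type."""
    if target_type == "fabric":
        # Fabric Warehouse needs the explicit (6) precision.
        return f"cast({column} as datetime2(6))"
    return f"cast({column} as timestamp)"


def cast_boolean(column: str, target_type: str) -> str:
    """Return a boolean cast; T-SQL has no boolean type, so use bit."""
    sql_type = "bit" if target_type == "fabric" else "boolean"
    return f"cast({column} as {sql_type})"


def recursive_cte_keyword(target_type: str) -> str:
    """DuckDB needs WITH RECURSIVE; T-SQL infers recursion from plain WITH."""
    return "with" if target_type == "fabric" else "with recursive"


if __name__ == "__main__":
    for target in ("duckdb", "fabric"):
        print(target, "->", cast_timestamp("loaded_at", target))
```

In the real project the same branching would live in a Jinja macro that checks the target type, so model SQL stays identical across both adapters.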

Why Terraform (not Bicep)?

Terraform has a first-party Microsoft Fabric provider that supports workspaces, warehouses, lakehouses, git connections, and role assignments. Bicep has no Fabric resource types -- it only covers ARM resources. Since the platform's infrastructure is primarily Fabric resources (not ARM), Terraform is the natural fit.

Why CLI Auth (not SPN in CI)?

ODBC Driver 18's ActiveDirectoryServicePrincipal authentication consistently times out on Azure DevOps Ubuntu hosted agents due to a libmsal library issue. The az account get-access-token workaround is reliable on the same agents. This means:

  • All dbt profiles.yml targets use authentication: CLI
  • CI pipeline dbt steps must run inside AzureCLI@2 tasks
  • Local development uses az login sessions
  • No SPN environment variables are needed locally
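
For scripts that need the same token outside dbt, the `az account get-access-token` call mentioned above can be wrapped as follows. This is a sketch under assumptions: the `https://database.windows.net/` resource is the audience commonly used for SQL endpoints and should be verified for your environment, and the helper names are hypothetical.

```python
import subprocess


def token_command(resource: str = "https://database.windows.net/") -> list[str]:
    """Build the az CLI command that prints a bare access token."""
    return [
        "az", "account", "get-access-token",
        "--resource", resource,
        "--query", "accessToken",
        "--output", "tsv",
    ]


def get_access_token(resource: str = "https://database.windows.net/") -> str:
    """Run the az CLI and return the token; requires a prior az login.

    Written for Linux agents; on Windows the az entry point is a .cmd file
    and may need shell=True or a full path.
    """
    result = subprocess.run(
        token_command(resource),
        check=True,          # raise if az exits non-zero (e.g. not logged in)
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

In CI this only works inside an AzureCLI@2 task, because that task is what establishes the az session the command relies on.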

Why Cherry-Pick Promotion (not Branch Merges)?

Changes flow from main to release/uat to release/prod via cherry-pick PRs, not full merges. This gives granular control: a critical bug fix can be promoted to PROD immediately without carrying along an unfinished feature. UAT requires 1 reviewer; PROD requires 2 reviewers plus an approval gate.
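
One way to picture a single promotion is as a short sequence of git commands. The sketch below only builds that sequence; the `promote/...` working-branch naming is hypothetical, and the PR itself (with its reviewers and approval gate) would still be opened in Azure DevOps afterwards.

```python
ALLOWED_TARGETS = ("release/uat", "release/prod")


def promotion_commands(commit_sha: str, target_branch: str) -> list[list[str]]:
    """Return git commands that stage one commit for a cherry-pick PR."""
    if target_branch not in ALLOWED_TARGETS:
        raise ValueError(f"unexpected target branch: {target_branch}")
    env = target_branch.split("/")[-1]
    # Hypothetical naming scheme for the short-lived PR source branch.
    work_branch = f"promote/{env}-{commit_sha[:8]}"
    return [
        ["git", "fetch", "origin"],
        # Start from the current tip of the release branch.
        ["git", "switch", "--no-track", "-c", work_branch, f"origin/{target_branch}"],
        # -x records the original commit hash in the new commit message.
        ["git", "cherry-pick", "-x", commit_sha],
        ["git", "push", "--set-upstream", "origin", work_branch],
    ]


if __name__ == "__main__":
    for cmd in promotion_commands("0123456789abcdef", "release/uat"):
        print(" ".join(cmd))
```

Using `-x` keeps a traceable link between the promoted commit and its origin on main, which helps when auditing what has reached PROD.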

MCP Servers (AI Tool Integration)

The project includes 9 MCP servers configured in .mcp.json at the repo root, providing AI tools with direct access to platform knowledge and operations.

Documentation and Knowledge Servers

| Server | Transport | Auth | Capabilities |
| --- | --- | --- | --- |
| Microsoft Learn (microsoft-learn) | Remote HTTP | None | Semantic search across all Microsoft docs, code samples |
| Fabric Pro-Dev (fabric-prodev) | npx (stdio) | None | Full Fabric OpenAPI specs, JSON schemas, best practices. Knowledge only -- no live Fabric connection. |
| Terraform (terraform) | Docker (stdio) | None | Live Terraform Registry: provider docs, module search, config validation |

Data Platform Operations Servers

| Server | Transport | Auth | Capabilities |
| --- | --- | --- | --- |
| Power BI Modeling (powerbi-modeling) | npx (stdio) | Browser login | TMDL import/export, DAX queries, measures, relationships, calculation groups |
| Fabric Ops (fabric-ops) | uvx (stdio) | az login | Read-only operational intel: workspace listing, lakehouse schemas, lineage, CU usage |
| DuckDB (duckdb) | uvx (stdio) | None | SQL queries against local ./dbt/fabric_datalake.duckdb |
| dbt Core (dbt-core) | uvx (stdio) | None | Lineage, impact analysis, column-level tracing, SQL execution with ref()/source() |

Infrastructure and DevOps Servers

| Server | Transport | Auth | Capabilities |
| --- | --- | --- | --- |
| Azure DevOps (azure-devops) | npx (stdio) | Browser login | Work items, pipelines, builds, PRs, repos, wiki for geris-devops org |
| Azure (azure) | npx (stdio) | az login | 276 tools across 57 Azure services including Key Vault, Storage, ARM |
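
To check which transports the configured servers use, a small script can summarize `.mcp.json`. The sketch assumes the layout common to MCP clients: a top-level `mcpServers` object whose entries carry either a `url` (remote) or a `command` plus `args` (stdio). The sample config below uses placeholder URLs, packages, and images; it is not the repo's actual file.

```python
import json

# Illustrative config only: the URL, package, and image are placeholders.
SAMPLE_CONFIG = """
{
  "mcpServers": {
    "microsoft-learn": {"type": "http", "url": "https://example.invalid/mcp"},
    "duckdb": {"command": "uvx", "args": ["<package>"]},
    "terraform": {"command": "docker", "args": ["run", "-i", "--rm", "<image>"]}
  }
}
"""


def summarize_servers(config_text: str) -> dict[str, str]:
    """Map each server name to a short transport description."""
    config = json.loads(config_text)
    summary = {}
    for name, spec in config.get("mcpServers", {}).items():
        if "url" in spec:
            summary[name] = "remote HTTP"
        else:
            summary[name] = f"{spec.get('command', 'unknown')} (stdio)"
    return summary


if __name__ == "__main__":
    for server, transport in summarize_servers(SAMPLE_CONFIG).items():
        print(f"{server}: {transport}")
```

To run it against the real file, read the text of `.mcp.json` from the repo root and pass it to `summarize_servers`.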

What We Deliberately Do NOT Use

These are not oversights -- they are conscious decisions with specific reasoning.

| Technology | Why Not |
| --- | --- |
| dbt-fabric / dbt-duckdb in packages.yml | They are pip packages, not dbt Hub packages. Adding them to packages.yml breaks dbt deps. |
| ODBC SPN auth in CI | ActiveDirectoryServicePrincipal times out on Azure DevOps Ubuntu agents. CLI auth is reliable. |
| Per-environment Key Vaults | Unnecessary complexity for this project's scale. One kv-fabric-dbt-keys for all environments. |
| Azure DevOps variable groups | All config lives in deployment/ENV.yml -- single source of truth, version-controlled, auditable. |
| PySpark notebooks | Legacy datalake/ notebooks are reference-only. All new transforms are dbt SQL. |
| ADF / Synapse pipelines | dbt provides better testability, version control, and local development. |
| Bicep | No Fabric resource types. Terraform's Fabric provider covers the full infrastructure. |
| Docker for local dev | No Fabric Warehouse Docker image exists. DuckDB is faster and simpler. |

Version Constraints

These constraints have caused production issues in the past and must be respected:

| Constraint | Detail | Consequence of Violation |
| --- | --- | --- |
| Python 3.10 -- 3.12 | dbt-fabric is incompatible with 3.13 | pip install dbt-fabric fails on 3.13 |
| datetime2(6) explicit precision | Fabric Warehouse requires the (6) suffix | Bare datetime2 fails with error 24597 |
| No lpad() in T-SQL | Not available in Fabric SQL | Use right('00' \|\| cast(value as varchar), n) |
| varchar needs explicit length | cast(x as varchar) defaults to varchar(30) in T-SQL | Silent data truncation, collapsed rows after GROUP BY |
| Case-sensitive quoted identifiers | Fabric Warehouse is case-sensitive; DuckDB is not | Works locally, fails in CI/DEV. Invisible bug. |
| WITH RECURSIVE vs WITH | DuckDB requires RECURSIVE keyword; T-SQL does not | Use Jinja target checks for recursive CTEs |
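
Two of these constraints are easier to see with a worked example. The Python sketch below simulates (it does not run SQL) how a default `varchar(30)` cast makes distinct keys collide before a GROUP BY, and how the `right('00' || ...)` workaround behaves. The helper names are hypothetical and exist only for this illustration.

```python
from collections import Counter

T_SQL_DEFAULT_CAST_LENGTH = 30  # cast(x as varchar) without a length


def cast_varchar_default(value: str) -> str:
    """Simulate T-SQL cast(x as varchar): keep only the first 30 characters."""
    return value[:T_SQL_DEFAULT_CAST_LENGTH]


def right(text: str, n: int) -> str:
    """Simulate T-SQL RIGHT(text, n): the last n characters, or all if shorter."""
    return text[max(len(text) - n, 0):]


def zero_pad(value: int, n: int) -> str:
    """Simulate right('00' || cast(value as varchar), n)."""
    # The fixed '00' prefix can add at most two zeros, so this only pads
    # reliably when n is at most len(str(value)) + 2.
    return right("00" + str(value), n)


# Two keys that differ only after the 30th character.
keys = ["k" * 30 + "-A", "k" * 30 + "-B"]
groups_before = Counter(keys)
groups_after = Counter(cast_varchar_default(k) for k in keys)

if __name__ == "__main__":
    print(len(groups_before), "groups before the cast")
    print(len(groups_after), "group after the cast")
    print(zero_pad(7, 2))
```

The collision is silent: no error is raised, the row counts per group simply change, which is why an explicit length belongs on every varchar cast.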