Emergency Procedures
This runbook covers recovery procedures for the most common operational failures. The platform is designed around three self-healing layers: Terraform for infrastructure, git sync for content, and dbt for data. Most failures can be resolved by re-running the appropriate layer.
SPN Credential Rotation
Service principal secrets expire after 1 year. Rotate before expiry to avoid CI/CD pipeline failures. See SPN Access Map — Credential Rotation for the full step-by-step procedure covering both sp-fabric-data-worker and sp-fabric-platform-admin.
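As a rough aid for planning rotation ahead of the 1-year expiry, the remaining time can be computed from the secret's expiry date. This is only a sketch: the function names, the 30-day warning threshold, and the way the expiry date is obtained are assumptions, not part of the platform's tooling.

```python
from datetime import date


def days_until_expiry(expires_on: date, today: date) -> int:
    """Whole days left before the secret expires (negative if already expired)."""
    return (expires_on - today).days


def rotation_status(expires_on: date, today: date, warn_days: int = 30) -> str:
    """Classify a secret as 'expired', 'rotate-now', or 'ok'.

    warn_days is an assumed threshold; pick whatever lead time the team needs.
    """
    remaining = days_until_expiry(expires_on, today)
    if remaining < 0:
        return "expired"
    if remaining <= warn_days:
        return "rotate-now"
    return "ok"
```

For example, `rotation_status(date(2025, 1, 31), date(2025, 1, 1))` reports that rotation is due, because 30 days remain.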
Pipeline Failure Triage
Quick Diagnosis
| Symptom | Likely Cause | Action |
|---|---|---|
| Auth error (401/403) | SPN secret expired or permissions changed | Check secret expiry, verify service connection |
| Terraform error | State drift or API change | Run terraform plan locally to inspect |
| fabric-cicd error | GUID mismatch or TMDL syntax | Check parameter files, validate TMDL locally |
| dbt error | SQL syntax or source data issue | Run dbt build --target local first |
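
The table above can be read as a simple lookup from a failing log line to a first action. The sketch below encodes it that way. The causes and actions are copied from the table, but the matching keywords and the `triage` helper are illustrative assumptions, and real logs may need more careful matching.

```python
# (keywords, likely cause, action) taken from the Quick Diagnosis table.
# The keywords used for matching are assumptions made for this example.
TRIAGE_RULES = [
    (("401", "403"), "SPN secret expired or permissions changed",
     "Check secret expiry, verify service connection"),
    (("terraform",), "State drift or API change",
     "Run terraform plan locally to inspect"),
    (("fabric-cicd",), "GUID mismatch or TMDL syntax",
     "Check parameter files, validate TMDL locally"),
    (("dbt",), "SQL syntax or source data issue",
     "Run dbt build --target local first"),
]


def triage(log_line: str):
    """Return (likely cause, action) for the first matching rule, or None."""
    lowered = log_line.lower()
    for keywords, cause, action in TRIAGE_RULES:
        if any(keyword in lowered for keyword in keywords):
            return cause, action
    return None
```

Auth errors are checked first so that a 401/403 raised inside a Terraform or dbt step is still treated as a credential problem.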
Pipeline Chain Recovery
The pipeline chain runs sequentially: infra-deploy > fabric-deploy + security-deploy + functions-deploy > dbt-dev-build. If a pipeline fails mid-chain:
- Fix the root cause (do not re-trigger blindly)
- Push the fix to the appropriate branch — the chain restarts automatically
- Do not manually trigger Azure DevOps pipelines
All shared-resource pipelines use lockBehavior: sequential. Manually triggered runs can cause queue conflicts.
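To see which pipelines are held up by a failure, the chain can be modelled as ordered stages. This is a minimal sketch based only on the chain notation above. It assumes that dbt-dev-build waits for all three middle pipelines, and `downstream_of` is a hypothetical helper, not an Azure DevOps feature.

```python
# Stages of the chain as written above; pipelines in one stage are grouped together.
PIPELINE_CHAIN = [
    ["infra-deploy"],
    ["fabric-deploy", "security-deploy", "functions-deploy"],
    ["dbt-dev-build"],
]


def downstream_of(pipeline: str) -> list[str]:
    """Return every pipeline in a later stage than the given one."""
    for index, stage in enumerate(PIPELINE_CHAIN):
        if pipeline in stage:
            return [name for later in PIPELINE_CHAIN[index + 1:] for name in later]
    raise ValueError(f"unknown pipeline: {pipeline}")
```

A failure in security-deploy, for instance, leaves only dbt-dev-build waiting, while a failure in infra-deploy holds up the rest of the chain.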
dbt Build Failure Recovery
Local Diagnosis First
Always reproduce and fix locally before pushing:
cd dbt
dbt build --target local --profiles-dir .
Common dbt Failures
| Error | Cause | Fix |
|---|---|---|
| 15151 Cannot find the schema | Schema doesn't exist in Warehouse | Check dbt custom schema generation; may need CREATE SCHEMA |
| Case-sensitive column error | Fabric is case-sensitive for quoted identifiers | Use bracket notation [Column] or normalizing CTE |
| varchar(30) truncation | Bare cast(x as varchar) defaults to 30 chars | Always specify explicit length: varchar(500) |
| Contract mismatch | Column name in SELECT doesn't match contract | Ensure aliases match contract name: field exactly |
| Source not found | Bronze shortcut missing or renamed | Verify shortcuts via git sync, check sources.yml |
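
The varchar(30) truncation row lends itself to a quick pre-push check. The sketch below scans SQL text for casts to varchar without an explicit length. The regular expression and the `find_bare_varchar_casts` helper are assumptions for illustration; a line-based regex will miss casts that are split across lines.

```python
import re

# Matches "as varchar" that is NOT followed by "(length)".
BARE_VARCHAR = re.compile(r"\bas\s+varchar\b(?!\s*\()", re.IGNORECASE)


def find_bare_varchar_casts(sql: str) -> list[int]:
    """Return 1-based line numbers that cast to varchar without a length."""
    return [
        number
        for number, line in enumerate(sql.splitlines(), start=1)
        if BARE_VARCHAR.search(line)
    ]
```

Running it over a model's SQL before `dbt build --target local` gives an early hint about columns that would be cut to 30 characters.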
Full Rebuild
If Gold data is corrupted or missing:
cd dbt
dbt build --target <env> --profiles-dir . --full-refresh
This drops and recreates all tables. Use only as a last resort — normal incremental builds are preferred.
Workspace Access Emergency
If a workspace becomes inaccessible (permissions removed, workspace deleted):
Workspace Deleted
- Run terraform apply. Terraform recreates the workspace and all child resources
- Re-sync content from git (portal: Source control > Update all, or python scripts/fabric_git_sync.py --env ENV)
- Rebuild Gold data: dbt build --target ENV
Permissions Lost
- Check Terraform state: terraform plan -var-file="environments/ENV/terraform.tfvars"
- If roles are missing, terraform apply restores them from tfvars
- For emergency access, use the Azure Portal to manually add geris_fabric_admin@geris.nl as Admin
Data Refresh Failure Recovery
Symptoms
- Reports show stale data (check last_refresh timestamps in Fabric portal)
- dbt build succeeded but semantic model shows old data
DirectLake Models
DirectLake models auto-refresh from the warehouse — no action needed after a successful dbt build. If stale:
- Verify the dbt build actually completed (check pipeline logs)
- Check if the semantic model fell back to DirectQuery mode (Fabric portal > Model settings)
- If in DirectQuery fallback, check for missing columns or table schema mismatches
Import Models
Import models require manual refresh or a scheduled refresh owned by geris_fabric_admin@geris.nl:
- Log in to Fabric portal as geris_fabric_admin@geris.nl
- Navigate to the semantic model > Settings > Scheduled refresh
- Verify credentials are valid (Take over if needed)
- Trigger a manual refresh
Monitoring and Alerting
Application Insights (Function App)
The Function App logs to Application Insights. Key queries:
// Recent function executions
traces
| where message contains 'Executed' and message contains 'Succeeded'
| project timestamp, message
| order by timestamp desc
// Export failures
traces
| where message contains 'export_failed'
| project timestamp, message
| order by timestamp desc
If the requests table is empty, check host.json — the "Host.Results": "Error" setting suppresses successful request telemetry. Change to "Information" to see all invocations.
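The host.json check described above can be scripted. This sketch assumes the setting lives under `logging.logLevel` in host.json, which is the usual location for Azure Functions log levels; the helper names are hypothetical.

```python
from typing import Optional


def host_results_level(host_config: dict) -> Optional[str]:
    """Return the configured Host.Results log level, or None if it is not set."""
    return host_config.get("logging", {}).get("logLevel", {}).get("Host.Results")


def suppresses_successful_requests(host_config: dict) -> bool:
    """True when Host.Results is set to "Error", the case described above."""
    return host_results_level(host_config) == "Error"
```

The dictionary would typically come from `json.load` on the Function App's host.json file.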
Pipeline Monitoring
Pipeline run status is visible in Azure DevOps (org: geris-devops, project: insights-requests). dbt does NOT produce JUnit XML — do not add PublishTestResults@2 to dbt pipelines. Use run_results.json for programmatic inspection.
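For the programmatic inspection mentioned above, a minimal sketch of reading run_results.json might look like this. It relies on the `results` list with `unique_id`, `status`, and `message` fields that dbt writes to the artifact. The default path `target/run_results.json` assumes dbt's default target directory, and the helper names are illustrative.

```python
import json
from pathlib import Path

# Statuses treated as failures: "error" for models, "fail" and "error" for tests.
FAILING_STATUSES = {"error", "fail"}


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Load the artifact; the default path assumes dbt's standard target directory."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_nodes(run_results: dict) -> list[tuple[str, str, str]]:
    """Return (unique_id, status, message) for every failing node."""
    return [
        (result["unique_id"], result["status"], result.get("message") or "")
        for result in run_results.get("results", [])
        if result.get("status") in FAILING_STATUSES
    ]
```

A pipeline step could call `failed_nodes(load_run_results())` after the build and print the list instead of publishing JUnit results.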
CU Utilization
Currently blocked — the Fabric Admin API requires Capacity Admin role, which cannot be granted on trial capacity. Will auto-activate when paid capacity (F2+) is provisioned. No code changes needed.
Self-Healing Recovery Matrix
| What Failed | Recovery Tool | Command |
|---|---|---|
| Workspace deleted | Terraform | terraform apply |
| Gold Warehouse deleted | Terraform + dbt | terraform apply, then dbt build |
| Lakehouse deleted | Terraform + git sync | terraform apply, then sync from git |
| Shortcuts missing | Git sync | python scripts/fabric_git_sync.py --env ENV |
| Semantic model deleted | Git sync | python scripts/fabric_git_sync.py --env ENV |
| Report deleted | Git sync | Re-sync from Git in Fabric portal |
| Gold data missing | dbt | dbt build --target ENV |
| RBAC wrong | Terraform | terraform apply |
| Everything destroyed | Full sequence | terraform apply > git sync > dbt build |
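
The matrix above can also be kept as data so that the recovery order is not retyped under pressure. The following sketch maps each failure to its ordered commands. The command strings come from the matrix; the `recovery_commands` helper and the lowercase keys are assumptions, and the "Report deleted" row is left out because it is a portal action rather than a command.

```python
TERRAFORM = "terraform apply"
GIT_SYNC = "python scripts/fabric_git_sync.py --env {env}"
DBT_BUILD = "dbt build --target {env}"

# Failure -> ordered recovery commands, following the matrix row by row.
RECOVERY_MATRIX = {
    "workspace deleted": [TERRAFORM],
    "gold warehouse deleted": [TERRAFORM, DBT_BUILD],
    "lakehouse deleted": [TERRAFORM, GIT_SYNC],
    "shortcuts missing": [GIT_SYNC],
    "semantic model deleted": [GIT_SYNC],
    "gold data missing": [DBT_BUILD],
    "rbac wrong": [TERRAFORM],
    "everything destroyed": [TERRAFORM, GIT_SYNC, DBT_BUILD],
}


def recovery_commands(failure: str, env: str) -> list[str]:
    """Return the commands to run, in order, for a failure in the given environment."""
    steps = RECOVERY_MATRIX[failure.strip().lower()]
    return [step.format(env=env) for step in steps]
```

The helper only builds the command list; running the commands is left to the operator so that each step can be checked before the next one starts.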
Related Pages
- Operations Overview — operational model and escalation
- Troubleshooting — platform gotchas
- Go-Live Checklist — cutover procedures
- Script Reference — utility scripts