Development Documentation
View as:

Local dbt Development Workflow

This page covers how to iterate on dbt transformations (Bronze → Silver → Gold) locally against DuckDB, how to inspect results with DBeaver, and how to run models selectively for fast feedback loops.

Use this workflow for any dbt change — new mart, new column, refactor, bug fix.

Layer Map

The project has three logical layers, each mapped to a DuckDB schema:

LayerDuckDB SchemaFolderExample
Bronze (sources)mainloaded from dev-data/*.parquetmain.inventdim, main.marex
Silver (staging + intermediate)prepdbt/models/staging/ + dbt/models/intermediate/prep.stg_dataverse__inventdim, prep.int_batch_built
Gold (marts)dbodbt/models/marts/dbo.dim_item, dbo.fact_trade

Every dbt model materialises into DuckDB as a table or view in one of these schemas. You can query each layer directly to inspect transformations step by step.

Prerequisites

Before starting, complete the Local Development Setup (Python 3.12, .venv, setup-local.sh).

Then install DBeaver Community (download) — a free SQL IDE that speaks DuckDB natively.

Connect DBeaver to the DuckDB Database

The DuckDB file is written to dev-data/fabric_datalake.duckdb whenever you run dbt build --target local.

  1. Open DBeaver
  2. Database → New Database Connection
  3. Search for DuckDB → select → Next
  4. Set Path to:
    C:\Users\<you>\Downloads\monorepo fabric\fabric_monorepo\dev-data\fabric_datalake.duckdb
    
  5. Click Test Connection — download the driver if prompted
  6. Finish

Expand the connection in the left panel. You'll see schemas main, prep, and dbo after your first build.

Important: DuckDB locks the file when dbt is running. Either close the DBeaver connection during dbt build, or connect in read-only mode via the connection settings.

Local-first loop (full profile)

When using --profile full, the canonical iteration loop is three commands:

StepCommandWhat it does
1fabric dev refresh-bronze-localPull Bronze → dev-data/*.parquet (skips unchanged tables). Downloads run in parallel — default 6 tables concurrently.
2fabric dev build-gold-localRun dbt build --target local in DuckDB — zero Warehouse CU.
3fabric dev push-gold --feature <name>Stage Parquet into the feat Bronze Lakehouse and COPY INTO the feat Gold Warehouse.

Use fabric dev refresh-gold --feature <name> to run steps 2 + 3 together.

For model-only iteration (no push to Warehouse needed), just run step 2 repeatedly. The DuckDB file at dev-data/fabric_datalake.duckdb is updated after every build.

Tuning bronze download concurrency

refresh-bronze-local runs 6 downloads in parallel by default. OneLake reads are network-bound, so parallelism gives a ~4–6× wall-clock speedup over the previous serial implementation. You can override with --parallel N:

python scripts/download_bronze_to_local.py --source dev-bronze --parallel 8   # faster
python scripts/download_bronze_to_local.py --source dev-bronze --parallel 1   # serial, for debugging

Raising beyond ~8 usually doesn't help further and risks Fabric throttling.

Warehouse staging for push-gold

push-gold writes Parquet to the feat Bronze Lakehouse (Files/_refresh_staging/) and then runs COPY INTO on the feat Gold Warehouse pointed at that path. Feature-branch Gold workspaces only contain a Warehouse item (no Lakehouse), so staging is resolved via Fabric REST against the Bronze workspace rather than being read from the deployment yml. This requires the platform SPN (Contributor on both workspaces) to be loaded; fabric dev start takes care of that via scripts/_lib_spn.sh.

The Development Loop

1. Find the model to edit

ls dbt/models/marts/            # Gold dims/facts
ls dbt/models/staging/          # Silver — Bronze → Silver
ls dbt/models/intermediate/     # Silver — Silver → Silver

2. Edit the SQL file

Open the relevant file in VS Code and make your changes. For example, dbt/models/marts/dim_item.sql.

3. Rebuild the model

From the dbt/ directory with .venv activated:

# Activate venv once per terminal session
source .venv/Scripts/activate
cd dbt

# Build only your model
dbt run --target local --profiles-dir . --select dim_item

# Build your model + everything downstream (anything that depends on it)
dbt run --target local --profiles-dir . --select dim_item+

# Build your model + everything upstream (the full lineage leading to it)
dbt run --target local --profiles-dir . --select +dim_item

# Full lineage (both directions)
dbt run --target local --profiles-dir . --select +dim_item+

Each rebuild takes ~1–3 seconds for a typical mart.

4. Inspect results in DBeaver

Refresh the schema tree (F5) and query your model:

SELECT * FROM dbo.dim_item LIMIT 10;
SELECT COUNT(*) FROM dbo.dim_item;

For debugging, you can join across layers:

-- See how many Gold rows came from each Bronze source row
SELECT
    b.item_id,
    COUNT(g.item_id) AS gold_rows
FROM main.inventtable b
LEFT JOIN dbo.dim_item g USING (item_id)
GROUP BY b.item_id;

5. Iterate

Edit → rebuild → refresh DBeaver → query. Repeat until the output matches what you want.

Useful Selectors

SelectorEffect
--select dim_itemJust that model
--select dim_item+Model + downstream
--select +dim_itemModel + upstream
--select +dim_item+Full lineage
--select martsEverything in marts/ folder
--select staging.dataverseEverything in staging/dataverse/
--select tag:requires_sharepointEverything tagged requires_sharepoint
--select source:dataverse+All models downstream of Dataverse sources
--select dim_item fact_tradeMultiple specific models

Viewing the Lineage Graph

dbt ships with an interactive docs site that renders the full model DAG (sources → staging → intermediate → marts). Use it whenever you need to see how a model connects to the rest of the project.

Start it

From the dbt/ directory with .venv activated:

# Build the manifest + catalog (queries DuckDB for column stats)
dbt docs generate --target local --profiles-dir .

# Or skip the warehouse queries — faster, lineage-only, avoids file locks
dbt docs generate --target local --profiles-dir . --empty-catalog

# Serve it
dbt docs serve --profiles-dir .

dbt docs serve starts a local web server on http://localhost:8080 and stays running until you Ctrl+C. First generate takes ~30–90 seconds; subsequent runs are faster. serve is instant.

If generate fails with "Cannot open file ... being used by another process": DBeaver or another dbt run has the DuckDB file open. Either close it, or re-run with --empty-catalog (lineage graph still works, column stats are skipped).

View the full lineage

  1. Open http://localhost:8080.

  2. Click the blue circle in the bottom-right corner — this opens the full-screen lineage graph.

  3. In the selector box at the bottom, use dbt node selectors to scope the view:

    SelectorShows
    (empty)Entire project DAG
    +dim_item+Full lineage of dim_item (upstream and downstream)
    +dim_itemJust upstream (ancestors)
    dim_item+Just downstream (descendants)
    +source:dataverse.accounts+Everything touching a specific source
    martsAll Gold marts
    tag:requires_sharepoint+Everything downstream of a tag
  4. Click Update Graph.

Export the graph

Right-click the graph → Save image as…, or use the PNG/SVG export button in the top-right of the graph view. Useful for PR descriptions and architecture reviews.

Running Tests

dbt tests validate column contracts (not null, unique, relationships). Run them after a rebuild:

dbt test --target local --profiles-dir . --select dim_item

Or run build + test in one go:

dbt build --target local --profiles-dir . --select dim_item+

dbt build is run + test combined, in dependency order.

Full Local Validation Before PR

Before pushing, run the full pipeline to make sure nothing upstream broke:

cd dbt && dbt build --target local --profiles-dir .

Expected result: ~440 PASS, ~17 WARN, 0–2 ERROR, ~15 SKIP on a fresh DuckDB. Warnings are pre-existing data-quality signals on upstream data, not real failures. build_gold_local.py matches the DevOps CI contract: only failing models / seeds / snapshots roll back provisioning — failing tests are surfaced loudly but don't block, matching what the DevOps dbt-dev-build pipeline does. If you see new model-level errors, investigate before pushing.

Performance notes for local builds

Profile local has two optimisations applied in dbt/profiles.yml:

  • threads: 8 — dbt worker parallelism (multiple models in flight at once).
  • settings.preserve_insertion_order: false — unlocks DuckDB's parallel hash aggregation, typically 2–4× on heavy GROUP BYs and window queries (fact_inventory_snapshot, dim_logistics).

Do NOT add settings.temp_directory here — dbt-duckdb re-issues SET temp_directory='...' on every new connection, which DuckDB rejects once spill has started (Cannot switch temporary directory after use). To ensure DuckDB's default <db>.tmp spill path resolves correctly on Windows, scripts/build_gold_local.py exports an absolute DBT_DUCKDB_PATH to the dbt subprocess.

fact_inventory_snapshot — monolith on Fabric, parquet sidecar on local

fact_inventory_snapshot uses a single inlined-CTE body on Fabric targets (dev, uat, prod, feat-*). The four separate int_inventory_* intermediate tables that briefly existed after PR #73 have been removed (perf/inventory-flatten-fabric).

The split was reverted because each intermediate was materialized='table', meaning Fabric wrote a full table and read it back for every step in an almost entirely serial chain — making the Fabric dev build significantly slower than the monolith. The inlined approach runs as one statement and eliminates the intermediate I/O cost on all targets.

What about window functions for running totals? Two attempts to rewrite Financial RT with SUM() OVER + LEAD() interval-match (partition by the 14-dim tuple AND partition by a hash(dims) combo_id) both spilled ≥60 GB on this dataset and were slower than the simple range-join. DuckDB's piecewise-merge join + hash aggregate plan for the range-join is the fastest shape here. Do not re-attempt this optimisation.

cloud_only tag — local reads pre-built parquet

On the local / duckdb target fact_inventory_snapshot does not recompute the running totals. The model is tagged cloud_only, and its body switches on target.name:

{{ config(tags=['cloud_only']) }}
{%- if target.name in ('local', 'duckdb') -%}
    select * from read_parquet('{{ var("parquet_dir", "../dev-data") }}/{{ this.name }}.parquet')
{%- else -%}
    -- full SQL, unchanged
{%- endif -%}

The parquet sidecar at dev-data/fact_inventory_snapshot.parquet is populated by scripts/sync_cloud_parquets.py, which runs automatically as step 6c.1b of fabric dev start (between refresh-bronze-local and build-gold-local) for any profile that builds Gold locally. It pulls from the Fabric Gold Warehouse of the selected source env (dev by default, or uat when --seed-strategy uat is passed).

To refresh the sidecar manually: fabric dev sync-cloud-parquets [--source dev|uat|prod].

To add another cloud-only model: add {{ config(tags=['cloud_only']) }} and wrap the body in the same {% if target.name == 'local' %} … {% else %} … {% endif %} switch. The sync script enumerates the tag via dbt ls — no script changes needed.

Rationale: full-history running totals over 14 dims produce a working set around 100 GB on a laptop; on Fabric's distributed MPP they land in ~4–5 min. Local dev should work with the already-computed snapshot rather than try to recompute it.

VS Code / DBeaver file locks

Any tool that keeps an open DuckDB connection — DBeaver, a "Power User for dbt" lineage pane, a duckdb CLI — holds the file lock and blocks dbt build and build_gold_local.py with errors like:

IO Error: Cannot open file "...fabric_datalake.duckdb":
The process cannot access the file because it is being used by another process.

Close the viewer (or connect in read-only mode) before running a dbt build.

Source Data Changes

If you edit the stub column definitions in dbt/macros/load_parquet_sources.sql (adding a new column to a Bronze stub), the existing DuckDB tables won't pick up the change until a full refresh:

dbt build --target local --profiles-dir . --full-refresh

This drops and recreates all tables. Takes ~35 seconds for the full pipeline.

Querying the __filepath__ Column

Sources loaded from Parquet include a virtual __filepath__ column (matching Fabric's Lakehouse behaviour). This is useful for extracting file dates or debugging ingestion:

-- See what files loaded into a source
SELECT DISTINCT "__filepath__" FROM main.inventdim;

-- Count rows per source file
SELECT "__filepath__", COUNT(*) FROM main.marex GROUP BY "__filepath__";

Common Workflow: Add a New Measure

  1. Edit dbt/models/marts/fact_trade.sql — add your calculation as a new column
  2. Run: dbt run --target local --profiles-dir . --select fact_trade
  3. In DBeaver: SELECT trade_id, new_measure FROM dbo.fact_trade WHERE new_measure IS NOT NULL LIMIT 20
  4. Verify the values look right
  5. If downstream semantic models reference it, also rebuild those: dbt run --select fact_trade+
  6. Run tests: dbt test --select fact_trade
  7. Commit → push → PR

Common Workflow: Trace a Bad Value

If a Gold mart has an unexpected value, trace it backwards:

  1. Query the Gold mart to find the bad row:
    SELECT * FROM dbo.fact_trade WHERE orderline_id = 'XYZ';
    
  2. Query the Silver layer that feeds it:
    SELECT * FROM prep.int_trade_departure_dates WHERE orderline_id = 'XYZ';
    
  3. Query the Bronze source:
    SELECT * FROM main.salesline WHERE recid = ...;
    
  4. At each layer, find where the value diverges from expectation.
  5. Fix the model at that layer, rebuild with --select <model>+, verify.

Troubleshooting

DBeaver shows no tables after connecting: Wrong file. The DuckDB file is at dev-data/fabric_datalake.duckdb, not dbt/fabric_datalake.duckdb.

"Database is locked" in DBeaver: dbt is running. Wait for it to finish or close the DBeaver connection and reopen.

dbt: command not found: Activate the venv: source .venv/Scripts/activate.

Schema main is empty but build succeeded: You're looking at a different .duckdb file. DBeaver may have opened memory or a stale file. Disconnect and reconnect to dev-data/fabric_datalake.duckdb.

dbt docs generate fails with "file ... being used by another process": The DuckDB file is locked — DBeaver is connected, or another dbt process is still running. Close the DBeaver connection (or switch it to read-only), kill any stray python.exe holding the file, then retry. For lineage-only use you can bypass the lock entirely with --empty-catalog.

A model compiled but produced no rows: The source Parquet file is missing. Check dev-data/ for the .parquet file matching the source name. If missing, run bash scripts/setup-local.sh to regenerate stubs, or export real data via scripts/export-dev-data.sh.

Related Pages