DataGym.io dbt cheat sheet

dbt cheat sheet: the interactive dbt reference, updated for the AI era

Commands

Not the man page. The verbs and flags that actually earn a spot in muscle memory.

Core verbs: your daily drivers

dbt build
Run, test, seed and snapshot every selected resource in DAG order, the one command most teams run.
dbt run
Materialize models (view / table / incremental …) in the warehouse.
dbt test
Execute data tests and unit tests defined in the project.
dbt seed
Load CSV files from seeds/ into the warehouse.
dbt snapshot
Capture slowly-changing-dimension history for sources.

Mnemonic: build = run + test + seed + snapshot, all interleaved in dependency order.

Worth remembering

dbt list
Show which resources a selector matches (no run).
dbt show
Preview a model’s rows after transformation.
dbt retry
Re-run the last invocation from the point of failure.
dbt clone
Copy selected nodes from a state into the target schema(s).
dbt compile
Render SQL without executing. Inspect the Jinja.
dbt source freshness
Check whether sources are within their declared freshness SLA.
dbt parse
Parse the project and emit timing, useful in CI.

Setup & utilities

dbt init
Scaffold a new project and set up your profiles.yml connection.
dbt deps
Install the packages declared in packages.yml / dependencies.yml.
dbt debug
Check the install, the active profile and the warehouse connection.
dbt clean
Delete the folders in clean-targets: target/, dbt_packages/.
dbt docs generate
Build the documentation site and the catalog.json artifact.
dbt docs serve
Serve the generated docs site locally in your browser.
dbt run-operation
Invoke a macro directly: dbt run-operation my_macro --args '{k: v}'.

Flags worth remembering

--select / -s
Choose which resources to act on. See Node selection.
--exclude
Remove resources from the --select set; identical syntax.
--target / -t
Pick a target from profiles.yml: -t prod.
--defer
Resolve unselected refs from another state instead of rebuilding them.
--state
Path to a manifest.json, required by state-aware selectors and --defer.
--full-refresh
Rebuild incremental models and seeds from scratch.
--vars
Pass project vars: --vars '{"start_date": "2024-01-01"}'.
--threads
Override the thread count from your profile.
--store-failures
Persist failing test rows to a table for debugging.
--empty
Build schema-only (zero-row) models: a fast structural check for CI.
--sql 1.12+
On dbt run-operation, execute ad-hoc SQL directly. No macro file needed.

How dbt build runs

dbt build runs each model, then its tests, in dependency order. Step through it, widen each tick with --threads, and fail a test to see what gets skipped.

  • running
  • passed
  • failed
  • skipped

Trivial / one-time flags: --help, --version, --debug, --log-level, --profiles-dir, --project-dir.

Essentials

Materializations, tests and packages: the building blocks every dbt project leans on.

Materializations

Set with {{ config(materialized='…') }} or in dbt_project.yml.

view
Default. A CREATE VIEW: no storage, always fresh, recomputed on every read.
table
CREATE TABLE AS: fully rebuilt from scratch each run.
incremental
Built once, then each run only inserts / merges new or changed rows.
ephemeral
Not built at all. Inlined as a CTE into the models that ref() it.
materialized_view
A warehouse-managed materialized view: dbt owns the definition, the warehouse keeps the data refreshed.

Tests

generic
Reusable data tests declared in YAML: unique, not_null, accepted_values, relationships.
singular
A one-off .sql file in tests/: any query that returns failing rows.
unit 1.8+
Test model logic against mock inputs and expected output, declared with unit_tests:.

dbt build runs each model’s tests right after it; failures stop dependents. Add --store-failures to keep the offending rows.

Packages worth installing

Add to packages.yml, then run dbt deps. Browse the rest on the dbt Package hub.

dbt_utils
The standard toolbox: cross-database SQL, generic tests and helper macros.
dbt_project_evaluator
Audits your project against dbt's best-practice rules.
dbt_expectations
Great Expectations-style data quality tests.
audit_helper
Compare two relations to validate a refactor or migration.
elementary
Anomaly-detection tests and data observability reports.
dbt_artifacts
Models your run results and metadata for monitoring.

Incremental models

The materialization that transforms only new rows, and the one most likely to bite you. Strategies, microbatch, and schema changes.

Incremental strategies

Pick with config(materialized='incremental', incremental_strategy='…').

merge
Upsert on unique_key: updates existing rows, inserts new ones.
append
Insert new rows only. Fastest, but no de-duplication.
delete+insert
Delete rows matching the new batch, then insert: a merge without native MERGE.
insert_overwrite
Replace whole partitions (by partition_by), not rows. No unique_key; warehouse-specific (BigQuery, Spark, Snowflake).
microbatch 1.9+
Split the load into time-based batches by event_time: robust, retryable backfills.

is_incremental() gates a where filter; the incremental_strategy decides how the rows it lets through land in the table. Flip the switches to see what each run writes to fct_orders.

select * from {{ source('shop', 'orders') }}
{% if is_incremental() %}
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

dbt run -s fct_orders

fct_orders: before

rows this run processes

fct_orders: after

is_incremental() is true only when the table already exists and you didn’t pass --full-refresh.

Microbatch

A strategy for large event tables: one run, many small dated queries.

Microbatch is for big time-series tables. Instead of one query with an is_incremental() filter, dbt splits the run into one bounded query per time period, here per day. Each batch is independent, so dbt can run them in parallel and retry just the ones that fail.

{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    batch_size='day',
    begin='2024-01-01',
) }}
select * from {{ ref('page_views') }}
Jun 9
Jun 10
Jun 11
Jun 12
Jun 13
Jun 14
Jun 15today
  • already loaded
  • this run's batches
  • failed batch

When a model’s columns change

An incremental run doesn’t rebuild the table, so a changed column list needs handling. The on_schema_change config decides what happens:

on_schema_change You add a column You remove a column
ignore Default. The new column is silently dropped: it never reaches the table. dbt run fails.
fail The run fails with a schema-change error. The run fails with a schema-change error.
append_new_columns The column is added to the table. The column is kept; it is not dropped.
sync_all_columns The column is added to the table. The column is dropped from the table (data-type changes included).

No mode backfills a new column for old rows: they stay NULL until a --full-refresh.

Common patterns

Copy-pasteable recipes for the things you do every week.

  • Slim CI: build only what changed since prod dbt build -s state:modified+ --defer --state path/to/prod
  • Production build, skipping unit tests and unmodified views dbt build --exclude "test_type:unit state:unmodified,config.materialized:view" --state path/to/prod
  • Resume after a failure (pick up what failed / skipped) dbt retry
  • Rebuild incremental models from scratch dbt run --full-refresh -s config.materialized:incremental
  • Test only what you just touched dbt test -s state:modified+ --state path/to/prod
  • Preview what a selector would match (no run) dbt list -s "tag:nightly,config.materialized:table"
  • Refresh only what has newer source data dbt source freshness && dbt build -s source_status:fresher+ --state path/
  • Let Claude or Cursor operate your project safely (see dbt + AI) uvx dbt-mcp

Node selection

The most-Googled corner of dbt. Type a selector below and watch it light up the DAG.

matched not matched source

Specifying resources

The --select flag accepts one or more args. Each arg can be one of:

  • A package’s name
  • A model name
  • A fully-qualified path to a directory of models
  • A selection method (path:, tag:, config:, test_type:, test_name: etc)

Examples:

  • $ dbt run --select my_dbt_project_nameall models in your project
  • $ dbt run --select my_dbt_modela specific model
  • $ dbt run --select path.to.my.modelsall models in a specific directory
  • $ dbt run --select my_package.some_modela specific model in a specific package
  • $ dbt run --select tag:nightlymodels with the “nightly” tag
  • $ dbt run --select path/to/modelsmodels contained in path/to/models
  • $ dbt run --select path/to/my_model.sqla specific model by its path

Graph operators

Plus operator (+)

  • $ dbt run --select my_model+select my_model and all children
  • $ dbt run --select +my_modelselect my_model and all parents
  • $ dbt run --select +my_model+select my_model, and all of its parents and children

N-plus operator

  • $ dbt run --select my_model+1select my_model and its first-degree children
  • $ dbt run --select 2+my_modelselect my_model, its first-degree parents, and its second-degree parents (“grandparents”)
  • $ dbt run --select 3+my_model+4select my_model, its parents up to the 3rd degree, and its children down to the 4th degree

At operator (@)

  • $ dbt run -s @my_modelselect my_model, all its descendants, and the ancestors of those descendants

Wildcards (*, ?, [abc]) work inside selector strings. They aren’t graph operators. Use them within a name or a method:value.

Methods examples

tag
dbt run -s "tag:nightly"
source
dbt run -s "source:snowplow+"
resource_type
dbt list -s "resource_type:test"
path
dbt run -s "models/staging/github"
package
dbt run -s "package:snowplow"
config.X
dbt run -s "config.materialized:incremental"
fqn
dbt run -s "fqn:my_project.marts.finance.fct_orders"
file
dbt run -s "file:fct_orders.sql"
semantic_model
dbt parse -s "semantic_model:orders"
saved_query
dbt sl list -s "saved_query:weekly_revenue"
test_type
dbt test -s "test_type:generic"
test_name
dbt test -s "test_name:unique"
state
dbt run -s "state:modified" --state path/to/artifacts
exposure
dbt run -s "+exposure:weekly_kpis"
metric
dbt build -s "+metric:weekly_active_users"
result
dbt run -s "result:error" --state path/to/artifacts
source_status
dbt build -s "source_status:fresher+"
group
dbt run -s "group:finance"
access
dbt list -s "access:public"
version
dbt list -s "version:latest"
unit_test
dbt list -s "unit_test:*"
selector 1.12+
dbt run -s "selector:nightly_marts"

Set operators

Unions (space-delimited)

  • $ dbt run --select +snowplow_sessions +fct_ordersrun snowplow_sessions, all ancestors of snowplow_sessions, fct_orders, and all ancestors of fct_orders

Intersections (comma-separated)

  • $ dbt run --select +snowplow_sessions,+fct_ordersrun all the common ancestors of snowplow_sessions and fct_orders
  • $ dbt run --select marts.finance,tag:nightlyrun models that are in the marts/finance subdirectory and tagged nightly

Excluding models

dbt provides an --exclude flag with the same semantics as --select. Models specified with the --exclude flag will be removed from the set of models selected with --select.

Example:

  • $ dbt run --select path:models/marts/finance --exclude fct_orders+the finance marts, minus fct_orders and everything downstream of it

State

Some methods compare the project against a manifest.json from another run (a previous invocation, or production), passed via the --state flag.

  • $ dbt build -s "state:modified+" --defer --state path/to/artifacts

Selectors

  • state:newabsent from the comparison manifest
  • state:modifiednew nodes, plus any changed existing node
  • state:unmodifiedexisting nodes with no changes
  • state:oldpresent in the comparison manifest

Narrow state:modified to one kind of change with sub-selectors: .body, .configs, .relation, .macros, .contract.

Defer

--defer lets you build a model without first building its parents: a ref() to an unbuilt parent resolves to another environment’s state instead. It needs --state. Flip the switches to see where model_c’s ref('model_b') resolves.

dbt build -s model_c

Prod state Dev (your schema) model_a model_b model_c model_b model_c ref('model_b')

No --defer: ref('model_b') resolves to your dev schema, where model_b was never built. model_c fails with a 'relation not found' error.

dbt + AI · 2026

The part the original cheat sheet couldn’t have: how dbt works in the agent era.

dbt MCP server

A Model Context Protocol server that lets AI agents (Claude, Cursor, VS Code) operate your dbt project, safely and with governance.

Run it locally:

  • uvx dbt-mcplocal server: full dbt CLI access

Tool groups it exposes:

dbt CLI
run, build, test, compile, list, parse, show
Discovery
get_model_details, get_lineage, get_model_health, get_mart_models
Semantic Layer
list_metrics, query_metrics, get_dimensions
SQL
execute_sql, text_to_sql
Admin API
trigger_job_run, list_jobs, retry_job_run
Codegen
generate_model_yaml, generate_source, generate_staging_model

A remote MCP server (one endpoint per environment) is also available for data-consumption tools without any local setup.

Agents & Copilot

Developer Agent
Write and refactor models, generate tests, and validate changes from natural language inside the Studio IDE.
Analyst Agent
Answer natural-language questions with governed numbers, powered by the dbt Semantic Layer.
dbt Copilot
One-click inline generation of SQL, models, tests and documentation.
dbt Canvas
Visual, drag-and-drop model building. Governance and lineage maintained automatically.
dbt Insights
An AI query interface to explore, validate and share results with full lineage awareness.

Fusion engine

The dbt engine, rewritten in Rust: what changes when your project runs on it.

What Fusion brings

A ground-up rewrite of the dbt engine in Rust, shipped as a single binary (v2.0).

  • SQL comprehension: Fusion actually parses your SQL, so it catches column & type errors before a run.
  • Language server (LSP): go-to-definition, autocomplete and live errors in the editor.
  • Faster parse / compile, with native ADBC warehouse drivers.
  • Adds the session verbs environment, invocation, cancel, reattach.
  • Deprecated flags are removed, most notably --models / -m → use --select.