Incremental Models
An incremental model processes only the rows that changed since the last run - instead of rebuilding the entire table from scratch every time.
The problem with rebuilding everything
By default, a dbt table model rebuilds itself completely on every run: drop,
re-select, re-insert every row. For a 10,000-row dimension that is fine. For a
two-billion-row events table, it is a slow, expensive query you run again and
again to recompute rows that have not changed since yesterday.
An incremental model breaks that pattern. On the first run it builds the whole table. On every run after that, it processes only the new or changed rows and folds them into the table that already exists.
How an incremental model works
Two pieces make it work: a config that marks the model incremental, and an
is_incremental() block that filters down to just the fresh rows.
{{ config(materialized='incremental', unique_key='order_id') }}
select * from {{ source('shop', 'orders') }}
{% if is_incremental() %}
-- only on incremental runs: keep rows newer than what we already have
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
On the first run, is_incremental() is false, the where clause is
skipped, and dbt builds the full table. On later runs it is true, so dbt
selects only the recent rows - then has to decide how to combine them with the
existing table. That decision is the incremental strategy.
The animation below walks through the merge strategy: dbt loads the new
rows into a temporary set, then matches them against the target table on the
unique_key. Matches are updated in place; everything else is inserted.
Strategies, briefly
merge is the default on most warehouses, but it is not the only option:
append simply adds rows (fast, but no updates and no dedup), and
delete+insert clears the matching keys first and then inserts. They all
answer the same question - what do we do with a row we have seen before? -
in different ways.
When to use this pattern
Reach for an incremental model when a table is large and append-mostly - event logs, clickstream, orders - and a full rebuild has become too slow or too expensive. For small tables, a plain table model is simpler and the extra moving parts are not worth it.