A multi-layer dbt project that models Airbnb booking, listing, and host data on Snowflake. The pipeline moves raw data through bronze (ingestion), silver (cleaning), and gold (analytics-ready) layers, producing both a denormalized One Big Table (OBT) and a star-schema fact table backed by SCD2 dimension snapshots.
- Architecture Overview
- Data Sources
- Project Structure
- Layer Reference
- Macros
- Tests
- Analyses
- Prerequisites
- Setup
- Running the Project
- Development Guidelines
## Architecture Overview

```
Snowflake (AIRBNB database)
│
├── staging schema  ← raw source tables (pre-existing, not managed by dbt)
│   ├── listings
│   ├── hosts
│   └── bookings
│
├── BRONZE schema   ← incremental ingestion, no transformation
│   ├── bronze_listings
│   ├── bronze_hosts
│   └── bronze_bookings
│
├── SILVER schema   ← cleaned, typed, business-logic enriched
│   ├── silver_listings
│   ├── silver_hosts
│   └── silver_bookings
│
└── GOLD schema     ← analytics-ready outputs
    ├── obt           ← One Big Table (denormalized, all entities joined)
    ├── fact          ← Fact table (joins obt with SCD2 dimensions)
    ├── dim_listings  ← SCD2 snapshot (created by dbt snapshot)
    ├── dim_hosts     ← SCD2 snapshot (created by dbt snapshot)
    └── dim_bookings  ← SCD2 snapshot (created by dbt snapshot)
```
The gold layer is a hybrid design:
- OBT-centric path: `silver_*` → `obt`, a single wide table joining all three entities, suitable for flat BI tool consumption (Tableau, Looker, Power BI).
- Star schema path: `silver_*` → snapshots (`dim_*`) → `fact`, a dimensional model with SCD2 history on listings and hosts (`fact` also reads from `obt`), suitable for time-accurate historical analysis.
Both outputs coexist. Use obt for current-state reporting and fact when you need point-in-time accuracy (e.g. what was the listing price at the time of booking).
## Data Sources

Defined in `models/sources/sources.yml`.

| Source name | Database | Schema | Tables |
|---|---|---|---|
| staging | AIRBNB | staging | listings, hosts, bookings |

These tables are owned by an upstream process and are not managed by this dbt project. dbt reads from them via `{{ source('staging', '<table>') }}`.
## Project Structure

```
aws_dbt_snowflake_project/
├── analyses/
│   ├── exploration.sql           # Ad-hoc OBT query (compiled, never executed by dbt run)
│   ├── if_else.sql               # Jinja control flow reference
│   └── loop.sql                  # Jinja loop reference
├── macros/
│   ├── generate_schema_name.sql  # Overrides dbt default schema naming
│   ├── multiply.sql              # Rounds the product of two columns
│   ├── tag.sql                   # Bucketing macro (low / medium / high)
│   └── trimmer.sql               # Trim + uppercase utility
├── models/
│   ├── sources/
│   │   └── sources.yml
│   ├── bronze/
│   │   ├── bronze_bookings.sql
│   │   ├── bronze_hosts.sql
│   │   └── bronze_listings.sql
│   ├── silver/
│   │   ├── silver_bookings.sql
│   │   ├── silver_hosts.sql
│   │   └── silver_listings.sql
│   └── gold/
│       ├── obt.sql
│       ├── fact.sql
│       └── ephemeral/
│           ├── bookings.sql
│           ├── hosts.sql
│           └── listings.sql
├── snapshots/
│   ├── dim_bookings.yml
│   ├── dim_hosts.yml
│   └── dim_listings.yml
├── tests/
│   └── source_tests.sql          # Warns when booking_amount < 200
├── dbt_project.yml
└── profiles.yml
```
## Layer Reference

### Bronze

Schema: `AIRBNB.BRONZE` | Materialization: `incremental`

Thin ingestion layer. Each model is a `SELECT *` from its corresponding staging source with an incremental filter on `CREATED_AT`. No business logic is applied here; the purpose is to land raw data in the warehouse under dbt management so that downstream models are insulated from changes to the upstream source tables.

| Model | Source table | Incremental key |
|---|---|---|
| bronze_bookings | staging.bookings | CREATED_AT |
| bronze_hosts | staging.hosts | CREATED_AT |
| bronze_listings | staging.listings | CREATED_AT |
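The incremental filter can be tried outside Snowflake. The sketch below uses Python's built-in `sqlite3` as a stand-in warehouse and assumes the common dbt pattern of selecting only source rows whose `CREATED_AT` is later than the maximum already loaded; the exact predicate in the bronze models is not shown in this README, and the table names `staging_bookings` / `bronze_bookings` are hypothetical stand-ins.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def load_incremental(conn: sqlite3.Connection) -> int:
    """Append staging rows newer than the latest CREATED_AT already in bronze.

    Models the ASSUMED dbt filter
        where CREATED_AT > (select max(CREATED_AT) from {{ this }})
    COALESCE lets the first run (empty target) load everything, mirroring
    dbt's first run, where is_incremental() is false and no filter applies.
    Returns the number of rows added.
    """
    before = row_count(conn, "bronze_bookings")
    conn.execute(
        """
        INSERT INTO bronze_bookings
        SELECT * FROM staging_bookings
        WHERE created_at > (SELECT COALESCE(MAX(created_at), '') FROM bronze_bookings)
        """
    )
    return row_count(conn, "bronze_bookings") - before


conn = sqlite3.connect(":memory:")
for table in ("staging_bookings", "bronze_bookings"):
    conn.execute(f"CREATE TABLE {table} (booking_id TEXT, created_at TEXT)")

conn.executemany(
    "INSERT INTO staging_bookings VALUES (?, ?)",
    [("b1", "2024-01-01"), ("b2", "2024-01-02")],
)
first_run = load_incremental(conn)   # empty target: both rows load

conn.execute("INSERT INTO staging_bookings VALUES ('b3', '2024-01-03')")
second_run = load_incremental(conn)  # only b3 is newer than the loaded maximum
```

Under this assumed predicate the comparison is strict, so a late-arriving row whose `CREATED_AT` equals the current maximum would be skipped.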
### Silver

Schema: `AIRBNB.SILVER` | Materialization: `incremental` (upsert via `unique_key`)

Cleaning and enrichment layer. Each model reads from its bronze counterpart via `ref()` and applies column selection, renaming, type casting, and derived fields.
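With `unique_key` set, an incremental run updates existing rows that share a key and inserts the rest (dbt-snowflake's default incremental strategy is `merge`). The pure-Python sketch below models that upsert with a dict keyed on `BOOKING_ID`; `merge_on_key` and the row shapes are illustrative, not the project's actual code or columns.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_on_key(target: List[Row], batch: List[Row], unique_key: str) -> List[Row]:
    """Return target rows after upserting batch on unique_key.

    A batch row whose key already exists replaces the existing row
    ("when matched then update"); a new key is appended
    ("when not matched then insert").
    """
    merged = {row[unique_key]: row for row in target}
    for row in batch:
        merged[row[unique_key]] = row
    return list(merged.values())


existing = [
    {"BOOKING_ID": "b1", "BOOKING_STATUS": "pending"},
    {"BOOKING_ID": "b2", "BOOKING_STATUS": "confirmed"},
]
incoming = [
    {"BOOKING_ID": "b1", "BOOKING_STATUS": "cancelled"},  # key exists: update
    {"BOOKING_ID": "b3", "BOOKING_STATUS": "confirmed"},  # new key: insert
]
silver = merge_on_key(existing, incoming, "BOOKING_ID")
```

One difference from the warehouse: this sketch lets the last duplicate in a batch win, whereas Snowflake's `MERGE` raises an error by default when several source rows match the same existing target row.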
#### silver_bookings

| Column | Notes |
|---|---|
| BOOKING_ID | Unique key |
| LISTING_ID | FK to listings |
| BOOKING_DATE | |
| TOTAL_AMOUNT | `round(NIGHTS_BOOKED * BOOKING_AMOUNT, 2)` via `multiply` macro |
| SERVICE_FEE | |
| CLEANING_FEE | |
| BOOKING_STATUS | |
| CREATED_AT | |
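`TOTAL_AMOUNT` is the rounded product of two columns. The sketch below reproduces that arithmetic with Python's `decimal` module, assuming the columns are fixed-point numbers and that Snowflake's `ROUND` rounds halves away from zero (its default). Plain `round()` is not a faithful stand-in because it rounds ties to even. `total_amount` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


def total_amount(nights_booked: int, booking_amount: str) -> Decimal:
    """Model of round(NIGHTS_BOOKED * BOOKING_AMOUNT, 2).

    booking_amount is passed as a string so Decimal holds the exact value
    instead of inheriting binary floating-point error.
    """
    product = Decimal(nights_booked) * Decimal(booking_amount)
    # decimal's ROUND_HALF_UP rounds ties away from zero.
    return product.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(total_amount(3, "149.995"))  # 449.985 rounds to 449.99
```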
#### silver_hosts

| Column | Notes |
|---|---|
| HOST_ID | Unique key |
| HOST_NAME | Spaces replaced with underscores |
| HOST_SINCE | |
| IS_SUPERHOST | |
| RESPONSE_RATE | Raw numeric rate |
| RESPONSE_RATE_QUALITY | Derived: VERY GOOD / GOOD / FAIR / POOR |
| CREATED_AT | |
#### silver_listings

| Column | Notes |
|---|---|
| LISTING_ID | Unique key |
| HOST_ID | FK to hosts |
| PROPERTY_TYPE | |
| ROOM_TYPE | |
| CITY | |
| COUNTRY | |
| ACCOMMODATES | |
| BEDROOMS | |
| BATHROOMS | |
| PRICE_PER_NIGHT | |
| PRICE_PER_NIGHT_TAG | Derived: low / medium / high via `tag` macro |
| CREATED_AT | |
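The `tag` macro's `CASE` expression (documented under Macros) maps prices onto three buckets. A Python equivalent makes the boundary behaviour explicit: the comparisons are strict `<`, so exactly 100 is `medium` and exactly 200 is `high`. It also shows a SQL subtlety: a `NULL` price fails both `WHEN` tests and falls through to `ELSE 'high'`. `price_tag` is a hypothetical name.

```python
from typing import Optional


def price_tag(price_per_night: Optional[int]) -> str:
    """Python model of the tag macro's CASE expression.

    Expects the value after CAST(PRICE_PER_NIGHT AS INT); None stands in
    for SQL NULL.
    """
    # In SQL, NULL < 100 is not TRUE, so NULL skips both WHEN branches.
    if price_per_night is None:
        return "high"
    if price_per_night < 100:
        return "low"
    if price_per_night < 200:
        return "medium"
    return "high"


for price in (99, 100, 199, 200, None):
    print(price, price_tag(price))
```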
### Gold

Schema: `AIRBNB.GOLD` | Materialization: `table`

#### obt

A fully denormalized join of all three silver entities at booking grain: one row per booking, enriched with all listing and host attributes. Built with a Jinja loop over a configuration dict that generates the `FROM` / `JOIN` clauses dynamically.

- Grain: one row per `BOOKING_ID`
- Dependencies: `silver_bookings` → joined to `silver_listings` → joined to `silver_hosts`
- Use when: you need a flat, current-state dataset for BI tools or ad-hoc analysis.
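The OBT's join clause is generated from configuration rather than written by hand. The project's actual Jinja dict is not reproduced in this README, so the Python sketch below assumes a hypothetical list of entries with `table`, `alias`, and `join_on` keys, and shows how such a loop can emit the clause, with the first entry becoming the `FROM` table. In the real model each table name would be a `{{ ref(...) }}` call.

```python
from typing import Dict, List

# HYPOTHETICAL configuration: the real dict in obt.sql may use other keys.
# The first entry is the driving table, which sets the booking grain.
JOIN_CONFIG: List[Dict[str, str]] = [
    {"table": "silver_bookings", "alias": "b", "join_on": ""},
    {"table": "silver_listings", "alias": "l", "join_on": "b.LISTING_ID = l.LISTING_ID"},
    {"table": "silver_hosts", "alias": "h", "join_on": "l.HOST_ID = h.HOST_ID"},
]


def build_from_clause(config: List[Dict[str, str]], join_type: str = "LEFT JOIN") -> str:
    """Emit FROM/JOIN lines from the config, mirroring a Jinja for-loop."""
    first, *rest = config
    lines = [f"FROM {first['table']} AS {first['alias']}"]
    for entry in rest:
        lines.append(f"{join_type} {entry['table']} AS {entry['alias']} ON {entry['join_on']}")
    return "\n".join(lines)


print(build_from_clause(JOIN_CONFIG))
```

`LEFT JOIN` is an assumption here: it keeps a booking whose listing or host is missing, whereas an inner join would silently drop it from the OBT.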
#### fact

Joins `obt` with the SCD2 dimension tables (`dim_listings`, `dim_hosts`). Structured for time-accurate reporting: dimension attributes reflect their state at a point in time rather than their current state.

- Grain: one row per `BOOKING_ID`
- Dependencies: `obt`, `dim_listings` (snapshot), `dim_hosts` (snapshot)
- Use when: you need historical accuracy, e.g. the listing price or host superhost status at the time of booking.

`fact` requires the dimension snapshots to exist in Snowflake. Run `dbt snapshot` before `dbt run --select fact`. See Running the Project.
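A point-in-time join matches each booking to the dimension version that was valid on the booking date. The `sqlite3` sketch below assumes `fact` joins on the entity key plus a half-open `DBT_VALID_FROM <= BOOKING_DATE < DBT_VALID_TO` range; the actual predicate in `fact.sql` is not shown in this README, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE obt (booking_id TEXT, listing_id TEXT, booking_date TEXT);
    CREATE TABLE dim_listings (
        listing_id TEXT, price_per_night INTEGER,
        dbt_valid_from TEXT, dbt_valid_to TEXT
    );
    -- Invented history: the listing's price rose from 120 to 150 on 2024-03-01.
    INSERT INTO dim_listings VALUES
        ('L1', 120, '2024-01-01', '2024-03-01'),
        ('L1', 150, '2024-03-01', '9999-12-31');
    INSERT INTO obt VALUES
        ('B1', 'L1', '2024-02-15'),
        ('B2', 'L1', '2024-03-10');
    """
)

# Half-open range [valid_from, valid_to): a date on a boundary matches one version.
rows = conn.execute(
    """
    SELECT o.booking_id, d.price_per_night
    FROM obt AS o
    JOIN dim_listings AS d
      ON o.listing_id = d.listing_id
     AND o.booking_date >= d.dbt_valid_from
     AND o.booking_date <  d.dbt_valid_to
    ORDER BY o.booking_id
    """
).fetchall()
print(rows)  # [('B1', 120), ('B2', 150)]
```

Because this project sets `DBT_VALID_TO` to `9999-12-31` for current rows, the range test needs no `COALESCE` around a `NULL` end date.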
### Snapshots

Schema: `AIRBNB.GOLD` | Strategy: `timestamp` (SCD2)

Snapshots capture slowly changing dimension history on listings, hosts, and bookings. Each snapshot reads from its corresponding silver table and writes rows with `DBT_VALID_FROM` / `DBT_VALID_TO` columns. `DBT_VALID_TO` is set to `9999-12-31` for currently active records.

| Snapshot | Source | Unique key | updated_at |
|---|---|---|---|
| dim_listings | silver_listings | LISTING_ID | CREATED_AT |
| dim_hosts | silver_hosts | HOST_ID | CREATED_AT |
| dim_bookings | silver_bookings | BOOKING_ID | CREATED_AT |
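With the timestamp strategy, dbt compares each source row's `updated_at` column (here `CREATED_AT`) with the current snapshot version for the same unique key. A newer value closes the old version and opens a new one. The pure-Python sketch below models one snapshot run under that reading, using this project's `9999-12-31` sentinel for `DBT_VALID_TO`. It ignores hard deletes and the other metadata columns dbt adds (such as `DBT_SCD_ID` and `DBT_UPDATED_AT`), and `snapshot_run` is a hypothetical name.

```python
from typing import Dict, List

HIGH_DATE = "9999-12-31"  # DBT_VALID_TO for current versions, per this README

Version = Dict[str, str]


def snapshot_run(
    history: List[Version], source: List[Version], key: str, updated_at: str
) -> List[Version]:
    """Apply one timestamp-strategy snapshot run; return the new history.

    history: rows already in the snapshot table (with dbt_valid_from/to).
    source:  current rows of the silver model being snapshotted.
    """
    result = [dict(v) for v in history]  # copy so the caller's list is untouched
    current = {v[key]: v for v in result if v["dbt_valid_to"] == HIGH_DATE}
    for row in source:
        active = current.get(row[key])
        if active is not None and row[updated_at] <= active[updated_at]:
            continue  # not newer than the current version: nothing to record
        if active is not None:
            active["dbt_valid_to"] = row[updated_at]  # close the previous version
        new_version = dict(row, dbt_valid_from=row[updated_at], dbt_valid_to=HIGH_DATE)
        result.append(new_version)
        current[row[key]] = new_version
    return result


run1 = snapshot_run(
    [], [{"HOST_ID": "h1", "IS_SUPERHOST": "f", "CREATED_AT": "2024-01-01"}],
    key="HOST_ID", updated_at="CREATED_AT",
)
run2 = snapshot_run(
    run1, [{"HOST_ID": "h1", "IS_SUPERHOST": "t", "CREATED_AT": "2024-06-01"}],
    key="HOST_ID", updated_at="CREATED_AT",
)
for version in run2:
    print(version["IS_SUPERHOST"], version["dbt_valid_from"], version["dbt_valid_to"])
# f 2024-01-01 2024-06-01
# t 2024-06-01 9999-12-31
```

Under this strategy a change is recorded only when the `updated_at` column advances. With `CREATED_AT` in that role, an update that leaves `CREATED_AT` unchanged would not produce a new version.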
Snapshots are not built by `dbt run`. They must be run explicitly:

```bash
dbt snapshot
```

### Ephemeral models

Located in `models/gold/ephemeral/`. These compile to CTEs: they have no physical existence in the database and are inlined into any model that references them via `ref()`.

| Model | Selects from | Purpose |
|---|---|---|
| bookings | obt | Booking-scoped subset of the OBT |
| hosts | obt | Host-scoped subset of the OBT |
| listings | obt | Listing-scoped subset of the OBT |

Ephemeral models cannot be queried directly in Snowflake or used as snapshot sources.
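When a model calls `ref()` on an ephemeral model, dbt injects the ephemeral SQL as a CTE (in current versions named with a `__dbt__cte__` prefix) and rewrites the reference to point at it. The Python sketch below imitates that rewrite with plain string handling and runs the result in `sqlite3`. `inline_ephemeral`, the `{ref}` placeholder and the sample columns are hypothetical, and dbt's real compiler handles cases this does not, such as nested ephemerals or a parent query that already has a `WITH` clause.

```python
import sqlite3


def inline_ephemeral(name: str, ephemeral_sql: str, parent_sql: str) -> str:
    """Prepend an ephemeral model as a CTE and point the parent query at it.

    parent_sql uses the placeholder {ref} where dbt would have ref('<name>').
    Assumes parent_sql has no WITH clause of its own.
    """
    cte_name = f"__dbt__cte__{name}"
    return f"WITH {cte_name} AS ({ephemeral_sql})\n" + parent_sql.format(ref=cte_name)


# Stand-in for the gold OBT: only `obt` exists as a physical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obt (booking_id TEXT, host_id TEXT, total_amount REAL)")
conn.executemany(
    "INSERT INTO obt VALUES (?, ?, ?)",
    [("b1", "h1", 300.0), ("b2", "h1", 450.0)],
)

compiled = inline_ephemeral(
    "bookings",
    "SELECT booking_id, total_amount FROM obt",
    "SELECT COUNT(*) AS n, SUM(total_amount) AS revenue FROM {ref}",
)
print(compiled)
print(conn.execute(compiled).fetchone())  # (2, 750.0)
```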
## Macros

### generate_schema_name

Overrides dbt's default schema naming convention. By default, dbt prefixes custom schemas with the target schema name (e.g. `dev_bronze`). This macro strips the prefix so schemas resolve to their bare names (`bronze`, `silver`, `gold`) across all environments.

```sql
{{ generate_schema_name(custom_schema_name, node) }}
-- returns: custom_schema_name if set, else target.schema
```

Because all environments resolve to the same schema names, running dbt against a `dev` target writes to the same `BRONZE`/`SILVER`/`GOLD` schemas as production. Add `target.name` logic to the macro if dev/prod isolation is needed.
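The difference between dbt's built-in naming and this project's override can be modelled as two small functions. The Python below illustrates the logic only; it is not the macro source. `target_schema` stands in for `target.schema`, and the environment-aware variant, with its hypothetical `prod` target name, is one way the `target.name` logic mentioned above could behave.

```python
from typing import Optional


def default_schema_name(custom_schema_name: Optional[str], target_schema: str) -> str:
    """dbt's built-in behaviour: prefix the custom schema with the target schema."""
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"


def project_schema_name(custom_schema_name: Optional[str], target_schema: str) -> str:
    """This project's override: custom schema name if set, else target schema."""
    if custom_schema_name is None:
        return target_schema
    return custom_schema_name


def env_aware_schema_name(
    custom_schema_name: Optional[str], target_schema: str, target_name: str
) -> str:
    """HYPOTHETICAL variant: bare names on a 'prod' target, prefixed elsewhere."""
    if target_name == "prod":
        return project_schema_name(custom_schema_name, target_schema)
    return default_schema_name(custom_schema_name, target_schema)


# dbt_schema is the target schema configured in profiles.yml.
print(default_schema_name("bronze", "dbt_schema"))           # dbt_schema_bronze
print(project_schema_name("bronze", "dbt_schema"))           # bronze
print(env_aware_schema_name("bronze", "dbt_schema", "dev"))  # dbt_schema_bronze
```

Snowflake stores unquoted identifiers in upper case, which is why `bronze` appears as `BRONZE` in the warehouse.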
### multiply

Multiplies two column expressions and rounds the result to a given decimal precision.

```sql
{{ multiply('NIGHTS_BOOKED', 'BOOKING_AMOUNT', 2) }}
-- compiles to: round(NIGHTS_BOOKED * BOOKING_AMOUNT, 2)
```

### tag

Buckets a numeric column expression into `low`, `medium`, or `high` string labels.

```sql
{{ tag('CAST(PRICE_PER_NIGHT AS INT)') }}
-- compiles to:
-- CASE
--     WHEN <expr> < 100 THEN 'low'
--     WHEN <expr> < 200 THEN 'medium'
--     ELSE 'high'
-- END
```

### trimmer

Trims whitespace and uppercases a column value.

```sql
{{ trimmer('host_name') }}
-- compiles to: host_name trimmed and uppercased
```

## Tests

File: `tests/source_tests.sql`

A singular test that returns rows where `BOOKING_AMOUNT < 200` from the staging bookings source. It is configured with `severity: warn`: the run will not fail, but dbt logs a warning for any matching rows.
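A singular test passes when its query returns zero rows. The sketch below reproduces that contract with `sqlite3`: it runs an assumed version of the `BOOKING_AMOUNT < 200` query and maps the failing-row count to a status the way `severity: warn` does. `run_singular_test` and the sample data are hypothetical; the real test SQL lives in `tests/source_tests.sql`.

```python
import sqlite3


def run_singular_test(conn: sqlite3.Connection, sql: str, severity: str = "error") -> str:
    """Return PASS, WARN, or FAIL based on how many rows the test query returns."""
    failures = len(conn.execute(sql).fetchall())
    if failures == 0:
        return "PASS"
    return "WARN" if severity == "warn" else "FAIL"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (booking_id TEXT, booking_amount REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [("b1", 250.0), ("b2", 150.0)],
)

test_sql = "SELECT * FROM bookings WHERE booking_amount < 200"
print(run_singular_test(conn, test_sql, severity="warn"))   # WARN (b2 matches)
print(run_singular_test(conn, test_sql, severity="error"))  # FAIL
```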
Run the tests with:

```bash
dbt test
```

## Analyses

Ad-hoc SQL files in `analyses/` are compiled by dbt but never executed as part of a run. They are useful for exploratory queries and Jinja reference examples.

| File | Purpose |
|---|---|
| exploration.sql | Full OBT select, useful for sanity-checking the gold layer |
| if_else.sql | Jinja if/else control flow reference |
| loop.sql | Jinja for loop reference |

Compile an analysis without running it:

```bash
dbt compile --select exploration
# output written to: target/compiled/.../analyses/exploration.sql
```

## Prerequisites

- Python 3.8+
- dbt-snowflake 1.11+
- A Snowflake account with:
  - Database: `AIRBNB`
  - Schema `staging` containing `listings`, `hosts`, and `bookings` tables
  - A warehouse and role with read access on `staging` (in Snowflake terms, `USAGE` on the schema and `SELECT` on its tables) and `CREATE TABLE` on `BRONZE`, `SILVER`, `GOLD`
## Setup

1. Clone the repository

```bash
git clone <repo-url>
cd aws_dbt_snowflake_project
```

2. Create a virtual environment and install dbt

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install dbt-snowflake
```

3. Configure your connection

Edit `profiles.yml` with your Snowflake credentials:

```yaml
aws_dbt_snowflake_project:
  outputs:
    dev:
      type: snowflake
      account: <your-account-identifier>
      user: <your-username>
      password: <your-password>
      role: <your-role>
      database: AIRBNB
      warehouse: <your-warehouse>
      schema: dbt_schema
      threads: 1
  target: dev
```

Use an environment variable for the password rather than a plaintext value:

```yaml
password: "{{ env_var('DBT_SNOWFLAKE_PASSWORD') }}"
```

Then set it in your shell:

```bash
export DBT_SNOWFLAKE_PASSWORD=your_password  # Windows: $env:DBT_SNOWFLAKE_PASSWORD="your_password"
```

4. Verify the connection
```bash
dbt debug
```

## Running the Project

For a first-time build, run these steps in order:

```bash
# Step 1: build all models (bronze → silver → gold)
dbt run
# Step 2: build SCD2 dimension snapshots (creates DIM_LISTINGS, DIM_HOSTS, DIM_BOOKINGS)
dbt snapshot
# Step 3: rebuild fact now that the dimension tables exist
dbt run --select fact
# Step 4: run tests
dbt test
```

On subsequent runs:

```bash
# Incrementally refresh bronze and silver, rebuild gold tables
dbt run
# Update snapshots with any changed dimension records
dbt snapshot
```

Selective builds and compilation (quote selectors containing `*` so shells such as zsh do not try to expand them):

```bash
# Build a single model and all upstream dependencies
dbt run --select +obt
# Build only silver models
dbt run --select "silver.*"
# Build only gold models
dbt run --select "gold.*"
# Compile to inspect generated SQL without executing
dbt compile --select fact
# → target/compiled/aws_dbt_snowflake_project/models/gold/fact.sql
```

## Development Guidelines

- Always use `ref()` for model-to-model references. Never hardcode `DATABASE.SCHEMA.TABLE` paths. `ref()` wires the dependency graph, enforces build order, and respects `generate_schema_name`.
- Always use `source()` for staging table references. Sources are defined in `models/sources/sources.yml`.
- Snapshots require `dbt snapshot`. They are not built by `dbt run`. After any schema change to a silver model that feeds a snapshot, run `dbt snapshot` to propagate the change.
- Ephemeral models cannot be snapshotted. They have no database existence. A snapshot source must be a materialization that exists in the warehouse (`table`, `view`, or `incremental`, as the silver models are).
- `generate_schema_name` strips environment prefixes. All targets write to `bronze`, `silver`, `gold`. Update the macro if you need isolated dev schemas.
- Bronze models override project-level materialization. `dbt_project.yml` declares bronze as `table`, but each model overrides this with `incremental`. The model-level config takes precedence.
