Airbnb Analytics Engine — dbt + Snowflake

A multi-layer dbt project that models Airbnb booking, listing, and host data on Snowflake. The pipeline moves raw data through bronze (ingestion), silver (cleaning), and gold (analytics-ready) layers, producing both a denormalized One Big Table (OBT) and a star-schema fact table backed by SCD2 dimension snapshots.

Architecture Overview

Snowflake (AIRBNB database)
│
├── staging schema          ← raw source tables (pre-existing, not managed by dbt)
│   ├── listings
│   ├── hosts
│   └── bookings
│
├── BRONZE schema           ← incremental ingestion, no transformation
│   ├── bronze_listings
│   ├── bronze_hosts
│   └── bronze_bookings
│
├── SILVER schema           ← cleaned, typed, business-logic enriched
│   ├── silver_listings
│   ├── silver_hosts
│   └── silver_bookings
│
└── GOLD schema             ← analytics-ready outputs
    ├── obt                 ← One Big Table (denormalized, all entities joined)
    ├── fact                ← Fact table (joins obt with SCD2 dimensions)
    ├── dim_listings        ← SCD2 snapshot (created by dbt snapshot)
    ├── dim_hosts           ← SCD2 snapshot (created by dbt snapshot)
    └── dim_bookings        ← SCD2 snapshot (created by dbt snapshot)

Modeling approach

The gold layer is a hybrid design:

OBT-centric path: silver_* → obt — a single wide table joining all three entities, suitable for flat BI tool consumption (Tableau, Looker, Power BI).
Star schema path: silver_* → snapshots (dim_*) → fact — a proper dimensional model with SCD2 history on listings and hosts, suitable for time-accurate historical analysis.

Both outputs coexist. Use obt for current-state reporting and fact when you need point-in-time accuracy (e.g. what was the listing price at the time of booking).

Data Sources

Defined in models/sources/sources.yml.

Source name	Database	Schema	Tables
`staging`	`AIRBNB`	`staging`	`listings`, `hosts`, `bookings`

These tables are owned by an upstream process and are not managed by this dbt project. dbt reads from them via {{ source('staging', '<table>') }}.

Project Structure

aws_dbt_snowflake_project/
├── analyses/
│   ├── exploration.sql         # Ad-hoc OBT query (not compiled into runs)
│   ├── if_else.sql             # Jinja control flow reference
│   └── loop.sql                # Jinja loop reference
├── macros/
│   ├── generate_schema_name.sql  # Overrides dbt default schema naming
│   ├── multiply.sql              # Rounds the product of two columns
│   ├── tag.sql                   # Bucketing macro (low / medium / high)
│   └── trimmer.sql               # Trim + uppercase utility
├── models/
│   ├── sources/
│   │   └── sources.yml
│   ├── bronze/
│   │   ├── bronze_bookings.sql
│   │   ├── bronze_hosts.sql
│   │   └── bronze_listings.sql
│   ├── silver/
│   │   ├── silver_bookings.sql
│   │   ├── silver_hosts.sql
│   │   └── silver_listings.sql
│   └── gold/
│       ├── obt.sql
│       ├── fact.sql
│       └── ephemeral/
│           ├── bookings.sql
│           ├── hosts.sql
│           └── listings.sql
├── snapshots/
│   ├── dim_bookings.yml
│   ├── dim_hosts.yml
│   └── dim_listings.yml
├── tests/
│   └── source_tests.sql        # Warns when booking_amount < 200
├── dbt_project.yml
└── profiles.yml

Layer Reference

Bronze

Schema: AIRBNB.BRONZE | Materialization: incremental

Thin ingestion layer. Each model is a SELECT * from its corresponding staging source with an incremental filter on CREATED_AT. No business logic is applied here — the purpose is to land raw data into the warehouse under dbt management so that upstream source changes are isolated.

Model	Source table	Incremental key
`bronze_bookings`	`staging.bookings`	`CREATED_AT`
`bronze_hosts`	`staging.hosts`	`CREATED_AT`
`bronze_listings`	`staging.listings`	`CREATED_AT`

Silver

Schema: AIRBNB.SILVER | Materialization: incremental (upsert via unique_key)

Cleaning and enrichment layer. Each model reads from its bronze counterpart via ref() and applies column selection, renaming, type casting, and derived fields.

`silver_bookings`

Column	Notes
`BOOKING_ID`	Unique key
`LISTING_ID`	FK to listings
`BOOKING_DATE`
`TOTAL_AMOUNT`	`round(NIGHTS_BOOKED * BOOKING_AMOUNT, 2)` via `multiply` macro
`SERVICE_FEE`
`CLEANING_FEE`
`BOOKING_STATUS`
`CREATED_AT`

`silver_hosts`

Column	Notes
`HOST_ID`	Unique key
`HOST_NAME`	Spaces replaced with underscores
`HOST_SINCE`
`IS_SUPERHOST`
`RESPONSE_RATE`	Raw numeric rate
`RESPONSE_RATE_QUALITY`	Derived: `VERY GOOD / GOOD / FAIR / POOR`
`CREATED_AT`

`silver_listings`

Column	Notes
`LISTING_ID`	Unique key
`HOST_ID`	FK to hosts
`PROPERTY_TYPE`
`ROOM_TYPE`
`CITY`
`COUNTRY`
`ACCOMMODATES`
`BEDROOMS`
`BATHROOMS`
`PRICE_PER_NIGHT`
`PRICE_PER_NIGHT_TAG`	Derived: `low / medium / high` via `tag` macro
`CREATED_AT`

Gold

Schema: AIRBNB.GOLD | Materialization: table

`obt` — One Big Table

A fully denormalized join of all three silver entities at booking grain. One row per booking, enriched with all listing and host attributes. Built with a Jinja loop over a configuration dict for dynamic FROM / JOIN clause generation.

Grain: one row per BOOKING_ID

Dependencies: silver_bookings → joined to silver_listings → joined to silver_hosts

Use when: you need a flat, current-state dataset for BI tools or ad-hoc analysis.

`fact` — Fact Table

Joins obt with the SCD2 dimension tables (dim_listings, dim_hosts). Structured for time-accurate reporting — dimension attributes reflect their state at a point in time rather than their current state.

Grain: one row per BOOKING_ID

Dependencies: obt, dim_listings (snapshot), dim_hosts (snapshot)

Use when: you need historical accuracy, e.g. the listing price or host superhost status at the time of booking.

fact requires the dimension snapshots to exist in Snowflake. Run dbt snapshot before dbt run --select fact. See Running the Project.

Snapshots

Schema: AIRBNB.GOLD | Strategy: timestamp SCD2

Snapshots capture slowly changing dimension history on listings, hosts, and bookings. Each snapshot reads from its corresponding silver table and writes rows with DBT_VALID_FROM / DBT_VALID_TO columns. DBT_VALID_TO is set to 9999-12-31 for currently active records.

Snapshot	Source	Unique key	`updated_at`
`dim_listings`	`silver_listings`	`LISTING_ID`	`CREATED_AT`
`dim_hosts`	`silver_hosts`	`HOST_ID`	`CREATED_AT`
`dim_bookings`	`silver_bookings`	`BOOKING_ID`	`CREATED_AT`

Snapshots are not built by dbt run. They must be run explicitly:

dbt snapshot

Ephemeral Models

Located in models/gold/ephemeral/. These compile to CTEs — they have no physical existence in the database and are inlined into any model that references them via ref().

Model	Selects from	Purpose
`bookings`	`obt`	Booking-scoped subset of the OBT
`hosts`	`obt`	Host-scoped subset of the OBT
`listings`	`obt`	Listing-scoped subset of the OBT

Ephemeral models cannot be snapshotted, queried directly in Snowflake, or used as snapshot sources.

Macros

`generate_schema_name`

Overrides dbt's default schema naming convention. By default, dbt prefixes custom schemas with the target schema name (e.g. dev_bronze). This macro strips the prefix so schemas resolve to their bare names (bronze, silver, gold) across all environments.

{{ generate_schema_name(custom_schema_name, node) }}
-- returns: custom_schema_name if set, else target.schema

Because all environments resolve to the same schema names, running dbt in a dev target writes to the same BRONZE/SILVER/GOLD schemas as production. Add target.name logic to the macro if dev/prod isolation is needed.

`multiply`

Multiplies two column expressions and rounds the result to a given decimal precision.

{{ multiply('NIGHTS_BOOKED', 'BOOKING_AMOUNT', 2) }}
-- compiles to: round(NIGHTS_BOOKED * BOOKING_AMOUNT, 2)

`tag`

Buckets a numeric column expression into low, medium, or high string labels.

{{ tag('CAST(PRICE_PER_NIGHT AS INT)') }}
-- compiles to:
--   CASE
--     WHEN <expr> < 100 THEN 'low'
--     WHEN <expr> < 200 THEN 'medium'
--     ELSE 'high'
--   END

`trimmer`

Trims whitespace and uppercases a column value.

{{ trimmer('host_name') }}
-- compiles to: host_name trimmed and uppercased

Tests

File: tests/source_tests.sql

A singular test that returns rows where BOOKING_AMOUNT < 200 from the staging bookings source. Configured with severity: warn — the run will not fail, but dbt will log a warning for any matching rows.

dbt test

Analyses

Ad-hoc SQL files in analyses/ are compiled by dbt but never executed as part of a run. Useful for exploratory queries and Jinja reference examples.

File	Purpose
`exploration.sql`	Full OBT select — useful for sanity-checking the gold layer
`if_else.sql`	Jinja `if/else` control flow reference
`loop.sql`	Jinja `for` loop reference

Compile an analysis without running it:

dbt compile --select exploration
# output written to: target/compiled/.../analyses/exploration.sql

Prerequisites

Python 3.8+
dbt-snowflake 1.11+
A Snowflake account with:
- Database: AIRBNB
- Schema staging containing listings, hosts, and bookings tables
- A warehouse and role with READ on staging and CREATE TABLE on BRONZE, SILVER, GOLD

Setup

1. Clone the repository

git clone <repo-url>
cd aws_dbt_snowflake_project

2. Create a virtual environment and install dbt

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install dbt-snowflake

3. Configure your connection

Edit profiles.yml with your Snowflake credentials:

aws_dbt_snowflake_project:
  outputs:
    dev:
      type: snowflake
      account: <your-account-identifier>
      user: <your-username>
      password: <your-password>
      role: <your-role>
      database: AIRBNB
      warehouse: <your-warehouse>
      schema: dbt_schema
      threads: 1
  target: dev

Use an environment variable for the password rather than a plaintext value:

password: "{{ env_var('DBT_SNOWFLAKE_PASSWORD') }}"

Then set it in your shell:

export DBT_SNOWFLAKE_PASSWORD=your_password   # Windows: $env:DBT_SNOWFLAKE_PASSWORD="your_password"

4. Verify the connection

dbt debug

Running the Project

First-time full build

# Step 1 — build all models (bronze → silver → gold)
dbt run

# Step 2 — build SCD2 dimension snapshots (creates DIM_LISTINGS, DIM_HOSTS, DIM_BOOKINGS)
dbt snapshot

# Step 3 — rebuild fact now that the dimension tables exist
dbt run --select fact

# Step 4 — run tests
dbt test

Day-to-day incremental refresh

# Incrementally refresh bronze and silver, rebuild gold tables
dbt run

# Update snapshots with any changed dimension records
dbt snapshot

Selective builds

# Build a single model and all upstream dependencies
dbt run --select +obt

# Build only silver models
dbt run --select silver.*

# Build only gold models
dbt run --select gold.*

# Compile to inspect generated SQL without executing
dbt compile --select fact
# → target/compiled/aws_dbt_snowflake_project/models/gold/fact.sql

Development Guidelines

Always use ref() for model-to-model references. Never hardcode DATABASE.SCHEMA.TABLE paths. ref() wires the dependency graph, enforces build order, and respects generate_schema_name.
Always use source() for staging table references. Defined in models/sources/sources.yml.
Snapshots require dbt snapshot. They are not built by dbt run. After any schema change to a silver model that feeds a snapshot, run dbt snapshot to propagate.
Ephemeral models cannot be snapshotted. They have no database existence. If you need a snapshot source, it must be a table or view materialization.
generate_schema_name strips environment prefixes. All targets write to bronze, silver, gold. Update the macro if you need isolated dev schemas.
Bronze models override project-level materialization. dbt_project.yml declares bronze as table, but each model overrides this with incremental. The model-level config takes precedence.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
aws_dbt_snowflake_project		aws_dbt_snowflake_project
img		img
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

Airbnb Analytics Engine — dbt + Snowflake

Table of Contents

Architecture Overview

Modeling approach

Data Sources

Project Structure

Layer Reference

Bronze

Silver

silver_bookings

silver_hosts

silver_listings

Gold

obt — One Big Table

fact — Fact Table

Snapshots

Ephemeral Models

Macros

generate_schema_name

multiply

tag

trimmer

Tests

Analyses

Prerequisites

Setup

Running the Project

First-time full build

Day-to-day incremental refresh

Selective builds

Development Guidelines

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`silver_bookings`

`silver_hosts`

`silver_listings`

`obt` — One Big Table

`fact` — Fact Table

`generate_schema_name`

`multiply`

`tag`

`trimmer`

Packages