Building a Semantic Model

The semantic model is what makes natural language queries work. This guide walks through building one, using IntelliTravel as a real example.

Step 1: Define Entities

Entities map your domain tables to natural language names. For each table that users might ask about:

Go to Settings → Semantic Model → Entities
Add the table with a clear name and description

IntelliTravel example:

name	table_name	display_column	active_filter_sql	description
`customer`	`customers`	`name`	—	Travel agency customer with segment and lifetime value
`booking`	`bookings`	`id`	`status NOT IN ('cancelled')`	Travel booking linking customer to destination
`destination`	`destinations`	`name`	—	Available travel destination
`supplier`	`suppliers`	`name`	`is_active = true`	Service supplier (hotel, transfer, activity)
`contact_history`	`contact_history`	`id`	—	Customer interaction record

Warning

active_filter_sql is critical. Without it, queries count cancelled bookings, inactive suppliers, etc. Always think about what "active" means for each entity.

Step 2: Add Metrics

For each number users might ask about:

Go to Settings → Semantic Model → Metrics
Define the SQL expression, unit, and format

Key fields:

sql_expression — The actual SQL. Use CASE WHEN for conditional aggregation.
base_table — Which table this metric queries.
unit — number, currency, or percentage.
format_hint — How to display: ,0 for integers, ,2 EUR for euros, 0.0% for percentages.
is_additive — Can this metric be summed across time periods? Revenue: yes. Average booking value: no.

IntelliTravel example:

name	sql_expression	unit	is_additive
`total_bookings`	`COUNT(*)`	number	true
`revenue`	`SUM(CASE WHEN status IN ('confirmed','completed') THEN total_price_eur END)`	currency	true
`avg_booking_value`	`SUM(...) / NULLIF(COUNT(...), 0)`	currency	false
`cancellation_rate`	`COUNT() FILTER (WHERE status = 'cancelled') 100.0 / NULLIF(COUNT(*), 0)`	percentage	false

Step 3: Add Dimensions

Dimensions define how metrics get sliced:

Time dimensions — booking_date, travel_date. Enable day/week/month/year roll-up.
Categorical dimensions — destination, booking_channel, customer_segment, supplier.

For each dimension, specify the table and column. The description field is LLM context — explain what the dimension means and what kind of grouping makes sense.

Step 4: Define Vocabulary

Vocabulary terms bridge business language to SQL. This is where domain expertise matters most.

Good vocabulary entries:

Term	What it means	SQL filter
"high-value customer"	LTV > 5000 EUR + 2+ bookings	`ltv_eur > 5000 AND id IN (SELECT ... HAVING COUNT(*) >= 2)`
"at-risk customer"	2+ past bookings, none in 14 months	Complex subquery on booking recency
"season"	High/Shoulder/Low demand period	`CASE WHEN EXTRACT(MONTH ...) IN (6,7,8) THEN 'High' ...`

Each entry includes example questions — the phrases that should trigger this term:

"high-value customer" → ["Show me all high-value customers", "Which customers spend the most"]

Step 5: Test

Create a Raw Query widget on a test dashboard
Ask questions that use your vocabulary terms
Verify the generated SQL is correct
Check chart rendering

Common issues:

Wrong base_table — Metric queries the wrong table
Missing active_filter_sql — Includes inactive/cancelled records
Overly complex sql_expression — Simplify; the LLM handles complexity
Vocabulary too narrow — Add more example questions per term

What Happens Next

When you save the semantic model:

schema_context is regenerated for the LLM
Existing cached queries remain until TTL expires
Metabase and widgets that reference changed metrics are flagged for review