Guides

Building a Semantic Model

Step-by-step guide to creating a semantic model for your vertical.

The semantic model is what makes natural language queries work. This guide walks through building one, using IntelliTravel as a real example.

Step 1: Define Entities

Entities map your domain tables to natural language names. For each table that users might ask about:

  1. Go to Settings → Semantic Model → Entities
  2. Add the table with a clear name and description

IntelliTravel example:

nametable_namedisplay_columnactive_filter_sqldescription
customercustomersnameTravel agency customer with segment and lifetime value
bookingbookingsidstatus NOT IN ('cancelled')Travel booking linking customer to destination
destinationdestinationsnameAvailable travel destination
suppliersuppliersnameis_active = trueService supplier (hotel, transfer, activity)
contact_historycontact_historyidCustomer interaction record

Warning

active_filter_sql is critical. Without it, queries count cancelled bookings, inactive suppliers, etc. Always think about what "active" means for each entity.

Step 2: Add Metrics

For each number users might ask about:

  1. Go to Settings → Semantic Model → Metrics
  2. Define the SQL expression, unit, and format

Key fields:

  • sql_expression — The actual SQL. Use CASE WHEN for conditional aggregation.
  • base_table — Which table this metric queries.
  • unitnumber, currency, or percentage.
  • format_hint — How to display: ,0 for integers, ,2 EUR for euros, 0.0% for percentages.
  • is_additive — Can this metric be summed across time periods? Revenue: yes. Average booking value: no.

IntelliTravel example:

namesql_expressionunitis_additive
total_bookingsCOUNT(*)numbertrue
revenueSUM(CASE WHEN status IN ('confirmed','completed') THEN total_price_eur END)currencytrue
avg_booking_valueSUM(...) / NULLIF(COUNT(...), 0)currencyfalse
cancellation_rateCOUNT(*) FILTER (WHERE status = 'cancelled') * 100.0 / NULLIF(COUNT(*), 0)percentagefalse

Step 3: Add Dimensions

Dimensions define how metrics get sliced:

  • Time dimensionsbooking_date, travel_date. Enable day/week/month/year roll-up.
  • Categorical dimensionsdestination, booking_channel, customer_segment, supplier.

For each dimension, specify the table and column. The description field is LLM context — explain what the dimension means and what kind of grouping makes sense.

Step 4: Define Vocabulary

Vocabulary terms bridge business language to SQL. This is where domain expertise matters most.

Good vocabulary entries:

TermWhat it meansSQL filter
"high-value customer"LTV > 5000 EUR + 2+ bookingsltv_eur > 5000 AND id IN (SELECT ... HAVING COUNT(*) >= 2)
"at-risk customer"2+ past bookings, none in 14 monthsComplex subquery on booking recency
"season"High/Shoulder/Low demand periodCASE WHEN EXTRACT(MONTH ...) IN (6,7,8) THEN 'High' ...

Each entry includes example questions — the phrases that should trigger this term:

  • "high-value customer" → ["Show me all high-value customers", "Which customers spend the most"]

Step 5: Test

  1. Create a Raw Query widget on a test dashboard
  2. Ask questions that use your vocabulary terms
  3. Verify the generated SQL is correct
  4. Check chart rendering

Common issues:

  • Wrong base_table — Metric queries the wrong table
  • Missing active_filter_sql — Includes inactive/cancelled records
  • Overly complex sql_expression — Simplify; the LLM handles complexity
  • Vocabulary too narrow — Add more example questions per term

What Happens Next

When you save the semantic model:

  • schema_context is regenerated for the LLM
  • Existing cached queries remain until TTL expires
  • Metabase and widgets that reference changed metrics are flagged for review