Agent Skill · Databricks

databricks-aibi-dashboards

Create Databricks AI/BI dashboards. Must use when creating, updating, or deploying Lakeview dashboards as Databricks Dashboard have a unique json structure. CRITICAL: You MUST test ALL SQL queries via CLI BEFORE deploying. Follow guidelines strictly.

View SKILL.md on GitHub → Source repository Provider profile

Provider: Databricks Path in repo: experimental/databricks-aibi-dashboards/SKILL.md

Skill body

AI/BI Dashboard Skill

Create Databricks AI/BI dashboards (formerly Lakeview dashboards). A dashboard should be showing something relevant for a human, typically some KPI on the top, and based on the story, some graph (often temporal), and we see “something happens”. Follow these guidelines strictly.

Quick Reference

Task	Command
List warehouses	`databricks warehouses list`
List tables	`databricks experimental aitools tools query --warehouse WH "SHOW TABLES IN catalog.schema"`
Get schema	`databricks experimental aitools tools discover-schema catalog.schema.table1 catalog.schema.table2`
Test query	`databricks experimental aitools tools query --warehouse WH "SELECT..."`
Create dashboard	`databricks lakeview create --display-name "X" --warehouse-id "WH" --dataset-catalog CATALOG --dataset-schema SCHEMA --serialized-dashboard "$(cat file.json)" --json '{"parent_path": "/Workspace/Users/<you>/path"}'` — `--dataset-catalog` / `--dataset-schema` are flag-only (REQUIRED; CLI silently drops them if put in `--json`); `parent_path` is JSON-only (no flag). Queries must use bare table names.
Update dashboard	`databricks lakeview update DASHBOARD_ID --serialized-dashboard "$(cat file.json)"`
Publish	`databricks lakeview publish DASHBOARD_ID --warehouse-id WH`
Delete	`databricks lakeview trash DASHBOARD_ID`

--warehouse flag: if databricks experimental aitools tools query --warehouse WH "..." fails with unknown flag: --warehouse on your CLI version, set DATABRICKS_WAREHOUSE_ID=WH in the environment instead and drop the flag — the command auto-picks it from there.

Wrong version = broken widget! This is the #1 cause of dashboard errors.

Widget Type	Version	Notes
`counter`	2	KPI cards
`table`	2	Data tables
`bar`, `line`, `area`, `pie`, `scatter`	3	Charts
`combo`, `choropleth-map`	1	Advanced charts
`filter-*`	2	All filter types

NEW DASHBOARD CREATION WORKFLOW

You MUST test ALL SQL queries via CLI BEFORE deploying. Follow the overall logic in these steps for new dashboard - Skipping validation causes broken dashboards.

Step 1: Get Warehouse ID if not already known

# List warehouses to find one for SQL execution
databricks warehouses list

Step 2: Discover Table Schemas and existing data pattern

A good dashboard comes from knowing the data first. Spend time here — the exploration drives design decisions in Step 4 (which widgets, which filters, which groupings).

Use discover-schema as the default — one call returns columns, types, sample rows, null counts, and row count. If you only know the schema, list tables first with query "SHOW TABLES IN ...".

databricks experimental aitools tools discover-schema catalog.schema.orders catalog.schema.customers

Sample rows alone don’t tell you what to build. you can write aggregate SQL through databricks experimental aitools tools query --warehouse <WH> "..." to probe typically:

Cardinality of candidate grouping columns → decides chart color-group vs. table (≤8 distinct values for charts, see Cardinality & Readability below).
Top categorical values → populates filter options and chart legends meaningfully.
Numeric distribution (min/max/avg/percentiles) → decides KPI with delta vs. trend chart (flat metrics shouldn’t be line charts, see Data Variance Considerations below).
Trend viability at daily/weekly/monthly grain → picks the right trend granularity.
Story confirmation — run the aggregations you plan to put in the dashboard and check they’re not flat, empty, or uninteresting. Fix the query or adjust the story before moving on.

submit() { databricks api post /api/2.0/sql/statements --json "$(jq -nc --arg w "$1" --arg s "$2" '{warehouse_id:$w,statement:$s,wait_timeout:"0s",on_wait_timeout:"CONTINUE"}')" | jq -r .statement_id; }
SIDS=(); for q in "$@"; do SIDS+=( "$(submit "$WH" "$q")" ); done
for s in "${SIDS[@]}"; do databricks api get "/api/2.0/sql/statements/$s" | jq '{state:.status.state, rows:.result.data_array}'; done
# cancel: databricks api post "/api/2.0/sql/statements/$SID/cancel"

Dashboard queries are different — inside the dashboard JSON, the FROM clause must reference ONLY the table name, with no catalog or schema prefix:

✅ Correct: FROM trips

❌ Wrong: FROM nyctaxi.trips

❌ Wrong: FROM samples.nyctaxi.trips

The catalog and schema are supplied separately via the --dataset-catalog and --dataset-schema flags when you run databricks lakeview create. These flags do NOT rewrite the query — they only fill in the catalog/schema when the query omits them. If you hardcode a catalog or schema in the FROM clause, the flags are ignored for that query and the dashboard won’t be portable across environments.

Step 3: Verify Data Matches Story

The datasets.querylines in the dashboard json (see example below) must be tested to ensure

Before finalizing, run the SQL Queries you intend to add in each dataset to confirm that they run properly and that the result are valid. This is crucial, as the widget defined in the json will use the query field output to render the visualization. The value should also make sense at a business level. Remember that for the filter to work, the query should have the field available (so typically group by the filter field)

If values don’t match expectations, ensure the query is correct, fix the data if you can, or adjust the story before creating the dashboard.

Step 4: Plan Dashboard Structure

Before writing JSON, plan your dashboard:

You must know the expected specific JSON structure. For this, Read reference files: references/1-widget-specifications.md, references/3-filters.md, references/4-examples.md
Think: What widgets? Map each visualization to a dataset: | Widget | Type | Dataset | Has filter field? | |——–|——|———|——————-| | Revenue KPI | counter | ds_sales | ✓ date, region | | Trend Chart | line | ds_sales | ✓ date, region | | Top Products | table | ds_products | ✗ no date | …
What filters? For each filter, verify ALL datasets you want filtered contain the filter field.

Filters only affect datasets that have the filter field. A pre-aggregated table without dates WON’T be date-filtered.
Build the dashboard JSON as a local working file (intermediate step, not the deliverable).

Step 5: Deploy

Now deploy the JSON to the workspace. Run databricks lakeview create (below). Your task is not complete until this command succeeds and returns a dashboard ID — the JSON file alone is an intermediate working artifact.

After deploying, the same lakeview subcommands manage the dashboard’s lifecycle (list, get, update, publish, trash).

# Deploy: creates the dashboard in the workspace and returns a dashboard ID.
# Canonical form — MIX flags + --json. Each field has exactly ONE valid place:
#   --dataset-catalog / --dataset-schema : FLAG-ONLY (REQUIRED — no JSON field).
#       The CLI silently warns "unknown field" and drops them if put in --json,
#       leaving every dataset query unable to resolve its catalog.schema.
#   parent_path : JSON-ONLY (no flag). Without it, dashboard lands at
#       /Users/<you>/<display-name>.
#   display_name / warehouse_id / serialized_dashboard : either form works;
#       prefer flags for readability.
# Queries inside dashboard.json MUST use bare table names ("FROM trips", never
# "FROM schema.trips" or "FROM catalog.schema.trips") — --dataset-catalog and
# --dataset-schema only fill in missing parts, they do NOT rewrite hardcoded
# prefixes.
databricks lakeview create \
  --display-name "My Dashboard" \
  --warehouse-id "abc123def456" \
  --dataset-catalog "my_catalog" \
  --dataset-schema "my_schema" \
  --serialized-dashboard "$(cat dashboard.json)" \
  --json '{"parent_path": "/Workspace/Users/[email protected]/dashboards"}'

# List all dashboards
databricks lakeview list

# Get dashboard details
databricks lakeview get DASHBOARD_ID

# Update a dashboard
databricks lakeview update DASHBOARD_ID --serialized-dashboard "$(cat dashboard.json)"

# Publish a dashboard
databricks lakeview publish DASHBOARD_ID --warehouse-id WAREHOUSE_ID

# Unpublish a dashboard
databricks lakeview unpublish DASHBOARD_ID

# Delete (trash) a dashboard
databricks lakeview trash DASHBOARD_ID

# By default, after creation, tag dashboards to track resources created with this skill
databricks workspace-entity-tag-assignments create-tag-assignment \
  dashboards DASHBOARD_ID aidevkit_project --tag-value ai-dev-kit

JSON Structure (Required Skeleton)

Every dashboard’s serialized_dashboard content must follow this exact structure:

{
  "datasets": [
    {
      "name": "ds_x",
      "displayName": "Dataset X",
      "queryLines": ["SELECT col1, col2 ", "FROM my_table"]
    }
  ],
  "pages": [
    {
      "name": "main",
      "displayName": "Main",
      "pageType": "PAGE_TYPE_CANVAS",
      "layout": [
        {"widget": {/* INLINE widget definition */}, "position": {"x":0,"y":0,"width":2,"height":3}}
      ]
    }
  ]
}

Structural rules (violations cause “failed to parse serialized dashboard”):

queryLines: Array of strings, NOT "query": "string"
Widgets: INLINE in layout[].widget, NOT a separate "widgets" array
pageType: Required on every page (PAGE_TYPE_CANVAS or PAGE_TYPE_GLOBAL_FILTERS)
Query binding: query.fields[].name must exactly match encodings.*.fieldName

Linking a Genie Space (Optional)

To add an “Ask Genie” button to the dashboard, or to link a genie space/room with an ID, add uiSettings.genieSpace to the JSON:

{
  "datasets": [...],
  "pages": [...],
  "uiSettings": {
    "genieSpace": {
      "isEnabled": true,
      "overrideId": "your-genie-space-id-here",
      "enablementMode": "ENABLED"
    }
  }
}

Genie is NOT a widget. Link via uiSettings.genieSpace only. There is no "widgetType": "assistant".

Design Best Practices

Apply unless user specifies otherwise:

Global date filter: When data has temporal columns, add a date range filter. Most dashboards need time-based filtering.
KPI time bounds: Use time-bounded metrics that enable period comparison (MoM, YoY). Unbounded “all-time” totals are less actionable.
Value formatting: Format values based on their meaning — currency with symbol, percentages with %, large numbers compacted (K/M/B).
Chart selection: Match cardinality to chart type. Few distinct values → bar with color grouping (or pie if you really want a snapshot); many values → table.

Reference Files

What are you building?	Reference
Any widget (text, counter, table, chart)	references/1-widget-specifications.md
Advanced charts (area, scatter/Bubble, combo (Line+Bar), Choropleth map)	references/2-advanced-widget-specifications.md
Dashboard with filters (global or page-level)	references/3-filters.md
Need a complete working template to adapt	references/4-examples.md
Debugging a broken dashboard	references/5-troubleshooting.md

Implementation Guidelines

1) DATASET ARCHITECTURE

One dataset per domain (e.g., orders, customers, products). Datasets shared across widgets benefit from the same filters.
Exactly ONE valid SQL query per dataset (no multiple queries separated by ;)
Queries must use bare table names only — no catalog, no schema prefix. Example: FROM orders, never FROM gold.orders or FROM main.gold.orders. The catalog and schema come from the --dataset-catalog and --dataset-schema flags at creation time. These flags only fill in missing parts — they do NOT override any catalog/schema written in the query.
SELECT must include all dimensions needed by widgets and all derived columns via AS aliases
Put ALL business logic (CASE/WHEN, COALESCE, ratios) into the dataset SELECT with explicit aliases
Contract rule: Every widget fieldName must exactly match a dataset column or alias
Add ORDER BY when visualization depends on data order:
- Time series: ORDER BY date for chronological display
- Rankings/Top-N: ORDER BY metric DESC LIMIT 10 for “Top 10” charts
- Categorical charts: ORDER BY metric DESC to show largest values first

CRITICAL: Field Name Matching Rule The name in query.fields MUST exactly match the fieldName in encodings. If they don’t match, the widget shows “no selected fields to visualize” error!

Correct pattern for aggregations:

// In query.fields:
{"name": "sum(spend)", "expression": "SUM(`spend`)"}

// In encodings (must match!):
{"fieldName": "sum(spend)", "displayName": "Total Spend"}

WRONG - names don’t match:

// In query.fields:
{"name": "spend", "expression": "SUM(`spend`)"}  // name is "spend"

// In encodings:
{"fieldName": "sum(spend)", ...}  // ERROR: "sum(spend)" ≠ "spend"

Allowed expressions in widget queries (you CANNOT use CAST or other SQL in expressions):

{"name": "[sum|avg|count|countdistinct|min|max](col)", "expression": "[SUM|AVG|COUNT|COUNT(DISTINCT)|MIN|MAX](`col`)"}
{"name": "[daily|weekly|monthly](date)", "expression": "DATE_TRUNC(\"[DAY|WEEK|MONTH]\", `date`)"}
{"name": "field", "expression": "`field`"}

If you need conditional logic or multi-field formulas, compute a derived column in the dataset SQL first.

3) SPARK SQL PATTERNS

Date math: date_sub(current_date(), N) for days, add_months(current_date(), -N) for months
Date truncation: DATE_TRUNC('DAY'|'WEEK'|'MONTH'|'QUARTER'|'YEAR', column)
AVOID INTERVAL syntax - use functions instead

4) LAYOUT (12-Column Grid, NO GAPS)

Every page must include "layoutVersion": "GRID_V1" alongside pageType.

{
  "name": "overview",
  "displayName": "Overview",
  "pageType": "PAGE_TYPE_CANVAS",
  "layoutVersion": "GRID_V1",
  "layout": [...]
}

Each widget has a position: {"x": 0, "y": 0, "width": 4, "height": 4}

CRITICAL: Each row must fill width=12 exactly. No gaps allowed.

CORRECT:                            WRONG:
y=0: [w=12]                         y=0: [w=8]____  ← gap!
y=1: [w=4][w=4][w=4]  ← fills 12    y=1: [w=2][w=2][w=2][w=2]__  ← gap!
y=4: [w=6][w=6]       ← fills 12

Recommended widget sizes:

Widget Type	Width	Height	Notes
Text header	12	1	Full width; use SEPARATE widgets for title and subtitle
Counter/KPI	4	3-4	NEVER height=2 - too cramped!
Line/Bar/Area chart	6	5-6	Pair side-by-side to fill row
Pie chart	6	5-6	Needs space for legend
Full-width chart	12	5-7	For detailed time series
Table	12	5-8	Full width for readability

Standard dashboard structure:

y=0:  Title (w=12, h=1) - Dashboard title (use separate widget!)
y=1:  Subtitle (w=12, h=1) - Description (use separate widget!)
y=2:  KPIs (w=4 each, h=3) - 3 key metrics side-by-side
y=5:  Section header (w=12, h=1) - "Trends" or similar
y=6:  Charts (w=6 each, h=5) - Two charts side-by-side
y=11: Section header (w=12, h=1) - "Details"
y=12: Table (w=12, h=6) - Detailed data

5) CARDINALITY & READABILITY (CRITICAL)

Dashboard readability depends on limiting distinct values:

Dimension Type	Max Values	Examples
Chart color/groups	3-8	4 regions, 5 product lines, 3 tiers
Filters	4-15	8 countries, 5 channels
High cardinality	Table only	customer_id, order_id, SKU

Before creating any chart with color/grouping:

Check column cardinality via discover-schema or a COUNT DISTINCT query
If >10 distinct values, aggregate to higher level OR use TOP-N + “Other” bucket
For high-cardinality dimensions, use a table widget instead of a chart

6) QUALITY CHECKLIST

Before deploying, verify:

All widget names use only alphanumeric + hyphens + underscores
Every page has "layoutVersion": "GRID_V1"
All rows sum to width=12 with no gaps
KPIs use height 3-4, charts use height 5-6
Chart dimensions have reasonable cardinality (≤8 for colors/groups)
All widget fieldNames match dataset columns exactly
Field name in query.fields matches fieldName in encodings exactly (e.g., both "sum(spend)")
Counter datasets: use disaggregated: true for 1-row datasets, disaggregated: false with aggregation for multi-row
Percent values must be 0-1 for number-percent format (0.865 displays as “86.5%”, don’t forget to set the format). If data is 0-100, either divide by 100 in SQL or use number format instead.
SQL uses Spark syntax (date_sub, not INTERVAL)
All SQL queries tested via CLI and return expected data
Every dataset you want filtered MUST contain the filter field — filters only affect datasets with that column in their query

Data Variance Considerations

Before creating trend charts, check if the metric has enough variance to visualize meaningfully:

SELECT MIN(metric), MAX(metric), MAX(metric) - MIN(metric) as range FROM dataset

If the range is very small relative to the scale (e.g., 83-89% on a 0-100 scale), the chart will appear nearly flat. Consider:

Showing as KPI with delta/comparison instead of chart
Using a table to display exact values
Adjusting the visualization to focus on the variance

databricks-unity-catalog - for querying the underlying data and system tables
databricks-spark-declarative-pipelines - for building the data pipelines that feed dashboards
databricks-jobs - for scheduling dashboard data refreshes

databricks-aibi-dashboards

Skill body

AI/BI Dashboard Skill

Quick Reference

CRITICAL: Widget Version Requirements

NEW DASHBOARD CREATION WORKFLOW

Step 1: Get Warehouse ID if not already known

Step 2: Discover Table Schemas and existing data pattern

Step 3: Verify Data Matches Story

Step 4: Plan Dashboard Structure

Step 5: Deploy

JSON Structure (Required Skeleton)

Linking a Genie Space (Optional)

Design Best Practices

Reference Files

Implementation Guidelines

1) DATASET ARCHITECTURE

2) WIDGET FIELD EXPRESSIONS

3) SPARK SQL PATTERNS

4) LAYOUT (12-Column Grid, NO GAPS)

5) CARDINALITY & READABILITY (CRITICAL)

6) QUALITY CHECKLIST

Data Variance Considerations

databricks-aibi-dashboards

Skill body

AI/BI Dashboard Skill

Quick Reference

CRITICAL: Widget Version Requirements

NEW DASHBOARD CREATION WORKFLOW

Step 1: Get Warehouse ID if not already known

Step 2: Discover Table Schemas and existing data pattern

Step 3: Verify Data Matches Story

Step 4: Plan Dashboard Structure

Step 5: Deploy

JSON Structure (Required Skeleton)

Linking a Genie Space (Optional)

Design Best Practices

Reference Files

Implementation Guidelines

1) DATASET ARCHITECTURE

2) WIDGET FIELD EXPRESSIONS

3) SPARK SQL PATTERNS

4) LAYOUT (12-Column Grid, NO GAPS)

5) CARDINALITY & READABILITY (CRITICAL)

6) QUALITY CHECKLIST

Data Variance Considerations

Related Skills