Agent Skill · SigNoz

signoz-generating-queries

Generate, write, or run an ad-hoc query against SigNoz observability data — metrics, logs, traces, or exceptions — without wrapping it in a dashboard panel or alert. Make sure to use this skill whenever the user asks "show me error rates", "query logs for timeout errors", "what's the p99 latency for the cart service", "how many requests hit the payment endpoint", "find slow traces", "errors in the last hour", or otherwise asks an exploratory question that needs live observability data — even if they don't say "query" or "search" explicitly.

Provider: SigNoz
Path in repo: plugins/signoz/skills/signoz-generating-queries/SKILL.md

Skill body

Query Generate

Prerequisites

This skill calls SigNoz MCP server tools heavily (signoz:signoz_execute_builder_query, signoz:signoz_query_metrics, signoz:signoz_search_logs, signoz:signoz_search_traces, signoz:signoz_aggregate_logs, signoz:signoz_aggregate_traces, signoz:signoz_get_field_keys, signoz:signoz_get_field_values, signoz:signoz_list_metrics, signoz:signoz_list_services, signoz:signoz_get_service_top_operations, signoz:signoz_get_trace_details). Before running the workflow, confirm the signoz:signoz_* tools are available. If they are not, run signoz-mcp-setup first to initialize or repair the MCP connection. Do not fall back to raw HTTP calls or fabricate query results without the MCP tools.

When to use

Use this skill when the user asks to:

Do NOT use when:

Instructions

Step 1: Determine the signal type

Map the user’s intent to the right signal:

| User intent | Signal | Why |
| --- | --- | --- |
| Error rate, latency, throughput, request count | metrics (preferred) or traces | Metrics are pre-aggregated and fastest. Use traces if the user needs per-request detail or no matching metric exists. |
| p50/p75/p90/p95/p99 latency | metrics (histogram) or traces (aggregate on durationNano) | Prefer metrics if a histogram metric exists (e.g., signoz_latency_bucket). Fall back to trace aggregation. |
| Find specific log entries, error messages, stack traces | logs | Text search, pattern matching, severity filtering. |
| Find specific traces, slow requests, error spans | traces | Per-request detail, span attributes, duration filtering. |
| Infrastructure metrics (CPU, memory, disk, network) | metrics | Always metrics for resource utilization. |
| “How many X per Y” (count/rate grouped by dimension) | traces or logs (aggregate) | Use signoz:signoz_aggregate_traces or signoz:signoz_aggregate_logs for grouped counts. |

If the signal is genuinely ambiguous, ask the user before proceeding. The host application decides how the question is surfaced (e.g. a structured clarification tool or an inline <assistant_question> tag) — follow the host’s UI rendering rules.
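
The mapping above can be sketched as a small keyword lookup. This is only an illustration: the keyword lists and the `guess_signal` helper are hypothetical, not part of the skill or of SigNoz, and naive substring matching will misfire on words such as "catalog". The point it shows is the last rule: when zero or several signals match, return nothing and ask the user.

```python
from typing import Optional

# Hypothetical keyword hints, loosely following the intent table above.
SIGNAL_HINTS = [
    (("cpu", "memory", "disk", "network"), "metrics"),
    (("log", "stack trace", "error message"), "logs"),
    (("trace", "span"), "traces"),
    (("error rate", "latency", "throughput", "request count",
      "p50", "p75", "p90", "p95", "p99"), "metrics"),
]


def guess_signal(question: str) -> Optional[str]:
    """Return a signal name, or None when the intent is ambiguous."""
    text = question.lower()
    matches = {
        signal
        for keywords, signal in SIGNAL_HINTS
        if any(keyword in text for keyword in keywords)
    }
    # Zero or several candidates: do not guess, ask the user instead.
    return matches.pop() if len(matches) == 1 else None
```

A question such as "latency in logs and traces" matches several signals and therefore returns `None`, which is the cue to ask for clarification.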

Step 2: Discover available data

Always discover before querying. Use only names returned by tools — never guess from training knowledge.

Run discovery calls in parallel where possible:

If the user already provides exact field names, service names, or metric names from context (e.g., from a dashboard or @mention), skip redundant discovery.
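
A minimal sketch of running independent discovery calls concurrently, assuming an async client that exposes some `call_tool(name, args)` coroutine. That callable, the `discover` helper, and the empty argument object passed to `signoz:signoz_list_services` are assumptions made for illustration; only the tool names and the `searchText` parameter come from this skill.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumed shape of an MCP client call: (tool name, arguments) -> awaitable result.
ToolCaller = Callable[[str, dict], Awaitable[Any]]


async def discover(call_tool: ToolCaller, search_text: str) -> dict:
    """Run independent discovery calls concurrently and collect the results."""
    services, metrics = await asyncio.gather(
        call_tool("signoz:signoz_list_services", {}),  # arguments assumed empty
        call_tool("signoz:signoz_list_metrics", {"searchText": search_text}),
    )
    return {"services": services, "metrics": metrics}


async def fake_call_tool(name: str, args: dict) -> Any:
    """Stand-in for a real MCP client, only here to make the sketch runnable."""
    await asyncio.sleep(0)
    return {"tool": name, "args": args}


result = asyncio.run(discover(fake_call_tool, "error"))
```

Later steps should use only the names that come back in `result`, never names recalled from memory.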

Step 3: Choose the right tool

Use the simplest tool that answers the question:

| Question type | Tool | When to use |
| --- | --- | --- |
| Metric time series or scalar | signoz:signoz_query_metrics | Any metrics query. Handles aggregation defaults automatically. Supports formulas via formula + formulaQueries params. |
| Log search (find matching entries) | signoz:signoz_search_logs | Finding specific log lines. Use searchText for body text, query for field filters, severity for level filtering. |
| Trace search (find matching spans) | signoz:signoz_search_traces | Finding specific traces/spans. Use service, operation, error, minDuration/maxDuration shortcuts plus query for field filters. |
| Log aggregation (count, avg, percentiles) | signoz:signoz_aggregate_logs | “How many errors?”, “error count by service”, “p99 response time from logs”. Set requestType to scalar for totals or time_series for trends. |
| Trace aggregation (count, avg, percentiles) | signoz:signoz_aggregate_traces | “p99 latency for checkout”, “error count per operation”, “request rate by endpoint”. Set requestType to scalar for totals or time_series for trends. |
| Complex multi-query or formula | signoz:signoz_execute_builder_query | Only when the simpler tools above cannot express the query, e.g., joining multiple data sources, complex filter expressions, or queries needing the full Query Builder v5 schema. Read signoz://traces/query-builder-guide before using. |
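
The "simplest tool first" rule can be sketched as a lookup with a fallback. The tool names are taken from the table; the `(signal, kind)` keys and the `choose_tool` helper are hypothetical labels introduced for this example.

```python
# Simple tools keyed by (signal, kind of question); keys are illustrative labels.
TOOL_BY_QUESTION = {
    ("metrics", "query"): "signoz:signoz_query_metrics",
    ("logs", "search"): "signoz:signoz_search_logs",
    ("traces", "search"): "signoz:signoz_search_traces",
    ("logs", "aggregate"): "signoz:signoz_aggregate_logs",
    ("traces", "aggregate"): "signoz:signoz_aggregate_traces",
}

# Used only when none of the simpler tools can express the query.
FALLBACK_TOOL = "signoz:signoz_execute_builder_query"


def choose_tool(signal: str, kind: str) -> str:
    """Pick the simplest matching tool, falling back to the builder query."""
    return TOOL_BY_QUESTION.get((signal, kind), FALLBACK_TOOL)
```

Keeping the builder query as the default of the lookup, rather than as an entry in the table, makes it the last resort by construction.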

requestType decision for aggregations:
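
The tool descriptions state the rule of thumb: scalar for totals, time_series for trends. A rough sketch of that decision follows; the trend phrases and the `choose_request_type` helper are assumptions for illustration, not an exhaustive or official rule.

```python
# Hypothetical phrases that suggest the user wants a trend rather than a total.
TREND_WORDS = ("trend", "over time", "spike", "when did", "per minute", "timeline")


def choose_request_type(question: str) -> str:
    """Return "time_series" for trend-style questions, otherwise "scalar"."""
    text = question.lower()
    if any(word in text for word in TREND_WORDS):
        return "time_series"
    return "scalar"
```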

Step 4: Execute the query

Step 5: Handle results

Data returned:

No data returned — apply three-way distinction:

  1. Healthy zero: The query ran successfully but the count is zero. Say so: “No errors found for checkout-service in the last hour — error count is zero.”
  2. No data in range: The field/metric exists but no data points fall in the time window. Suggest expanding: “No data in the last hour. Try a wider range?”
  3. Missing instrumentation: The metric, field, or service doesn’t exist in discovery results. Say what’s missing and suggest how to instrument.
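
The three-way distinction can be sketched as a small classifier. It assumes two inputs that the agent would have to gather itself: whether the name appeared in discovery results, and an unfiltered "baseline" count for the same service and time window. Both inputs and the `classify_empty` helper are hypothetical.

```python
from enum import Enum


class EmptyResult(Enum):
    HEALTHY_ZERO = "healthy_zero"
    NO_DATA_IN_RANGE = "no_data_in_range"
    MISSING_INSTRUMENTATION = "missing_instrumentation"


def classify_empty(exists_in_discovery: bool, baseline_points_in_window: int) -> EmptyResult:
    """Classify an empty query result.

    exists_in_discovery: the metric, field, or service was returned by discovery.
    baseline_points_in_window: count from an unfiltered query over the same
        window (an assumed extra query, not something the skill prescribes).
    """
    if not exists_in_discovery:
        return EmptyResult.MISSING_INSTRUMENTATION
    if baseline_points_in_window == 0:
        return EmptyResult.NO_DATA_IN_RANGE
    # Data exists in the window, so an empty filtered result is a real zero.
    return EmptyResult.HEALTHY_ZERO
```

For example, a service that handled requests in the window but produced no matching error spans classifies as a healthy zero, while a service that reported nothing at all in the window classifies as no data in range.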

Drill-down:

Guardrails

Examples

User: “Show me the error rate for the checkout service in the last hour”

Agent:

  1. Calls signoz:signoz_list_metrics(searchText: "error") — finds signoz_calls_total metric.
  2. Calls signoz:signoz_query_metrics with metricName: "signoz_calls_total", filter: "service.name = 'checkout-service'" (the service name as returned by discovery), formula: "A / B * 100" (A = error count, B = total count), timeRange: "1h".
  3. Presents: “Error rate for checkout-service: 2.3% over the last hour (14:00–15:00 UTC). 47 errors out of 2,041 total requests.”
  4. Offers drill-down: “Want me to check which operations have the highest error rate?”
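
The formula "A / B * 100" in this example is plain arithmetic on the two counts. A small check using the numbers quoted in the example response (the `error_rate_percent` helper is hypothetical):

```python
def error_rate_percent(errors: int, total: int) -> float:
    """Compute A / B * 100, where A is the error count and B the total count."""
    if total == 0:
        # No requests at all is a "no data" case, not a 0% error rate.
        raise ValueError("total request count is zero; error rate is undefined")
    return errors / total * 100


rate = error_rate_percent(47, 2041)
print(f"{rate:.1f}%")  # prints 2.3%
```

Guarding against a zero total keeps a window with no traffic from being reported as a healthy 0% error rate.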

User: “Find timeout errors in logs”

Agent:

  1. Calls signoz:signoz_search_logs(searchText: "timeout", severity: "ERROR", timeRange: "1h").
  2. Presents: “Found 23 timeout errors in the last hour. Most are from payment-service (18) and gateway (5). Common pattern: ‘context deadline exceeded’ in gRPC calls to downstream-api.”
  3. Offers: “Want me to check traces for these services to see where the timeouts occur?”

User: “What’s the p99 latency for the cart service?”

Agent:

  1. Calls signoz:signoz_aggregate_traces(aggregation: "p99", aggregateOn: "durationNano", service: "cart-service", requestType: "scalar", timeRange: "1h").
  2. Presents: “p99 latency for cart-service: 1.2s over the last hour.”
  3. Offers: “Want me to break this down by operation or show the trend over time?”
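
Aggregating on durationNano yields a raw number that has to be converted before it is shown to the user. A minimal formatting sketch, assuming (from the field name) that the value is in nanoseconds; the `format_duration_ns` helper is hypothetical.

```python
def format_duration_ns(duration_ns: float) -> str:
    """Render a nanosecond duration in the largest unit that keeps it >= 1."""
    if duration_ns >= 1_000_000_000:
        return f"{duration_ns / 1_000_000_000:.1f}s"
    if duration_ns >= 1_000_000:
        return f"{duration_ns / 1_000_000:.1f}ms"
    if duration_ns >= 1_000:
        return f"{duration_ns / 1_000:.1f}µs"
    return f"{duration_ns:.0f}ns"


print(format_duration_ns(1_200_000_000))  # prints 1.2s
```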

User: “When did errors spike for the frontend?”

Agent:

  1. Calls signoz:signoz_aggregate_traces(aggregation: "count", error: "true", service: "frontend", requestType: "time_series", timeRange: "6h").
  2. Presents: “Error count for frontend over the last 6 hours. Spike at 11:30 UTC — error count jumped from ~5/min to ~45/min, returning to baseline by 12:15.”
  3. Offers: “Want me to check what error types appeared during the spike?”
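
Locating a spike in a time_series result can be sketched as a comparison against the series median. The `(label, count)` point shape, the factor of 3, and the sample values below are all made up for illustration; a real response will have its own structure.

```python
from statistics import median
from typing import List, Optional, Tuple


def find_spike(points: List[Tuple[str, float]],
               factor: float = 3.0) -> Optional[Tuple[str, str, float]]:
    """Return (first_label, last_label, baseline) for points above factor * median."""
    if not points:
        return None
    baseline = median(count for _, count in points)
    threshold = baseline * factor
    above = [label for label, count in points if count > threshold]
    if not above:
        return None
    return above[0], above[-1], baseline


# Illustrative data only: errors per minute sampled every 15 minutes.
sample = [("11:00", 5), ("11:15", 6), ("11:30", 45), ("11:45", 40),
          ("12:00", 30), ("12:15", 5), ("12:30", 4)]
spike = find_spike(sample)
```

Using the median rather than the mean keeps the baseline from being pulled upward by the spike itself.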