# Telemetry Skill

Use this skill when you need to instrument a codebase with structured logs, send events to Telemetry, verify ingestion, and create a high-level dashboard that humans can use.

Telemetry is optimized for this loop:

1. An agent instruments the codebase and emits structured events.
2. Telemetry stores those events as queryable tables.
3. Humans review charts, tables, dashboards, and alerts to understand what is happening.

## Inputs you need

- A Telemetry API key
- Access to the codebase you should instrument
- Enough product context to identify the important workflows

## Your objective

Instrument the project so the team can answer questions like:

- What are the most important workflows and are they succeeding?
- Where is latency increasing?
- What failures or retries are happening?
- What volume, cost, or throughput trends matter?
- Which dashboard should a human open first to understand system health?

## Core workflow

1. Inspect the codebase and identify the main user-facing flows, background jobs, queues, cron jobs, and AI/tooling workflows.
2. Add structured logs around the key steps in those workflows.
3. Send at least one real or test event into Telemetry.
4. Verify that the data arrived with a query.
5. Create a high-level dashboard with charts and tables.
6. Report back with what was instrumented, which tables were created, and which dashboard views matter most.

## Logging conventions

### Prefer pragmatic table design

- Use one table per domain or workflow, not one table for the whole system.
- Prefer stable, boring table names in `snake_case`.
- Good examples: `http_requests`, `queue_jobs`, `agent_runs`, `llm_calls`, `billing_events`.

### Prefer structured fields over log blobs

- Use flat or lightly nested JSON.
- Prefer explicit fields over dumping long free-form strings.
- Keep field names in `snake_case`.
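
To see why this matters, compare a free-form message with the same information as explicit fields. The names and values below are illustrative, not a required schema:

```python
# Hard to query: the interesting values are trapped inside one string.
blob = {"message": "planner finished onboarding for ws_123 in 842ms (success)"}

# Easy to query: explicit snake_case fields that become columns.
structured = {
    "agent_name": "planner",
    "workflow": "onboarding",
    "workspace_id": "ws_123",
    "duration_ms": 842,
    "status": "success",
}
```

With the structured shape, `duration_ms` and `status` become columns you can aggregate and chart; with the blob, every question requires string parsing.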

### Include the important dimensions

Capture the signals that make a dashboard useful:

- `timestamp`
- identifiers like `user_id`, `workspace_id`, `request_id`, `job_id`, `trace_id`
- workflow names like `route`, `task_name`, `agent_name`, `tool_name`, `model`
- result fields like `status`, `outcome`, `error_type`, `error_message`
- numeric measures like `duration_ms`, `latency_ms`, `cost_usd`, `input_tokens`, `output_tokens`, `retry_count`
- environment fields like `environment`, `service`, `region`, `version`
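
A well-shaped event row covering these dimensions might look like this in Python. Every value is a placeholder; only the field names follow the lists above:

```python
from datetime import datetime, timezone

event = {
    # when
    "timestamp": datetime.now(timezone.utc).isoformat(),
    # identifiers
    "user_id": "usr_42",
    "workspace_id": "ws_123",
    "request_id": "req_abc",
    "trace_id": "trc_def",
    # workflow names
    "agent_name": "planner",
    "tool_name": "web_search",
    "model": "example-model",
    # result fields
    "status": "success",
    "error_type": None,
    # numeric measures
    "duration_ms": 842,
    "input_tokens": 1200,
    "output_tokens": 350,
    "retry_count": 0,
    # environment fields
    "environment": "production",
    "service": "api",
    "version": "1.4.2",
}
```

You rarely need every dimension on every table; pick the ones a dashboard for that workflow would actually group or filter by.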

### Emit events at meaningful boundaries

Prefer events for completed units of work:

- request finished
- job started or finished
- retry happened
- tool call completed
- AI generation completed
- payment succeeded or failed

Avoid giant payloads that try to capture everything about the system in one row.

## Telemetry APIs

### Ingest events

`POST https://api.telemetry.sh/log`

Headers:

- `Content-Type: application/json`
- `Authorization: <API_KEY>` or `Authorization: Bearer <API_KEY>`

Example:

```json
{
  "table": "agent_runs",
  "data": {
    "agent_name": "planner",
    "workflow": "onboarding",
    "status": "success",
    "duration_ms": 842,
    "workspace_id": "ws_123",
    "timestamp": "2026-03-28T18:30:00Z"
  }
}
```

Notes:

- Telemetry adds `timestamp` automatically if it is missing.
- Table names are normalized to lowercase `snake_case`.
- Null and empty values are pruned.
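
The ingest call can be sketched with only the Python standard library. The endpoint, headers, and payload shape come from this section; the API key and example event are placeholders:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder

def build_log_request(table, data, api_key=API_KEY):
    """Build the POST request for https://api.telemetry.sh/log."""
    body = json.dumps({"table": table, "data": data}).encode("utf-8")
    return urllib.request.Request(
        "https://api.telemetry.sh/log",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def send_event(table, data):
    """Send one event and return the HTTP status code."""
    with urllib.request.urlopen(build_log_request(table, data)) as resp:
        return resp.status

# Performs a real network call, so it only runs when executed directly:
if __name__ == "__main__":
    status = send_event(
        "agent_runs",
        {"agent_name": "planner", "status": "success", "duration_ms": 842},
    )
    print(status)
```
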

### Verify with SQL

`POST https://api.telemetry.sh/query`

Headers:

- `Content-Type: application/json`
- `Authorization: <API_KEY>` or `Authorization: Bearer <API_KEY>`

Example:

```json
{
  "query": "SELECT status, COUNT(*) AS runs FROM agent_runs GROUP BY status ORDER BY runs DESC",
  "realtime": true,
  "json": true
}
```

Use queries like this to confirm:

- the table exists
- the expected columns are present
- rows are arriving with reasonable values
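
A verification query can be sketched the same way. The endpoint, headers, and request body shape come from this section; the API key is a placeholder, and the sketch assumes the response body is JSON when `"json": true` is set:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder

def build_query_request(sql, api_key=API_KEY):
    """Build the POST request for https://api.telemetry.sh/query."""
    body = json.dumps({"query": sql, "realtime": True, "json": True}).encode("utf-8")
    return urllib.request.Request(
        "https://api.telemetry.sh/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def run_query(sql):
    """Run one SQL query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(sql)) as resp:
        return json.load(resp)

# Performs a real network call, so it only runs when executed directly:
if __name__ == "__main__":
    rows = run_query(
        "SELECT status, COUNT(*) AS runs FROM agent_runs "
        "GROUP BY status ORDER BY runs DESC"
    )
    print(rows)
```
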

### Create a dashboard

`POST https://api.telemetry.sh/dashboard`

Headers:

- `Content-Type: application/json`
- `Authorization: <API_KEY>` or `Authorization: Bearer <API_KEY>`

Example:

```json
{
  "name": "Agent Overview",
  "description": "High-level health for the most important automated workflows",
  "widgets": [
    {
      "title": "Runs Per Hour",
      "widget_type": "query",
      "config": {
        "querySql": "SELECT date_trunc('hour', timestamp_utc) AS hour, COUNT(*) AS runs FROM agent_runs GROUP BY hour ORDER BY hour ASC",
        "chartType": "Line Chart",
        "xAxis": "hour",
        "yAxis": "runs",
        "groupBy": null
      },
      "layout": { "x": 0, "y": 0, "w": 6, "h": 4 }
    },
    {
      "title": "Recent Failures",
      "widget_type": "query",
      "config": {
        "querySql": "SELECT timestamp_utc, workflow, error_type, error_message FROM agent_runs WHERE status != 'success' ORDER BY timestamp_utc DESC LIMIT 100",
        "chartType": "table",
        "xAxis": "",
        "yAxis": "",
        "groupBy": null
      },
      "layout": { "x": 6, "y": 0, "w": 6, "h": 4 }
    }
  ]
}
```
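
Dashboard payloads get repetitive, so it can help to build widgets with a small helper. This sketch mirrors the widget shape shown above; the helper name, SQL, and dashboard details are illustrative, and the API key is a placeholder:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder

def line_chart(title, sql, x_axis, y_axis, col):
    """Build one line-chart widget in the shape shown above (col 0 or 1)."""
    return {
        "title": title,
        "widget_type": "query",
        "config": {
            "querySql": sql,
            "chartType": "Line Chart",
            "xAxis": x_axis,
            "yAxis": y_axis,
            "groupBy": None,
        },
        "layout": {"x": col * 6, "y": 0, "w": 6, "h": 4},
    }

def build_dashboard_request(name, description, widgets, api_key=API_KEY):
    """Build the POST request for https://api.telemetry.sh/dashboard."""
    body = json.dumps(
        {"name": name, "description": description, "widgets": widgets}
    ).encode("utf-8")
    return urllib.request.Request(
        "https://api.telemetry.sh/dashboard",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Performs a real network call, so it only runs when executed directly:
if __name__ == "__main__":
    widgets = [
        line_chart(
            "Runs Per Hour",
            "SELECT date_trunc('hour', timestamp_utc) AS hour, COUNT(*) AS runs "
            "FROM agent_runs GROUP BY hour ORDER BY hour ASC",
            "hour",
            "runs",
            col=0,
        )
    ]
    req = build_dashboard_request("Agent Overview", "High-level agent health", widgets)
    with urllib.request.urlopen(req) as resp:
        print(resp.status)
```
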

## Dashboard guidance

Your first dashboard should be high-signal, not exhaustive.

Prefer 4 to 8 widgets covering:

- volume over time
- success vs failure rate
- latency or duration trends
- cost or token usage when AI workflows matter
- a recent failures table
- a top-entities table, such as the slowest routes, most frequently failing jobs, or most expensive models

Good dashboard names:

- `Operations Overview`
- `Agent Runs`
- `LLM Usage`
- `Queue Health`
- `API Health`

## Recommended workflow by domain

### For web apps

Instrument:

- request completion
- endpoint latency
- status codes
- user actions tied to business outcomes

### For background workers

Instrument:

- job start and completion
- retries
- queue wait time
- processing duration
- failure reasons

### For AI products

Instrument:

- model name
- prompt category or workflow
- input tokens
- output tokens
- total cost
- latency
- tool usage
- final outcome
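
Pulled together, an `llm_calls` event covering these signals might look like the following. The field names follow the list above; every value is a placeholder:

```python
llm_call_event = {
    "model": "example-model",        # model name
    "workflow": "support_reply",     # prompt category or workflow
    "input_tokens": 1480,
    "output_tokens": 320,
    "cost_usd": 0.0123,              # total cost
    "latency_ms": 2150,
    "tool_calls": 2,                 # tool usage
    "outcome": "success",            # final outcome
}
```

One row per completed generation keeps the table aligned with the "meaningful boundaries" convention above.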

## Success criteria

You are done when:

1. Important workflows are instrumented with structured logs.
2. At least one event has been ingested successfully.
3. A verification query confirms the data shape.
4. A high-level dashboard exists with charts and tables.
5. A human can open Telemetry and quickly understand the state of the system.

## Report back clearly

When you finish, summarize:

- which files and workflows you instrumented
- which Telemetry tables you created
- which fields each table contains
- which dashboard you created
- which widgets are most important
- what follow-up instrumentation would improve coverage

## Links

- Docs: https://telemetry.sh/docs
- Log API: https://telemetry.sh/docs/api-reference/log
- Query API: https://telemetry.sh/docs/api-reference/query
- Dashboard API: https://telemetry.sh/docs/api-reference/dashboard
