Grafana + Prometheus Exporter #15

Open
opened 2026-05-23 20:54:12 +00:00 by coilysiren · 0 comments
Owner

Originally filed by @coilysiren on 2026-05-02T23:09:29Z - https://github.com/coilysiren/eco-mcp-app/issues/10

Task 1 - Grafana + Prometheus Exporter

Unblocked 2026-05-02. The homelab now has VictoriaMetrics (vmsingle) + Grafana running in the observability namespace on kai-server, with a public ingress at https://grafana.coilysiren.me. vmagent scrapes node-exporter today; pointing it at a new exporter is a values-file edit in infrastructure/deploy/observability/vmagent-values.yml.

Dashboards now live in Terraform at infrastructure/terraform/grafana/, sourced from YAML files under dashboards/. Deliverable 3 (the Eco dashboard) lands there as dashboards/eco-mcp.yaml, not as exporter/dashboards/eco.json in this repo. Update the Deliverables section accordingly when you start the work.

Prereq when starting: read todo/README.md first. Also read /Users/kai/projects/coilysiren/kai-server for cluster conventions: Claude can't run write-kubectl directly, so deploys go through GH Actions → cluster. For the dashboard half, read /Users/kai/projects/coilysiren/infrastructure/terraform/grafana/README.md.

Goal

Build a Prometheus exporter for the Eco server, a Grafana dashboard over its metrics, and an MCP tool that renders snapshots of that dashboard's panels.

Endpoints to scrape (every 30 s — do not go faster)

Public:

  • GET /info
  • GET /datasets/flatlist (one-time at startup to discover stat names) + GET /datasets/get?dataset=X&dayStart=0&dayEnd=<now> for ~15 chosen stats.
    • All of these are in /datasets/flatlist (verified): PayWages, RepaidLoanOrBond, DefaultedOnLoanOrBond, PostedContract, CompletedContract, PropertyTransfer, ReputationTransfer, TransferMoney, PayTax, SettlementFounded, BecomeCitizen, ItemCraftedAction, ChopTree, HarvestOrHunt. Do NOT try TotalCulture as a dataset — it's only a /info field and /datasets/get?dataset=TotalCulture returns 500.
    • Also expose derived gauges: online_players, days_running (from /info).

Admin (X-API-Key from SSM /eco-mcp-app/api-admin-token, region us-east-1):

  • GET /api/v1/users?hoursPlayedGte=0
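
A minimal sketch of how the exporter might build these requests, using only the standard library. The base URL, the function names, and the use of an integer day number for dayEnd are assumptions for illustration; only the paths, query parameters, header name, and dataset names come from this issue.

```python
from urllib.parse import urlencode

# Dataset names copied from the verified list above.
# TotalCulture is deliberately absent: it is only a /info field.
DATASETS = [
    "PayWages", "RepaidLoanOrBond", "DefaultedOnLoanOrBond",
    "PostedContract", "CompletedContract", "PropertyTransfer",
    "ReputationTransfer", "TransferMoney", "PayTax",
    "SettlementFounded", "BecomeCitizen", "ItemCraftedAction",
    "ChopTree", "HarvestOrHunt",
]


def dataset_url(base_url: str, dataset: str, day_end: int) -> str:
    """Build the /datasets/get URL for one allowed dataset (hypothetical helper)."""
    if dataset not in DATASETS:
        # Refuse unknown names up front instead of provoking a 500.
        raise ValueError(f"dataset not in the verified list: {dataset}")
    query = urlencode({"dataset": dataset, "dayStart": 0, "dayEnd": day_end})
    return f"{base_url.rstrip('/')}/datasets/get?{query}"


def admin_headers(api_key: str) -> dict[str, str]:
    """Headers for the admin endpoint; the key itself would come from SSM."""
    return {"X-API-Key": api_key}
```

Rejecting names outside the list keeps a typo or a stray TotalCulture from turning into a failed request against the game server on every scrape.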

Deliverables

  1. New top-level directory exporter/ with a Python /metrics endpoint using prometheus_client:

    • Gauges for snapshot values (players online, days running, active loans, etc.).
    • Counters (with _total suffix) for cumulative event series (wages paid, contracts completed).
  2. exporter/Dockerfile + k3s manifest at deploy/exporter.yml, matching the backend repo deploy pattern. See /Users/kai/projects/coilysiren/backend for the canonical rig (Dockerfile shape, Makefile targets, GH Actions publish, Traefik ingress). Add a vmagent scrape config for the exporter's /metrics endpoint to infrastructure/deploy/observability/vmagent-values.yml (cross-repo edit; commit there separately).

  3. Grafana dashboard YAML at infrastructure/terraform/grafana/dashboards/eco-mcp.yaml (NOT in this repo) with panels:

    • Players Online (gauge + timeseries)
    • Loan Defaults (rate panel)
    • Wage Velocity (rate panel)
    • Contracts Completed (counter rate)
    • Top Craft Events (table, top-N by ItemCraftedAction delta)

    Wire it up by adding a grafana_dashboard resource in infrastructure/terraform/grafana/dashboards.tf and running inv k8s.terraform-grafana --action apply from infrastructure/. UID: eco-mcp. Datasource: VictoriaMetrics.

  4. New MCP tool get_grafana_snapshot(panel_id) in src/eco_mcp_app/server.py that calls Grafana's /render API and inlines the resulting PNG as a data URI in an iframe card (inlined because of the iframe CSP, per claude-ai-mcp#40).
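
The data-URI step in deliverable 4 could look roughly like the sketch below. The helper names are hypothetical, and the render path shape (/render/d-solo/<uid>/<slug>?panelId=...) is an assumption to confirm against the Grafana version running in the homelab. Fetching the PNG is left out on purpose.

```python
import base64
import html
from urllib.parse import urlencode


def render_url(grafana_base: str, uid: str, slug: str, panel_id: int,
               width: int = 800, height: int = 400) -> str:
    """Build a single-panel render URL (path shape is an assumption)."""
    query = urlencode({"panelId": panel_id, "width": width, "height": height})
    return f"{grafana_base.rstrip('/')}/render/d-solo/{uid}/{slug}?{query}"


def png_to_data_uri(png: bytes) -> str:
    """Encode raw PNG bytes as a data URI so no external image load is needed."""
    encoded = base64.b64encode(png).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def snapshot_card_html(png: bytes, alt: str) -> str:
    """Wrap the inlined PNG in a minimal <img> tag for the iframe card."""
    safe_alt = html.escape(alt, quote=True)
    return f'<img alt="{safe_alt}" src="{png_to_data_uri(png)}">'
```

The card then carries the image bytes itself, so the iframe never has to fetch anything from grafana.coilysiren.me.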

Constraints

  • Scrape cadence: 30 s floor. The datasets endpoints serve heavy CSVs; going faster risks backpressure on the game server.
  • Grafana already sits behind the homelab's Traefik. Do not build custom auth.
  • Claude Code in this repo cannot run write-kubectl. Deploys land via GH Actions; mirror the pattern in deploy/main.yml.
  • The dashboard is a YAML file consumed by yamldecode + jsonencode in terraform. Don't author it as raw JSON.
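
One way to enforce the 30 s floor inside the exporter is to clamp the configured interval and sleep off the remainder of each cycle. This is a sketch with hypothetical names (effective_interval, sleep_needed, run_forever), not the exporter's actual loop.

```python
import time

MIN_INTERVAL_S = 30.0  # floor from the Constraints section


def effective_interval(requested_s: float) -> float:
    """Never poll faster than the floor, whatever the config says."""
    return max(MIN_INTERVAL_S, requested_s)


def sleep_needed(cycle_started: float, cycle_finished: float,
                 interval_s: float) -> float:
    """Seconds left to wait after a scrape cycle; zero if the cycle overran."""
    elapsed = cycle_finished - cycle_started
    return max(0.0, effective_interval(interval_s) - elapsed)


def run_forever(scrape_once, interval_s: float = 30.0) -> None:
    """Call scrape_once repeatedly, starting cycles at least 30 s apart."""
    while True:
        started = time.monotonic()
        scrape_once()
        time.sleep(sleep_needed(started, time.monotonic(), interval_s))
```

Measuring from the start of the cycle means a slow CSV download eats into the sleep instead of stretching the period, while the clamp keeps a misconfigured interval from hammering the game server.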

Acceptance

  • Exporter returns a non-zero online_players gauge and at least one counter series.
  • vmagent in the observability namespace successfully scrapes the new exporter (scrape_samples_scraped > 0 for the new job).
  • terraform plan from infrastructure/terraform/grafana/ shows the new dashboard resource cleanly; apply lands it in Grafana with the correct datasource.
  • MCP tool smoke-tested via inv smoke.
  • k3s manifest deploys via the existing GH Actions → cluster path (do not attempt from local).
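
The first acceptance bullet can be checked against a saved copy of the exporter's /metrics output. The sketch below is a deliberately small parser for the Prometheus text format that assumes no sample timestamps; the function names are hypothetical and the gauge name is passed in by the caller.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Map each series (name plus labels) to its value, skipping comments."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes "series value" with no trailing timestamp.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def metric_name(series: str) -> str:
    """Strip any {label="..."} suffix from a series string."""
    return series.split("{", 1)[0]


def meets_acceptance(text: str, gauge_name: str) -> bool:
    """True if the named gauge is non-zero and some *_total series exists."""
    samples = parse_samples(text)
    gauge_ok = any(
        metric_name(series) == gauge_name and value > 0
        for series, value in samples.items()
    )
    counter_ok = any(metric_name(s).endswith("_total") for s in samples)
    return gauge_ok and counter_ok
```

This only covers the exporter-side bullet; the vmagent, terraform, and deploy checks still have to be confirmed through their own paths.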
coilysiren added P2 and removed P1 labels 2026-05-31 07:00:22 +00:00
Reference
coilyco-flight-deck/eco-mcp-app#15