Session Health

Metadata-only session presence, attachability, wake eligibility, and boundaries with HEARTBEAT.md policy, task leases, and network greet presence.

Audience: Operators running durable agent work.

Session health is a metadata-only runtime primitive AGH maintains for every active session. It answers four questions without ever calling the model:

  1. What state is this session in?
  2. Is it healthy or degraded?
  3. Can the runtime deliver a prompt to it right now?
  4. Is the session eligible for a synthetic wake under HEARTBEAT.md policy?

What health describes

Health is stored alongside the session and maintained by the daemon. The wake service reads it, the CLI/HTTP/UDS surfaces expose it, and HEARTBEAT.md consumes it but never implements it.

| Field | Meaning |
| --- | --- |
| state | idle, prompting, stopped, detached. |
| health | healthy, degraded, stale, dead, unknown. |
| active_prompt | A user/ACP prompt is currently in flight. |
| attachable | Runtime can deliver a prompt to this session right now. |
| eligible_for_wake | The wake service may target the session. |
| ineligibility_reason | Closed enum reason from the wake-state set; never free text. |
| last_activity_at | Last real activity update from active prompts (ACP events or supervision heartbeats). |
| last_presence_at | Last metadata-only presence touch. |
| last_error | Redacted last error, if any. |
| updated_at | Last health write. |

Health updates never inject prompts, never renew task leases, never call the model, and never change task ownership. Editing or deleting HEARTBEAT.md does not change health.
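For agents that consume health as structured data, the field table above can be mirrored as a typed record. This is a minimal sketch, not the AGH DTO: it assumes the JSON keys match the field names in the table, that timestamps arrive as strings, and that missing booleans default to false. The names SessionHealth and parse_health are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(str, Enum):
    IDLE = "idle"
    PROMPTING = "prompting"
    STOPPED = "stopped"
    DETACHED = "detached"


class Health(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    STALE = "stale"
    DEAD = "dead"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class SessionHealth:
    """Hypothetical typed mirror of the health fields listed in the table."""
    state: State
    health: Health
    active_prompt: bool
    attachable: bool
    eligible_for_wake: bool
    ineligibility_reason: Optional[str] = None
    last_activity_at: Optional[str] = None  # assumption: timestamp kept as a string
    last_presence_at: Optional[str] = None
    last_error: Optional[str] = None
    updated_at: Optional[str] = None


def parse_health(payload: dict) -> SessionHealth:
    """Build a record from a decoded JSON object.

    Values outside the documented state/health sets raise ValueError,
    so a consumer never silently accepts an unknown value.
    """
    return SessionHealth(
        state=State(payload["state"]),
        health=Health(payload["health"]),
        active_prompt=bool(payload.get("active_prompt", False)),
        attachable=bool(payload.get("attachable", False)),
        eligible_for_wake=bool(payload.get("eligible_for_wake", False)),
        ineligibility_reason=payload.get("ineligibility_reason"),
        last_activity_at=payload.get("last_activity_at"),
        last_presence_at=payload.get("last_presence_at"),
        last_error=payload.get("last_error"),
        updated_at=payload.get("updated_at"),
    )
```

Because the record is frozen, a consumer can read it but cannot mutate it, which matches the read-only role health plays for everything except the daemon.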

Where health comes from

| Trigger | What changes |
| --- | --- |
| Session start, resume, stop | state and attachable. |
| Active prompt activity (ACP events or supervision heartbeats) | active_prompt, last_activity_at, health recovers toward healthy. |
| Idle presence touch | last_presence_at. |
| Idle age past [agents.heartbeat].session_health_stale_after | health transitions toward stale; eligible_for_wake may flip to false. |
| Daemon restart | Health is recomputed before any wake consumer runs. |

[session.supervision] continues to own active-prompt activity timers, runtime progress events, warnings, and the inactivity timeout. It does not write to session_health.
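The idle-age trigger can be reasoned about with a small model. The sketch below is an illustration, not the daemon's logic: it assumes idle age is measured from the more recent of last_activity_at and last_presence_at, and it collapses "transitions toward stale" into a single step. The function name classify_idle is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_idle(
    now: datetime,
    last_activity_at: Optional[datetime],
    last_presence_at: Optional[datetime],
    stale_after: timedelta,
) -> str:
    """Return 'unknown', 'stale', or 'healthy' for an idle session.

    Assumption: idle age counts from the most recent of the two timestamps.
    """
    seen = [t for t in (last_activity_at, last_presence_at) if t is not None]
    if not seen:
        return "unknown"  # nothing recorded yet, so no age can be computed
    idle_age = now - max(seen)
    return "stale" if idle_age > stale_after else "healthy"


# Usage: a session last touched 10 minutes ago, with a 5 minute threshold.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = classify_idle(now, None, now - timedelta(minutes=10), timedelta(minutes=5))
```

Running the same function against your own idle pattern is a quick way to judge whether a given session_health_stale_after value would mark sessions stale sooner than you expect.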

Reading health

CLI, HTTP, and UDS share the same DTOs. Passing --json (or -o json) returns deterministic, structured output suitable for agents.

agh session health sess_123 --json
agh session status sess_123 --json
agh session inspect sess_123 --json
curl -s http://localhost:2123/api/sessions/sess_123/health
curl -s http://localhost:2123/api/sessions/sess_123/status
curl -s http://localhost:2123/api/sessions/sess_123/inspect

To list sessions with health summary attached:

curl -s "http://localhost:2123/api/sessions?include_health=true"

agh session inspect adds Heartbeat policy correlation (snapshot id, digest, wake state, last audit) and any redacted last-error context for diagnostics.
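The same HTTP endpoints can be read from a script. The sketch below uses only the Python standard library and the URLs shown in the curl examples; the base address, the helper names, and the assumption that responses are JSON are illustrative, and the response shape is not specified here.

```python
import json
from typing import Any
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:2123"  # address taken from the curl examples


def health_url(session_id: str, base_url: str = BASE_URL) -> str:
    """URL of the per-session health endpoint."""
    return f"{base_url}/api/sessions/{quote(session_id, safe='')}/health"


def list_url(include_health: bool = True, base_url: str = BASE_URL) -> str:
    """URL of the session list, optionally with the health summary attached."""
    query = urlencode({"include_health": "true" if include_health else "false"})
    return f"{base_url}/api/sessions?{query}"


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage, with a daemon listening locally:
#   health = fetch_json(health_url("sess_123"))
#   sessions = fetch_json(list_url())
```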

Wake eligibility

eligible_for_wake is computed by the daemon from health, attachability, supervision state, and config. The wake service rejects ineligible sessions before doing any work and returns a closed deterministic reason:

| Reason | When AGH returns it |
| --- | --- |
| session_not_found | The target session id was not resolved. |
| session_unhealthy | Health is degraded, stale, dead, or unknown such that wake is unsafe. |
| session_not_attachable | Runtime cannot deliver a prompt right now (stopping, stopped, detached, or supervision down). |
| session_prompt_active | An active prompt is already in flight. |
| session_prompt_active_race | A prompt started concurrently inside the wake gate; the wake is skipped to avoid stomping. |
| cooldown_active | The session is inside wake_cooldown from a prior wake. |
| quiet_window | The current instant fell inside an authored quiet window. |
| heartbeat_disabled | [agents.heartbeat].enabled=false, or the file disables policy locally. |
| heartbeat_invalid | The latest snapshot is invalid; wake is paused until the file is fixed. |
| heartbeat_no_policy | No valid snapshot exists for the agent. |
| heartbeat_rate_limited | The daemon hit max_wakes_per_cycle for this scheduler cycle. |
| wake_coalesced | A previous wake attempt is still in progress; this one was coalesced. |
| synthetic_prompt_failed | Synthetic prompt enqueue failed. |

These reasons appear in agh agent heartbeat status, agh agent heartbeat wake, and the responses from /api/agents/{name}/heartbeat/status and /api/agents/{name}/heartbeat/wake.
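A client that wants to predict a rejection before requesting a wake can model part of the gate locally. The sketch below is an approximation under stated assumptions: the check order is a guess, any non-healthy value is treated as unsafe (the daemon may be more lenient), and only the reasons derivable from health metadata plus a cooldown flag are covered. The function name rejection_reason is hypothetical; the reason strings are the ones listed above.

```python
from typing import Optional

# Closed set of reasons, copied from the table above.
WAKE_REASONS = frozenset({
    "session_not_found", "session_unhealthy", "session_not_attachable",
    "session_prompt_active", "session_prompt_active_race", "cooldown_active",
    "quiet_window", "heartbeat_disabled", "heartbeat_invalid",
    "heartbeat_no_policy", "heartbeat_rate_limited", "wake_coalesced",
    "synthetic_prompt_failed",
})


def rejection_reason(health: Optional[dict], cooldown_active: bool = False) -> Optional[str]:
    """Return the first matching reason, or None if no modeled check rejects.

    Assumptions: check order is illustrative, and every non-healthy value
    counts as unsafe for wake.
    """
    if health is None:
        return "session_not_found"
    if health.get("health") != "healthy":
        return "session_unhealthy"
    if not health.get("attachable", False):
        return "session_not_attachable"
    if health.get("active_prompt", False):
        return "session_prompt_active"
    if cooldown_active:
        return "cooldown_active"
    return None


# Usage: a healthy, attachable session with a prompt in flight.
busy = {"health": "healthy", "attachable": True, "active_prompt": True}
reason = rejection_reason(busy)
```

The daemon's answer remains authoritative; a local prediction only saves a round trip when the outcome is obvious.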

Boundaries

  • Session health is not a task-run lease heartbeat. task_runs + ClaimNextRun + HeartbeatRunLease remain the only authority for run ownership.
  • Session health is not [session.supervision]. Supervision drives the durable activity timers and runtime progress events; health is consumed metadata.
  • Session health is not HEARTBEAT.md. The authored file expresses wake/reentry preferences within config bounds. Health is always present; HEARTBEAT.md is optional.
  • Session health is not AGH Network presence. Peer presence and greet_interval belong to the protocol model. A peer being online never makes a session wake-eligible by itself.
  • Session health is not an event bus. session.health.update.after fires on transitions only and is rate-limited by [agents.heartbeat].session_health_hook_min_interval.
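The last boundary, a hook that fires on transitions only and no more often than a minimum interval, can be illustrated with a small model. This is a sketch of the general pattern, not the AGH hook implementation: it assumes a transition that arrives inside the interval is dropped rather than deferred, and that the first observation counts as a transition. The class name TransitionHook is hypothetical.

```python
from typing import Callable, Optional


class TransitionHook:
    """Invoke a callback on value changes, at most once per min_interval seconds."""

    def __init__(self, min_interval: float, callback: Callable[[str], None]) -> None:
        self._min_interval = min_interval
        self._callback = callback
        self._last_value: Optional[str] = None
        self._last_fired_at: Optional[float] = None

    def observe(self, health: str, now: float) -> bool:
        """Record an observation; return True if the callback fired."""
        if health == self._last_value:
            return False  # no transition, so nothing to report
        self._last_value = health
        if (
            self._last_fired_at is not None
            and now - self._last_fired_at < self._min_interval
        ):
            return False  # transition inside the interval: dropped (assumption)
        self._last_fired_at = now
        self._callback(health)
        return True


# Usage: four observations with a 10 second minimum interval.
fired: list = []
hook = TransitionHook(10.0, fired.append)
hook.observe("healthy", 0.0)   # fires: first observation
hook.observe("healthy", 1.0)   # ignored: no change
hook.observe("stale", 5.0)     # dropped: inside the interval
hook.observe("dead", 12.0)     # fires: change, and the interval has elapsed
```

A consumer built on such a hook sees a sparse stream of changes, which is why it should not be treated as a complete event log.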

Repair and troubleshooting

If a session looks unhealthy unexpectedly:

  1. Inspect with agh session inspect <session-id> -o json.
  2. Check supervision state in agh session status — long-running prompts can show idle timers.
  3. Verify [agents.heartbeat].session_health_stale_after is not too aggressive for your idle pattern.
  4. If the daemon was restarted, allow recovery to recompute health before retrying a wake.
  5. Use agh session repair for stale active/stopping/starting metadata after an unclean exit. Repair never alters wake state.
