Review Gate
Post-terminal review of task runs, reviewer routing, persisted verdicts, continuation runs, and the runtime authority boundary.
- Audience
- Operators running durable agent work
- Focus
- Autonomy guidance shaped for scanability, day-two clarity, and operator context.
The review gate is a post-terminal quality check on a task run. After a worker terminalizes a run,
task.Service may create a review request, the daemon ReviewRouter selects a reviewer, the
reviewer submits a typed verdict, and either the terminal outcome is accepted or a new continuation
run is enqueued with bounded missing_work and next_round_guidance. The gate never rewrites the
reviewed run's history, and channels, bridge messages, skills, notification cursors, and the web UI
are never verdict authority.
Authority boundary
Read this section first. Everything else on the page operates inside it.
- task.Service.RecordRunReview is the only path that records a verdict, updates task review rollups, or creates a rejected-review continuation run. It runs in one BEGIN IMMEDIATE transaction with the verdict, rollup fields, review events, and any continuation row written together.
- task_runs remains the only durable execution queue and ownership source. Review does not change a terminal run's status. A failed run that is later approved stays failed; an approved completed run stays completed.
- A reviewer session must be bound to a persisted review request through task.Service.BindRunReviewSession before the native submit_run_review tool is even visible to the session. Review id alone is not authorization.
- Channels, bridge deliveries, notification cursors, prompt overlays, skills, and web cards never approve, reject, block, or retry a review. They surface, route, or coordinate.
- TaskExecutionProfile.Review is reviewer-selection input. It chooses who reviews; it never decides the verdict.
Lifecycle
- A worker terminalizes a run through the existing token-fenced execution paths (Task Runs and Leases).
- The terminal transition writes the review trigger fields on the run row (review_required, review_request_round, review_policy_snapshot) along with the terminal status, summary, and event row.
- After the terminal commit, task.Service runs a follow-up transaction. If the captured policy matches the run's terminal status, it inserts an attempt-1 review request, links the new review_id into task_runs.review_request_id, clears review_required, and emits task.run_review_requested. The follow-up is idempotent on (run_id, review_round, attempt = 1); duplicate triggers return the existing row without emitting a second event.
- task.Service calls the daemon ReviewRouter callback at the same call site. Wake-up is a nudge only; the router reads persisted state through task service before doing anything.
- ReviewRouter resolves the effective reviewer selector from the persisted execution profile (profile.review) and active session/task state. It excludes the original worker by default, then either reuses an eligible active reviewer session or creates a local reviewer session through the daemon composition root.
- The daemon binds the reviewer session to the review request through task.Service.BindRunReviewSession before exposing submit_run_review. The bundled agh-task-reviewer skill loads only because the binding is durable.
- The reviewer submits exactly one typed verdict. task.Service.RecordRunReview validates the review/run link, actor identity, active binding when the actor is a reviewer session, idempotency, payload bounds, and status transition.
- The transaction either accepts the terminal outcome (approved), enqueues exactly one continuation run (rejected), records a blocked review/circuit diagnostic (blocked), or stores an evaluation diagnostic (error, timeout, invalid_output).
Review request creation is idempotent on (run_id, review_round, attempt = 1), so callers can
retry the request path without creating duplicate review rows. There is no second queue.
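The idempotency rule above can be sketched with a uniqueness constraint on the request key. This is a minimal illustration, not the shipped schema: the reviews table, its columns, and the request_review helper are hypothetical, and only the (run_id, review_round, attempt = 1) key and the BEGIN IMMEDIATE transaction style come from this page.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to explicit
    # BEGIN IMMEDIATE / COMMIT statements.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical table; only the uniqueness key mirrors the page.
    conn.execute(
        """
        CREATE TABLE reviews (
            review_id    INTEGER PRIMARY KEY,
            run_id       TEXT NOT NULL,
            review_round INTEGER NOT NULL,
            attempt      INTEGER NOT NULL,
            status       TEXT NOT NULL,
            UNIQUE (run_id, review_round, attempt)
        )
        """
    )
    return conn


def request_review(conn: sqlite3.Connection, run_id: str, review_round: int):
    """Return (review_id, created); created is False for a duplicate trigger."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "INSERT OR IGNORE INTO reviews (run_id, review_round, attempt, status) "
            "VALUES (?, ?, 1, 'requested')",
            (run_id, review_round),
        )
        created = cur.rowcount == 1
        row = conn.execute(
            "SELECT review_id FROM reviews "
            "WHERE run_id = ? AND review_round = ? AND attempt = 1",
            (run_id, review_round),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0], created
```

In this sketch a caller would emit the requested event only when created is True, which is one way to get the "no second event" behavior on duplicate triggers.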
Review policy and outcomes
[task.orchestration.review] defines the defaults; each task may override within those bounds.
The policy decides whether a terminal run triggers a review:
| Policy | Triggers review on |
|---|---|
| none | Never. The default. |
| on_success | Completed runs only. |
| on_failure | Failed or canceled runs only. |
| always | Completed, failed, and canceled runs. |
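The policy table reads naturally as a lookup from policy to the terminal statuses it reacts to. The sketch below is illustrative only: the function name is hypothetical, and the lowercase status strings (completed, failed, canceled) are an assumed spelling of the statuses named in the table.

```python
# Policy -> terminal statuses that trigger a review, transcribed from the table.
# The status spellings are assumptions made for this sketch.
POLICY_TRIGGERS = {
    "none": frozenset(),
    "on_success": frozenset({"completed"}),
    "on_failure": frozenset({"failed", "canceled"}),
    "always": frozenset({"completed", "failed", "canceled"}),
}


def should_trigger_review(policy: str, terminal_status: str) -> bool:
    """Return True when the captured policy matches the run's terminal status."""
    if policy not in POLICY_TRIGGERS:
        raise ValueError(f"unknown review policy: {policy!r}")
    return terminal_status in POLICY_TRIGGERS[policy]
```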
Verdict outcome values are typed. The terminal review row stores
status = "recorded" plus the typed outcome; never store outcomes as review statuses:
| Outcome | Effect |
|---|---|
| approved | Accepts the run's terminal outcome. Emits task.run_review_approved. missing_work must be empty. |
| rejected | Records missing work and enqueues exactly one continuation run linked by review_id. Emits task.run_review_retry_enqueued. |
| blocked | Records a blocked review and writes the task review circuit rollup fields with the bounded reason. |
| error | Records a reviewer/tool execution diagnostic. Emits task.run_review_error; it does not create a continuation run. |
| timeout | Records that the review exceeded its expected window. Emits task.run_review_timeout; it does not create a continuation run. |
| invalid_output | Records that the run output could not be evaluated. Emits task.run_review_invalid_output; it does not create a continuation run. |
approved requires empty missing_work. rejected requires at least one missing_work item or
non-empty next_round_guidance. Both fields are bounded by
missing_work_max_items/missing_work_item_max_bytes and next_round_guidance_max_bytes from
[task.orchestration.review]. confidence must be in [0, 1]. Every verdict needs a delivery_id;
task.Service.RecordRunReview is idempotent only when review id, run id, actor identity, outcome,
and delivery id all match the persisted row.
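The payload rules in this section can be collected into a single check. The following is a rough sketch under stated assumptions: the Verdict and ReviewBounds classes and the numeric limits are hypothetical placeholders, since the real bounds come from [task.orchestration.review] and the real shape is the typed RunReviewVerdict.

```python
from dataclasses import dataclass

OUTCOMES = frozenset(
    {"approved", "rejected", "blocked", "error", "timeout", "invalid_output"}
)


@dataclass(frozen=True)
class ReviewBounds:
    # Placeholder limits for illustration; not the shipped defaults.
    missing_work_max_items: int = 8
    missing_work_item_max_bytes: int = 512
    next_round_guidance_max_bytes: int = 2048


@dataclass(frozen=True)
class Verdict:
    outcome: str
    delivery_id: str
    confidence: float
    missing_work: tuple = ()
    next_round_guidance: str = ""


def validate_verdict(v: Verdict, bounds: ReviewBounds = ReviewBounds()) -> list:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if v.outcome not in OUTCOMES:
        problems.append(f"unknown outcome {v.outcome!r}")
    if not v.delivery_id:
        problems.append("delivery_id is required")
    if not 0.0 <= v.confidence <= 1.0:
        problems.append("confidence must be in [0, 1]")
    if len(v.missing_work) > bounds.missing_work_max_items:
        problems.append("too many missing_work items")
    if any(
        len(item.encode("utf-8")) > bounds.missing_work_item_max_bytes
        for item in v.missing_work
    ):
        problems.append("missing_work item exceeds byte bound")
    if len(v.next_round_guidance.encode("utf-8")) > bounds.next_round_guidance_max_bytes:
        problems.append("next_round_guidance exceeds byte bound")
    if v.outcome == "approved" and v.missing_work:
        problems.append("approved requires empty missing_work")
    if v.outcome == "rejected" and not (v.missing_work or v.next_round_guidance):
        problems.append("rejected requires missing_work or next_round_guidance")
    return problems
```

Byte lengths are measured on the UTF-8 encoding here; whether the service measures bytes the same way is an assumption.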
[task.orchestration.review] also carries bounded policy fields such as max_review_attempts,
rapid_terminal_window, rapid_terminal_limit, and failure_policy. They are validated config
state and visible to agents, but the current shipped verdict path does not expose a standalone
review-circuit reset command or a retry-attempt API.
Reviewer routing and binding
ReviewRouter (internal/daemon) is the runtime composition-root component that turns a persisted
review request into a bound reviewer session. It is wake-driven, not event-tail driven.
Selector inputs come from the persisted TaskExecutionProfile.Review block. See
Task Execution Profiles for the full overlay model.
The router reads the normalized review agent/provider/model, allowed/preferred agent names,
allowed/preferred peer ids, allowed/preferred channel ids, and required/preferred capabilities.
Selection order, after resolving the effective selector:
- Existing active sessions in the task workspace are scored first. Exact/preferred agent names, preferred peer ids, preferred channel ids, and preferred capabilities increase the score.
- If no active session matches, the router creates a local reviewer session for the exact, preferred, or allowed agent that satisfies required capabilities.
- If peer selectors require an explicit peer and no active eligible peer exists, the router fails closed rather than creating an unrelated local reviewer.
- If nothing matches, the router records a deterministic no-route diagnostic as a typed blocked outcome through task.Service.RecordRunReview with a review-router:no-route: delivery id, so the no-route reason becomes part of review history instead of hidden channel state.
Eligibility rules:
- Default is allow_original_worker = false. The session, agent, peer, and actor identity that terminalized the run cannot be selected unless the policy explicitly allows it. When the original-worker identity cannot be determined, routing fails closed.
- Channel membership and peer authorization remain enforced by the network/bridge subsystems. ReviewProfile selectors narrow routing; they do not grant access.
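The selection order and eligibility rules above can be approximated as "score active sessions, else create locally, else block". This sketch is a simplification under several assumptions: the Session and Selector types, the score weights, and the agent_caps catalog are hypothetical, peer and channel selectors are omitted, and the original worker is matched by session id and agent name only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    session_id: str
    agent: str
    capabilities: frozenset = frozenset()


@dataclass(frozen=True)
class Selector:
    preferred_agents: tuple = ()
    allowed_agents: tuple = ()
    required_capabilities: frozenset = frozenset()
    preferred_capabilities: frozenset = frozenset()
    allow_original_worker: bool = False


def score(session: Session, sel: Selector) -> int:
    # Hypothetical weights: a preferred agent outweighs preferred capabilities.
    points = 10 if session.agent in sel.preferred_agents else 0
    return points + len(session.capabilities & sel.preferred_capabilities)


def route(active, sel: Selector, agent_caps: dict, original: Optional[Session]):
    """Return ("reuse", session_id), ("create", agent), or ("blocked", reason)."""
    if original is None and not sel.allow_original_worker:
        return ("blocked", "original-worker-unknown")  # fail closed

    def is_original(session_id: str, agent: str) -> bool:
        if sel.allow_original_worker or original is None:
            return False
        return session_id == original.session_id or agent == original.agent

    eligible = [
        s for s in active
        if not is_original(s.session_id, s.agent)
        and sel.required_capabilities <= s.capabilities
    ]
    if eligible:
        # Session id breaks score ties so the choice stays deterministic.
        best = max(eligible, key=lambda s: (score(s, sel), s.session_id))
        return ("reuse", best.session_id)
    for agent in (*sel.preferred_agents, *sel.allowed_agents):
        if is_original("", agent):
            continue
        if sel.required_capabilities <= agent_caps.get(agent, frozenset()):
            return ("create", agent)
    return ("blocked", "no-route")
```

In the flow described on this page, a blocked result would be persisted as a typed blocked outcome rather than returned to a caller; the tuple return is only a convenience of the sketch.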
Session binding (task.Service.BindRunReviewSession) is authorization state, not prompt context.
It runs in a task-service transaction, verifies the review is still routed or requested, sets
reviewer_session_id, started_at, deadline_at, and status = "in_review", and rejects any
second active session binding. The native submit_run_review tool calls
LookupReviewForSession(session_id) on every invocation; a session without a matching active
binding sees ErrToolUnavailable and cannot submit a verdict.
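The binding-as-authorization idea can be modeled in memory as follows. This is a sketch, not the service code: the ReviewBindings class and the ToolUnavailable exception are hypothetical stand-ins for the task-service transaction and the ErrToolUnavailable condition, and started_at and deadline_at are left out for brevity.

```python
class ToolUnavailable(Exception):
    """Stand-in for the ErrToolUnavailable condition described above."""


class ReviewBindings:
    def __init__(self):
        self._status = {}    # review_id -> persisted review status
        self._reviewer = {}  # review_id -> bound reviewer session id

    def add_request(self, review_id: str) -> None:
        self._status[review_id] = "requested"

    def bind(self, review_id: str, session_id: str) -> None:
        # Only a requested or routed review can be bound, so a second bind
        # attempt fails once the status has moved to in_review.
        if self._status.get(review_id) not in ("requested", "routed"):
            raise ValueError(f"review {review_id!r} cannot be bound")
        self._status[review_id] = "in_review"
        self._reviewer[review_id] = session_id

    def lookup_review_for_session(self, session_id: str):
        for review_id, bound in self._reviewer.items():
            if bound == session_id and self._status[review_id] == "in_review":
                return review_id
        return None

    def submit_run_review(self, session_id: str, review_id: str) -> str:
        # The lookup runs on every call; a review id alone is not enough.
        if self.lookup_review_for_session(session_id) != review_id:
            raise ToolUnavailable("session has no active binding for this review")
        self._status[review_id] = "recorded"
        return review_id
```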
Continuation runs
A rejected verdict creates exactly one continuation run inside the same verdict transaction. The
continuation row is a normal task_runs row, not a hidden retry lane:
- parent_run_id points at the reviewed terminal run.
- review_id points at the rejected review row.
- review_round is the next round number.
- continuation_reason = "review_rejected".
- missing_work_json and next_round_guidance are bounded copies from the verdict, ready for the next worker's context bundle.
Continuation profile precedence — see Task Execution Profiles:
- The task's current TaskExecutionProfile at enqueue time controls worker, reviewer, participant, coordinator-guidance, and sandbox selection for the continuation.
- The reviewed run's native coordination/capability fields are copied forward only when the profile leaves the equivalent worker/participant selectors empty.
- The continuation run's review columns provide context and lineage. They do not override worker selection or grant permissions.
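The precedence rules above amount to "profile first, reviewed run as fallback, review columns never". A toy sketch, assuming hypothetical dictionary keys (worker, participants) in place of the real profile and run fields:

```python
def resolve_continuation_selectors(profile: dict, reviewed_run: dict) -> dict:
    """Profile values win; run values fill only selectors the profile leaves empty.

    Review columns (review_id, missing_work_json, and so on) are deliberately
    not consulted: they carry lineage, not selection input.
    """
    resolved = {}
    for key in ("worker", "participants"):  # hypothetical selector names
        value = profile.get(key)
        resolved[key] = value if value else reviewed_run.get(key)
    return resolved
```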
The next worker reads continuation context through TaskContextBundle.ReviewContinuation and
TaskContextBundle.ReviewHistory in the /agent/context task bundle. The implemented API
reference is Agent API, and the CLI path is
agh me context. Workers must still claim the run through
ClaimNextRun and mutate it through session-bound lease lookup. The continuation context is
guidance, not permission.
Idempotent rejected-verdict replay returns the existing continuation by
task_runs.review_id = review_id. The transaction never enqueues a duplicate when the same
delivery_id is replayed.
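The replay behavior can be illustrated with an in-memory store. This is a sketch under assumptions: the TaskStore class, the continuation run id format, and the queued status are hypothetical, while the continuation columns and the lookup by review_id follow the description above.

```python
import json


class TaskStore:
    def __init__(self):
        self.runs = {}      # run_id -> row dict
        self.verdicts = {}  # review_id -> (run_id, actor, outcome, delivery_id)

    def record_rejected(self, review_id, run_id, actor, delivery_id,
                        review_round, missing_work, guidance):
        """Record a rejected verdict and return its single continuation row."""
        key = (run_id, actor, "rejected", delivery_id)
        if review_id in self.verdicts:
            if self.verdicts[review_id] != key:
                raise ValueError("review already recorded with a different verdict")
            # Replay: return the existing continuation, found by review_id.
            return next(
                row for row in self.runs.values()
                if row.get("review_id") == review_id
            )
        self.verdicts[review_id] = key
        continuation_id = f"{run_id}-round-{review_round + 1}"  # hypothetical id
        row = {
            "run_id": continuation_id,
            "parent_run_id": run_id,
            "review_id": review_id,
            "review_round": review_round + 1,
            "continuation_reason": "review_rejected",
            "missing_work_json": json.dumps(list(missing_work)),
            "next_round_guidance": guidance,
            "status": "queued",  # assumed initial status
        }
        self.runs[continuation_id] = row
        return row
```

The reviewed run's own row is never touched in this sketch, which mirrors the rule that review does not change a terminal run's status.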
Manage reviews from the CLI
The agh task review command group operates over UDS and shares its surface with the matching
HTTP endpoints. Every subcommand supports -o json|jsonl|toon for agent consumption.
# Request a review for a terminal run.
agh task review request <run-id> --policy on_success --reason "audit before promotion" -o json
# List reviews for a task or run.
agh task review list --task <task-id> --status recorded -o json
agh task review list --run <run-id> --status routed -o json
# Inspect one review.
agh task review show <review-id> -o json
# Submit the bound verdict (review id is the path argument; --run is required for the run link).
agh task review submit <review-id> \
  --run <run-id> \
  --outcome rejected \
  --confidence 0.4 \
  --reason "missing migration safety check" \
  --missing-work "add ON DELETE behavior" \
  --next-round-guidance "Re-run with explicit cascade analysis" \
  --delivery-id rev-123-attempt-1 \
  -o json
Generated reference for each verb:
- agh task review
- agh task review request
- agh task review list
- agh task review show
- agh task review submit
agh task review submit is the operator-facing path. Submitting from the CLI uses server-derived
operator actor authorization. It does not surface or call the reviewer-bound native tool, and it
cannot be invoked by an unbound session.
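For scripted operator submissions, the argument list can be assembled programmatically before handing it to a process runner. The helper below is a hypothetical convenience, not part of AGH. It uses only the flags shown in the submit example, and it assumes that --missing-work may be repeated once per item and that --next-round-guidance may be omitted; check the generated reference before relying on either.

```python
import shlex


def build_submit_argv(review_id, run_id, outcome, confidence, reason,
                      delivery_id, missing_work=(), next_round_guidance=""):
    """Build the argv for `agh task review submit` without executing it."""
    argv = [
        "agh", "task", "review", "submit", review_id,
        "--run", run_id,
        "--outcome", outcome,
        "--confidence", str(confidence),
        "--reason", reason,
    ]
    for item in missing_work:
        # Assumption: one --missing-work flag per item.
        argv += ["--missing-work", item]
    if next_round_guidance:
        argv += ["--next-round-guidance", next_round_guidance]
    argv += ["--delivery-id", delivery_id, "-o", "json"]
    return argv


def render(argv) -> str:
    """Shell-quote the argv for logging or copy-paste."""
    return shlex.join(argv)
```

Passing the list form to a process runner avoids shell quoting issues in free-text fields such as the reason; render is only for display.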
Manage reviews through HTTP and UDS
Both transports mount the same shared core handlers, so HTTP and UDS responses are identical and
refer to the same authority. Operation IDs come from openapi/agh.json; do not paraphrase the
shapes.
| Method | Path | Operation ID | Purpose |
|---|---|---|---|
| POST | /api/task-runs/{id}/reviews | requestTaskRunReview | Create a review request for a run. |
| GET | /api/task-runs/{id}/reviews | listTaskRunReviews | List reviews scoped to one run; supports status, reviewer_session_id, limit. |
| GET | /api/tasks/{id}/reviews | listTaskReviews | List reviews scoped to one task. |
| GET | /api/task-reviews/{id} | getTaskRunReview | Show one review row. |
| POST | /api/task-reviews/{id}/verdict | submitTaskRunReviewVerdict | Persist the verdict through task.Service.RecordRunReview. |
status filters use the persisted review status enum (requested, routed, in_review,
recorded, circuit_opened, canceled). The verdict request body is the typed
RunReviewVerdict; outcome uses the verdict-outcome enum from this page (approved, rejected,
blocked, error, timeout, invalid_output). Generated TypeScript types for web consumers live
in web/src/generated/agh-openapi.d.ts.
Reviewer-bound native tool
In-session reviewers submit verdicts through one model-facing tool. AGH registers the tool
internally with the agh__ prefix so toolset routing can hide it when the session is not bound.
| Model-facing name | Internal id | Authority |
|---|---|---|
| submit_run_review | agh__task_run_review_submit | Persists the verdict through task.Service.RecordRunReview. Available only to reviewer-bound sessions. |
Visibility rules enforced by the runtime:
- The bundled agh-task-reviewer skill must be active. Its metadata.agh.requires_review_request flag is the load trigger; the tool is hidden until the loader sees an active review binding.
- LookupReviewForSession(session_id) must return a binding that matches the call's review_id. Mismatched ids return an unavailable-tool error and never reach the verdict path.
- The tool never inspects claim leases or exposes raw claim tokens. Reviewer sessions do not need an active claim.
- Operator/API/UDS/CLI verdict submissions go through the explicit /api/task-reviews/{id}/verdict HTTP/UDS endpoint or agh task review submit with server-derived operator identity. There is no debug-only native-tool bypass for unbound sessions.
The reviewer-related auxiliary tools — task_run_review_request, task_run_review_list, and
task_run_review_show (registered as agh__task_run_review_request,
agh__task_run_review_list, and agh__task_run_review_show) — are read/request helpers that
delegate to task.Service.RequestRunReview, task.Service.ListRunReviews, and
task.Service.GetRunReview. They never bypass review policy validation or write GlobalDB directly.
Inspect from the operator web UI
Open a task and switch to the Orchestration tab (the same tab documented in Task Execution Profiles). The task-level reviews card and the run-detail page's run-level variant are read-only views over the same authority:
- They render status, outcome, reason, missing_work, next_round_guidance, reviewed_at, delivery_id, and reviewer identity from web/src/generated/agh-openapi.d.ts types, with no inferred state.
- A permanent disclaimer reinforces that operator sessions cannot submit a verdict from the web UI. Verdict authority is the reviewer-bound native tool or the explicit /api/task-reviews/{id}/verdict endpoint.
- Continuation lineage and rejected-review guidance appear next to the reviewed run so operators can match the continuation to the rejected verdict without parsing event payloads.
The Stream Resume card on the same tab seeds reconnects from latest_event_seq, so review
events appear in the timeline without a read-then-stream race. See
Notification Cursors for the related delivery
behavior.
Review events
task.Service emits typed task events for every review transition. They appear on the task SSE
stream after persistence and feed the timeline, hooks, and observe surfaces. Payloads are bounded
and never carry raw claim tokens or full reviewer transcripts.
| Event | Emitted when |
|---|---|
| task.run_review_requested | Attempt-1 review row is durable and task_runs.review_request_id is set. |
| task.run_review_bound | BindRunReviewSession binds the review to a reviewer session. |
| task.run_review_recorded | Verdict is persisted (any outcome). |
| task.run_review_approved | Recorded verdict outcome is approved. Bridge terminal notifier delivers. |
| task.run_review_rejected | Recorded verdict outcome is rejected. |
| task.run_review_blocked | Recorded verdict outcome is blocked. |
| task.run_review_error | Recorded verdict outcome is error. |
| task.run_review_timeout | Recorded verdict outcome is timeout. |
| task.run_review_invalid_output | Recorded verdict outcome is invalid_output. |
| task.run_review_retry_enqueued | Rejected verdict created a continuation run inside the verdict transaction. |
task.run_review_approved is also the accepted-final terminal event for the bridge terminal
notifier on review-gated work. See
Notification Cursors.
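Consumers of the task stream can derive which events to expect for a given verdict from the table above. The mapping below is transcribed from that table; the function names and the emission order are assumptions made for this sketch.

```python
# Outcome -> outcome-specific events, transcribed from the events table.
OUTCOME_EVENTS = {
    "approved": ("task.run_review_approved",),
    "rejected": ("task.run_review_rejected", "task.run_review_retry_enqueued"),
    "blocked": ("task.run_review_blocked",),
    "error": ("task.run_review_error",),
    "timeout": ("task.run_review_timeout",),
    "invalid_output": ("task.run_review_invalid_output",),
}


def events_for_verdict(outcome: str) -> list:
    """Every persisted verdict emits the recorded event plus its outcome events."""
    return ["task.run_review_recorded", *OUTCOME_EVENTS[outcome]]


def is_accepted_final(event_name: str) -> bool:
    """Only the approved event is the accepted-final signal for the bridge notifier."""
    return event_name == "task.run_review_approved"
```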
Bundled skill expectations
agh-task-reviewer is one of the bundled orchestration skills. It is loaded only by reviewer
sessions with an active binding (metadata.agh.requires_review_request = true) and is purely
instructional:
- It cannot record verdicts, change task state, alter reviewer routing, or expose claim tokens.
- It teaches reviewers how to read review packets, prefer typed outcomes, and submit bounded evidence through submit_run_review.
See Bundled Skills — Orchestration for skill load triggers and the orchestration-skill authority boundary.
Config lifecycle
Defaults and bounds for the gate live under [task.orchestration.review]. Read
config.toml for the complete field reference and the
[task.orchestration.profile] overlay that gates per-task provider/sandbox overrides used by
reviewer routing.
Behavior to keep in mind:
- Workspace overlays (<workspace>/.agh/config.toml) may tighten or relax the review defaults per workspace. Unknown keys still fail validation.
- Updating [task.orchestration.review] via agh config set or the agh__config_* native tools is allowed because review policy is a runtime configuration value, not a credential. Secret-shaped paths and trust-rooted sections remain off-limits.
- Per-task review policy and reviewer routing are managed through the task execution profile, not through arbitrary metadata_json fields.
Related pages
- Task Runs and Leases explains the terminal transitions that the review gate sits on top of.
- Task Execution Profiles explains how TaskExecutionProfile.Review shapes reviewer selection.
- Notification Cursors explains why bridge terminal notifications wait for the accepted-final event on review-gated runs.
- Bundled Skills documents the orchestration skills, including agh-task-reviewer.
- config.toml lists every [task.orchestration] key.
- agh task review is the generated CLI reference for the review commands.