Production Checklist

Prepare AGH for persistent unattended operation with clear pass and fail checks.

Audience: Operators running durable agent work
Focus: Operations guidance shaped for scanability, day-two clarity, and operator context.

Use this checklist before running AGH as a persistent daemon for real work. It is written for local or self-managed production-like environments where one service user owns one AGH_HOME.

1. Pin the daemon identity and home

| Check | Pass condition |
| --- | --- |
| Service user | A dedicated OS user owns the daemon process. |
| Home directory | AGH_HOME is explicit, stable, and owned by the service user. |
| CLI operations | Operators use the same AGH_HOME when running agh status, agh session list, and related commands. |
| File permissions | The home directory is not world-writable; socket access is limited to the daemon user. |

Example:

sudo install -d -o agh -g agh -m 0750 /var/lib/agh

AGH creates its standard subdirectories with normal directory permissions and sets the live UDS socket to mode 0600.
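
The pass conditions in this section can be spot-checked from a shell. The following is a minimal sketch, not part of AGH: the helper name check_agh_home is hypothetical, and it only covers the ownership and world-writable checks for the home directory.

```shell
# check_agh_home DIR OWNER (hypothetical helper, not an AGH command):
# succeeds only if DIR exists, is owned by OWNER, and is not world-writable.
check_agh_home() {
  dir=$1
  owner=$2
  if [ ! -d "$dir" ]; then
    echo "FAIL: $dir is not a directory"
    return 1
  fi
  # find prints the path only when the ownership test matches.
  if [ -z "$(find "$dir" -maxdepth 0 -user "$owner" 2>/dev/null)" ]; then
    echo "FAIL: $dir is not owned by $owner"
    return 1
  fi
  # -perm -0002 matches when the world-writable bit is set.
  if [ -n "$(find "$dir" -maxdepth 0 -perm -0002)" ]; then
    echo "FAIL: $dir is world-writable"
    return 1
  fi
  echo "PASS: $dir"
}
```

With the directory from the example above, the call would be check_agh_home /var/lib/agh agh.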

2. Harden configuration

Review the home config that the daemon loads:

export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
sed -n '1,220p' "$AGH_HOME/config.toml"

After changing it, run agh daemon start --foreground during a maintenance window or in a staging AGH_HOME to surface config validation errors directly.

Use explicit daemon and HTTP settings:

[daemon]
socket = "/var/lib/agh/daemon.sock"

[http]
host = "localhost"
port = 2123

[log]
level = "info"
max_size_mb = 10
max_backups = 5
max_age_days = 30
compress_backups = false

[limits]
max_concurrent_agents = 20

| Check | Pass condition |
| --- | --- |
| HTTP bind | [http].host is localhost unless AGH is intentionally protected by a reverse proxy or host firewall. |
| UDS path | [daemon].socket is inside a directory owned by the daemon user. |
| Log level | [log].level is info or warn for unattended operation; use debug only for short investigations. |
| Limits | Agent concurrency limits and host resource expectations match the host capacity. |
| Provider auth | Native CLI providers are logged in for the service user, and bound_secret providers have resolvable env: or vault: credentials. |
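
Some of these checks can be scripted against config.toml. The sketch below is an illustration only: toml_get and check_http_bind are hypothetical helpers, the awk reader handles only simple key = value lines (it is not a TOML parser), and treating 127.0.0.1 and ::1 as equivalent to localhost is an assumption.

```shell
# toml_get FILE SECTION KEY (hypothetical helper): print the value of a simple
# `key = value` line inside [SECTION]. Line-oriented sketch, not a TOML parser:
# it ignores multi-line values and treats any # as the start of a comment.
toml_get() {
  awk -v section="[$2]" -v key="$3" '
    /^[ \t]*\[/ { gsub(/[ \t]/, ""); current = $0; next }
    current == section {
      line = $0
      sub(/[ \t]*#.*$/, "", line)
      n = index(line, "=")
      if (n == 0) next
      k = substr(line, 1, n - 1)
      v = substr(line, n + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      gsub(/^"|"$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$1"
}

# check_http_bind FILE: pass when [http].host is a loopback name or address.
check_http_bind() {
  host=$(toml_get "$1" http host)
  case $host in
    localhost|127.0.0.1|::1) echo "PASS: http.host=$host" ;;
    *) echo "REVIEW: http.host=${host:-unset} is not loopback"; return 1 ;;
  esac
}
```

A non-loopback host is reported as REVIEW rather than FAIL because the table allows it when a reverse proxy or host firewall protects AGH.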

3. Run under a service manager

The service manager should:

  • start agh daemon start --foreground
  • send SIGTERM during stop
  • restart on unexpected failure
  • provide the environment variables that bound_secret providers need
  • run as the user whose native provider logins should be visible
  • keep stdout and stderr in a known log location

For concrete service files, see Daemon Operations.
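
As a rough illustration of the list above, the following shell function prints a systemd unit that covers each item. It is a sketch under stated assumptions, not the service file from Daemon Operations: render_agh_unit is a hypothetical helper, and the binary path and the provider.env file name are placeholders.

```shell
# render_agh_unit USER AGH_HOME AGH_BIN (hypothetical helper): print a systemd
# unit that follows the list above. All paths are placeholders.
render_agh_unit() {
  cat <<EOF
[Unit]
Description=AGH daemon

[Service]
User=$1
Environment=AGH_HOME=$2
# Optional file with variables for bound_secret providers (placeholder name).
EnvironmentFile=-$2/provider.env
ExecStart=$3 daemon start --foreground
KillSignal=SIGTERM
Restart=on-failure
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, render_agh_unit agh /var/lib/agh /usr/local/bin/agh prints a unit that can be reviewed before it is installed. Sending stdout and stderr to the journal is one way to keep them in a known location; a file target works as well.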

4. Configure log retention

AGH writes structured logs to $AGH_HOME/logs/agh.log. Detached daemon startup also appends child stdout and stderr there. The daemon rotates this file directly from the LogSink configured under [log]; no external logrotate rule is required for the default file.

[log]
level = "info"
max_size_mb = 10
max_backups = 5
max_age_days = 30
compress_backups = false

| Check | Pass condition |
| --- | --- |
| Retention | max_size_mb * (max_backups + 1) fits the filesystem budget for the AGH home. |
| Access | Only operators who need runtime logs can read $AGH_HOME/logs/ and downloaded support bundles. |
| Error review | Recent error lines are reviewed during incident response and before upgrades. |
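
The retention formula is easy to evaluate for a given configuration. The helpers below are a hypothetical sketch (log_budget_mb, fits_budget, and free_mb are not AGH commands); with the settings shown above the worst case is 10 * (5 + 1) = 60 MB of uncompressed logs.

```shell
# log_budget_mb MAX_SIZE_MB MAX_BACKUPS: worst-case size of agh.log plus backups.
log_budget_mb() {
  echo $(( $1 * ($2 + 1) ))
}

# fits_budget NEEDED_MB AVAILABLE_MB: succeed when the worst case fits.
fits_budget() {
  [ "$1" -le "$2" ]
}

# free_mb PATH: free space in MB on the filesystem that holds PATH.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

log_budget_mb 10 5   # prints 60 for the settings shown above
```

A check such as fits_budget "$(log_budget_mb 10 5)" "$(free_mb "$AGH_HOME")" compares the worst case with the free space on the home filesystem, although a real budget should also leave room for databases and session data.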

5. Monitor daemon and runtime health

Use both runtime status and doctor diagnostics:

agh status --output json
agh doctor --output json

If HTTP is available locally:

curl -fsS http://localhost:2123/api/status >/dev/null
curl -fsS http://localhost:2123/api/doctor >/dev/null

Alert on:

| Signal | Failing condition |
| --- | --- |
| Daemon status | Status is not running, or PID is absent. |
| HTTP status | /api/status or /api/doctor cannot be reached from the host. |
| Active sessions | Count exceeds the expected operating range. |
| Database size | global_db_size_bytes or session_db_size_bytes grows faster than planned. |
| Logs | Repeated startup, socket, database, or ACP spawn errors. |
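
The reachability signals can be combined into one probe for a cron job or monitoring agent. This is a minimal sketch: run_probe, health_check, and the AGH_HTTP_BASE variable are hypothetical names, and it assumes that agh status and agh doctor exit non-zero when they cannot reach the daemon, which this page does not confirm. It does not inspect the JSON output, so session counts and database sizes still need separate checks.

```shell
# run_probe NAME CMD...: run CMD quietly, print one OK/FAIL line,
# and return non-zero when CMD fails.
run_probe() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# health_check: run every probe and succeed only if all of them pass.
# AGH_HTTP_BASE is a hypothetical override for the local HTTP address.
health_check() {
  base=${AGH_HTTP_BASE:-http://localhost:2123}
  failures=0
  run_probe "agh status"  agh status --output json       || failures=$((failures + 1))
  run_probe "agh doctor"  agh doctor --output json       || failures=$((failures + 1))
  run_probe "/api/status" curl -fsS "$base/api/status"   || failures=$((failures + 1))
  run_probe "/api/doctor" curl -fsS "$base/api/doctor"   || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

Running every probe before returning, instead of stopping at the first failure, shows in one pass whether the problem is the daemon itself or only the HTTP listener.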

6. Back up state

Back up at least:

  • $AGH_HOME/agh.db and SQLite sidecars
  • $AGH_HOME/sessions/
  • $AGH_HOME/config.toml
  • $AGH_HOME/agents/
  • $AGH_HOME/skills/
  • $AGH_HOME/memory/

Use one of the backup paths in Database Operations. For unattended hosts, prefer a scheduled cold backup when the daemon can be stopped. If it cannot be stopped, use SQLite's .backup command rather than copying the database files directly; a file copy taken while the daemon is writing can miss changes that still sit in the sidecar files.

| Check | Pass condition |
| --- | --- |
| Frequency | Backup frequency matches the amount of session history you can afford to lose. |
| Coverage | Backups include global and per-session databases plus config and content directories. |
| Restore drill | A restore has been tested on a separate AGH_HOME. |
| Retention | Old backups expire according to your storage and compliance needs. |
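
The two backup styles described in this section could look roughly like this. Both helpers are hypothetical sketches rather than the procedures from Database Operations: cold_backup assumes the daemon is already stopped and archives only the paths listed above, and online_db_backup assumes the sqlite3 CLI is installed.

```shell
# cold_backup AGH_HOME ARCHIVE (hypothetical helper): with the daemon stopped,
# archive agh.db and its sidecars, sessions, config, and content directories.
cold_backup() {
  home=$1
  archive=$2
  set --   # reuse the positional parameters as the list of items to archive
  for path in "$home"/agh.db* "$home/sessions" "$home/config.toml" \
              "$home/agents" "$home/skills" "$home/memory"; do
    # Keep only items that exist, stored relative to AGH_HOME.
    [ -e "$path" ] && set -- "$@" "${path#"$home"/}"
  done
  if [ "$#" -eq 0 ]; then
    echo "nothing to back up in $home" >&2
    return 1
  fi
  tar -C "$home" -czf "$archive" "$@"
}

# online_db_backup SRC_DB DEST_DB (hypothetical helper): copy one live SQLite
# database with the sqlite3 CLI's .backup command. Paths must not contain
# single quotes.
online_db_backup() {
  sqlite3 "$1" ".backup '$2'"
}
```

For an online backup, online_db_backup would be called once for agh.db and once for each per-session database, with config.toml and the content directories copied separately.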

7. Reserve host resources

AGH starts real ACP-compatible agent CLIs as child processes. Size the host for the agent binaries you run, not only the daemon.

| Check | Pass condition |
| --- | --- |
| Disk | AGH_HOME, logs, and session event databases have room to grow. |
| File descriptors | The service limit is high enough for concurrent sessions, sockets, logs, and SQLite handles. |
| Process count | The service user can run the daemon plus expected agent child processes. |
| PATH | Provider commands such as npx, codex, or gemini are available to the service environment. |
| Shutdown | The service manager gives AGH time to stop sessions and close databases before killing it. |

For systemd, set resource limits in the service file when needed:

[Service]
LimitNOFILE=8192
TimeoutStopSec=30
Restart=on-failure
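
The PATH and file-descriptor checks can be verified from inside the service environment. The helpers below are a hypothetical sketch (check_commands and check_nofile are not AGH commands); the results are only meaningful when they run as the service user with the same environment as the daemon, since an interactive shell may have a different PATH and different limits.

```shell
# check_commands CMD...: report each provider command and fail if any is
# missing from PATH.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "PASS: $cmd found"
    else
      echo "FAIL: $cmd not on PATH"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}

# check_nofile MIN: succeed when the soft open-file limit is at least MIN.
check_nofile() {
  limit=$(ulimit -n)
  [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}
```

For the values used on this page, the calls would be check_commands npx codex gemini (listing only the providers you actually run) and check_nofile 8192.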

8. Upgrade deliberately

Use this flow for binary upgrades:

export AGH_HOME=/var/lib/agh

agh status
agh daemon stop

# Back up AGH_HOME here.
# Install the new agh binary here.

agh daemon start
agh status
agh doctor

Do not rely on old daemon state after replacing the binary. Stop, back up, replace, start, and then confirm status and health.
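
The same flow can be wrapped so that it stops at the first failing step. This is a sketch under stated assumptions: upgrade_agh is a hypothetical helper, AGH_HOME must already be exported, each agh command is assumed to exit non-zero on failure, and the backup archive must live outside AGH_HOME.

```shell
# upgrade_agh NEW_BINARY TARGET BACKUP_ARCHIVE (hypothetical helper).
# Stops at the first failing step so a failed backup never leads to a
# replaced binary.
upgrade_agh() {
  new_binary=$1
  target=$2
  backup=$3
  agh status || return 1
  agh daemon stop || return 1
  # Cold backup of the whole home while the daemon is stopped.
  tar -C "$AGH_HOME" -czf "$backup" . || return 1
  # Replace the binary only after the backup succeeded.
  install -m 0755 "$new_binary" "$target" || return 1
  agh daemon start || return 1
  agh status && agh doctor
}
```

If the daemon runs under a service manager, the stop and start steps would go through that manager instead of calling agh daemon stop and agh daemon start directly.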

Final readiness gate

| Area | Ready when |
| --- | --- |
| Daemon lifecycle | agh daemon start, agh status, and agh daemon stop work under the service manager. |
| Socket | The CLI can reach the configured UDS socket as the intended operator user. |
| HTTP | HTTP is bound only where intended and health endpoints are reachable. |
| Logs | Logs rotate and recent errors are actionable. |
| Databases | Backups include agh.db, per-session events.db files, metadata, and sidecars. |
| Sessions | Test session creation, stop, list, and resume work with the production service environment. |
| Recovery | Operators know how to restore a backup into a separate AGH_HOME before touching production state. |
