Troubleshooting
Diagnose and resolve common AGH daemon, socket, agent, session, and database issues.
- Audience: Operators running durable agent work
- Focus: Operations guidance shaped for scanability, day-two clarity, and operator context.
Use this guide when an operational command fails or the daemon is not behaving as expected. Each entry has symptoms, diagnosis, and a resolution path.
Daemon reports "already running"
| Field | What to check |
|---|---|
| Symptoms | cli: daemon already running (pid=12345) or daemon: already running with pid 12345. |
| Diagnosis | The lock file is held by a live process, or daemon.json points at a process that is still alive. |
| Resolution | Use the existing daemon, or stop it with agh daemon stop. Do not remove daemon.lock while the PID is alive. |
Inspect the current discovery files:
export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
agh daemon status
cat "$AGH_HOME/daemon.json"
cat "$AGH_HOME/daemon.lock"
If the recorded PID is not alive, a new daemon start should acquire the lock, remove the stale socket path, and clean up orphan child processes from the old daemon.
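Whether the recorded PID is alive can be checked from the shell. The following is a minimal sketch, not AGH's own logic: `pid_alive` is a hypothetical helper, and the PID is passed in by hand (for example, the one printed in the "already running" message) because the exact layout of daemon.json and daemon.lock is not assumed here.

```shell
# Hypothetical helper: succeed if a process with the given PID exists and
# can be signalled by the current user.
pid_alive() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;   # reject empty or non-numeric input
  esac
  # Signal 0 runs the existence and permission checks without delivering a signal.
  kill -0 "$1" 2>/dev/null
}
```

For example, `pid_alive 12345 && echo "still running"`. Because `kill -0` also fails when the process belongs to another user, run the check as the OS user that owns the daemon.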
Detached start times out or exits before readiness
| Field | What to check |
|---|---|
| Symptoms | cli: daemon did not become ready before timeout or cli: detached daemon exited before readiness. |
| Diagnosis | The detached child failed before /api/daemon/status became available over UDS. Common causes are invalid config, a port conflict, a socket path conflict, or database open failure. |
| Resolution | Read the recent log lines, then run foreground mode to see the startup error directly. |
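When the log is noisy, the recent lines can be narrowed to likely startup failures. This is a rough sketch: `recent_errors` is a hypothetical helper, and the keyword list is a guess rather than AGH's documented log format, so adjust it to what your log actually contains.

```shell
# Hypothetical helper: print the trailing log lines that look like errors.
# $1 = log file, $2 = number of trailing lines to scan (default 120)
recent_errors() {
  # The keywords are assumptions, not a documented AGH log schema.
  tail -n "${2:-120}" "$1" | grep -iE 'error|fatal|panic|denied|in use'
}
```

For example, `recent_errors "$AGH_HOME/logs/agh.log"`. The function exits non-zero when nothing matches, in which case foreground mode is the more reliable way to see the startup error.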
Commands:
export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
tail -n 120 "$AGH_HOME/logs/agh.log"
agh daemon start --foreground
Fix the error reported by foreground mode, then start normally:
agh daemon start
Unix socket cannot be created or opened
| Field | What to check |
|---|---|
| Symptoms | The daemon fails with udsapi: existing path ".../daemon.sock" is not a unix socket, or the CLI cannot connect to the daemon socket. |
| Diagnosis | The configured socket path is occupied by a regular file, or the CLI user cannot read and write the socket. The live socket is chmodded to 0600. |
| Resolution | Stop the daemon, move the non-socket file out of the way, and run the daemon and CLI as the same OS user. |
Inspect the socket path:
export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
socket="$AGH_HOME/daemon.sock"
agh daemon status
ls -ld "$AGH_HOME" "$(dirname "$socket")"
ls -l "$socket"
file "$socket"
If file "$socket" shows a regular file and the daemon is stopped, move it aside:
mv "$socket" "$socket.stale"
agh daemon start
If the socket lives outside AGH_HOME, confirm the [daemon].socket path configured in config.toml and
the ownership of its parent directory.
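The socket-path inspection above can also be scripted with the shell's built-in file tests. This is a small sketch; `path_kind` is a hypothetical helper name, and it only classifies the path without changing anything on disk.

```shell
# Hypothetical helper: report what currently occupies a path.
path_kind() {
  if [ -S "$1" ]; then
    echo "socket"
  elif [ -f "$1" ]; then
    echo "regular file"
  elif [ -e "$1" ] || [ -L "$1" ]; then
    echo "other"        # directory, device, dangling symlink, and so on
  else
    echo "missing"
  fi
}
```

For example, `path_kind "$AGH_HOME/daemon.sock"` printing `regular file` while the daemon is stopped matches the case where the file should be moved aside.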
HTTP UI or API is unavailable
| Field | What to check |
|---|---|
| Symptoms | The browser cannot load http://localhost:2123, or curl cannot connect to the API. |
| Diagnosis | The daemon is not running, the HTTP port is different, or another process prevented the daemon from binding the configured port. |
| Resolution | Check daemon status for the active HTTP host and port. If startup fails from an HTTP bind error, change [http].port or stop the conflicting process. |
Commands:
agh daemon status
curl -s http://localhost:2123/api/daemon/status | jq '.daemon'
The default HTTP bind is localhost:2123. For local production use, keep the host on localhost
unless you intentionally place AGH behind a protected reverse proxy.
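For scripted checks, the same endpoint can be probed with a timeout and a plain exit code. This is a sketch only: `http_ok` is a hypothetical helper, and the 3-second default timeout is an arbitrary choice.

```shell
# Hypothetical helper: succeed only if the URL answers with a non-error HTTP status.
# $1 = URL, $2 = overall timeout in seconds (default 3)
http_ok() {
  # -f: treat HTTP errors (4xx/5xx) as failure, -s: silent,
  # -o /dev/null: discard the body, --max-time: cap the total request time.
  curl -fs -o /dev/null --max-time "${2:-3}" "$1"
}
```

For example, `http_ok http://localhost:2123/api/daemon/status || agh daemon status`. A failure here does not say why the request failed, so follow it with the daemon status check to see the active HTTP host and port.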
Agent fails to spawn
| Field | What to check |
|---|---|
| Symptoms | Session creation fails with an ACP subprocess or initialize error. Logs mention acp: start subprocess, initialize session, command not found, or permission denied. |
| Diagnosis | The provider command cannot be parsed or executed, the workspace path is invalid, required provider environment variables are missing, or the upstream ACP runtime failed initialization. |
| Resolution | Validate the agent definition, provider command, daemon environment, and workspace paths; then restart or recreate the session. |
Commands:
export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
agh agent info <agent-name>
agh daemon start --foreground
tail -n 120 "$AGH_HOME/logs/agh.log"
command -v npx
command -v codex
command -v gemini
Agent subprocesses inherit the daemon environment and receive provider credentials from bound
credential_slots. If a provider uses an env:NAME secret ref, set that variable in the shell or
service manager that starts the daemon. If it uses a vault:providers/<provider>/<slot> ref, save
the credential through the settings API or web provider editor and confirm the provider status
reports the credential as present.
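The command and environment checks above can be batched. The helpers below are a hypothetical sketch: `missing_commands` and `env_is_set` are illustrative names, and `EXAMPLE_API_KEY` in the usage note stands in for whatever variable your env:NAME ref actually names.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Hypothetical helper: succeed if the named variable is exported in this environment.
env_is_set() {
  printenv "$1" >/dev/null 2>&1
}
```

For example, `missing_commands npx codex gemini` prints only the commands that are absent, and `env_is_set EXAMPLE_API_KEY || echo "not exported"` checks a placeholder variable. Both inspect the current shell, not the running daemon, so run them in the shell or service environment that starts the daemon.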
See Spawning for the exact launch and ACP negotiation flow.
Session is stuck after a crash
| Field | What to check |
|---|---|
| Symptoms | A session appears to stay in starting, active, or stopping after a daemon or agent crash. |
| Diagnosis | The metadata on disk may describe an in-flight state from a previous daemon process. |
| Resolution | Restart the daemon, then list or inspect the session. AGH repairs stale session metadata during boot and session reads. |
Commands:
agh daemon start
agh session list --all
agh session status <session-id>
agh session repair <session-id> --dry-run
The repair rules are:
| Stale state | Repaired state |
|---|---|
| active | stopped with stop reason agent_crashed |
| stopping | stopped with stop reason agent_crashed |
| starting | stopped with stop reason error |
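Read as a lookup, the rules above amount to the following mapping. This is only an illustration of the table, not AGH's implementation; `repaired_state` is a hypothetical name, and treating every other state as unchanged is an assumption.

```shell
# Illustration of the repair table: print the repaired state and stop reason.
repaired_state() {
  case "$1" in
    active|stopping) echo "stopped agent_crashed" ;;
    starting)        echo "stopped error" ;;
    # Assumption: states outside the table are not considered stale.
    *)               echo "unchanged"; return 1 ;;
  esac
}
```

Comparing `repaired_state active` with what `agh session status <session-id>` reports after a restart is a quick way to confirm the repair ran.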
For sessions already stopped with agent_crashed or error, boot also repairs interrupted
transcripts by appending terminal repair events. If a transcript or chat replay still shows a
dangling tool call or streaming assistant message after restart, run:
agh session repair <session-id> --dry-run
agh session repair <session-id>
If resume still fails, check that the workspace directory, agent definition, and
$AGH_HOME/sessions/<session-id>/events.db still exist.
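The filesystem part of that check can be sketched as below. `missing_resume_prereqs` is a hypothetical helper; it covers only the workspace directory and the session's events.db, and the agent definition is better confirmed with `agh agent info <agent-name>`.

```shell
# Hypothetical helper: print each resume prerequisite that is missing on disk.
# $1 = session id, $2 = workspace directory
missing_resume_prereqs() {
  agh_home="${AGH_HOME:-$HOME/.agh}"
  [ -d "$2" ] || echo "workspace directory: $2"
  [ -f "$agh_home/sessions/$1/events.db" ] || echo "event store: $agh_home/sessions/$1/events.db"
}
```

Empty output means both paths exist. Any printed line names the path to restore before retrying the resume.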
Database is locked or corrupted
| Field | What to check |
|---|---|
| Symptoms | SQLite reports database is locked, database disk image is malformed, or the daemon cannot open agh.db or events.db. |
| Diagnosis | A live daemon may still hold the database, a filesystem copy may have missed WAL sidecars, or SQLite detected a recoverable corruption marker. |
| Resolution | Stop the daemon before manual inspection. Restore from a backup if the database is corrupt. Preserve any .corrupt.<timestamp> files for diagnosis. |
Commands:
export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
agh daemon stop
sqlite3 "$AGH_HOME/agh.db" "pragma integrity_check;"
For backup and restore details, see Database Operations.
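Because a copy that misses WAL sidecars is one of the causes named above, it helps to list every file that belongs to a database before copying it. SQLite in WAL mode keeps `-wal` and `-shm` files next to the main database file. The helper below is a hypothetical sketch, `db_with_sidecars` is an illustrative name, and Database Operations remains the authoritative backup procedure.

```shell
# Hypothetical helper: list a SQLite database file plus any WAL sidecars beside it.
db_with_sidecars() {
  for f in "$1" "$1-wal" "$1-shm"; do
    [ -e "$f" ] && echo "$f"
  done
  return 0
}
```

For example, with the daemon stopped, `db_with_sidecars "$AGH_HOME/agh.db"` shows which files must travel together in a filesystem copy.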
Permission errors in AGH_HOME
| Field | What to check |
|---|---|
| Symptoms | Startup logs show errors creating the home layout, lock file, log file, socket parent, or database directory. |
| Diagnosis | The daemon user does not own AGH_HOME, or a service manager starts AGH with a different home path than the CLI. |
| Resolution | Use one stable AGH_HOME, ensure the daemon user owns it, and run the CLI with the same AGH_HOME when managing that daemon. |
Commands:
export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
ls -ld "$AGH_HOME" "$AGH_HOME/logs" "$AGH_HOME/sessions"
agh daemon status
For systemd or launchd installs, define AGH_HOME in the service environment and keep provider API
keys in the same environment.
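A quick way to spot the ownership problems described in this section is to test each directory as the daemon user. This is a minimal sketch; `unwritable_dirs` is a hypothetical helper, and it reports problems without changing any permissions.

```shell
# Hypothetical helper: report directories that are missing or not writable
# by the user running the check.
unwritable_dirs() {
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d"
    elif [ ! -w "$d" ]; then
      echo "not writable: $d"
    fi
  done
}
```

For example, `unwritable_dirs "$AGH_HOME" "$AGH_HOME/logs" "$AGH_HOME/sessions"`. Run it as the same OS user and with the same AGH_HOME that the service manager uses, otherwise the result describes a different home layout.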
Related pages
- Daemon Operations covers lock, socket, logs, and service manager behavior.
- Database Operations covers backup and inspection procedures.
- Production Checklist turns these checks into a readiness gate.