Common Runbooks
Common runbooks keep chain operations predictable across Ethereum, Sui, Aptos, and Solana. Chain pages provide command details; this page defines the shared structure every operator should follow.
Required runbook sections
| Section | Purpose | Minimum content |
|---|---|---|
| Trigger | When to start the runbook | Alert name, dashboard panel, manual observation, or upstream advisory. |
| Scope | What the runbook affects | Chain, network, node role, customer-facing endpoint, and expected blast radius. |
| Preconditions | Safety checks before action | Current sync status, peer count, recent backups, maintenance window, and rollback path. |
| Procedure | Ordered steps | Commands, expected output, timeout, and where to stop if output differs. |
| Validation | How success is proven | Health checks, RPC smoke tests, metrics recovery, and log patterns. |
| Rollback | How to return to the previous state | Previous image/config, snapshot, DNS or load balancer reversal, and data-dir handling. |
| Escalation | Who gets involved | On-call owner, chain specialist, security owner, and communications lead. |
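The section list above lends itself to a simple lint before a runbook is merged. The following is a minimal sketch, not an established tool: it assumes each runbook is a plain-text or Markdown file in which every required section name sits on its own heading line, and the function name `missing_runbook_sections` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which required runbook sections a file lacks.
# Assumption: each section name appears alone on a heading line,
# optionally prefixed with Markdown '#' characters.

REQUIRED_SECTIONS="Trigger Scope Preconditions Procedure Validation Rollback Escalation"

missing_runbook_sections() {
  local file="$1" section missing=""
  for section in $REQUIRED_SECTIONS; do
    # Matches "Rollback" or "## Rollback" (case-sensitive, whole line).
    if ! grep -Eq "^#*[[:space:]]*${section}[[:space:]]*\$" "$file"; then
      missing="${missing:+$missing }$section"
    fi
  done
  # Print the missing names (empty line if none) and
  # return non-zero when anything is missing.
  printf '%s\n' "$missing"
  [ -z "$missing" ]
}
```

Called as `missing_runbook_sections path/to/runbook.md`, it prints the missing section names on one line and exits non-zero if any are absent, so it can gate a review step.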
:::warning Safety boundary
Never improvise destructive actions against a validator, archive node, or production RPC fleet. If a runbook does not cover the condition, pause at the first safe boundary and escalate using /operations/incident-response.
:::
Common operational flows
Node restart
- Confirm the node is safe to restart: it is not the only healthy node behind a production endpoint, and the peer set has redundancy.
- Drain traffic from the gateway or load balancer when the node serves RPC.
- Capture current status: block height or checkpoint, peer count, process image, and recent error logs.
- Restart with the deployment runtime documented on the chain page.
- Validate local health and sync recovery before returning traffic.
# Example smoke pattern; replace the URL and method with the chain-specific endpoint.
curl -fsS http://127.0.0.1:8545 \
  -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_syncing","params":[]}'
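To turn the smoke pattern into a gate for returning traffic, the response body can be classified by a small helper. This sketch assumes an Ethereum-style JSON-RPC endpoint, where `eth_syncing` returns `"result": false` when the node reports it is not syncing and an object with progress fields otherwise. The function name `is_synced` is hypothetical, and other chains need their own check.

```shell
# Sketch: classify an eth_syncing JSON-RPC response body.
# Assumption: Ethereum-style JSON-RPC; "result": false means the node
# reports it is not syncing, an object means sync is still in progress.
is_synced() {
  local body="$1"
  # Tolerate whitespace around the colon; avoids a jq dependency.
  printf '%s' "$body" | grep -Eq '"result"[[:space:]]*:[[:space:]]*false'
}

# Usage (commented out so sourcing this file makes no network calls):
# body="$(curl -fsS http://127.0.0.1:8545 \
#   -H 'content-type: application/json' \
#   -d '{"jsonrpc":"2.0","id":1,"method":"eth_syncing","params":[]}')"
# if is_synced "$body"; then echo "not syncing"; else echo "syncing or error"; fi
```

A `false` result alone does not prove the node is at chain head, since a node with no peers may also report it. Compare the block height and peer count against the status captured before the restart.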
Configuration change
- Review the chain-specific reference pages for ports, flags, and images.
- Apply the change to a non-production node first.
- Render or preview the deployment artifact before applying.
- Roll one node at a time unless the change is a security emergency.
- Keep the previous config and image tag available until validation completes.
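The one-node-at-a-time rule can be sketched as a loop that stops at the first failure, leaving the remaining nodes on the previous config. This is an illustration under assumptions: `roll_one_at_a_time` is a hypothetical name, and the apply and validate commands are operator-supplied hooks that take a node name and return non-zero on failure.

```shell
# Sketch: apply a change node by node, halting at the first failure.
# Arguments: <apply_cmd> <validate_cmd> <node>...
# Both commands are hypothetical operator-defined hooks that receive
# the node name as their only argument.
roll_one_at_a_time() {
  local apply_cmd="$1" validate_cmd="$2" node
  shift 2
  for node in "$@"; do
    "$apply_cmd" "$node" || {
      echo "apply failed on $node; stopping" >&2
      return 1
    }
    "$validate_cmd" "$node" || {
      echo "validation failed on $node; stopping" >&2
      return 1
    }
    echo "$node ok"
  done
}
```

Stopping instead of continuing keeps the blast radius to a single node and leaves untouched nodes available to serve traffic while the failure is investigated.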
Upgrade or rollback
Use the chain page for version-specific instructions. The shared rule is simple: separate binary upgrades from state changes whenever possible, and prove a rollback path before touching production.
| Check | Upgrade | Rollback |
|---|---|---|
| Backup | Fresh snapshot or volume backup exists | Snapshot from before the upgrade is available |
| Compatibility | Upstream release notes reviewed | Downgrade is supported or data restore is planned |
| Traffic | Node drained before restart | Node remains drained until healthy |
| Validation | Sync resumes and RPC smoke passes | Previous version serves expected responses |
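The Backup row can be checked mechanically before an upgrade starts. The sketch below assumes the snapshot is a single file and that "fresh" means modified within a caller-chosen number of minutes; this page does not define a freshness threshold, so the value is the operator's decision. The function name `snapshot_is_fresh` is hypothetical, and the check relies on `find -mmin`, which GNU, BSD, and BusyBox `find` support.

```shell
# Sketch: succeed only if a snapshot file exists and is recent enough.
# Arguments: <path> <max_age_minutes>
# Assumption: freshness is judged by the file's modification time.
snapshot_is_fresh() {
  local path="$1" max_age_min="$2"
  [ -f "$path" ] || return 1
  # find prints the path only if it was modified less than
  # max_age_min minutes ago; empty output means stale.
  [ -n "$(find "$path" -mmin "-$max_age_min" 2>/dev/null)" ]
}
```

For example, `snapshot_is_fresh /path/to/snapshot 120 || exit 1` at the top of an upgrade script halts the run when the backup is missing or older than two hours.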
Cross-links
- Use /operations/monitoring-standards for alert and dashboard expectations.
- Use /operations/backup-standards before state-changing work.
- Use /operations/security-standards before exposing any endpoint.
- Use /operations/rpc-exposure-policy for the canonical endpoint exposure classes.