Common Runbooks

Common runbooks keep chain operations predictable across Ethereum, Sui, Aptos, and Solana. Chain pages provide command details; this page defines the shared structure every operator should follow.

Required runbook sections

| Section | Purpose | Minimum content |
| --- | --- | --- |
| Trigger | When to start the runbook | Alert name, dashboard panel, manual observation, or upstream advisory. |
| Scope | What the runbook affects | Chain, network, node role, customer-facing endpoint, and expected blast radius. |
| Preconditions | Safety checks before action | Current sync status, peer count, recent backups, maintenance window, and rollback path. |
| Procedure | Ordered steps | Commands, expected output, timeout, and where to stop if output differs. |
| Validation | How success is proven | Health checks, RPC smoke tests, metrics recovery, and log patterns. |
| Rollback | How to return to the previous state | Previous image/config, snapshot, DNS or load balancer reversal, and data-dir handling. |
| Escalation | Who gets involved | On-call owner, chain specialist, security owner, and communications lead. |
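
The required sections lend themselves to a simple lint. The following is a minimal sketch, not an established tool: the function name `missing_sections` is hypothetical, and it assumes each section name starts a line in the runbook file, optionally preceded by Markdown heading markers. It is a loose match, so body text that begins a line with a section name would also count.

```shell
# Sketch: list required sections that a runbook file does not contain.
# Assumption: section names start a line, e.g. "Trigger" or "## Trigger".
REQUIRED_SECTIONS="Trigger Scope Preconditions Procedure Validation Rollback Escalation"

missing_sections() {
  local file="$1" section missing=""
  for section in $REQUIRED_SECTIONS; do
    # Accept the name followed by whitespace, a colon, or end of line.
    if ! grep -Eq "^#*[[:space:]]*${section}([[:space:]:]|\$)" "$file"; then
      missing="${missing:+$missing }$section"
    fi
  done
  # Prints an empty line when nothing is missing.
  printf '%s\n' "$missing"
}
```

For example, `missing_sections runbooks/node-restart.md` (an illustrative path) prints the names of any absent sections on one line, which makes it easy to fail a review check when the output is non-empty.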

:::warning Safety boundary
Never improvise destructive actions against a validator, archive node, or production RPC fleet. If a runbook does not cover the condition, pause at the first safe boundary and escalate using /operations/incident-response.
:::

Common operational flows

Node restart

  1. Confirm the node is safe to restart: it is not the only healthy node behind a production endpoint, and the peer set has redundancy.
  2. Drain traffic from the gateway or load balancer when the node serves RPC.
  3. Capture current status: block height or checkpoint, peer count, process image, and recent error logs.
  4. Restart with the deployment runtime documented on the chain page.
  5. Validate local health and sync recovery before returning traffic.

```shell
# Example smoke pattern; replace the URL and method with the chain-specific endpoint.
curl -fsS http://127.0.0.1:8545 \
  -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_syncing","params":[]}'
```
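
Step 5 can be turned into a gate that blocks the return of traffic until the node reports that it is synced. The sketch below is Ethereum-specific and rests on one documented behavior: `eth_syncing` returns `false` in `result` when the node is not syncing, and a progress object otherwise. The function names, default URL, attempt count, and sleep interval are all illustrative assumptions. A node with no peers may also report `false`, so this check is best paired with the peer count captured in step 3.

```shell
# Sketch: succeed only when an eth_syncing response body reports "result": false.
is_synced() {
  printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*false'
}

# Hypothetical wrapper: poll the endpoint until it reports synced or the
# attempts run out. Arguments: URL, attempts, seconds between attempts.
wait_for_sync() {
  local url="${1:-http://127.0.0.1:8545}" attempts="${2:-30}" interval="${3:-10}"
  local i=0 body
  while [ "$i" -lt "$attempts" ]; do
    # A failed request is treated the same as "not synced yet".
    body=$(curl -fsS --max-time 5 "$url" \
      -H 'content-type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"eth_syncing","params":[]}') || body=""
    if is_synced "$body"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

A restart script could then call `wait_for_sync http://127.0.0.1:8545 30 10` and re-enable the node at the gateway only on success. Other chains need a different method and a different success pattern.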

Configuration change

  1. Review the chain-specific reference pages for ports, flags, and images.
  2. Apply the change to a non-production node first.
  3. Render or preview the deployment artifact before applying.
  4. Roll one node at a time unless the change is a security emergency.
  5. Keep the previous config and image tag available until validation completes.
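
The "one node at a time" rule in step 4 can be sketched as a loop that stops at the first failure, so a bad change never reaches more than one node. `apply_change` and `validate_node` are hypothetical hooks, not real commands: define them to wrap the deployment runtime and health checks documented on the chain page.

```shell
# Sketch: apply a change to each node in order and stop at the first failure.
# apply_change and validate_node are hypothetical hooks supplied by the caller.
roll_one_at_a_time() {
  local node
  for node in "$@"; do
    echo "applying change to $node"
    if ! apply_change "$node"; then
      echo "apply failed on $node; stopping rollout" >&2
      return 1
    fi
    if ! validate_node "$node"; then
      echo "validation failed on $node; stopping rollout" >&2
      return 1
    fi
  done
}
```

Invoked as `roll_one_at_a_time node-a node-b node-c` (illustrative names), the loop leaves every node after the failing one untouched, which keeps the previous config and image in service on the rest of the fleet while the failure is investigated.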

Upgrade or rollback

Use the chain page for version-specific instructions. The shared rule is simple: separate binary upgrades from state changes whenever possible, and prove a rollback path before touching production.

| Check | Upgrade | Rollback |
| --- | --- | --- |
| Backup | Fresh snapshot or volume backup exists | Snapshot from before the upgrade is available |
| Compatibility | Upstream release notes reviewed | Downgrade is supported or data restore is planned |
| Traffic | Node drained before restart | Node remains drained until healthy |
| Validation | Sync resumes and RPC smoke passes | Previous version serves expected responses |

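The Backup row in the table above can be enforced with a pre-flight check. This is a minimal sketch that covers file-based snapshots only; volume backups need a provider-specific check. The function name and the default age limit are assumptions, and `find -mmin` is a widely supported extension (GNU and BSD `find`) rather than part of POSIX.

```shell
# Sketch: succeed only if a snapshot file exists and was modified within the
# last N minutes. The default of 1440 minutes (24 hours) is illustrative.
snapshot_is_fresh() {
  local path="$1" max_age_minutes="${2:-1440}"
  [ -f "$path" ] || return 1
  # find prints the path only when its mtime is newer than the limit.
  [ -n "$(find "$path" -mmin "-$max_age_minutes" 2>/dev/null)" ]
}
```

An upgrade script might begin with `snapshot_is_fresh /backups/node.snapshot 60 || exit 1` (an illustrative path) so that the work stops before any restart when the snapshot is missing or stale.
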
  • Use /operations/monitoring-standards for alert and dashboard expectations.
  • Use /operations/backup-standards before state-changing work.
  • Use /operations/security-standards before exposing any endpoint.
  • Use /operations/rpc-exposure-policy for the canonical endpoint exposure classes.