Node Troubleshooting

This page lists common troubleshooting scenarios and solutions for node operators.

401 Unauthorized: Signature Invalid

If you see a log that looks like this in devium--batcher:

WARN [12-13|15:53:20.263] Derivation process temporary error       attempts=80 err="stage 0 failed resetting: temp: failed to find the L2 Heads to start from: failed to fetch current L2 forkchoice state: failed to find the finalized L2 block: failed to determine L2BlockRef of finalized, could not get payload: 401 Unauthorized: signature is invalid

It means that the devium--batcher is unable to authenticate with devium-geth's authenticated RPC using the JWT secret.

Solution

Check that the JWT secret is correct in both services.
Check that devium-geth's authenticated RPC is enabled, and that the URL is correct.
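One way to check the first point is to compare the two secret files directly. The sketch below assumes that devium-geth follows upstream Geth, where the JWT secret is 32 bytes stored as 64 hex characters with an optional 0x prefix. The function names and file paths are hypothetical.

```python
import string
from pathlib import Path


def normalize_jwt_secret(raw: str) -> str:
    """Strip surrounding whitespace and an optional 0x prefix, then lowercase."""
    secret = raw.strip()
    if secret.lower().startswith("0x"):
        secret = secret[2:]
    return secret.lower()


def is_valid_jwt_secret(raw: str) -> bool:
    """Assumption: a usable secret is exactly 32 bytes, i.e. 64 hex characters."""
    secret = normalize_jwt_secret(raw)
    return len(secret) == 64 and all(c in string.hexdigits for c in secret)


def secrets_match(path_a: str, path_b: str) -> bool:
    """True if both files hold the same well-formed secret."""
    a = Path(path_a).read_text()
    b = Path(path_b).read_text()
    return is_valid_jwt_secret(a) and normalize_jwt_secret(a) == normalize_jwt_secret(b)
```

For example, secrets_match("/config/batcher-jwt.hex", "/config/geth-jwt.hex") (both paths are placeholders) returns False if either file is malformed or the two differ. The helper never prints the secret itself.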

403 Forbidden: Invalid Host Specified

If you see a log that looks like this in devium--batcher:

{"err":"403 Forbidden: invalid host specified\n","lvl":"error","msg":"error getting latest header","t":"2022-12-13T22:29:18.932833159Z"}

It means that you have not whitelisted devium--batcher's host with devium-geth.

Solution

Make sure that the --authrpc.vhosts parameter in devium-geth is set either to the correct host or to *.
Check that devium-geth's authenticated RPC is enabled, and that the URL is correct.
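To reason about which Host values pass the check, the following is a simplified model of the virtual-host check in upstream Geth. It is an assumption that devium-geth behaves the same way. In this model, an empty Host header and bare IP addresses are always accepted, and hostnames must appear in the comma-separated --authrpc.vhosts list unless the list contains *. The function name is hypothetical.

```python
import ipaddress


def host_allowed(host_header: str, vhosts: str) -> bool:
    """Simplified model: would a request with this Host header be accepted?"""
    allowed = {h.strip().lower() for h in vhosts.split(",") if h.strip()}
    if not host_header:
        return True  # no Host header set
    host = host_header
    if host.startswith("["):
        # Bracketed IPv6 literal such as [::1]:8551
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # hostname:port or IPv4:port
        host = host.rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True  # IP addresses are not subject to the vhosts list
    except ValueError:
        pass
    return "*" in allowed or host.lower() in allowed
```

Under this model, a request sent to devium-geth:8551 is rejected when the list is only localhost, which matches the common case of reaching the node by a container or service name.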

Failed to Load P2P Config

If you see a log that looks like this in devium--batcher:

CRIT [12-13|13:46:21.386] Application failed                       message="failed to load p2p config: failed to load p2p discovery options: failed to open discovery db: mkdir /p2p: permission denied"

It means that the devium--batcher lacks write access to the P2P discovery or peerstore directories.

Solution

Make sure that the devium--batcher has write access to the P2P directory. By default, this is /p2p.
Alternatively, point the --p2p.discovery.path and --p2p.peerstore.path parameters at a directory the devium--batcher can write to.
Alternatively, set --p2p.discovery.path and --p2p.peerstore.path to memory to disable persistence.
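Before restarting the service, it can help to confirm that a candidate directory is actually writable. This is a minimal sketch with a hypothetical helper name. It must be run as the same user as the devium--batcher, because the result depends on the calling user's permissions.

```python
import os


def can_write_dir(path: str) -> bool:
    """True if `path`, or its nearest existing ancestor, is a writable directory.

    Checking the nearest ancestor covers the case in the log above, where the
    directory does not exist yet and has to be created with mkdir.
    """
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            return False
        probe = parent
    return os.path.isdir(probe) and os.access(probe, os.W_OK | os.X_OK)
```

Calling can_write_dir("/p2p") as the service user should return True before the default paths can work. If it returns False, either fix the permissions or choose another directory.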

Wrong Chain

If you see a log that looks like this in devium--batcher:

{"attempts":183,"err":"stage 0 failed resetting: temp: failed to find the L2 Heads to start from: wrong chain L1: genesis: 0x4104895a540d87127ff11eef0d51d8f63ce00a6fc211db751a45a4b3a61a9c83:8106656, got 0x12e2c18a3ac50f74d3dd3c0ed7cb751cc924c2985de3dfed44080e683954f1dd:8106656","lvl":"warn","msg":"Derivation process temporary error","t":"2022-12-13T23:31:37.855253213Z"}

It means that the devium--batcher is pointing to the wrong chain.

Solution

Verify that the devium--batcher's L1 URL is pointing to the correct L1 for the given network.
Verify that the devium--batcher's rollup config/--network parameter is set to the correct network.
Verify that the devium--batcher's L2 URL is pointing to the correct instance of devium-geth, and that devium-geth is properly initialized for the given network.
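When working through these checks, it helps to pull the two block references out of the error message. The sketch below assumes that the first hash:number pair (after "genesis:") is the L1 genesis block expected by the rollup config, and that the pair after "got" is what the connected L1 node returned. The function and key names are hypothetical.

```python
import re

# Matches the "wrong chain L1" fragment of the err field shown above.
WRONG_CHAIN = re.compile(
    r"wrong chain L1: genesis: (0x[0-9a-fA-F]+):(\d+), got (0x[0-9a-fA-F]+):(\d+)"
)


def parse_wrong_chain(err: str):
    """Return the expected and actual block references, or None if no match."""
    m = WRONG_CHAIN.search(err)
    if m is None:
        return None
    return {
        "expected_hash": m.group(1),
        "expected_number": int(m.group(2)),
        "actual_hash": m.group(3),
        "actual_number": int(m.group(4)),
    }
```

If the block numbers match but the hashes differ, as in the log above, the L1 endpoint is most likely serving a different network from the one the rollup config describes.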

Unclean Shutdowns

If you see a log that looks like this in devium-geth:

WARN [03-05|16:18:11.238] Unclean shutdown detected                booted=2023-03-05T11:09:26+0000 age=5h8m45s

It means devium-geth has experienced an unclean shutdown. The Geth docs say that if Geth stops unexpectedly, the database can be corrupted. This is known as an "unclean shutdown", and it can lead to a variety of problems for the node when it is restarted.

It is always best to shut down Geth gracefully, i.e. using a shutdown command such as ctrl-c, docker stop -t 300 <container ID>, or systemctl stop. Note that systemctl stop has a default timeout of 90s: if Geth takes longer than this to shut down gracefully, it is killed forcefully. To avoid this, set TimeoutStopSec in the service's systemd unit file to a larger value, at least 300s.

This way, Geth knows to write all relevant information into the database to allow the node to restart properly later. This can involve more than 1GB of information being written to the LevelDB database, which can take several minutes.
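If the node is started from a custom supervisor script rather than Docker or systemd, the same pattern can be applied by hand: send the interrupt signal, then wait long enough for the database flush before resorting to a forced kill. This is a minimal sketch for POSIX systems. The helper name is hypothetical, and the 300-second default mirrors the timeout suggested above.

```python
import signal
import subprocess


def graceful_stop(proc: subprocess.Popen, timeout: float = 300.0) -> bool:
    """Send SIGINT (the signal ctrl-c delivers) and wait up to `timeout` seconds.

    Returns True if the process exited on its own, or False if it had to be
    killed, in which case an unclean shutdown should be expected.
    """
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return False
```

A False return value is worth logging, since it indicates the process was killed before it finished shutting down.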

Solution

If an unexpected shutdown does occur, the removedb subcommand can be used to delete the state database and resync it from the ancient database. This should get the database back up and running.

Monitoring Network Upgrades