Splice-validator-participant-1 keeps restarting during reconnect-participants on mainnet (v0.5.18)

App Development · 2 posts · 12 views · Last activity 29d ago
lixiubao (OP)
May 2026

We’re running a Canton validator node (v0.5.18) on mainnet and the splice-validator-participant-1 container keeps restarting in a loop.

Observed behavior:

  • Participant starts and connects to 13 sequencers successfully
  • Gets stuck on "Task reconnect-participants still not completed"
  • CPU spikes to ~785% (nearly maxing all 8 cores), RSS grows to ~4.3 GB
  • Participant crashes ~6 minutes after startup with no ERROR-level logs
  • Validator loses its gRPC connection to participant:5002 and also restarts
  • Cycle repeats

Server spec: 8 vCPU / 16 GB RAM, 100 GB data disk. Memory and disk are not the bottleneck.

Is this expected behavior during the initial ACS sync on mainnet? Does the reconnect-participants task eventually complete after enough retry cycles, or is there something we need to configure to stabilize the participant?

Jatin_Pandya_cf
29d ago

The reconnect-participants task is the participant’s internal task that reestablishes its connections to the synchronizer’s sequencers after a restart. The task stalls when the participant is under extreme resource pressure during the initial ACS commitment reconciliation. A crash at ~6 minutes with no ERROR logs looks like a JVM out-of-memory kill, which would explain why you see nothing in the application logs.

I’d suggest a few things to try:

  • Increase the participant JVM heap: add explicit heap flags to the participant container via _JAVA_OPTIONS in your Docker Compose file.

  • The high CPU usage is likely the JVM’s parallel GC threads competing with each other, so capping the container at around 6 cores forces the JVM to use fewer GC threads and often results in faster overall startup because GC contention drops.

  • ACS commitment processing is also database-heavy, so if Postgres is on the same 100 GB data disk with standard IOPS, disk I/O can become a bottleneck that backs up the participant’s in-memory queues.
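
The first two suggestions could look roughly like the Compose override below. This is only a sketch under assumptions: the file name is hypothetical, the service name `participant` is inferred from the container name splice-validator-participant-1 and the participant:5002 address, and the heap sizes are illustrative values for a 16 GB host rather than recommended settings.

```yaml
# docker-compose.override.yaml (hypothetical file name; adapt to however you start the stack)
services:
  participant:   # assumed service name, inferred from splice-validator-participant-1
    environment:
      # Explicit initial and maximum heap. The sizes are illustrative only;
      # leave headroom for non-heap JVM memory and for Postgres on the same host.
      _JAVA_OPTIONS: "-Xms4g -Xmx8g"
    # Cap the container at 6 of the 8 vCPUs so the JVM sizes its GC thread
    # pool from the smaller CPU count.
    cpus: "6"
```

If your existing Compose file already sets _JAVA_OPTIONS for this service, an override like this would replace that value, so carry over any flags you still need. If the container also has a memory limit, keep the maximum heap comfortably below it.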
