CIP-0064 Delegateless Automation
- Hi, we'd like to propose a CIP to remove the need for the DSO delegate as a single point of failure and thereby address some of the issues we've observed over the last few months on dev and testnet where rounds stopped advancing because the delegate had issues.Please let us know if you have any questions.______________________________________CIP: N/ATitle: Delegateless Automation
Author: Julien Tinguely
Comments-Summary: No comments yet
Status: Draft
Type: Governance
Created: 2025-04-21
Approved: N/A
License: CC0-1.0
Abstract
This CIP proposes to replace the elected SV delegate mechanism to improve fault tolerance. Instead of relying on a single delegate, we propose an approach where the backend automation for each instance of the Super Validator application, (one instance operating on each of the different Super Validator nodes), attempts to exercise delegate choices. Each application will apply a random delay, tuned to minimize contention.
Specification
Daml
In the new implementation, all current choices on the `DsoRules` template having `dsoDelegate` as controller will introduce a new optional argument named `sv` to be used instead. This allows all SVs to exercise the choice by specifying their own party. Daml will reject, with an authorization error, any transaction where the SV application tries to specify any party other than its own.
SV Application
The SV application is modified such that all SVs attempt to submit the commands currently only being submitted by the delegate.
This introduces some potential for conflict: if multiple SV applications submit the same transaction at the same time, for example, all attempt to advance the mining rounds at the same time, only one of them will succeed and the others fail, that is, they contend with each other. Importantly, this is not a correctness issue: The Daml code already handles this case; for example, it can successfully manage if the delegate crashes during a submission and tries to resubmit the same command. But the failed transactions introduce unnecessary load on the network.
To minimize the impact, each SV application adds a random delay before each attempt and retries with a random backoff. A new metric named `splice_trigger_attempted_total` is added to monitor trigger failures caused by contention and help in choosing good delay and backoff parameters.
SV UI
In the current implementation, there is a dedicated page in the SV UI used to elect a new delegate for the next round, intended to unlock the system if a delegate comes down. In the new implementation, delegate election and its dedicated page in the SV UI are fully removed and associated Daml choices cannot be executed anymore.
Motivation
A delegate performs critical network-related tasks, such as for example advancing mining rounds or triggering vote proposal executions when the required 2/3 voting threshold is reached. In the current version, there is no automatic fault tolerance to mitigate issues that may occur on the node. SVs can vote on a new delegate, but this is a manual process, and the single point of failure is just moved to a different SV. A concrete problem is rounds stop progressing when the delegate is down. This has been observed multiple times on devnet and testnet and highlights the need for a better fault tolerance mechanism.
Rationale
All SV applications attempting to complete tasks provide the right fault tolerance guarantees and introduce no correctness issues as commands are already all idempotent.
Given that the current delegate-based choices do not require high throughput, e.g., round advancement happens only every 10 minutes, random delays and backoffs are sufficient to minimize contention.
Our approach does not degrade performance in practice, as it introduces only minimal overhead in SV app automation while significantly enhancing fault tolerance. Most tasks succeed on the first attempt, rendering the backoff mechanism largely inactive. Empirical benchmarks show that delegate transactions accounted for approximately 1% of total transactions on DevNet and 1.6% on MainNet, based on 24-hour sampling periods. The script used for this analysis is available in Splice under _delegate_txs.py_.
While more complex approaches to minimize contention would be possible, e.g., defining a clear order in which SVs should attempt to process a given task that is determined based on some task identifier, the implementation complexity of that does not seem to be justified.
Backwards compatibility
This new approach is only active when all SV operators have vetted the new governance dar file. All choices and backend code involved are backward compatible.
Reference implementation
The new implementation includes backend and Daml changes. An early version of the code is available in Splice branch: feature/delegateless-automation.
Copyright
This CIP is licensed under CC0-1.0: Creative Commons CC0 1.0 Universal
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.digitalasset.com/emaildisclaimer.html. If you are not the intended recipient, please delete this message. - On Wed, May 21, 2025 at 6:14 PM Julien Tinguely <julien.tinguely@...> wrote:Hi, we'd like to propose a CIP to remove the need for the DSO delegate as a single point of failure and thereby address some of the issues we've observed over the last few months on dev and testnet where rounds stopped advancing because the delegate had issues.Please let us know if you have any questions.______________________________________CIP: N/ATitle: Delegateless Automation
Author: Julien Tinguely
Comments-Summary: No comments yet
Status: Draft
Type: Governance
Created: 2025-04-21
Approved: N/A
License: CC0-1.0
Abstract
This CIP proposes to replace the elected SV delegate mechanism to improve fault tolerance. Instead of relying on a single delegate, we propose an approach where the backend automation for each instance of the Super Validator application, (one instance operating on each of the different Super Validator nodes), attempts to exercise delegate choices. Each application will apply a random delay, tuned to minimize contention.
Specification
Daml
In the new implementation, all current choices on the `DsoRules` template having `dsoDelegate` as controller will introduce a new optional argument named `sv` to be used instead. This allows all SVs to exercise the choice by specifying their own party. Daml will reject, with an authorization error, any transaction where the SV application tries to specify any party other than its own.
SV Application
The SV application is modified such that all SVs attempt to submit the commands currently only being submitted by the delegate.
This introduces some potential for conflict: if multiple SV applications submit the same transaction at the same time, for example, all attempt to advance the mining rounds at the same time, only one of them will succeed and the others fail, that is, they contend with each other. Importantly, this is not a correctness issue: The Daml code already handles this case; for example, it can successfully manage if the delegate crashes during a submission and tries to resubmit the same command. But the failed transactions introduce unnecessary load on the network.
To minimize the impact, each SV application adds a random delay before each attempt and retries with a random backoff. A new metric named `splice_trigger_attempted_total` is added to monitor trigger failures caused by contention and help in choosing good delay and backoff parameters.
SV UI
In the current implementation, there is a dedicated page in the SV UI used to elect a new delegate for the next round, intended to unlock the system if a delegate comes down. In the new implementation, delegate election and its dedicated page in the SV UI are fully removed and associated Daml choices cannot be executed anymore.
Motivation
A delegate performs critical network-related tasks, such as for example advancing mining rounds or triggering vote proposal executions when the required 2/3 voting threshold is reached. In the current version, there is no automatic fault tolerance to mitigate issues that may occur on the node. SVs can vote on a new delegate, but this is a manual process, and the single point of failure is just moved to a different SV. A concrete problem is rounds stop progressing when the delegate is down. This has been observed multiple times on devnet and testnet and highlights the need for a better fault tolerance mechanism.
Rationale
All SV applications attempting to complete tasks provide the right fault tolerance guarantees and introduce no correctness issues as commands are already all idempotent.
Given that the current delegate-based choices do not require high throughput, e.g., round advancement happens only every 10 minutes, random delays and backoffs are sufficient to minimize contention.
Our approach does not degrade performance in practice, as it introduces only minimal overhead in SV app automation while significantly enhancing fault tolerance. Most tasks succeed on the first attempt, rendering the backoff mechanism largely inactive. Empirical benchmarks show that delegate transactions accounted for approximately 1% of total transactions on DevNet and 1.6% on MainNet, based on 24-hour sampling periods. The script used for this analysis is available in Splice under _delegate_txs.py_.
While more complex approaches to minimize contention would be possible, e.g., defining a clear order in which SVs should attempt to process a given task that is determined based on some task identifier, the implementation complexity of that does not seem to be justified.
Backwards compatibility
This new approach is only active when all SV operators have vetted the new governance dar file. All choices and backend code involved are backward compatible.
Reference implementation
The new implementation includes backend and Daml changes. An early version of the code is available in Splice branch: feature/delegateless-automation.
Copyright
This CIP is licensed under CC0-1.0: Creative Commons CC0 1.0 Universal
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.digitalasset.com/emaildisclaimer.html. If you are not the intended recipient, please delete this message. - Hi Julien,SV-Nodeops-Limited is happy to endorse CIP for Delegateless AutomationStas