Skip to content
Mailing Lists/CIP-0064 Delegateless AutomationSource on lists.sync.global ↗

CIP-0064 Delegateless Automation

cip-discussCIP-00643 messagesstarted 21-05-2025
  1. #1Julien Tinguely21-05-2025source ↗
    Hi, we'd like to propose a CIP to remove the need for the DSO delegate as a single point of failure and thereby address some of the issues we've observed over the last few months on dev and testnet where rounds stopped advancing because the delegate had issues.

    Please let us know if you have any questions.

    ______________________________________
     CIP: N/A
     Title: Delegateless Automation
     Author: Julien Tinguely
     Comments-Summary: No comments yet
     Status: Draft
     Type: Governance
     Created: 2025-04-21
     Approved: N/A
     License: CC0-1.0

    Abstract

    This CIP proposes to replace the elected SV delegate mechanism to improve fault tolerance. Instead of relying on a single delegate, we propose an approach where the backend automation for each instance of the Super Validator application, (one instance operating on each of the different Super Validator nodes), attempts to exercise delegate choices. Each application will apply a random delay, tuned to minimize contention.

    Specification

    Daml

    In the new implementation, all current choices on the `DsoRules` template having `dsoDelegate` as controller will introduce a new optional argument named `sv` to be used instead. This allows all SVs to exercise the choice by specifying their own party. Daml will reject, with an authorization error, any transaction where the SV application tries to specify any party other than its own.

    SV Application

    The SV application is modified such that all SVs attempt to submit the commands currently only being submitted by the delegate.

    This introduces some potential for conflict: if multiple SV applications submit the same transaction at the same time, for example, all attempt to advance the mining rounds at the same time, only one of them will succeed and the others fail, that is, they contend with each other. Importantly, this is not a correctness issue: The Daml code already handles this case; for example, it can successfully manage if the delegate crashes during a submission and tries to resubmit the same command. But the failed transactions introduce unnecessary load on the network.

    To minimize the impact, each SV application adds a random delay before each attempt and retries with a random backoff. A new metric named `splice_trigger_attempted_total` is added to monitor trigger failures caused by contention and help in choosing good delay and backoff parameters.

    SV UI

    In the current implementation, there is a dedicated page in the SV UI used to elect a new delegate for the next round, intended to unlock the system if a delegate comes down. In the new implementation, delegate election and its dedicated page in the SV UI are fully removed and associated Daml choices cannot be executed anymore.

    Motivation

    A delegate performs critical network-related tasks, such as for example advancing mining rounds or triggering vote proposal executions when the required 2/3 voting threshold is reached. In the current version, there is no automatic fault tolerance to mitigate issues that may occur on the node. SVs can vote on a new delegate, but this is a manual process, and the single point of failure is just moved to a different SV. A concrete problem is rounds stop progressing when the delegate is down. This has been observed multiple times on devnet and testnet and highlights the need for a better fault tolerance mechanism.

    Rationale

    All SV applications attempting to complete tasks provide the right fault tolerance guarantees and introduce no correctness issues as commands are already all idempotent.

    Given that the current delegate-based choices do not require high throughput, e.g., round advancement happens only every 10 minutes, random delays and backoffs are sufficient to minimize contention.

    Our approach does not degrade performance in practice, as it introduces only minimal overhead in SV app automation while significantly enhancing fault tolerance. Most tasks succeed on the first attempt, rendering the backoff mechanism largely inactive. Empirical benchmarks show that delegate transactions accounted for approximately 1% of total transactions on DevNet and 1.6% on MainNet, based on 24-hour sampling periods. The script used for this analysis is available in Splice under _delegate_txs.py_.

    While more complex approaches to minimize contention would be possible, e.g., defining a clear order in which SVs should attempt to process a given task that is determined based on some task identifier, the implementation complexity of that does not seem to be justified.

    Backwards compatibility

    This new approach is only active when all SV operators have vetted the new governance dar file. All choices and backend code involved are backward compatible.

    Reference implementation

    The new implementation includes backend and Daml changes. An early version of the code is available in Splice branch: feature/delegateless-automation.

    Copyright

    This CIP is licensed under CC0-1.0: Creative Commons CC0 1.0 Universal



    Julien Tinguely
    Software Engineer / +41 79 368 15 14
    Digital Asset, creators of Daml

    This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.digitalasset.com/emaildisclaimer.html. If you are not the intended recipient, please delete this message.
  2. #2Julien Tinguely21-05-2025source ↗

    Julien Tinguely
    Software Engineer / +41 79 368 15 14
    Digital Asset, creators of Daml


    On Wed, May 21, 2025 at 6:14 PM Julien Tinguely <julien.tinguely@...> wrote:
    Hi, we'd like to propose a CIP to remove the need for the DSO delegate as a single point of failure and thereby address some of the issues we've observed over the last few months on dev and testnet where rounds stopped advancing because the delegate had issues.

    Please let us know if you have any questions.

    ______________________________________
     CIP: N/A
     Title: Delegateless Automation
     Author: Julien Tinguely
     Comments-Summary: No comments yet
     Status: Draft
     Type: Governance
     Created: 2025-04-21
     Approved: N/A
     License: CC0-1.0

    Abstract

    This CIP proposes to replace the elected SV delegate mechanism to improve fault tolerance. Instead of relying on a single delegate, we propose an approach where the backend automation for each instance of the Super Validator application, (one instance operating on each of the different Super Validator nodes), attempts to exercise delegate choices. Each application will apply a random delay, tuned to minimize contention.

    Specification

    Daml

    In the new implementation, all current choices on the `DsoRules` template having `dsoDelegate` as controller will introduce a new optional argument named `sv` to be used instead. This allows all SVs to exercise the choice by specifying their own party. Daml will reject, with an authorization error, any transaction where the SV application tries to specify any party other than its own.

    SV Application

    The SV application is modified such that all SVs attempt to submit the commands currently only being submitted by the delegate.

    This introduces some potential for conflict: if multiple SV applications submit the same transaction at the same time, for example, all attempt to advance the mining rounds at the same time, only one of them will succeed and the others fail, that is, they contend with each other. Importantly, this is not a correctness issue: The Daml code already handles this case; for example, it can successfully manage if the delegate crashes during a submission and tries to resubmit the same command. But the failed transactions introduce unnecessary load on the network.

    To minimize the impact, each SV application adds a random delay before each attempt and retries with a random backoff. A new metric named `splice_trigger_attempted_total` is added to monitor trigger failures caused by contention and help in choosing good delay and backoff parameters.

    SV UI

    In the current implementation, there is a dedicated page in the SV UI used to elect a new delegate for the next round, intended to unlock the system if a delegate comes down. In the new implementation, delegate election and its dedicated page in the SV UI are fully removed and associated Daml choices cannot be executed anymore.

    Motivation

    A delegate performs critical network-related tasks, such as for example advancing mining rounds or triggering vote proposal executions when the required 2/3 voting threshold is reached. In the current version, there is no automatic fault tolerance to mitigate issues that may occur on the node. SVs can vote on a new delegate, but this is a manual process, and the single point of failure is just moved to a different SV. A concrete problem is rounds stop progressing when the delegate is down. This has been observed multiple times on devnet and testnet and highlights the need for a better fault tolerance mechanism.

    Rationale

    All SV applications attempting to complete tasks provide the right fault tolerance guarantees and introduce no correctness issues as commands are already all idempotent.

    Given that the current delegate-based choices do not require high throughput, e.g., round advancement happens only every 10 minutes, random delays and backoffs are sufficient to minimize contention.

    Our approach does not degrade performance in practice, as it introduces only minimal overhead in SV app automation while significantly enhancing fault tolerance. Most tasks succeed on the first attempt, rendering the backoff mechanism largely inactive. Empirical benchmarks show that delegate transactions accounted for approximately 1% of total transactions on DevNet and 1.6% on MainNet, based on 24-hour sampling periods. The script used for this analysis is available in Splice under _delegate_txs.py_.

    While more complex approaches to minimize contention would be possible, e.g., defining a clear order in which SVs should attempt to process a given task that is determined based on some task identifier, the implementation complexity of that does not seem to be justified.

    Backwards compatibility

    This new approach is only active when all SV operators have vetted the new governance dar file. All choices and backend code involved are backward compatible.

    Reference implementation

    The new implementation includes backend and Daml changes. An early version of the code is available in Splice branch: feature/delegateless-automation.

    Copyright

    This CIP is licensed under CC0-1.0: Creative Commons CC0 1.0 Universal



    Julien Tinguely
    Software Engineer / +41 79 368 15 14
    Digital Asset, creators of Daml

    This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.digitalasset.com/emaildisclaimer.html. If you are not the intended recipient, please delete this message.
  3. #3Stanislav German-Evtushenko28-05-2025source ↗
    Hi Julien,
     
    SV-Nodeops-Limited is happy to endorse CIP for Delegateless Automation
     
    Stas