DAML Ledger API as a Message Queue?
Internally, DABL relies heavily on Request contracts; a good example is the creation of a ledger:
template LedgerRequest
  with
    user : Party
    operator : Party
    ledgerName : Text
    projectId : Text
  where
    signatory user
    controller user can
      LedgerRequestCancel : ()
        do return ()
    controller operator can
      LedgerRequestAccept : ContractId Ledger
        with
          acceptTime : Time
          ledgerId : Text
        do
          create Ledger with
            createTime = acceptTime
            owner = user
            ledgerData = LedgerData with
              metadata = empty
              ..
            ..
      LedgerRequestReject : ()
        do return ()
template Ledger
  with
    operator : Party
    owner : Party
    ledgerData : LedgerData
    createTime : Time
  where
    signatory operator, owner
    key (operator, ledgerData.ledgerId) : (Party, Text)
    maintainer key._1
    controller operator can
      LedgerUpdateMetadata : ContractId Ledger
        with
          newMetadata : [(Text, Text)]
        do
          create this with
            ledgerData = ledgerData with
              metadata = fromList newMetadata
      LedgerOperatorArchive : ContractId ArchivedLedger
        with
          archiveTime : Time
        do
          create ArchivedLedger with ..
    controller owner can
      LedgerOwnerArchive : ContractId ArchivedLedger
        with
          archiveTime : Time
        do
          create ArchivedLedger with ..
The UI issues an HTTP POST to a web service, which in turn uses the gRPC Ledger API to create a LedgerRequest contract with a party that corresponds to the user. A bot in a separate process sees the request and exercises either LedgerRequestAccept or LedgerRequestReject. The pattern works well as long as you stay on the happy path.
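To make the happy path concrete, here is a minimal in-memory simulation in Python. Everything in it is hypothetical. `FakeLedger` stands in for the gRPC Ledger API and `bot_step` for the operator bot. The `LedgerData` fields are assumed to be `ledgerId`, `ledgerName`, `projectId` and `metadata`, as the `..` wildcard in the choice suggests. The simulation models only one property: exercising either choice consumes the LedgerRequest.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Contract = Tuple[str, dict]  # (template name, payload)


@dataclass
class FakeLedger:
    """Hypothetical stand-in for the ledger: only tracks active contracts."""
    active: Dict[str, Contract] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def create(self, template: str, payload: dict) -> str:
        cid = f"#{next(self._ids)}"
        self.active[cid] = (template, payload)
        return cid

    def archive(self, cid: str) -> Contract:
        return self.active.pop(cid)


def exercise_accept(ledger: FakeLedger, request_cid: str,
                    accept_time: str, ledger_id: str) -> str:
    """Mimics LedgerRequestAccept: archive the request, create a Ledger."""
    _, req = ledger.archive(request_cid)
    return ledger.create("Ledger", {
        "operator": req["operator"],
        "owner": req["user"],
        "createTime": accept_time,
        "ledgerData": {
            "ledgerId": ledger_id,
            "ledgerName": req["ledgerName"],
            "projectId": req["projectId"],
            "metadata": {},
        },
    })


def exercise_reject(ledger: FakeLedger, request_cid: str) -> None:
    """Mimics LedgerRequestReject: archive the request and create nothing."""
    ledger.archive(request_cid)


def bot_step(ledger: FakeLedger, decide: Callable[[dict], bool], now: str) -> None:
    """One pass of the operator bot over every pending LedgerRequest."""
    for cid, (template, payload) in list(ledger.active.items()):
        if template != "LedgerRequest":
            continue
        if decide(payload):
            exercise_accept(ledger, cid, now, ledger_id=f"ledger-{cid}")
        else:
            exercise_reject(ledger, cid)
```

After one `bot_step` with a `decide` that always returns `True`, the only active contract left is the new `Ledger`. With a `decide` that returns `False`, nothing is left at all. That second case is why a rejection, as modelled here, carries no information back to the user.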
The Unhappy Path: Some failure modes that we are starting to encounter
Bot not running when a Request contract is created
Not all of our handlers treat both the active contract set (ACS) and the transaction event stream as sources of Request contracts. The gRPC Ledger API has no single primitive that combines the two into one stream. The HTTP JSON API's WebSocket streams do implement semantics that let the client abstract away whether a Request contract already existed before it connected. However, that approach is not necessarily without its own problems…
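One way to treat the ACS and the transaction stream as a single source is sketched below in Python, under simplifying assumptions. The bot takes an ACS snapshot together with the ledger offset it is valid at, then consumes only transactions strictly after that offset. The gRPC Ledger API exposes both pieces: the active contracts service reports the offset of its snapshot, and the transaction service can be subscribed from a given offset. The functions here are hypothetical in-memory stand-ins over a sorted list of `(offset, kind, contract_id)` events, not client-library calls.

```python
from typing import Iterator, List, Tuple

Event = Tuple[int, str, str]  # (offset, "created" | "archived", contract_id), sorted by offset


def active_contracts_at(events: List[Event], offset: int) -> List[str]:
    """Stand-in for an ACS snapshot: contracts still active as of `offset`."""
    active = {}  # dict preserves creation order
    for off, kind, cid in events:
        if off > offset:
            break
        if kind == "created":
            active[cid] = None
        else:
            active.pop(cid, None)
    return list(active)


def transactions_after(events: List[Event], offset: int) -> Iterator[Event]:
    """Stand-in for a transaction subscription beginning at `offset` (exclusive)."""
    return (e for e in events if e[0] > offset)


def request_stream(events: List[Event], snapshot_offset: int) -> Iterator[Tuple[str, str]]:
    """Yield ("pending", cid) and ("gone", cid) with no gap across a restart.

    Requests created before the bot started come from the snapshot; later ones
    come from the stream. `seen` is defensive: with the offset chosen this way
    the two sources should not overlap.
    """
    seen = set()
    for cid in active_contracts_at(events, snapshot_offset):
        seen.add(cid)
        yield ("pending", cid)
    for _off, kind, cid in transactions_after(events, snapshot_offset):
        if kind == "created" and cid not in seen:
            seen.add(cid)
            yield ("pending", cid)
        elif kind == "archived":
            yield ("gone", cid)
```

Take events `a`, `b` created at offsets 1 and 2, `a` archived at 3, `c` created at 4 and `b` archived at 5, and a bot that starts at offset 3. It sees `b` from the snapshot, then `c` and the archival of `b` from the stream. It never hears about `a`, which was created and consumed while it was down.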
Poison pill problem
The bot receives a contract that confuses it enough to crash hard.
If the bot ignores all requests issued while it is not running, the bot recovers, but each unacknowledged request remains on the ledger forever.
If the bot retroactively acts on requests that already exist in the ledger on startup, the bot runs the risk of crashing again on the same payload (poison pill), preventing it from ever starting again.
This can be partially mitigated if all clients follow a pattern like the following, with affordances in the models to support it:
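def onContract(event) {
  if event.cdata.allowedRetryCount > 0 {
    try {
      process(event)
    } catch (error) {
      // record the failed attempt on the ledger
      decrementAllowedRetryCount(event.cid)
    }
  } else {
    // no retries left: consume the request instead of retrying forever
    killRequest(event.cid)
  }
}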
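Here is a runnable Python version of the pattern above, run against a toy in-memory ledger. The assumptions are ours, not DABL's:

- The Request payload carries an `allowedRetryCount` field.
- `decrement_allowed_retry_count` stands for a choice that archives the contract and re-creates it with the count reduced by one, so the bot sees a fresh create event.
- `complete` and `kill_request` stand for consuming choices such as LedgerRequestAccept and LedgerRequestReject.

```python
import itertools
from collections import deque
from typing import Callable


class RetryLedger:
    """Toy ledger: active contracts plus a queue of undelivered create events."""

    def __init__(self):
        self.active = {}
        self.events = deque()
        self.done = []
        self.killed = []
        self._ids = itertools.count()

    def create(self, cdata: dict) -> str:
        cid = f"#{next(self._ids)}"
        self.active[cid] = cdata
        self.events.append({"cid": cid, "cdata": cdata})
        return cid

    def decrement_allowed_retry_count(self, cid: str) -> str:
        # Contracts are immutable: archive and re-create with count - 1.
        cdata = self.active.pop(cid)
        return self.create({**cdata, "allowedRetryCount": cdata["allowedRetryCount"] - 1})

    def complete(self, cid: str) -> None:
        # Stand-in for a consuming choice such as LedgerRequestAccept.
        self.active.pop(cid)
        self.done.append(cid)

    def kill_request(self, cid: str) -> None:
        # Stand-in for a consuming choice such as LedgerRequestReject.
        self.active.pop(cid)
        self.killed.append(cid)


def on_contract(ledger: RetryLedger, event: dict, process: Callable[[dict], None]) -> None:
    if event["cdata"]["allowedRetryCount"] > 0:
        try:
            process(event)
        except Exception:
            # The failed attempt is recorded on the ledger, so it survives a restart.
            ledger.decrement_allowed_retry_count(event["cid"])
        else:
            ledger.complete(event["cid"])
    else:
        # No retries left: consume the request instead of retrying forever.
        ledger.kill_request(event["cid"])


def run(ledger: RetryLedger, process: Callable[[dict], None]) -> None:
    """Deliver create events to the handler until none are left."""
    while ledger.events:
        on_contract(ledger, ledger.events.popleft(), process)
```

With `allowedRetryCount = 2`, a payload that always fails is attempted twice and then killed, rather than taking the bot down on every restart. Only exceptions the handler can catch are counted. A hard crash before the decrement is committed is not recorded.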
However, a poison pill can still easily occur through mistakes in the models or the bot that give the bot visibility into a contract it cannot actually act on. The bot is coded on the assumption that it can act on every contract it sees, including updating the retry count tracked as a field in the template. When it cannot, that bookkeeping fails and the mitigation breaks down.
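One possible guard is sketched below in Python, under assumptions of our own. The bot checks up front that it controls the choices it will need. Contracts it can see but not act on are ignored and logged once, instead of triggering on-ledger retry bookkeeping that is bound to fail. The rule that the `operator` field names that controller matches LedgerRequest but is otherwise hypothetical, as are `GuardedBot` and its methods.

```python
import logging
from typing import Callable

log = logging.getLogger("request-bot")


class GuardedBot:
    """Only hands a contract to `handle` if the bot is allowed to act on it."""

    def __init__(self, bot_party: str):
        self.bot_party = bot_party
        self.ignored = set()  # in memory only: forgotten on restart

    def can_act_on(self, cdata: dict) -> bool:
        # Hypothetical rule: the `operator` field names the party that
        # controls every choice the bot needs (true for LedgerRequest).
        return cdata.get("operator") == self.bot_party

    def on_contract(self, cid: str, cdata: dict,
                    handle: Callable[[str, dict], None]) -> bool:
        if not self.can_act_on(cdata):
            if cid not in self.ignored:
                log.warning("contract %s is visible but not actionable; ignoring", cid)
                self.ignored.add(cid)
            return False
        handle(cid, cdata)
        return True
```

The `ignored` set lives only in memory, so after a restart each such contract is logged once more. That is noisy, but it is no longer a crash loop.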
Temporary failure to process problem
A problematic request may not crash the process but merely fail to be processed temporarily; we observe this with bots that make decisions based on the result of an external service call. In that case, neither the gRPC Ledger API nor the HTTP JSON API offers a way to “rewind” the tape and re-process failed requests. In all of DABL’s current systems, such requests remain “stuck” until the processes are restarted, and even then only if the bot considers the ACS on startup. On the other hand, maniacal retries can leave the process stuck endlessly retrying a request that is doomed to fail forever.
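Neither API will rewind the tape, but a bot can keep its own retry schedule. The sketch below borrows a standard message-queue technique rather than anything the Ledger API provides. It uses bounded retries with capped exponential backoff, plus a dead-letter list for requests that exhaust their attempts. That way a doomed request stops being retried without blocking the others. `RetryScheduler`, `pump` and their parameters are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class RetryScheduler:
    """Per-request retry bookkeeping kept in the bot's own memory."""

    def __init__(self, max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 60.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.attempts: Dict[str, int] = {}        # cid -> failed attempts so far
        self.queue: List[Tuple[float, str]] = []  # min-heap of (due time, cid)
        self.dead_letters: List[str] = []         # cids that ran out of attempts

    def delay_for(self, attempt: int) -> float:
        # 1 -> base, 2 -> 2*base, 3 -> 4*base, ..., never more than max_delay
        return min(self.max_delay, self.base_delay * 2 ** (attempt - 1))

    def record_failure(self, cid: str, now: float) -> None:
        attempt = self.attempts.get(cid, 0) + 1
        self.attempts[cid] = attempt
        if attempt >= self.max_attempts:
            # Stop retrying; park it for a human or a bot that rejects it.
            self.dead_letters.append(cid)
        else:
            heapq.heappush(self.queue, (now + self.delay_for(attempt), cid))

    def record_success(self, cid: str) -> None:
        self.attempts.pop(cid, None)

    def due(self, now: float) -> List[str]:
        """Pop every cid whose retry time has arrived."""
        ready = []
        while self.queue and self.queue[0][0] <= now:
            ready.append(heapq.heappop(self.queue)[1])
        return ready


def pump(scheduler: RetryScheduler, process: Callable[[str], None], now: float) -> None:
    """Retry every due request once; reschedule or dead-letter the failures."""
    for cid in scheduler.due(now):
        try:
            process(cid)
        except Exception:
            scheduler.record_failure(cid, now)
        else:
            scheduler.record_success(cid)
```

The dead-letter list is the bucket for doomed requests, and something still has to drain it. Because the schedule lives in the bot's memory, it would need to be rebuilt from the ACS after a restart.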
There isn’t a specific question to be answered in this post; it is merely our current thinking on the pros and cons of different approaches to building applications over the Ledger API, particularly when using it the way message queues have traditionally been used.
References
http://zguide.zeromq.org/php:chapter4
https://www.rabbitmq.com/dlx.html
Great post; this captures the whole problem in detail.
I’d like to add one other category of problem for the unhappy path, but with more nuance: specifically, the scenario where the Request is rejected for a valid reason (a non-error case). In that case we’d want the reason for the rejection to be made available to the requester in a form they can process and make sense of. For instance, in the example above, assume there is a quota of three ledgers per user. The fourth ledger request should then be rejected with a quota-related reason. If it were rejected but the user was unaware of the quota, how would they learn why their request was not completed? Is there a way to do this without additional template design to account for the visibility of the resulting rejection reason?
I go back and forth on whether this is a good idea. On the one hand, it would be nice to observe how contracts get archived via a simple primitive, akin to a process exit status. On the other hand, if you do care about the end result, it makes sense to model it explicitly in DAML via templates. I think the latter is probably the better approach, because it keeps DAML simpler and consequently easier to deploy to different persistence layers.
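For the quota example, modelling the outcome explicitly might look like the Python sketch below, which mirrors the shape such templates could take. Everything here is hypothetical. The three-ledger quota is the one from the example above. `LedgerRequestRejected` stands for an extra template, created by the reject choice, signed by the operator and observed by the user, that carries a human-readable `reason`. That extra template is exactly the additional design work the question asks about.

```python
from dataclasses import dataclass
from typing import List, Union

QUOTA_PER_USER = 3  # the quota from the example; not a real DABL limit


@dataclass(frozen=True)
class LedgerRequest:
    user: str
    operator: str
    ledger_name: str


@dataclass(frozen=True)
class Ledger:
    owner: str
    ledger_name: str


@dataclass(frozen=True)
class LedgerRequestRejected:
    # Hypothetical template: signatory operator, observer user, so the
    # requester can read why the request was not completed.
    user: str
    operator: str
    ledger_name: str
    reason: str


Outcome = Union[Ledger, LedgerRequestRejected]


def decide(request: LedgerRequest, existing: List[Ledger]) -> Outcome:
    """Accept the request unless the user already owns QUOTA_PER_USER ledgers."""
    owned = sum(1 for ledger in existing if ledger.owner == request.user)
    if owned >= QUOTA_PER_USER:
        return LedgerRequestRejected(
            user=request.user,
            operator=request.operator,
            ledger_name=request.ledger_name,
            reason=f"quota exceeded: {owned} of {QUOTA_PER_USER} ledgers in use",
        )
    return Ledger(owner=request.user, ledger_name=request.ledger_name)
```

Alice's fourth request yields a `LedgerRequestRejected` whose reason she can read. A request from a user with no ledgers yields a `Ledger`.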
@dtanabe Could you give more examples of your poison pill contracts?
Hey @Max_DeLiso, we did something similar when building our third-party external API tool. We use tool-metagen (n.b. the ‘swagger’ branch) to convert Swagger JSON specifications into DAML RequestXXX / ResponseXXX contracts. Within the response we use a sum type to distinguish successful from unsuccessful replies.
The possible replies need to be specified in the input Swagger file, so the input JSON might look like this:
"responses": {
  "200": {
    "description": "Returned if the request is successful.",
    "examples": {
      "application/json": "{\"key\":\"jira-software\",\"groups\":[\"jira-software-users\",\"jira-testers\"],\"name\":\"Jira Soft
    },
    "schema": {
      "$ref": "#/definitions/ApplicationRole"
    }
  },
  "401": {
    "description": "Returned if the authentication credentials are incorrect or missing."
  },
  "403": {
    "description": "Returned if the user is not an administrator."
  },
  "404": {
    "description": "Returned if the role is not found."
  }
},
And the output DAML type will look like this:
template AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponse with
    requestor : Party
    requestId : ContractId AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGet
    body : AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody
  where
    signatory requestor

data AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody
  = AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody_401 ()
  | AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody_403 ()
  | AdminapplicationroleApplicationRoleResourcegetAllApplicationRolesGetResponseBody_200 [ApplicationRole]
  deriving (Eq, Ord, Show)
You then get a type-safe way of checking the response at runtime.
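On the client side, that sum type might be consumed roughly as in the Python sketch below. The class names are shortened, hypothetical mirrors of the generated DAML constructors; a real client would decode them from the Ledger API payload. `ApplicationRole` keeps only the `key` and `name` fields visible in the Swagger example. The DAML type makes every possible reply explicit, whereas in Python the closing `raise` is only a runtime guard for anything unexpected.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ApplicationRole:
    key: str
    name: str


@dataclass(frozen=True)
class Ok200:
    roles: List[ApplicationRole]


@dataclass(frozen=True)
class Unauthorized401:
    pass


@dataclass(frozen=True)
class Forbidden403:
    pass


# Hypothetical mirror of ...ResponseBody_200 / _401 / _403.
ResponseBody = Union[Ok200, Unauthorized401, Forbidden403]


def describe(body: ResponseBody) -> str:
    """Branch on the reply variant instead of inspecting raw status codes."""
    if isinstance(body, Ok200):
        return "roles: " + ", ".join(role.key for role in body.roles)
    if isinstance(body, Unauthorized401):
        return "rejected: authentication credentials are incorrect or missing"
    if isinstance(body, Forbidden403):
        return "rejected: user is not an administrator"
    raise TypeError(f"unexpected response body: {body!r}")
```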
You can learn more about this integration piece, ‘dagger’, here (internal to DA employees only).