AWS RDS Postgres password-less authentication in Canton
Hello,
We are trying to set up a Canton domain and participant with AWS RDS Aurora Postgres. The idea is to connect to Postgres without a password by using an IAM user and service accounts. Can you point me to the right resources, or share any suggestions or ideas that may help us achieve this?
Thanks in advance.
@nagendra, does this article summarize the Postgres configuration you are aiming for? And your question is if Canton can support this?
Hello @WallaceKelly, yes, this is exactly what I am looking for. Does Canton support this through some means? As far as I understand, the password/token generated by AWS for RDS expires after 15 minutes.
Hi @nagendra,
We have not tested deploying Canton against an IAM-based persistence layer, so I am somewhat skeptical that expiring tokens would work well out of the box. Our recommendation would therefore be to create a postgres user with password for use by Canton.
Thanks,
– Oliver
Hello @oliverse,
We use this hack in our Spring Boot project to release the DB connection every 14 minutes, so that we get a new connection with a new token every 14 minutes. The property for this is spring.datasource.hikari.max-lifetime=840000. Is there a similar property in Canton that we could use for a workaround? I’ve gone through these docs already but am still looking.
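For readers wondering where 840000 comes from, here is a minimal sketch of the arithmetic: the token validity mentioned above (15 minutes) minus a one-minute safety margin, expressed in milliseconds as HikariCP expects. The class and method names are made up for illustration and are not part of Canton or HikariCP.

```java
import java.time.Duration;

class TokenLifetime {
    // RDS IAM tokens are valid for 15 minutes, as stated earlier in the thread.
    static final Duration TOKEN_VALIDITY = Duration.ofMinutes(15);

    /** Connection max-lifetime in milliseconds: token validity minus a safety margin. */
    static long maxLifetimeMillis(Duration safetyMargin) {
        return TOKEN_VALIDITY.minus(safetyMargin).toMillis();
    }

    public static void main(String[] args) {
        // One minute of margin yields the value used for spring.datasource.hikari.max-lifetime.
        System.out.println(maxLifetimeMillis(Duration.ofMinutes(1))); // prints 840000
    }
}
```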
Hello @nagendra
The link that you mention contains the comment: “After 14 minutes, the application does a new request to RDS for a fresh authentication token”
This reads to me as if the driver does not perform the authentication token refresh, but would require canton to request a new token. Canton does not have a facility in place to request a new database token, and in fact such a mechanism is unlikely to work well with High-Availability setups for example.
Thanks,
– Oliver
Maybe, possibly, you can use the AWS JDBC Wrapper. This is completely untested, unsupported, and at your own risk, but it sounds like it may just work. And it does claim to support RDS IAM authentication.
You’d have to download the jdbc-wrapper JAR, put it on the classpath for the Canton process, and then construct a JDBC connection string of the form jdbc:aws-wrapper:postgresql://... with all the needed parameters.
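As a rough illustration of the connection string shape, the sketch below assembles a jdbc:aws-wrapper:postgresql:// URL from a host, a database name and a parameter map. The host and database values are placeholders, the helper class is hypothetical, and the parameters are not URL-encoded. wrapperPlugins=iam is the wrapper parameter that enables its IAM plugin; check the wrapper documentation for the full set your setup needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class AwsWrapperUrl {
    /** Builds a wrapper-style JDBC URL; parameter values are appended as-is (no URL encoding). */
    static String build(String host, String database, Map<String, String> params) {
        String base = "jdbc:aws-wrapper:postgresql://" + host + "/" + database;
        if (params.isEmpty()) {
            return base;
        }
        String query = params.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("&"));
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("wrapperPlugins", "iam");
        // Placeholder host and database name, for illustration only.
        System.out.println(build("db.example.internal", "canton", params));
    }
}
```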
We do use AWS JDBC wrapper for Keycloak. The driver is added to the classpath before build, like below.
ARG VERSION
FROM quay.io/keycloak/keycloak:$VERSION as builder
ENV KC_HEALTH_ENABLED=true
ENV KC_METRICS_ENABLED=true
ENV KC_DB=postgres
ENV KC_TRANSACTION_XA_ENABLED=false
ADD --chmod=0666 https://github.com/aws/aws-advanced-jdbc-wrapper/releases/download/2.3.9/aws-advanced-jdbc-wrapper-2.3.9-bundle-federated-auth.jar /opt/keycloak/providers/aws-advanced-jdbc-wrapper.jar
ENV KC_DB_DRIVER=software.amazon.jdbc.Driver
COPY cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml
ENV KC_CACHE_CONFIG_FILE=cache-ispn-jdbc-ping.xml
RUN /opt/keycloak/bin/kc.sh build
And we build the canton image like so
FROM docker.io/eclipse-temurin:11-jdk-focal
WORKDIR /canton
COPY ./bin/canton bin/canton
COPY ./lib lib
COPY ./simple-topology.conf .
RUN echo "Precompiling canton console. Please ignore the following output" && bin/canton --config simple-topology.conf --no-tty < /dev/null && rm -rf log
ENTRYPOINT ["bin/canton"]
Here /bin contains canton and /lib contains the jar. When you say “put it on the classpath for the Canton process”, I’m guessing it has to be under /lib?
I’m not sure everything in /lib is automatically picked up. You can run Canton manually by running something like
java -cp lib/aws-wrapper.jar:lib/canton-enterprise-X.Y.Z.jar com.digitalasset.canton.CantonEnterpriseApp -c canton.conf
Remove the Enterprise if using open source.
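One way to check whether a driver JAR was actually picked up is to list the JDBC drivers the JVM has registered when started with the same -cp value. This is a minimal sketch using only the JDK (Java 9 or later). It assumes the JARs register their drivers through the standard service-loader mechanism, in which case you would expect to see software.amazon.jdbc.Driver in the output when the wrapper JAR is on the classpath.

```java
import java.sql.DriverManager;
import java.util.List;
import java.util.stream.Collectors;

class ListJdbcDrivers {
    /** Returns the class names of all JDBC drivers currently registered with DriverManager. */
    static List<String> registeredDrivers() {
        return DriverManager.drivers()
            .map(driver -> driver.getClass().getName())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Run with the same classpath as Canton to see which drivers are visible.
        registeredDrivers().forEach(System.out::println);
    }
}
```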
Tried this, but it failed to get a connection. My config file:
canton {
  domains {
    domain {
      public-api {
        port = 3001
        address = 0.0.0.0
      }
      admin-api {
        port = 3002
      }
      storage {
        type = postgres
        config {
          driver = "software.amazon.jdbc.Driver"
          url = "jdbc:aws-wrapper:postgresql://${DB_HOST}/${DB_NAME}?&sslmode=verify-ca&sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory&wrapperPlugins=iam,failover"
          user = demo
        }
        parameters.max-connections = 30
      }
      sequencer {
        writer = {
          type = low-latency
        }
      }
    }
  }
  features.enable-testing-commands = yes
}
Logs:
org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:354)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
at org.postgresql.Driver.makeConnection(Driver.java:443)
at org.postgresql.Driver.connect(Driver.java:297)
at software.amazon.jdbc.DriverConnectionProvider.connect(DriverConnectionProvider.java:136)
at software.amazon.jdbc.plugin.DefaultConnectionPlugin.connectInternal(DefaultConnectionPlugin.java:203)
at software.amazon.jdbc.plugin.DefaultConnectionPlugin.connect(DefaultConnectionPlugin.java:191)
at software.amazon.jdbc.ConnectionPluginManager.lambda$connect$6(ConnectionPluginManager.java:373)
Is there a Caused By lower down in the stack trace?
Oh yeah, it’s a SocketTimeoutException.
Caused by: java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.postgresql.core.PGStream.createSocket(PGStream.java:243)
at org.postgresql.core.PGStream.<init>(PGStream.java:98)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:132)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
... 83 common frames omitted
I’m checking on the SSL front.
We are supposed to trust the certs from https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem. We do this in the Keycloak image by adding the certs to a truststore path. For Canton, I tried adding them to the Java truststore as below, but it didn’t help.
keytool -import -v -trustcacerts -alias rds.amazonaws.com -file global-bundle.pem -keystore /cacerts
SocketTimeoutException sounds like a fairly generic networking issue. Have you checked in some way (e.g. ping) that the host has connectivity to ${DB_HOST} and that the config parameter interpolation is working as you expect?
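Since ICMP ping may be blocked on the way to the database, a plain TCP connect attempt from the same host or container can be a more direct check. The sketch below is a minimal, JDK-only example. It assumes the database listens on the default Postgres port 5432 and reads the host from a DB_HOST environment variable, mirroring the ${DB_HOST} substitution in the config above; the class name is made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class DbReachability {
    /** Returns true if a TCP connection to host:port can be opened within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connect timeouts, refused connections and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: DB_HOST is set in the environment; falls back to localhost otherwise.
        String host = System.getenv().getOrDefault("DB_HOST", "localhost");
        System.out.println(host + ":5432 reachable = " + canConnect(host, 5432, 3000));
    }
}
```

A false result here points at networking (security groups, routing, DNS) rather than at SSL or IAM settings.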
Yes, I’m checking a few things on my end. I was able to resolve the timeout issue, but I still couldn’t get it working. I will post here once I have an update.
I managed to get it working. The issue was the sslmode=verify-ca property set in the JDBC URL; changing it to sslmode=require did the trick. Thanks very much for all the help, maybe this will be helpful to someone else too.
One final query: we are used to starting Canton in daemon mode, and with this change we are starting it manually as you suggested, like below:
java -cp lib/aws-jdbc-wrapper.jar:lib/canton-enterprise-2.8.0.jar com.digitalasset.canton.CantonEnterpriseApp --config=canton.conf --bootstrap=domain.canton --log-level-stdout=INFO
Do you foresee any problems if we do it this way?
Hi, I’m back again. The Canton domain is working fine, but when I start the participant, the app crashes with the exception below.
2024-09-25 14:07:26,266 [canton-env-ec-47] DEBUG c.d.c.p.ParticipantNodeBootstrap:participant=participant - Successfully completed shutdown of participant
2024-09-25 14:07:26,268 [main] ERROR c.d.c.e.CommunityEnvironment tid:15cc14a823e0376024640b371aad2680 - Failed to start participant: Ledger API server failed to start: FailedToStartLedgerApiServer(
java.lang.RuntimeException: JDBC URL doesn't match any supported databases (h2, pg, oracle)
at scala.sys.package$.error(package.scala:27)
at com.digitalasset.canton.platform.store.DbType$.jdbcType(DbType.scala:47)
at com.digitalasset.canton.platform.store.FlywayMigrations.<init>(FlywayMigrations.scala:26)
at com.digitalasset.canton.platform.indexer.IndexerServiceOwner.acquire(IndexerServiceOwner.scala:48)
at com.digitalasset.canton.platform.indexer.IndexerServiceOwner.acquire(IndexerServiceOwner.scala:26)
at com.daml.resources.AbstractResourceOwner$$anon$2.acquire(AbstractResourceOwner.scala:38)
at com.daml.resources.AbstractResourceOwner$$anon$2.$anonfun$acquire$1(AbstractResourceOwner.scala:38)
at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
at com.daml.executors.QueueAwareExecutorService$TrackingRunnable.run(QueueAwareExecutorService.scala:98)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
My config file is below:
canton {
  participants {
    participant {
      admin-api {
        port = 4001
        address = 0.0.0.0
      }
      ledger-api {
        port = 4002
        address = 0.0.0.0
        auth-services = [{
          type = jwt-rs-256-jwks
          url = ${JWT_URL}
        }]
        postgres-data-source.synchronous-commit = off
      }
      storage {
        type = postgres
        config {
          queueSize = 10000
          driver = "software.amazon.jdbc.Driver"
          url = "jdbc:aws-wrapper:postgresql://${DB_HOST}/${DB_NAME}?sslmode=require&wrapperPlugins=iam,failover"
          user = demo
        }
        parameters.max-connections = 30
      }
    }
  }
  features.enable-testing-commands = yes
}
The tables have been created in the participant schema, so the DB connection looks fine. The error seems to occur when it is trying to start the Ledger API server.
Any insight is much appreciated.
Hey!
Unfortunately, I think you’ve hit a rather strict check on the db-type in the ledger-api server.
https://github.com/digital-asset/canton/blob/20bd47a9c2ce0392ffb10d2da58ae57afc72d07a/community/ledger/ledger-api-core/src/main/scala/com/digitalasset/canton/platform/store/DbType.scala#L47
      "oracle.jdbc.OracleDriver",
      supportsParallelWrites = true,
      supportsAsynchronousCommits = false,
    )
  def jdbcType(jdbcUrl: String): DbType = jdbcUrl match {
    case h2 if h2.startsWith("jdbc:h2:") => H2Database
    case pg if pg.startsWith("jdbc:postgresql:") => Postgres
    case oracle if oracle.startsWith("jdbc:oracle:") => Oracle
    case _ =>
      sys.error(s"JDBC URL doesn't match any supported databases (h2, pg, oracle)")
  }
}
So I think we can say that without changing this check, the JDBC URL will not be recognized as Postgres. Which version of Canton are you on?
Cheers,
Ratko
What you could do to test this out is the following:
- Check out the digital-asset/canton repo on GitHub, branch main-2.x
- Edit the file that throws
+++ b/community/ledger/ledger-api-core/src/main/scala/com/digitalasset/canton/platform/store/DbType.scala
@@ -41,7 +41,7 @@ object DbType {
def jdbcType(jdbcUrl: String): DbType = jdbcUrl match {
case h2 if h2.startsWith("jdbc:h2:") => H2Database
- case pg if pg.startsWith("jdbc:postgresql:") => Postgres
+ case pg if pg.startsWith("jdbc:") && pg.contains(":postgresql:") => Postgres
- Compile the open source version using sbt community-app/bundle (needs sbt installed)
- Start Canton with a special configuration, where you define the ledger-api-jdbc-url manually, the same way you would do it for Oracle (Persistence — Daml SDK 2.9.5 documentation). In 2.x, Canton puts the Ledger API data into a separate schema and therefore creates a JDBC URL internally. That JDBC URL generation doesn’t work with the aws-wrapper, but if you just bypass it, you will be fine. A bigger fix is not necessary: with 3.x there is no separate ledger-api DB, so it wouldn’t be worth the effort.
- If it works for you in AWS, open a PR on the public canton repository. We’ll merge it and it will be pushed out with the next release of Canton.
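To see in isolation why the original check rejects the wrapper URL and the patched one accepts it, here is a small Java sketch that mirrors the two Postgres conditions from the Scala diff above. It is only an illustration of the string matching, not Canton code, and the class and method names are made up.

```java
class JdbcUrlCheck {
    /** Mirrors the original condition: only plain PostgreSQL URLs match. */
    static boolean strictIsPostgres(String jdbcUrl) {
        return jdbcUrl.startsWith("jdbc:postgresql:");
    }

    /** Mirrors the patched condition: also accepts wrapped URLs such as jdbc:aws-wrapper:postgresql:. */
    static boolean relaxedIsPostgres(String jdbcUrl) {
        return jdbcUrl.startsWith("jdbc:") && jdbcUrl.contains(":postgresql:");
    }

    public static void main(String[] args) {
        String wrapped = "jdbc:aws-wrapper:postgresql://host/db";
        System.out.println(strictIsPostgres(wrapped));  // prints false
        System.out.println(relaxedIsPostgres(wrapped)); // prints true
    }
}
```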
Hi, we are using Canton 2.8.10.
Thank you, I’ll try this and get back.
@Ratko_Veprek - Sorry for the delay. We finally got to test this, but it needed a few more changes to get it working than what was suggested above, especially in the file below:
+++ b/community/ledger/ledger-api-core/src/main/scala/com/digitalasset/canton/platform/store/backend/postgresql/PostgresDataSourceStorageBackend.scala
@@ -6,7 +6,7 @@ package com.digitalasset.canton.platform.store.backend.postgresql
import anorm.SqlParser.get
import anorm.SqlStringInterpolation
import com.daml.resources.ProgramResource.StartupException
-import com.digitalasset.canton.logging.{NamedLoggerFactory, NamedLogging}
+import com.digitalasset.canton.logging.{NamedLoggerFactory, NamedLogging, TracedLogger}
import com.digitalasset.canton.platform.store.backend.DataSourceStorageBackend
import com.digitalasset.canton.platform.store.backend.common.{
DataSourceStorageBackendImpl,
@@ -14,7 +14,7 @@ import com.digitalasset.canton.platform.store.backend.common.{
}
import com.digitalasset.canton.platform.store.backend.postgresql.PostgresDataSourceConfig.SynchronousCommitValue
import com.digitalasset.canton.tracing.TraceContext
-import org.postgresql.ds.PGSimpleDataSource
+import com.zaxxer.hikari.HikariDataSource
import java.sql.Connection
import javax.sql.DataSource
@@ -25,6 +25,7 @@ final case class PostgresDataSourceConfig(
tcpKeepalivesIdle: Option[Int] = Some(10), // corresponds to: tcp_keepalives_idle
tcpKeepalivesInterval: Option[Int] = Some(1), // corresponds to: tcp_keepalives_interval
tcpKeepalivesCount: Option[Int] = Some(5), // corresponds to: tcp_keepalives_count
+ driverClassName: Option[String] = None,
)
object PostgresDataSourceConfig {
@@ -57,8 +58,15 @@ class PostgresDataSourceStorageBackend(
connectionInitHook: Option[Connection => Unit],
): DataSource = {
import DataSourceStorageBackendImpl.exe
- val pgSimpleDataSource = new PGSimpleDataSource()
- pgSimpleDataSource.setUrl(dataSourceConfig.jdbcUrl)
+ implicit val traceContext: TraceContext = TraceContext.empty
+ val logger = TracedLogger(loggerFactory.getLogger(getClass))
+ val hikariDataSource = new HikariDataSource()
+ hikariDataSource.setJdbcUrl(dataSourceConfig.jdbcUrl)
+
+ dataSourceConfig.postgresConfig.driverClassName.foreach(i => {
+ logger.info(s"Using driver class name: $i")
+ hikariDataSource.setDriverClassName(i)
+ })
val hookFunctions = List(
dataSourceConfig.postgresConfig.synchronousCommit.toList
@@ -71,7 +79,7 @@ class PostgresDataSourceStorageBackend(
.map(i => exe(s"SET tcp_keepalives_count TO $i")),
connectionInitHook.toList,
).flatten
- InitHookDataSourceProxy(pgSimpleDataSource, hookFunctions, loggerFactory)
+ InitHookDataSourceProxy(hikariDataSource, hookFunctions, loggerFactory)
}
Summarizing the changes:
- Use HikariDataSource instead of PGSimpleDataSource, as PGSimpleDataSource doesn’t support the AWS JDBC driver.
- Add an optional property ‘driverClassName’ to the existing Ledger API PostgresDataSourceConfig. This can be used to specify ‘software.amazon.jdbc.Driver’ in this case.
- Set ledger-api-jdbc-url manually to use the “jdbc:aws-wrapper:” URL format.
- Set the jdbcUrl and the driverClassName on the HikariDataSource.
Basic testing seems to be fine and full-fledged testing is in progress; I will update you further.
Let me know what you think about these changes. Happy to open a PR if everything goes well.
Any updates on this?
Ah sorry. Missed your update. Let me check it quickly.
It seems to cause some issues with lock allocation in the HA coordinator. Did it work for you?
Haven’t seen any errors till now, do you see any errors in the logs? Please post more details, I can check.
Yes, so I’ve looked at some of the tests. Effectively there is a high-level and a fundamental problem with the change. The high-level problem is likely that the pool never gets closed and therefore doesn’t release the database lock, which breaks HA failover.
I then checked with the author of that part and his response was:
“Unless we want to have a Hikari pool backed by another Hikari pool, I would not do it. The purpose of DataSourceStorageBackend.createDataSource is to create the pristine, simple and, most importantly, NOT POOLED data source, which will be used appropriately later, for example as the input for a Hikari pool.”
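To make “not pooled” concrete, here is a minimal, JDK-only sketch of a data source that opens a fresh connection through DriverManager on every call and holds no pool state that would need closing. It is not Canton code and has not been tested against Canton or the AWS wrapper. Whether DriverManager can resolve a jdbc:aws-wrapper: URL depends on the wrapper JAR being on the classpath and registering its driver, which is an assumption here.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

/** Hypothetical non-pooled data source: every getConnection() call opens a new connection. */
class UnpooledDataSource implements DataSource {
    private final String jdbcUrl;
    private PrintWriter logWriter;

    UnpooledDataSource(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override public Connection getConnection() throws SQLException {
        // DriverManager picks whichever registered driver accepts the URL.
        return DriverManager.getConnection(jdbcUrl);
    }

    @Override public Connection getConnection(String user, String password) throws SQLException {
        return DriverManager.getConnection(jdbcUrl, user, password);
    }

    @Override public PrintWriter getLogWriter() { return logWriter; }

    @Override public void setLogWriter(PrintWriter out) { this.logWriter = out; }

    // Note: the DriverManager login timeout is a JVM-wide setting.
    @Override public void setLoginTimeout(int seconds) { DriverManager.setLoginTimeout(seconds); }

    @Override public int getLoginTimeout() { return DriverManager.getLoginTimeout(); }

    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("not supported in this sketch");
    }

    @Override public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    @Override public boolean isWrapperFor(Class<?> iface) { return iface.isInstance(this); }
}
```

A data source of this shape can be handed to a pool later, which is the layering the quote describes.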
So it seems to me that this is a bit more invasive as you need to load a specific driver for AWS RDS. So instead of returning HikariDataSource, you will likely need to return AwsWrapperDataSource.
Depending on the variation of the configuration, this could either be done within the PG Storage Backend or by creating an explicit AwsRDSDataSourceStorageBackend: aws-advanced-jdbc-wrapper/docs/using-the-jdbc-driver/DataSource.md at f5b9dd63a894c21d5319513856ff3581d9747ddc · aws/aws-advanced-jdbc-wrapper · GitHub
Ideally, we’d load the AWS data source using reflection so we don’t need to link the JAR at compile time.
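The reflection idea can be sketched as follows: look the class up by name at runtime and fall back gracefully when the JAR is absent, so nothing from the AWS wrapper is needed at compile time. The helper below is hypothetical and generic. The class name software.amazon.jdbc.ds.AwsWrapperDataSource is taken to be the wrapper's data source class per its DataSource documentation, so verify it against the wrapper version you use. Configuring the instance (URL, target data source and so on) is left out.

```java
import java.util.Optional;
import javax.sql.DataSource;

class ReflectiveLoader {
    /**
     * Tries to instantiate className via its no-arg constructor and cast it to expectedType.
     * Returns Optional.empty() if the class is missing, cannot be constructed, or has the wrong type.
     */
    static <T> Optional<T> tryInstantiate(String className, Class<T> expectedType) {
        try {
            Class<?> raw = Class.forName(className);
            Object instance = raw.getDeclaredConstructor().newInstance();
            return Optional.of(expectedType.cast(instance));
        } catch (ReflectiveOperationException | ClassCastException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed class name; present only when the wrapper JAR is on the runtime classpath.
        Optional<DataSource> ds =
            tryInstantiate("software.amazon.jdbc.ds.AwsWrapperDataSource", DataSource.class);
        System.out.println(ds.isPresent()
            ? "AWS wrapper data source loaded"
            : "AWS wrapper data source not available on the classpath");
    }
}
```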
Actually, I asked myself why it worked for you at all, as Canton has two storage backends (for historical reasons; we are working on getting rid of one). The other one seems to automatically figure out which data source to use: canton/community/base/src/main/scala/com/digitalasset/canton/resource/Storage.scala at b5183318993b0201676627ad78ce85c88e9e64b4 · digital-asset/canton · GitHub
Yeah, so this is a bit more involved
Thanks for the insights, that makes sense. Let me try something along these lines and get back.