AWS RDS Postgres password-less authentication in Canton

App Development | 28 posts | 616 views | 5 likes | Last activity Dec 2024
nagendra (OP)
Sep 2024

Hello,

We are trying to set up a Canton domain and participant with AWS RDS Aurora Postgres, and the idea is to connect to Postgres without a password by using an IAM user and service accounts. Can you point me to the right resources, or share any suggestions or ideas that may help us achieve this?

Thanks in advance.

WallaceKelly
Sep 2024

@nagendra, does this article summarize the Postgres configuration you are aiming for? And your question is if Canton can support this?

nagendra
Sep 2024

Hello @WallaceKelly, yes, this is exactly what I am looking for. Does Canton support this through some means? As far as I understand, the password/token generated by AWS for RDS expires after 15 minutes.

oliverse
Sep 2024

Hi @nagendra,

We have not tested deploying Canton against an IAM-based persistence layer, so I am somewhat skeptical that expiring tokens would work well out of the box. Our recommendation would therefore be to create a Postgres user with a password for use by Canton.

Thanks,
– Oliver

nagendra
Sep 2024

Hello @oliverse,

We use this hack in our Spring Boot project to release the DB connection every 14 minutes, so that we get a new connection with a new token every 14 minutes. The property to set is spring.datasource.hikari.max-lifetime=840000. Is there any such property in Canton that we could use for a workaround? I’ve gone through these docs already but I’m still looking.
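The value 840000 follows from the token lifetime: a 15-minute token minus a 1-minute safety margin leaves 14 minutes, expressed in milliseconds. A minimal Java sketch of that arithmetic is below; the class and method names are illustrative only and are not part of Canton, Spring, or HikariCP.

```java
import java.time.Duration;

class TokenLifetime {
    // Returns a pool max-lifetime in milliseconds that retires connections
    // `margin` before the auth token's time-to-live runs out.
    static long maxLifetimeMillis(Duration tokenTtl, Duration margin) {
        if (margin.compareTo(tokenTtl) >= 0) {
            throw new IllegalArgumentException("margin must be shorter than the token TTL");
        }
        return tokenTtl.minus(margin).toMillis();
    }

    public static void main(String[] args) {
        // 15-minute token, 1-minute safety margin
        System.out.println(maxLifetimeMillis(Duration.ofMinutes(15), Duration.ofMinutes(1)));
    }
}
```

With a 15-minute token and a 1-minute margin this prints 840000, the value used in the Spring property above.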

oliverse
Sep 2024

Hello @nagendra

The link that you mention contains the comment: “After 14 minutes, the application does a new request to RDS for a fresh authentication token”

This reads to me as if the driver does not perform the authentication token refresh, but would require Canton to request a new token. Canton does not have a facility in place to request a new database token, and in fact such a mechanism is unlikely to work well with High-Availability setups, for example.

Thanks,
– Oliver

bernhard
Sep 2024

Maybe, possibly, you can use the AWS JDBC Wrapper. This is completely untested, unsupported, and at your own risk, but it sounds like it may just work. And it does claim to support RDS IAM authentication.

You’d have to download the jdbc-wrapper JAR, put it on the classpath for the Canton process, and then construct a JDBC connection string of the form jdbc:aws-wrapper:postgresql://... with all the needed parameters.

nagendra
Sep 2024

We do use AWS JDBC wrapper for Keycloak. The driver is added to the classpath before build, like below.

ARG VERSION
FROM quay.io/keycloak/keycloak:$VERSION as builder
ENV KC_HEALTH_ENABLED=true
ENV KC_METRICS_ENABLED=true
ENV KC_DB=postgres
ENV KC_TRANSACTION_XA_ENABLED=false
ADD --chmod=0666 https://github.com/aws/aws-advanced-jdbc-wrapper/releases/download/2.3.9/aws-advanced-jdbc-wrapper-2.3.9-bundle-federated-auth.jar /opt/keycloak/providers/aws-advanced-jdbc-wrapper.jar
ENV KC_DB_DRIVER=software.amazon.jdbc.Driver

COPY cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml
ENV KC_CACHE_CONFIG_FILE=cache-ispn-jdbc-ping.xml

RUN /opt/keycloak/bin/kc.sh build

And we build the Canton image like so:

FROM docker.io/eclipse-temurin:11-jdk-focal
WORKDIR /canton
COPY ./bin/canton bin/canton
COPY ./lib lib
COPY ./simple-topology.conf .
RUN echo "Precompiling canton console. Please ignore the following output" && bin/canton --config simple-topology.conf --no-tty < /dev/null && rm -rf log
ENTRYPOINT ["bin/canton"]

Here /bin contains canton and /lib contains the jar. When you say “put it on the classpath for the Canton process”, I’m guessing it has to be under /lib?

bernhard
Sep 2024

I’m not sure everything in /lib is automatically picked up. You can run Canton manually by running something like

java -cp lib/aws-wrapper.jar:lib/canton-enterprise-X.Y.Z.jar com.digitalasset.canton.CantonEnterpriseApp -c canton.conf

Remove the Enterprise if using open source.

nagendra
Sep 2024

Tried this, but it failed to get a connection. My config file:

canton {
  domains {
    domain {
      public-api {
        port = 3001
        address = 0.0.0.0
      }
      admin-api {
        port = 3002
      }
      storage {
        type = postgres
        config {
          driver = "software.amazon.jdbc.Driver"
          url = "jdbc:aws-wrapper:postgresql://${DB_HOST}/${DB_NAME}?&sslmode=verify-ca&sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory&wrapperPlugins=iam,failover"
          user = demo
        }
        parameters.max-connections = 30
      }
      sequencer {
        writer = {
          type = low-latency
        }
      }
    }
  }
  features.enable-testing-commands = yes
}

Logs:

org.postgresql.util.PSQLException: The connection attempt failed.
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:354)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
	at org.postgresql.Driver.makeConnection(Driver.java:443)
	at org.postgresql.Driver.connect(Driver.java:297)
	at software.amazon.jdbc.DriverConnectionProvider.connect(DriverConnectionProvider.java:136)
	at software.amazon.jdbc.plugin.DefaultConnectionPlugin.connectInternal(DefaultConnectionPlugin.java:203)
	at software.amazon.jdbc.plugin.DefaultConnectionPlugin.connect(DefaultConnectionPlugin.java:191)
	at software.amazon.jdbc.ConnectionPluginManager.lambda$connect$6(ConnectionPluginManager.java:373)

bernhard
Sep 2024

Is there a Caused By lower down in the stack trace?

nagendra
Sep 2024

Oh yeah, it’s a SocketTimeoutException.

Caused by: java.net.SocketTimeoutException: connect timed out
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at org.postgresql.core.PGStream.createSocket(PGStream.java:243)
	at org.postgresql.core.PGStream.<init>(PGStream.java:98)
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:132)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
	... 83 common frames omitted

I’m checking on the SSL front.

nagendra
Sep 2024

We are supposed to trust the certs from https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem. We do this in the Keycloak image by adding the certs to a truststore path. For Canton I tried adding them to the Java truststore as below, but it didn’t help.

keytool -import -v -trustcacerts -alias rds.amazonaws.com -file global-bundle.pem -keystore /cacerts

bernhard
Sep 2024

SocketTimeoutException sounds like a fairly generic networking issue. Have you checked in some way (e.g. ping) that the host has connectivity to ${DB_HOST} and that the config parameter interpolation is working as you expect?
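One way to check connectivity from the same JVM environment is a plain TCP connect with a timeout, which exercises the database port rather than ICMP as ping does. The sketch below is a minimal, hedged example: the class name is made up, and 5432 is assumed only because it is the default Postgres port, so substitute whatever host and port your RDS endpoint actually uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class DbReachability {
    // Attempts a plain TCP connect; true means the port accepted the connection.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the value of DB_HOST (and optionally the port) as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5432; // assumed default Postgres port
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 5000));
    }
}
```

A result of false from the machine that runs Canton points at security groups, routing, or DNS rather than at the JDBC driver or SSL settings.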

nagendra
Sep 2024

Yes, I’m checking a few things at my end. I was able to resolve the timeout issue, but I still couldn’t get it working. I will post here once I have an update.

nagendra
Sep 2024

I managed to get it working. The issue was the sslmode=verify-ca property set in the JDBC URL; changing it to sslmode=require did the trick. Thanks very much for all the help, maybe this will be helpful to someone else too.

One final query: we are used to starting Canton in daemon mode, and with this change we are starting it manually as below, as you suggested:

java -cp lib/aws-jdbc-wrapper.jar:lib/canton-enterprise-2.8.0.jar com.digitalasset.canton.CantonEnterpriseApp --config=canton.conf --bootstrap=domain.canton --log-level-stdout=INFO

Do you foresee any problems if we do it this way?

nagendra
Sep 2024

Hi, I’m back again. The Canton domain is working fine, but when I start the participant, the app crashes with the exception below.

2024-09-25 14:07:26,266 [canton-env-ec-47] DEBUG c.d.c.p.ParticipantNodeBootstrap:participant=participant - Successfully completed shutdown of participant
2024-09-25 14:07:26,268 [main] ERROR c.d.c.e.CommunityEnvironment tid:15cc14a823e0376024640b371aad2680 - Failed to start participant: Ledger API server failed to start: FailedToStartLedgerApiServer(
  java.lang.RuntimeException: JDBC URL doesn't match any supported databases (h2, pg, oracle)
	at scala.sys.package$.error(package.scala:27)
	at com.digitalasset.canton.platform.store.DbType$.jdbcType(DbType.scala:47)
	at com.digitalasset.canton.platform.store.FlywayMigrations.<init>(FlywayMigrations.scala:26)
	at com.digitalasset.canton.platform.indexer.IndexerServiceOwner.acquire(IndexerServiceOwner.scala:48)
	at com.digitalasset.canton.platform.indexer.IndexerServiceOwner.acquire(IndexerServiceOwner.scala:26)
	at com.daml.resources.AbstractResourceOwner$$anon$2.acquire(AbstractResourceOwner.scala:38)
	at com.daml.resources.AbstractResourceOwner$$anon$2.$anonfun$acquire$1(AbstractResourceOwner.scala:38)
	at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
	at com.daml.executors.QueueAwareExecutorService$TrackingRunnable.run(QueueAwareExecutorService.scala:98)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

My config file is below:

canton {
      participants {
        participant {
          admin-api {
              port = 4001
              address = 0.0.0.0
          }
          ledger-api {
              port = 4002
              address = 0.0.0.0
              auth-services = [{
                type = jwt-rs-256-jwks
                url = ${JWT_URL}
              }]
              postgres-data-source.synchronous-commit = off
          }
          storage {
              type = postgres
              config {
                queueSize = 10000
                driver = "software.amazon.jdbc.Driver"
                url = "jdbc:aws-wrapper:postgresql://${DB_HOST}/${DB_NAME}?sslmode=require&wrapperPlugins=iam,failover"
                user = demo
              }
              parameters.max-connections = 30
          }
        }
      }
      features.enable-testing-commands = yes
    }

The tables have been created in the participant schema, so the DB connection looks fine. The error seems to occur when it’s trying to start the Ledger API server.

Any insight is much appreciated.

Ratko_Veprek
Oct 2024

Hey!

Unfortunately, I think you’ve hit a rather strict check on the db-type in the ledger-api server.

github.com: digital-asset/canton/blob/20bd47a9c2ce0392ffb10d2da58ae57afc72d07a/community/ledger/ledger-api-core/src/main/scala/com/digitalasset/canton/platform/store/DbType.scala#L47

        "oracle.jdbc.OracleDriver",
        supportsParallelWrites = true,
        supportsAsynchronousCommits = false,
      )

  def jdbcType(jdbcUrl: String): DbType = jdbcUrl match {
    case h2 if h2.startsWith("jdbc:h2:") => H2Database
    case pg if pg.startsWith("jdbc:postgresql:") => Postgres
    case oracle if oracle.startsWith("jdbc:oracle:") => Oracle
    case _ =>
      sys.error(s"JDBC URL doesn't match any supported databases (h2, pg, oracle)")
  }
}

So I think we can say that without changing this check, the JDBC URL will not be recognized as Postgres. Which version of Canton are you on?

Cheers,
Ratko

Ratko_Veprek
Nov 2024

What you could do to test this out is the following:

  1. Check out the digital-asset/canton repository on GitHub, branch main-2.x.
  2. Edit the file that throws:
+++ b/community/ledger/ledger-api-core/src/main/scala/com/digitalasset/canton/platform/store/DbType.scala
@@ -41,7 +41,7 @@ object DbType {

   def jdbcType(jdbcUrl: String): DbType = jdbcUrl match {
     case h2 if h2.startsWith("jdbc:h2:") => H2Database
-    case pg if pg.startsWith("jdbc:postgresql:") => Postgres
+    case pg if pg.startsWith("jdbc:") && pg.contains(":postgresql:") => Postgres
  3. Compile the open source version using sbt community-app/bundle (needs sbt installed).
  4. Start Canton with a special configuration in which you define the ledger-api-jdbc-url manually, the same way you would for Oracle (see the Persistence page of the Daml SDK 2.9.5 documentation). In 2.x, Canton puts the Ledger API data into a separate schema and therefore creates a JDBC URL internally. That JDBC URL generation doesn’t work with the aws-wrapper, but if you just bypass it, you will be fine. A bigger fix is not necessary: in 3.x there is no separate Ledger API DB, so it wouldn’t be worth the effort.
  5. If it works for you in AWS, open a PR on the public Canton repository. We’ll merge it and it will be pushed out with the next release of Canton.
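The effect of the DbType.scala change can be checked in isolation. The Java sketch below re-implements only the two string predicates from the diff (the original prefix check and the relaxed one); the class name, method names, and example host are hypothetical, and it is not Canton code.

```java
class JdbcUrlCheck {
    // Mirrors the original check: only URLs starting with "jdbc:postgresql:" count as Postgres.
    static boolean strictIsPostgres(String jdbcUrl) {
        return jdbcUrl.startsWith("jdbc:postgresql:");
    }

    // Mirrors the patched check: any JDBC URL containing ":postgresql:" counts as Postgres.
    static boolean relaxedIsPostgres(String jdbcUrl) {
        return jdbcUrl.startsWith("jdbc:") && jdbcUrl.contains(":postgresql:");
    }

    public static void main(String[] args) {
        String plain = "jdbc:postgresql://db.example.com/canton";               // hypothetical host
        String wrapped = "jdbc:aws-wrapper:postgresql://db.example.com/canton"; // hypothetical host
        System.out.println(strictIsPostgres(plain));    // true
        System.out.println(strictIsPostgres(wrapped));  // false
        System.out.println(relaxedIsPostgres(plain));   // true
        System.out.println(relaxedIsPostgres(wrapped)); // true
    }
}
```

The wrapped URL fails the strict check, which is what produces the "JDBC URL doesn't match any supported databases" error, and passes the relaxed one without changing the result for plain Postgres URLs.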

nagendra
Nov 2024

Hi, we are using Canton 2.8.10.

nagendra
Nov 2024

Thank you, I’ll try this and get back.

senrav
Dec 2024

@Ratko_Veprek - Sorry for the delay. We finally got to test this, but it needed a few more changes than what was suggested above to get it working, especially in the file below:

+++ b/community/ledger/ledger-api-core/src/main/scala/com/digitalasset/canton/platform/store/backend/postgresql/PostgresDataSourceStorageBackend.scala
@@ -6,7 +6,7 @@ package com.digitalasset.canton.platform.store.backend.postgresql
 import anorm.SqlParser.get
 import anorm.SqlStringInterpolation
 import com.daml.resources.ProgramResource.StartupException
-import com.digitalasset.canton.logging.{NamedLoggerFactory, NamedLogging}
+import com.digitalasset.canton.logging.{NamedLoggerFactory, NamedLogging, TracedLogger}
 import com.digitalasset.canton.platform.store.backend.DataSourceStorageBackend
 import com.digitalasset.canton.platform.store.backend.common.{
   DataSourceStorageBackendImpl,
@@ -14,7 +14,7 @@ import com.digitalasset.canton.platform.store.backend.common.{
 }
 import com.digitalasset.canton.platform.store.backend.postgresql.PostgresDataSourceConfig.SynchronousCommitValue
 import com.digitalasset.canton.tracing.TraceContext
-import org.postgresql.ds.PGSimpleDataSource
+import com.zaxxer.hikari.HikariDataSource
 
 import java.sql.Connection
 import javax.sql.DataSource
@@ -25,6 +25,7 @@ final case class PostgresDataSourceConfig(
     tcpKeepalivesIdle: Option[Int] = Some(10), // corresponds to: tcp_keepalives_idle
     tcpKeepalivesInterval: Option[Int] = Some(1), // corresponds to: tcp_keepalives_interval
     tcpKeepalivesCount: Option[Int] = Some(5), // corresponds to: tcp_keepalives_count
+    driverClassName: Option[String] = None,
 )
 
 object PostgresDataSourceConfig {
@@ -57,8 +58,15 @@ class PostgresDataSourceStorageBackend(
       connectionInitHook: Option[Connection => Unit],
   ): DataSource = {
     import DataSourceStorageBackendImpl.exe
-    val pgSimpleDataSource = new PGSimpleDataSource()
-    pgSimpleDataSource.setUrl(dataSourceConfig.jdbcUrl)
+    implicit val traceContext: TraceContext = TraceContext.empty
+    val logger = TracedLogger(loggerFactory.getLogger(getClass))
+    val hikariDataSource = new HikariDataSource()
+    hikariDataSource.setJdbcUrl(dataSourceConfig.jdbcUrl)
+    
+    dataSourceConfig.postgresConfig.driverClassName.foreach(i => {
+        logger.info(s"Using driver class name: $i")
+        hikariDataSource.setDriverClassName(i)
+      })
 
     val hookFunctions = List(
       dataSourceConfig.postgresConfig.synchronousCommit.toList
@@ -71,7 +79,7 @@ class PostgresDataSourceStorageBackend(
         .map(i => exe(s"SET tcp_keepalives_count TO $i")),
       connectionInitHook.toList,
     ).flatten
-    InitHookDataSourceProxy(pgSimpleDataSource, hookFunctions, loggerFactory)
+    InitHookDataSourceProxy(hikariDataSource, hookFunctions, loggerFactory)
   }

Summarizing the changes:

  1. Use HikariDataSource instead of PGSimpleDataSource, as PGSimpleDataSource doesn’t support the AWS JDBC driver.
  2. Add an optional property ‘driverClassName’ to the existing Ledger API PostgresDataSourceConfig. This can be used to specify ‘software.amazon.jdbc.Driver’ in this case.
  3. Set ledger-api-jdbc-url manually to use the “jdbc:aws-wrapper:” URL format.
  4. Set the jdbcUrl and the driverClassName on the HikariDataSource.

Basic testing seems to be fine. Full-fledged testing is in progress; I will update you further.

Let me know what you think about these changes. Happy to open a PR if everything goes well.

senrav
Dec 2024

Any updates on this?

Ratko_Veprek
Dec 2024

Ah sorry. Missed your update. Let me check it quickly.

Ratko_Veprek
Dec 2024

It seems to cause some issues with lock allocation in the HA coordinator. Did it work for you?

senrav
Dec 2024

Haven’t seen any errors so far. Do you see any errors in the logs? Please post more details and I can check.

Ratko_Veprek
Dec 2024

Yes, so I’ve looked at some of the tests. Effectively, there is a high-level problem and a fundamental problem with the change. The high-level problem is likely that the pool never gets closed and therefore doesn’t release the database lock, which breaks HA failover.

I then checked with the author of that part and his response was:

except we want to have a hikari pool backed by another hikari pool, I would not do it. The purpose of DataSourceStorageBackend.createDataSource is to create the pristine/simple/most importantly NOT POOLED data source, which will be used appropriately later, for example put in as an input for a hikari pool

So it seems to me that this is a bit more invasive as you need to load a specific driver for AWS RDS. So instead of returning HikariDataSource, you will likely need to return AwsWrapperDataSource.

Depending on the variation of the configuration, this could either be done within the PG storage backend or by creating an explicit AwsRDSDataSourceStorageBackend. See docs/using-the-jdbc-driver/DataSource.md in the aws/aws-advanced-jdbc-wrapper repository on GitHub (at commit f5b9dd63a894c21d5319513856ff3581d9747ddc).

Ideally, we’d load the AWS data source using reflection so we don’t need to link the JAR at compile time.
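A minimal Java sketch of what reflective loading could look like is below. It is an illustration under stated assumptions, not Canton code: the loader class is hypothetical, the fully qualified name software.amazon.jdbc.ds.AwsWrapperDataSource is assumed from the wrapper’s DataSource documentation, and configuring the instance (JDBC URL, target driver, IAM plugin properties) is left out entirely.

```java
import java.util.Optional;
import javax.sql.DataSource;

class ReflectiveDataSourceLoader {
    // Assumed class name taken from the AWS wrapper documentation; verify against the JAR you ship.
    static final String AWS_WRAPPER_DS = "software.amazon.jdbc.ds.AwsWrapperDataSource";

    // Instantiates the named class via its no-argument constructor.
    // Returns empty if the class is missing, cannot be created, or is not a DataSource.
    static Optional<DataSource> load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return instance instanceof DataSource
                ? Optional.of((DataSource) instance)
                : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<DataSource> ds = load(AWS_WRAPPER_DS);
        System.out.println(ds.isPresent()
            ? "AWS wrapper data source loaded"
            : "AWS wrapper JAR not on classpath, fall back to the default data source");
    }
}
```

Because the class is looked up by name at runtime, the code compiles without the wrapper JAR and only uses it when it is present on the classpath.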

Actually, I asked myself why it worked for you at all, as Canton has two storage backends (for historical reasons; we are working on getting rid of one). The other one seems to figure out automatically which data source to use: see community/base/src/main/scala/com/digitalasset/canton/resource/Storage.scala in the digital-asset/canton repository on GitHub (at commit b5183318993b0201676627ad78ce85c88e9e64b4).

Yeah, so this is a bit more involved.

senrav
Dec 2024

Thanks for the insights, that makes sense. Let me try something along these lines and get back.