Debugging (potential) issue with http-json-api
Hi there,
I’m working on a project where we have an architecture something like this:
client -> custom-http-json-api-proxy -> http-json-api -> ledger. Under load, and under circumstances we’re having trouble reproducing, we’re seeing calls to /exercise from http-json-api-proxy succeed on the ledger but fail to return. The sequence of events is something like:
- http-json-api-proxy forwards a POST request; stuff gets written to the ledger
- http-json-api waits 2 seconds, then logs that it has returned a 200 response
- http-json-api-proxy waits ~30 seconds, then reports a 503 error.
This smells like some kind of resource exhaustion (thread pools maybe, b/c akka), but because we can’t yet reproduce it in a non-prod environment (and for the moment the application is turned off in prod), it’s a little difficult to diagnose. Notably, we’re also proxying websocket requests, which are consistently getting borked (probably by our proxy), closed, and retried, so there’s a potential mechanism for thread pool exhaustion, either in our proxy or in http-json-api. We haven’t repro’d that either, and probably need tcpdump plus better visibility into what’s happening in http-json-api to diagnose it.
So, anyways, that’s background. Got a couple of questions.
- What facility is there to see the runtime workings of http-json-api?
- How much effort would it be to build my own http-json-api, maybe with akka-http-metrics baked in?
- Any suggestions on how to diagnose this?
- Any idle thoughts on how http-json-api might log a success without actually returning it?
What facility is there to see the runtime workings of http-json-api?
I’m not 100% sure what exactly you would like to see here, but we do have a set of metrics and logging available, which is described in more depth in the documentation.
How much effort would it be to build my own http-json-api, maybe with akka-http-metrics baked in?
I cannot quantify the effort, but I wouldn’t say much more than half a day. Perhaps @Stephen or @victor.pr.mueller can give you a more informed answer.
As for the two remaining questions, I would again ask Stephen or Victor to share their ideas. From the description of the problem, it appears that something is happening at the proxy level. I had a quick look into this issue a few days ago and found a StackOverflow answer suggesting that the proxy may be unable to support request timeouts for streams, but I’m not sure whether you’re using Envoy as a proxy or whether this applies in your case.
I’m not 100% sure of what exactly you would like to see here
At the moment what I’d like to see in particular is akka thread pool information sans having to do a thread dump and make sense of it.
- are there any facilities or planned facilities for being able to increase log level without having to bounce the JVM process?
- ditto for metrics
- is there any plan for being able to supply a trace-id/correlation-id for tracing requests end-to-end through the system, a la canton?
- is there any supplied way to get visibility into the threadpool?
I also suspect the 503 issue is being caused by the proxy, but it’s unclear whether it’s manifesting in our proxy or in http-json-api. Our architecture is really more like cloudarmor → istio → json-api-proxy → http-json-api → ledger, where json-api-proxy is a proxy we’ve written with akka. I think envoy is in the mix, but it’s not sitting between the two failing components.
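One way to narrow down which hop is holding the request is to have json-api-proxy track its own in-flight upstream calls and periodically log the ones that have been waiting longer than a few seconds. Below is a minimal plain-Java sketch, not akka code: it assumes the proxy assigns each forwarded request an ID of its own, and the class and method names are hypothetical (they are not part of akka-http or the HTTP JSON API). In an akka-based proxy you would call `start` before the upstream call and `finish` when its response arrives.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical helper: records when each upstream call started so stuck calls can be listed.
public final class InFlightTracker {
    private final Map<String, Instant> started = new ConcurrentHashMap<>();

    // Call when the proxy forwards a request to http-json-api.
    public void start(String requestId, Instant now) {
        started.put(requestId, now);
    }

    // Call when the upstream response arrives; returns the elapsed time, or null for an unknown ID.
    public Duration finish(String requestId, Instant now) {
        Instant t0 = started.remove(requestId);
        return t0 == null ? null : Duration.between(t0, now);
    }

    // IDs of calls still waiting on upstream for longer than threshold, oldest first.
    public List<String> overdue(Instant now, Duration threshold) {
        return started.entrySet().stream()
                .filter(e -> Duration.between(e.getValue(), now).compareTo(threshold) > 0)
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

If a request that http-json-api logged as a 200 after ~2 seconds still shows up in `overdue` on the proxy, that would suggest the response is being lost between the two components rather than inside the JSON API’s request handling.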
are there any facilities or planned facilities for being able to increase log level without having to bounce the JVM process?
Not without bouncing the process, no.
ditto for metrics
All metrics are always recorded.
is there any plan for being able to supply a trace-id/correlation-id for tracing requests end-to-end through the system, a la canton?
There is a request_id that is part of the context of each log entry. There is also an instance_id in case you are aggregating logs for multiple HTTP JSON API instances in a single collector.
is there any supplied way to get visibility into the threadpool?
Not currently, no.
Daniel_Porter:
is there any supplied way to get visibility into the threadpool?
Not currently, no.
Are there plans in the future to expose this?
Are there plans in the future to expose this?
There are no plans currently to expose the thread pool metrics.
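Since there is no built-in view, here is a minimal sketch of a do-it-yourself workaround using the JDK’s standard `ThreadMXBean`. It groups live threads by name, with a trailing `-<n>` stripped, and counts them per `Thread.State`. It assumes you can run code inside the JVM (for example in a custom build of http-json-api). Grouping by stripped name is a heuristic that relies on pool threads being named `<prefix>-<n>`, which is typical of akka dispatchers. The `summarize` method takes any `ThreadMXBean`, so it could also be pointed at a remote proxy obtained through `ManagementFactory.newPlatformMXBeanProxy`, provided the JVM was started with JMX remote access enabled, which would itself require one restart.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

public final class ThreadPoolSummary {
    // Strips a trailing "-<digits>" so "...default-dispatcher-5" and "...-6" land in one group.
    static String poolName(String threadName) {
        return threadName.replaceAll("-\\d+$", "");
    }

    // Counts live threads per pool name and per thread state.
    static Map<String, Map<Thread.State, Integer>> summarize(ThreadMXBean bean) {
        Map<String, Map<Thread.State, Integer>> result = new TreeMap<>();
        // getThreadInfo yields null entries for threads that exited after getAllThreadIds().
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info == null) continue;
            result.computeIfAbsent(poolName(info.getThreadName()),
                            k -> new EnumMap<>(Thread.State.class))
                    .merge(info.getThreadState(), 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        summarize(ManagementFactory.getThreadMXBean())
                .forEach((pool, states) -> System.out.println(pool + " " + states));
    }
}
```

Logged periodically, a pool whose threads are all `WAITING` or `BLOCKED` while requests pile up would be a hint toward the thread pool exhaustion theory, without having to read a full thread dump.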
How much effort would it be to build my own http-json-api, maybe with akka-http-metrics baked in?
It’s pretty easy to get building on the daml repo. Have git installed, follow the instructions, then bazel build //ledger-service/http-json:http-json-binary to make yourself a standalone jar and runscript if you need one.
Some starting points are Endpoints.scala and HttpService.scala in that tree, and bazel-java-deps.bzl for manipulating the available libraries from Maven. I couldn’t say what the difficulty of working in akka-http-metrics would be, though, nor its efficacy.
Any idle thoughts on how http-json-api might log a success without actually returning it?
At that point control should have passed out of Endpoints and back to akka-http, unless the log you’re talking about is specifically the one that says "Responding to client with HTTP...". There would be more interesting things to look at were this one of the streaming responses, but exercise is a strict response, so everything interesting has already happened by that log.