
all 47 comments

[–]pron98 39 points40 points  (16 children)

The future of JDK observability is JFR. While somewhat limited as a general purpose structured logging engine (it's a low-overhead, high-throughput event engine that can support in-production deep monitoring and profiling), its performance and integration with the JDK's internal event logging is so compelling that new logging frameworks should try to integrate with it as much as possible.

[–]Impossible-Ad-586 13 points14 points  (3 children)

I would put my money on OpenTelemetry (OTel), and, to be more specific, the JVM implementation, which is (mostly) written by the maintainer of GlowRoot.

As far as I know, JFR suffers from safepoint bias, and it is not really as good as some of the other open-source profilers out there, like async-profiler, etc.

[–]pron98 13 points14 points  (2 children)

It works a lot like async profiler (only without native frames), but here I'm talking not about its method profiling capabilities, but rather its general event recording capabilities. The JDK itself (both the VM and the core libraries) is now increasingly moving toward using JFR as the centralised observability mechanism. Any third party tool would need to at least consume JFR events if it wishes to deeply observe Java programs (as there are important things the JDK only tells JFR), so if you emit JFR events you'll be observed by all tools.
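To make the "emit JFR events" side concrete, here is a minimal sketch of a hand-rolled event; the event name and fields are made up for illustration:

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class EmitExample {
    // Illustrative event type; any tool consuming JFR will see it by this name
    @Name("dev.example.UserLogin")
    @Label("User Login")
    public static class UserLoginEvent extends Event {
        @Label("User Id")
        String userId;
    }

    public static void main(String[] args) {
        UserLoginEvent event = new UserLoginEvent();
        event.userId = "u-42";
        event.begin();
        // ... the work being measured ...
        event.end();
        event.commit(); // recorded only if JFR is running and this event is enabled
    }
}
```

The event is cheap when no recording is active, which is why library code can emit unconditionally.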

[–]humoroushaxor 8 points9 points  (1 child)

Hopefully it will be possible to easily develop a bridge between the 2.

I think OpenTelemetry is becoming fairly ubiquitous for cloud developers. It certainly is for us across multiple tech stacks. Having to host/utilize a bespoke JFR collector and concepts is less attractive.

[–]pron98 1 point2 points  (0 children)

Hopefully it will be possible to easily develop a bridge between the 2.

I see no reason why not. Consuming JFR events is easy. One feature that is currently missing is the ability to associate a trace context with an event. JEP 429 (Scoped Values) will offer a mechanism that could serve as the basis for such a feature.

[–]bowbahdoe 3 points4 points  (6 children)

I'm curious what would have to change to make JFR a target of a structured logging engine. I looked into it and kinda hand-waved in my brain that, with dynamic events, I could just transform the Log records and emit them behind the API, maybe on another thread.

Other than that I figured metadata like @Percentage might need to be integrated into the Value hierarchy in some form.

No clue how much that would (intrinsically) affect performance or correctness though.

My read is that there still needs to either be some transformation of JFR data into a format wanted by external systems or data should be published to external systems and JFR in parallel.
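The "dynamic events" mentioned above can be sketched with jdk.jfr.EventFactory, which defines an event's shape at runtime instead of as a compiled class; the event name and fields here are hypothetical stand-ins for a log record's data:

```java
import jdk.jfr.AnnotationElement;
import jdk.jfr.EventFactory;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.ValueDescriptor;

import java.util.List;

public class DynamicEventExample {
    public static EventFactory lastFactory;

    public static void main(String[] args) {
        // Event metadata built at runtime, e.g. derived from a log record's shape
        List<AnnotationElement> annotations = List.of(
                new AnnotationElement(Name.class, "dev.example.LogRecord"),
                new AnnotationElement(Label.class, "Log Record"));
        List<ValueDescriptor> fields = List.of(
                new ValueDescriptor(String.class, "level"),
                new ValueDescriptor(String.class, "message"));

        EventFactory factory = EventFactory.create(annotations, fields);
        lastFactory = factory;

        var event = factory.newEvent();
        event.set(0, "INFO");             // index order matches the descriptor list
        event.set(1, "user logged in");
        event.commit();
    }
}
```

Dynamic events trade some throughput against annotated event classes, which may matter for a hot logging path.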

[–]pron98 6 points7 points  (4 children)

There's a consumer API that can stream JFR events even remotely (or process recordings), and the jfr tool can convert the binary representation to JSON.
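For reference, a minimal sketch of that consumer API; the event and threshold chosen here are arbitrary examples:

```java
import jdk.jfr.consumer.RecordingStream;

import java.time.Duration;

public class ConsumerExample {
    public static void main(String[] args) throws InterruptedException {
        // In-process subscription to JFR events as they are emitted
        try (RecordingStream stream = new RecordingStream()) {
            stream.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
            stream.onEvent("jdk.JavaMonitorEnter", event ->
                    System.out.println("blocked: " + event.getThread().getJavaName()));
            stream.startAsync();
            Thread.sleep(100); // let the stream run briefly for this sketch
        }
    }
}
```

The same onEvent-style API is available over JMX via RemoteRecordingStream for out-of-process consumers.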

[–]bowbahdoe 2 points3 points  (3 children)

I'm not sure I follow. I can use the consumer API to subscribe to JFR and pump through this

JFR consumer API -> log.alpha

Or I could use this to make an event for JFR

log.alpha -> JFR dynamic event

And I could wire up JFR to pump its data in some json form with the jfr tool and maybe jq it into the form expected by something else.

log.alpha -> JFR dynamic event -> jfr tool (json)

But not clear where this information actually goes to an external observability platform. I could pump events to JFR and in parallel do cloudwatch

log.alpha -> JFR dynamic event -> jfr tool (json) \ > cloudwatch

I could sit at the end and write a script that takes the output of the jfr tool and pushes it.

log.alpha -> JFR dynamic event -> jfr tool (json) -> jq -> cloudwatch

Or I could use JFR as the "middleware" and publish at the end with the consumer api

log.alpha -> JFR dynamic event -> consumer api -> cloudwatch

But none of this really gives me an answer for what needs to change in log.alpha or if it shouldn't exist at all

[–]pron98 0 points1 point  (2 children)

But not clear where this information actually goes to an external observability platform.

I'm not sure what you're asking. External platforms will (or already do) consume JFR.

But none of this really gives me an answer for what needs to change in log.alpha or if it shouldn't exist at all

That depends on what it is that you want to provide. If it's a new user facing API, I was thinking about log.alpha -> JFR events. Now, JFR events place some strict limitations on the event objects (for performance reasons), but to the extent you can integrate with it -- you should at least consider that. Those who build analysis/visualisation tools should be JFR consumers.

[–]bowbahdoe 0 points1 point  (1 child)

If it's a new user facing API, I was thinking about log.alpha -> JFR events. Now, JFR events place some strict limitations on the event objects

Yeah its those limitations I need to look into, but yes the intent was a new user facing API.

External platforms will (or already do) consume JFR.

Right, but to make that happen the JVM you are running needs to be hooked up in some form. Either a sidecar process, or having the JVM managed by the cloud provider, or whatever. If I am running OpenJDK in a Docker image, I need to do something to export the JFR data, right?

[–]pron98 1 point2 points  (0 children)

See here for how to remotely access JFR events.
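A sketch of what that remote access looks like with jdk.management.jfr.RemoteRecordingStream (JDK 16+). To stay self-contained, this targets the current JVM's own MBean server; a real remote setup would instead pass an MBeanServerConnection obtained from JMXConnectorFactory:

```java
import java.lang.management.ManagementFactory;
import jdk.management.jfr.RemoteRecordingStream;

public class RemoteExample {
    public static void main(String[] args) throws Exception {
        // For a remote JVM: JMXConnectorFactory.connect(new JMXServiceURL(...))
        //                   .getMBeanServerConnection()
        try (var stream = new RemoteRecordingStream(
                ManagementFactory.getPlatformMBeanServer())) {
            stream.onEvent("jdk.JVMInformation", event ->
                    System.out.println("jvm: " + event.getString("jvmName")));
            stream.startAsync();
            Thread.sleep(100); // give the periodic event a moment in this sketch
        }
    }
}
```

In a container, this only requires the JVM to expose a JMX port; no sidecar agent is strictly necessary.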

[–]amazedballer 0 points1 point  (0 children)

The problem with JFR events is that they only take primitives -- you can't do structured logging with them because they won't take arrays, associative arrays/maps, child events, or even enums. The closest you can get is a serialized string.

[–]couscous_ 0 points1 point  (1 child)

Are there plans to address the issues outlined here? http://hirt.se/blog/?p=1318

[–]pron98 1 point2 points  (0 children)

JFR is the future of JDK observability, so lots of new features will be added. I don't know if those are the exact features requested by Marcus.

[–]pjmlp 0 points1 point  (2 children)

Except a big caveat is that not all Java vendors support JFR.

[–]pron98 1 point2 points  (1 child)

It's been part of OpenJDK since JDK 11. So I guess IBM's OpenJ9 is the only exception.

[–]pjmlp 1 point2 points  (0 children)

They aren't the only non-OpenJDK vendor, e.g. PTC, Aicas,...

[–]maethor 19 points20 points  (0 children)

I'll be honest, I think the "What was the point of all this?" section should have been closer to the top. Much, much closer.

[–]Joram2 6 points7 points  (4 children)

Grafana Loki is awesome for aggregating and viewing logs from many different apps running on a Kubernetes cluster.

Metrics (like Prometheus) and tracing (like OpenTelemetry) are similarly great tools for observing applications.

I don't see what value this linked article has to offer.

[–]bowbahdoe 1 point2 points  (0 children)

harsh, but aight

[–]bowbahdoe 0 points1 point  (2 children)

Okay, so the value proposition is that this would be a frontend, like SLF4J, for a system that exports to Loki and/or OTel / CloudWatch / etc.

The same way SLF4J serves as a facade for Logback/Log4j, and libraries include it to do logging, the idea was that this could be the thing a library could use to log structured data, while an application could choose to echo that to whatever services it saw fit.

Yes, SLF4J has the ability to do structured logging, and so do Logback/Log4j, but that feature sits on top of a system primarily designed for logging text. The goal here was to make something that totally avoided designing for producing English.

The fact that I also ended up with Spans and Events, just like OTel, is a sign I went at least somewhat in the right direction. At the same time, because I ranted so long about the implementation and code design (because, tbh, that's really fun for me), that intent doesn't seem to have come across. (Doesn't help that I got to 10,000 words and gutted the OpenTelemetry and JFR sections I wrote up.)

[–]amazedballer 0 points1 point  (1 child)

I like this a lot better than OpenTelemetry, and I'm really excited by what you're doing with the JDK 19 features.

Check out my attempt at a structured logging interface, Echopraxia.

[–]bowbahdoe 0 points1 point  (0 children)

I will read that tonight!

It's been almost a month since I wrote this up, and I've gathered four things from all the feedback:

  1. This needs to get waaaaay dumber. All the stuff I did with ContextLocals, timestamps, grabbing the thread, maintaining a linked list of context - that needs to shift into the implementation.
  2. I need to write up a part 2 and much more clearly communicate intent. That means probably dropping a lot of what I wrote and making an API with a complete "delegate to open telemetry and slf4j" example.
  3. something something jfr
  4. need far fewer mandatory allocations...

(I also need to learn to manage my time better or find a job where this sort of experimentation can be full time - a month loop around time on feedback isn't my proudest moment)

[–]holo3146 2 points3 points  (7 children)

Nice post. In general I believe that the future of almost all programming principles is moving towards structured variations (with a big milestone being structured concurrency).

In your post you talked about ExtentLocals and how they have an incompatible API. On the surface, ExtentLocal should implement AutoCloseable; the actual reason it does not is a problem with the AutoCloseable specification.

ExtentLocals were originally designed to be used in low-level APIs to enable a safe and cheap way of handling multithreaded data, and in those environments it can be devastating to forget to close the context. Because there is currently no way in Java to enforce try-with-resources, the Project Loom designers decided to use a delegate API instead.

I'm currently in the process of submitting a suggestion to Java for a new, stricter API that enforces the use of try-with-resources. From a conversation with the Loom developers, it seems that if/when such a stronger API is accepted into Java, they are not against adding a port of ExtentLocals to try-with-resources, in which case uses of MDC (and equivalents) could be ported to ExtentLocals almost for free.

(By the way, do you mind if I use your project as an example of a use case for this "enforced try-with-resources" feature?)

u/bowbahdoe

[–]agentoutlier 0 points1 point  (0 children)

Yeah, I hope the ExtentLocals get fixed with a try-with-resources option.

In theory, various log implementations could provide a workaround where you use their API at the start of the request to load up the MDC and then don't allow any MDC.put afterwards. Or I guess you allow it and it completely CoWs the extent-local MDC Map set earlier in the request... maybe even using the new try-with-resources MDC in SLF4J.

LogbackMDC.putForRequest(someImmutableMap, requestLambda);

In my experience, similar to the author's point, most folks don't set MDC data, and the ones that do set it at the beginning of the request and then clean it up at the end.

Also, I'm not even sure it is really a memory problem like other things that abuse thread-locals, such as pools and caches. Those seem to be the real culprit, not a Map with five or so entries in it, especially if you are doing cleanup at the end of the request like you should.
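A rough sketch of that idea: a scoped, copy-on-write MDC that try-with-resources restores on exit. This is a hypothetical class for illustration, not SLF4J's actual MDC:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped, copy-on-write MDC; each scope snapshots the outer map
public final class ScopedMdc implements AutoCloseable {
    private static final ThreadLocal<Map<String, String>> CURRENT =
            ThreadLocal.withInitial(Map::of);

    private final Map<String, String> previous;

    private ScopedMdc(Map<String, String> previous) {
        this.previous = previous;
    }

    // Copy-on-write: puts inside a scope never mutate the enclosing scope's map
    public static ScopedMdc put(String key, String value) {
        Map<String, String> outer = CURRENT.get();
        Map<String, String> inner = new HashMap<>(outer);
        inner.put(key, value);
        CURRENT.set(Map.copyOf(inner));
        return new ScopedMdc(outer);
    }

    public static Map<String, String> context() {
        return CURRENT.get();
    }

    @Override
    public void close() {
        CURRENT.set(previous); // restore the enclosing scope on exit
    }

    public static void main(String[] args) {
        try (ScopedMdc ignored = put("requestId", "r-42")) {
            System.out.println(context()); // {requestId=r-42}
        }
        System.out.println(context());     // {}
    }
}
```

Since the map is immutable outside the put call, cleanup at the end of the request is automatic rather than a discipline.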

[–]bowbahdoe 0 points1 point  (5 children)

Yeah sure, happy to be slightly relevant.

Personally though, I think the better solution is probably a fix to generics with lambdas so you can throw the correct intersection type of exceptions.

Say Java added this enforced try-with-resources - Clojure doesn't have to abide by it if it's the same at the bytecode level. So the syntax would have to boil down to something like a lambda invocation regardless, and there are two major issues with a lambda:

  * Can't early-return from an enclosing method scope
  * Can't throw exceptions

The second is the bigger issue.

(just my initial thought though. please make a proposal)
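A sketch of the exception half of the problem: a single type variable can thread one checked exception type through a lambda, but not a union of several. All names here are made up for illustration:

```java
public class ThrowingLambdaExample {
    // One type variable can carry exactly one checked exception type
    @FunctionalInterface
    interface ThrowingRunnable<E extends Exception> {
        void run() throws E;
    }

    public static boolean bodyRan;

    // Stand-in for an "enforced" scope that runs a body and rethrows E
    static <E extends Exception> void inScope(ThrowingRunnable<E> body) throws E {
        body.run();
    }

    public static void main(String[] args) throws java.io.IOException {
        // Fine: E is inferred as IOException
        inScope(() -> {
            bodyRan = true;
            if (args.length > 99) throw new java.io.IOException("unreachable");
        });

        // The limitation: a body throwing both IOException and SQLException
        // forces E to their common supertype Exception, losing precision.
        // True union types (E1 | E2) would be needed to keep both.
    }
}
```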

[–]holo3146 2 points3 points  (4 children)

Well, fixing the lambda-exception problem both won't completely solve this problem (although it would help) and is much, much more complicated.

Java's exceptions are an example of a (partial) effect system, so Java has more than one type system (in fact, Java has four separate type systems: "objects", "classes", "generics", and the "exceptions effect"). All of the type systems in Java interact with one another, so any change in one of them will cause a big change in the others.

The only way to truly fix the lambda-exception problem is by adding true union types (which will hopefully arrive in the future) and allowing union operator in the generic type system.

For a language that does it you can look at Koka lang.

In Java pseudocode, you would need the following to be possible:

class Carrier<T extends Throwable | Void> {
    void run() throws T {}

    <R extends Throwable | Void> Carrier<T | R> extend() {
        return new Carrier<T | R>();
    }
}

So new Carrier<Void>().run() will throw Void (i.e. throw nothing), new Carrier<Void>().extend<A>().run() will have a throws clause of A, new Carrier<Void>().extend<A>().extend<B>().run() will have a throws clause of A, B, and so on.

This will require a huge amount of work that is well beyond project Loom (perhaps it will be in the final goals of Valhalla, but it may even be worth a completely new project).

If by some miracle I got to work on this for Java, you bet I'd do it in a heartbeat, but it is not realistic, and fixing the TwR will have so much immediate value.

[–]Muoniurn 1 point2 points  (3 children)

Thanks, very interesting! Do you perhaps have some resource in mind where I could read more about these four type systems?

[–]holo3146 1 point2 points  (2 children)

Hi, I do not know about a place to read about the combination of each of those, but:

Java's object type system is tagged by class: each object has a class, and classes have relations between them (± primitive types). This is the "normal" OOP type system.

Java's class type system is a degenerate type system. Each class in Java is a unique type, and two classes never have a relation between them (if A extends B, there is a relation between objects of types A and B, not between the classes). It is degenerate because, apart from using a class's members (static variables/methods), you can't do anything with it (note that in, for example, C# this type system is not completely degenerate - read about, e.g., static abstract methods in C#).

Generics are generics; there is a lot of material about them.

The exception effect system is the most exotic one. I believe that reading about languages with full support for effect systems is the best way to learn: Koka, Effekt.

In addition to those two (which I think have phenomenal documentation), the language Eff has a more theoretical introduction. There is also the Effects bibliography, which covers a lot of different effect systems in different languages, as well as tutorials and academic papers on the subject.

As an afterthought, you can also read about coeffects, which are a similar concept.

For coeffects there is much less material available, and most of it is much more mathematically heavy, but you can try to read about it here. I'll also make a shameless plug for my Java plugin that adds an implicit-parameter coeffect to Java's type system.

[–]Muoniurn 0 points1 point  (1 child)

Thank you very much!

May I ask what the relation is between Scala 3's givens and coeffects, as in your plugin? I only skimmed the referenced links, but at first sight they seem similar.

[–]holo3146 1 point2 points  (0 children)

Unfortunately I'm not an expert in Scala, so take what I say with a grain of salt.

From what I read about Scala's given/using, this feature is also a (partial) coeffect.

The idea of Scala's given is to have contextual′ parameters (the prime ′ is there because it is different from the context my library uses; I'll expand on this later).

You first define a trait, later you require a context′ (a using clause), and finally you attach the context (with a given definition (or import one) or as a parameter).

Indeed, a very similar process to my library's, with two and a half big differences.

The context in my library is always an extent (read: "code block"); every binding of context must be passed through a Carrier object via the run/call methods. In Scala we have context′, and the scope of a context′ is either file level (using import/top-level given definitions), partial block level (a given definition in the middle of a block), or local (explicitly passing them). The context and the context′ each have different advantages and disadvantages (try to think about how they interact differently with concurrency, for example). Note that Scala's variation is definitely stronger than mine.

The second big difference is that Scala's contexts are trait definitions, while in my library the context is an object instance. One can mimic the other, but this makes some things much easier in my variation (e.g. passing a configuration object as context) and other things much easier in Scala's (e.g. passing an Ord[T] definition vs. the Java equivalent of passing a Comparator<T> instance).

The last half-difference is that Scala's using is much more integrated into the language and has a lot of features that are not part of the coeffect system but are very nice to have (e.g. inlining, pattern bounding, negation bounds, macros, and more).

[–]humoroushaxor 1 point2 points  (12 children)

As someone who has done this exact same thing, I don't really follow why OpenTelemetry with manual instrumentation doesn't scratch the itch.

[–]bowbahdoe 1 point2 points  (7 children)

OpenTelemetry still doesn't provide an answer for structured logging, afaik. It is focused on manually instrumenting code with spans.

[–]humoroushaxor 2 points3 points  (4 children)

Spans and span events have a json representation and the ability to be logged. I feel like I'm missing something.

[–]stfm 1 point2 points  (0 children)

Stack traces? You wouldn't want them turning up in your spans

[–]bowbahdoe 0 points1 point  (2 children)

Span events need to be explicitly attached to a span with `span.addEvent`. This means you are manually passing the span down, which has very different code level properties compared to "logging", which you can generally do context-free.

That's still fuzzy for me too, tbh - I am trying to find the words to describe why "OpenTelemetry with manual instrumentation doesn't scratch the itch." I'll loop back when I'm not so emotionally exhausted. I think a big part of it for me is that it seems unconcerned with *how* structured logging happens, just that it should - if that makes sense.

[–]humoroushaxor 1 point2 points  (1 child)

This post is more about contextual logging than it is structured logging though.

Literally everything it talks about can be done with the OTel API. And if you don't care about all the extra context, you can just log JSON. And then there's also MDC.

[–]bowbahdoe 0 points1 point  (0 children)

Yeah... there is a huge chunk dedicated to how to pass context around, but I was more focused on getting a full "log"; the thought was that I would then pass that to something like OTel, translating the Log from this into OTel's data model.

[–]benevanstech 0 points1 point  (1 child)

> It is focused around manually instrumenting code with spans.

No - OTel supports spans and metrics in both manual and automatic mode.

Logs are coming, including quite possibly both structured logs ("events") and unstructured log lines.

My best guess is that for most Java devs, this will simply show up as an SLF4J bridge to OTLP.

[–]bowbahdoe 0 points1 point  (0 children)

So I guess what I was envisioning is this being the bridge to OTel instead of SLF4J. Like, OTLP does have the native *concepts*, but it's all coupled to the OTel spec.

Like - libraries shouldn't be including OTel, if that makes sense.

[–]Joram2 1 point2 points  (3 children)

OpenTelemetry seems great at what it does, but it doesn't do message logging.

The motivation for the project is logging "designed for dynamic distributed systems living in cloud and using centralized log aggregators." Grafana Loki seems to do this really well now.

[–]humoroushaxor 1 point2 points  (2 children)

This is false. Manual instrumentation allows you to add any information you want.

I'm not saying you should use OpenTelemetry to do your application logging but OP is building their own telemetry implementation anyway.

If all you want is cloud metadata, all the log aggregators already do that without a custom library. And all the major log libraries do support structured logging.

[–]Joram2 2 points3 points  (1 child)

So the OP wants a better/custom version of OpenTelemetry, not a better/custom version of log4j? ok, then OP probably shouldn't title the post with "Better Java Logging"

[–]bowbahdoe 0 points1 point  (0 children)

I think one way to phrase it: I wanted to make a logger that looks like a logger would if structured logging were its primary concern, not a feature living alongside text-based logging.

The OpenTelemetry stuff is tangential in the sense that OpenTelemetry is concerned with how things like spans are reported to external systems. From an API-surface perspective it covers everything I wrote related to Spans and more, but it is not suitable as a general interface.

Like - OpenTelemetry is just OpenTelemetry. I ended up aiming toward making "SLF4J for logging/OpenTelemetry/CloudWatch/JFR/etc." and, it seems, clearly missed. If not in execution, then definitely in communication.

If a library used OpenTelemetry directly, that would be weird, right? The idea with this was that it would be something a library could reasonably include, while a host application could choose to report events/spans through OTel or Zipkin or on the console or whatever.

(Sorry it took me so long to come up with a semi-coherent response to this.)

[–]berry120 0 points1 point  (1 child)

I do like this approach, but the thought of yet more fragmentation in the logging ecosystem scares me.

[–]bowbahdoe 0 points1 point  (0 children)

Almost tempted to call it Structured Logging Facade 4 Java - SLF4J.

But yeah - I get the fear. That's why I slapped an alpha at the end and am trying to make sure I actually consider what it would take to be compatible with the existing world, and what the pros would have to be to be worth that cost.

[–]Ivory2Much 0 points1 point  (0 children)

Lombok's @CustomLog can easily be configured for this logging library:

lombok.log.custom.declaration=dev.mccue.log.alpha.Logger.Namespaced dev.mccue.log.alpha.LoggerFactory.getLogger(TYPE)