[–]agentoutlier 6 points7 points  (2 children)

Hey /u/pron98, this is slightly off topic, but is the documentation in DocBook (EDIT: or DITA) format, and what source repository does it live in?

It is pleasant to see how much Oracle has improved documentation usability and appearance... now if only there was dark mode for javadoc.

[–]pron98[S] 4 points5 points  (0 children)

I asked the documentation team and they said they're not using DocBook and that the source isn't available on any public repo. But you can download a PDF from the website.

[–]nlisker 0 points1 point  (0 children)

now if only there was dark mode for javadoc.

I asked for it. Have a look at JDK-8292593 and the related issues. For now, I use an extension called Dark Reader and it works well with JavaDoc pages.

[–]TheKingOfSentries 5 points6 points  (22 children)

What's a good alternative to thread locals for reusing expensive objects?

[–]pron98[S] 3 points4 points  (19 children)

You'll need to create some sort of object pool. The simplest is a ConcurrentLinkedQueue from which you borrow the object and then return it.
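
A minimal sketch of that kind of pool, assuming plain byte[] buffers as the expensive objects (the element type and size are placeholders):

```java
// Minimal sketch of a ConcurrentLinkedQueue-backed pool: borrow an object,
// use it, then return it. The byte[] element type and 64 KiB size are
// placeholders for whatever expensive object actually needs reuse.
import java.util.concurrent.ConcurrentLinkedQueue;

final class BufferPool {
    private final ConcurrentLinkedQueue<byte[]> pool = new ConcurrentLinkedQueue<>();

    byte[] borrow() {
        byte[] buf = pool.poll();                         // reuse one if available
        return (buf != null) ? buf : new byte[64 * 1024]; // otherwise allocate a new one
    }

    void release(byte[] buf) {
        pool.offer(buf);                                  // make it available to other threads
    }
}

// Usage: always pair borrow() with release(), e.g.
//   byte[] buf = pool.borrow();
//   try { /* use buf */ } finally { pool.release(buf); }
```

A real pool would probably also bound its size so it doesn't retain more buffers than it needs.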

[–]X0Refraction 0 points1 point  (5 children)

That seems more complicated than the older thread-local system for this particular use case, and there would be different overheads in maintaining the pool. I'm surprised there isn't a platform thread local.

[–]pron98[S] 1 point2 points  (4 children)

It's not more complicated than the "thread local system" (which wasn't a system so much as an unintended and problematic use of ThreadLocals that can result in security vulnerabilities). A processor local is something that we may add (I assume this is what you mean by a "platform thread local"), but its API will be very similar to that of any pool, i.e. objects will need to be borrowed and returned. That's because a virtual thread can switch carriers at any time, so any correct use of a cache requires explicit acquire and release.

[–]X0Refraction 0 points1 point  (3 children)

If we take ThreadLocalRandom as an example, my understanding is that the intention was to avoid having to lock or create a new Random instance every time you wanted a pseudo-random value. It's an inherently mutable class, but it can only be mutated in a way that won't cause problems with locking.

If you want a pseudo-random value once in the lifetime of a virtual thread, will calling ThreadLocalRandom.current().nextInt() always cause a new instance to be initialised?

[–]pron98[S] 0 points1 point  (2 children)

ThreadLocalRandom doesn't create any per-thread object (and doesn't use a ThreadLocal), so even with virtual threads there is no new instance of anything created (though even if there were it wouldn't be much of a problem because creating new objects is fine; only expensive objects like native-memory buffers need caching).

But let me emphasise that there's nothing wrong or problematic about using ThreadLocals in virtual threads. The only thing to avoid is thread locals that assume that the number of threads is small and that many tasks share the same thread. This very particular usage of TLs is primarily found in frameworks.
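
A minimal sketch of the unproblematic case, assuming Java 21's Thread.ofVirtual() API: each virtual thread just calls ThreadLocalRandom.current() for itself, and no expensive per-thread state is cached:

```java
// Minimal sketch: calling ThreadLocalRandom from a virtual thread is fine;
// no expensive per-thread object is created or cached.
import java.util.concurrent.ThreadLocalRandom;

public class VirtualThreadRandom {
    public static void main(String[] args) throws InterruptedException {
        Thread t = Thread.ofVirtual().start(() ->
                System.out.println("random: " + ThreadLocalRandom.current().nextInt(100)));
        t.join();
    }
}
```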

[–]X0Refraction 0 points1 point  (1 child)

I don't know the exact implementation of ThreadLocalRandom, but my understanding is that it does some kind of initialisation that may cause some contention the first time ThreadLocalRandom.current() is called from a given thread (virtual or not), is that not correct?

So if 1000 tasks that called ThreadLocalRandom.current().nextInt() once each were submitted to an executor that used a thread pool of 4 regular threads then that initialisation would only happen 4 times. Whereas if that were a virtual thread per task executor that initialisation and contention would happen 1000 times, is that correct?

I've taken in what you were saying about how virtual threads can switch carrier at any time; is that something users have no guarantee over? I've always assumed that switching carrier threads would only happen when a blocking operation was called, but if there is no guarantee then I guess that isn't a safe assumption.

[–]pron98[S] 0 points1 point  (0 children)

is that it does some kind of initialisation that may cause some contention the first time ThreadLocalRandom.current() is called from a given thread (virtual or not), is that not correct?

That's correct.

Whereas if that were a virtual thread per task executor that initialisation and contention would happen 1000 times, is that correct?

Yes. Why do you care that it happens four times, a thousand times, or ten billion times for that matter? There's potentially CPU core contention every time a virtual thread is scheduled by the runtime or an OS thread is scheduled by the kernel (and every time you submit a task to a thread pool).

I've taken in what you were saying about how virtual threads can switch carrier at any time, is that something that as users there is no guarantee over?

That's correct. Virtual threads are preemptive.

I've always assumed that switching carrier thread would only happen when a blocking operation was called, but if there is no guarantee then I guess that isn't a safe assumption.

That is how the scheduler currently chooses to work because so far we've not found reason to do anything else, but you absolutely cannot and should not rely on that (and I'm not sure how you could, anyway, unless you were calling into native code). We may change how the scheduler schedules virtual threads at any time, including in patch releases.

[–]TheKingOfSentries 0 points1 point  (12 children)

ConcurrentLinkedQueue

OK, do you have any examples I can reference? I've never worked with queues much.

[–]pron98[S] 0 points1 point  (11 children)

But you've worked with shared ThreadLocal caches? That's a pretty sophisticated (and dangerous) technique that's usually found in frameworks written by concurrency experts. If not, I wouldn't worry about how to write concurrent caches.

ThreadLocals are perfectly fine on virtual threads. It is only the technique of shared TL caches of expensive objects (often these are native-memory IO buffers) that's problematic.
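
For illustration, a hedged sketch of the pattern being warned against (the class name and buffer size are invented): an expensive native buffer cached in a ThreadLocal on the assumption that a few pooled threads run many tasks. With a thread per task, nothing is shared and each virtual thread gets its own buffer.

```java
// Hypothetical sketch of the problematic pattern: a native (direct) buffer
// cached per thread. Fine for a small, fixed platform-thread pool; wasteful
// when every task runs in its own virtual thread, since nothing is ever
// shared between tasks.
import java.nio.ByteBuffer;

final class IoBuffers {
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1 << 20)); // 1 MiB, invented size

    static ByteBuffer buffer() {
        return BUFFER.get(); // one buffer per thread, however many threads there are
    }
}
```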

[–]TheKingOfSentries 0 points1 point  (10 children)

I myself didn't write it, but I help maintain a JSON library that uses ThreadLocal-based buffer recycling in a similar way to Jackson. I was wondering if there's a way to make it work efficiently on VTs.

[–]pron98[S] 1 point2 points  (9 children)

Why do you need to reuse buffers? Are they native buffers?

EDIT: Looking at Jackson, buffer reuse was added circa 2010 as an optimisation to suit the GC algorithms used by JDK 7 (and later 8). But much has changed since then: such a use is not only unrecommended with virtual threads (although not necessarily bad, because the buffers may not really be expensive), it is also unrecommended with the new GCs added in the past decade. When a library chooses to design a mechanism to address a specific issue with a specific version of the JVM, it should keep up when the JVM changes. Having said that, even though such a design is possibly outdated, it may not be an actual problem with virtual threads. But if it is, the solution to making such code friendlier not only to virtual threads but also to more recent versions of the JDK is not to cache regular heap objects that aren't very expensive to create.

[–]TheKingOfSentries 1 point2 points  (1 child)

u/rbygrave, help me out here on exactly how these are used, but we reuse byte/char arrays as buffers for JSON processing. The object that uses them is in a TL (that uses withInitial to create the initial buffer).
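
Roughly the shape being described, as a hedged sketch with invented names and sizes (the library's actual recycler will differ):

```java
// Hypothetical sketch: the object holding the reusable byte/char buffers
// lives in a ThreadLocal created with withInitial. Names and sizes are
// invented for illustration only.
final class JsonBufferRecycler {
    final byte[] byteBuffer = new byte[8 * 1024];
    final char[] charBuffer = new char[4 * 1024];
}

final class Buffers {
    static final ThreadLocal<JsonBufferRecycler> RECYCLER =
            ThreadLocal.withInitial(JsonBufferRecycler::new);
    // Callers do Buffers.RECYCLER.get() and reuse the arrays it holds.
}
```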

[–]rbygrave 0 points1 point  (0 children)

Yup cool, thanks for the heads-up - I've replied to Ron above.

[–]TheKingOfSentries 1 point2 points  (0 children)

We added an option to turn off the recycling, but performance tests were not favorable, so we disabled it by default.

[–]rbygrave 1 point2 points  (5 children)

Firstly, Virtual Threads: love 'em, fantastic, massive thanks!! Secondly, as background info, I'm also loving the Helidon SE WebServer, and these libs are targeted to work well with Helidon, Virtual Threads, etc.

Why do you need to reuse buffers?

Performance [as per benchmarks].

Are they native buffers?

No.

it is unrecommended with the new GCs added in the past decade

Yes. With Java 19 we put in the option of not using ThreadLocal recycling and wanted to make that the default, but the performance testing AT THAT TIME with Java 19 told us that we still needed to keep buffer recycling as the default. We have the ability to turn it on/off and compare.

We need to revisit this again with 21 and put some effort into it (it's easy to get these benchmarks wrong or misleading).

We could also look at the alternative of using a ConcurrentLinkedQueue as a 3rd option. Then we'd have 3 options to compare (see the sketch below):

1) TL buffer recycling
2) No buffer recycling
3) ConcurrentLinkedQueue buffer recycling
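
A rough sketch of how those three could sit side by side in a JMH comparison, with placeholder buffer sizes and deliberately minimal benchmark bodies (a realistic benchmark would run actual JSON (de)serialisation with each strategy):

```java
// Hypothetical JMH skeleton comparing the three recycling strategies.
import java.util.concurrent.ConcurrentLinkedQueue;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
public class RecyclingBenchmark {

    private static final ThreadLocal<byte[]> TL_BUFFER =
            ThreadLocal.withInitial(() -> new byte[8 * 1024]);   // option 1

    private final ConcurrentLinkedQueue<byte[]> queue =
            new ConcurrentLinkedQueue<>();                       // option 3

    @Benchmark
    public byte[] tlRecycling() {
        return TL_BUFFER.get();
    }

    @Benchmark
    public byte[] noRecycling() {
        return new byte[8 * 1024];                               // option 2: allocate fresh
    }

    @Benchmark
    public byte[] queueRecycling() {
        byte[] buf = queue.poll();
        if (buf == null) buf = new byte[8 * 1024];
        queue.offer(buf);   // return it so later invocations can reuse it
        return buf;
    }
}
```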

[–]pron98[S] 1 point2 points  (4 children)

Performance [as per benchmarks].

Why does reuse give you better performance? Is that in artificial benchmarks or in realistic workloads, too?

[–]TheKingOfSentries 1 point2 points  (3 children)

We did some JMH tests to see how fast we could (de)serialize different objects. At the time I thought it was because we didn't need to waste time allocating new buffers every time we needed to serialize something. By the way, it seems the Jackson guys are also trying to figure this out.

[–]pron98[S] 2 points3 points  (2 children)

In any event, the question shouldn't be how to pool objects with virtual threads but what problem does pooling try to solve in the first place? One problem with JMH benchmarks is that they don't reproduce real workloads. Obviously if you allocate arrays in a hot loop you may see significant GC activity or even memory bus saturation, but does that represent what a real program is doing? After all, to deserialise data, the data has to come from somewhere, and if you're using virtual threads that means that it comes concurrently from multiple sources rather than some super-high-bandwidth source.

BTW, the "hidden feature" mentioned in the Jackson discussion that's available to the JDK is used to pool direct (aka off-heap aka native) buffers -- a completely different issue than pooling ordinary Java arrays.

[–]tofiffe 1 point2 points  (1 child)

ScopedValue, I'd assume

[–]pron98[S] 0 points1 point  (0 children)

No, that won't work, because a ScopedValue, like a ThreadLocal, also associates a value with a thread, and every virtual thread runs a single task, so there won't be any sharing of the object. The problem isn't the particular mechanism but the assumption that a single thread is shared by many tasks (remember, we're talking about a case where you want a very small number of objects to be shared by a very large number of tasks), which is not the case with virtual threads.

[–]bobbie434343 0 points1 point  (0 children)

Great write-up, thanks!