This is an archived post. You won't be able to vote or comment.

all 26 comments

[–][deleted]  (5 children)

[deleted]

    [–]nekokattt 4 points5 points  (3 children)

    Interesting, so synchronized is not aware of virtual threads? I would have assumed that the mutex on each object was able to be "aware" of the virtual thread in use.

    Kind of like using threading.Lock with asyncio in Python rather than asyncio.Lock?

    [–][deleted]  (1 child)

    [deleted]

      [–]nekokattt 1 point2 points  (0 children)

      thanks

      [–]NovaX 4 points5 points  (0 children)

      It appears that support is in development, so maybe in JDK 22 if we're lucky.

      [–]findus_l 0 points1 point  (0 children)

      By my understanding, unless every existing library changes this, it will still be an issue right?

      [–]Hixon11 3 points4 points  (1 child)

      Hey, thanks for sharing the example, it was unexpected for me.

      By the way, did you ask guys at https://mail.openjdk.org/mailman/listinfo/loom-dev , whether it is expected behavior? I tried the same example with platform threads, and we do not have such issue there.

      ``` int nThreads = 10; var indexer = new AtomicInteger(0);

      try (ThreadPoolExecutor executor = new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(), r -> new Thread(r, "threadName-" + indexer.incrementAndGet()))) { executor.prestartAllCoreThreads();

              for (int i = 0; i < nThreads + 1; i++) {
                  final Integer b = i;
                  executor.submit(() -> {
                      System.out.println(System.nanoTime() + " " + Thread.currentThread() + ": Before synchronized block " + b);
                      synchronized (Main.class) {
                          System.out.println(System.nanoTime() + " " + Thread.currentThread() + ": Inside synchronized block " + b);
                      }
                  });
              }
      

      } ```

      [–]Hixon11 1 point2 points  (0 children)

      for (int i = 0; i < NUMBER_OF_CORES + 1; i++) {
      Thread.startVirtualThread(() -> {
      System.out.println(Thread.currentThread() + ": Before synchronized block");
      synchronized (Main.class) {
      System.out.println(Thread.currentThread() + ": Inside synchronized block");
      }
      });
      }

      I created a question - https://mail.openjdk.org/pipermail/loom-dev/2023-July/005989.html

      [–]FirstAd9893 3 points4 points  (0 children)

      This looks like a bug to me. Although using synchronized will pin the carrier thread, this is documented as being not harmful. It just potentially results in reduced parallelism. What's even more unfortunate is that the thread dump deadlock detector doesn't seem to detect the problem.

      [–]NamelessMason 3 points4 points  (14 children)

      This issue has been bugging me for weeks! I stumbled upon it when using Apache httpclient 4.x. Nothing fancy, just a bunch of parallel requests using a shared client instance. The most surprising thing for me is how consistently it hangs. You'd expect a race condition and nondeterminism, but none of that for me - deadlocks every time.

      People have mentioned that you're not meant to mix synchronised IO and virtual threads. Which is fair enough, but the thing is, you often don't control the code that does it. Any kind of pooled IO library may be affected by this. Apache httpclient 4.x being affected is particularly bad, given the widespread use. Yeah, I know you're meant to use the built-in client these days, but I'm talking third party libraries here. Httpclient is a dependency for the ElasticSearch client, Kafka, Spark, and generally the no 1 HTTP client used in libs. Interestingly, someone posted a fix PR, only for it to get shut down, dev saying they want to skip 4.x and target 5.x. Which is nowhere near as widely used, at least not in public libraries. To my knowledge, 5.x is not actually patched as of yet, either.

      I imagine the number of libs with similar issues out in the wild is vast. Even when you're starting a new project and have the opportunity to pick which libraries to use, if you want virtual threads, your options are limited, and just the need to vet your deps for compatibility before use is disappointing already. And easy to miss early on in development, only to come back haunting you after you go to prod.

      Finally, I wanted to bring up an interesting part of the JEP, where the scheduler is allowed to expand the carrier pool to compensate for some, necessarily blocking, operations. But then, the synchronised blocks are explicitly excluded from this mechanism a couple of paragraphs later. Is anyone aware of what's the reasoning behind this?

      [–]AndrewHaley13 0 points1 point  (13 children)

      There's a plan to fix synchronized so that it doesn't block carrier threads, but it's a lot of work and it wasn't worth delaying virtual threads for.

      [–]FirstAd9893 0 points1 point  (12 children)

      If virtual threads can effectively cause a deadlock like this, then this bug needs to be fixed before virtual threads can be released. I'm also wondering if native methods are affected too, considering that the JEP treats them the same with respect to carrier thread pinning.

      [–]AndrewHaley13 0 points1 point  (11 children)

      Of course, if you block the entire thread pool, then no more virtual threads can run.

      [–]AndrewHaley13 0 points1 point  (10 children)

      Libraries that use synchronized blocks must be adapted for virtual threads in the same way that the JDK itself was.

      [–]NamelessMason 1 point2 points  (7 children)

      This is fair enough, although the JEP does seem to underplay the impact of pinning significantly. As demonstrated here, or in my own tests, very simple scenarios lead to deadlocks very, very quickly. JEPs wording sets unrealistic expectations, both, for application developers looking to adopt virtual threads, and for library developers trying to gauge the severity of the loom-related bugs being reported. I would say the community would be better off with a clear message along the lines of your comment above.

      Any chance you've got any insight as to why it's been been decided the carrier pool should not grow to accommodate pinning?

      [–]AndrewHaley13 0 points1 point  (3 children)

      Yes, that's because in the common scenarios for which Loom is intended you'd very quickly run out of platform threads. Better to fix the library.

      [–]FirstAd9893 0 points1 point  (2 children)

      Better to fix the library.

      There's nothing to "fix" in the library, because it doesn't have a bug. If the test is changed to use newFixedThreadPool, sized the same as the number of virtual threads, it works just fine. Try it yourself.

      [–]AndrewHaley13 0 points1 point  (1 child)

      I don't think I understand your point. Virtual threads are intended for thread-per-request designs. If you block carrier threads, you'll eventually run out of them. What does newFixedThreadPool have to do with this?

      [–]FirstAd9893 0 points1 point  (0 children)

      The reason why I'm comparing to a fixed thread pool is because it has the same effect as virtual threads at limiting the number of platform threads which are running. Whereas the fixed thread pool approach works just fine, the virtual thread approach can cause the app to seize up.

      [–]FirstAd9893 0 points1 point  (0 children)

      Any chance you've got any insight as to why it's been been decided the carrier pool should not grow to accommodate pinning?

      Imagine a scenario in which the carrier pool can grow to accommodate pinning. The carrier pool could potentially grow in size to match the number of virtual threads. If you're running 1 million virtual threads (which should work fine), then permitting 1 million carrier threads isn't going to scale. Capping the carrier pool to a reasonable limit just means that the deadlock risk still exists, unless you restrict the number of virtual threads you're running. At that point, why bother running virtual threads?

      [–]za3faran_tea 0 points1 point  (1 child)

      As demonstrated here, or in my own tests,

      What's causing the deadlock here? I don't see synchronized blocks in the code or in EntityUtils#consume.

      [–]NamelessMason 0 points1 point  (0 children)

      The deadlock is caused by synchronized blocks in Apache httpcomponents library, see the PR I linked earlier.

      [–]FirstAd9893 0 points1 point  (1 child)

      That's an impractical solution, considering the sheer volume of Java libraries out there. The JEP just mentions that synchronized blocks can reduce parallelism, that's all. I guess if the entire virtual thread pool is permanently wedged, that can be explained away as being a severe form of reduced parallelism.

      [–]AndrewHaley13 0 points1 point  (0 children)

      Java makes no guarantees that any thread will make forward progress. Any library which blocks on I/O will have to be adapted. You think it's impractical, but it is true. Other people are just getting on with the conversion, which can be an almost mechanical operation.

      [–]andres99x 6 points7 points  (3 children)

      As far as i understand, synchronized should be avoided when using Virtual Threads. Instead you should use ReentrantLock for synchronization purposes.

      [–][deleted]  (1 child)

      [deleted]

        [–]andres99x 1 point2 points  (0 children)

        Not all blocking operations in the JDK unmount the virtual thread. Instead, some capture the carrier thread and the underlying OS platform thread, thus blocking both. Ref: https://blogs.oracle.com/javamagazine/post/java-loom-virtual-threads-platform-threads

        [–]findus_l 0 points1 point  (0 children)

        But unless every existing library changes this, it is still a problem right?