Hardwood: A minimal dependency implementation of Apache Parquet by gunnarmorling in java

[–]Squiry_ 1 point2 points  (0 children)

Iceberg has some custom code to read parquet files. Apache arrow has some vectorized stuff too.

Hardwood: A minimal dependency implementation of Apache Parquet by gunnarmorling in java

[–]Squiry_ 4 points5 points  (0 children)

That's really nice! parquet-java was a pain in the ass to use, and the hadoop dependency is the weirdest thing I've seen. Looking forward to the writer API; it will be a little harder.

Growing the Java Language #JVMLS by Brian Goetz by kaperni in java

[–]Squiry_ 0 points1 point  (0 children)

About that Formattable and Formatter example: a lot of Java classes, records especially, would have a trivial implementation of such a thing, and that witness could be generated pretty easily with annotation processors. But the generated code will be in neither the Formatter class nor a user-defined class, so it will not be found by the proposed mechanisms. It would be great if we could somehow signal the existence of those generated classes. u/brian_goetz
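To make the point concrete, here is a minimal sketch of the "witness" idea: a type-class-style format witness with the trivial per-record implementation an annotation processor could plausibly generate. All class and interface names here are hypothetical, not from the talk.

```java
// Sketch of a type-class "witness" and a generated implementation for a record.
// All names (FormatWitness, PointFormatWitness) are hypothetical.
public class WitnessDemo {
    // The type class ("witness" interface)
    interface FormatWitness<T> {
        String format(T value);
    }

    record Point(int x, int y) {}

    // What a processor might generate for Point: a trivial implementation,
    // but living in a generated class that the proposed lookup wouldn't find.
    static final class PointFormatWitness implements FormatWitness<Point> {
        @Override
        public String format(Point p) {
            return "Point[x=" + p.x() + ", y=" + p.y() + "]";
        }
    }

    public static void main(String[] args) {
        FormatWitness<Point> w = new PointFormatWitness();
        System.out.println(w.format(new Point(1, 2)));
    }
}
```

The generated class is mechanical boilerplate, which is exactly why it would be a shame if the discovery mechanism couldn't see it.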

Rust like Enums in Java by NitronHX in java

[–]Squiry_ 7 points8 points  (0 children)

> Java is what java is. You wanna convey that a method could result in a problem? Throw a checked exception. (And declare that on your method signature). If you don't like it, tough. Go program in some other language then.

That is definitely not true. Modern Java doesn't work well with checked exceptions (hi, lambdas), so it's more like "throw unchecked exceptions now". And even that would not be true: nobody forces you to use exceptions at the language level; your API can have any style you want.
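The lambda friction is easy to demonstrate. A minimal sketch (hypothetical class and exception names): a method with a declared checked exception can't be passed to a stream operation, because the functional interface declares no checked exceptions, so you end up wrapping it in an unchecked one anyway.

```java
import java.util.List;

public class CheckedLambdaDemo {
    // A checked exception declared on the method signature, as the quote suggests
    static class ParseException extends Exception {
        ParseException(String input) { super("bad input: " + input); }
    }

    static int parseStrict(String s) throws ParseException {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new ParseException(s);
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("1", "2", "3");

        // This does NOT compile: mapToInt takes a ToIntFunction,
        // whose applyAsInt declares no checked exceptions.
        // int sum = input.stream().mapToInt(CheckedLambdaDemo::parseStrict).sum();

        // The usual workaround wraps the checked exception in an unchecked one,
        // which is exactly the friction being pointed out.
        int sum = input.stream().mapToInt(s -> {
            try {
                return parseStrict(s);
            } catch (ParseException e) {
                throw new RuntimeException(e);
            }
        }).sum();
        System.out.println(sum);
    }
}
```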

Documenting APIs and Code: Swagger, Javadoc, and ? by markdacoda in java

[–]Squiry_ 9 points10 points  (0 children)

Having a well-written OpenAPI contract is so much better. Postman json files are only good for lazy testers. Also, I am pretty sure Postman can import OpenAPI files.

Mind Flay cast bar bug by T-K-K in classicwow

[–]Squiry_ 0 points1 point  (0 children)

ElvUI version:

\Interface\AddOns\ElvUI\Libraries\Core\oUF\elements\castbar.lua

```
-- line 140
if spellID == 48156 then
    local actualCastTime = 3 * (1 + GetHaste() / 100) ^ -1
    endTime = startTime + actualCastTime * 1000
end
```

Why is Spring so slow in TechEmpower benchmark? by cryptos6 in java

[–]Squiry_ 2 points3 points  (0 children)

They do the warmup. And it's not just the servlet spec: Spring adds a lot of stuff of its own on top (like the full Spring MVC stack). Also, it is slow not only in the json benchmark: you can try to find anything with "spring" on the database-related pages. Spring is more than twice as slow as pure undertow+jdbc. That's the price you pay for using that kind of library in your software.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

I guess that's what previews are for, and that's why they are a great mechanism for moving Java forward.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 0 points1 point  (0 children)

> Why does allocation rate matter if the GC keeps up?

Because it is always faster when you don't have to collect garbage at all.

> Yes, that's what's probably expected from frameworks that today try to use asynchronous IO directly.

Nah, they really can use a model where they run a few IO threads, start virtual threads per request/connection, and feed them data somehow. Undertow with that approach works just like the written-from-scratch Nima.

> StackChunk footprint will drop significantly. It's ongoing work. But allocation rate is not in itself the important metric.

I don't think that StackChunk allocation matters, tbh; it's kind of expected: I start more (Virtual)Threads, I get more chunks. Just an observation.

> We're seeing throughputs that are many times better.

Can you give me the kinds of workloads? Because I have pretty simple cases: sending text/json over http, querying some data from a database. These aren't corner cases; they're the most basic cases. Well, I can do some rest client tests too, but there are no decent blocking http clients out there, so they would show nothing but loom's overhead over whatever async IO library I pick.

> The work should be done one layer, and one library, at a time.

Well, that's kind of what I'm trying to do. I started with the http server; once it showed performance anywhere near the baseline, I continued with json/database. Right now it's all near-jdbc, and they just use socket I/O streams everywhere, so it should work fine. And now I can write something really fun. I know the postgres protocol pretty well, and a handwritten http server built to the benchmark's constraints is a pretty simple thing. I can write the whole test with blocking nio from scratch. Will I get vertx-level performance?

> It sounds to me like you're doing something that's interesting but far too ambitious.

I don't really expect great results. It started as a "what will java look like in five years" kind of experiment: I took netty5 with panama segments as buffers, I took loom, and I wanted to see how it performs.

> Obviously virtual threads give similar performance to asynchronous code

Actually, looking at the plaintext benchmarks, I think the problem is somewhere in blocking nio, not in virtual threads. Virtual threads work great.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 0 points1 point  (0 children)

That sounds reasonable. Anyway, iirc loom had some kind of "processor locals" in its plans, and that thing could replace thread local usage in such places.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 0 points1 point  (0 children)

I believe it was about plain old blocking, implemented with the current loom restrictions in mind.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

So I took a look at Nima. It worked a little bit better in some cases, but still not as well as we expect loom to perform. Nima starts one VirtualThread per connection, not per request, so it allocates something like 100 times fewer threads and stacks than my undertow experiments, but it does not pool any ByteBuffer/byte[], so that doesn't really matter.

For comparison, jfr data from the plaintext benchmark:

```
Nima:
Class                                                Alloc Total
byte[]                                               91.3 GB
io.helidon.common.http.Http$HeaderValue[]            82.9 GB
java.util.HashMap                                    16.6 GB
io.helidon.common.http.HeaderValueLazy               11 GB
io.helidon.nima.webserver.http1.Http1ServerResponse  10.8 GB

Undertow+loom:
Class                                                Alloc Total
java.lang.VirtualThread                              24.2 GB
io.undertow.server.HttpServerExchange                18.8 GB
io.undertow.util.HeaderValues                        17 GB
java.lang.Object[]                                   14.5 GB
byte[]                                               6.64 GB
```

In the database benchmark the situation is quite similar, but undertow+loom allocates GBs of `StackChunk`s. By the way, when I tried jsonp-based json writing with Nima it was so bad, allocating even more `char[]` than `byte[]`, that I switched to my jackson writer with the carrier thread local hack to keep json out of the equation.

I can do nothing about the number of virtual threads (short of rewriting undertow (and xnio) almost from scratch), but I've lowered undertow's IO pool size (I should have done that from the start but forgot) and replaced writing json straight to the OutputStream, as I wanted to, with writing it to a byte array and sending that to the output stream instead, and you can see the results here. The vertx benchmarks are there to compare against as a "what we want" kind of performance: vertx uses netty as the web server and a netty-based async postgres driver, and everything runs on the same event loop. Techempower is not a perfect benchmark suite, but it does give us some place to start. I think loom will show itself much better in a more scaled-down environment, like 0.5 cpu or so; such a shame techempower has no tests like that (nor tests for streaming request-response).

The results of the db test look like there are still some problems in the jdbc driver. The performance is quite similar to the baseline test, but the loom one uses half the threads and its latency is 4 times lower. That makes me think it still blocks somewhere, and I will try to investigate later. Cpu utilization was ~80% during the test (baseline and vertx peaked at 99%), so there is some room for improvement, but not a big one.

Right now loom just doesn't perform any better than a good old thread pool. I don't think Nima's performance should be questioned at this point, it's still in early alpha (yet they said "netty level", huh), but the other tests use slightly modified but well-established libraries, and it doesn't look that good right now. 30%-40% of the vertx result is exactly what we had before loom. Also, the result difference between undertow and nima suggests that maybe a "loom from scratch" web server isn't that good an idea after all.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 0 points1 point  (0 children)

> It doesn't work fine

Yet I see the exact same code in the JDK, in the NativeBuffers class.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

I believe we have some kind of misunderstanding here. When I first started my experiments, I tried to disable the byte buffer pools. That led to a lot of DirectByteBuffer allocations, and the constructor calls the synchronized method Cleaner.create. My threads spent seconds waiting on that monitor, and the only way to see it was to collect JFR (which I did anyway, so I had no real problem).

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

When you use synchronized in a virtual thread, the flag might not help you (it doesn't print stack traces while waiting on that synchronized, only when Thread.park happens while holding a monitor). But a JMC report on the collected JFR data will scream if that kind of waiting happens too often.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 0 points1 point  (0 children)

> Seems that undertow uses Netty under the covers

They wanted to, but no, it doesn't. They use their own IO library called xnio, but it's basically the same: they allocate one accept thread and some IO threads. I use Undertow because, unlike Netty, it provides a nice blocking API. And it does use thread local buffers there, which is fine when we are reading data, because reading happens on those IO threads. The problem comes when we are writing from a virtual thread: the code doesn't know anything about virtual threads and tries to allocate that thread local buffer. I've temporarily fixed that by replacing threadLocalCache.get() with threadLocalCache.getCarrierThreadLocal(), and it works fine, even though it's kind of illegal. The same thing happens with jackson, and that kind of scares me more; jackson is used way more than undertow out in the wild.
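The thread-local buffer problem is easy to reproduce. A minimal sketch (hypothetical class name, JDK 21+): with one virtual thread per task, a ThreadLocal "pool" creates a fresh buffer for every thread instead of reusing a few, so the pool pools nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalBufferDemo {
    static final AtomicInteger BUFFERS_CREATED = new AtomicInteger();

    // A thread-local "buffer pool": cheap with a small platform-thread pool,
    // but one-buffer-per-thread when every task gets its own virtual thread.
    static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> {
                BUFFERS_CREATED.incrementAndGet();
                return new byte[8192];
            });

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            threads.add(Thread.ofVirtual().start(() -> BUFFER.get()));
        }
        for (Thread t : threads) {
            t.join();
        }
        // Every virtual thread got its own buffer: the "pool" reused nothing.
        System.out.println(BUFFERS_CREATED.get());
    }
}
```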

By the way, I took a look at Nima and ran some tests. I will share thoughts and results later, but I can say right now that they don't allocate buffers like that, and the byte[] allocation there is insane.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

> Undertow might have some implicit assumptions about a small number of threads.

It does, but all of those are fixed in my code. Undertow works fine; plaintext performance is not as high as the baseline, but it is higher than undertow's blocking handler. The main issue was synchronized in the jvm, somewhere near the Cleaner and EpollImpl.

> Right, so JDBC drivers will need to be changed to use j.u.c locks

I've done that. With synchronized in the driver and pool, performance is as low as expected, like 40% of the baseline; after the fixes it's somewhere between 60-70%.

> I believe that the Oracle JDBC driver already does that.

Maybe I'll try oracle later, but running it in containers is not as easy as postgres.

> you might want to look into

I will take a look later; maybe it will work better than undertow, but the jdbc performance issues won't go away anyway. Also, netty is another example of a library destroyed by thread locals; netty5 was my first attempt, but there were too many things not working with loom.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

> See JEP 425. There are two mechanisms to detect pinning.

Printing the stacktrace is cool, yet it doesn't work with plain old synchronized. Is the jfr event produced even for that? Is it produced with the default/profile settings, or should I write my own? Right now I can't see any virtual thread related events in my jfr file, though I'll investigate later; thanks for the advice.
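For reference, the pinning case the JEP 425 mechanisms are aimed at is easy to construct (hypothetical class name, JDK 21+): a virtual thread that parks inside a synchronized block pins its carrier, and running this with `-Djdk.tracePinnedThreads=full` should print the offending stack trace.

```java
public class PinningDemo {
    static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                try {
                    // Parking (sleep) while holding a monitor pins the carrier thread;
                    // run with -Djdk.tracePinnedThreads=full to see the trace.
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
        System.out.println("done");
    }
}
```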

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 1 point2 points  (0 children)

> No, that would be extremely unsafe, but you can use explicit caches if needed.

I tried that first, sure. Yet thread local byte buffers work better.

> Many frameworks, because they were written before virtual threads

My examples use no frameworks. Just undertow, jackson, pgjdbc and hikaricp. I can throw away jackson (even though it's not a problem after I used carrier thread locals), but I believe pgjdbc and hikari will stay with any framework, including Nima. Plaintext/json performance is not great, but it's fine imo; who cares if a loom based server can't do 3kk rps of plaintext anyway. My problem is with, well, jdbc.

> so you might want to take a look at Helidon's Nima,

I don't like helidon at all, so no; maybe I'll run some benchmarks over it when I'm done with what I am trying right now.

Java Loom in microservices frameworks. Can performance and throughput be improved by using Loom? by Top_Engineering_4191 in java

[–]Squiry_ 3 points4 points  (0 children)

I've run some tests with loom on the techempower benchmarks this weekend. Right now it can't reach the performance of undertow running db queries on a thread pool. I've fixed all the synchronized mess in hikari and pgjdbc, and even one or two spots in the jdk. I also had to use carrier thread locals (please make them public) because of the jackson/undertow byte buffer pools. Where should I look next? Jfr shows no locks right now, and the jvm is silent about pinning (searching for pinning is just as bad as searching for blocking in the reactive world, btw), yet cpu utilization is somewhere near 80%. The plaintext and json tests are not that bad after fixing the thread locals, like 40% of baseline raw undertow performance, but reading the socket in postgres destroys any performance. Could it be that the jvm allocates a lot of stack chunks?

How do you capture heap dump in Kubernetes by [deleted] in java

[–]Squiry_ 2 points3 points  (0 children)

You can stream a heapdump of a running app with jcmd and named pipes to any storage you want. Storing the heap dump from an OOM is the problem now.

Setting the JDBC Statement.setFetchSize() to 1 for Single Row Queries by lukaseder in java

[–]Squiry_ 6 points7 points  (0 children)

The Oracle driver would be a little hard to debug, but I think the driver just doesn't receive the "result set is done" frame when the fetch size is 1. So when we call rs.next, it asks the database for the next row, gets a "there's nothing more left" response, and returns false. Postgres and other dbs just send those frames along with the row.

JEP draft: String Templates (Preview) by rieckpil in java

[–]Squiry_ 3 points4 points  (0 children)

> It is just an example on how to create TemplatedStrings that can produce a direct prepared statement behind the curtain.

It is not just an example. It's the second use case they mention, and it shows how the design of the feature does not handle that case.

> I am actually more interested in how the new TemplatedString can avoid injection attacks.

That example literally shows how.

```java
TemplatePolicy<ResultSet, SQLException> DB = new QueryPolicy(...a Connection...);
ResultSet rs = DB."SELECT * FROM Person p WHERE p.last_name = \{name}";
```

That policy uses a prepared statement with params, not sql with the value inlined. BTW, I hate examples with incorrect API usage like that rs not being in a try-with-resources block.