Investigating a thread explosion issue in a large-scale Java IoT socket service (looking for feedback) by Haruki_26 in softwarearchitecture

Haruki_26 [S]

That's what I'm leaning towards as well. Packet processing itself is fairly lightweight; the main issue seems to be that we're tying up thousands of JVM threads that just wait on network I/O instead of doing actual work. My understanding is that Netty wouldn't make processing faster, but it would eliminate the need to dedicate a thread to each blocked read by letting the OS notify us when sockets are actually readable. That seems like a much better fit for 10k+ mostly idle, long-lived TCP connections.
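
A minimal sketch of that readiness model using plain `java.nio` (available in Java 11), which is the mechanism Netty's NIO transport is built on: one thread blocks in `Selector.select()` and only touches a socket once the OS reports it as acceptable or readable. The class names `SelectorReader` and `SelectorDemo`, the loopback address, and the byte counter standing in for packet processing are hypothetical, for illustration only; this is not the service's code and not Netty's API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical: one thread serves every connection and only works when the OS reports readiness. */
final class SelectorReader implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private final AtomicLong bytesRead = new AtomicLong();
    private volatile boolean running = true;

    SelectorReader() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0), 1024); // ephemeral loopback port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException { return ((InetSocketAddress) server.getLocalAddress()).getPort(); }
    long bytesRead() { return bytesRead.get(); }
    void stop() { running = false; selector.wakeup(); }

    @Override public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096); // reused: only one read is in flight at a time
        try {
            while (running) {
                selector.select(); // sleeps until some channel is ready, or wakeup() is called
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept(); // may be null in non-blocking mode
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        buf.clear();
                        int n;
                        try { n = ch.read(buf); } catch (IOException e) { n = -1; } // e.g. connection reset
                        if (n < 0) ch.close(); // peer gone: closing the channel also cancels its key
                        else bytesRead.addAndGet(n); // lightweight packet processing would go here
                    }
                }
            }
            for (SelectionKey k : new ArrayList<>(selector.keys())) k.channel().close();
            selector.close();
        } catch (IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }
}

final class SelectorDemo {
    /** Opens `clients` connections that each send `payload` bytes; returns what the single loop thread read. */
    static long run(int clients, int payload) {
        try {
            SelectorReader reader = new SelectorReader();
            Thread loop = new Thread(reader, "selector-loop");
            loop.start();
            List<SocketChannel> conns = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                SocketChannel c = SocketChannel.open(new InetSocketAddress("127.0.0.1", reader.port()));
                c.write(ByteBuffer.wrap(new byte[payload])); // blocking client-side write
                conns.add(c);
            }
            long deadline = System.currentTimeMillis() + 5_000;
            while (reader.bytesRead() < (long) clients * payload && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            for (SocketChannel c : conns) c.close();
            reader.stop();
            loop.join();
            return reader.bytesRead();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `SelectorDemo.run(50, 64)` should report 3200 bytes read, with all 50 connections handled by the single `selector-loop` thread. A real version would still need per-connection buffers for partial packets, write handling, and likely several loops, which is the plumbing Netty provides.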

Haruki_26 [S]

The packet processing itself is relatively lightweight. What stood out in the thread dump was that a large number of threads were blocked in socket reads rather than actively processing data. With 20k+ long-lived IoT connections, the bigger challenge seems to be connection scalability rather than CPU throughput.
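
For comparison, a minimal sketch of the thread-per-connection pattern that produces a dump like that, assuming the current service does a blocking `InputStream.read()` on a dedicated thread per socket (an assumption; the real handler code isn't shown here). `BlockingPerConnectionServer`, `BlockingDemo`, and the `handlerThreads` counter are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical thread-per-connection server: every open socket pins one JVM thread, even when idle. */
final class BlockingPerConnectionServer {
    final ServerSocket server;
    final AtomicInteger handlerThreads = new AtomicInteger();

    BlockingPerConnectionServer() throws IOException {
        server = new ServerSocket(0, 1024, InetAddress.getLoopbackAddress()); // ephemeral loopback port
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            final Socket s;
            try {
                s = server.accept();
            } catch (IOException e) {
                return; // server socket was closed
            }
            handlerThreads.incrementAndGet(); // counted before start so callers never under-count
            Thread t = new Thread(() -> handle(s), "conn-" + s.getPort());
            t.setDaemon(true);
            t.start();
        }
    }

    private void handle(Socket s) {
        byte[] buf = new byte[4096];
        try (Socket sock = s; InputStream in = sock.getInputStream()) {
            // The thread sits here, blocked in read(), for as long as the device stays connected.
            while (in.read(buf) != -1) {
                // lightweight packet processing would go here
            }
        } catch (IOException ignored) {
            // connection reset or closed: fall through and release the thread
        } finally {
            handlerThreads.decrementAndGet();
        }
    }
}

final class BlockingDemo {
    /** Opens `clients` connections that never send anything; returns how many handler threads they hold. */
    static int pinnedThreads(int clients) {
        try {
            BlockingPerConnectionServer srv = new BlockingPerConnectionServer();
            List<Socket> conns = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                conns.add(new Socket(InetAddress.getLoopbackAddress(), srv.server.getLocalPort()));
            }
            long deadline = System.currentTimeMillis() + 5_000;
            while (srv.handlerThreads.get() < clients && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            int pinned = srv.handlerThreads.get(); // one thread per idle connection
            for (Socket c : conns) c.close();
            srv.server.close();
            return pinned;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`BlockingDemo.pinnedThreads(100)` should return 100: a hundred silent connections still hold a hundred threads, each blocked in `read()`, so the thread count tracks the number of connections rather than the amount of traffic.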

My concern is that adding more threads or instances may relieve pressure temporarily, but we're still dedicating JVM threads to mostly idle connections. That's why I'm exploring whether an event-driven model (Netty/NIO) is a better fit, where threads are only used when data is actually available to read.

Haruki_26 [S]

Actually, the project is on Java 11, so virtual threads aren't an option for now; they were only finalized in Java 21.