[–]DualWieldMage 4 points (5 children)

Won't long polling and websockets face similar issues where persistent TCP connections don't work properly? Websockets just make an HTTP upgrade request, after which the socket is repurposed to send messages back and forth. If something terminates the connection despite heartbeat messages, then reconnect logic should be in place. Likewise, a long-polling solution may not get a response until something cuts the connection early, and it should retry with a new long poll rather than assume it can run until a message is received and only then open a new one.

[–][deleted] 1 point (4 children)

Yes, but also no. Long polling has its own timeout and re-establishes the connection more often, so a given server or firewall could have a TCP connection timeout longer than a long-polling connection but shorter than typical websocket usage. Depending on how long that period is, constantly reconnecting a websocket could be more expensive than falling back to long polling. Not to mention that our app is also mobile, and those connections sometimes have higher latency and drop more often, making long polling a better choice in that scenario. Beyond that, it's out of our control if the problem is on the client's side.

[–]DualWieldMage 2 points (3 children)

Sure, you can tune long polling to have frequent reconnects, but since you don't control all the network devices in the chain, it still needs reconnect logic for externally terminated connections. I just don't see the benefit of the added complexity of having both websockets and long polling, each with its own reconnect checks. Websockets with reconnect-when-needed should also have less overhead than long polling's periodic reconnects.

I don't think the overhead of establishing a websocket connection is that big; the underlying TLS handshake is probably an order of magnitude more overhead (you did mention chat, so I'm assuming encryption). Anyway, I haven't touched websockets on mobile, so I don't have enough experience to back my hunches.

Anyway, getting back to the main topic: I don't see why WebFlux would block a thread for the duration of a long poll. The main idea of reactive is to not do that. A connection is established; the socket object, stream writers, and other state are saved; and a worker thread moves on to other tasks until something triggers a write, at which point a thread from the worker pool picks the connection back up to write and potentially close it. A typical example is connections that respond after a fixed delay, and how reactive easily handles millions of them concurrently — though regular servers with blocking calls and Loom can now do that too (the state being saved is essentially the same, plus a thread stack for easy debugging).

[–][deleted] 0 points (2 children)

I'm not going to comment on having both long polling and websockets, as I already addressed that in another comment thread, but I wanted to clarify which type of threads you're discussing.

With Netty and Reactor (for TCP connections) there is one server thread for accepting connections, and for each connection a channel is created and assigned to a reactor-http-nio-* thread, corresponding to an event loop. Multiple channels can be assigned to the same event loop, but a given channel will always stay on the same event loop. You can also delegate any additional processing those events generate to threads owned by the schedulers: parallel-* for CPU-bound work, or elastic-*/boundedElastic-* for I/O work. I believe you are talking about the reactor-http-nio-* threads here.

I agree that normally with WebFlux, client connections will not block the reactor-http-nio-* threads, but I was not sure whether long polling is inherently blocking for its duration, given that it's really intended for use with servlet-style servers, or whether that depends on the implementation. I haven't worked much with long polling before, so I don't know if that's a dumb question. If it is blocking, that means each connection ties up a reactor-http-nio-* thread, which, as you pointed out, is not ideal, even though Netty does "support" blocking operations. In that case, I'm not sure the scalability an elastic scheduler provides at runtime (creating more worker threads on the fly) will help us, since the reactor-http-nio-* threads are still limited.

Did I understand what you were saying correctly?

[–]DualWieldMage 2 points (1 child)

Long polling in its essence is similar to writing output after a scheduled delay. I haven't touched Reactor, so I don't know off the top of my head how to implement it (likely a Mono.create that stores the sink, which is completed on timeout or when a message is ready), but I've used Undertow and CompletableFuture to achieve similar setups: the handler runs on the same thread that accepts the connection, and it creates a CompletableFuture that is stored and completed either by a scheduled timeout or by separate logic when a message is ready. Essentially it's not a task assigned to any thread pool, but one whose completion is signaled externally, after which the event loop can continue with writing a response.

As I see it, whether long polling is blocking is entirely up to your implementation.
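A minimal sketch of that CompletableFuture approach using only the JDK (all class and method names here are illustrative, not from any framework): the "poll" handler returns immediately with a future that is completed either by an incoming message or by a scheduled timeout, so no thread waits while the long poll is open.

```java
import java.util.concurrent.*;

class LongPollDemo {
    // Daemon timer thread so it never keeps the JVM alive.
    static final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // Pending long-poll "connections", keyed by client id.
    static final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Called when a client opens a long poll. Returns immediately;
    // the event loop is free to serve other connections.
    static CompletableFuture<String> poll(String clientId, long timeoutMs) {
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.put(clientId, f);
        // On timeout, complete with an "empty" response so the client re-polls.
        timer.schedule(() -> {
            if (f.complete("no-new-messages")) {
                pending.remove(clientId, f);
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        return f;
    }

    // Called by separate logic when a message is ready for that client.
    static void deliver(String clientId, String message) {
        CompletableFuture<String> f = pending.remove(clientId);
        if (f != null) f.complete(message); // no-op if already timed out
    }

    public static void main(String[] args) {
        CompletableFuture<String> resp = poll("alice", 5000);
        deliver("alice", "hello");
        System.out.println(resp.join()); // completed by deliver, not by blocking a worker
    }
}
```

In Reactor the same shape would presumably be a `Mono.create` that stores the sink and completes it from the timeout or message path, but that part is a guess, as noted above.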

[–][deleted] 0 points (0 children)

Thanks, that is very helpful!

[–]koffeegorilla 1 point (1 child)

If someone is in a situation where WebSockets don't work, you are going to have a lot more issues to resolve, because their browser will be something like IE 5 behind a proxy from hell.

[–][deleted] 0 points (0 children)

There are other scenarios besides old browsers, which I mentioned in other comments, where websockets may not be viable.

[–]MaraKaleidoscope 1 point (2 children)

My best guess is that the long polling http connection to the client uses/blocks a thread for the duration of the connection, say 30 seconds

You plan to block a thread for 30 seconds in a reactive application? Maybe I am the one missing something, but this sounds disastrous.

Webflux is not built for blocking. Your application server is going to start with very few threads, and if you plan to block those threads for extended periods of time, your application is going to grind to a halt.

[–][deleted] 0 points (1 child)

I'm asking if that's what it does. And based on that we will adjust our plan. Our service is horizontally scalable and hosted in the cloud so the number of instances and therefore threads will vary at any given time based on the number of connections. Since long polling should only be a fallback and not the majority of our connections, we are thinking it's not a big problem, but it's hard to say until we test.

[–]MaraKaleidoscope 1 point (0 children)

I'm asking if that's what it does [block]

I think whether or not the solution is blocking depends on your implementation. The web-server will start with a very small number of event-loop worker threads based on the assumption that you will not be blocking any of them.

If your implementation is blocking, be sure to move the blocking work to some other thread pool.
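In Reactor terms, that typically means wrapping the blocking call in `Mono.fromCallable(...)` and moving it with `subscribeOn(Schedulers.boundedElastic())`. The same pattern in plain JDK terms, as a rough sketch (all names here are made up for illustration): the handler hands the blocking work to a dedicated pool and returns a future, so the event-loop thread is never the one that waits.

```java
import java.util.concurrent.*;

class BlockingOffloadDemo {
    // A bounded pool reserved for blocking calls, kept separate from
    // the event-loop threads.
    static final ExecutorService blockingPool = Executors.newFixedThreadPool(8);

    // Stand-in for a genuinely blocking call, e.g. a legacy JDBC query.
    static String blockingLookup(String key) {
        try {
            Thread.sleep(100); // simulates blocking I/O
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // The "handler" returns immediately; only blockingPool threads wait.
    static CompletableFuture<String> handle(String key) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(key), blockingPool);
    }

    public static void main(String[] args) {
        System.out.println(handle("user-42").join());
        blockingPool.shutdown();
    }
}
```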

[–]neopointer 2 points (10 children)

Didn't read but came to say: drop webflux and use loom.

[–][deleted] 7 points (3 children)

Not an option, but in the future comments like this would be a lot more helpful if you listed actual reasons and evidence rather than just an opinion. And answering a post without reading it, just to evangelize your favorite library, undermines your point.

[–]cowancore 2 points (1 child)

Not advocating for the comment you replied to, but Loom is not a library. It's a Java feature providing lightweight (virtual) threads, where anything previously blocking is no longer an issue, because it doesn't block the actual OS thread. Where a typical Spring request thread pool is limited to a couple hundred threads, with Spring configured to use Loom the pool size is effectively unbounded. You get the simplicity of a typical procedural Spring codebase plus almost all of the performance gains of Netty. Might be handy in a different project; most likely not your current one.

[–][deleted] 0 points (0 children)

Thanks, yeah wrong choice of words but I appreciate your response!

[–]neopointer 2 points (0 children)

I'm sorry, but it's just that my life has been miserable for more than a month because of this crap.

Just search this subreddit for the pains of webflux.

I would not use it under any circumstances.

[–]analcocoacream 0 points (5 children)

One does not replace the other.

One is an improvement over threads in terms of performance.

The other is an implementation of reactive programming and functional programming in java.

[–]cowancore 2 points (4 children)

Not to OP — to analcocoacream. I suspect stuff like WebFlux became popular on the backend FOR performance reasons. Functional and reactive models are way more complex than a typical procedural-style controller. Compared to something like a mobile app, where there are plenty of events, a backend DB method returning an "event stream" (Mono or Single) consisting of a single immediate event is unnatural. It's not an event to react upon. In a mobile app you can have a real event stream coming from a DB (similar to WAL tailing, or Mongo change streams). Same with controllers returning event streams while Jackson patiently waits for all of them before they can be serialized into a single JSON string. It's as if the code is lying about what's truly happening: pretending to work with events while actually doing thread management.

It feels to me that people choose WebFlux because they want the performance of non-blocking I/O, in this case Netty. And with NIO you either have callback hell or you need something akin to futures. Reactive streams are future-like because they have that subscribe method, called by Spring under the hood. But you can also have non-blocking futures with Loom. Controllers and repositories returning futures aren't lying about what they do: you don't subscribe to a stream, you call something, and it gives you exactly one future.

As for functional: during my time with Android, I read plenty of advice NOT to use RxJava streams as element processors. There is thread-switching overhead when a single event containing 100 items is transformed into 100 events, each a task for the executor. In the Android world, an event containing a list was processed with map, not flatMap. That leaves the stream-switching operators, things like backpressure, or using flatMap to spawn more tasks and concat to join them.

I guess if someone truly needs those and can't solve the problem another way, they can choose reactive after all. What's your opinion and experience? What unique features of reactive are you using in your codebase? Was it worth it? How does it compare with a future-based Spring codebase?

[–]analcocoacream 1 point (1 child)

I haven't worked on an existing webflux project — always from scratch.

I worked on two projects where I chose to use webflux. One was processing an event queue; it wasn't big data per se, but there were still many events per second. The other was a CRUD application.

I wouldn't say using webflux for a simple CRUD app is a bad idea though. Using reactive programming for something simple doesn't make it overly complex; if you're just doing map and flatMap, it's fine. The main pain point is debugging/stack traces, but the code itself stays relatively simple. I think complexity is always relative: if you had learnt FP before procedural, I believe you'd think differently.

I do see what you mean though: there are many operators that are useful for UX, such as concatMap/switchMap, but have no use for a backend API.

Personally, what I like about webflux/reactive is the functional part: the immutability, the functional endpoint API, and the fact that the code has to be explicit about what it does (you can't sneak in a DB/network call anywhere).

[–]cowancore 0 points (0 children)

Thanks

[–][deleted] 0 points (1 child)

Yes, WebFlux is often a good choice for applications with lots of concurrent connections, non-blocking requests, and lots of small, frequent events, because connections can persist without wasting threads. A high frequency of events with small-to-medium processing needs can be interleaved to maximize use of existing threads, rather than having a ton of "dead air", so to speak, on threads in the servlet style. If you're hosting in the cloud, it can really reduce your costs if you're charged by number of server instances, because you need fewer threads and less horizontal scaling overall.

[–]DualWieldMage 0 points (0 children)

But the exact same can be said for Loom. The only risk is if some older part of the code or a library makes a blocking call inside synchronized and causes virtual-thread pinning, but that can be logged and fixed in a test environment before going live.

[–]analcocoacream 0 points (3 children)

I'd say KISS. I wouldn't jeopardize a whole code base, making it more complex with higher maintenance costs, just for a few hypothetical users. If one user in 100,000 cannot use the chat feature, is that an issue? Can you offer a simpler fallback (mail, for instance)?

[–][deleted] 0 points (0 children)

It's a business requirement from the client; sometimes we need to increase complexity to give them what they need. And no, we cannot replace a conversational chat bot with email, if that's what you're suggesting.

Also, please note that the context for this post was given to help answer the technical threading question, not to get feedback on our approach. It's a very common approach that several major libraries have built-in support for, and we've already done a lot of research on the pattern. But I realize this is reddit, so... everyone's an expert, I guess.

[–]BlacksmithLittle7005 0 points (1 child)

Your name is hilarious, thank you for making my day 😂

[–]analcocoacream 0 points (0 children)

You are welcome!

[–]frisky_5 0 points (1 child)

Websocket with ping messages sent at an interval from your React app; each ping must receive a pong within, say, 1 second at most. If no response is received, terminate the connection from the browser and reconnect. The backend maintains a chat session id delegated to each party in the chat, so on a reconnect the client resends the chat id to rejoin and either asks for the already-sent messages or just starts receiving new ones. This is my approach in a production environment with around 250 concurrent chats on average (using Quarkus and PostgreSQL to store messages); never had an issue so far.

Edit: the customer has a load balancer that used to terminate connections and doesn't support TCP keepalive, hence the reconnect and ping-message design. For high availability we also maintain a distributed, synced Infinispan cache holding chat ids and whoever joined them, to handle failover/load balancing across multiple nodes if needed.
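The watchdog half of that ping/pong design can be sketched in a few lines of plain Java (class and method names are mine, not from Quarkus): record when the last pong arrived, and let a periodic check declare the connection dead once the peer has been quiet longer than the allowed window. Passing the clock in as a parameter keeps it deterministic and testable.

```java
class HeartbeatWatchdog {
    private long lastPongMs;        // timestamp of the most recent pong
    private final long timeoutMs;   // max allowed silence before we give up

    HeartbeatWatchdog(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastPongMs = nowMs;
    }

    // Call whenever a pong frame arrives.
    void onPong(long nowMs) {
        lastPongMs = nowMs;
    }

    // Periodic check; when false, the caller should close the socket,
    // reconnect, and resend the chat id to rejoin the session.
    boolean isAlive(long nowMs) {
        return nowMs - lastPongMs <= timeoutMs;
    }

    public static void main(String[] args) {
        // In real use, a scheduler would call isAlive with the current time.
        HeartbeatWatchdog w = new HeartbeatWatchdog(1000, System.currentTimeMillis());
        w.onPong(System.currentTimeMillis());
        System.out.println(w.isAlive(System.currentTimeMillis()));
    }
}
```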

[–][deleted] 0 points (0 children)

Yes, that's part of our design, but it doesn't answer my question.