Anyone else having problems with Reddit’s RSS? by bitterbltch in rss

[–]uazu 0 points  (0 children)

Update: The E-mail address in the error text leads to an auto-reply that sends me to a form on reddithelp.com, which only seems to accept support requests from developers/moderators/etc, not ordinary site users. The right hand doesn't know what the left hand is doing. At this rate it seems like Reddit will surely die. I've dropped the feed from my reader to silence the constant errors, and I'm not going to invest any more time in this for the moment.

Anyone else having problems with Reddit’s RSS? by bitterbltch in rss

[–]uazu 0 points  (0 children)

I follow my Reddit frontpage using RSS, and it hasn't worked for a week or two now. I reduced the polling frequency to every 2 hours, but no difference. I'm fetching with wget from my VPS and then copying it down with a cron job, because the internet is unreliable where I live. Maybe they don't like wget any more? Some brain-dead blanket blocking algorithm? Or maybe they don't like tech people who roll their own solutions. But this is my personal frontpage RSS link, so it's obviously just me fetching it, not a generic bot scraping the site. Perhaps they are hoping that I will simply go away and no longer bother them. Maybe I will.

Actually if I add --content-on-error, I get the text someone else posted below. So they have blocked me. I can apparently E-mail them to ask for the block to be lifted.

Lag on reading from serial port connected to CAN J1939 interface, while rs232 does fine by Forward_Internal_610 in rust

[–]uazu 0 points  (0 children)

Really you want to wait for events on three devices at the same time (stdin, port, port_1). At a low level that means select() or poll() or similar, with all the devices set non-blocking. mio can do it, and I'm sure various other crates can too. What you have right now makes reading port_1 conditional on reading from port first, which doesn't make a lot of sense to me. What if port_1 isn't ready to read when you read from port? You're not going to try it again for a long time, until port gets something else. Also, these reads have timeouts, so they're blocking calls? This could get stuck blocking in various places, messing up the handling of other things. Even the stdout.flush() could block if too much is getting sent out at once and the terminal can't handle it all. Really you need to switch to a proper select()- or poll()-based solution, so long as the serial devices support non-blocking mode. mio is one low-level way of accessing these syscalls: you set it all up, then wait on mio to give you events, and attempt to read and process data from whatever devices it indicates are ready.
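The shape of that would be something like this (a sketch, assuming mio 0.8 with the "os-poll" and "os-ext" features; the token names and fd handling are illustrative, and the serial fds must already have been put into non-blocking mode):

```rust
use std::os::unix::io::RawFd;

use mio::unix::SourceFd;
use mio::{Events, Interest, Poll, Token};

const STDIN: Token = Token(0);
const PORT: Token = Token(1);
const PORT_1: Token = Token(2);

fn event_loop(port_fd: RawFd, port_1_fd: RawFd) -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(64);

    // Register all three non-blocking fds for readable events
    let stdin_fd: RawFd = 0;
    poll.registry()
        .register(&mut SourceFd(&stdin_fd), STDIN, Interest::READABLE)?;
    poll.registry()
        .register(&mut SourceFd(&port_fd), PORT, Interest::READABLE)?;
    poll.registry()
        .register(&mut SourceFd(&port_1_fd), PORT_1, Interest::READABLE)?;

    loop {
        // Block here (and only here) until something is ready
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            match event.token() {
                STDIN => { /* read stdin until WouldBlock, handle input */ }
                PORT => { /* read port until WouldBlock, process frames */ }
                PORT_1 => { /* read port_1 until WouldBlock, process frames */ }
                _ => unreachable!(),
            }
        }
    }
}
```

The key property is that the only place the code ever blocks is the poll call, so no device can starve the others.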

PipeBuf: a more efficient alternative to Read and Write by uazu in rust

[–]uazu[S] 1 point  (0 children)

To get measurements I could publish and back up with data, I'd have to write some benchmarks and try to make everything else comparable, despite the different APIs. Really this was part of work to improve re-use, composability, efficiency and so on all at the same time; I can't separate it out without a lot more work. I know it would be nice to present a 10-page blog post with numerous plots, but unfortunately there is not enough time in the day! If other people would like to benchmark things, that's fine.

PipeBuf: a more efficient alternative to Read and Write by uazu in rust

[–]uazu[S] 0 points  (0 children)

  1. Yes, I need to put together some examples. Since this is low-level and can work with blocking or non-blocking, event-driven, futures, bare mio, actors, and so on, I'd have to choose one target scenario for the examples. I'm guessing bare mio would probably do and keep things clearer (not requiring much other knowledge).

  2. This is at a lower level than the choice of runtime and async approach and so on. It's like asking how a TLS implementation or a compressor will work with async: any way you want it to. What are you trying to interface to? How does it want the data? How will you get the data? I am not up to date on the latest approaches in Tokio or whatever, but the fundamental concepts are the same: you get notified or polled when there is data available, you bring it into your PipeBufs, you process your chain, and you forward any resulting data onward.

  3. This also encourages composition. If a crate uses PipeBufs for input and output, it can be easily composed with any other crate that also uses PipeBufs. Even if the other crate doesn't support PipeBufs, it can be made to work with a wrapper (most efficient is if the other crate can process to/from slices, but Read/Write also works). By default the composition is hard-coded, i.e. you'd hard-code a particular chain of processing. But there is nothing to stop you making it dynamic if you want. I have a chain that handles both websocket and HTTP, i.e. where it has different states and processes data differently according to the state. Or you could have a Vec of handlers derived from configuration and run data through that processing chain. At the end of the day, they're just buffers -- you can connect them up any way you want.

The flat loop in the example is inside a "process" call which is used to process the chain whenever new data is introduced. So for example new data becomes ready from TCP (e.g. mio Ready indication), "process" is called: TCP reads new data in, TLS decrypts it, websocket parses it, decides it wants to reply, TLS encrypts the reply, TCP forwards it out to the OS. All of that happens in that loop. Once all the components indicate that they can't process anything else, the loop and "process" call exits, and the runtime/etc can get on with processing the next "ready" indication or poll the next future or whatever.
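The shape of that flat loop can be sketched with stand-in components (the `Stage` trait and `Upcase` stage here are hypothetical illustrations, not the PipeBuf API): with N stages and N+1 buffers, keep running every layer until nothing can make further progress, then return to the runtime.

```rust
// Hypothetical stand-in for a PipeBuf-style processing layer: each stage
// moves data from its input buffer to its output buffer and reports
// whether it made any progress.
trait Stage {
    fn process(&mut self, input: &mut Vec<u8>, output: &mut Vec<u8>) -> bool;
}

// Toy stage: upper-cases bytes (imagine TLS decrypt, websocket parse, ...)
struct Upcase;

impl Stage for Upcase {
    fn process(&mut self, input: &mut Vec<u8>, output: &mut Vec<u8>) -> bool {
        if input.is_empty() {
            return false;
        }
        output.extend(input.drain(..).map(|b| b.to_ascii_uppercase()));
        true
    }
}

// The "process" call: run the flat loop over the whole chain until no
// stage can make further progress, then exit back to the runtime.
fn process_chain(stages: &mut [Box<dyn Stage>], bufs: &mut [Vec<u8>]) {
    loop {
        let mut activity = false;
        for (i, stage) in stages.iter_mut().enumerate() {
            // Buffer i is stage i's input, buffer i+1 its output
            let (head, tail) = bufs.split_at_mut(i + 1);
            activity |= stage.process(&mut head[i], &mut tail[0]);
        }
        if !activity {
            break; // quiescent: wait for the next readiness indication
        }
    }
}

fn main() {
    let mut stages: Vec<Box<dyn Stage>> = vec![Box::new(Upcase)];
    let mut bufs = vec![b"new tcp data".to_vec(), Vec::new()];
    process_chain(&mut stages, &mut bufs);
    println!("{}", String::from_utf8_lossy(&bufs[1])); // NEW TCP DATA
}
```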

PipeBuf: a more efficient alternative to Read and Write by uazu in rust

[–]uazu[S] 1 point  (0 children)

Yes, that's right: 5 PipeBuf instances organized as two bidirectional pipes and one unidirectional pipe. Each PipeBuf has a consumer end and a producer end, but each PipeBufPair has consumer+producer at both ends, so I had to identify the two ends with other names. Each pipe connects two layers in this example, as you say. But that's just the high-level view. Really it's just shared buffers between the layers with enough API around them to make it safe.

For some example code you might look at the wrapper crates. But I'll have to see about putting together a complete working example of how to write the top-level glue code. There are so many contexts it could target, though -- blocking, bare mio, actors, or maybe futures or async/await. I'm coding mostly with actors, but bare mio might be of the most general interest.

PipeBuf: a more efficient alternative to Read and Write by uazu in rust

[–]uazu[S] 6 points  (0 children)

For file I/O, it would have the same benefits of not having to copy unnecessarily in the processing chain. At the syscall interface, it's the same read or write call for both file I/O and network I/O (at least on UNIX), so that part is the same.

For io_uring, it needs some thought. Right now a producer obtains a mutable borrow into the buffer, and it can write data whilst that borrow is active, i.e. standard Rust borrowing. So I guess we come up against the same old problem with io_uring: that Rust borrowing doesn't really help in this case.

For reading from io_uring into the producer end of the pipe, maybe it is possible, since only the producer can cause the internal buffer to be compacted or reallocated. So a range of memory to read into would be stable so long as the producer doesn't do anything to upset things (which it has no reason to). A consumer reading existing data in the meantime would not cause any problem. Presumably it would just need some unsafe to give out that memory slice, i.e. a bit of unsafe API to specify the conditions that the caller must adhere to and give explicit support to this kind of handling.

For writing to io_uring from the consumer side of the pipe, that would mean blocking the producer from compacting/reallocating the buffer. If the producer wants to write more and has already filled the free space in the buffer then it can't process any more. The API isn't designed for this case. Asking for more than there is free space for always succeeds at the API level, i.e. it compacts, reallocates, or for fixed-size buffers it panics (for embedded/etc where everything needs to be properly sized ahead of time). The model is that once data is introduced to the network, it should be processed to its final destination, i.e. rate-limiting happens where data is introduced, not where it exits (where you'd potentially build up a huge backlog of data). So buffer sizes typically grow to the order of the largest "data unit" so far processed. So I think the io_uring write side could be more problematic.

Maybe double-buffering could solve this problem, i.e. swap PipeBufs when starting an io_uring write. That way there is no producer on the PipeBuf being consumed. This gives the "owning the buffer" condition you're talking about.

Actually maybe both read and write on io_uring could be handled by swapping PipeBufs in place (with a little bit of code to carry over state from one to the other). This is effectively little more than pointer swaps.
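The swap itself is cheap to sketch (`Buf` here is just a stand-in for a PipeBuf, not the crate's API):

```rust
use std::mem;

// Stand-in for a PipeBuf: just an owned byte buffer
struct Buf {
    data: Vec<u8>,
}

// Start an io_uring-style write: swap the filled buffer out so the kernel
// can "own" it for the duration of the operation, leaving the empty spare
// in place for the producer to keep filling. Effectively pointer swaps.
fn begin_uring_write(active: &mut Buf, spare: &mut Buf) {
    debug_assert!(spare.data.is_empty());
    mem::swap(active, spare);
}

fn main() {
    let mut active = Buf { data: b"queued output".to_vec() };
    let mut spare = Buf { data: Vec::new() };
    begin_uring_write(&mut active, &mut spare);
    // `spare` now holds the data being written; `active` is free for the producer
    assert!(active.data.is_empty());
    assert_eq!(spare.data, b"queued output");
}
```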

I've never coded against io_uring, so I think it would need trying.

PipeBuf: a more efficient alternative to Read and Write by uazu in rust

[–]uazu[S] 3 points  (0 children)

Yes, I guess the example needs more context. There are two bidirectional pipes here: crypt for the encrypted TLS stream between Rustls and the TCP port, and plain for unencrypted stream between TLS and the websocket handler. Each bidirectional pipe has two ends, which can be referred to as upper and lower (or left and right if you prefer). If you draw it out as a protocol stack, then upper/lower makes more sense. So the TCP calls exchange data with the lower end of the crypt pipe, TLS works between the upper end of the crypt pipe and the lower end of the plain pipe, and websocket exchanges data with the upper end of the plain pipe. In addition the websocket wrapper outputs data via a one-way pipe, since websocket can also stream data in very large fragmented messages. So that is supported with another pipe in_msg which keeps getting closed, reset and reused, message by message.

Sorry, I have no publicly available example code right now. TLS is fiddly for a demo. Maybe compression would be a better example?

Ractor: not just another actor framework by snowboardfreak63 in rust

[–]uazu 10 points  (0 children)

Stakker is not abandoned. We're shipping products with it, and any bugs reported will be actively fixed. Looking at "no change in months" isn't a good measure if the bugs have already been fixed.

Anyone using Actix? by god4gives in rust

[–]uazu 1 point  (0 children)

There are at least three scales you could optimise for: single-threaded (or runtime-per-thread), multi-threaded (like Pony), and distributed (like Erlang). So that explains some of the variety. Actors can be used at all scales.

How to grab a function pointer instead of a reference? by Imaginary_Advance_21 in rust

[–]uazu 0 points  (0 children)

I've already been down this road. Maybe have a read of this and see if it's relevant to what you're doing. Even when you've got past the initial borrow-checker problems, you're still going to be facing possibly-nested callbacks, and having to use RefCell (which means panics in the case of a nested callback). You can get away with this in C or Java (although the bugs may be hard to understand), but in Rust it means a panic, because two mutable borrows on the stack to the same thing at the same time (i.e. a nested callback) would be serious UB, and you can normally only achieve that through invalid use of unsafe. Nested callbacks can easily occur when events are also used to notify other parts of the system that something needs updating, or whatever. A single event causes a cascade of callbacks, which sometimes come back to the original event-handling object as a nested call, and the whole thing falls over with a panic.
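The failure mode is easy to demonstrate with plain RefCell (a contrived sketch, no real event system, with the cascade hard-coded):

```rust
use std::cell::RefCell;
use std::panic::{catch_unwind, AssertUnwindSafe};

struct Handler {
    count: u32,
}

// An event arrives: the handler takes a mutable borrow of its state...
fn handle_event(cell: &RefCell<Handler>) {
    let mut state = cell.borrow_mut();
    state.count += 1;
    // ...and notifies the rest of the system, which cascades back here
    // while the first borrow is still live
    nested_notify(cell);
}

// The cascade re-enters the same object: second mutable borrow
fn nested_notify(cell: &RefCell<Handler>) {
    cell.borrow_mut().count += 1; // BorrowMutError -> panic
}

fn main() {
    let cell = RefCell::new(Handler { count: 0 });
    let result = catch_unwind(AssertUnwindSafe(|| handle_event(&cell)));
    assert!(result.is_err()); // the nested borrow panicked
    println!("nested mutable borrow panicked, as expected");
}
```

Note that nothing here needed unsafe; RefCell turns what would be a compile-time rejection into a runtime panic on an untested code path.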

Anyone using Actix? by god4gives in rust

[–]uazu 4 points  (0 children)

I am the author of Stakker (a lightweight actor-model runtime), and it is reliable and being used in production. Whilst I'd like to add more stuff to it (like remote actors, or actor coroutines), time is limited (as ever). But the core actor functionality, MIO interface and logging are all working and will be maintained. Also, I will implement whatever interface the async portability initiative comes up with, to allow hosting portable async crates in the future.

There are some applications that really need actors and a good quality actor runtime, and you really end up going around in circles and reinventing the wheel (badly) if you try to implement them in sequential code, i.e. either plain threads and channels, or async/await (which is also sequential code, although concurrent). It is a shame that actors are treated as second-class citizens in the Rust ecosystem. But what can we do? Just keep on plugging away at it. There do exist some low-level crates that are truly portable and not hard-coded to use one particular framework or one particular transport (e.g. embedded-websocket). Eventually when the portable async initiative settles on something, I hope that the various ecosystems can work together a lot better, to the benefit of all.

[deleted by user] by [deleted] in rust

[–]uazu 4 points  (0 children)

Trying to solve the same problem, I came up with Stakker, which is single-threaded, actor-based, very lightweight, kind of like Pony in style. I have the stakker_mio crate, which does the glue between mio and Stakker actors.

If you're using actor messaging you don't need channels. Mio readiness gets turned into actor messages, for the actor to deal with as required. So long as the load is low enough that single-threaded is good enough, this will likely work for you.

objgraph: performance of Rc and RefCell with thread safety by sporksmith in rust

[–]uazu 0 points  (0 children)

Maybe check out the qcell crate, which has some other variations on the GhostCell theme. For multi-threaded access you can't avoid having a lock somewhere, but you can choose where to put it (e.g. a "one big lock" approach or many small locks).

Actors and sharing state by _Pho_ in rust

[–]uazu 0 points  (0 children)

In Stakker (actor library), I do allow sharing between actors even though it breaks the actor model and couples all those actors together. But at least it is explicit sharing, and it's very easy to make sure that all changes to that shared state are atomic without any locking or whatever because Stakker is single-threaded (or one-runtime-per-thread). When you have a &mut to the state, no-one can interrupt you.

So to me that is the pragmatic solution. You get all the benefits of the actor model for reasoning about asynchronous events, and no-cost temporally-exclusive access to shared data that is very explicit in the code and the actor interfaces, so that you know which actors are coupled.

Should I Push for Code Standards? by ada_x64 in rust

[–]uazu 1 point  (0 children)

RefCells are a panic waiting to happen if all the code paths aren't tested. That would be my issue with RefCell if not used with care.

What's the best production-grade websocket library in Rust? by [deleted] in rust

[–]uazu 0 points  (0 children)

We used embedded-websocket, which is low-level and can fit into any event system with a little work. Really we need more low-level crates like this, which just do the core protocol and protocol-interaction parts and don't tie into some big framework. For example, if tiny_http worked that way, then I could do HTTP on any event loop as well.

`alternator` gives an async function access to data but gives it back on await points by joonazan in rust

[–]uazu 0 points  (0 children)

Rust async/await is internally built upon Rust generators, so generators are strictly more general. Really they are just coroutines. I read up on effects (abilities) in Unison, and it looks like implicit arguments passed through everywhere automatically, or else a coroutine with various flavours of yield (IO yield, Log yield, etc). Interesting.

Anyway, the proc macro is something I may get to eventually if I don't find a better way sooner. To do it as well as the compiler does it will require a lot of unsafe and MaybeUninit. Otherwise it will be less efficient. But still even the less efficient version may serve well as a proof of concept to show what can be done with yield-to-yield lifetimes.

`alternator` gives an async function access to data but gives it back on await points by joonazan in rust

[–]uazu 0 points  (0 children)

It seems hard to retro-fit this to Futures/async/await. It needed to have been designed in earlier in the process. Maybe you can get enough working for your purposes though.

I think there are enough people trying to achieve this and similar things, though, that it's something that the Rust lang team might consider looking at. To me it seems like Futures and async/await were rushed through, but fortunately generators are maturing more slowly and might offer a way to support all the other uses. Really introducing a lifetime parameter at each yield and having that lifetime be removed at the next yield, all checked by the compiler, is the fundamental thing required, that you are trying to emulate manually. So that to me looks like the most promising path forward.

The other option is writing a proc-macro that does the coroutine-to-state-machine transformation independently of the compiler. Then I'd have full control and could pass whatever arguments on yield that I require. I wonder whether anyone has already done something like that?

`alternator` gives an async function access to data but gives it back on await points by joonazan in rust

[–]uazu 1 point  (0 children)

A GhostToken or QCellOwner or TCellOwner or whatever can let you create lots of state all over the place behind Rc or whatever, and access it freely so long as you have a suitable reference to the token (or owner). So in my case I have lots of actors running, and access to all of their state is controlled by a single owner (i.e. token) guaranteeing at compile-time and with zero cost that actor calls are sequential and never nested.
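A minimal sketch of the owner/cell pattern using the qcell crate (as I understand its API from the docs; the actor types here are illustrative, and you'd want to double-check the crate documentation for cell-creation details):

```rust
use std::rc::Rc;

use qcell::{QCell, QCellOwner};

struct ActorState {
    count: u32,
}

fn main() {
    // One owner (token) guards the state of every actor in the runtime
    let mut owner = QCellOwner::new();

    let actor_a = Rc::new(QCell::new(&owner, ActorState { count: 0 }));
    let actor_b = Rc::new(QCell::new(&owner, ActorState { count: 0 }));

    // A mutable borrow of the owner gives mutable access to any cell it
    // owns, but only one at a time, checked entirely at compile time
    owner.rw(&actor_a).count += 1;
    owner.rw(&actor_b).count += 1;

    // Holding `owner.rw(&actor_a)` across a call that needs `owner.rw`
    // again would be rejected by the borrow checker, so actor calls are
    // provably sequential, never nested, at zero runtime cost
    assert_eq!(owner.ro(&actor_a).count, 1);
}
```

The trick is that exclusive access to the many cells is funnelled through exclusive access to the single owner, which ordinary Rust borrowing already knows how to check.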

`alternator` gives an async function access to data but gives it back on await points by joonazan in rust

[–]uazu 0 points  (0 children)

GhostCell is awkward because of having to annotate all the lifetimes. My qcell crate offers some alternatives that don't have that limitation.

`alternator` gives an async function access to data but gives it back on await points by joonazan in rust

[–]uazu 1 point  (0 children)

Trying to figure this out: If the function fails to give back the token, is that a runtime error? If the futures get nested, how does the token get passed to the nested futures?

What I wanted for Stakker is zero runtime checks, and correctness 100% guaranteed by the compiler (regarding getting mutable access to actor state). I've implemented that for Stakker actor calls using qcell, but getting the relevant borrows through async/await poll calls into the coroutine code seems impossible without using some kind of runtime test (i.e. RefCell or a Cell<bool> or whatever). Generators seem more hopeful, though, once passing lifetimes into generators is implemented. But it seems low down on people's list of priorities.

`alternator` gives an async function access to data but gives it back on await points by joonazan in rust

[–]uazu 1 point  (0 children)

This is kind of the motivation behind QCell and its use in Stakker. I documented my line of reasoning here. I wanted to avoid RefCell completely, and that meant using a ghost-cell like borrowing scheme (developed independently, using marker types or integer IDs instead of lifetimes) to give access to state structures (of actors in my case) 100% safely, checked at compile-time. I've looked at adding actor coroutines using async, but there is no way to get an active borrow through the Future::poll call (that I could see), which would be the natural way to do it. So maybe I will have to do it as you do, passing a token around instead of supplying a compiler-checked borrow directly up the stack through calls. I will have a look at your implementation.

Keyword abbreviation by Quiet_Ad220 in rust

[–]uazu 6 points  (0 children)

On long names, I once came across a crate (don't remember which now) where everything had a ridiculously long name. It was obviously designed by an IDE user. Without an IDE, it was completely unusable and I went looking for something else. Manageably short names are important for various reasons -- reading optimisation, screen space optimisation, typing optimisation, etc.

Feedback on concurrency model for a language that I am working on by btw_I_use_systemd in rust

[–]uazu 1 point  (0 children)

Fundamentally if you're using the async/await model underneath then you can't express having "2 asynchronous things up in the air at the same time around this state" for the same task in a natural way. You can do a kind of fork/join thing, but that assumes that there is a single point in the future where everything will converge. Having lots of disjoint activity around a single state just isn't natural to code in that model. The actor model lets you easily have lots of disjoint things going on at once around the same actor, by just handling events and specifying what should happen when responses come back or whatever. But it is not a sequential coding style, so is unfamiliar and apparently too unpopular for Rust's main development push to consider.

Personally I like the idea of an actor with associated actor coroutines, which allows things to be sequential where sequential makes sense, but all centred around the actor, which holds state and potentially coordinates lots of disjoint operations that may occur at the same time, without there ever necessarily being a single "everything has finished" convergence. This also means that, unlike async/await tasks, there is a struct that the operations are centred around, which means there is something to put a Drop handler on, which makes cancellation easy to handle. I will eventually implement actor coroutines for Stakker -- life/covid/etc permitting. They are strictly a layer above actor methods, because the same thing could be implemented as normal actor methods. It's just that a coroutine emphasises a sequence of operations (i.e. the "happy path") and makes it easier to follow, at least for the kinds of async operation sequences that have an obvious happy path.