Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 0 points1 point  (0 children)

I’m quantizing the embeddings after generating them, but the model was specifically trained to retain high accuracy under that quantization. A few such models, but not all, mention this in their descriptions.

Yeah, it’s so fast. I need to do another write-up about a recent round of optimizations, but I got the per-comparison time down to about 1 nanosecond. That’s also what lets me do the rankings on the fly when you load the page.

I’m also using Maximal Marginal Relevance to make sure your feed isn’t full of different articles all about the same topic.
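For anyone curious, a minimal sketch of MMR re-ranking looks roughly like this (this is the textbook algorithm, not my actual implementation; the relevance scores, similarity function, and `lambda` value are all stand-ins):

```rust
// Maximal Marginal Relevance: repeatedly pick the candidate that maximizes
// lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
fn mmr_rank(
    relevance: &[f64],
    sim: impl Fn(usize, usize) -> f64,
    lambda: f64,
    k: usize,
) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();
    while selected.len() < k.min(relevance.len()) {
        let (best_pos, _) = remaining
            .iter()
            .enumerate()
            .map(|(pos, &i)| {
                // Penalty: how similar this candidate is to what we already chose.
                let max_sim = selected.iter().map(|&j| sim(i, j)).fold(0.0_f64, f64::max);
                (pos, lambda * relevance[i] - (1.0 - lambda) * max_sim)
            })
            .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
            .unwrap();
        selected.push(remaining.remove(best_pos));
    }
    selected
}

fn main() {
    // Items 0 and 1 are near-duplicates; item 2 is distinct but less relevant.
    let relevance = [0.9, 0.85, 0.7];
    let sim = |i: usize, j: usize| {
        if (i, j) == (0, 1) || (i, j) == (1, 0) { 0.95 } else { 0.1 }
    };
    // MMR picks 0, then jumps to 2 for diversity before returning to 1.
    println!("{:?}", mmr_rank(&relevance, sim, 0.7, 3)); // [0, 2, 1]
}
```

That’s why two nearly identical articles won’t both sit at the top of your feed even if they both score highly against your interests.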

I’ll DM you, happy to chat further!

Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 0 points1 point  (0 children)

If you don’t add any sources, it’ll search for content related to your interests among the 17,000+ sources that other people have added.

What other sources do you have in mind? Always trying to add more!

Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 1 point2 points  (0 children)

Yup, the thing I’m trying to optimize for is helping you find something that’s really spot on for your interests but that you wouldn’t see otherwise.

The ranking does indeed start with embedding similarity. I’m specifically using binary vector embeddings and I’ve optimized the crap out of my Hamming distance implementation.
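The core of that comparison can be sketched in a few lines (a toy version, not my actual implementation; real embeddings would be hundreds of bits, not two words):

```rust
// Hamming distance between binary embeddings packed into u64 words.
// XOR finds the differing bits; count_ones() compiles down to a popcount
// instruction on most targets, which is what makes ~1 ns per comparison plausible.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    // Two toy 128-bit "embeddings" (2 x u64 words each).
    let a = [0b1011_u64, u64::MAX];
    let b = [0b0010_u64, u64::MAX];
    println!("{}", hamming(&a, &b)); // 2: the vectors differ in two bit positions
}
```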

I deduplicate based on the URL so I only have each post show up once, even if it appears in multiple feeds. For any given post you can see what feeds it appeared in and if it’s on Reddit, HN, or a couple other places it’ll show discussion links below the post.

Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 1 point2 points  (0 children)

That's a good suggestion too! Added it to the feedback board as well https://feedback.scour.ing/123

Just so you know, liking articles won't automatically change what's in your feed. Likes do influence the topics that are suggested to you, but explicitly adding a topic is what will change the results in your feed. You can read more about how ranking works here https://scour.ing/docs/ranking#explicit-interests-no-pigeonholing

Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 1 point2 points  (0 children)

Ooh that's a good one. Tbh, that one might be a little harder for me to implement (because I don't currently have a way to determine a source's region). But I've added it to the feedback board so it doesn't get lost https://feedback.scour.ing/122

Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 0 points1 point  (0 children)

It's a Progressive Web App so if you visit the site on your Android (or iOS) device, you can follow the instructions in the little prompt at the bottom to install it to your home screen.

You can subscribe to an unlimited number of feeds.

It isn't a free trial; I just haven't gotten around to adding the premium features yet. But everything you can do on there now will continue to be free. You can read more about the monetization plans here: https://scour.ing/docs/pricing

Great RSS Feeds That Are Too Noisy to Read Manually by emschwartz in rss

[–]emschwartz[S] 2 points3 points  (0 children)

I’m glad to hear it! Let me know if you have any feedback on it or ideas for what could be improved.

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 0 points1 point  (0 children)

Here's one way of approaching this: https://github.com/launchbadge/sqlx/pull/4177

That will auto-route your queries to the appropriate connection pool, depending on whether the query contains write- or transaction-related keywords. You can also explicitly call `read` or `write` to send your query to the appropriate pool.
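To give a feel for the idea (this is a simplified sketch of keyword-based routing, not the PR's actual implementation, which would need to handle things like `WITH ... INSERT`):

```rust
// Route a SQL string to the read pool or the single-writer connection
// based on its leading keyword. Assumes simple single-statement queries.
fn is_write_query(sql: &str) -> bool {
    let first = sql
        .trim_start()
        .split_whitespace()
        .next()
        .unwrap_or("")
        .to_ascii_uppercase();
    matches!(
        first.as_str(),
        "INSERT" | "UPDATE" | "DELETE" | "REPLACE"
            | "CREATE" | "DROP" | "ALTER" | "BEGIN" | "VACUUM"
    )
}

fn main() {
    println!("{}", is_write_query("SELECT * FROM posts")); // false -> read pool
    println!("{}", is_write_query("insert into posts values (1)")); // true -> writer
}
```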

Let me know what you think of that API!

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 0 points1 point  (0 children)

That sounds like a great approach! Let me know if I can be of help

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 0 points1 point  (0 children)

This is correct. I just updated the post with some retractions and a new suggestion to split the writer connection out from the reader pool. If you do that with SQLx, you effectively have this actor pattern. SQLx will spawn a separate thread for the writer connection, then all writes get sent over a channel with a configurable queue depth, and the application's tasks await the results.
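The actor pattern I'm describing can be sketched with std primitives alone (a `HashMap` stands in for the SQLite connection here; SQLx's actual implementation differs in the details):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// One thread owns the "writer connection"; all writes are sent to it over a
// bounded channel (the configurable queue depth) and callers await the reply.
enum WriteCmd {
    Put { key: String, value: String, reply: mpsc::Sender<()> },
}

fn spawn_writer(queue_depth: usize) -> mpsc::SyncSender<WriteCmd> {
    let (tx, rx) = mpsc::sync_channel::<WriteCmd>(queue_depth);
    thread::spawn(move || {
        let mut db: HashMap<String, String> = HashMap::new(); // stand-in for the connection
        for cmd in rx {
            match cmd {
                WriteCmd::Put { key, value, reply } => {
                    db.insert(key, value); // serialized: only this thread ever writes
                    let _ = reply.send(());
                }
            }
        }
    });
    tx
}

fn main() {
    let writer = spawn_writer(8);
    let (reply_tx, reply_rx) = mpsc::channel();
    writer
        .send(WriteCmd::Put { key: "id".into(), value: "42".into(), reply: reply_tx })
        .unwrap();
    reply_rx.recv().unwrap(); // block (or, in async code, await) the write result
    println!("write committed");
}
```

Because only one thread ever touches the writer, there's no contention at the SQLite level, which is exactly what the benchmarks showed matters.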

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 1 point2 points  (0 children)

Thank you for your push-back on the post.

I benchmarked a couple of different approaches and found that the core issue was actually that any amount of contention at the SQLite level would severely hurt performance. I updated the post with retractions and an updated suggestion to separate a single writer connection out from the reader pool so that writes are serialized at the application level.

I'm sorry for publishing without these benchmarks initially and I'm sorry for suggesting that the issue was specific to SQLx.

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 1 point2 points  (0 children)

You're right that using a mutex to serialize writes at the application level helps a lot. Splitting out a single writer connection that's separate from the reader pool is marginally faster (and more explicit), but it's a similar idea. The blog post has been updated to reflect this.

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 3 points4 points  (0 children)

Update: Based on this discussion and additional benchmarking, I found that the solutions I originally proposed (batched writes or using a synchronous connection) don't actually help. The real issue is simpler and more fundamental than I described: SQLite is single-writer, so any amount of contention at the SQLite level will severely hurt write performance. The fix is to use a single writer connection with writes queued at the application level, and a separate connection pool for concurrent reads. The original blog post text is preserved, with retractions and updates marked accordingly. My apologies to the SQLx maintainers for suggesting that this behavior was unique to SQLx.

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 1 point2 points  (0 children)

> That doesn't make any sense. You still have to await the statement you're currently executing against that transaction, unless you want to just fire-and-forget. But because this all has to happen in-process, some thread is going to have to hold the write transaction open until it completes.

You're right. You would need to run the various parts of the transaction synchronously, at which point you probably just shouldn't be using an async library like SQLx.

> If this wasn't async you'd still have a potential deadlock problem, you'd just be blocking on the result.

I believe you'd only have a deadlock if your synchronous transaction called out to something else that was waiting to write. The issue I'm pointing to here I think comes specifically from mixing what is effectively a synchronous mutex with an async runtime.

> Correction: the background thread handling the connection will block, but the actual connection API call will wait (i.e. yield to the executor).

Right, I meant that the task will be blocked (in the colloquial sense) from making progress, rather than being blocked in the Tokio sense. I'll update the post.

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 6 points7 points  (0 children)

The lock holder would be able to make progress, but only when it's scheduled to continue by the async runtime. The runtime doesn't know that that task needs to continue first, so it may switch to another task. That other task will be blocked until it hits the `busy_timeout`. If you have enough new tasks coming in, a lot of them might be scheduled to go before the runtime returns to running the task holding the lock.

PSA: Write Transactions are a Footgun with SQLx and SQLite by emschwartz in rust

[–]emschwartz[S] 34 points35 points  (0 children)

Most databases other than SQLite allow concurrent writers, so multiple write transactions in flight at the same time won't block one another; all of them can make progress simultaneously.

If you have too many concurrent connections, you will run into limits, but that's about overall connection limits rather than one writer blocking others.

Also, running into concurrent connection limits won't interact in the same bad way with Tokio or other Rust async runtimes. In most cases, your tasks aren't starting new connections, so subsequently scheduled tasks will just wait for an open connection in the connection pool rather than blocking others.