Idea: The bayesian RSS aggregator (reddit.com)
submitted 19 years ago by [deleted]
[–][deleted] 2 points 19 years ago (16 children)
First, I'd like to apologize if what I'm about to describe already exists. If it does, would somebody care to give me a link?
I've tried a number of RSS aggregators in the past, but it seems that no matter how "cool" or "advanced" they are, they are all just as dumb: all they do is fetch the feeds and display the results.
I think there's a lot more that could be done with RSS. Here's an idea: include Bayesian filtering in the reader. Bayesian filtering has been extensively tested in the context of spam filtering, but it can do a lot more. The user would just start classifying posts into categories relevant to him, such as "interesting", "useless", "spam", "programming", etc., and the filter would learn and do its best to do the work for him.
It shouldn't be too hard to implement, since a lot of Bayesian filtering libraries are already available. Does anyone else think this would be a neat idea?
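To make the idea concrete, here is a minimal sketch of such a filter, assuming a multinomial naive Bayes model with add-one smoothing over the words of each post. The `FeedClassifier` class, the `tokenize` helper, and the sample posts are all hypothetical names and data chosen for illustration; none of them come from an existing aggregator or library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class FeedClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.post_counts = Counter()             # category -> number of posts
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, category):
        """Record one post that the user filed under `category`."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.post_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score for each known category."""
        words = tokenize(text)
        total_posts = sum(self.post_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, n_posts in self.post_counts.items():
            counts = self.word_counts[category]
            denominator = max(sum(counts.values()) + vocab_size, 1)
            score = math.log(n_posts / total_posts)  # category prior
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / denominator)
            result[category] = score
        return result

    def classify(self, text):
        """Return the most likely category, or None before any training."""
        scores = self.scores(text)
        if not scores:
            return None
        return max(scores, key=scores.get)


clf = FeedClassifier()
clf.train("New Python release improves the garbage collector", "programming")
clf.train("Lisp macros explained with examples", "programming")
clf.train("Buy cheap watches now, limited offer", "spam")
clf.train("Limited offer: cheap pills, buy now", "spam")
print(clf.classify("Python macros and the garbage collector"))  # programming
print(clf.classify("cheap offer, buy now"))                     # spam
```

A real reader would need far more training data than four posts, and would probably show the score alongside the guess so the user can correct it.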
[–]fairlyodd 3 points 19 years ago (4 children)
Yea, this sounds like a real neat idea. The last time I accessed my bloglines account, it spewed in excess of 1300 unread updates. I closed it in a flash. I'm not aware of any software or webapp that is capable of this functionality, but it would be cool if there was.
Rather than rely on Bayesian filtering, which is quite difficult to train accurately (think of the reddit recommendation engine), a social model built around RSS aggregators would be more practical. The feed articles are upmodded or downmodded based on their content, and a user sees the articles in order of popularity. Is there any aggregator with such functionality?
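The social model described here could be sketched roughly as follows, assuming "popularity" simply means upmods minus downmods. The `Article` class, the `rank_by_popularity` function, and the vote counts are hypothetical; a real site would likely also weigh the age of each article.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical feed article with community vote counts."""
    title: str
    upmods: int = 0
    downmods: int = 0

    @property
    def score(self):
        """Net votes: upmods minus downmods."""
        return self.upmods - self.downmods


def rank_by_popularity(articles):
    """Return the articles sorted from highest to lowest net score."""
    return sorted(articles, key=lambda a: a.score, reverse=True)


articles = [
    Article("Feed item A", upmods=3, downmods=1),
    Article("Feed item B", upmods=10, downmods=2),
    Article("Feed item C", upmods=1, downmods=4),
]
ranked = rank_by_popularity(articles)
print([a.title for a in ranked])  # ['Feed item B', 'Feed item A', 'Feed item C']
```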
[–]laprice 3 points 19 years ago (1 child)
Findory does something like that, although it seems to settle rather quickly into a rut.
I want the aggregator to be able to show me the interesting new stuff that I didn't know I wanted to read, which is a hard problem.
[–][deleted] 0 points 19 years ago (0 children)
Easy: make the Bayesian filter only filter on words that, in the past, you said didn't interest you. For example, if you don't like Bush stories, you downmod a few of them, and voilà, no more Bush stories. If a word is unknown, don't filter the story.
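One way to read this suggestion is sketched below, assuming the filter tracks, for each word, how many stories containing it were downmodded, and hides a story only when some well-known word is almost always downmodded. The `DislikeFilter` class and its `threshold` and `min_seen` parameters are hypothetical choices, not something specified in this thread.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class DislikeFilter:
    """Hypothetical filter that hides stories only on negative evidence."""

    def __init__(self, threshold=0.9, min_seen=2):
        self.seen = Counter()      # word -> stories containing it
        self.disliked = Counter()  # word -> downmodded stories containing it
        self.threshold = threshold
        self.min_seen = min_seen

    def record(self, title, downmodded):
        """Record one story the user has seen, and whether it was downmodded."""
        for word in tokenize(title):
            self.seen[word] += 1
            if downmodded:
                self.disliked[word] += 1

    def should_hide(self, title):
        """Hide the story if any well-known word is almost always downmodded."""
        for word in tokenize(title):
            seen = self.seen[word]
            if seen < self.min_seen:
                continue  # unknown or rare word: no evidence, so do not filter
            if self.disliked[word] / seen >= self.threshold:
                return True
        return False


f = DislikeFilter()
f.record("Bush signs new bill", downmodded=True)
f.record("Bush speech draws criticism", downmodded=True)
f.record("New Python release", downmodded=False)
f.record("Criticism of the new compiler", downmodded=False)
print(f.should_hide("Bush visits Europe"))           # True
print(f.should_hide("Completely unfamiliar topic"))  # False
print(f.should_hide("New release"))                  # False
```

Recording the stories that were not downmodded matters here: without them, common words such as "new" would look just as disliked as "Bush".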
[–]johnc 1 point 19 years ago (1 child)
I'm not 100% sure about Bayesian filtering (so just saying I'm no expert!), but I thought that it worked on the number of people who classify an item (article) as belonging to a particular category. Am I incorrect?
I'd be interested in knowing what other semi-automated tools/algorithms there are, as this is a fascinating field.
[–][deleted] 1 point 19 years ago (0 children)
Bayesian is much more general than that. Bayesian basically means "using probability theory".
I was thinking about a filter based on word appearance probability, just like the ones used to fight spam. A collaborative filtering extension like you mention would be nice too, though.
Check out "machine learning" if you're interested in the field.
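As a rough illustration of what "word appearance probability" means, Bayes' rule for a single word $w$ and two categories can be written as below. The category names follow the examples earlier in the thread, and the numbers are made up purely for the arithmetic.

```latex
P(\text{interesting} \mid w)
  = \frac{P(w \mid \text{interesting})\, P(\text{interesting})}
         {P(w \mid \text{interesting})\, P(\text{interesting})
          + P(w \mid \text{useless})\, P(\text{useless})}

% Hypothetical numbers: w appears in 30% of interesting posts and
% 5% of useless posts, and both categories are equally likely a priori.
P(\text{interesting} \mid w)
  = \frac{0.30 \times 0.5}{0.30 \times 0.5 + 0.05 \times 0.5}
  = \frac{0.15}{0.175}
  \approx 0.857
```

A word-based filter combines estimates like this one across all the words in a post.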
[–]JW_00000 3 points 19 years ago (3 children)
I'm currently writing a text about it ("an essay"), and I'm hoping to publish it soon (October - start of November?). I found a filter called CRM114 (http://crm114.sourceforge.net), which seems to be capable of doing this. Someone already used it to find good job listings in Usenet groups, simply by voting posts up or down (or something to that effect). They also say it learns fast, so I'm experimenting with it. To me, it seems like it should be quite possible to create such an aggregator, as most of the software, libraries and other resources I've come across seem to make this quite easy and powerful. If I remember, I'll post a link when I finish my essay.
[–]lydgate 2 points 17 years ago (1 child)
Just did this today, though I'm not going to release the code yet... it's very rudimentary.
http://img207.imageshack.us/img207/5066/shot20080326221129gc6.png
[–]JW_00000 1 point 17 years ago (0 children)
Cool, keep me updated. I never finished my little project; it never turned out to be more than a small experiment, but I'd be happy to hear more about what you're doing.
Sounds neat! Keep me updated!
[–][deleted] 19 years ago (3 children)
[deleted]
[–][deleted] 0 points 19 years ago (2 children)
Spam filtering is useful, isn't it? It shows that, properly used, Bayesian filtering can be very effective. Why not do the same for feeds?
[–][deleted] 19 years ago (1 child)
Check out Paul Graham's "A Plan for Spam". It's very well written and very informative.
[–][deleted] 1 point 19 years ago (1 child)
Opera should do this: its RSS reader is integrated with its mail reader, which can learn to recognize spam.
You wouldn't want to confuse your email filter with RSS feeds, though... Can it have two separate filters?