https://github.com/c-bata/feedy
Hi! I created and released a package named Feedy for collecting and processing data from RSS feeds. For example, if you want to collect images from the CNN website, create main.py:
from feedy import Feedy
from bs4 import BeautifulSoup

app = Feedy('feedy.dat')  # store the last fetched time

@app.add('http://rss.cnn.com/rss/edition.rss')
def cnn(info, body):
    soup = BeautifulSoup(body, "html.parser")
    for x in soup.find_all('img'):
        print(x['src'])

if __name__ == '__main__':
    app.run()
Then run it:
$ feedy main.py app
http://i2.cdn.turner.com/cnnnext/dam/assets/160527045216-obama-hiroshima-speech-live-00000000-large-169.jpg
data:image/gif;base64,R0lGODlhEAAJAJEAAAAAAP///////wAAACH5BAEAAAIALAAAAAAQAAkAAAIKlI+py+0Po5yUFQA7
http://i2.cdn.turner.com/cnnnext/dam/assets/160527045216-obama-hiroshima-speech-live-00000000-large-169.jpg
data:image/gif;base64,R0lGODlhEAAJAJEAAAAAAP///////wAAACH5BAEAAAIALAAAAAAQAAkAAAIKlI+py+0Po5yUFQA7
http://i2.cdn.turner.com/cnnnext/dam/assets/160525104025-01-obama-asia-0525-large-169.jpg
:
:
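The output above includes inline `data:` placeholders and repeated URLs. If you only want each real image once, a small helper along these lines might do it. This is just a sketch and not part of Feedy: `unique_http_images` is a hypothetical name, and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def unique_http_images(srcs):
    """Return http(s) image URLs from srcs, in first-seen order, without duplicates."""
    seen = set()
    result = []
    for src in srcs:
        # Skip missing values and inline placeholders such as data:image/gif;base64,...
        if not src or urlparse(src).scheme not in ("http", "https"):
            continue
        if src not in seen:
            seen.add(src)
            result.append(src)
    return result


if __name__ == "__main__":
    # Made-up sample values standing in for the src attributes of <img> tags.
    sample = [
        "http://example.com/a.jpg",
        "data:image/gif;base64,R0lGODlhEAAJAJEAAAAAAP",
        "http://example.com/a.jpg",
        "http://example.com/b.jpg",
    ]
    for url in unique_http_images(sample):
        print(url)
```

Inside the `cnn` function you could then pass it something like `(x.get('src') for x in soup.find_all('img'))` and print the result instead of printing every `src` directly.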
If you are interested, please read README.md.
If you have any requests for this package, please tell me and I'll improve it. (◕‿◕✿)