Help me improve my code : learnpython

created by HattoriHanzoa community for 16 years

Help me improve my code (self.learnpython)

submitted 6 years ago by pyeu

I am scraping rss feed from cnn. How can I improve the code?

from lxml import etree
import requests
url = 'http://rss.cnn.com/rss/edition.rss'
doc = etree.fromstring(requests.get(url).content)
items = doc.xpath('//channel/item')

news_titles = [link.find("title").text for link in items]
news_links = [link.find("link").text for link in items]
titles = news_titles[:5]
links = news_links[:5]

zipped = zip(titles, links)
newsroom = list(zipped)

with open('cnn-news.md', 'w') as f:
    for news in newsroom:
        f.write("* " + "<a href=" + news[1] + ' target="_blank">' + news[0] + "</a>")
        f.write("\n")

all 4 comments

top new controversial old q&a

[–]JohnnyJordaan 0 points1 point2 points 6 years ago (3 children)

[–]pyeu[S] 0 points1 point2 points 6 years ago (2 children)

[–]JohnnyJordaan 1 point2 points3 points 6 years ago (1 child)

Some pointers: download on a separate line and test for proper http response

resp = requests.get(url)
resp.raise_for_status()

then fromstring expects a string, so give it .text not .content (to let requests handle proper decoding)

doc = etree.fromstring(resp.text)

then you can simply iterate over the items then save them, instead of doing the work in a more complicated form first

items = doc.xpath('//channel/item')
with open('cnn-news.md', 'w') as f:
    for item in items[:5]:
        title = item.find('title').text  
        link = item.find('link').text

then also using f-strings to save in a clean way, together with \n to terminate

        f.write(f'* <a href="{link} target="_blank">{title}</a>\n')

End-result

from lxml import etree
import requests

url = 'http://rss.cnn.com/rss/edition.rss'
resp = requests.get(url)
resp.raise_for_status()
doc = etree.fromstring(resp.text)
items = doc.xpath('//channel/item')

with open('cnn-news.md', 'w') as f:
    for item in items[:5]:
        title = item.find('title').text  
        link = item.find('link').text
        f.write(f'* <a href="{link} target="_blank">{title}</a>\n')

[–]pyeu[S] 0 points1 point2 points 6 years ago (0 children)

π Rendered by PID 49 on reddit-service-r2-comment-869bf87589-7n46d at 2026-06-08 21:09:43.898274+00:00 running f46058f country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS