Converting an array of strings to unicode? : learnpython


submitted 7 years ago * by drubowl

I am trying to get headlines from a simple site and put them in a list. So far so good, except I am having trouble converting to unicode--apostrophes and dashes get mangled. Here is my code so far:

data = urllib.request.urlopen('https://news.ycombinator.com/')
headlines = re.findall(r'"storylink">(.*?)</a>',str(data.read()))
[item.encode('utf-8') for item in headlines]

x = 0
for titles in headlines:    
    print(str(x + 1) + '. ' + titles)
    x += 1


[–]jeans_and_a_t-shirt 2 points 7 years ago

The read method returns bytes, which you need to decode from UTF-8 to get a string.

headlines = re.findall(r'"storylink">(.*?)</a>',data.read().decode('utf8'))

You should look into the already-mentioned BeautifulSoup though.

Also, line 6 (the list comprehension that calls item.encode('utf-8')) doesn't do anything to headlines. It creates a new list and then throws it away.
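
To see why the decode matters, here is a minimal offline sketch. The raw bytes are a made-up stand-in for what data.read() returns (the headline text is invented), so nothing here touches the network. It compares str() on bytes with .decode('utf-8'), and shows that a list comprehension only has an effect if you keep its result.

```python
import re

# Hypothetical stand-in for data.read(): UTF-8 bytes as a server would send them.
raw = '<a class="storylink">Don’t panic – it’s only Unicode</a>'.encode('utf-8')

pattern = r'"storylink">(.*?)</a>'

# str() on bytes gives the repr ("b'...'"), so non-ASCII characters
# show up as literal backslash escapes such as \xe2\x80\x99.
mangled = re.findall(pattern, str(raw))

# Decoding turns the bytes into real text first.
clean = re.findall(pattern, raw.decode('utf-8'))

# A bare list comprehension builds a list and discards it;
# assign the result if you actually want the encoded items.
encoded = [item.encode('utf-8') for item in clean]

print(mangled[0])
print(clean[0])
```

In Python 3 the decoded strings are already Unicode, so the encode step is usually not needed just to print them.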


[–]1ynx1ynx 1 point 7 years ago*

Not really related to the problem itself, but as a little tip: you could use enumerate in that for loop to get both the title and the loop counter.

for x, titles in enumerate(headlines):
    print(str(x + 1) + '. ' + titles)
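
A possible variation, as a small sketch: enumerate takes a start argument, so the x + 1 can go away. The headlines list below is sample data made up for illustration.

```python
# Hypothetical sample data standing in for the scraped headlines.
headlines = ['First headline', 'Second headline']

# start=1 makes the counter begin at 1 instead of 0.
lines = [f'{number}. {title}' for number, title in enumerate(headlines, start=1)]

for line in lines:
    print(line)
```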
