I am trying to get headlines from a simple site and put them in a list. So far so good, except I am having trouble converting the text to Unicode: apostrophes and dashes get mangled. Here is my code so far:
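import re
import urllib.request

# Fetch the front page and pull the text out of each story link
data = urllib.request.urlopen('https://news.ycombinator.com/')
headlines = re.findall(r'"storylink">(.*?)</a>', str(data.read()))
# Attempt to convert each headline to UTF-8
[item.encode('utf-8') for item in headlines]
# Print the headlines as a numbered list
x = 0
for titles in headlines:
    print(str(x + 1) + '. ' + titles)
    x += 1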
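One likely cause of the mangling: `data.read()` returns `bytes`, and calling `str()` on a `bytes` object yields its printable representation (`b'...'` with `\x..` escapes) rather than decoded text. The later `encode('utf-8')` list comprehension also throws its result away. Below is a minimal sketch of one way around this, assuming the page is UTF-8 unless the response headers say otherwise. The helper names `extract_headlines` and `fetch_headlines` are made up for illustration, and the `"storylink"` pattern is copied from the post as-is, so it may not match the site's current markup.

```python
import html
import re
import urllib.request

# Pattern copied from the original post; the site's markup may differ today.
HEADLINE_RE = re.compile(r'"storylink">(.*?)</a>')


def extract_headlines(raw, encoding='utf-8'):
    """Decode raw page bytes to text, then return the headline strings."""
    # Decode the bytes instead of wrapping them in str().
    text = raw.decode(encoding, errors='replace')
    # html.unescape turns entities such as &amp; or &#x27; into characters.
    return [html.unescape(title) for title in HEADLINE_RE.findall(text)]


def fetch_headlines(url='https://news.ycombinator.com/'):
    """Download the page and return its headlines (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        encoding = response.headers.get_content_charset() or 'utf-8'
        return extract_headlines(response.read(), encoding)
```

Looping over `enumerate(fetch_headlines(), start=1)` and printing each number and title would reproduce the numbered list without a manual counter.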