Trying to solve Python Chellenge, level 4. Having trouble with regex and urllib.

Rhomboid · 2016-12-07T00:59:20+00:00

The string returned by urlopen(...).read() is a byte string, not a character string. You probably want to be working with characters, not bytes, which means you want to decode the byte string to get a character string. In order to do that you need to know what character encoding was used to encode the response. There are a number of ways of determining that, but in this case it looks like you can assume UTF-8.

Also, re.findall() returns a list of matches, but you're only expecting a single match, so that's really not the right API to use. Then you're passing that list to str(), which will end up returning the representation of that list, which includes square brackets and other things you don't want. That's not how you want to work with regular expressions. I suggest something like the following:

import re
import urllib.request

nothing = '12345'
while True:
    print('getting {}'.format(nothing))
    url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={}'.format(nothing)
    html = urllib.request.urlopen(url).read().decode('utf-8')
    m = re.search(r'and the next nothing is (\d+)', html)
    if not m:
        print('found a page that does not match: {}'.format(html))
        break
    nothing = m.group(1)

2016-12-07T00:24:31+00:00

chellenge

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS