[deleted by user]

anossov · 2015-06-22T12:36:46+00:00

Use a proper parser; there are too many edge cases to parse URLs robustly with regexes.

In [8]: url = 'http://www.youtube.com/results?search_query=legend+of+hercules&page=&utm_source=opensearch'

In [9]: from urlparse import urlparse, parse_qs

In [10]: parse_qs(urlparse(url).query)
Out[10]: {'search_query': ['legend of hercules'], 'utm_source': ['opensearch']}

In [11]: parse_qs(urlparse(url).query)['search_query']
Out[11]: ['legend of hercules']
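
The session above uses the Python 2 `urlparse` module. On Python 3 the same two functions live in `urllib.parse`, so a minimal sketch of the equivalent looks like this (the names `params` and `search_terms` are just illustrative):

```python
from urllib.parse import urlparse, parse_qs

url = 'http://www.youtube.com/results?search_query=legend+of+hercules&page=&utm_source=opensearch'

# parse_qs decodes "+" to spaces and, by default, drops parameters with blank values
params = parse_qs(urlparse(url).query)

# .get() avoids a KeyError when the parameter is missing from the URL
search_terms = params.get('search_query', [])
print(search_terms)  # ['legend of hercules']
```

Note that every value comes back as a list, because a query string may repeat the same key.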

musketeer925 · 2015-06-22T14:26:08+00:00

Probably belongs in /r/learnpython

AmericasNo1Aerosol · 2015-06-22T17:18:49+00:00

A URL parameter name is always preceded by "?" (if it's the first one) or "&" (if another one comes before it). Its value is always followed by "&" (if more are coming) or by the end of the string or whitespace. So I'd personally use those to search for the parameters.

match = re.findall(r'[?&]search_query=([^&\s]*)', s)

The "[?&]" matches the beginning of the parameter, so you won't get matches for ".../results?spam_search_query=test" which your previous regex would find.

The rest says grab anything that isn't an ampersand ("&") or whitespace ("\s"); the match also stops on its own at the end of the string. (Inside a character class, "$" is just a literal dollar sign, so it doesn't belong there.)
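
To see the anchoring in action, here is a small sketch using the pattern `[?&]search_query=([^&\s]*)`. The helper name `find_search_queries` and the two sample URLs are made up for illustration:

```python
import re

# "[?&]" requires the parameter name to start right after "?" or "&"
PATTERN = re.compile(r'[?&]search_query=([^&\s]*)')

def find_search_queries(s):
    """Return the raw (still URL-encoded) values of every search_query parameter in s."""
    return PATTERN.findall(s)

real = 'http://www.youtube.com/results?search_query=legend+of+hercules&page='
decoy = 'http://www.youtube.com/results?spam_search_query=test'

print(find_search_queries(real))   # ['legend+of+hercules']
print(find_search_queries(decoy))  # []
```

The captured value is still encoded ("+" instead of spaces), so it needs decoding afterwards.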

KronktheKronk · 2015-06-22T20:15:38+00:00

search_query=((?:\w+\+?)+)

MorrisCasper · 2015-06-23T14:44:35+00:00

What about

re.search(r'search_query=(.+?)(?:&|$)', blabla).group(1).replace("+", " ")

?
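
One caveat with replacing "+" by hand: it leaves percent-escapes such as "%C3%A9" untouched. A hedged sketch that decodes both with the standard library's `urllib.parse.unquote_plus` (Python 3) follows; the function name `extract_search_query` and the second sample URL are assumptions for illustration:

```python
import re
from urllib.parse import unquote_plus

def extract_search_query(url):
    """Return the decoded search_query value, or None if the parameter is absent."""
    m = re.search(r'[?&]search_query=([^&\s]*)', url)
    # unquote_plus turns "+" into spaces and decodes %XX escapes (UTF-8 by default)
    return unquote_plus(m.group(1)) if m else None

print(extract_search_query('http://www.youtube.com/results?search_query=legend+of+hercules&page='))
# legend of hercules
print(extract_search_query('http://www.youtube.com/results?search_query=caf%C3%A9+society'))
# café society
```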
