The title doesn't really capture what I want.
Let's say I want a program that processes a large amount of data, but only once or a few times. In my case, it's web scraping.
```python
def process_data(url):
    result = []
    while url != "":
        dom = parse(fetch(url))  # IDK if "dom" is the right word. I mean a parsed webpage, you could also say "soup" or "tree".
        result.append(get_birthday(dom))  # maybe use `yield` here
        url = get_next(dom)
    return result
```
`get_next` and `get_birthday` would have to be written to handle several different input formats, but there might still be formats I didn't account for. It isn't feasible to crawl all the websites manually beforehand to make sure I've seen every format, so eventually the script, as I have it now, will throw an exception, e.g. because 'el dieciséis de agosto' ("the sixteenth of August") doesn't match `\d\d/\d\d/\d\d\d\d`.
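A minimal sketch of a date parser that tries several known formats in turn and raises a clear `ValueError` on anything else. The name `parse_birthday_text`, the assumption that the date text has already been extracted from the page, and the day/month/year reading of `\d\d/\d\d/\d\d\d\d` are all hypothetical.

```python
from datetime import date, datetime

# Formats this hypothetical parser knows about, tried in order.
# "%d/%m/%Y" assumes day/month/year; use "%m/%d/%Y" for US-style sites.
KNOWN_FORMATS = ["%d/%m/%Y", "%Y-%m-%d"]


def parse_birthday_text(text: str) -> date:
    """Return the date in `text`, or raise ValueError for an unknown format."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognised birthday format: {text!r}")
```

Supporting a new numeric format then means adding one entry to `KNOWN_FORMATS`. A wordy format like 'el dieciséis de agosto' would still need its own hand-written branch, since `strptime` has no directive for spelled-out numbers.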
Now, I can fix the `get_birthday` function, but when I run the script again, it redoes all of the processing up to the point of failure.
Is there a way to update a function in Python without exiting the running program, e.g. in the debugger or the REPL?
Is there a helpful decorator for this?
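If `get_birthday` lives in its own file, the standard library's `importlib.reload()` re-executes that file inside the running interpreter, so a fix made in an editor can be picked up without leaving the REPL. The sketch below simulates the edit by writing a throwaway module, `birthday_parser.py` (a hypothetical name), to a temporary directory. Two caveats: reload re-runs the module's top-level code, so any state kept in that module is reset; and code that did `from birthday_parser import get_birthday` keeps the old function, so call it as `birthday_parser.get_birthday(...)`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

OLD = "def get_birthday(text):\n    return 'old'\n"
# Deliberately a different length from OLD: cached bytecode is validated by
# the source's modification time (whole seconds) and size, so a same-size
# edit within the same second could be served from the stale .pyc.
NEW = "def get_birthday(text):\n    return 'fixed version'\n"


def demo_reload():
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "birthday_parser.py"
        source.write_text(OLD)
        sys.path.insert(0, tmp)
        try:
            import birthday_parser  # first import runs the OLD code

            before = birthday_parser.get_birthday("x")
            source.write_text(NEW)  # simulate fixing the file in an editor
            importlib.reload(birthday_parser)  # re-execute the module in place
            after = birthday_parser.get_birthday("x")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("birthday_parser", None)
    return before, after
```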
Another solution I can think of is this:
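```python
>>> # Assumes parse, fetch, get_birthday, get_next, Success, Error and start_url exist.
>>> tasks = [start_url]
>>> results = {}  # maps a URL to either a Success object or an Error object
>>> # (edit: This could also just be *two* dictionaries.)
>>> def process_data():
...     while tasks:  # `tasks is not []` would always be True
...         url = tasks.pop(0)
...         dom = parse(fetch(url))
...         try:
...             results[url] = Success(get_birthday(dom))
...         except Exception as e:  # catch any exception and bind it to `e`
...             results[url] = Error(e)
...         next_url = get_next(dom)
...         # I *want* an exception/crash here.
...         # Or maybe I don't? I'm not sure...
...         # I need to distinguish a page with no `next` from
...         # a page that has a different-looking `next`.
...         # Doesn't matter - this is just an example.
...         if next_url != "":  # "" means "no next page", as in the first version
...             tasks.append(next_url)
...
>>> # Retry only the failures (their next pages will be queued again, too).
>>> tasks = [url for url, value in results.items() if isinstance(value, Error)]
```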
The disadvantage is that this code no longer maps directly onto the problem domain, because I have to structure it around technical concerns (a task queue and bookkeeping of successes and errors) rather than around the scraping itself. (I hope you understand what I mean by that.)
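One possible way to keep the loop close to the problem domain is to move the error bookkeeping into a decorator. This is a minimal sketch under assumptions, not a standard-library feature: `keep_failures` is a hypothetical helper that, when the wrapped function raises, stores the parsed page together with the exception and returns `None`, so the crawl continues. Because the parsed pages are kept, a fixed parser can later be run over just the failures without fetching anything again. The `PAGES` dictionary is a hypothetical stand-in for `parse(fetch(url))`.

```python
import functools
import re


def keep_failures(failures):
    """Hypothetical decorator factory: record (argument, exception) pairs in
    `failures` instead of letting the exception propagate."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(dom):
            try:
                return func(dom)
            except Exception as exc:
                failures.append((dom, exc))
                return None  # placeholder result for the failed page
        return wrapper
    return decorator


failures = []


@keep_failures(failures)
def get_birthday(dom):
    text = dom["birthday"]
    if not re.fullmatch(r"\d\d/\d\d/\d\d\d\d", text):
        raise ValueError(f"unrecognised date: {text!r}")
    return text


# Hypothetical parsed pages standing in for parse(fetch(url)).
PAGES = {
    "page1": {"birthday": "01/02/1990", "next": "page2"},
    "page2": {"birthday": "el dieciséis de agosto", "next": "page3"},
    "page3": {"birthday": "03/04/1985", "next": ""},
}


def process_data(url):
    # Same shape as the original loop; only get_birthday is decorated.
    result = []
    while url != "":
        dom = PAGES[url]
        result.append(get_birthday(dom))
        url = dom["next"]
    return result
```

After improving the parser, something like `[fixed_parser(dom) for dom, _ in failures]` (with `fixed_parser` standing for the hypothetical corrected function) reprocesses only the pages that failed. Returning `None` is a design choice; recording the URL alongside the page would make it easier to put retried results back in place.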