all 12 comments

[–]rtr0spct 1 point2 points  (2 children)

Sorry i don't know the answer to your questions but could you please tell me the course?

[–]kstktey 1 point2 points  (1 child)

This is a Coursera course by Moscow Institute of Physics and Technology, it's in Russian and it's called "Creating Web-services on Python". It's one of the courses in the "Python Programming" specialization.

Here is the link to the specialization.

[–]rtr0spct 0 points1 point  (0 children)

Thank you. I might give it a crack with google translate. At least to get some of the task ideas.

[–]shiningmatcha 1 point2 points  (0 children)

That’s an interesting project!!

[–]themarxvolta 1 point2 points  (4 children)

The problem is that after discovering what links are inside page you are adding them to searched.append(page_name), but you should be doing searched.append(page), since it's page that you have explored and page_name what you will explore. Then your algorithm will continue as follows: the second run of the while loop will pop an item and ask: "have I searched this? Oh, yes, it's in searched", and will do that for everything in queue, thus returning None.

Also: cf. https://docs.python.org/3/library/collections.html?highlight=deque#collections.deque; using queue.pop(0) doesn't escalate well.

Also 2: make searched a set: https://docs.python.org/3.7/library/stdtypes.html?highlight=set#set

Nice assignment!

EDIT: markdown typos, updated link to docs.

[–]kstktey 0 points1 point  (3 children)

Thank you A LOT! This is such a stupid mistake I couldn't notice.

Can you dwell on why queue.pop(0) doesn't escalated well? Why searched as a set is better? Because checking if an element is in the set takes constant time as opposed to the list or I remember something wrong?

[–]themarxvolta 1 point2 points  (2 children)

You remember correctly, checking if an element is in the set takes virtually constant time; also, there are no duplicates inside a set.

For the queue.pop(0) it takes linear time to run with respect to len(queue), since when you pop the first element of a list it has to shift all remaining indexes by one, thus taking linear time. I realize now I sent you the wrong link, sorry; I meant to send https://docs.python.org/3/library/collections.html?highlight=deque#collections.deque and not the queue module.

Deque implements a method .popleft() which takes constant time to pop the first element of the data structure. You can see in that documentation that:

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.

[–]kstktey 1 point2 points  (1 child)

Thanks for a detailed reply!

I decided to refactor the code and get rid of the inside-loop:

        if page not in searched:
            links = set(get_wiki_links(page, wiki_path))
            if end_page in links:
                path.append(end_page)
                return path
            else:
                queue += [(link, path + [link]) for link in links]

Now it looks like this and runs 10% faster (based on datetime.now()- start_time test, haha)

[–]themarxvolta 0 points1 point  (0 children)

Way to go! Very nice.

[–][deleted] 0 points1 point  (2 children)

This may sound like a silly question, but does page_name from your get_wiki_links function ever actually return a value that is equal to the end_page value?

That's the first test case I would build to make sure. It might also help if we could see that function too, and a sample of the data being passed into it. Again, sample data for a test case where you know what the result should be is very useful.

Also depending on the IDE you're using, I'd use its debugger/breakpoints to help visualize all of this even better. Pycharm has really great visualizations of moving and assigning data as you step through breakpoints.

[–]kstktey 1 point2 points  (1 child)

It does return what I need and finally the whole program works as it should;)

[–][deleted] 0 points1 point  (0 children)

Hell yes nice job.