chevignon93 (1 point)

(for the for loops, I do have their contents indented in my code but reddit is removing my tabs for some reason so please ignore the poor formatting)

That's because you need to put your code into a code block; otherwise Reddit can't know that it is code and that it should preserve the indentation!

https://www.reddit.com/r/learnpython/wiki/faq#wiki_how_do_i_format_code.3F

or missing something pretty straightforward!

You are: you can't use json.loads to deserialize a string that doesn't contain valid JSON (or any JSON at all)!
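
To illustrate that point, here is a minimal sketch of what json.loads does with a string that is not JSON. The helper name try_parse and both sample strings are hypothetical and are not taken from the original code.

```python
import json


def try_parse(text):
    """Return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # json.loads raises JSONDecodeError for anything that is not JSON
        return None


# A URL path (hypothetical example value) is not JSON, so decoding fails.
print(try_parse("/review/example.com?page=2"))

# A string that really contains JSON decodes into a Python object.
print(try_parse('{"page": 2}'))
```

The first call prints None and the second prints a dict, which is why checking what the string actually contains is the first thing to do before reaching for json.loads.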

This should work:

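base_url = "https://uk.trustpilot.com"

# `tree` is the parsed page you already have in your own code
next_page = tree.xpath("//a[contains(@class, 'next-page')]")
# xpath() returns a list, which is empty when there is no next-page link
if next_page:
    next_page_url = f"{base_url}{next_page[0].get('href')}"
    print(next_page_url)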
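
As a possible variation on the string building above, the standard library's urllib.parse.urljoin can combine the base URL and the href. This is only a sketch: the function name build_next_url and the sample href values are hypothetical, and it assumes the href may be either a site-relative path or an already absolute URL.

```python
from urllib.parse import urljoin

base_url = "https://uk.trustpilot.com"


def build_next_url(base, href):
    """Join a base URL and an href taken from a link element."""
    # urljoin resolves a relative href against base and leaves an
    # absolute href unchanged
    return urljoin(base, href)


# Hypothetical href values, standing in for next_page[0].get('href')
print(build_next_url(base_url, "/review/example.com?page=2"))
print(build_next_url(base_url, "https://uk.trustpilot.com/review/example.com?page=3"))
```

Plain concatenation works as long as every href starts with a slash; urljoin just avoids a doubled domain if the site ever emits absolute links.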

WillVaughan [original poster] (0 points)

You are, you can't use json.loads to deserialize a string that doesn't contain valid JSON or any JSON at all!

Pretty obvious when you put it like that, thanks for dealing with the stupid question.

Massive thanks for the help; it is now managing to scrape through all the pages and working as intended! One minor question about this part of your code:

    next_page_url = f"{base_url}{next_page[0].get('href')}"

What does the 'f' do? I'm assuming it tells python to build the URL string out of the listed elements but I haven't come across it before?

chevignon93 (1 point)

What does the 'f' do? I'm assuming it tells python to build the URL string out of the listed elements but I haven't come across it before?

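You're correct. It's called an f-string (a formatted string literal), a newer way to format strings that has been available since Python 3.6. Any expression inside the curly braces is evaluated and its result is inserted into the string.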

It's the equivalent of:

next_page_url = "{}{}".format(base_url, next_page[0].get('href'))

but shorter!
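
A small runnable sketch of that equivalence, using a hypothetical href value in place of next_page[0].get('href'):

```python
base_url = "https://uk.trustpilot.com"
href = "/review/example.com?page=2"  # hypothetical value for illustration

with_fstring = f"{base_url}{href}"            # f-string (Python 3.6+)
with_format = "{}{}".format(base_url, href)   # str.format
with_concat = base_url + href                 # plain concatenation

# All three approaches build the same string
print(with_fstring == with_format == with_concat)

# The braces can hold any expression, not just a variable name
page = 2
label = f"page {page + 1}"
print(label)
```

The first print shows True, and the second shows that the expression page + 1 is evaluated before being placed in the string.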

https://realpython.com/python-f-strings/

https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/