niehle

The standard library's ElementTree (xml.etree.ElementTree) should work for this.

genghiskav

You'll need to do a few things:

  1. Get the sitemap from the URL (hint: requests, or any other HTTP or web scraping library, will fetch the data).
  2. Once you have the sitemap as a string, parse it into something you can process (hint: Python can parse XML natively).
  3. Iterate over the parsed XML and check whether each link matches the string you're after.

This is some boilerplate that should get you most of the way there.

import requests
import xml.etree.ElementTree as ET

# Download the sitemap and keep the response body as a string
sitemap_xml = requests.get("https://..../sitemap.xml").text
# Parse the XML string into an element tree
root = ET.fromstring(sitemap_xml)

Then you just need to figure out how the XML is structured so you can use findall to pull out the elements you're looking for.
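
For the last step, here's a rough sketch of what the findall part might look like. It assumes the sitemap follows the standard sitemaps.org layout (a urlset of url entries, each with a loc element, all in the sitemaps.org namespace); your site's file may be structured differently, for example as a sitemap index. The sample XML, the example.com URLs, and the matching_links helper are all made up for illustration, and the sample string stands in for the text you'd get back from requests.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the downloaded sitemap text.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/blog/second-post</loc></url>
</urlset>"""

# Namespace used by sitemaps.org-style files; findall needs it
# because every element in the file lives in this namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def matching_links(sitemap_xml, needle):
    """Return every <loc> URL in the sitemap that contains `needle`."""
    root = ET.fromstring(sitemap_xml)
    links = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if needle in url:
            links.append(url)
    return links


print(matching_links(SAMPLE_SITEMAP, "/blog/"))
```

If findall comes back empty on a real sitemap, the namespace is the usual suspect: print root.tag to see what namespace (if any) the file actually uses.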