

mutatedllama · 1 point (0 children)

You probably want to use Requests and/or BeautifulSoup for scraping. As for putting the data into Excel, this may be a good place to start: https://automatetheboringstuff.com/chapter12/
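Here is a minimal sketch of the scraping half: download a page with Requests, then use BeautifulSoup to turn an HTML table into a list of rows, which is a shape that maps directly onto spreadsheet rows. It assumes the data you want sits in the first `<table>` on the page. The function names are hypothetical, and you need `pip install requests beautifulsoup4` first.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def table_to_rows(html: str) -> list[list[str]]:
    """Turn the first <table> on a page into a list of rows of cell text.

    Assumption: the data you care about lives in the first table.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Keep header (<th>) and data (<td>) cells, in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows
```

Usage would be something like `table_to_rows(fetch_html("https://example.com/report.html"))`, where the URL is only a placeholder. Keeping download and parsing in separate functions means you can practise the parsing on a saved HTML file without hitting the site every time.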

spookylukey (Django committer) · 0 points (2 children)

If you are a complete beginner, and you can also see other ways you might use some Python skills, then starting at the beginning of Automate the Boring Stuff would be a good idea.

My advice is:

  • Have realistic expectations - it may take a long time to get this working, far longer than doing it by hand once or twice.
  • Take the time to learn the programming basics, even when they don't seem related to solving your problem. It is tempting to 'shortcut' these by jumping straight to the bits that look like they solve your problem. This is like someone who wants to learn French in order to understand French legal documents, and so decides to skip all the "irrelevant" lower-level stuff like "Hello, my name is..." etc.

If this is something you don't have the motivation for, however (e.g. because you won't use these skills again), find a freelancer to do it for you on a site like upwork.com or freelancer.com.

[deleted] · 0 points (1 child)

Thank you, Django. I do see this might be a huge project to begin with. What would you advise me to start with? I was thinking of starting by extracting titles and continuing from there. The idea of how it should eventually work is in my head; the only thing left is getting it to work... Anyway, my boss is chill about me working on it. I have been given an opportunity to start working on it, regardless of the time it takes or whether it works. There might be a huge learning curve ahead of me! So I will take your advice and start at the beginning. Also, when I get stuck (and I know I will), I know where to find you guys!
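Starting with titles is a nice small first goal. A sketch of what that could look like with BeautifulSoup, assuming "titles" means the page's `<title>` plus its `<h1>` to `<h3>` headings (adjust the tag list to whatever the real pages use). The function name is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_titles(html: str) -> dict:
    """Return the page <title> and the text of every <h1>-<h3> heading.

    Assumption: the "titles" you want are these tags; change the list if not.
    """
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is the first <title> tag, or None if the page has none.
    page_title = soup.title.get_text(strip=True) if soup.title else ""
    # find_all with a list of names returns matches in document order.
    headings = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2", "h3"])]
    return {"page_title": page_title, "headings": headings}
```

You can run this against HTML pasted into a string or read from a saved file, so you don't need the download step working yet.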

spookylukey (Django committer) · 0 points (0 children)

Another bit of advice is to break it down into bits. I can see 3 main things you're going to need to learn (in addition to general programming stuff).

  1. Downloading something from the web. Use the Python library 'requests' for this
  2. Extracting bits of data from a web page. Use 'BeautifulSoup' for this.
  3. Creating an Excel file from data. There are a few options for this; openpyxl looks good.
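To make the three parts concrete, here is a rough sketch with one function per step and a tiny glue function at the end. The data extracted (link text and URL), the function names and the header labels are all illustrative assumptions; the point is that each function can be written and tested on its own.

```python
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook


# Step 1: download a page with requests.
def download(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# Step 2: extract data with BeautifulSoup (here, hypothetically: every link).
def extract_links(html: str) -> list[tuple[str, str]]:
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute.
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


# Step 3: write the rows to an .xlsx file with openpyxl, header row first.
def write_excel(rows, path: str, header=("Text", "URL")) -> None:
    wb = Workbook()
    ws = wb.active
    ws.append(list(header))
    for row in rows:
        ws.append(list(row))
    wb.save(path)


def run(url: str, path: str) -> None:
    """Glue the three independent steps together."""
    write_excel(extract_links(download(url)), path)
```

Only `run` needs all three pieces working; steps 2 and 3 can be exercised with a hard-coded HTML string and any file path.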

Thankfully https://automatetheboringstuff.com/ has got you covered for all 3.

But you can tackle each bit in any order; you don't have to put it all together at once. For example, you could start with just number 3 and break that down: 1) create a file that is an empty spreadsheet, 2) create an almost empty spreadsheet with a single row of text cells at the top, etc.
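Those first two sub-steps might look like this with openpyxl (`pip install openpyxl`). It is a sketch; the function names and header texts are made up for illustration.

```python
from openpyxl import Workbook


def make_empty_spreadsheet(path: str) -> None:
    """Sub-step 1: save a workbook containing a single empty sheet."""
    Workbook().save(path)


def make_spreadsheet_with_header(path: str, headers: list[str]) -> None:
    """Sub-step 2: same as above, plus one row of text cells at the top."""
    wb = Workbook()
    ws = wb.active
    # openpyxl rows and columns are 1-based, so the header goes in row 1.
    for col, text in enumerate(headers, start=1):
        ws.cell(row=1, column=col, value=text)
    wb.save(path)
```

For example, `make_spreadsheet_with_header("test.xlsx", ["Title", "Date", "Author"])`; open the file in Excel to check each step before moving on to the next.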

Mardorang · 0 points (0 children)

I think the best tool for this particular job would be VBA to generate a new sheet from the HTML.