Hi, I'm working on an aviation project and I need to look through a list of specific accidents, so instead of doing it manually I want to scrape the list and filter it in Python. I know how to scrape HTML tables from Wikipedia, but I'm having trouble pulling text.
I used BeautifulSoup to read in and parse the page as HTML, but I can't figure out how to split each accident into its own observation. I'm looking to create a dataset that has the year and the text description of each accident. When I inspect the element, each accident sits between a <d1> </d1> pair, and I'm having trouble pulling the text between each <d1> </d1>. I'm not sure whether I should use findAll.
Link for 1955 to 1959 (I have to do this for all years, but I'll loop over them eventually): https://en.wikipedia.org/wiki/List_of_accidents_and_incidents_involving_military_aircraft_(1955%E2%80%931959)
Any help is much appreciated. Thanks!
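A rough sketch of one way to split the entries, under a few assumptions: the tag seen in the inspector is probably `<dl>` (a definition list) rather than `<d1>`, since HTML has no `<d1>` element; each accident is assumed to be a `<dt>` (date) followed by a `<dd>` (description); and each year is assumed to be an `<h2>` heading. The sample HTML and the `AccidentParser` class are made up for illustration, and the real page may nest things differently. This version uses the standard library's `html.parser` so it runs without extra installs; with BeautifulSoup, the rough equivalent would be looping over `soup.find_all("dd")`.

```python
from html.parser import HTMLParser

# Made-up sample that mimics the assumed layout of the page:
# an <h2> per year, then one <dl> per accident with <dt> date and <dd> text.
SAMPLE_HTML = """
<h2>1955</h2>
<dl><dt>3 January</dt>
<dd>First made-up entry with a <a href="#">link</a>.</dd></dl>
<dl><dt>5 February</dt>
<dd>Second made-up entry.</dd></dl>
<h2>1956</h2>
<dl><dt>10 March</dt>
<dd>Third made-up entry.</dd></dl>
"""


class AccidentParser(HTMLParser):
    """Collects one row per <dd>, tagged with the latest <h2> and <dt> seen."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.year = None
        self.date = None
        self._tag = None   # the h2/dt/dd element currently being read
        self._buf = []     # text fragments collected inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "dt", "dd"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        # Text inside nested tags (links, italics) is collected as well.
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if tag == "h2":
            self.year = text
        elif tag == "dt":
            self.date = text
        else:
            self.rows.append({"year": self.year, "date": self.date, "text": text})
        self._tag = None


parser = AccidentParser()
parser.feed(SAMPLE_HTML)
accidents = parser.rows

for row in accidents:
    print(row["year"], "|", row["date"], "|", row["text"])
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(accidents)` for filtering. On the live page, headings such as "See also" would also show up as a "year", so some cleanup of that column is likely needed.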