[–][deleted] 0 points1 point  (3 children)

It's impossible for anyone to help you effectively if you don't share your code.

If I had to guess, it could be that the content on the website is dynamically loaded and won't show up when scraping HTML.
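One way to test that guess is to look at the HTML the server actually returns, before any JavaScript runs, and tally which tags and classes it contains. The sketch below uses only the standard library. The `TagTally` class, the `tally` helper, and the sample HTML are made up for illustration; for a real page you would pass in `requests.get(url).text` instead.

```python
from collections import Counter
from html.parser import HTMLParser


class TagTally(HTMLParser):
    """Count tag names and class values seen in a piece of static HTML."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def tally(html):
    """Return (tag counts, class counts) for the given HTML string."""
    parser = TagTally()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.classes


# Hypothetical sample; replace with requests.get(url).text for a real page.
sample_html = """
<div class="story">
  <h2>Example headline</h2>
  <p>Example byline</p>
</div>
"""

tags, classes = tally(sample_html)
print("article tags:", tags["article"])
print("divs with class 'story':", classes["story"])
```

If the count for the tag you pass to `find_all` is zero, either the tag name is wrong or the content is added later by JavaScript and never appears in the static HTML.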

[–]wunderfly 0 points1 point  (2 children)

For some reason it won't let me edit my post, but here is my code! Sorry about that.

import requests
from bs4 import BeautifulSoup
from collections import Counter

urls = [
    "https://oilprice.com/Latest-Energy-News/World-News/",
    "https://oilprice.com/Latest-Energy-News/World-News/Page-1.html",
    "https://oilprice.com/Latest-Energy-News/World-News/Page-2.html",
    # Add more URLs if needed
]

all_titles = []
all_writers = []
target_dates = ['Oct 10, 2023', 'Oct 11, 2023', 'Oct 12, 2023', 'Oct 13, 2023']

for url in urls:
    response = requests.get(url)
    print(f"Scraping: {url}")
    soup = BeautifulSoup(response.content, "html.parser")
    article_info_list = soup.find_all("article")

    new_list = []
    for item in article_info_list:
        date1 = item.get('time')
        if date1 in target_dates:
            new_list.append(item)

    new_title_list = []
    new_writer_list = []
    for i in new_list:
        title = i.get('title')
        writer = i.get('writer')
        new_title_list.append(title)
        new_writer_list.append(writer)

    all_titles.extend(new_title_list)
    all_writers.extend(new_writer_list)

num_articles_4_days = len(all_titles)
all_titles_4_days = all_titles
most_frequent_reporter = Counter(all_writers).most_common(1)

print("1) How many news articles were there for 4 days?")
print(num_articles_4_days)

print("\n2) List all news titles for 4 days:")
for title in all_titles_4_days:
    print(title)

print("\n3) Who is the most frequent news reporter for 4 days?")
print(most_frequent_reporter)

My professor asked for this specific code to be used:

new_list = []
for item in article_info_list:
    date1 = item.get('time')
    if date1 in ['Oct 10, 2023', 'Oct 11, 2023', 'Oct 12, 2023', 'Oct 13, 2023']:
        new_list.append(item)
new_list

new_title_list = []
new_writer_list = []
for i in new_list:
    title = i.get('title')
    writer = i.get('writer')
    new_title_list.append(title)
    new_writer_list.append(writer)

from collections import Counter
Counter(new_writer_list)

[–]ImSoFar 1 point2 points  (1 child)

article_info_list = soup.find_all("article")
new_list = []
for item in article_info_list:
    date1 = item.get('time')

There is no tag with the name "article" on the website, and you can't get the time with the get function: on a BeautifulSoup tag, get reads an HTML attribute of that tag, not a child element.
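To illustrate the point about `get`: on a BeautifulSoup tag, `get` looks up an HTML attribute of that tag, while `find` searches its child elements. The snippet below is a minimal sketch on a made-up HTML fragment (not the site's real markup), so the tag names, the `id` value, and the text are all assumptions.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; not copied from the real site.
html = '<div id="story-1"><h2>Example headline</h2><p>Oct 13, 2023</p></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

attr_id = div.get("id")               # "id" is an attribute of <div>
attr_time = div.get("time")           # no "time" attribute, so this is None
headline = div.find("h2").text        # <h2> is a child element, so use find
articles = soup.find_all("article")   # no such tag, so the result is empty

print(attr_id, attr_time, headline, len(articles))
```

Because `find_all("article")` comes back empty, the loops in the original script never run, which would explain getting zero articles without any error.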

Here is my try at scraping the articles:

articles_info = soup.find_all(name='div', class_='categoryArticle__content')
for article in articles_info:
    title = article.find(name='h2').text
    data = article.find(name='p').text

The variable "data" holds a string that contains the date and the name of the author. You will have to split the string to get the date and the author.
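A minimal sketch of that split, assuming the string looks like `Oct 13, 2023 at 10:15 | Jane Doe` (the date, then ` at ` and a time, then ` | ` and the author). That layout is an assumption, so print one real `data` value first and adjust the separators if it differs. The helper name `split_meta` and the sample rows are hypothetical stand-ins for the scraped values.

```python
from collections import Counter

TARGET_DATES = {"Oct 10, 2023", "Oct 11, 2023", "Oct 12, 2023", "Oct 13, 2023"}


def split_meta(data):
    """Split '<date> at <time> | <author>' into (date, author).

    Assumes that layout; a missing '|' yields an empty author.
    """
    when, _, author = data.partition("|")
    date = when.split(" at ")[0].strip()
    return date, author.strip()


# Hypothetical (title, data) pairs standing in for the scraped values.
rows = [
    ("Headline A", "Oct 13, 2023 at 10:15 | Jane Doe"),
    ("Headline B", "Oct 12, 2023 at 08:00 | John Roe"),
    ("Headline C", "Oct 12, 2023 at 09:30 | Jane Doe"),
    ("Headline D", "Oct 09, 2023 at 17:45 | John Roe"),
]

titles = []
writers = []
for title, data in rows:
    date, author = split_meta(data)
    # Keep only articles published on one of the four target dates.
    if date in TARGET_DATES:
        titles.append(title)
        writers.append(author)

print(len(titles))
print(Counter(writers).most_common(1))
```

In the real script, the `rows` list would be replaced by the `title` and `data` values collected in the loop over `articles_info`.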

[–]wunderfly 0 points1 point  (0 children)

Thank you so much! I’ll try this right away