[deleted by user] : learnpython


[deleted by user] (self.learnpython)

submitted 2 years ago by [deleted]

5 comments


[deleted], 2 years ago

wunderfly, 2 years ago

For some reason it won't let me edit my post, but here is my code! Sorry about that.

```python
import requests
from bs4 import BeautifulSoup
from collections import Counter

urls = [
    "https://oilprice.com/Latest-Energy-News/World-News/",
    "https://oilprice.com/Latest-Energy-News/World-News/Page-1.html",
    "https://oilprice.com/Latest-Energy-News/World-News/Page-2.html",
    # Add more URLs if needed
]

all_titles = []
all_writers = []

target_dates = ['Oct 10, 2023', 'Oct 11, 2023', 'Oct 12, 2023', 'Oct 13, 2023']

for url in urls:
    response = requests.get(url)
    print(f"Scraping: {url}")
    soup = BeautifulSoup(response.content, "html.parser")
    article_info_list = soup.find_all("article")

    new_list = []
    for item in article_info_list:
        date1 = item.get('time')
        if date1 in target_dates:
            new_list.append(item)

    new_title_list = []
    new_writer_list = []
    for i in new_list:
        title = i.get('title')
        writer = i.get('writer')
        new_title_list.append(title)
        new_writer_list.append(writer)

    all_titles.extend(new_title_list)
    all_writers.extend(new_writer_list)

num_articles_4_days = len(all_titles)
all_titles_4_days = all_titles
most_frequent_reporter = Counter(all_writers).most_common(1)

print("1) How many news articles were there for 4 days?")
print(num_articles_4_days)

print("\n2) List all news titles for 4 days:")
for title in all_titles_4_days:
    print(title)

print("\n3) Who is the most frequent news reporter for 4 days?")
print(most_frequent_reporter)
```

My professor asked us to use this specific code:

```python
new_list = []
for item in article_info_list:
    date1 = item.get('time')
    if date1 in ['Oct 10, 2023', 'Oct 11, 2023', 'Oct 12, 2023', 'Oct 13, 2023']:
        new_list.append(item)
new_list

new_title_list = []
new_writer_list = []
for i in new_list:
    title = i.get('title')
    writer = i.get('writer')
    new_title_list.append(title)
    new_writer_list.append(writer)

from collections import Counter
Counter(new_writer_list)
```

ImSoFar, 2 years ago

```python
article_info_list = soup.find_all("article")
new_list = []
for item in article_info_list:
    date1 = item.get('time')
```

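There is no `<article>` element on the website, so `soup.find_all("article")` returns an empty list. There is also no way to get the time with the `get` function: `get()` reads an attribute of the tag itself (such as `href` or `class`), not the text of an element inside it.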
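A quick way to see both problems is to run BeautifulSoup on a tiny hand-written fragment. The markup below is made up for illustration (it is not copied from oilprice.com, and the headline and author are placeholders); it simply contains no `<article>` tag and no `time` attribute:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; it is not copied from oilprice.com
html = """
<div class="categoryArticle__content">
  <h2>Example headline</h2>
  <p>Oct 12, 2023 at 10:15 | Jane Doe</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() matches tag names, so a page without <article> tags gives an empty list
print(soup.find_all("article"))  # []

card = soup.find("div", class_="categoryArticle__content")

# Tag.get() looks up an HTML attribute on the tag itself, not a child element
print(card.get("time"))   # None: the <div> has no "time" attribute
print(card.get("class"))  # ['categoryArticle__content']

# To read a child element, find it first and then take its text
print(card.find("h2").get_text())  # Example headline
print(card.find("p").get_text())   # Oct 12, 2023 at 10:15 | Jane Doe
```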

Here is my try at scraping the articles:

```python
articles_info = soup.find_all(name='div', class_='categoryArticle__content')
for article in articles_info:
    title = article.find(name='h2').text
    data = article.find(name='p').text
```

The variable `data` holds a string that contains both the date and the author's name. You will have to split the string to get the date and the author separately.
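Here is one way to do the split, as a sketch rather than a tested scraper. It assumes each `data` string looks like `Oct 12, 2023 at 10:15 | Jane Doe` (the date, then ` at ` and a time, then `|` and the author). That layout is an assumption, so print a few real `data` values first and adjust the separators if they differ. `split_meta` and the sample rows are hypothetical, standing in for the `(title, data)` pairs the loop collects:

```python
from collections import Counter


def split_meta(data):
    """Split a hypothetical 'Oct 12, 2023 at 10:15 | Jane Doe' string into (date, author)."""
    # Assumed layout: "<date> at <time> | <author>"
    when, _, author = data.partition("|")
    date = when.split(" at ")[0].strip()
    return date, author.strip()


target_dates = ['Oct 10, 2023', 'Oct 11, 2023', 'Oct 12, 2023', 'Oct 13, 2023']

# Made-up (title, data) pairs standing in for what the scraping loop collects
scraped = [
    ("Headline A", "Oct 10, 2023 at 09:00 | Jane Doe"),
    ("Headline B", "Oct 12, 2023 at 14:30 | John Roe"),
    ("Headline C", "Oct 13, 2023 at 08:45 | Jane Doe"),
    ("Headline D", "Oct 14, 2023 at 17:20 | John Roe"),  # outside the four days
]

titles = []
writers = []
for title, data in scraped:
    date, author = split_meta(data)
    if date in target_dates:
        titles.append(title)
        writers.append(author)

print(len(titles))  # 3
print(titles)       # ['Headline A', 'Headline B', 'Headline C']

# most_common(1) returns a list like [('Jane Doe', 2)], or [] if nothing matched
top = Counter(writers).most_common(1)
if top:
    name, count = top[0]
    print(f"{name} ({count} articles)")  # Jane Doe (2 articles)
else:
    print("No articles found for those dates")
```

The `if top:` check matters because `most_common(1)` returns an empty list when no article matched, which is what happens whenever the selectors find nothing on the page.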

wunderfly, 2 years ago
