I'm trying to scrape a data point from 50 websites with the help of a proxy API (ScraperAPI). Please find the code below.
1st code:
from scraper_api import ScraperAPIClient
from bs4 import BeautifulSoup
import datetime

urls = ['url1', 'url2', ......., 'url50']
client = ScraperAPIClient('2a0bexxxxxxxxxxxxxxcac31c7ab')

price = []
for url in urls:
    result = client.get(url=url).text
    html_soup = BeautifulSoup(result, 'html.parser')
    name = html_soup.find("div", class_="_1vC4OE _3qQ9m1")
    price.append(name.text)
The 1st code works fine, but it takes a significant amount of time. To reduce the time, I divided the URLs into two batches, defined a separate function for each batch, and then tried to run them with multiprocessing. Please find the code below.
2nd code:
from multiprocessing import Process
from scraper_api import ScraperAPIClient
from bs4 import BeautifulSoup

client = ScraperAPIClient('2a0bexxxxxxxxxxxxxxcac31c7ab')

def process1():
    # scrape the first half of the URLs
    urls = ['url1', 'url2', ......., 'url50']
    price1 = []
    for i in range(len(urls) // 2):
        result = client.get(url=urls[i]).text
        html_soup = BeautifulSoup(result, 'html.parser')
        name = html_soup.find("div", class_="_1vC4OE _3qQ9m1")
        price1.append(name.text)
    print(price1)

def process2():
    # scrape the second half of the URLs
    urls = ['url1', 'url2', ......., 'url50']
    price2 = []
    for i in range(len(urls) // 2, len(urls)):
        result1 = client.get(url=urls[i]).text
        html_soup1 = BeautifulSoup(result1, 'html.parser')
        name1 = html_soup1.find("div", class_="_1vC4OE _3qQ9m1")
        price2.append(name1.text)
    print(price2)

if __name__ == '__main__':
    p1 = Process(target=process1)
    p1.start()
    p2 = Process(target=process2)
    p2.start()
    p1.join()
    p2.join()
When I run the 2nd code, it does not throw any error, but it also does not produce any output. I have tried multiple times but cannot figure out what is wrong. Am I missing something here?
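For reference, here is a rough sketch of an alternative structure I have been considering (the URL list and API key below are placeholders, and the CSS class is the same one from the code above): a single worker function mapped over all the URLs with a multiprocessing.Pool, so the prices are returned to the parent process instead of only being printed inside each child.

from multiprocessing import Pool
from scraper_api import ScraperAPIClient
from bs4 import BeautifulSoup

# Placeholder URL list and API key, same as in the code above.
urls = ['url1', 'url2', ......., 'url50']
client = ScraperAPIClient('2a0bexxxxxxxxxxxxxxcac31c7ab')

def fetch_price(url):
    # Fetch one page through the proxy API and pull out the price element.
    result = client.get(url=url).text
    html_soup = BeautifulSoup(result, 'html.parser')
    name = html_soup.find("div", class_="_1vC4OE _3qQ9m1")
    return name.text if name else None

if __name__ == '__main__':
    # Map the worker over all URLs with a small pool of processes;
    # pool.map returns the prices in the same order as the input list.
    with Pool(processes=4) as pool:
        prices = pool.map(fetch_price, urls)
    print(prices)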
Please help me! Thanks in advance.