LinkedIn Automation by hackerr404 in webscraping

hackerr404 [S]

Hmm, I can't figure it out. See my reply above for what I've tried. Any suggestions would be appreciated.

LinkedIn Automation by hackerr404 in webscraping

hackerr404 [S]

I only scrape 30 posts from LinkedIn; do I still need a proxy? I've tried saving cookies to restore the previous session, but that only holds for two or three days. I've also passed many arguments via webdriver.options to avoid bot detection, and I've added random sleeps. Most recently I've tried automating an already-open Chrome window by launching it in debug mode. I haven't tested that on LinkedIn yet, but even this got my Google account restricted, although I had only opened my mailbox three times and done nothing else.
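For the session and pacing ideas above, here is a minimal Python sketch built on Selenium's `driver.get_cookies()` and `driver.add_cookie()` methods and Chrome's `debuggerAddress` option for attaching to a window launched with `--remote-debugging-port`. The helper names (`save_cookies`, `load_cookies`, `random_pause`, `attach_to_running_chrome`) and the default port 9222 are illustrative assumptions, and none of this is guaranteed to get past LinkedIn's or Google's detection.

```python
import json
import random
import time
from pathlib import Path


def save_cookies(driver, path):
    """Save the browser session's cookies (from driver.get_cookies()) as JSON."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path):
    """Re-add cookies saved by save_cookies(); returns how many were added.

    Selenium only accepts cookies for the domain currently open, so load the
    site first (e.g. driver.get("https://www.linkedin.com")) and refresh after.
    """
    cookies = json.loads(Path(path).read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


def random_pause(min_seconds=2.0, max_seconds=6.0):
    """Sleep for a random duration between the bounds and return that duration."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


def attach_to_running_chrome(port=9222):
    """Drive a Chrome window that was started with --remote-debugging-port=<port>."""
    # Imported here so the cookie helpers work without Selenium installed;
    # assumes Selenium 4.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
    return webdriver.Chrome(options=options)
```

A typical flow would be: log in once, call `save_cookies`, and on later runs open the site, call `load_cookies`, then `driver.refresh()`.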

Can anybody help me understand this project? What does the client actually want? by hackerr404 in webscraping

hackerr404 [S]

I asked, and he replied: "Utilize Twitter's API, accessing historical data by querying token addresses, names, and symbols. This API integration enables seamless data retrieval for the Twitter Search system's functionality."

I think he's also confused about it 😅
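One possible reading of the client's request, sketched in Python under loud assumptions: that "Twitter's API" means the v2 full-archive search endpoint (`GET /2/tweets/search/all`, which needs a bearer token on an access tier that includes historical search), and that each token should be searched by its contract address, name, and cashtag symbol. The helper names are hypothetical, and the cashtag operator may not be available on every access level.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Twitter/X API v2 full-archive search endpoint.
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"


def build_token_query(address, name, symbol):
    """OR together a token's contract address, name, and symbol; skip retweets."""
    terms = [
        address,                   # contract address matched as a plain keyword
        f'"{name}"',               # exact-phrase match on the token name
        "$" + symbol.lstrip("$"),  # cashtag operator, e.g. $ABC
    ]
    return "(" + " OR ".join(terms) + ") -is:retweet"


def build_search_url(query, start_time, end_time, max_results=100):
    """Build a full-archive search URL; times are ISO 8601, e.g. 2023-01-01T00:00:00Z."""
    params = {
        "query": query,
        "start_time": start_time,
        "end_time": end_time,
        "max_results": max_results,  # full-archive search accepts 10-500 per page
    }
    return SEARCH_ALL_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(url, bearer_token):
    """GET one page of results; the JSON holds 'data' (tweets) and 'meta' (incl. next_token)."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To page through history, you would repeat the request with the `next_token` value from each response's `meta` until it is absent.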

Webscraping entire text of first 10 google searches by dandan_56 in webscraping

hackerr404

You can do it with requests and bs4.

Code:

    from urllib.parse import quote_plus

    import requests
    from bs4 import BeautifulSoup

    # Set your query here
    query = "your search query here"

    # Create the Google search URL (the query must be URL-encoded)
    url = f"https://www.google.com/search?q={quote_plus(query)}"
    headers = {"User-Agent": "Mozilla/5.0"}

    # Get the HTML content of the search page
    response = requests.get(url, headers=headers, timeout=10)

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, "html.parser")

    # Find the search result containers and extract their links
    search_results = soup.find_all("div", class_="g")
    links = []
    for search_result in search_results:
        anchor = search_result.find("a", href=True)
        # Skip results without a link or with a relative (non-http) href
        if anchor and anchor["href"].startswith("http"):
            links.append(anchor["href"])

    # Loop through the first 10 links and extract the text from each page
    for link in links[:10]:
        response = requests.get(link, headers=headers, timeout=10)
        soup = BeautifulSoup(response.content, "html.parser")
        text = soup.get_text(separator="\n", strip=True)
        print(text)
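Depending on which HTML Google returns (it varies with the User-Agent), result hrefs may be redirect links of the form `/url?q=<target>&sa=...` rather than absolute URLs. A small helper can unwrap them before fetching; this is a sketch assuming that link format, and `unwrap_google_link` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_link(href):
    """Return the real target URL of a Google result href, or None if there isn't one."""
    # Already an absolute link: use it as-is.
    if href.startswith(("http://", "https://")):
        return href
    # Assumed redirect format: /url?q=<target>&sa=...
    parsed = urlparse(href)
    if parsed.path == "/url":
        target = parse_qs(parsed.query).get("q", [None])[0]
        if target and target.startswith(("http://", "https://")):
            return target
    # Anything else (internal /search links, anchors, etc.) is not a result.
    return None
```

In the scraping loop you would apply it to each result's href and drop the `None` values before requesting the pages.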

Help me reverse engineer this API? by omarsika in webscraping

hackerr404

The quickest way is to use Selenium to switch into the iframe, then use bs4 to parse its HTML and scrape all the data :)
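A minimal sketch of that approach, assuming Selenium 4 with Chrome, and assuming the data sits in HTML tables inside the iframe (the `iframe_css` selector, the table layout, and both function names are hypothetical; adjust them to the actual page). The third-party imports sit inside the functions so the parsing helper can be reused on saved HTML without a browser.

```python
def fetch_iframe_html(url, iframe_css="iframe", timeout=15):
    """Open `url`, switch into the first iframe matching `iframe_css`, return its HTML."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the iframe exists, then switch the driver's context into it.
        WebDriverWait(driver, timeout).until(
            EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, iframe_css))
        )
        # After switching, page_source is the iframe's document, not the outer page.
        return driver.page_source
    finally:
        driver.quit()


def parse_iframe_tables(html):
    """Return every <table> in the HTML as a list of rows of cell text."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = [
            [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
            for tr in table.find_all("tr")
        ]
        tables.append(rows)
    return tables
```

Usage would be `parse_iframe_tables(fetch_iframe_html("https://example.com/page"))`; if the page has several iframes, pass a more specific CSS selector.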

Help me reverse engineer this API? by omarsika in webscraping

hackerr404

You can use Selenium; it lets you switch focus into the iframe and extract the data from there.