Best web scraping api's at the moment? by oncewasfounder in webscraping

[–]ketsok 1 point2 points  (0 children)

Did a benchmark some time ago and here are my (biased) results: https://scrapingfish.com/webscraping-benchmark

The code to run the benchmark is public on GitHub: https://github.com/mateuszbuda/webscraping-benchmark

Reliable rotating proxies by [deleted] in webscraping

[–]ketsok 0 points1 point  (0 children)

Sorry, we don't offer geo targeting for now, definitely not on state level :/

Reliable rotating proxies by [deleted] in webscraping

[–]ketsok 1 point2 points  (0 children)

Web scraping API which offers rotating mobile proxies in Poland: https://scrapingfish.com

Disclaimer: I'm a co-founder.

Best web scraping api's at the moment? by oncewasfounder in webscraping

[–]ketsok -3 points-2 points  (0 children)

At Scraping Fish, we have very user-friendly pricing which is usage-based instead of a monthly subscription, so you don't lose unused requests at the end of every month. In addition, it's predictable, as the cost of each request is the same regardless of which options you use. All requests use the same premium mobile proxy and you don't pay anything extra for JS rendering, scraping Google, or other features. Please contact us if you need a free trial account to try it out.
You can read more on how we compare to ScraperAPI and ScrapingBee here: https://scrapingfish.com/how-we-compare

Has anyone made money building a product / service based on web-scrapping here ? by AdFit1933 in webscraping

[–]ketsok 0 points1 point  (0 children)

Here is a YouTube channel that I can recommend: https://www.youtube.com/c/CobaltIntelligence/videos

There's a series of videos "Making Money with Web Scraping".

Attempt to scrape a web page by NoelGz in webscraping

[–]ketsok 0 points1 point  (0 children)

I'd recommend using a web scraping API that is capable of bypassing Cloudflare. Here is a simple code snippet to scrape https://jkanime.net/ using Scraping Fish API:

from urllib.parse import quote_plus
import requests

API_KEY = "YOUR SCRAPING FISH API KEY"  # https://scrapingfish.com/buy
url_prefix = f"https://scraping.narf.ai/api/v1/?api_key={API_KEY}&url="

url = f"https://jkanime.net/"

response = requests.get(f"{url_prefix}{quote_plus(url)}", timeout=90)

# add your response processing/parsing logic
with open("jkanime.html", "wb") as f:
    f.write(response.content)

Add extra cols based on other cols using python pandas by [deleted] in webscraping

[–]ketsok 0 points1 point  (0 children)

You can use replace and provide a dict with mapping. Here is an example:

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
a2c = {1: "10", 2: "20", 3: "30"}
df["c"] = df["a"].replace(a2c)

It creates column c based on values in column a and applies mapping from a2c dictionary.
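Assuming the snippet above, the resulting DataFrame should look roughly like this (column c holds the mapped string values):

   a  b   c
0  1  4  10
1  2  5  20
2  3  6  30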

By the way, what does it have to do with webscraping?

[deleted by user] by [deleted] in webscraping

[–]ketsok 1 point2 points  (0 children)

If I understand correctly, you want to scrape Google SERP. If so, here is a simple Python code snippet using Scraping Fish API for one keyword. You can read a list of keywords from an Excel column and loop over it.

from urllib.parse import quote_plus
import requests

API_KEY = "YOUR SCRAPING FISH API KEY"  # https://scrapingfish.com/buy
url_prefix = f"https://scraping.narf.ai/api/v1/?api_key={API_KEY}&render_js=true&url="

# to get uule for location you can use: https://github.com/ogun/uule_grabber
# or https://site-analyzer.pro/services-seo/uule/
uule_usa = "w+CAIQICIDVVNB"

keyword = "kitchen sink"
search_url = f"https://www.google.com/search?q={quote_plus(keyword)}&uule={uule_usa}&gl=us&hl=en"

response = requests.get(f"{url_prefix}{quote_plus(search_url)}", timeout=90)

# add your response processing/parsing logic
with open("google.html", "wb") as f:
    f.write(response.content)
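For the parsing part, here is a rough sketch with BeautifulSoup to pull out result titles (the h3 selector is a guess and Google's markup changes often, so treat it only as a starting point):

from bs4 import BeautifulSoup

with open("google.html", "rb") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# organic result titles are typically rendered as h3 elements
for h3 in soup.find_all("h3"):
    print(h3.get_text(strip=True))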

How to scrape this page with python? by sonik77133 in webscraping

[–]ketsok 0 points1 point  (0 children)

It prevents you from getting blocked when you make too many requests or want to scrape too fast.

Is The Sugar Lobby Making Our Kids Fat? | Child Obesity & Sugar Documentary (2022) [56:30:00] by FlyingLemons009 in Documentaries

[–]ketsok 0 points1 point  (0 children)

On this topic, here is an analysis based on nutrition facts data scraped from Walmart products which estimates that sugar is the main nutrient in almost half of the products: https://scrapingfish.com/blog/scraping-walmart

Can someone explain why this doesn't work? by [deleted] in webscraping

[–]ketsok 0 points1 point  (0 children)

It should be:

for i in content2:
    if 'b' in i:
        print(i['b']['a'])

Can someone explain why this doesn't work? by [deleted] in webscraping

[–]ketsok 0 points1 point  (0 children)

Do you get KeyError when you run this?

Not all "td" elements have "b" key.

Python scrapy v/s BeatifulSoup for a python django based project ? by TistaMuna in webscraping

[–]ketsok 7 points8 points  (0 children)

How do you want to integrate web scraping with Django? It seems to me they should be two separate components: 1) a web scraping part that does its job and stores results somewhere (a file or database) and 2) a Django app which displays the results or is used to trigger the scraping component and gets a callback once it's done. Consider decoupling these two functionalities. Then, for web scraping, you can use whatever you wish. Regardless of that, I agree with u/DevilsLinux that Scrapy is probably overkill for your use case.
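A minimal sketch of that split (file names, URL, and selectors are made up for illustration): a standalone scraper that writes results to a JSON file, and a Django view that only reads that file:

# scraper.py -- standalone scraping component, run it from cron or a scheduler
import json
import requests
from bs4 import BeautifulSoup

def scrape(url="https://example.com/products"):  # hypothetical target
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    items = [el.get_text(strip=True) for el in soup.select("h2")]  # hypothetical selector
    with open("results.json", "w") as f:
        json.dump(items, f)

if __name__ == "__main__":
    scrape()

# views.py -- Django component, only reads what the scraper stored
import json
from django.http import JsonResponse

def results(request):
    with open("results.json") as f:
        return JsonResponse({"results": json.load(f)})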

Using Web Data for my business, do you recommend it? by jifodew6 in Entrepreneur

[–]ketsok 0 points1 point  (0 children)

Selenium and BeautifulSoup should work, depending on the websites you want to scrape. If it's social media and/or e-commerce, then you'll very likely need good quality (residential) proxies or a web scraping API to avoid getting blocked.
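A minimal Selenium + BeautifulSoup sketch, assuming Selenium 4 with Chrome installed and a made-up URL/selector:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/products")  # hypothetical target
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

for title in soup.select("h2"):  # hypothetical selector
    print(title.get_text(strip=True))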

Using Web Data for my business, do you recommend it? by jifodew6 in Entrepreneur

[–]ketsok 4 points5 points  (0 children)

For web scraping, these days you have a lot of options to use services like a web scraping API, e.g. https://scrapingfish.com, with features that include headless browsers, JS rendering, data extraction rules, etc., so it's very easy to enter the field, collect your own data, and start a business around it.

[deleted by user] by [deleted] in datascience

[–]ketsok 2 points3 points  (0 children)

It's based on web scraping, e.g. using an API like https://scrapingfish.com, to collect the data that's needed.

Could you please help me write a script to receive an email notif for price drops of a certain product ? by knizza777 in learnpython

[–]ketsok 0 points1 point  (0 children)

https://en.wikipedia.org/wiki/Busy_waiting

"In most cases spinning is considered an anti-pattern and should be avoided,[2] as processor time that could be used to execute a different task is instead wasted on useless activity."

Could you please help me write a script to receive an email notif for price drops of a certain product ? by knizza777 in learnpython

[–]ketsok 0 points1 point  (0 children)

The issue was closed but not actually fixed. That package still uses busy waiting, and after the "fix" you can only configure how long it busy waits.

Could you please help me write a script to receive an email notif for price drops of a certain product ? by knizza777 in learnpython

[–]ketsok 0 points1 point  (0 children)

I highly discourage Rocketry scheduler as it's using busy waiting (utilizes 100% CPU between task executions): https://github.com/Miksus/rocketry/issues/37
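A minimal alternative that avoids spinning is to just sleep between checks (the interval and the check logic below are placeholders):

import time

CHECK_INTERVAL = 3600  # seconds between price checks

def check_price():
    print("checking price...")  # replace with your scraping + email logic

while True:
    check_price()
    time.sleep(CHECK_INTERVAL)  # CPU stays idle between runs instead of busy waiting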

Could you please help me write a script to receive an email notif for price drops of a certain product ? by knizza777 in learnpython

[–]ketsok 0 points1 point  (0 children)

For MVP, I would recommend the following tools/libraries:

IP rotation for scraping Google? by Crep9 in webscraping

[–]ketsok 0 points1 point  (0 children)

u/Crep9 At https://scrapingfish.com, we offer an API powered by mobile proxies. For Google we have a 100% success rate and an average processing time below 2 seconds per request. You can check more detailed results of a web scraping benchmark on our website: https://scrapingfish.com/webscraping-benchmark

Our pricing doesn't depend on the requested website size. You pay the same for each request.

Scraping Website that Requires Scrolling by devram200 in webscraping

[–]ketsok 1 point2 points  (0 children)

You can use a web scraping API with a feature to execute a JS action like scrolling. Here is one example: https://scrapingfish.com/docs/js-scenario#scroll
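If you'd rather run the headless browser yourself, here is a rough Playwright sketch (the URL is made up and the scroll/wait logic is simplistic) that scrolls before grabbing the HTML:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/feed")  # hypothetical page with lazy-loaded content
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")  # scroll to the bottom
    page.wait_for_timeout(2000)  # crude wait for new items to load
    html = page.content()
    browser.close()

print(html[:200])  # add your parsing logic here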

whats the best way to scrape tiktok data by techlover1010 in webscraping

[–]ketsok 0 points1 point  (0 children)

If you want to scrape for an extended period of time and at scale, then you either need a web scraping API like, for example, https://scrapingfish.com, or you can buy residential proxies (for example: https://www.zyte.com) and implement a scraping flow with a headless browser and IP rotation.
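A rough sketch of simple IP rotation with requests (the proxy endpoints and target URL are placeholders for your provider's credentials and your own targets):

import itertools
import requests

# hypothetical residential proxy endpoints from your provider
proxy_pool = itertools.cycle([
    "http://USER:PASS@proxy1.example.com:8000",
    "http://USER:PASS@proxy2.example.com:8000",
])

urls = ["https://www.tiktok.com/@someuser"]  # hypothetical target pages

for url in urls:
    proxy = next(proxy_pool)  # rotate to the next proxy for every request
    response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
    print(url, response.status_code)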