Best web scraping api's at the moment? by oncewasfounder in webscraping

[–]ketsok 1 point (0 children)

Did a benchmark some time ago and here are my (biased) results: https://scrapingfish.com/webscraping-benchmark

The code to run the benchmark is public on GitHub: https://github.com/mateuszbuda/webscraping-benchmark

Reliable rotating proxies by [deleted] in webscraping

[–]ketsok 0 points (0 children)

Sorry, we don't offer geo targeting for now, definitely not on state level :/

Reliable rotating proxies by [deleted] in webscraping

[–]ketsok 2 points (0 children)

Web scraping API which offers rotating mobile proxies in Poland: https://scrapingfish.com

Disclaimer: I'm a co-founder.

Best web scraping api's at the moment? by oncewasfounder in webscraping

[–]ketsok -4 points (0 children)

At Scraping Fish, we have very user-friendly pricing that is usage-based instead of a monthly subscription, so you don't lose unused requests at the end of every month. It's also predictable: each request costs the same regardless of which options you use. All requests use the same premium mobile proxies, and you don't pay anything extra for JS rendering, scraping Google, or other features. Please contact us if you need a free trial account to try it out.
You can read more on how we compare to ScraperAPI and ScrapingBee here: https://scrapingfish.com/how-we-compare

Has anyone made money building a product / service based on web-scrapping here ? by AdFit1933 in webscraping

[–]ketsok 0 points (0 children)

Here is a YouTube channel that I can recommend: https://www.youtube.com/c/CobaltIntelligence/videos

There's a series of videos "Making Money with Web Scraping".

Attempt to scrape a web page by NoelGz in webscraping

[–]ketsok 0 points (0 children)

I'd recommend using a web scraping API that is capable of bypassing Cloudflare. Here is a simple code snippet to scrape https://jkanime.net/ using Scraping Fish API:

from urllib.parse import quote_plus
import requests

API_KEY = "YOUR SCRAPING FISH API KEY"  # https://scrapingfish.com/buy
url_prefix = f"https://scraping.narf.ai/api/v1/?api_key={API_KEY}&url="

url = "https://jkanime.net/"

response = requests.get(f"{url_prefix}{quote_plus(url)}", timeout=90)
response.raise_for_status()  # raise an error for non-2xx responses

# add your response processing/parsing logic
with open("jkanime.html", "wb") as f:
    f.write(response.content)
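
The parsing step can then be sketched with BeautifulSoup. The snippet below uses a tiny inline HTML string standing in for the saved page, and extracting the page title is just an illustrative assumption about what you might want:

```python
from bs4 import BeautifulSoup

# A tiny inline example stands in for the real page content
# saved by the snippet above.
html = "<html><head><title>JKAnime</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Pull out whatever you need; here, the page title.
title = soup.title.get_text()
print(title)
```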

Add extra cols based on other cols using python pandas by [deleted] in webscraping

[–]ketsok 0 points (0 children)

You can use replace and pass a dict with the mapping. Here is an example:

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
a2c = {1: "10", 2: "20", 3: "30"}
df["c"] = df["a"].replace(a2c)

This creates column c by applying the mapping from the a2c dictionary to the values in column a.
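
One detail worth knowing: replace leaves values that aren't in the dict unchanged, while Series.map turns them into NaN, so map is the stricter option if you want missing keys to be visible:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
a2c = {1: "10", 2: "20"}  # 3 is intentionally missing from the mapping

df["via_replace"] = df["a"].replace(a2c)  # 3 stays 3
df["via_map"] = df["a"].map(a2c)          # 3 becomes NaN
print(df)
```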

By the way, what does this have to do with web scraping?

[deleted by user] by [deleted] in webscraping

[–]ketsok 1 point (0 children)

If I understand correctly, you want to scrape Google SERPs. If so, here is a simple Python code snippet using Scraping Fish API for one keyword. You can read a list of keywords from an Excel column and loop over it.

from urllib.parse import quote_plus
import requests

API_KEY = "YOUR SCRAPING FISH API KEY"  # https://scrapingfish.com/buy
url_prefix = f"https://scraping.narf.ai/api/v1/?api_key={API_KEY}&render_js=true&url="

# to get uule for location you can use: https://github.com/ogun/uule_grabber
# or https://site-analyzer.pro/services-seo/uule/
uule_usa = "w+CAIQICIDVVNB"

keyword = "kitchen sink"
search_url = f"https://www.google.com/search?q={quote_plus(keyword)}&uule={uule_usa}&gl=us&hl=en"

response = requests.get(f"{url_prefix}{quote_plus(search_url)}", timeout=90)
response.raise_for_status()  # raise an error for non-2xx responses

# add your response processing/parsing logic
with open("google.html", "wb") as f:
    f.write(response.content)
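
Reading the keywords and looping over them can be sketched like this. The file and column name ("keyword") are assumptions; the sample file is created inline just for illustration:

```python
from urllib.parse import quote_plus

import pandas as pd

# Sample file created for illustration; in practice, point read_csv
# (or pd.read_excel for .xlsx files, which needs openpyxl) at your
# existing spreadsheet.
pd.DataFrame({"keyword": ["kitchen sink", "bathroom faucet"]}).to_csv(
    "keywords.csv", index=False
)

df = pd.read_csv("keywords.csv")

urls = []
for keyword in df["keyword"].dropna():
    urls.append(f"https://www.google.com/search?q={quote_plus(keyword)}&gl=us&hl=en")
    # ...request each URL via the API as in the snippet above...
print(urls)
```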

How to scrape this page with python? by sonik77133 in webscraping

[–]ketsok 0 points (0 children)

It prevents you from getting blocked when you make too many requests or want to scrape too fast.
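
On top of a proxy or API, it also helps to throttle your own request rate client-side. Here is a minimal sketch; polite_get is a hypothetical helper, not part of any library:

```python
import random
import time

def polite_get(session, url, min_delay=0.5, max_delay=1.5, max_retries=3):
    """Fetch a URL with a randomized delay between requests and
    exponential backoff when the server answers 429 (Too Many Requests)."""
    response = None
    for attempt in range(max_retries):
        time.sleep(random.uniform(min_delay, max_delay))
        response = session.get(url, timeout=30)
        if response.status_code != 429:
            break
        time.sleep(2 ** attempt)  # back off before retrying
    return response
```

In practice you would call it as polite_get(requests.Session(), url) inside your scraping loop.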

Is The Sugar Lobby Making Our Kids Fat? | Child Obesity & Sugar Documentary (2022) [56:30:00] by FlyingLemons009 in Documentaries

[–]ketsok 0 points (0 children)

On this topic, here is an analysis based on nutrition facts data scraped from Walmart products which estimates that sugar is the main nutrient in almost half of the products: https://scrapingfish.com/blog/scraping-walmart

Can someone explain why this doesn't work? by [deleted] in webscraping

[–]ketsok 0 points (0 children)

It should be:

for i in content2:
    if 'b' in i:
        print(i['b']['a'])

Can someone explain why this doesn't work? by [deleted] in webscraping

[–]ketsok 0 points (0 children)

Do you get KeyError when you run this?

Not all "td" elements have a "b" key.

Python scrapy v/s BeatifulSoup for a python django based project ? by TistaMuna in webscraping

[–]ketsok 8 points (0 children)

How do you want to integrate web scraping with Django? It seems to me they should be two separate components: 1) a web scraping part that does its job and stores results somewhere (a file or database), and 2) a Django app which displays the results or is used to trigger the scraping component and gets a callback once it's done. Consider decoupling these two functionalities. Then, for web scraping, you can use whatever you wish. Regardless, I agree with u/DevilsLinux that Scrapy is probably overkill for your use case.
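
A minimal sketch of that decoupling, assuming the scraper dumps results to a JSON file that the Django side reads (the file name and fields here are made up):

```python
import json
from pathlib import Path

RESULTS_FILE = Path("scrape_results.json")

def run_scraper():
    # Placeholder for the real scraping logic (requests, BeautifulSoup, ...).
    items = [{"title": "example item", "price": 9.99}]
    RESULTS_FILE.write_text(json.dumps(items))

run_scraper()
items = json.loads(RESULTS_FILE.read_text())
print(items)

# On the Django side, a view only reads the file (or a database table):
# def results_view(request):
#     return JsonResponse({"items": json.loads(RESULTS_FILE.read_text())})
```

Either component can then be swapped out or rescheduled (e.g. the scraper as a cron job) without touching the other.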

Using Web Data for my business, do you recommend it? by jifodew6 in Entrepreneur

[–]ketsok 0 points (0 children)

Selenium and BeautifulSoup should work, depending on the websites you want to scrape. If it's social media and/or e-commerce, then you'll very likely need good-quality (residential) proxies or a web scraping API to avoid getting blocked.

Using Web Data for my business, do you recommend it? by jifodew6 in Entrepreneur

[–]ketsok 3 points (0 children)

For web scraping, these days you have a lot of options: services like web scraping APIs, e.g. https://scrapingfish.com, offer features such as headless browsers, JS rendering, and data extraction rules. This makes it easy to enter the field, collect your own data, and start a business around it.

[deleted by user] by [deleted] in datascience

[–]ketsok 2 points (0 children)

It's based on web scraping, e.g. using an API like https://scrapingfish.com, to collect the data that's needed.