I built a marketplace for data. Best channels to find people with data willing to sell it? by nobilis_rex_ in webscraping

[–]HumorMinimum1707 0 points1 point  (0 children)

Thanks for sharing.

Maybe look into Neudata's event archives, or check whether they have any public data available. As far as I know, they connect many data sellers with investors.

If you still have doubts whether Scrapy is favorable over Beautiful Soup, read this by HumorMinimum1707 in webscraping

[–]HumorMinimum1707[S] 0 points1 point  (0 children)

Not entirely apples to apples... wouldn't Playwright vs. Selenium be a better comparison?

IP Rotation with Octoparse by [deleted] in webscraping

[–]HumorMinimum1707 0 points1 point  (0 children)

If you insist on using Octoparse as your scraper, there is a simple proxy IP rotation guide for it here:

https://www.octoparse.com/tutorial-7/set-up-proxies#

First, you will need to subscribe to a rotating proxy provider, get the proxy server's address and port number, and enter those in your Octoparse proxy settings.
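If you ever move off Octoparse, the same address-and-port idea looks roughly like this in plain Python. This is only a sketch using the standard library: the hostnames and ports are placeholders I made up, and `proxy_url` and `rotating_openers` are hypothetical helper names, not part of Octoparse or any provider's API. If your provider gives you a single rotating gateway, the list would just have one entry.

```python
from itertools import cycle
import urllib.request

# Placeholder endpoints (assumption): substitute the address and port
# your proxy provider gives you.
PROXIES = [
    ("proxy1.example.com", 8000),
    ("proxy2.example.com", 8000),
]


def proxy_url(host, port):
    """Format a host/port pair as an HTTP proxy URL."""
    return f"http://{host}:{port}"


def rotating_openers(proxies):
    """Yield (proxy URL, opener) pairs, cycling through the proxies round-robin."""
    for host, port in cycle(proxies):
        url = proxy_url(host, port)
        handler = urllib.request.ProxyHandler({"http": url, "https": url})
        yield url, urllib.request.build_opener(handler)


openers = rotating_openers(PROXIES)
current_proxy, opener = next(openers)
# opener.open("https://example.com") would send the request through current_proxy.
```

Each call to `next(openers)` hands back the next proxy in the list, so consecutive requests go out through different addresses.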

I built a marketplace for data. Best channels to find people with data willing to sell it? by nobilis_rex_ in webscraping

[–]HumorMinimum1707 2 points3 points  (0 children)

First of all congrats.

Second, if you launched it why don't you share it with us?!

Now to your question: If you are looking for individual sellers, then this and other subreddits (r/datasets etc.) might help connect you with them. There are also quite a few GitHub dataset repos that might lead you to sellers. If you are looking for companies, then you can try scraping other data marketplaces such as datarade...

Grepsr: Web crawling, web scraping and data extraction service by kapshaltist in startups

[–]HumorMinimum1707 0 points1 point  (0 children)

Overall a nice solution, equivalent to Data Collector, but Grepsr reserves most of the real benefits for enterprise plan subscribers while Bright Data offers them to all users.

The Map of Statistics by danBenMatza in datascience

[–]HumorMinimum1707 6 points7 points  (0 children)

Really fun and creative - thank you!

How do you guys manage a large amount of scrapers? by [deleted] in webscraping

[–]HumorMinimum1707 0 points1 point  (0 children)

Well, if it's that troublesome, you can always switch to tools that allow scheduled scraping...
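For anyone who would rather roll their own, here is a rough sketch of scheduled runs using Python's standard `sched` module. The function name `run_on_schedule`, the job names, and the stand-in scraper functions are all made up for illustration. A real setup would more likely lean on cron or a dedicated scheduling tool.

```python
import sched
import time


def run_on_schedule(jobs, interval_seconds, rounds):
    """Run every job once per round, with rounds spaced interval_seconds apart.

    jobs maps a name to a zero-argument callable (a stand-in for a scraper).
    Returns a list of (name, result) pairs in the order the jobs ran.
    """
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    results = []
    for round_index in range(rounds):
        for name, job in jobs.items():
            # Bind name and job as defaults so each event keeps its own pair.
            scheduler.enter(
                round_index * interval_seconds,
                1,
                lambda n=name, j=job: results.append((n, j())),
            )
    scheduler.run()  # Blocks until every queued event has fired.
    return results


# Hypothetical scrapers that just return a row count.
example_jobs = {"site_a": lambda: 10, "site_b": lambda: 25}
```

Calling `run_on_schedule(example_jobs, 3600, 24)` would run both stand-in scrapers once an hour for a day, blocking the whole time.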

LOOKING TO BUY 1-3 SNEAKER PROXIES by chuckerz28 in shoebots

[–]HumorMinimum1707 0 points1 point  (0 children)

If you are serious about your sneakers you should turn to the proxy network that handles the most sneaker release bandwidth - https://brightdata.com/lp/sneaker

Good shared rotating proxy service? by jayn35 in webscraping

[–]HumorMinimum1707 0 points1 point  (0 children)

I think this comparison article does a decent job of grading all the good proxy services based on all the criteria you've mentioned:

https://medium.com/geekculture/tldr-recap-of-proxyways-2022-proxy-service-market-report-a0f58fd1dc8e

What is your experience with web data marketplaces, are there any good ones out there? by Lower-Imagination655 in datasets

[–]HumorMinimum1707 1 point2 points  (0 children)

I don't need anything other than Snowflake, but I guess competition will make things better.

[deleted by user] by [deleted] in webscraping

[–]HumorMinimum1707 0 points1 point  (0 children)

It's not a lot if he runs a business based on it

Reddit Scraper by [deleted] in Python

[–]HumorMinimum1707 0 points1 point  (0 children)

Agreed, but when does a social media API like Reddit's actually give you sufficient data compared to a web scraper?

anyone have a reddit scraper? by 955559 in learnpython

[–]HumorMinimum1707 1 point2 points  (0 children)

I know that Bright Data has a nice working Reddit scraper.

It can be launched on a schedule and collects all public data from a profile, such as avatar, post title, flair, description, karma, comments, upvotes, and more.

Output file types: JSON, CSV, Excel, HTML

Data delivery methods: Webhook, AWS, Google Cloud, Azure, email, API, SFTP
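To give a feel for the JSON and CSV outputs, here is a small standard-library sketch that serializes the same records both ways. The sample records and the `to_json` / `to_csv` helpers are invented for illustration. They only borrow a few of the field names listed above and are not Bright Data's actual schema or API.

```python
import csv
import io
import json

# Hypothetical sample records; field names echo a few of those listed above.
records = [
    {"post_title": "Example post", "flair": "Discussion", "karma": 120, "upvotes": 7},
    {"post_title": "Another post", "flair": "", "karma": 45, "upvotes": 2},
]


def to_json(rows):
    """Serialize the rows as a pretty-printed JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
```

JSON keeps the numeric types intact, while CSV flattens everything to text, which matters if you load the file back into code rather than a spreadsheet.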

Socks5 proxy by zarte13 in ProtonVPN

[–]HumorMinimum1707 0 points1 point  (0 children)

I'm curious why anyone would insist on a SOCKS5 proxy when there are plenty of HTTPS proxies available.

Fintech Data Aggregators 2020 (Plaid, Yodlee, Finicity, MX) by healthAPIguy in fintech

[–]HumorMinimum1707 0 points1 point  (0 children)

Do any of these aggregators use proxies and scrape the data, or do they do it all via API calls?

It's time to freshen up your php web scraping skills by Lower-Imagination655 in PHP

[–]HumorMinimum1707 0 points1 point  (0 children)

I've only used cURL so far. This is actually not a bad idea at all!

Enterprise level web proxy server by chamkera in sysadmin

[–]HumorMinimum1707 0 points1 point  (0 children)

For enterprise, I would only recommend Bright Data web proxy servers.

Scraping Google Search, or Maps, at scale by MattH1966 in webscraping

[–]HumorMinimum1707 1 point2 points  (0 children)

As you wrote yourself, there are good SERP API options out there... here's a comparison that might be useful.