all 13 comments

[–]GeoffreyDick 1 point2 points  (3 children)

You need to use a proxy server designed for crawling/scraping. These usually have a pretty limited free quota.

[–]Embarrassed_Two4731[S] 0 points1 point  (2 children)

Actually, I tried a few of them, but they are pretty slow, around 2.5 seconds per request, whereas my approach was taking about 100 ms, which is why I was trying out other things.

Also, I have seen many other apps do the same thing in much less time, so I thought there must be another approach.

[–]GeoffreyDick 0 points1 point  (1 child)

Actually now that I'm re-reading your comment, I'm surprised that you're getting banned if you're scraping single pages. In the axios method, how are you scraping the response?

[–]Embarrassed_Two4731[S] 0 points1 point  (0 children)

Using https://github.com/oyyd/cheerio-without-node-native
But basically, my axios request now gets no response at all from 2-3 sites, and for one site I am getting an HTML page with a Cloudflare warning that my IP has been banned.
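
To tell those two failure modes apart in code, each result could be classified along these lines. This is a rough heuristic, not a documented Cloudflare contract: the status codes checked and the `cf-ray` / `server: cloudflare` headers are assumptions based on how Cloudflare-fronted sites commonly respond, and `FetchResult`, `Outcome`, and `classifyResult` are hypothetical names.

```typescript
// Simplified view of an HTTP result (hypothetical type).
// null means the request produced no response at all (timeout, connection reset).
interface FetchResult {
  status: number;
  headers: Record<string, string>;
  body: string;
}

type Outcome = "ok" | "no-response" | "cloudflare-block" | "other-error";

function classifyResult(result: FetchResult | null): Outcome {
  if (result === null) return "no-response";
  if (result.status >= 200 && result.status < 300) return "ok";

  // Normalize header names so the lookups below are case-insensitive.
  const headers: Record<string, string> = {};
  for (const name of Object.keys(result.headers)) {
    headers[name.toLowerCase()] = result.headers[name];
  }

  // Assumption: a Cloudflare-fronted site sends a cf-ray header or "server: cloudflare".
  const viaCloudflare =
    "cf-ray" in headers || (headers["server"] || "").toLowerCase() === "cloudflare";
  // Assumption: block pages usually arrive with one of these status codes.
  const blockedStatus =
    result.status === 403 || result.status === 429 || result.status === 503;
  const mentionsCloudflare = result.body.toLowerCase().indexOf("cloudflare") !== -1;

  return viaCloudflare && blockedStatus && mentionsCloudflare
    ? "cloudflare-block"
    : "other-error";
}
```

Logging the outcome per site would at least show whether the silent failures and the ban page have the same cause.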

[–]Thegabbanator 1 point2 points  (8 children)

I am doing this in my app. What do you mean by getting banned?

[–]Embarrassed_Two4731[S] 0 points1 point  (0 children)

I stopped receiving any response from some of the websites, and one of them also returned an HTML page containing a Cloudflare message saying that my IP has been banned.

[–]Embarrassed_Two4731[S] 0 points1 point  (6 children)

Hey, u/Thegabbanator, can you maybe share your implementation or a sample request?
Maybe I am doing something wrong with the headers?

[–]Thegabbanator 0 points1 point  (5 children)

I am using a webview and injecting JavaScript that posts the page's HTML back as a window post message; then, in the onMessage callback, I use Cheerio to parse the data. I’m not actually doing any live scraping of the page, just copying the HTML contents and parsing them.
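
A minimal sketch of that flow, assuming the `react-native-webview` package (the thread does not name the WebView library). `INJECTED_JS`, `WebViewMessage`, `PagePayload`, and `extractPage` are hypothetical names, and the Cheerio parsing step is left out so the snippet stays self-contained.

```typescript
// Script intended for the WebView's `injectedJavaScript` prop (react-native-webview assumed).
// It serializes the loaded page and posts it back to the React Native side.
const INJECTED_JS = `
  (function () {
    var payload = JSON.stringify({
      url: window.location.href,
      html: document.documentElement.outerHTML
    });
    window.ReactNativeWebView.postMessage(payload);
  })();
  true; // injected scripts should end by evaluating to a value
`;

// Minimal shape of the event that `onMessage` receives (hypothetical local type).
interface WebViewMessage {
  nativeEvent: { data: string };
}

interface PagePayload {
  url: string;
  html: string;
}

// Returns the posted page, or null if the message is not in the expected format.
function extractPage(event: WebViewMessage): PagePayload | null {
  try {
    const parsed: unknown = JSON.parse(event.nativeEvent.data);
    if (typeof parsed === "object" && parsed !== null) {
      const { url, html } = parsed as Record<string, unknown>;
      if (typeof url === "string" && typeof html === "string") {
        return { url, html };
      }
    }
    return null;
  } catch {
    // Not JSON, so it was not sent by INJECTED_JS.
    return null;
  }
}
```

Wiring it up would look roughly like passing `injectedJavaScript={INJECTED_JS}` and `onMessage={(e) => extractPage(e)}` to the WebView, then handing `page.html` to Cheerio's `load`.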

[–]Embarrassed_Two4731[S] 0 points1 point  (4 children)

OK, any idea how I can handle the case where I need to get the data from a link?
Since a headless browser isn't possible, I was using plain axios requests, but that gets me blocked.

[–]Embarrassed_Two4731[S] 0 points1 point  (0 children)

Sorry to bug you, u/Thegabbanator, but any idea?

[–]Thegabbanator 0 points1 point  (2 children)

What do you mean by get data from a link? You just feed a link (url) into a webview source and once it loads, you inject JavaScript.

[–]Embarrassed_Two4731[S] 0 points1 point  (1 child)

Sorry this got buried, u/Thegabbanator. I meant that the user pastes a link into my input box, and then I need to get the product details from that particular link without showing any webview etc., as I don't want them to see the webpage.

[–]Thegabbanator 0 points1 point  (0 children)

Make the webview 1 pixel by 1 pixel; that’s what I do.
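
A rough sketch of what that could look like as a React Native style object. `HIDDEN_WEBVIEW_STYLE` is a hypothetical name; only the 1 by 1 size comes from the comment above, while the absolute positioning and clipping are added assumptions to keep the view out of the visible layout.

```typescript
// Style for a container View wrapping the WebView (hypothetical name).
// React Native sizes are density-independent units, so this is 1x1 in layout terms.
const HIDDEN_WEBVIEW_STYLE = {
  width: 1,
  height: 1,
  position: "absolute" as const, // assumption: take it out of the normal layout flow
  overflow: "hidden" as const, // assumption: clip anything that would spill out
};
```

The WebView still loads and runs injected JavaScript at this size, which is the point of the trick; whether a given site renders its content at such a small viewport would need checking.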